Posts about Web Development, Java, Magnolia CMS and beyond

6/19/2026

Vibe Coding Magnolia: Creating a Page and Content App with Claude

6/19/2026 Posted by Edwin Guilbert, No comments
Everyone is talking about vibe coding these days — that loose process where you describe what you want and let an AI write most of the code while you steer. For backend or React devs, plenty of demos exist. But what does it look like when you are a Magnolia frontend developer whose day involves YAML definitions, FreeMarker scripts, and JCR properties?

In this post I want to share a recent experiment: building an FAQ page template and a matching content app in Magnolia 6.4 using Claude Code. The output is a working module sitting next to travel-demo. The interesting part is how we got there. Three techniques turned a vague idea into a clean, testable light module without me typing a single line of YAML by hand: spec-driven development, loop engineering, and browser-driven validation with Playwright.


The big picture

The naive approach to AI-assisted coding is to type "build me an FAQ page" and pray. It almost works. But "almost" hides a lot of broken YAML, hallucinated field types, and template scripts that compile in your head but explode at render time. The trick is to give the AI more structure, not less.

The three moving parts that made this experiment work were:

• The Superpowers plugin: a Claude Code plugin that wires up a brainstorming → spec → plan → execute workflow. Instead of jumping into code, you negotiate the design first, then a written plan, then bite-sized tasks.
• Loop engineering: short, named iterations where each step has a single goal and you review the result before moving on. The opposite of "go off for an hour and bring me back 500 lines".
• Playwright as the verifier: Claude opens a real browser, logs into Magnolia, clicks through the Pages app, and reads back the rendered HTML to confirm the goal. These are not unit tests but actual end-to-end checks against the running instance.

Spec-driven with Superpowers

The session started not with a prompt but with /superpowers:brainstorming. The skill walks the user through clarifying questions — one at a time, never a wall of text. Should the FAQ items be subnodes of the page, or a first-class content type? How should categories work? Accordion or expanded by default? Each answer narrowed the design.

Once we agreed, Claude wrote a spec to docs/superpowers/specs/2026-06-18-faq-page-template-design.md and waited for review. Only then did it produce an implementation plan with 12 tasks (create the light module, define the content type, register the app, add the page dialog, write the FreeMarker template, etc.). The plan was committed and reviewed before any code was written.

Why does this matter for Magnolia?

Magnolia's stack is configuration-heavy. A wrong field type in a contentTypes YAML file doesn't fail until the app loads. A wrong template path doesn't fail until you create a page. Putting the design in writing before the YAML eliminates a whole class of round-trips.

Loop engineering: small steps, big payoff

With the plan in hand, execution used Claude's subagent-driven development: each task is dispatched to a fresh agent that only sees what it needs. Create the light module → review → commit. Define the content type → review → commit. Add the app → review. The main session never bloats with implementation noise, and a bad step is contained to one task.

In parallel, I was iterating on the surrounding skills. The repo has a .claude/skills/ folder with three light-module-specific skills: magnolia-content-apps, magnolia-dialogs, and magnolia-freemarker. Every time Claude got a convention wrong (legacy 5.x app structure, wrong i18n key shape, square vs. angle brackets in FTL), I updated the skill. The next iteration got it right, and so will the next session, and the one after that.

This is the loop: small change → run → observe → adjust the rule, not the code. Skills are how you teach the AI your project's actual conventions, not the ones it absorbed from the public internet.




Playwright: the AI's own QA

YAML and FTL fail in interesting ways the moment Magnolia reads them. This is where the Playwright skill in Claude comes in handy: give it the URL of your Magnolia instance and it will navigate there and perform any test or validation you would normally do yourself, such as creating content or rendering a page.

A single test session — log in as superuser, open the new FAQs app, create a page under /travel with the FAQ template, fetch the rendered HTML — surfaced four real bugs in minutes:

• A ?then(...) ternary in the FreeMarker template that threw NonBooleanException at render time.
• [@cms.page /] emitted literally in the response because the file used angle-bracket tag syntax instead of square brackets.
• The page properties dialog showing faq.pages.faqPage.label instead of a translated title, caused by a missing i18n key.
• Rich-text content rendering as escaped HTML (&lt;p&gt;) because the template didn't call cmsfn.decode(content).

Every one of those was fixed inside the same session, with Claude reading its own Magnolia error log, finding the offending line, editing it, and re-rendering. No human-in-the-loop except to nod.
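
The checks from that session can also be captured as a small script so they stay repeatable. The sketch below is not what Claude ran (it drove the browser interactively through the Playwright skill); it is a minimal Python equivalent using Playwright's sync API. The base URL, page path, credentials and the three regular expressions are assumptions for illustration, and the field names mgnlUserId and mgnlUserPSWD are those of the default Magnolia login form, which a customised login page may change.

```python
import re

# Signatures of three of the problems listed above. These patterns are
# illustrative assumptions, not an exhaustive or official list.
CHECKS = {
    "unparsed FreeMarker directive": re.compile(r"\[@cms\.\w+"),
    "escaped rich text": re.compile(r"&lt;p&gt;"),
    "untranslated i18n key": re.compile(r"\b\w+(?:\.\w+)+\.label\b"),
}


def find_render_problems(html: str) -> list[str]:
    """Return the names of all checks whose pattern appears in the HTML."""
    return [name for name, pattern in CHECKS.items() if pattern.search(html)]


def fetch_rendered_html(base_url: str, page_path: str, user: str, password: str) -> str:
    """Log into Magnolia in a real browser and return a page's rendered HTML."""
    # Imported lazily so the pure check above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(f"{base_url}/.magnolia/admincentral")
        # Field names of the default Magnolia login form (assumed unchanged).
        page.fill("input[name='mgnlUserId']", user)
        page.fill("input[name='mgnlUserPSWD']", password)
        page.press("input[name='mgnlUserPSWD']", "Enter")
        page.wait_for_load_state("networkidle")
        page.goto(f"{base_url}{page_path}")
        html = page.content()
        browser.close()
        return html
```

Passing the result of fetch_rendered_html (for example with a hypothetical base URL of http://localhost:8080/magnoliaAuthor and page path /travel/faq.html) to find_render_problems gives a list that should be empty once the template is fixed.
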

Tips if you want to try this

• Write the skills first. A 50-line markdown file with your conventions ("always use cmsfn.decode for rich text", "node type must match the content type name", "i18n key pattern is <module>.<dialog-path>.<property>.label") saves you ten iterations later. Skills are project-local and versioned with the repo. They are use-case-focused context that you can reuse on any related project, and they are triggered automatically when a related task is requested. The skill's description is key for this automatic matching to work.
• Don't ask for "the whole thing". Ask for the design first, then the plan, then one task. You will catch mistakes earlier and waste fewer tokens.
• Keep Magnolia running. Hot reload plus a Playwright-equipped Claude is a magical pairing: the AI can verify its own work end-to-end without you switching contexts.
• Read the error log first. When something doesn't render, point Claude at logs/magnolia-error.log. It will diagnose faster than you can scroll.
• Don't merge what you didn't read. Vibe coding is not no-review coding. Every commit went through a quick diff review. That's how the second-pass refactor stayed cheap.

Quick note about Worktrees and Superpowers

A git worktree is a second working copy of your repo, on a different branch, living in a different folder, but sharing the same .git history.

What does Superpowers do with them?

The using-git-worktrees skill activates right after the design is approved, before any code gets written. The flow is:

  1. Brainstorming finishes → you have an approved design doc.
  2. Superpowers creates a new branch for the work.
  3. It creates a worktree for that branch in a sibling directory.
  4. All implementation happens in that worktree.
  5. When done, the finishing-a-development-branch skill offers to merge back, open a PR, or discard.

Your main checkout never gets touched. If the whole experiment goes sideways, you delete the worktree folder and it's like it never happened. A loop is only as good as its ability to recover from a bad step. Worktrees give you cheap recovery. Cheap recovery enables ambitious loops.
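
For reference, the underlying git operations are small. The sketch below only builds the two git commands involved (create a branch in a sibling worktree, then discard that worktree); the sibling folder naming scheme and the helper names are assumptions for illustration, not necessarily what the Superpowers skill does.

```python
import subprocess
from pathlib import Path


def worktree_commands(repo: Path, branch: str) -> dict[str, list[str]]:
    """Return the git commands for creating and discarding a sibling worktree."""
    # Hypothetical naming scheme: <repo>-<branch> next to the main checkout.
    worktree = repo.parent / f"{repo.name}-{branch.replace('/', '-')}"
    return {
        # Create the branch and check it out in a sibling folder in one step.
        "create": ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(worktree)],
        # Remove the extra working copy; the main checkout is not touched.
        "discard": ["git", "-C", str(repo), "worktree", "remove", "--force", str(worktree)],
    }


def run(command: list[str]) -> None:
    """Execute one of the commands above, raising if git reports an error."""
    subprocess.run(command, check=True)
```

Note that removing a worktree leaves the branch itself in place, so discarding the work completely also means deleting that branch.
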

What are the benefits for a dev?

The honest answer is boilerplate disappears. Defining a content type, a content app, the admincentral decoration, two i18n bundles, a page dialog, a template definition, and the FTL — that's eight or nine files of mostly-mechanical YAML. Claude handles that in minutes, and once your skills are good, it handles it correctly.

What it does not do is replace your judgement. Picking the right boundary between page properties and content type properties, deciding whether categories deserve their own content type, understanding why anonymous read access on a workspace matters — that is still you. But you get to spend your time on those decisions instead of hunting a missing comma in apps/faqs.yaml.

Wrapping up

If you build templates and content apps in Magnolia for a living, give this workflow a serious try. Install Claude Code, pull the Superpowers plugin, write two or three project-specific skills based on your team's conventions, and brainstorm your next light module with the AI instead of starting from a blank YAML file. The first hour feels weird. The second hour you stop wanting to go back.

Happy vibe coding — This post was mainly AI-generated from a Claude session and human augmented by me ;)

6/11/2026

Content Delivery Network with Fastly on Magnolia

6/11/2026 Posted by Edwin Guilbert, No comments


If you are running Magnolia in production, sooner or later you will need a Content Delivery Network sitting in front of your public instances. A CDN takes care of caching static assets close to your end users, serving dynamic content from Magnolia public instances and, on top of that, protecting your site with a Web Application Firewall. In this post we will go through how a CDN like Fastly is set up in any Magnolia setup and the best practices around caching.

The big picture

Before going into the details, let's quickly review what sits between the end user and Magnolia:

End users hit the CDN, which either serves a cached response or forwards the request to the load balancer. The load balancer then routes traffic to the Magnolia Author and Magnolia Public instances. The CDN provided by Magnolia DX Cloud is Fastly, but you can also bring your own if you already have one in place. In the case of Fastly, it also comes with WAF protection, but let's focus on the CDN for this post.

The goals are simple:

• Cache static content for faster delivery
• Serve dynamic content from Magnolia public instances
• Protect everything behind a WAF

CDN set-up

Setting up the CDN for a project is mostly a matter of wiring a few things together:

• A Fastly service is created together with a WAF site (optional).
• A certificate needs to be uploaded to Fastly.
• The domain (or domains) need to point to the Fastly servers.

But let's focus on the Magnolia side of things: what to take into consideration to get the most out of the CDN.

Magnolia best practices

There is one thing to keep in mind when working with a CDN in front of Magnolia: caching happens in more than one place. Magnolia has its own backend cache, the browser has its cache, and the CDN has yet another one. The trick is to make them work together and not against each other.

A few rules of thumb we follow:

• Aim for a hit ratio and CDN coverage over 70%. Anything below that and your CDN is more decoration than acceleration.
• Define caching rules for images, REST endpoints and anything else that does not change very often.
• Use cache-control for both the browser and the CDN, and surrogate-control when you want a rule that only applies to the CDN.
• The default rule cache-control: private invalidates surrogate-control, which is something easy to forget when troubleshooting.
• Pages are not cached by default. If you do want to cache them, remove the dontCachePages rule.
• Every URL with different parameters creates a different entry in the CDN cache, so be mindful of query strings.
• Personalization (p13n) creates different cache entries per variation via the Vary header.
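
How the header rules above interact is easier to see as a small decision function. The sketch below is a simplified model of those rules, not Fastly's actual implementation: the function name is hypothetical, and a return value of None only means that this model sees no cacheable lifetime (a real CDN may still apply a default TTL when no header is present).

```python
import re


def _max_age(header_value: str):
    """Extract the max-age directive (in seconds) from a header value, if any."""
    match = re.search(r"(?:^|[\s,])max-age\s*=\s*(\d+)", header_value, re.IGNORECASE)
    return int(match.group(1)) if match else None


def cdn_ttl(headers: dict[str, str]):
    """Return the lifetime in seconds the CDN would use, or None if not cached."""
    h = {name.lower(): value for name, value in headers.items()}
    cache_control = h.get("cache-control", "")
    # cache-control: private wins over surrogate-control, and a Set-Cookie
    # response is not cached at all.
    if "private" in cache_control.lower() or "set-cookie" in h:
        return None
    # surrogate-control applies only to the CDN, so it takes precedence there.
    surrogate = _max_age(h.get("surrogate-control", ""))
    if surrogate is not None:
        return surrogate
    # Otherwise the CDN falls back to the rule shared with the browser.
    return _max_age(cache_control)
```

For instance, a response with cache-control: max-age=60 and surrogate-control: max-age=86400 is kept for a minute by the browser and for a day by the CDN in this model, while adding private to cache-control makes the function return None regardless of surrogate-control.
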

Cache config in Magnolia

In config:/modules/cache/config/contentCaching/defaultPageCache you can find cachePolicy and browserPolicy. The first one controls Magnolia's internal server-level caching with Ehcache (not covered in this post), and the second one sets the cache-control headers.

For example, a browserPolicy rule can set the cache-control: max-age=3600, public header for js/css files under the /.resources path.

If you want to set surrogate-control headers, you need to add an AddHeaders filter.

More best practices

Once the basics are in place, there are a few extra goodies you can take advantage of:

• A static error page can be configured for 404, 503 or really any status you want.
• Rate limiting and any other custom CDN rule can be added on top.
• Segmented caching for videos can be activated when needed.

Watch out for cookies: any response that sets a cookie will not be cached. The usual suspects in Magnolia are:

• The country trait setting JSESSIONID
• The csrfTokenSecurity filter setting a csrf cookie
• The ingress sticky sessions

If you see your pages stubbornly missing the cache, this is the first place to look.

HTTP statuses cached by Fastly

Not every HTTP status is cacheable. Fastly will cache the following:

• 200 OK
• 203 Non-Authoritative Information
• 300 Multiple Choices
• 301 Moved Permanently
• 302 Moved Temporarily
• 404 Not Found
• 410 Gone

Everything else will go through to the origin.

REST caching

REST endpoints can be cached too, and in most projects they should be: the typical "give me the latest news" or "list of products" endpoint is a great candidate for a CDN cache. Apply the same headers as you would for any other resource and remember that, again, any Set-Cookie in the response will bypass the cache.

Debugging with Fastly

When things do not behave as expected, Fastly gives you a few tools to figure out what is going on.

Fastly cache status

Every cached request comes with a status in the x-cache header that tells you what happened:

• HIT → the response was served from the cache.
• MISS → the response was not in the cache yet, but it will be next time.
• PASS → the response is not cacheable and will never go into the cache (usually because of a Set-Cookie or a cache-control: private).

If you expect a HIT and you keep seeing PASS, that is your hint to go and review the response headers coming out of Magnolia.
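
That review can be partly automated. The sketch below fetches the headers of a URL and lists the usual reasons, as described in this post, why a response did not come back as a HIT. The function names are hypothetical, the status list is the one given above, and taking the last value when x-cache reports several is an assumption.

```python
import urllib.error
import urllib.request

# Statuses listed in this post as cached by Fastly.
CACHEABLE_STATUSES = {200, 203, 300, 301, 302, 404, 410}


def fetch_status_and_headers(url: str) -> tuple[int, dict[str, str]]:
    """Issue a HEAD request and return the status code and response headers."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, dict(response.headers.items())
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses still carry the headers we want to inspect.
        return error.code, dict(error.headers.items())


def explain_cache_result(status: int, headers: dict[str, str]) -> list[str]:
    """Report the x-cache state and the likely reasons for a non-HIT."""
    h = {name.lower(): value for name, value in headers.items()}
    # Assumption: when several values are reported, the last one is the edge.
    state = h.get("x-cache", "").split(",")[-1].strip().upper() or "UNKNOWN"
    findings = [f"x-cache: {state}"]
    if state == "HIT":
        return findings
    if "set-cookie" in h:
        cookie_name = h["set-cookie"].split("=", 1)[0].strip()
        findings.append(f"sets cookie {cookie_name}")
    if "private" in h.get("cache-control", "").lower():
        findings.append("cache-control is private")
    if status not in CACHEABLE_STATUSES:
        findings.append(f"status {status} is not cached by default")
    return findings
```

Calling explain_cache_result(*fetch_status_and_headers(url)) on a page that keeps returning PASS points straight at the cookie or header to fix on the Magnolia side.
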

Wrapping up

A CDN in front of Magnolia is a lot more than "free speed". It is a piece of your architecture that interacts with how you build pages, how you set headers, how you handle cookies and how you secure your site. Spending a bit of time on the caching rules and keeping an eye on the logs pays off very quickly — both in performance and in your end users' experience.

Happy caching!

7/14/2020

Redirect A/B Testing with Google Optimize

7/14/2020 Posted by Edwin Guilbert, 1 comment
It has been a while since my last post on A/B testing. Since then, a lot of things have changed, especially with the Google Analytics API and how Google wants us to create experiments with their platform.

To cut to the chase, Google Experiments has been shut down in favor of Google Optimize. This has a direct impact on the way we were integrating A/B testing in Magnolia, since we were using the Analytics Management API to programmatically create and modify test experiments, and this is not possible anymore with Optimize.

But not everything is lost since we can still take advantage of Magnolia Personalization and page variants in order to manually (not automagically) create and manage redirect experiments with Google Optimize.

Let's review what A/B testing is for, from my previous post:

"In ab testing you could improve your conversions by comparing different versions of the same page. The idea is to change key elements that might improve your conversion rate. You usually have a original page (so called control page) and a variation of it. The users will get any of these versions randomly and after a period of time you can compare which version did better according to a goal (or conversion)."

... and how it can be implemented in Magnolia with Google Optimize, using the same example as before.

We are going to work with the travel demo website, specifically with the "about" page, which contains a video. The goal here is to get this video played more often, so the metric we are going to use is the play event on this element of the DOM.

The original page looks like this:


And the variation will have a wider video, removing the left side links, which might distract visitors and keep them from playing the video (our goal in this case).


To create a variation of a page, you just have to select "Add page variant" in the pages app of Magnolia.

          
Open the variant created and update the video component:


After publishing the variant, you need to connect these pages to Google Analytics using a Javascript snippet for tracking page views. You can embed this snippet with Marketing Tags in Magnolia (don't forget to include it in all your pages):


After publishing the snippet (or marketing tag in Magnolia) you need to send an event for every time a user plays the video included in the pages we want to test, so we need to create another marketing tag for that:



<script>
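// Send a Google Analytics "play" event each time a visitor starts the video.
// Assumes jQuery and the analytics.js ga() function are already loaded on the page.
$(".video-wrapper video").on('play', function () {
  ga('send', 'event', 'video', 'play');
});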
</script>

After publishing the snippet you need to actually record the number of play video events as conversions or "goals" in Google Analytics:




After the goal is configured, you might want to test it in the real time tab of Google Analytics, so every time you play a video in your page, an event gets registered as a conversion:


Additionally in order to use Optimize, you need to install it as a Javascript Snippet, including your container ID, in a similar way to what you did with Google Analytics:



At this point you finally have everything prepared to start your A/B testing on the page variants, measuring video plays as a goal with Google Optimize:

"A redirect test is a type of A/B test that allows you to test different web pages against each other. A redirect test contains different URLs for each variant. Redirect tests are useful when you want to test multiple different landing pages, or a complete redesign of a page."

So let's create a redirect experiment with the following steps:

  • Create an experience of type "redirect test". Give it a name and provide the public URL of the page you want to test, in this case: http://yourserver/travel/about.html
  • Add the Magnolia page variant as the variant of the test. The trick here is to use the internal URL of the page variant, in this case: http://yourserver/travel/about/variants/variant-0


  • Add the goal previously configured in Google Analytics as the objective, in this case every time a video is played:

  • Finally, start the experiment! (Don't worry about the installation check; we are handling it with the marketing tags.)


And that's all! You can always check how your test is going in the reporting tab of the Optimize dashboard. Remember that you have to wait at least one day to see results.



After you are done testing you can always pick the winner and publish it as the final page in Magnolia :)

5/31/2020

Trying out Docker Compose with Magnolia and DB containers

5/31/2020 Posted by Edwin Guilbert, 4 comments

Single container approach

As we discussed in the first post of this docker series, when you define and run containers, all the software required to run your magnolia server is already installed and configured. Launching a magnolia app is then just a matter of building your image once and running it as many times as you wish with the required params.

Multi-container approach

Running containers helps you with the logistics of deploying, configuring and managing different versions of the same software stack. But when you have an app that consists of multiple containers that depend on each other, taking care of the loading order and the specific configs needed for each one could be tough and messy.

In the case of Magnolia, you need at least two different containers running, one for the author and one for the public instance. But if you also want to run Magnolia on top of a DB, then you need two additional containers, one for each instance's DB. All those instances share credentials, networks and volumes, and have a specific loading order, i.e. the DB has to be started before the web app server.

This is where docker-compose can help you to have everything declared in one place and easily reproducible whenever you need a new setup for your app.

Docker Compose


Docker compose is a separate tool that gets installed along with docker. It helps you start up multiple docker containers at the same time and automatically connects them together with networking, health checks and volume management. Its main purpose is to work like the docker CLI while letting you issue several commands at once.

In docker compose you have a YAML file where you define how you would like your multi-container application to be structured. This YAML file will then be used to automate the launch of the containers as defined. 

Let's create a docker-compose.yaml file step by step trying to achieve the previous post Magnolia setup of one author and one public attached to a couple of postgres DBs.

Services

In order to define the configuration and params needed to run the containers, you need to provide a services element. Let's take a look at the postgres container as the first service:

version: '3.7'

services:
  mgnlauthor-postgres:
    image: postgres:12
    restart: unless-stopped
    healthcheck:
      test: pg_isready -U magnolia || exit 1
      interval: 10s
      timeout: 5s
      retries: 5
    volumes:
      - "~/docker/pgdata-author:/var/lib/postgresql/data"
    networks:
      - mgnlnet
    environment:
      POSTGRES_USER: "magnolia"
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:?password empty}
      POSTGRES_DB: "magnolia"
      PGDATA: "/var/lib/postgresql/data/pgdata"
  mgnlpublic-postgres: ...

Reviewing the options provided:
  • version: the Compose file format version this definition is written against (3.7).
  • mgnlauthor-postgres: the name of the container in the network. This will be used by the magnolia author instance to connect to this DB.
  • image: the name of the image to be pulled from the docker registry.
  • restart: the restart policy of the container. In this case we want docker to restart it if it gets killed somehow.
  • healthcheck: a test command run to check the container status. Exit code 0 means healthy, exit code 1 means unhealthy/failed. The interval, timeout and number of retries of the test can be configured as sub-options. For PostgreSQL we used the pg_isready tool.
  • volumes: a host-mounted volume to be used inside the container.
  • networks: the network where this container is going to be registered as mgnlauthor-postgres.
  • environment: all the environment variables needed by the container to run. In this case the database credentials and the PGDATA folder.

Note: the POSTGRES_PASSWORD is provided by a compose env variable, which must be defined in a .env file in the same folder as the docker-compose.yaml file.

Now let's continue with the magnolia author container:

  mgnlauthor:
    build:
      context: ./
      args:
        MGNL_AUTHOR: "true"
    image: ebguilbert/magnolia-cms-postgres:6.2.1-author
    restart: unless-stopped
    depends_on:
      - mgnlauthor-postgres
    volumes:
      - mgnl:/opt/magnolia
    networks:
      - mgnlnet
    ports:
      - 8080:8080
    environment:
      DB_ADDRESS: "mgnlauthor-postgres"
      DB_PORT: "5432"
      DB_SCHEMA: "magnolia"
      DB_USERNAME: "magnolia"
      DB_PASSWORD: ${DB_PASSWORD:?DB_PASSWORD must be set!}
    healthcheck:
      test: curl -f http://localhost:8080/.rest/status || exit 1
      interval: 1m
      timeout: 10s
      retries: 5
  mgnlpublic: ...

The options are very similar to the postgres service, but a few are worth mentioning:
  • build: This option lets you build your own local image if you don't have one published in a docker registry. You can define the context where to look for the Dockerfile and provide the build args.
  • depends_on: This is a very important option since it controls the order in which the containers are started. In this case we want the DB to be started before the magnolia author container. Note that this short form only waits for the DB container to start, not for the database to be ready to accept connections.
  • volumes: A named volume, which is managed by docker compose. More on this in the Volumes section below.
  • ports: publishes the container's port 8080 on host port 8080.
  • environment: Credentials as env variables. Note that the password comes from a second compose env variable, DB_PASSWORD, which must hold the same value as the POSTGRES_PASSWORD used by the postgres service.
  • healthcheck: The test command for magnolia is a REST endpoint we can invoke with curl. The interval is 1 minute and the number of retries is 5, since the REST endpoint might not be available at first and the healthcheck might need to run more than once (up to 5 times).
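
Since start order alone does not guarantee that the database is accepting connections when Magnolia boots, a script that needs the DB can poll for it first. The sketch below is a generic readiness wait written for illustration; it is not part of the setup in this post, the function name is hypothetical, and an open TCP port is only a rough signal (pg_isready, as used in the healthcheck, is the more precise check).

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not reachable yet: wait a little and try again.
            time.sleep(interval)
    return False
```

From a container on the mgnlnet network this could be called as wait_for_port("mgnlauthor-postgres", 5432) before starting anything that depends on the database.
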

Networks

Docker compose handles the creation and deletion of networks every time you start up or shut down the setup defined in the compose file.

For magnolia we only need one network:

networks:
  mgnlnet:
    name: mgnlnet

Volumes

Docker compose also handles the creation of named volumes and, optionally, pruning them if needed.

We want one named volume for each magnolia container (author and public):

volumes:
  mgnl:
    name: mgnl
  mgnlp1:
    name: mgnlp1

docker-compose.yaml

The whole file including all services, networks and volumes for magnolia author and public instances would look like this:

version: '3.7'

services:
  mgnlauthor-postgres:
    image: postgres:12
    restart: unless-stopped
    healthcheck:
      test: pg_isready -U magnolia || exit 1
      interval: 10s
      timeout: 5s
      retries: 5
    volumes:
      - "~/docker/pgdata-author:/var/lib/postgresql/data"
    networks:
      - mgnlnet
    environment:
      POSTGRES_USER: "magnolia"
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:?password empty}
      POSTGRES_DB: "magnolia"
      PGDATA: "/var/lib/postgresql/data/pgdata"
  mgnlpublic-postgres:
    image: postgres:12
    restart: unless-stopped
    healthcheck:
      test: pg_isready -U magnolia || exit 1
      interval: 10s
      timeout: 5s
      retries: 5
    volumes:
      - "~/docker/pgdata-public:/var/lib/postgresql/data"
    networks:
      - mgnlnet
    environment:
      POSTGRES_USER: "magnolia"
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:?password empty}
      POSTGRES_DB: "magnolia"
      PGDATA: "/var/lib/postgresql/data/pgdata"
  mgnlauthor:
    build:
      context: ./
      args:
        MGNL_AUTHOR: "true"
    image: ebguilbert/magnolia-cms-postgres:6.2.1-author
    restart: unless-stopped
    depends_on:
      - mgnlauthor-postgres
    volumes:
      - mgnl:/opt/magnolia
    networks:
      - mgnlnet
    ports:
      - 8080:8080
    environment:
      DB_ADDRESS: "mgnlauthor-postgres"
      DB_PORT: "5432"
      DB_SCHEMA: "magnolia"
      DB_USERNAME: "magnolia"
      DB_PASSWORD: ${DB_PASSWORD:?DB_PASSWORD must be set!}
    healthcheck:
      test: curl -f http://localhost:8080/.rest/status || exit 1
      interval: 1m
      timeout: 10s
      retries: 5
  mgnlpublic:
    build:
      context: ./
      args:
        MGNL_AUTHOR: "false"
    image: ebguilbert/magnolia-cms-postgres:6.2.1-public
    restart: unless-stopped
    depends_on:
      - mgnlpublic-postgres
    volumes:
      - mgnlp1:/opt/magnolia
    networks:
      - mgnlnet
    ports:
      - 8090:8080
    environment:
      DB_ADDRESS: "mgnlpublic-postgres"
      DB_PORT: "5432"
      DB_SCHEMA: "magnolia"
      DB_USERNAME: "magnolia"
      DB_PASSWORD: ${DB_PASSWORD:?DB_PASSWORD must be set!}
    healthcheck:
      test: curl -f http://localhost:8080/.rest/status || exit 1
      interval: 1m
      timeout: 10s
      retries: 5

networks:
  mgnlnet:
    name: mgnlnet

volumes:
  mgnl:
    name: mgnl
  mgnlp1:
    name: mgnlp1

As you can see the file is self-explanatory and clearly defines what services are needed, the dependency order and what is needed by each service.
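
One detail the file leaves implicit is the .env file: the compose file references both ${POSTGRES_PASSWORD} and ${DB_PASSWORD}, and Magnolia can only log in to the database if the two hold the same value. The sketch below is a small pre-flight check written for illustration; the helper names are hypothetical and the parser handles only simple KEY=value lines, not the full .env syntax that compose accepts.

```python
REQUIRED = ("POSTGRES_PASSWORD", "DB_PASSWORD")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_env(text: str) -> list[str]:
    """Return a list of problems found in the .env content (empty if fine)."""
    values = parse_env(text)
    problems = [f"{name} is missing or empty" for name in REQUIRED if not values.get(name)]
    if not problems and values["POSTGRES_PASSWORD"] != values["DB_PASSWORD"]:
        problems.append("POSTGRES_PASSWORD and DB_PASSWORD differ")
    return problems
```

Running check_env on the content of the .env file before docker-compose up catches a mismatch early, instead of through a failed JDBC connection in the Magnolia logs.
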

Docker compose up and down

One of the docker compose features I like the most is the possibility to build and run everything at once with a single command:

docker-compose -f "docker-compose.yaml" up -d

The above is running in detached mode so the containers output is not streamed to the terminal.

If you also want to build the local images (if they weren't built before), you can add --build:

docker-compose -f "docker-compose.yaml" up -d --build

To check the status of your containers you can always use the ps command:

docker-compose ps

And finally you can shut everything down, including the removal of containers and networks with the following command:

docker-compose -f "docker-compose.yaml" down

Docker Swarm or Kubernetes

The next and final step is the orchestration of these containers, managing the auto-scaling and recovery of containers in clusters of nodes. For this, other tools like Docker Swarm or Kubernetes are needed.

The good news is that docker compose is fully compatible with docker swarm, so just a few more steps are needed. For kubernetes, the docker-compose file can't be reused "as is", but the structure and concepts are very similar, so it shouldn't take much effort.

Edit: There's a tool called kompose which translates docker-compose files into kubernetes-compatible files ready to be used by Kubernetes :)


5/24/2020

Trying out Docker, Magnolia and Postgres

5/24/2020 Posted by Edwin Guilbert, 2 comments

Deploying without a DB

This post is a follow-up of a previous post explaining how to deploy Magnolia CMS as a docker container using Debian slim, OpenJDK and Tomcat.

Although this is a very lightweight and simple setup, since you only need to worry about one container per magnolia instance, the data storage is file system based. That might be fine for your public/disposable instances, but it's definitely not a good choice for the author instance.

Deploying with a DB

The author acts as the master of contents, where all the versions created are stored. It needs more robust storage with features like data integrity, concurrency, performance, disaster recovery and so on, which is just what an RDBMS can offer.

Magnolia is officially compatible with MySQL, Oracle and PostgreSQL, so we can pick any of the official docker images they offer. For this post we will use Postgres.

Why Postgres? 


Well, it is open source, which brings many benefits and leaves Oracle out.

MySQL is the most popular open source database out there, and probably the most widely used with Magnolia. Postgres, on the other hand, is "The World's Most Advanced Open Source Relational Database" according to its website.

Since in our case with docker deployments we want to store everything in the DB, including Magnolia's datastore, we will read and write BLOBs extensively. MySQL is historically known for poor performance in this kind of scenario, and even in the latest versions it is still something to watch, especially when using the InnoDB storage engine and its buffer pool. So, in short, for performance reasons, we'll pick Postgres.

Let's take a look at how to run the official postgres docker image:

docker run --rm -d \
    --name mgnlauthor-postgres \
    --network mgnlnet \
    -e POSTGRES_USER=magnolia \
    -e POSTGRES_PASSWORD=mysecretpassword \
    -e POSTGRES_DB=magnolia \
    -e PGDATA=/var/lib/postgresql/data/pgdata \
    -v /Users/ebguilbert/docker/pgdata-author:/var/lib/postgresql/data postgres:12

Reviewing the options provided:
  • Network mgnlnet is going to be used by the magnolia containers (mgnlauthor and mgnlpublic), where they can contact the database as mgnlauthor-postgres.
  • Database credentials are provided as environment variables. These credentials are going to be used later on by the magnolia containers.
  • The PGDATA folder is linked to a local folder on the host. This is required to preserve the data stored by the DB even after the container is stopped or recreated.

Since we are going to have at least two instances of Magnolia running, let's start a second Postgres container for the public instance, changing the name and the local data volume (the credentials could be changed as well):

docker run --rm -d \
    --name mgnlpublic-postgres \
    --network mgnlnet \
    -e POSTGRES_USER=magnolia \
    -e POSTGRES_PASSWORD=mysecretpassword \
    -e POSTGRES_DB=magnolia \
    -e PGDATA=/var/lib/postgresql/data/pgdata \
    -v /Users/ebguilbert/docker/pgdata-public:/var/lib/postgresql/data postgres:12

Notice we are using the same network created in the previous blog post. If you haven't created it yet, you need to do it before running the image:

docker network create --subnet=192.168.42.0/24 mgnlnet


Magnolia-Postgres Image

In order to have Magnolia configured with a database we will need to enhance the Dockerfile we used in the previous post, with the postgres JDBC lib and DB connection params for Tomcat to use. Also we will need to copy our own war file containing custom DB-Magnolia config (which we'll cover in detail in the next section):

# Tomcat debian-slim image (any official image would do)
FROM ebguilbert/tomcat-slim:9

LABEL maintainer="Edwin Guilbert"

# ENV variables for Magnolia
ENV MGNL_VERSION 6.2.1
ENV MGNL_APP_DIR /opt/magnolia
ENV MGNL_REPOSITORIES_DIR ${MGNL_APP_DIR}/repositories
ENV MGNL_LOGS_DIR ${MGNL_APP_DIR}/logs
ENV MGNL_RESOURCES_DIR ${MGNL_APP_DIR}/light-modules
ENV JDBC_VERSION=postgresql-42.2.12

# ARGS
ARG MGNL_AUTHOR=true
ARG MGNL_WAR_PATH=docker-bundle/docker-bundle-webapp/target/docker-bundle-webapp-6.2.1.war
ARG MGNL_HEAP=2048M
ARG MGNL_ENV=tomcat/setenv.sh
ARG JDBC_URL=https://jdbc.postgresql.org/download

# JVM PARAMS
ENV CATALINA_OPTS -Xms64M -Xmx${MGNL_HEAP} -Djava.awt.headless=true \
-Dmagnolia.bootstrap.authorInstance=${MGNL_AUTHOR} \
-Dmagnolia.repositories.home=${MGNL_REPOSITORIES_DIR} \
-Dmagnolia.author.key.location=${MGNL_APP_DIR}/magnolia-activation-keypair.properties \
-Dmagnolia.logs.dir=${MGNL_LOGS_DIR} \
-Dmagnolia.resources.dir=${MGNL_RESOURCES_DIR} \
-Dmagnolia.update.auto=true

# VOLUME for Magnolia
VOLUME [ "${MGNL_APP_DIR}" ]

# JDBC lib
RUN wget -q ${JDBC_URL}/${JDBC_VERSION}.jar -O $CATALINA_HOME/lib/${JDBC_VERSION}.jar

# Database runtime config
# - DB_ADDRESS
# - DB_PORT
# - DB_SCHEMA
# - DB_USERNAME
# - DB_PASSWORD
COPY ${MGNL_ENV} $CATALINA_HOME/bin/setenv.sh

# MGNL war
COPY ${MGNL_WAR_PATH} ${DEPLOYMENT_DIR}/ROOT.war

The Dockerfile is self-explanatory but it's worth pointing out the differences from the original version:
  • JDBC_VERSION is an environment variable containing the name of the jar file to be added to Tomcat libs.
  • JDBC_URL is an argument variable containing the URL used to download the JDBC jar from.
  • MGNL_WAR_PATH is an argument variable containing the path of the custom war to be copied/deployed to Tomcat. Note this variable replaces the MGNL_WAR in the original version.
  • MGNL_ENV is an argument variable containing the path of a setenv.sh file which is going to configure database credentials as env variables in Tomcat.
  • The last three instructions download the JDBC jar and copy the setenv.sh and war file into Tomcat.
Let's build the new image for author and public instances from the Dockerfile:

docker build -t ebguilbert/magnolia-cms-postgres:6.2.1-author --build-arg MGNL_AUTHOR=true .

docker build -t ebguilbert/magnolia-cms-postgres:6.2.1-public --build-arg MGNL_AUTHOR=false .

Note: If you check out the project from git and build the image with the src folders present, the build will take a long time, since docker "loads" all the subfolders into the build context even though only the compiled war file is needed. So it is strongly recommended to delete the src and target folders and keep only the compiled war file before building the image.
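As an alternative to deleting folders, a .dockerignore file can keep the build context small. This is only a sketch under assumptions: it presumes the Dockerfile sits in the project root, that the only files the build copies are the war and tomcat/setenv.sh at the default paths from the Dockerfile above, and PROJECT_DIR is a hypothetical variable introduced here for illustration.

```shell
#!/bin/sh
# Write a .dockerignore into the project root (PROJECT_DIR is a hypothetical
# variable; it defaults to the current directory).
PROJECT_DIR="${PROJECT_DIR:-.}"

cat > "${PROJECT_DIR}/.dockerignore" <<'EOF'
# Exclude everything from the build context...
*
# ...except the two files the Dockerfile copies (default ARG paths assumed).
!docker-bundle/docker-bundle-webapp/target/docker-bundle-webapp-6.2.1.war
!tomcat/setenv.sh
EOF
```

If MGNL_WAR_PATH or MGNL_ENV are overridden at build time, the exception lines would need to be adjusted to match.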

Magnolia Persistence Manager and Datastore

Magnolia uses JCR to store all its content through a Persistence Manager, which handles the persistent storage of content nodes and properties, with the exception of large binary values, which are handled by a Datastore.

Why should I care?

Since we are using PostgreSQL to store the contents, we need to configure a postgres persistence manager for Magnolia, providing the credentials needed to connect to the database and configuring a datastore to store the large binary values.

It's important to note that we want to store everything in the DB, so this includes all the components of the persistence manager, like the datastore and the filesystem for versions and cache.

Why everything in the DB?

Since we are using Docker, the idea is to have self-contained containers: storage is handled by the postgres container and the server/app by the magnolia container, so that each one can be replaced and moved freely.

This persistence manager is configured by an XML file (jackrabbit-bundle-postgres-search.xml), which Magnolia usually loads from the folder "WEB-INF/config/repo-conf/".

Let's take a look at the relevant sections of the file:

Datasource

<DataSources>
<DataSource name="magnolia">
<param name="driver" value="org.postgresql.Driver" />
<param name="url" value="jdbc:postgresql://${db.address}:${db.port}/${db.schema}" />
<param name="user" value="${db.username}" />
<param name="password" value="${db.password}" />
<param name="databaseType" value="postgresql"/>
</DataSource>
</DataSources>

This is where the DB credentials are set. All the needed params come from environment variables that we will pass as arguments when running the Magnolia container (we'll see how in the next section). This is why a setenv.sh file is needed.
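To make the role of setenv.sh more concrete, here is a minimal sketch of what such a file could look like. It is an assumption, not the actual file from the project: it presumes the ${db.*} placeholders in the XML are resolved from JVM system properties with the same names, and the fallback values are hypothetical defaults added so the script also runs outside a container.

```shell
#!/bin/sh
# Sketch of a Tomcat bin/setenv.sh (assumed content, not the project's real file).
# Fallbacks are hypothetical defaults; in the container the values come from
# the -e DB_* options passed to docker run.
DB_ADDRESS="${DB_ADDRESS:-localhost}"
DB_PORT="${DB_PORT:-5432}"
DB_SCHEMA="${DB_SCHEMA:-magnolia}"
DB_USERNAME="${DB_USERNAME:-magnolia}"
DB_PASSWORD="${DB_PASSWORD:-}"

# Expose each value as a JVM system property named like the XML placeholder
# (assumption: the repository config reads ${db.address} etc. from these).
CATALINA_OPTS="${CATALINA_OPTS:-} -Ddb.address=${DB_ADDRESS} -Ddb.port=${DB_PORT}"
CATALINA_OPTS="${CATALINA_OPTS} -Ddb.schema=${DB_SCHEMA} -Ddb.username=${DB_USERNAME}"
CATALINA_OPTS="${CATALINA_OPTS} -Ddb.password=${DB_PASSWORD}"
export CATALINA_OPTS
```

Under that assumption, a container started with DB_ADDRESS=mgnlauthor-postgres, DB_PORT=5432 and DB_SCHEMA=magnolia would end up with the JDBC URL jdbc:postgresql://mgnlauthor-postgres:5432/magnolia.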

Filesystem

<FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
<param name="dataSourceName" value="magnolia"/>
<param name="schemaObjectPrefix" value="fs_"/>
</FileSystem>

This is an interface that acts as a file system abstraction for storing the global repository state. Since we want to store everything in the DB, we are using a db-filesystem.

The same db-filesystem configuration is also used for the workspace filesystem and the versioning filesystem. Note that the search index itself is not stored there, as the SearchIndex section explains.

Datastore

<DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
<param name="dataSourceName" value="magnolia"/>
<param name="schemaObjectPrefix" value="ds_"/>
</DataStore>

Normally all node and property data is stored by the persistence manager, but large binaries go to the datastore. The datastore usually lives in the server's local storage, but since we want to store everything in the DB, we are using a db-datastore.

SearchIndex

This section was added later to clear up a confusion around the idea of storing everything in the DB. The search index is actually stored in the filesystem (the main idea of an index is to avoid querying the DB). This means JCR still stores files in the instance filesystem:

# ls ${MGNL_REPOSITORIES_DIR}/magnolia/workspaces/website/
index workspace.xml

As you can see, only the index entries and the workspace configuration file (workspace.xml, one per workspace) are stored in the filesystem.

Magnolia-Postgres Container

Based on the image we just built with the postgres persistence manager configured, let's run the image in the same network as the database (postgres container):

docker run --rm -d -p 8080:8080/tcp --mount source=mgnlauthor,target=/opt/magnolia \
--network mgnlnet --name mgnlauthor \
-e DB_ADDRESS=mgnlauthor-postgres \
-e DB_PORT=5432 \
-e DB_SCHEMA=magnolia \
-e DB_USERNAME=magnolia \
-e DB_PASSWORD=mysecretpassword \
ebguilbert/magnolia-cms-postgres:6.2.1-author

Looking at the params provided, we can see the database credentials being configured dynamically at run time.

Let's run a public container with the credentials of the mgnlpublic-postgres db container:

docker run --rm -d -p 8090:8080/tcp --mount source=mgnlpublic,target=/opt/magnolia \
--network mgnlnet --name mgnlpublic \
-e DB_ADDRESS=mgnlpublic-postgres \
-e DB_PORT=5432 \
-e DB_SCHEMA=magnolia \
-e DB_USERNAME=magnolia \
-e DB_PASSWORD=mysecretpassword \
ebguilbert/magnolia-cms-postgres:6.2.1-public

Note: We have used the same network and volumes created in the previous blog post. If you haven't created the volumes yet, you need to do it before running the images:
 
docker volume create mgnlauthor

docker volume create mgnlpublic
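
Since both the network and the volumes must exist before the containers start, and creating a network that already exists fails, a small guard can make the setup repeatable. This is a sketch only: ensure is a hypothetical helper name, and it relies on the standard docker network inspect, docker volume inspect and matching create subcommands.

```shell
#!/bin/sh
# ensure KIND NAME [CREATE_OPTIONS...]: create a Docker network or volume
# only if it does not exist yet. Hypothetical helper for illustration.
ensure() {
  kind=$1
  name=$2
  shift 2
  if docker "$kind" inspect "$name" >/dev/null 2>&1; then
    echo "$kind $name already exists"
  else
    docker "$kind" create "$@" "$name"
  fi
}

# Usage (needs a Docker daemon, so it is left commented out):
# ensure network mgnlnet --subnet=192.168.42.0/24
# ensure volume mgnlauthor
# ensure volume mgnlpublic
```

Running the commented lines a second time would then leave the existing network and volumes untouched.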


Multi-container docker Magnolia app

As you can see, the whole setup for Magnolia in docker with a DB involves a lot of configuration and quite a few containers for the different databases and webserver apps. Things like DB password secrecy and container health checks (to relaunch public instances automatically) could be managed automatically by docker tools like Docker Compose, but these improvements will be covered in the following post.

As a general note, all the files involved in this post, including the source project for the custom Magnolia war configured for PostgreSQL, are available in this git project.