Run Pelican Server with Docker Image
This document explains how to run a Pelican server using Pelican docker image. If you want to use Pelican client functionalities, such as to download or upload an object, refer to the install page to download and install a Pelican binary instead.
Before starting
Pelican builds separate images for each Pelican server components, e.g. origin, cache, director, and registry. Depending on which Pelican server you want to run, you need to select a different Docker image from the list below.
- Pelican origin server:
hub.opensciencegrid.org/pelican_platform/origin:latest
- Pelican cache server:
hub.opensciencegrid.org/pelican_platform/cache:latest
- Pelican director server:
hub.opensciencegrid.org/pelican_platform/director:latest
- Pelican registry server:
hub.opensciencegrid.org/pelican_platform/registry:latest
Note: The
latest
tag will pull the Pelican server image with the latest released version. You may pull an image with explicit Pelican version by passing the version as the tag (e.g.v7.6.4
) instead. For a list of available tags, refer to Pelican Harbor repository (opens in a new tab).
Run Pelican server via Docker CLI
This section describes how to run various Pelican server images using the Docker CLI. If you haven't installed Docker engine, follow the documentation from Docker (opens in a new tab) to install and start the Docker engine first.
Run Pelican origin server
To run the latest pelican origin server, run the following command:
docker run -it -p 8444:8444 -p 8443:8443 -v /path/to/your/data/:/tmp/pelican --name=pelican-origin hub.opensciencegrid.org/pelican_platform/origin:latest serve -v /tmp/pelican:/foo/bar -f <federation>
Where:
docker run
is a Docker CLI command that runs a new container from an image-it
(--interactive --tty
) runs the container in interactive mode and uses a tty terminal-p <host-port>:<container-port>
(--publish
) publishes a container’s port(s) to the host, allowing you to reach the container’s port via a host port. In this case, we can reach the container’s port8444
via the host’s port8444
. Note that the admin website of Pelican servers run on port8444
by default, and the objects transfer endpoint of the Pelican origin server runs on port8443
by default.-v <host-location>:<container-location>
(--volume
) binds mount a volume from the host location(s) to the container's location(s). This allow you to share files in your host machine to the container. In this case, we bind/path/to/your/data/
on your host machine to/tmp/pelican
in the container. You need to replace/path/to/your/data/
to the directory where your data to publish is located.--name
assigns a logical name to the container (e.g. pelican-origin). This allows you to refer to the container by name instead of by ID.hub.opensciencegrid.org/pelican_platform/origin:latest
is the image to runserve
is the command to run the Pelican binary-v /tmp/pelican:/foo/bar
is the Pelican argument to bind/tmp/pelican
directory in the container as namespace/foo/bar
in Pelican. You need to change/foo/bar
to a meaningful path that can represent your data, e.g./chtc/public-data
. You may pass additional arguments to Pelican server by appending them after this argument.-f <federation>
sets the federation discovery URL, where<federation>
is the URL to the federation the origin will be joining in. For instructions on how to find a federation to join, refer to Serve a Pelican Origin
Run Pelican cache server
To run the latest pelican cache server, run the following command:
docker run -it -p 8444:8444 -p 8442:8442 --name=pelican-cache hub.opensciencegrid.org/pelican_platform/cache:latest serve -f <federation>
Where most of the command overlaps with the one to run the Pelican origin server, with the following differences:
-p 8444:8444 -p 8442:8442
publishes port8444
and8442
from the container to the same port on the host machine. Note that the admin website of the Pelican server runs on port8444
by default, and the objects transfer endpoint of the Pelican cache server runs on port8442
by default.
Note: The Pelican Docker image currently does not support binding a directory on your host machine as the directory for a Pelican cache to store cached objects, e.g. using
-v /host/machine:/run/pelican/cache/location
. This feature is a work in progress and will be available in a future Pelican version.
Run Pelican director server
Note: To successfully run a Pelican director server, additional configuration is required. Follow Serve a Pelican Director for instructions. For how to pass configurations to Docker container, refer to the next section.
To run the latest pelican director server, run the following command:
docker run -it -p 8444:8444 --name=pelican-director hub.opensciencegrid.org/pelican_platform/director:latest serve -f <federation>
Run Pelican registry server
Note: To successfully run a Pelican registry server, additional configuration is required. Follow Serve a Pelican Registry for instructions. For how to pass configurations to Docker container, refer to the next section.
To run the latest pelican registry server, run the following command:
docker run -it -p 8444:8444 --name=pelican-registry hub.opensciencegrid.org/pelican_platform/registry:latest serve -f <federation>
Stop Pelican container
To stop the Pelican container, run the following command:
# The `docker ps` command shows the processes running in Docker
docker ps
# This will display a list of containers that looks like the following:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0be1a304b5d7 hub.opensciencegrid.org/pelican_platform/director:latest "/bin/sh" 1 hour ago Up 1 hour 0.0.0.0:8444->8444/tcp pelican-director
# To stop the pelican container run the command
# docker stop <container-ID> or use
# docker stop <container-name>, which is `pelican-director` as previously defined
docker stop pelican-director
Configure Pelican server in a container
There are two ways to configure a Pelican server running in a container. One is through the environment variables, the other is by passing a pelican.yaml
configuration file to the container. This section includes instructions for both.
Configure via environment variables
Most Pelican configuration parameters described in parameters page can be passed to a Pelican server as environment variables. For a configuration parameter, the corresponding environment variable is PELICAN_<PARAMETER_NAME>
, where PELICAN
is the prefix, and <PARAMETER_NAME>
is the name of the parameter, with dots .
replaced by underscores _
. For example, a Pelican configuration named Logging.Level
has a corresponding environment variable named PELICAN_LOGGING_LEVEL
.
For admins that use Pelican images built for OSDF (with
osdf
prefix to the image tag, such asosdf-origin
). The environment variables are prefixed byOSDF
instead ofPELICAN
. The example above should then beOSDF_LOGGING_LEVEL
.
To pass environment variables to the Docker container, append -e
flag to your docker run
command. For example, if you want to pass PELICAN_LOGGING_LEVEL
to the container, run:
docker run -it -e PELICAN_LOGGING_LEVEL=debug -p 8444:8444 -p 8443:8443 -v /path/to/your/data/:/tmp/pelican --name=pelican-origin hub.opensciencegrid.org/pelican_platform/origin:latest serve -v /tmp/pelican:/foo/bar -f <federation>
There are other ways to pass environment variables to a Docker container, see details in Docker documentation (opens in a new tab).
Some of Pelican configuration parameters with the
object
type can not be passed as environment variables, such as GeoIPOverrides and Origin.Exports. For these parameters, use a configuration file instead.
Configure via the configuration file
Pelican servers can also be configured via a configuration file in yaml
. The default location that Pelican looks for a configuration file is /etc/pelican/pelican.yaml
if Pelican runs as the root user (which is the case when the Pelican server is running in a container). If the Pelican server runs as a non-root user, the default location is ~/.config/pelican/pelican.yaml
.
It is recommended that admins bind a Pelican configuration file on the host machine to the container running the Pelican server. Follow the steps below for instructions.
-
Prepare a
yaml
file on the host machine:touch pelican.yaml
-
Modify and save the
pelican.yaml
file on host machine with the following content:pelican.yamlLogging: Level: debug
-
Bind the configuration file on the host machine to the container when running the Pelican server image:
docker run -it -p 8444:8444 -p 8443:8443 -v /path/to/your/data/:/tmp/pelican -v $PWD/pelican.yaml:/etc/pelican/pelican.yaml --name=pelican-origin hub.opensciencegrid.org/pelican_platform/origin:latest serve -v /tmp/pelican:/foo/bar -f <federation>
where
-v $PWD/pelican.yaml:/etc/pelican/pelican.yaml
is the flag to bind thepelican.yaml
file under current working directory$PWD
to/etc/pelican/pelican.yaml
directory inside the container
-
You should note that your running Pelican origin server is logging debug level messages.
Additional Configurations for Running Pelican in a Container
Server hostname and port
Pelican server has a built-in web server to handle various API requests. By default, it uses server hostname and a port number as the server address (e.g. https://127.0.0.1:8444
). In a container environment, the hostname is an alphanumeric value, e.g. 6ee28c5df997
, and the corresponding Pelican server address is https://6ee28c5df997:8444
. This value is for container internal communications and is not accessible from outside of the container. In local deployment, to access the container ports from the host machine, one can publish a port from inside the container to the host machine.
docker run -p 8444:8444 origin
However, Pelican server still recognize 6ee28c5df997
as its hostname, causing a mismatch.
In practice, there is usually a reverse-proxy service (such as Nginx, Traefik, etc.), to direct traffic from a human-readable domain name (e.g. https://example-origin.org (opens in a new tab)) to your container (e.g. 6ee28c5df997:8444
). If Pelican server doesn't catch this information and uses it as its server address, there will be a mismatch when the server advertises itself to other Pelican services.
Therefore, to keep things consistent, Pelican internally should use the same human-readable domain name as its web server's external address. To do so, configure Server.ExternalWebURL
to the domain name the outside world uses to access the Pelican web service, e.g. https://example-origin.org:8444 (opens in a new tab) for a production service or https://localhost:8444 (opens in a new tab) for a local deployment.
Notice that there is a port number, 8444
, following the hostname in the URLs above. This is the default port for Pelican web server. When publishing ports from the container, you may choose a different port on the host machine to publish to. Two conventional options are 443
or 8080
, which are default ports for a web service. If you publish 8444
port from the container to 443
, or 8080
on the host machine, i.e. docker run -p 8444:443
, you may remove the port number from Server.ExternalWebURL
, i.e. https://example-origin.org
. For other ports, you need to leave the port number in the URL.
For more information about how Docker networking works, refer to Docker Networking Documentation (opens in a new tab).
Troubleshooting
Error running Pelican Docker image on macOS with a M-series chip
When running Pelican Docker image on macOS with a M-series chip, you may get the following error message:
docker: no matching manifest for linux/arm64/v8 in the manifest list entries.
See 'docker run --help'
This is because Pelican Docker images are built for X86_64
architecture only, and your machine is in ARM64
. To fix the issue, run the image with --platform linux/amd64
instead:
docker run --platform linux/amd64 -it -p 8444:8444 --name=pelican-director hub.opensciencegrid.org/pelican_platform/director:latest serve -f <federation>
Note: Although you may run Pelican server containers on ARM64 machines, it is not recommended for production use due to performance limitations.