Getting Data Using Pelican Client

We recommend starting with the Quick Start Guide, which covers the basics of Pelican client commands and introduces some useful terminology.

Before Starting

Assumptions

This guide makes several assumptions about your setup before you use the Pelican client to interact with objects in your federation:

  • You are on a computer where you have access to a terminal. The Pelican client is a command line tool.

  • You've already installed the version of Pelican appropriate for your system, and Pelican is accessible via your path. To test this on Linux, you can run

    which pelican

    which should output a path to the executable. If the command produces no output, refer to the Pelican installation docs to acquire a working installation.

  • You are roughly familiar with Pelican client commands and terminology as demonstrated in the Quick Start Guide.
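
The PATH check from the second assumption can also be wrapped in a small helper for use in scripts. This is a minimal sketch: the function name require_command is hypothetical, and it uses the POSIX command -v builtin instead of which, because which is not guaranteed to exist on every system.

```shell
#!/bin/sh
# require_command is a hypothetical helper name, not part of Pelican.
# It succeeds if the named program is found on PATH and fails otherwise.
require_command() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at: $(command -v "$1")"
    else
        echo "$1 not found on PATH; see the installation docs" >&2
        return 1
    fi
}

# Report whether pelican is installed without aborting if it is missing.
require_command pelican || true
```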

Note on Federations

All object paths in a federation begin with a leading /, and relative paths are not allowed. Here is the example object path used in the Quick Start Guide:

/ospool/uc-shared/public/OSG-Staff/validation/test.txt

This is the full object path that we will be using in examples below.

Tokens and JWT

Some namespace prefixes are public, like /ospool/uc-shared/public, while others are protected (i.e. they require authorization). Objects in public namespaces can be downloaded by anybody, but downloading objects from protected namespaces requires you to prove to the origin supporting that namespace that you are allowed to access the object. In Pelican, this is done using signed JSON Web Tokens, or JWTs for short. In many cases, these tokens can be generated automatically.

The Different Pelican URL Schemes

When running client commands with a request URL, there are two different URL schemes a user can use: pelican and osdf. This page uses examples with the pelican:// URL scheme. Below is a description of the two URL schemes and what to expect when they are used:

The pelican:// URL Scheme

URLs that begin with pelican:// allow you to interact with objects from any federation simply by knowing the hostname of the federation's discovery URL and the name of that object in that federation. These URLs take the form:

pelican://<federation discovery URL>/<path/to/object>

When a Pelican client encounters a Pelican URL, it uses the hostname in the federation discovery portion of the URL to find other services in the specified federation that will help the client discover the object's location.

Note: Federation discovery URLs can also be specified with the -f flag rather than the hostname. Just be sure not to include the URL in both the -f flag and in the hostname of the object path, as this will lead to errors.

Note: Federation discovery URLs cannot contain a path component (e.g. https://my-federation.com/path/component). Clients that encounter a discovery URL with a path component will incorrectly assume the path is part of the namespace prefix.
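
To make the two parts of a pelican:// URL concrete, the sketch below splits one into its federation discovery hostname and its object path using shell parameter expansion. The helper names are hypothetical and this is only an illustration of the URL form described above; Pelican performs its own URL parsing. The sketch assumes the URL contains an object path after the hostname.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; Pelican performs its own URL parsing.

# Print the federation discovery hostname of a pelican:// URL.
pelican_url_host() {
    rest=${1#pelican://}   # drop the scheme
    echo "${rest%%/*}"     # keep everything before the first slash
}

# Print the object path (with its leading slash) of a pelican:// URL.
pelican_url_path() {
    rest=${1#pelican://}   # drop the scheme
    echo "/${rest#*/}"     # keep everything after the first slash
}

url="pelican://osg-htc.org/ospool/uc-shared/public/OSG-Staff/validation/test.txt"
pelican_url_host "$url"   # prints osg-htc.org
pelican_url_path "$url"   # prints /ospool/uc-shared/public/OSG-Staff/validation/test.txt
```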

The osdf:/// URL Scheme

When using the osdf:/// scheme, Pelican will load in the federation metadata automatically using the defaults for accessing the Open Science Data Federation (OSDF). This scheme should only be used if your federation or data can be accessed through the OSDF. When using this scheme, you should not specify the federation URL at all, as Pelican will handle this automatically. For example:

pelican object get osdf:///<namespace-prefix></path/to/file> <local/path/to/file>

This scheme has three slashes (///) after osdf: because the hostname is left empty, to be populated automatically. Simply start the path with the namespace prefix. Pelican currently also recognizes the osdf:// scheme (with two slashes) in case the user forgets the third slash, but this form is not recommended.
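
One way to see where the third slash comes from: the scheme contributes osdf://, the hostname is empty, and the object path contributes its own leading /. The following is a minimal sketch with a hypothetical helper name (osdf_url); it only builds the string and does not contact any service.

```shell
#!/bin/sh
# osdf_url is a hypothetical helper, not a Pelican command.
# It turns an absolute object path into an osdf:/// URL and rejects
# relative paths, since object paths must begin with a leading slash.
osdf_url() {
    case $1 in
        /*) echo "osdf://$1" ;;   # empty hostname + leading slash = three slashes
        *)  echo "object path must start with /: $1" >&2
            return 1 ;;
    esac
}

osdf_url /ospool/uc-shared/public/OSG-Staff/validation/test.txt
# prints osdf:///ospool/uc-shared/public/OSG-Staff/validation/test.txt
```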

Get a Public Object from your Federation

To use the Pelican client to get public objects from a federation, use Pelican's object get sub-command.

Getting an Object Using the pelican:// URL Scheme

To use the pelican:// URL scheme, you need to specify the federation URL within the request URL. To do this, you would format the object get command like so:

pelican object get pelican://<federation-url></namespace-prefix></path/to/file> <local/path/to/file>

You can try this yourself by getting the public file mentioned earlier from the OSDF, using the object get sub-command and providing the OSDF's federation URL:

pelican object get pelican://osg-htc.org/ospool/uc-shared/public/OSG-Staff/validation/test.txt downloaded-testfile.txt

This command downloads the object OSG-Staff/validation/test.txt from the OSDF's /ospool/uc-shared/public/ namespace and saves it in your current directory with the name downloaded-testfile.txt.

More specifically, the command breaks down into these components:

  • Federation URL: osg-htc.org
  • Namespace: /ospool/uc-shared/public/
  • Object name: OSG-Staff/validation/test.txt
  • Destination: downloaded-testfile.txt
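
The components listed above can be reassembled into the request URL with a few lines of shell. This is a sketch only: build_pelican_url is a hypothetical helper, and the slash handling (one trailing slash trimmed from the namespace, one leading slash trimmed from the object name) is an assumption made so the pieces join cleanly.

```shell
#!/bin/sh
# build_pelican_url is a hypothetical helper, not a Pelican command.
# Arguments: federation hostname, namespace prefix, object name.
build_pelican_url() {
    host=$1
    namespace=${2%/}   # strip one trailing slash, if any
    object=${3#/}      # strip one leading slash, if any
    echo "pelican://${host}${namespace}/${object}"
}

build_pelican_url osg-htc.org /ospool/uc-shared/public/ OSG-Staff/validation/test.txt
# prints pelican://osg-htc.org/ospool/uc-shared/public/OSG-Staff/validation/test.txt
```

The resulting string is the same URL passed to pelican object get in the example above.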

Get A Protected Object From Your Federation

Protected namespaces require that a Pelican client prove it is allowed to access objects from the namespace before the object can be downloaded. In many cases, Pelican clients can guide users through the process of acquiring a token by initiating an OpenID Connect (OIDC) flow that uses an external log-in service (such as CILogon) through your browser. In other cases, a token must be provided to the Pelican client manually.

For Issuers That Support CILogon Code Flow

Some origins support authentication with CILogon's OIDC client. In these cases, the Pelican client can guide you through authentication with the origin once you have logged in via CILogon. To download protected objects from origins that support CILogon, run the same command as for downloading a public object:

pelican object get pelican://<federation-url></namespace-prefix></path/to/file> <local/path/to/file>

If you're doing this for the very first time, Pelican will create an encrypted token wallet on your system and you will be required to provide Pelican with a password for the wallet. If this isn't your first time, you will be asked to provide your already-configured password to unlock the token wallet. The output looks like so:

The client is able to save the authorization in a local file.
This prevents the need to reinitialize the authorization for each transfer.
You will be asked for this password whenever a new session is started.
Please provide a new password to encrypt the local OSDF client configuration file:

Enter a password and press Enter to continue. You should then see a response like this:

To approve credentials for this operation, please navigate to the following URL and approve the request:
 
<https://some/link/here>

Pelican will display a URL in your terminal and indicate that you should visit the URL in your browser. After copying/pasting the URL to your browser, follow all the instructions there for logging in with CILogon. Once everything is successful, you should see a page like this:

[Figure: screenshot of the CILogon success page]

Finally, if the login is successful, Pelican will automatically fetch the token from the CILogon service and continue with the download.

For Issuers Without CILogon Support

Some origins do not support authentication through CILogon. In this case, users must supply their own JWT that's signed by the origin.

Contact your federation administrator to get a token for downloading a protected object. Origin and federation administrators can follow the documentation to generate an object access token for their users. OSDF users can contact support@osg-htc.org for help.

Once you have the token, copy and paste it into a file on your machine, and pass the path to that file to the Pelican client with the -t flag.

# Pass the path to the file containing the token using the -t flag
pelican object get pelican://<federation-url></namespace-prefix></path/to/file> <local/path/to/file> -t </path/to/token/file>

For example, if a token is saved in a file named my-token, it can be used to get the object /ospool/PROTECTED/auth-test.txt by running:

# Get the object, providing the path to the token file via the -t flag
pelican object get -f https://osg-htc.org /ospool/PROTECTED/auth-test.txt downloaded-auth-test.txt -t my-token
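
Because a token grants access to protected objects, it is reasonable to keep the token file private to your user account. The sketch below shows one way to save a token with owner-only permissions. The helper name save_token is hypothetical, and restricting permissions is a general precaution assumed here, not a documented Pelican requirement.

```shell
#!/bin/sh
# save_token is a hypothetical helper, not a Pelican command.
# It copies a token from standard input into the file named by $1 and
# makes sure the file is readable and writable only by its owner.
save_token() {
    (
        umask 077        # new files are created without group/other access
        cat > "$1"       # write stdin to the token file
    ) && chmod 600 "$1"  # also tighten permissions if the file already existed
}

# Example (paste the token, then press Ctrl-D):
#   save_token my-token
# The file can then be passed to the client with: -t my-token
```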

PUT an Object to a Data Repository via the Federation

Another useful Pelican client command is pelican object put. This command performs a simple PUT request to add your object to a data repository via the federation. Putting objects into a data repository always requires a token (see the previous section for how to obtain one). Here is how to use pelican object put:

pelican object put <path/to/local/file> pelican://<federation-url></namespace-prefix></path/to/file> -t </path/to/token/file>

Note: You can also specify the federation URL here with the -f flag. If you do, be sure not to also include it as the hostname in the request URL.

Pelican Object Copy

Note: We are phasing out the object copy command, and we recommend that users use the object get and object put commands instead.

In addition to Pelican's object get and object put commands, there is an older command called pelican object copy. It behaves the same as object get and object put, except that a single command handles both downloads and uploads. For example, to do an object get:

pelican object copy pelican://<federation-url></namespace-prefix></path/to/file> <local/path/to/file>

and to do an object put:

pelican object copy <path/to/local/file> pelican://<federation-url></namespace-prefix></path/to/destination> -t </path/to/token/file>

Utilizing Queries with your URL

The Pelican client allows users to modify the behavior of requests by passing URL query parameters in the remote path of an object. Currently supported queries include: ?pack, ?recursive, and ?directread.

Packing Objects with the ?pack Query

The Pelican client can automatically pack and unpack archives when uploading to and downloading from a federation. You only need to add the ?pack query to the request URL. For example, to automatically unpack a tarball on download, run:

pelican object get pelican://<federation-url></namespace-prefix></path/to/file/file.tar.gz>?pack=tar.gz <local/path/to/file>

To upload, specify a directory and Pelican will compress it for you:

pelican object put <local/path/to/directory> pelican://<federation-url></namespace-prefix></path/to/collection.tar.gz>?pack=tar.gz

Pelican accepts the following values for the pack query:

  • pack=auto:
    • For downloading, auto-detect the file format and unpack (an error is returned if the format is not one Pelican recognizes).
    • For uploading, compress using .tar.xz.
  • pack=tar, pack=tar.gz, pack=tar.xz, pack=zip:
    • For downloading, throws an error if the specified object is not in the specified format (tar, tar.gz, tar.xz, zip, respectively).
    • For uploading, create the object in the specified format (tar, tar.gz, tar.xz, zip, respectively).
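
When building request URLs in a script, it can help to check the pack value against the list above before calling the client. The following is a minimal sketch: with_pack_query is a hypothetical helper, and example.org and my-namespace are placeholders rather than a real federation. Note also that ? is a glob character in most shells, so quoting URLs that contain a query is a safe habit.

```shell
#!/bin/sh
# with_pack_query is a hypothetical helper, not a Pelican command.
# It appends ?pack=<value> to a URL if the value is one of those listed above.
with_pack_query() {
    case $2 in
        auto|tar|tar.gz|tar.xz|zip)
            echo "$1?pack=$2" ;;
        *)
            echo "unsupported pack value: $2" >&2
            return 1 ;;
    esac
}

# example.org and my-namespace are placeholders, not a real federation.
with_pack_query "pelican://example.org/my-namespace/data.tar.gz" auto
# prints pelican://example.org/my-namespace/data.tar.gz?pack=auto
```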

Recursive Downloads and Uploads with the ?recursive Query

The ?recursive query can be utilized if the desired remote object is a collection. When this query is enabled, it indicates to Pelican that all sub paths at the level of the provided namespace should be copied recursively. To use this query, run:

pelican object get pelican://<federation-url></namespace-prefix></path/to/collection>?recursive <local/path/to/directory>

To upload, you can run something similar but with an object put:

pelican object put <local/path/to/directory> pelican://<federation-url></namespace-prefix></path/to/collection>?recursive

Note: This query functions the same as specifying the -r flag described below.

Bypass Caches for Downloads with the ?directread Query

The ?directread query is used if you would like to download an object without utilizing a cache. This way, the object will come directly from the origin each time and never use the cache. To use this query, you can run:

pelican object get pelican://<federation-url></namespace-prefix></path/to/file>?directread <local/path/to/file>

This query does not make sense to use for uploads since uploads go directly to the origin anyway. If you use this query by mistake, you should not run into any issues and your upload will function as normal.

Note about Queries

The ?recursive and ?directread queries do not take values. If a value is assigned (e.g. pelican://some/object?recursive=true), it is ignored and Pelican acts as if no value had been given. For example, pelican://some/object?recursive=false behaves the same as pelican://some/object?recursive: the recursive query is still enabled even though its value is false.
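
The presence-only behavior described in this note can be modeled in a few lines of shell. This sketch illustrates the documented behavior and is not Pelican's actual parser; the helper name has_query_flag is hypothetical.

```shell
#!/bin/sh
# has_query_flag is a hypothetical helper that models the behavior above:
# a query counts as set if its name appears, whatever value it carries.
# Usage: has_query_flag <url> <query-name>
has_query_flag() {
    case $1 in
        *\?*) query="&${1#*\?}&" ;;   # text after the first ?, wrapped in &
        *)    return 1 ;;             # no query string at all
    esac
    case $query in
        *"&$2&"*|*"&$2="*) return 0 ;;   # bare name, or name with any value
        *)                 return 1 ;;
    esac
}

# Both forms are treated as "recursive is set":
has_query_flag 'pelican://some/object?recursive' recursive && echo "set"
has_query_flag 'pelican://some/object?recursive=false' recursive && echo "set"
```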

Additional Flags

The Pelican client supports a variety of command line flags that modify the client's behavior:

Global Flags:

  • -h or --help: Takes no argument and can be used with any Pelican sub command for more information about the sub command and additional supported flags.
  • -f or --federation: Takes a URL that indicates to Pelican which federation the request should be made to.
  • -d or --debug: Takes no argument, but runs Pelican in debug mode, which provides verbose output for debugging purposes.
  • --config: Takes a filepath and indicates to Pelican the location of the Pelican configuration file.
  • --json: Takes no argument and outputs results in JSON format.
  • -l or --log: Takes a string that specifies a file to write the Pelican logs to instead of stderr.
  • --version: Prints the version of Pelican and exits.

Flags For object get/put/copy:

  • -c or --cache: Takes a cache URL and indicates to Pelican that only the specified cache should be used. When used, Pelican will not attempt to use other caches if the provided cache cannot provide the file.
  • --caches: Takes the path to a JSON file containing a list of caches. Similar to the -c flag, Pelican will attempt to use only these caches in the order they are listed.
  • -h or --help: Gives additional information on how to use the command and lists these flags with short descriptions.
  • --methods: Takes a comma-separated list of methods to try for downloads/uploads. The default is just http.
  • -r or --recursive: Takes no argument and indicates to Pelican that all sub paths at the level of the provided namespace should be copied recursively. This option is only supported if the origin supports the WebDAV protocol.
  • -t or --token: Takes a path to a file containing a signed JWT, and is used to download protected objects.

Aliases of The Pelican Binary

The Pelican binary can change its behavior depending on what it is named. This feature serves two purposes: it allows Pelican to use a few convenient default settings when the federation being interacted with is the OSDF, and it allows Pelican to run in legacy stashcp and stash_plugin modes.

Prefixing The Binary Name With osdf

When the name of the Pelican binary begins with osdf, Pelican assumes that all objects come from the OSDF, which lets it apply several defaults. The most immediate effect for users is that you no longer need to specify a federation URL, but only when the object is given with no URL scheme or with the osdf:/// URL scheme. The command to download a public file from above can then be simplified to:

osdf object copy /ospool/uc-shared/public/OSG-Staff/validation/test.txt downloaded-testfile.txt

Note: Take care when using pelican:// URLs with the osdf binary. With the pelican:// URL scheme, you must still provide a federation URL regardless of the binary name.

Naming The Binary stashcp Or stash_plugin

The Pelican Platform grew out of a command line tool called stashcp with an associated HTCondor plugin called stash_plugin, both of which were also used for interacting with objects in the OSDF. To support these legacy tools, Pelican has been built to behave as stashcp and stash_plugin did whenever the Pelican binary is renamed to match the names of these tools.
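
The general idea of a program choosing its behavior from the name it was invoked under can be sketched as follows. This is an illustration of the technique only: the function name mode_for_binary_name is hypothetical, and the matching rules are assumptions drawn from the descriptions above, not Pelican's actual implementation.

```shell
#!/bin/sh
# mode_for_binary_name is a hypothetical illustration of name-based dispatch.
# Given the path a program was invoked as, it prints the mode that the
# sections above describe for that name.
mode_for_binary_name() {
    case $(basename "$1") in
        stashcp)      echo "stashcp" ;;        # legacy stashcp mode
        stash_plugin) echo "stash_plugin" ;;   # legacy HTCondor plugin mode
        osdf*)        echo "osdf" ;;           # name begins with osdf: OSDF defaults
        *)            echo "pelican" ;;        # default behavior
    esac
}

mode_for_binary_name /usr/local/bin/osdf      # prints osdf
mode_for_binary_name /usr/local/bin/pelican   # prints pelican
```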