Getting Data with Pelican

Getting Data With Pelican

Pelican is built on top of HTTP and uses the most common HTTP verbs (opens in a new tab) to interact with data: downloading objects happen with an HTTP "GET", uploading an object is an HTTP "PUT", and discovering data uses a combination of HTTP's "HEAD" and WebDav's "PROPFIND". This architecture means any tool that already speaks HTTP can integrate with Pelican. However, Pelican's official clients -- any of our tools designed for interacting with remote objects -- make intelligent use of special information provided by federation services like the Director to deliver the best experience. They also have built-in optimizations that help interact with data more efficiently, like multi-worker object streaming, automatic retry policies, and tools for packing/unpacking objects while they're in flight.

One of the Pelican Platform's core goals is enabling data access wherever it needs to happen -- whether that's from the the command line, from a browser, in an HTC workflow, a PyTorch training loop, or a Jupyter Notebook. To that end, we've been working hard to develop and maintain a wide range of clients that meet our users' diverse needs. Information about each of our clients can be found in this section and are laid out by client type.

Which Client Is Right For You

Picking a client starts with understanding what you want to accomplish and where you want to accomplish it.

Pelican's Command Line Client

Pelican's command line client (also referred to as the Pelican CLI) excels at broad object manipulation and management tasks, including writing/reading large collections of objects, syncing data between local and remote resources, and discovering data that's accessible through a given namespace/federation prefix. For details about Pelican's CLI, see Getting Data With Pelican/Command Line Client.

Pelican's Python Filesystem Specification

If your goal is to integrate Pelican with Python, you're looking for our Pelican Filesystem Specification (opens in a new tab), or "FSSpec" for short. This client lets you interact with Pelican objects at any level of your code, including by plugging Pelican directly into popular Python libraries like xarray (opens in a new tab) and PyTorch data loaders (opens in a new tab).

Pelican's HTCondor Plugin

Pelican maintains a plugin that acts as the preferred transfer tool for HTCondor (opens in a new tab), a software suite used in many distributed/clustered compute federations, such as the OSPool (opens in a new tab).

This plugin leverages HTCondor's existing plugin architecture, enabling HTCondor to manage all pelican:// and osdf:// file transfers as part of distributed, high-throughput computing workflows.

For more information on how file transfer plugins work in HTCondor, see HTCondor's documentation (opens in a new tab).