Space (the final frontier) is sending vast amounts of data back to Earth as we study both our planet and the rest of the universe from orbit. The European Space Agency (ESA) operates hundreds of satellites and receives petabytes of streaming data for analysis and retention, a truly planet-scale data storage problem. The French National Center for Space Studies (CNES) has partnered with Orange, HPE, Tealenium, and Scality to build a system that can store up to 6,000 petabytes of data and make it widely available for a range of research purposes. The new system replaced a collection of different storage platforms that had been deployed over time for various satellite programs. The Scality platform unifies storage of, and access to, all satellite data, and leaves room for future capacity growth while reducing costs and energy consumption.

One Ring to Store it All

At the centre is a Scality Ring, a distributed object storage system that stores data across three sites in a fault-tolerant and highly available architecture, with every write going to all three sites immediately. Approximately 1 petabyte of data is received every month. Within the ring, new data is routed to a high-performance hot storage tier, where it is instantly available for analysis. After six months, older data is moved to a more cost-effective tier on two tape libraries. Currently, 53 PB sits in the hot tier on the Scality Ring and 150 PB on tape. The transition to tape, and back again when needed, is integrated with the API of the drive vendor’s HSM (hierarchical storage management) product. By integrating with the existing HSM, Scality does not need to develop direct tape support or lock clients to a specific hardware vendor. Tape is the most cost-effective way to reach the future target of six exabytes and to protect the existing data, which dates back to the 1960s. The tape libraries store exabytes of data at a low cost per terabyte and, because tape needs no power to retain data, they also meet the project’s efficiency requirements.

Standard API to Access Exabytes of Data

Storing all this data is excellent, but the value truly comes when a research group uses it, for example by retrieving specific forest fire images and drawing insights from them. Scality provides an S3-compatible API, so all data on the ring can be accessed through a well-known API optimized for distributed data access. The standard S3 API gives immediate access to data in the hot tier. The S3 Glacier API retrieves objects stored on tape by copying them to the hot tier: the client flags an item for retrieval, which may take tens of minutes as the tape library fetches the correct tape and seeks to the proper location. One of the useful characteristics of the S3 API is that object names remain unchanged when objects move between storage technologies; the name used to access an object on hot storage is the same as when the object is on tape. The S3 APIs are well known in the industry, so software developers can apply their existing knowledge of cloud storage to on-premises data stored on Scality.
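The flag-then-wait retrieval described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CNES or Scality implementation: `fetch`, `is_readable`, and `restore_in_progress` are hypothetical helper names, and reporting tape-resident objects with the storage class `GLACIER` is an assumption about the deployment. The `head_object`, `restore_object`, and `get_object` calls, and the `Restore` field they rely on, follow the standard S3 API as exposed by clients such as boto3.

```python
"""Sketch: read an object that may currently live on tape."""
import time


def restore_in_progress(head: dict) -> bool:
    """True while a previously requested restore is still running."""
    return 'ongoing-request="true"' in head.get("Restore", "")


def is_readable(head: dict, cold_classes=("GLACIER",)) -> bool:
    """True if the object can be read right now.

    Assumption: tape-resident objects report a storage class listed in
    `cold_classes`; a real deployment may use different names.
    """
    if 'ongoing-request="false"' in head.get("Restore", ""):
        return True  # a restored copy is available in the hot tier
    return head.get("StorageClass", "STANDARD") not in cold_classes


def fetch(client, bucket, key, days=7, poll_seconds=300, max_polls=24):
    """Return the object's bytes, restoring it from tape first if needed.

    The same bucket and key are used whether the object is hot or cold.
    """
    head = client.head_object(Bucket=bucket, Key=key)
    if not is_readable(head):
        if not restore_in_progress(head):
            # Flag the object for retrieval; the restored copy is kept
            # for `days` days.
            client.restore_object(
                Bucket=bucket, Key=key, RestoreRequest={"Days": days}
            )
        for _ in range(max_polls):
            time.sleep(poll_seconds)  # tape recall can take tens of minutes
            head = client.head_object(Bucket=bucket, Key=key)
            if is_readable(head):
                break
        else:
            raise TimeoutError(f"{key} was not restored in time")
    return client.get_object(Bucket=bucket, Key=key)["Body"].read()
```

With boto3, `client` would be created with something like `boto3.client("s3", endpoint_url=...)` pointing at the on-premises endpoint; the endpoint, bucket, and key names depend on the deployment and are not given in this article.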

Space Age Data

Scality impresses as a company that focuses on one thing: S3-compatible object storage. This project highlights the approach of specializing and partnering with other specialists, specifically in the areas of tape HSM and network connectivity. Using the S3 API allows data consumers to apply well-established techniques and tools, reducing the barriers to deriving value from this public-good dataset. The potential to store up to 6 exabytes of data in a private cloud deployment is outstanding.

Scality presented “Leading Space Agency’s Long-Term Scientific Storage at Scale with Scality” at Cloud Field Day; you can watch all the presentations on their appearance page.
