Astronomer has made Astro, its data orchestration framework built on open source Apache Airflow, available for deployment in a private cloud.

Astro Private Cloud enables IT teams to deploy a data engineering framework anywhere. Previously, Astro was only available as a software-as-a-service (SaaS) application managed by Astronomer.

Carter Page, executive vice president for research and development at Astronomer, said Astro Private Cloud makes it simpler for IT organizations to, for example, comply with data sovereignty requirements.

Astro Private Cloud is designed to separate the control plane from the data plane where the Airflow software runs, providing a platform for centrally managing data engineering workflows. It’s not clear how many IT teams are using the SaaS version of Astro, but Astronomer notes that more than 80,000 organizations already use Airflow, with 324 million Airflow downloads made last year alone.

Astro provides those organizations with an opportunity to streamline the management of Airflow in a way that easily enables multi-team orchestration across data engineering workflows, said Page. The overall goal is to reduce friction in a way that makes it simpler to enforce governance policies, he added.

While IT teams, of course, have been managing data for decades, many of them are now adopting data engineering best practices to automate the management of workflows. That shift is especially crucial for IT teams looking to operationalize artificial intelligence (AI) models, which require the right data to be in the right place at the right time.
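To make that concrete, the snippet below is a minimal, illustrative Airflow pipeline written with Airflow’s TaskFlow API: a hypothetical extract-transform-load workflow of the kind such frameworks automate. The task names and data are invented for illustration and are not part of Astro itself.

# Hypothetical extract -> transform -> load pipeline using the
# Apache Airflow TaskFlow API (Airflow 2.x); all names are illustrative.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def example_etl():
    @task
    def extract() -> list[int]:
        # Stand-in for pulling records from a source system.
        return [1, 2, 3]

    @task
    def transform(records: list[int]) -> list[int]:
        # Stand-in for cleaning or enriching the data.
        return [record * 10 for record in records]

    @task
    def load(records: list[int]) -> None:
        # Stand-in for writing to a warehouse or feature store.
        print(f"loaded {len(records)} records")

    load(transform(extract()))


example_etl()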

Rather than have IT teams build their own control plane to achieve that goal, Astronomer is making the case for an Astro platform that it develops and maintains on their behalf. IT teams can invoke Astro via a graphical user interface, a command line interface or, for organizations that prefer to deploy Astro as code, an application programming interface (API). An Astro Hypervisor then dynamically scales the Kubernetes clusters running Apache Airflow as needed, an approach that makes it significantly easier to create, delete and modify instances of Apache Airflow, said Page.
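As a rough illustration of the kind of operation such a control plane automates, the sketch below uses the official Kubernetes Python client to resize a deployment running Airflow components. The deployment name, namespace and replica count are hypothetical, and this is not Astro’s actual implementation.

# Minimal sketch: programmatically scaling a Kubernetes deployment,
# the sort of action a hypervisor-style control plane performs on
# demand. All names here are hypothetical examples.
from kubernetes import client, config


def scale_airflow_workers(replicas: int) -> None:
    config.load_kube_config()  # or load_incluster_config() inside a pod
    apps = client.AppsV1Api()
    apps.patch_namespaced_deployment_scale(
        name="airflow-worker",      # hypothetical deployment name
        namespace="data-platform",  # hypothetical namespace
        body={"spec": {"replicas": replicas}},
    )


if __name__ == "__main__":
    scale_airflow_workers(5)  # scale up ahead of a heavy workflow run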

Like most maintainers of open source software, the developers who contribute code to Apache Airflow are more focused on features and functions than manageability. It’s generally left up to enterprise IT organizations to determine how best to manage open source software that, while free to acquire, can be challenging to maintain.

Ultimately, each organization will need to determine how to automate data engineering workflows as the overall volume of data that needs to be managed continues to increase exponentially. The challenge, of course, is not just acquiring the frameworks needed to attain that goal, but also finding and retaining the data engineering expertise needed to use them. Given the size of the Apache Airflow community, one of the best places to start looking for that expertise is among the contributors to the core project itself.

Regardless of how organizations approach data engineering, it’s now more a question of to what degree they will need this capability to manage the massive volumes of data that are increasingly strewn across the enterprise.
