Rates of data growth are high across the digital landscape and appear to have spiked since artificial intelligence (AI) entered the scene.

Every day, a fire hose of data is generated across the AI ecosystem. Forbes reported in 2025 that 34 million AI-generated images are created daily.

“There is a lot of data being generated at every stage of the AI lifecycle,” noted Sathya Sankaran, head of Cloud Products at HYCU, a provider of backup as a service focused on building data protection solutions tailored for the AI data foundation.

With the biggest tech companies pledging to pour a collective $320 billion into AI development, and the AI services market estimated to reach a whopping $243 billion in 2025, the data surge is only going to intensify.

“80% of the enterprises are expected to have an AI-based solution before end of the year,” said Subbiah Sundaram, SVP of Products, at the recent Cloud Field Day event, where HYCU appeared with Dell to present a new joint solution released last year. “If they’re going to come up with a production service, it means they’ve got to keep the data safe… It’s not a nice-to-have; it’s a must-have.”

https://www.youtube.com/watch?v=Bj8GbycX9Dw

But seemingly, the urgency hasn’t fully sunk in with the guardians of data. “If you ask [companies], do you protect the AI data, one of the common answers you get back is you don’t need to protect the data because you can just recreate the answers,” said David Noy, VP of Product Management at Dell.

“It’s kind of true but kind of not,” he added.

Noy presented a growing list of reasons why that is not true, and why companies must have data protection top of mind.

One well-known reason is cyber resilience. Against a backdrop of escalating cyberattacks, companies are redoubling their efforts to fortify their digital assets, but AI data protection is still on the sidelines of their strategy.

Noy said, “If I take the training data or the results of inference and poison them in some way, shape or form, I can actually modify your models, your training data and your results.”

Even when all KPIs are met during development, engineers may still run into accidental exposure of sensitive information.

The volatility of AI workloads is yet another reason to ensure that data is protected. Noy said, “Let’s say you start to get performance variations and drift in the way your models are behaving and you want to get back to a specific point in time, you want to revector potentially… That means you have to know what the state was, including all the configuration parameters that were used in the time that vector database was created and models were basically trained.”

Other equally pressing but often overlooked reasons Noy cited for tightening data protection include legal and compliance requirements, data loss from hardware failures or accidental deletion, and data reconstruction.

Broadly, AI data includes training data, vector data, prediction input and output data, model artifacts, metadata and so on. Out of this staggering surge, a new solution has emerged that is gaining a lot of traction: the data lakehouse.

Data lakes are optimized for storage efficiency and flexibility to store high volumes of diverse data types. Data warehouses provide governance and database-like capabilities on top of data lakes, allowing users to query and analyze data quickly. A lakehouse can handle both: store at scale and analyze at speed, explained Sankaran.

This gives users a unified platform into which they can load huge volumes of structured, unstructured and multimodal data, and on which they can run AI training engines.

One of the largest data lakehouses today is Google Cloud’s BigQuery. “It is a multi-billion-dollar ecosystem,” he said. “It is estimated to be between a $3 billion and $4 billion ecosystem for Google Cloud and a significant portion of their data portfolio today.”

While there is value in this approach, the federation does not effectively mitigate the risks of sensitive data exposure, Sankaran said.

“There are some inbuilt capabilities with some of these solutions where you can go back to a seven-day window and change some of these things but what if it’s greater than seven days?” he said.

“For some of these datasets, your only copy is in these lakehouses and if you don’t protect it, you don’t actually have a way of getting those data back… You do need backup,” he added.
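The limitation Sankaran describes can be illustrated with a minimal sketch. It assumes a built-in rollback window of seven days, as in the quote; the function name `within_recovery_window` is hypothetical and does not belong to any vendor API.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the built-in rollback window is seven days, as in the quote above.
RECOVERY_WINDOW = timedelta(days=7)


def within_recovery_window(restore_point, now, window=RECOVERY_WINDOW):
    """Return True if restore_point is still reachable by built-in rollback."""
    age = now - restore_point
    return timedelta(0) <= age <= window


now = datetime(2025, 6, 15, tzinfo=timezone.utc)
print(within_recovery_window(now - timedelta(days=3), now))   # True
print(within_recovery_window(now - timedelta(days=10), now))  # False
```

Any restore point for which this check fails has to come from a separate backup copy, which is the gap Sankaran is pointing to.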

A simple way to recover from a data breach or data loss is to roll back to a known good point in time. HYCU auto-discovers and protects all data assets, storing them in customer-owned backup systems.

“The platform is flexible enough to understand the dataset that we are protecting and ensure we deliver consistency, immutability, cross-regional project protection and support high level of granularity both in terms of what we back up and what we can restore within these datasets,” Sankaran said.

HYCU’s Atomic Backup Sets in BigQuery, a patent-pending solution, is designed to ensure rapid and consistent replication and mining of multi-terabyte datasets with interdependencies.

“Today with cloud-native capabilities, [companies] are able to export one table at a time and they are all out of sync when there are dependencies.”

Related datasets often fall out of sync when moved to backup systems, and segmentation makes it harder to trace them back to a consistent state. To remedy that, HYCU leverages native Time Travel capabilities within BigQuery. This allows users to group datasets to a “single reference point” and ensure that they are backed up at the same point in time across the set.
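The idea of a single reference point can be sketched in a few lines of Python. This is an illustration only, not HYCU’s implementation: it builds BigQuery `CREATE SNAPSHOT TABLE ... FOR SYSTEM_TIME AS OF` statements that pin every table in a group to the same timestamp. The helper name `snapshot_statements` and the project, dataset and table names are hypothetical, and the sketch only generates SQL text rather than running it.

```python
from datetime import datetime, timezone


def snapshot_statements(tables, backup_dataset, as_of):
    """Build one snapshot statement per table, all pinned to the same instant.

    `as_of` must be a timezone-aware datetime; it is normalized to UTC.
    """
    utc = as_of.astimezone(timezone.utc)
    literal = utc.strftime("%Y-%m-%d %H:%M:%S+00")  # shared reference point
    suffix = utc.strftime("%Y%m%dT%H%M%SZ")         # tags each snapshot name
    statements = []
    for table in tables:
        name = table.split(".")[-1]
        statements.append(
            f"CREATE SNAPSHOT TABLE `{backup_dataset}.{name}_{suffix}`\n"
            f"CLONE `{table}`\n"
            f"FOR SYSTEM_TIME AS OF TIMESTAMP '{literal}'"
        )
    return statements


# Hypothetical tables with a dependency between them (orders and their items).
group = ["my-project.sales.orders", "my-project.sales.order_items"]
reference_point = datetime(2025, 6, 15, 12, 0, 0, tzinfo=timezone.utc)
for sql in snapshot_statements(group, "my-project.backup", reference_point):
    print(sql, end="\n\n")
```

Because every statement carries the same timestamp literal, the resulting snapshots reflect one moment in time, unlike table-by-table exports taken at different moments. The reference point must still fall inside the dataset’s time travel window.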

“Having a broad platform that supports 80 plus workloads allows us to not just protect the lakehouse but also a whole lot of connecting ecosystem players that are out there in the market as well,” Sankaran said.

“This is something that has never been built in the industry before,” and it is a big advance towards “end-to-end protection of AI,” said Simon Taylor, founder of HYCU.
