Graph database and analytics company Neo4j has developed a new distributed graph architecture. The technology is ambitiously named Infinigraph, but its limits are more modest: this self-managed offering enables Neo4j’s database to run operational and analytical workloads together in a single system at 100TB+ scale.

That 100TB+ scope is important, primarily because it means workloads can reach massive sizes without fragmenting the graph or duplicating infrastructure. But why are graph database unity, integrity and wholeness (there is no comfortable opposite for fragmentation) so important?

Graph Integrity Matters

Keeping a graph database in a non-fragmented state is important because a more distributed deployment can lead to data aggregation issues and what graph fans like to call “cross-partition traversals”, which can compromise performance. Data systems at this level can also experience performance degradation because of wider data ingestion footprints, throughput challenges (in terms of how many operations the system can juggle concurrently) and additional load on core system resources, including CPU, memory and storage.
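To make the traversal cost concrete, here is a minimal Python sketch. It is an illustration only, not Neo4j code: the function name `count_cross_partition_hops` and the `partition_of` mapping are hypothetical, and the sketch assumes each node lives on exactly one partition. It counts how many steps of a traversal jump from one partition to another, each of which would typically mean a network round trip in a distributed deployment.

```python
def count_cross_partition_hops(path, partition_of):
    """Count the hops in `path` whose two endpoints sit on different partitions.

    path         -- ordered list of node ids visited by a traversal
    partition_of -- hypothetical mapping from node id to partition number
    """
    return sum(
        1
        for source, target in zip(path, path[1:])
        if partition_of[source] != partition_of[target]
    )


# The same four-node traversal under two different placements.
path = ["a", "b", "c", "d"]
fragmented = {"a": 0, "b": 0, "c": 1, "d": 0}
unified = {"a": 0, "b": 0, "c": 0, "d": 0}

fragmented_hops = count_cross_partition_hops(path, fragmented)
unified_hops = count_cross_partition_hops(path, unified)
```

In the fragmented placement, the hops into and out of node `c` both cross a partition boundary; in the unified placement, none do.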

Keen to show that this scale does not come at the expense of reliability, Neo4j also says that the database guarantees full ACID compliance so that every read, write and update is consistent, reliable and recoverable. This is the case, the company says, even with billions of relationships and thousands of concurrent queries run in real time.

Magical analyst house Gartner says that the convergence of operational and analytical systems is happening through either collaboration or full integration. This (according to Gartner) is being achieved by three viable approaches:

  1. One database and one copy of data.
  2. One database but with two engines, one row-based, one column-based, integrated and synchronized.
  3. Two or more databases that are designed to synchronize and work together.

GenAI Is Data-Hungry 

Clearly intended to be timely for generative AI deployments, Neo4j suggests that Infinigraph opens up use cases where deployments demand “unprecedented data scale” today. 

What is the scope of unprecedented scale, then, in this case? 

The company is talking about use cases where data teams are looking to embed tens of millions of documents as vectors, storing them directly in the graph to power context-aware assistants and semantic search. No one-trick pony, Infinigraph also powers global fraud intelligence, product graphs with hundreds of millions of stock keeping units (SKUs) and compliance analyses across decades of data. All of that… and Neo4j promises it’s “fully traversable” (there’s that term again) in real-time.

“Infinigraph sets a new standard for enterprise graph databases: One system that runs real-time operations and deep analytics together, at full fidelity and massive scale,” said Sudhir Hasbe, president, technology, Neo4j. “We’re giving [application and data science] builders the power to create intelligent systems that transform data into knowledge, scale without limits and solve their biggest data challenges—without added complexity or cost.”

Transactional & Analytical System Silos

Hasbe and team say that the inspiration for Infinigraph comes from the fact that enterprises are plagued by data silos separating transactional systems from analytical tools. This divide hampers AI applications, and the complex integrations needed to bridge it are costly. Organizations today are often forced to stitch together dual databases, synchronize multiple separate systems, or push a single engine beyond its limits.

“Infinigraph solves this problem directly. It enables teams to run both types of workloads in the same system at scale, without ETL pipelines, sync delays, or redundant infrastructure,” noted Hasbe. “It can power autonomous agents, compliance systems and transactional applications on one consistent source of connected truth. Teams can detect fraud and analyze fraud rings from the same dataset. They can generate real-time customer recommendations while analyzing decades of customer data and behavioral trends.”

Sharding, as a Whole

The architecture behind Infinigraph is the result of a multi-year engineering investment. It uses sharding that distributes the graph’s property data across different members of a cluster. The graph still stays logically whole, queries behave as expected, and applications scale without code changes or manual workarounds.
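The idea of sharding only the property data can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of Neo4j's implementation, which the company has not detailed here: the class `PropertyShardedGraph`, its methods and the hash-based placement are all hypothetical. The sketch keeps the graph's structure (which node connects to which) in one logical place, spreads node properties across several shards, and hides the placement from the caller.

```python
import zlib
from collections import defaultdict


class PropertyShardedGraph:
    """Toy graph: topology kept whole, node properties spread across shards."""

    def __init__(self, num_shards):
        # Structure of the graph, held as a single logical adjacency map.
        self.adjacency = defaultdict(set)
        # One dictionary per shard, standing in for a cluster member.
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, node_id):
        # Assumed placement rule: a stable hash of the node id picks the shard.
        return zlib.crc32(str(node_id).encode("utf-8")) % len(self.shards)

    def add_node(self, node_id, **properties):
        self.adjacency[node_id]  # touching the key registers the node
        self.shards[self._shard_for(node_id)][node_id] = dict(properties)

    def add_edge(self, source, target):
        self.adjacency[source].add(target)

    def neighbours_with_properties(self, node_id):
        # The caller never names a shard; lookups are routed internally.
        return {
            neighbour: self.shards[self._shard_for(neighbour)].get(neighbour, {})
            for neighbour in sorted(self.adjacency[node_id])
        }


graph = PropertyShardedGraph(num_shards=3)
graph.add_node("alice", role="analyst")
graph.add_node("bob", role="engineer")
graph.add_node("carol", role="auditor")
graph.add_edge("alice", "bob")
graph.add_edge("alice", "carol")

result = graph.neighbours_with_properties("alice")
```

The query code is the same whether there is one shard or many, which is the sense in which the graph stays logically whole and applications need no code changes.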

With Infinigraph, Neo4j says it provides users with the full spectrum of scale architectures: replicated graphs for high availability and read scalability, federated graphs for querying disconnected graphs, and sharded graphs with Infinigraph. Organizations can mix and match these architectures to drive various use cases across the business.
