What if your database could remember everything—every edit, every state, every version—across time, while scaling effortlessly beyond the limits of a single machine?
That’s exactly the kind of challenge we’ve been tackling with Dydra, a revisioned RDF graph store, and RonDB, a high-performance, clustered database built for scale.
At its core, Dydra is a flexible RDF graph storage system. It speaks SPARQL, GraphQL, Linked Data Platform (LDP), and other web-native data protocols. You can use it in the cloud, run it locally, or embed it in your application. Whether you’re querying a personal dataset or building a collaborative data platform, Dydra acts as the semantic backbone, allowing you to reason about your data—not just as projections, but as a versioned timeline.
And that’s where Dydra becomes special.
Dydra doesn’t just store the current state of your graph. It also retains previous store states—fully addressable, versioned snapshots—like a Git for graphs. These are accessible via a REVISION argument in SPARQL queries, or even streamed incrementally using MQTT with a REVISION-WINDOW. This turns the graph into a temporal data structure, enabling collaborative use cases where each client has a consistent, convergent view of the data over time.
Internally, this is implemented by annotating each RDF statement with temporal metadata—think of it as “time-traveling triples.” Combined with smart transfer protocols, Dydra can act like a conflict-free replicated data type (CRDT) for graphs. This makes it suitable for distributed collaboration and even disconnected operation.
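As a minimal sketch of that idea (the field names and encoding are illustrative assumptions, not Dydra's actual storage layout), picture each statement stamped with the revision interval during which it is visible:

#include <cstdint>
#include <limits>
#include <string>

// Hypothetical layout: a quad annotated with the revision that asserted it
// and the revision that retracted it (open-ended while still current).
struct RevisionedQuad {
    std::string subject, predicate, object, graph;
    uint64_t added;                                           // asserting revision
    uint64_t removed = std::numeric_limits<uint64_t>::max();  // retracting revision
};

// A statement belongs to revision r iff r lies in [added, removed).
bool visible_at(const RevisionedQuad& q, uint64_t r) {
    return q.added <= r && r < q.removed;
}

Reconstructing the store as of any revision is then just a filter over these intervals, which, as we will see below, is exactly the operation we push down into the backend.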
In short: Dydra stores the what and the when of your data. But to do so efficiently, it needs a fast, scalable backend.
Dydra’s default backend has long been LMDB, a memory-mapped key–value store. LMDB is elegant, efficient, and impressively fast on single-server setups. On large systems with many cores and multiple non-uniform memory access (NUMA) nodes, it handles billions of triples with ease—our biggest production repository reached 3.25 billion RDF statements.
We will certainly continue to use LMDB as a very reliable storage backend.
But LMDB has limits:
- writes are serialized through a single write transaction at a time
- it is bound to a single machine, with no built-in replication or clustering
- it offers no cloud-native orchestration story
For a distributed service with long-term ambitions (think: trillion-triple graphs), that wasn't going to keep up.
That’s where RonDB enters the picture.
RonDB is not your typical SQL database. Originally derived from MySQL NDB Cluster, it has been heavily optimized into a high-availability, low-latency, and cloud-native DBMS.
It features:
- clustering across multiple data nodes, with data automatically partitioned among them
- synchronous replication for high availability and automatic failover
- low-latency, in-memory row storage
- cloud-native deployment and orchestration
In short, RonDB brings what LMDB lacks: replication, clustering, and cloud-native orchestration.
We’ve already migrated some Dydra repositories to RonDB clusters—both local and remote—and reached the same 3.25B triple milestone, now backed by a clustered system. That’s only the beginning. With RonDB, we’re building toward 100B, 500B, and eventually 1T RDF statement repositories.
This isn’t theoretical. It’s a concrete step toward a trillion-triple store, and we’re just getting started.
A cluster backend like RonDB offers not just more storage and replication—it opens up a deeper possibility: pushing query logic down into the data layer itself.
For us, that meant extending RonDB’s internal interpreter—originally designed for lightweight filtering logic—and teaching it how to execute rich, client-defined logic in parallel on all data nodes.
We started by exposing the full RonDB NDBAPI to Common Lisp. This library, called cl-ndbapi, lets us talk to RonDB clusters directly and efficiently from Lisp, with everything that the NDBAPI offers. This gave us the low-level hooks we needed to generate and compile interpreted programs from Lisp code.
In other words, we built a cross-language compiler that translates Lisp functions into RonDB’s interpreted code language. The program represented by the generated code is deployed onto the data nodes and executed close to the data, massively reducing the volume of intermediate results transferred to the application.
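The NDB API exposes this interpreter through its NdbInterpretedCode class. As a rough C++ sketch of the kind of filter such a compiler might emit for one fixed revision (the column ids, register numbers, and interval layout are assumptions for illustration, not our actual generated code):

#include <NdbApi.hpp>

// Illustrative only: keep rows whose [added, removed) revision interval
// contains the requested revision. The code object must be constructed
// against the target table so that attribute reads can be resolved.
const Uint32 ADDED_COL = 4, REMOVED_COL = 5;  // assumed column ids

int build_revision_filter(NdbInterpretedCode& code, Uint32 revision) {
    code.load_const_u32(1, revision);  // register 1 := requested revision
    code.read_attr(2, ADDED_COL);      // register 2 := row's "added" revision
    code.read_attr(3, REMOVED_COL);    // register 3 := row's "removed" revision
    code.branch_gt(2, 1, 0);           // added > revision: reject at label 0
    code.branch_le(3, 1, 0);           // removed <= revision: reject at label 0
    code.interpret_exit_ok();          // interval contains revision: keep row
    code.def_label(0);
    code.interpret_exit_nok();         // drop the row on the data node
    return code.finalise();
}

In our pipeline the equivalent of this function is, of course, generated from Lisp via cl-ndbapi rather than written by hand.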
Instead of pulling millions or even billions of candidate rows to the client to figure out which statements were valid at a certain revision, we now resolve the revision logic inside RonDB.
That means:
- far fewer intermediate rows cross the network
- revision checks run in parallel on every data node
- the client receives only statements that are valid at the requested revision
Let’s look at a SPARQL query that collects the prevalence of descriptors across thesaurus revisions in a dataset of the Standard Thesaurus for Economics (STW):
prefix skos: <http://www.w3.org/2004/02/skos/core#>
prefix zbwext: <http://zbw.eu/namespaces/zbw-extensions/>
#
# Show the number of versions in which a descriptor is present
#
select ?prevalence (count(?prevalence) as ?frequency)
where {
  select ?s (count(?r) as ?prevalence)
  where {
    revision ?r { ?s a zbwext:Descriptor . }
  }
  group by ?s
}
group by ?prevalence
order by desc(?frequency)
This query makes use of the REVISION clause, which is analogous to a GRAPH or SERVICE clause in SPARQL and which was introduced in our industry paper "Transaction-Time Queries in Dydra" by James Anderson and Arto Bendiken, presented at the MEPDaW workshop in 2016. The REVISION clause thus lets you query revisions with the same ease with which you query graphs or formulate federated queries.
Traditionally, something like this would require acquiring the full set of candidate statements and filtering them post-hoc by validity intervals. But with revision filtering now inlined inside RonDB, each data node returns only the statements already filtered by temporal logic. We use RonDB's internal interpreter to evaluate revision intervals on each partition in parallel.
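Concretely, such a program is attached to the scan itself, so every partition evaluates it locally. A sketch using the NDB API's scan options (illustrative; error handling omitted):

#include <NdbApi.hpp>

// Illustrative: run a table scan with an interpreted filter so each data
// node evaluates the revision logic and ships back only matching rows.
NdbScanOperation* scan_with_filter(NdbTransaction* trans,
                                   const NdbRecord* result_record,
                                   const NdbInterpretedCode* filter) {
    NdbScanOperation::ScanOptions options;
    options.optionsPresent = NdbScanOperation::ScanOptions::SO_INTERPRETED;
    options.interpretedCode = filter;  // evaluated per row on the data nodes
    return trans->scanTable(result_record, NdbOperation::LM_CommittedRead,
                            nullptr,   // default column mask
                            &options, sizeof(options));
}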
Result? The query is faster, and the system uses less memory and bandwidth.
And since the programs can now take arguments, we don’t even need to generate a new interpreted program for each revision—just reuse an interpreted program with different parameters.
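As a toy model of that reuse (deliberately not RonDB's actual parameter mechanism), think of the compiled filter as a closed function that reads its revision from a parameter block supplied at execution time:

#include <cstdint>
#include <functional>
#include <vector>

// Toy model: the filter is compiled once; only the parameters change.
using Params = std::vector<uint64_t>;
struct Row { uint64_t added, removed; };
using Program = std::function<bool(const Row&, const Params&)>;

// "Compile" once: the program reads the revision from parameter slot 0.
Program compile_revision_filter() {
    return [](const Row& row, const Params& p) {
        return row.added <= p.at(0) && p.at(0) < row.removed;
    };
}

int main() {
    Program filter = compile_revision_filter();  // generated once
    Row r{10, 42};                               // visible in [10, 42)
    bool at_12 = filter(r, Params{12});          // true: revision 12 sees it
    bool at_42 = filter(r, Params{42});          // false: retracted at 42
    return (at_12 && !at_42) ? 0 : 1;
}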
To support this tight integration, RonDB was extended in several key ways through collaboration between the teams behind RonDB and Dydra:
- the internal interpreter, originally limited to lightweight filtering, now executes richer, client-defined programs
- interpreted programs can take arguments, so a single program can be reused across revisions
- programs run in parallel on all data nodes and return only the rows that pass the revision logic
These extensions turned RonDB from a passive store into an active execution platform—capable of hosting revision-aware SPARQL logic, running it in parallel across data nodes, and returning only what’s truly needed.
By pushing the revision filtering logic into RonDB’s execution layer, we offload expensive temporal reasoning from the client. This avoids loading massive historical datasets into memory and lets Dydra behave more like a streaming, temporal graph engine—while scaling toward hundreds of billions of triples.
Dydra gains:
- faster revision-aware queries
- lower memory and bandwidth consumption
- the behavior of a streaming, temporal graph engine
- a clear path toward repositories with hundreds of billions of triples
This is just the first phase. With RonDB’s parallelism and replication, and Dydra’s expressive revision model, we’re building a database that not only scales across machines, but also across time.
Next up, of course: a trillion triples.
This article was written by the team at Dydra, with support from Hopsworks.