How to make an elastic, self-stabilizing, distributed database RonDB accessible to developers with Docker

RonDB is a fork of MySQL NDB Cluster, one of the storage engines supported by the MySQL server. Most importantly, it is the storage engine that is distributed by design, which gives it high availability. As an in-memory database, it is also optimized for low latency and high throughput. At Hopsworks, these are exactly the properties we need for our Feature Store, so that, for example, features used as input to ML models can be queried almost instantaneously.

We forked MySQL NDB Cluster because we wanted its performance, but also wanted to make it easier to use. Using it requires knowledge of its architecture, and getting the most out of it demands a deep understanding of its internals. At Hopsworks, we saw the chance to build a managed service around it, so that not only the Feature Store and HopsFS could profit from it, but eventually any developer who needs a database with these characteristics.

RonDB is now available as a managed service on managed.hopsworks.ai in conjunction with the Feature Store, where further features such as online scaling and online software upgrades are continuously being added. However, this still requires the user to have cloud credentials and to sign up for Hopsworks. For us, the next logical step to make RonDB even more accessible to new developers was therefore to offer a way to create a standalone RonDB cluster locally with Docker Compose. This is what we describe in this blog post. The corresponding code lives in a separate repository, logicalclocks/rondb-docker, which is a fork and complete rewrite of the mysql/mysql-docker repository.

At its core, this repository is a bash script that creates a docker-compose file with a configurable number of management server, data node, MySQL server and API/benchmarking containers. It generates and mounts the required configuration files and creates volumes for all log and data directories. The user can choose the RonDB version by referencing any of the RonDB tarballs available on repo.hops.works/master. All containers are built from the same Dockerfile, which mimics the directory structure of the VMs that we spawn for RonDB clusters on hopsworks.ai.
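To make the generation step more concrete, here is a minimal Python sketch of what such a generator does. It is not the repository's bash script: the function name generate_compose and its parameters are hypothetical, and the node ID scheme (data nodes numbered from 1, management servers from 65) is simply copied from the sample file shown further down. Resource limits, bind mounts and environment variables are left out for brevity.

```python
import json


def generate_compose(version, num_mgmd, node_groups, replication_factor,
                     num_mysqld, num_api):
    """Return a docker-compose structure as a plain dict.

    Hypothetical sketch, not the real bash generator. Node IDs mirror the
    sample file: data nodes start at 1, management servers at 65.
    """
    image = f"rondb-standalone:{version}"
    services, volumes = {}, {}

    def add_service(name, command, mounts):
        services[name] = {"image": image, "container_name": name,
                          "command": command}
        if mounts:
            # Named volumes follow the sample's "<kind>_<service>" pattern.
            services[name]["volumes"] = [f"{kind}_{name}:{path}"
                                         for kind, path in mounts]
            for kind, _ in mounts:
                volumes[f"{kind}_{name}"] = None

    base = "/srv/hops/mysql-cluster"
    mgmds = [f"mgmd_{i}" for i in range(1, num_mgmd + 1)]
    for i, name in enumerate(mgmds, start=1):
        add_service(name, ["ndb_mgmd", f"--ndb-nodeid={64 + i}", "--initial"],
                    [("dataDir", f"{base}/mgmd"), ("logDir", f"{base}/log")])

    # Data nodes get a comma-separated list of all management servers.
    connectstring = ",".join(f"{m}:1186" for m in mgmds)
    for i in range(1, node_groups * replication_factor + 1):
        add_service(f"ndbd_{i}",
                    ["ndbmtd", f"--ndb-nodeid={i}", "--initial",
                     f"--ndb-connectstring={connectstring}"],
                    [("dataDir", f"{base}/ndb_data"), ("logDir", f"{base}/log")])

    for i in range(1, num_mysqld + 1):
        add_service(f"mysqld_{i}", ["mysqld"],
                    [("dataDir", f"{base}/mysql"),
                     ("mysqlFilesDir", f"{base}/mysql-files")])

    for i in range(1, num_api + 1):
        # Idle command that keeps the container running for benchmarks.
        add_service(f"api_{i}", 'bash -c "tail -F anything"', [])

    return {"version": "3.8", "services": services, "volumes": volumes}


if __name__ == "__main__":
    compose = generate_compose("21.04.9", num_mgmd=1, node_groups=1,
                               replication_factor=2, num_mysqld=1, num_api=1)
    print(json.dumps(compose, indent=2))
```

With the arguments in the usage example, the services and volume names match the sample file below; the resulting dict could be written out with a YAML library such as PyYAML.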

Below is a sample docker-compose file generated for the following configuration:

1 Management container
1 Node group x replication factor of 2 (= 2 data node containers)
1 MySQL server container
1 API container

Sample docker-compose file:

version: '3.8'

# RonDB-Docker version: 0.1
services:

    mgmd_1:
      image: rondb-standalone:21.04.9
      container_name: mgmd_1
      command: ["ndb_mgmd", "--ndb-nodeid=65", "--initial"]
      deploy:
        resources:
          limits:
            cpus: '0.2'
            memory: 50M
          reservations:
            memory: 20M
      volumes:
      - type: bind
        source: /rondb-docker/autogenerated_files/v21049_m1_g1_r2_my1_api1/config.ini
        target: /srv/hops/mysql-cluster/config.ini
      - dataDir_mgmd_1:/srv/hops/mysql-cluster/mgmd
      - logDir_mgmd_1:/srv/hops/mysql-cluster/log

    ndbd_1:
      image: rondb-standalone:21.04.9
      container_name: ndbd_1
      command: ["ndbmtd", "--ndb-nodeid=1", "--initial", "--ndb-connectstring=mgmd_1:1186"]
      deploy:
        resources:
          limits:
            cpus: '2'
            memory: 3000M
          reservations:
            memory: 2000M
      volumes:
      - dataDir_ndbd_1:/srv/hops/mysql-cluster/ndb_data
      - logDir_ndbd_1:/srv/hops/mysql-cluster/log

    ndbd_2:
      image: rondb-standalone:21.04.9
      container_name: ndbd_2
      command: ["ndbmtd", "--ndb-nodeid=2", "--initial", "--ndb-connectstring=mgmd_1:1186"]
      deploy:
        resources:
          limits:
            cpus: '2'
            memory: 3000M
          reservations:
            memory: 2000M
      volumes:
      - dataDir_ndbd_2:/srv/hops/mysql-cluster/ndb_data
      - logDir_ndbd_2:/srv/hops/mysql-cluster/log

    mysqld_1:
      image: rondb-standalone:21.04.9
      container_name: mysqld_1
      command: ["mysqld"]
      cap_add:
        - SYS_NICE
      deploy:
        resources:
          limits:
            cpus: '2'
            memory: 1400M
          reservations:
            memory: 650M
      volumes:
      - type: bind
        source: /rondb-docker/autogenerated_files/v21049_m1_g1_r2_my1_api1/my.cnf
        target: /srv/hops/mysql-cluster/my.cnf
      - dataDir_mysqld_1:/srv/hops/mysql-cluster/mysql
      - mysqlFilesDir_mysqld_1:/srv/hops/mysql-cluster/mysql-files
      environment:
      - MYSQL_ALLOW_EMPTY_PASSWORD=true
      - MYSQL_USER=mysql
      - MYSQL_PASSWORD=Abc123?e
      - MYSQL_SETUP_APP=1

    api_1:
      image: rondb-standalone:21.04.9
      container_name: api_1
      command: bash -c "tail -F anything"
      deploy:
        resources:
          limits:
            cpus: '2'
            memory: 100M
          reservations:
            memory: 100M
      volumes:
      - type: bind
        source: /rondb-docker/autogenerated_files/v21049_m1_g1_r2_my1_api1/sysbench_single
        target: /home/mysql/benchmarks/sysbench_single
      - type: bind
        source: /rondb-docker/autogenerated_files/v21049_m1_g1_r2_my1_api1/dbt2_single
        target: /home/mysql/benchmarks/dbt2_single
      environment:
      - MYSQL_PASSWORD=Abc123?e

volumes:
    dataDir_mgmd_1:
    logDir_mgmd_1:
    dataDir_ndbd_1:
    logDir_ndbd_1:
    dataDir_ndbd_2:
    logDir_ndbd_2:
    dataDir_mysqld_1:
    mysqlFilesDir_mysqld_1:
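The bind-mount sources above all point into autogenerated_files/v21049_m1_g1_r2_my1_api1. Judging from this single sample, the directory name appears to encode the RonDB version and the container counts. The following sketch reproduces that pattern under this assumption (the helper names are hypothetical, and the format is inferred rather than documented) and also shows how the number of data node containers follows from node groups times replication factor.

```python
def data_node_count(node_groups: int, replication_factor: int) -> int:
    """A node group consists of replication_factor data nodes holding replicas of the same data."""
    return node_groups * replication_factor


def total_containers(num_mgmd: int, node_groups: int, replication_factor: int,
                     num_mysqld: int, num_api: int) -> int:
    """Management + data node + MySQL server + API containers."""
    return (num_mgmd + data_node_count(node_groups, replication_factor)
            + num_mysqld + num_api)


def config_dir_name(version: str, num_mgmd: int, node_groups: int,
                    replication_factor: int, num_mysqld: int, num_api: int) -> str:
    """Directory name pattern inferred from the sample (an assumption)."""
    return (f"v{version.replace('.', '')}_m{num_mgmd}_g{node_groups}"
            f"_r{replication_factor}_my{num_mysqld}_api{num_api}")


if __name__ == "__main__":
    print(config_dir_name("21.04.9", 1, 1, 2, 1, 1))
    print(total_containers(1, 1, 2, 1, 1))
```

For the sample configuration this yields the directory name used in the file above and five containers in total, two of which are data nodes.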

As the sample shows, memory management for the containers was one of the challenges when creating this repository. Memory usage is kept in check both by the Docker Compose fields “deploy.resources” and by the auto-generated RonDB configuration file (config.ini).
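Before starting a generated cluster, it can help to add up the “deploy.resources” values and compare them with the memory Docker may use (for Docker Desktop, the memory assigned to its VM). The sketch below does this for the sample file; the helper names and the three-way verdict are hypothetical. It only covers the Compose side: RonDB's own memory settings in config.ini must additionally fit inside the data node limits, which this sketch does not check.

```python
# Memory settings from the sample docker-compose file: (limit, reservation).
SAMPLE_MEMORY = {
    "mgmd_1": ("50M", "20M"),
    "ndbd_1": ("3000M", "2000M"),
    "ndbd_2": ("3000M", "2000M"),
    "mysqld_1": ("1400M", "650M"),
    "api_1": ("100M", "100M"),
}

# Docker memory units are binary: 1k = 1024 bytes, 1m = 1024k, 1g = 1024m.
_MIB_PER_UNIT = {"b": 1 / (1024 * 1024), "k": 1 / 1024, "m": 1.0, "g": 1024.0}


def to_mib(value: str) -> float:
    """Convert a Compose memory string such as '3000M' or '2g' to MiB."""
    value = value.strip().lower()
    if value[-1] in _MIB_PER_UNIT:
        return float(value[:-1]) * _MIB_PER_UNIT[value[-1]]
    return float(value) / (1024 * 1024)  # a bare number is a byte count


def memory_budget(services):
    """Return (sum of limits, sum of reservations) in MiB."""
    limits = sum(to_mib(limit) for limit, _ in services.values())
    reservations = sum(to_mib(res) for _, res in services.values())
    return limits, reservations


def check_host(services, host_mib: float) -> str:
    """Hypothetical pre-flight check against the memory available to Docker."""
    limits, reservations = memory_budget(services)
    if reservations > host_mib:
        return "reservations exceed available memory"
    if limits > host_mib:
        return "limits oversubscribe available memory"
    return "ok"


if __name__ == "__main__":
    print(memory_budget(SAMPLE_MEMORY))  # (7550.0, 4770.0)
    print(check_host(SAMPLE_MEMORY, host_mib=8192))
```

For the sample, the limits add up to 7550 MiB and the reservations to 4770 MiB, dominated by the two data nodes.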

The following image shows the resource usage of this cluster, as reported by the Docker Desktop extension “Resource Usage”. Note that the total CPU usage is shown as 14.61% out of a possible 1000%, since the host has 10 CPU cores and each core counts as 100%. Also note that the cluster is currently not under load, which is why its resource usage is low.
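The percentage convention above can be translated into cores with a little arithmetic. The sketch assumes the docker stats convention in which 100% corresponds to one fully used core, and it also sums the “cpus” limits from the sample file to show that they stay below the 10 available cores; the helper names are hypothetical.

```python
import math

# "cpus" limits from the sample docker-compose file.
SAMPLE_CPU_LIMITS = {"mgmd_1": 0.2, "ndbd_1": 2, "ndbd_2": 2,
                     "mysqld_1": 2, "api_1": 2}


def cores_in_use(cpu_percent: float) -> float:
    """Assumes 100% equals one fully used core, as in docker stats."""
    return cpu_percent / 100.0


def share_of_host(cpu_percent: float, host_cores: int) -> float:
    """Fraction of total host CPU capacity (host_cores * 100%)."""
    return cpu_percent / (100.0 * host_cores)


if __name__ == "__main__":
    print(f"{cores_in_use(14.61):.4f} cores in use")
    print(f"{share_of_host(14.61, host_cores=10):.2%} of the host")
    print(f"{sum(SAMPLE_CPU_LIMITS.values()):.1f} cores allowed by the limits")
```

So the idle cluster uses roughly 0.15 cores, or about 1.5% of the host, while its combined limits would allow up to 8.2 cores.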

We have three main use cases in mind for outside developers using this repository:

to help develop applications against RonDB
to test RonDB’s high availability: given a replication factor greater than 1, simply stop and start a data node container
to become familiar with benchmarking RonDB
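The high availability test relies on a simple rule: data remains available as long as every node group still has at least one running data node. The sketch below models that rule for cleanly stopped containers only; it ignores arbitration and network partitions, and it assumes that consecutive node IDs form a node group, as nodes 1 and 2 do in the sample. The function names are hypothetical.

```python
from typing import Iterable, List, Set


def form_node_groups(data_node_ids: Iterable[int],
                     replication_factor: int) -> List[List[int]]:
    """Split data node IDs into node groups of replication_factor nodes.

    Assumption: consecutive node IDs share a node group.
    """
    ids = sorted(data_node_ids)
    if replication_factor < 1 or len(ids) % replication_factor:
        raise ValueError("data node count must be a multiple of the replication factor")
    return [ids[i:i + replication_factor]
            for i in range(0, len(ids), replication_factor)]


def data_available(groups: List[List[int]], stopped: Set[int]) -> bool:
    """True while every node group still has at least one running node."""
    return all(any(node not in stopped for node in group) for group in groups)


if __name__ == "__main__":
    groups = form_node_groups([1, 2], replication_factor=2)
    print(groups, data_available(groups, stopped={1}))
    print(groups, data_available(groups, stopped={1, 2}))
```

In the sample cluster, this corresponds to running docker compose stop ndbd_1: queries through mysqld_1 should keep working because ndbd_2 holds a full replica, and docker compose start ndbd_1 brings the node back.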

For benchmarking, we have supplied all the files and commands needed to easily test RonDB with Sysbench or DBT2, and we plan to support YCSB in the future. Again, the directory structure is identical to the one on hopsworks.ai, so the RonDB documentation on benchmarking can be followed here as well. While benchmark performance will be far better on RonDB clusters with large, separate VMs per node, developers can now quickly become acquainted with benchmarking before paying for VMs.

For the RonDB team itself, this repository has become a building block to accelerate testing of both standalone and managed RonDB. In the coming iterations, we will use this repository to showcase new applications we have developed for the database, as well as to let users experiment with managed RonDB locally.

The flagship application we are currently working on is the REST API server, an alternative to the MySQL server that lets users perform batched key-value operations against RonDB. A managed RonDB Docker cluster, on the other hand, will let users evaluate online scaling, software upgrades and reconciliation locally for themselves. Reconciliation makes the cluster self-healing: it continuously strives towards a desired state, similar to how Kubernetes operates.
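Reconciliation in this sense follows the familiar control-loop pattern: observe the actual state, diff it against the desired state, act on the difference, and repeat until nothing is left to do. The sketch below shows that generic pattern with sets of running container names; it is not RonDB's reconciler, and the names plan, reconcile, observe and apply are hypothetical.

```python
import time
from typing import Callable, List, Set, Tuple

Action = Tuple[str, str]  # (verb, node name), e.g. ("start", "ndbd_2")


def plan(desired: Set[str], observed: Set[str]) -> List[Action]:
    """Diff the desired and observed sets of running nodes."""
    starts = [("start", node) for node in sorted(desired - observed)]
    stops = [("stop", node) for node in sorted(observed - desired)]
    return starts + stops


def reconcile(desired: Set[str],
              observe: Callable[[], Set[str]],
              apply: Callable[[Action], None],
              max_rounds: int = 10,
              interval_s: float = 0.0) -> bool:
    """Observe, diff and act until the observed state matches `desired`."""
    for _ in range(max_rounds):
        actions = plan(desired, observe())
        if not actions:
            return True  # converged on the desired state
        for action in actions:
            apply(action)
        time.sleep(interval_s)  # give nodes time to settle before re-observing
    return False


if __name__ == "__main__":
    # Simulated cluster: ndbd_2 has crashed and mysqld_2 is not wanted.
    running = {"mgmd_1", "ndbd_1", "mysqld_1", "mysqld_2"}

    def apply_to_sim(action: Action) -> None:
        verb, node = action
        print(verb, node)
        if verb == "start":
            running.add(node)
        else:
            running.discard(node)

    desired = {"mgmd_1", "ndbd_1", "ndbd_2", "mysqld_1"}
    print(reconcile(desired, lambda: set(running), apply_to_sim))
```

A real reconciler for RonDB would also need ordering constraints, for example never stopping all data nodes of one node group at the same time, which this sketch leaves out.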

To follow up on this blog post, we have created a quick demo, which shows how to use this repository to create a cluster and run benchmarks on it: