Tech Deep Dives

Differential Datalog: A New Programming Language for Computing Changes

As the pace of growth in computing efficiency slows and Moore’s law loses relevance, experts are forecasting a new era, where increases in performance emerge from more efficient designs that have been tailored to specific solutions. This trend has already become evident in the plethora of domain-specific hardware devices appearing on the market (such as the chips optimized for artificial-intelligence applications).

Another important ingredient for efficiency will be domain-specific languages: programming languages that have been customized for specific purposes. This blog post details one such language — Differential Datalog (DDlog). DDlog was born out of a VMware Research project to simplify the implementation of control planes for virtual networks. The project is led by senior researcher Leonid Ryzhyk in the VMware Office of the CTO. He, along with several other VMware researchers and employees (including myself), are contributing, as are members of the open-source community.

DDlog is an enhanced version of Datalog, an old programming language. It has many modern features that are very important for programmer productivity, including garbage collection, strong typing, concurrent execution, and interoperation with other programming languages. Datalog is a very simple, mathematically pure, and well-understood language that was once popular with theoreticians who reasoned about database systems.

DDlog is a compiled language. The DDlog compiler generates a Rust program, which can be compiled and executed natively on any platform that supports Rust. Since DDlog programs always interact with their environments (to receive and send changes), we provide support for interoperating with C, C++, Rust, Java, and Go programs.

Nuts and Bolts of DDlog

Datalog programs compute on database tables (database people call these tables “relations,” whereas other programming languages call them “collections”).

The following is a table:

Person Related to Relationship
Mike John Brother
Joan John Spouse
Frank John Child

 

In DDlog, this could be declared as:

input relation Relationship(
person: string, relatedTo: string, relationship: string)

The above is a typical database table with three columns. We could also write some code that fills this table with data but in general, the set of tables is fixed, while the contents of the tables changes during the program execution:

Relationship("Mike", "John", "Brother").
Relationshipthey("Joan", "John", "Spouse").
Relationship("Frank", "John", "Child").

These are known as “facts” in Datalog parlance.

All DDlog computations take some tables as inputs and produce other tables. For example, we could produce a table containing just the children:

output relation Children(
person: string, child: string)
Children(parent, child) :- Relationship(parent, child, "Child").

The turnstyle Datalog symbol :- is read as “if.” So, this program says: parent and child are in the relation Children if there is a row in the table Relationship having parent in the column person, child in the column relatedTo, and the string “Child” in the column relationship.

When you run this program, it will produce a Children table that has the following contents:

Children("Frank", "John").

We can also define more complicated relations, such as a Descendant relation:

output relation Descendant(ancestor: string, descendant: string)

Descendant(ancestor, descendant) :- Child(ancestor, descendant).
Descendant(ancestor, descendant) :-
Child(ancestor, child), Descendant(child, descendant).

You are someone’s descendant if you are their child or if you are the descendant of one of their children.

In a Datalog program, you declare some output tables and write formulas describing how the contents of the output tables are computed based on the inputs — possibly via some intermediate tables. Then, given a collection of facts that tell us what is in the input tables, the program computes the contents of the output tables.

What’s the Difference?

A DDlog program is written in exactly the same way as a Datalog program, but it executes in a very different way. Instead of simply computing the output tables from a set of input tables, DDlog receives changes (differences) as input in the input tables and shows us the changes in the output tables.

The following diagram illustrates this process:

This explains why DDlog was built on top of Datalog: for tables, the notion of “change” is clearly defined. It means inserting and deleting rows in the input tables. In contrast, it would be impossible to define a change with a more powerful programming language (such as Python).

A DDlog program will start with a set of empty input and output tables. Every time a change is made to input tables, it will notify the interested consumers of all changes made to the output tables. This works both for data being added or removed from an input table.

What is it good for?

Why would you use DDlog instead of a more traditional programming language with a lot of nice bells and whistles? The short answer is that efficiently computing changes is more difficult than it sounds. As a challenge for the technical reader, I invite you to write a program that computes the changes in the Descendant relation in the programming language of your choice. Chances are that either your implementation will be slower or significantly more complex than our two-line DDlog program.

DDlog is a great tool when you care about efficiently processing a constant stream of changes and when you can express your problem as a computation on tables. It turns out that there are many application domains where inputs keep changing. We have had a lot of success with DDlog in implementing control software for various distributed systems. In particular, we have rewritten some very complex network management applications, such as OVN — which is more than 14,000 lines— in DDlog.

Open-Source Availability

DDlog is being developed as an open-source project with a liberal MIT license. It is available for download at https://github.com/vmware/differential-datalog. It is still under very active development. The language features a powerful type system, a rich and ever-growing collection of libraries, support for aggregate operations (like finding the largest value in a table), many tools for debugging, a compiler that translates SQL into DDlog, and various other features that make this language a useful tool for serious software engineering. We invite you to consider using DDlog for your next project if you care about tracking a changing dataset.