The era of continuous exponential increase of efficiency of computers (known more commonly as Moore’s law) is finally coming to an end. What will be the next driver of growth in computing? Experts forecast a new era, in which performance increases will come from more efficient designs, customized to the problem at hand. This is clearly visible in the plethora of domain-specific hardware devices that are proliferating (as an example, consider the chips optimized for Artificial Intelligence applications). Another important ingredient for efficiency will be domain-specific languages: programming languages that are customized for a specific problem, rather than general-purpose. The subject of this short write-up is an example of the latter, a new programming language named Differential Datalog.
Differential Datalog is based on an old programming language, Datalog, but enhanced with many modern features that are very important for programmer productivity (garbage-collection, strong typing, concurrent execution, interoperation with other programming languages). Datalog itself is a very simple, mathematically pure, and well-understood language; it was used mostly by theoreticians who reason about database systems.
Datalog programs compute on database tables (database people call these tables relations; other programming languages call them collections). Here is a table:
In a Differential Datalog this could be declared as
input relation Relationship(
person: string, relatedTo: string, relationship: string)
This is just a typical database table with three columns. We could also write some code that fills this table with data, but in general the set of tables is fixed, while the contents of the tables changes during the program execution:
Relationship(“Mike”, “John”, “Brother”).
Relationship(“Joan”, “John”, “Spouse”).
Relationship(“Frank”, “John”, “Child”).
These are known as “facts” in Datalog parlance.
All DDlog computations take some tables as inputs and produce other tables. For example, we could produce a table containing just the children:
output relation Children(
person: string, child: string)
Children(parent, child) :- Relationship(parent, child, “Child”).
The turnstyle Datalog symbol :- is read as “if”; this program says: parent and child are in the relation Children if there is a row in the table Relationship having parent in the column person, child in the column relatedTo, and the string “Child” in the column relationship.
When you run this program it will produce a Children table that has the following contents:
We can also define more complicated relations; for example a Descendant relation:
output relation Descendant(ancestor: string, descendant: string)
Descendant(ancestor, descendant) :- Child(ancestor, descendant).
Descendant(ancestor, descendant) :-
Child(ancestor, child), Descendant(child, descendant).
So you are someone’s descendant either if you are their child, or if you are the descendant of one of their children.
In a Datalog program you declare some output tables and write formulas describing how the contents of the output tables is computed based on the inputs, possibly via some intermediate tables. Then, given a collection of facts that tell us what is in the input tables, the program computes the contents of the output tables.
So, what is the Differential part in Differential Datalog? A Differential Datalog program is written in exactly the same way as a Datalog program, but it executes in a very different way: instead of simply computing the output tables from a set of input tables, it actually receives as input changes (or differences) in the input tables and it tells us what the changes are in the output tables. The following diagram illustrates this process:
Now we can explain why Differential Datalog, or DDlog for short, was built on top of Datalog: for tables the notion of “change” is clearly defined: a change consists in inserting and deleting rows in the input tables. If we were to use a more powerful programming language as a base (e.g. Python) we would have much more trouble defining what a “change” even is.
A DDlog program will start with a set of empty input and output tables; every time some change is made to input tables, it will notify the interested consumers of all changes made to the output tables. This works both for data being added or removed from an input table.
What is it good for?
Why would you use DDlog instead of a more traditional programming language with a lot of nice bells and whistles? It turns out that computing changes efficiently is in general quite difficult. As a challenge for the reader, we invite you to write a program that computes the changes in the Descendant relation in the programming language of your choice. We bet you that either your implementation will be slower, or significantly more complex than the 2-line program written in DDlog.
DDlog is a great tool when you care about processing a constant stream of changes efficiently, and when you can express your problem as a computation on tables. It turns out that there are many application domains where inputs keep changing; we have had a lot of success with DDlog in implementing control software for various distributed systems; in particular, we have rewritten some very complex network management applications, such as OVN – which is more than 14 thousand lines of DDlog.
DDlog is a compiled language; the DDlog compiler generates a Rust program, which can be compiled and executed natively on any platform that supports Rust. Since DDlog programs always interact with their environment (to receive and send changes), we provide support for interoperating with C, C++, Rust, Java and Go programs.
DDlog is being developed as an open-source project with a liberal MIT license. It is available for download at https://github.com/vmware/differential-datalog. It is still under very active development. The language features a powerful type system, a rich – and ever growing – collection of libraries, support for aggregate operations (like finding the largest value in a table), many tools for debugging, a compiler that translates SQL into DDlog, and many other features that make a language a tool for serious software engineering. You should consider using DDlog for your next project if you care about tracking a changing dataset. To learn more about the researchers working on this project please go to research.vmware.com.