Domino Gambrel – An Introduction
Data science has matured over the last decade. As more organizations invest in innovative analytical capabilities, data analysts increasingly borrow best practices from software engineering (e.g., version control) to make their own workflows more manageable. Unfortunately, even with tools that make complex workflows easier to handle, the final results can sometimes be disastrous. This article presents Domino, an open-source tool that was developed from the ground up to support modern data analysis workflows.
In its most basic form, Domino is an interactive web-based environment for managing big data science projects. Domino allows users to define workflow processes in a programming language that closely resembles Java or R (the latter being a language widely used by data scientists themselves). The underlying strength of Domino is that its design facilitates the development of reusable workflows: if you need to run the same analysis twice, or share some functionality between two different experiments, you can. Each individual experiment can serve as a base case and be plugged into other workflows, so earlier work keeps paying off. By contrast, traditional database management platforms are inflexible and usually require the developer to rewrite functions multiple times, making them harder to reuse in subsequent projects.
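The idea of a reusable workflow can be sketched in plain Python. This is only an illustration of the concept, not Domino's own API or workflow language: every name below (`make_workflow`, `drop_missing`, `clip_negative`, `scale_to_unit`) is hypothetical and chosen for the example. A shared base workflow is defined once and then plugged into two different experiments.

```python
# Illustrative sketch only: these names are hypothetical and are NOT part of Domino.

def drop_missing(values):
    """Remove missing (None) entries."""
    return [v for v in values if v is not None]


def clip_negative(values):
    """Replace negative numbers with zero."""
    return [max(v, 0) for v in values]


def scale_to_unit(values):
    """Divide every value by the maximum; leave the data unchanged if the maximum is 0."""
    hi = max(values, default=0)
    return [v / hi for v in values] if hi else list(values)


def make_workflow(*steps):
    """Chain steps into one callable; a workflow can itself be used as a step."""
    def run(values):
        for step in steps:
            values = step(values)
        return values
    return run


# The shared base case: cleaning that both experiments need.
base = make_workflow(drop_missing, clip_negative)

# Two experiments reuse the same base without rewriting it.
experiment_a = make_workflow(base, scale_to_unit)
experiment_b = make_workflow(base, sorted)

raw = [4, None, -2, 8]
result_a = experiment_a(raw)
result_b = experiment_b(raw)
print(result_a)
print(result_b)
```

Because a workflow is just another step, the cleaning logic lives in one place and a fix to it reaches every experiment built on top of it.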
Domino has several distinct advantages over traditional platforms for handling big data analysis workflows, all of which contribute to its growing popularity among developers. First, because its programming model allows for reuse, ideas and improvements from existing projects can be turned into stable modules. This is not just a semantic issue: by allowing developers to reuse code, Domino also reduces the time needed to adapt to a new statistical analysis platform. Indeed, as new and experimental ideas become available, you need to change far less of your own code.
Next, little or no configuration is required to get started. Unlike most data analysis platforms, Domino comes with its own developer console and a number of demos for trying out its various features. Domino doesn’t require any special environment, so even those without extensive software experience can start experimenting with the suite immediately. The console lets users view plots and other data visualizations, and offers useful features such as the ability to export results as tables, graphs, and heat maps.
Finally, perhaps the biggest advantage of using a good data science platform is that it provides support for a wide range of scientific procedures. Many data science workflows share a number of generic steps, and to use them on any platform you usually need to write generic code that will run on any supported machine. Domino addresses this problem because the toolkit includes support for C++, Go, Rust, Python, R, and dozens of other languages and workflows.
In summary, there are many advantages to using a data science tool like Domino. In fact, they were the main reason I started working with Domino almost two years ago. However, as with any technology, there are disadvantages too. One of the biggest is that the performance of your code can change dramatically when you switch between different kinds of machines, such as a laptop and more powerful hardware. If your projects need to run on a number of machines, this can become a significant drawback.