The United Manufacturing Hub as Open-Source Historian

In the last blog article “Historians vs Open-Source databases - which is better?” we discussed the differences between historians and Open-Source databases and concluded that traditional IT databases discourage OT engineers and traditional OT historians discourage IT engineers.

Our conclusion was that an optimal solution would be a system that meets the needs of the OT engineer, but can still be maintained by IT.

This is why we are happy to announce that we now enable this with the United Manufacturing Hub!

To this end, we have thoroughly redesigned our frontend plugin for Grafana to create a user-friendly interface:

[Video: demo of the redesigned Grafana frontend plugin, 0:58]

(Members of our Discord channel already got a sneak preview.)

As shown in the video above, we

adapted the data model to the ISA95 standard (enterprise --> site --> area --> production line --> work cell),
reorganized the available values: there is now a distinction between automatically calculated and user-defined tags, as well as between different KPIs and tables,
added time series functions, from automatic downsampling to gapfilling and statistical functions like Min, Max and Avg.
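
To make the time series functions more concrete, here is a minimal sketch in plain Python of what downsampling with gapfilling does conceptually. It is not the plugin's implementation; the function name `downsample`, its parameters and the sample readings are made up for illustration.

```python
from statistics import mean


def downsample(points, bucket_seconds, agg=mean, gapfill=False):
    """Group (unix_seconds, value) pairs into fixed-width buckets and aggregate them.

    Returns a list of (bucket_start, aggregated_value) sorted by bucket start.
    With gapfill=True, empty buckets between the first and the last bucket
    are included with the value None instead of being skipped.
    """
    buckets = {}
    for ts, value in points:
        start = ts - ts % bucket_seconds  # start of the bucket this point falls into
        buckets.setdefault(start, []).append(value)
    if not buckets:
        return []
    if gapfill:
        starts = range(min(buckets), max(buckets) + bucket_seconds, bucket_seconds)
    else:
        starts = sorted(buckets)
    return [(s, agg(buckets[s]) if s in buckets else None) for s in starts]


# Hypothetical sensor readings: (unix_seconds, value)
readings = [(0, 10.0), (20, 14.0), (65, 20.0), (190, 8.0)]
avg_per_minute = downsample(readings, 60, agg=mean, gapfill=True)
max_per_minute = downsample(readings, 60, agg=max)
```

With `gapfill=True`, the minute without any readings shows up as an explicit `None` entry, which is what lets a chart draw a visible gap instead of silently connecting the neighboring points.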

Are you interested in trying it out yourself? Sign up via our website contact form and become a beta tester!

Frequently asked questions

We received a lot of feedback on our last blog post, so in the following section we would like to answer the most common questions.

What about data compression and retention?

Most open source time series databases also have this feature. Just take a look at our tutorial on enabling data compression and data retention in TimescaleDB.

https://learn.umh.app/guides/umh/working-with-the-data/tutorial/data-compression-and-retention-in-timescaledb/
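
As a rough idea of what such a setup involves, the following sketch generates the SQL statements that TimescaleDB uses for compression and retention policies (`timescaledb.compress`, `add_compression_policy`, `add_retention_policy`). The table name, the segment-by column and the intervals are placeholders, not the United Manufacturing Hub's actual schema; the tutorial linked above remains the authoritative procedure.

```python
def policy_statements(table, segment_by, compress_after, drop_after):
    """Return SQL statements that enable compression and retention on a hypertable.

    The values are interpolated directly into the SQL text, so only pass
    trusted, hard-coded names and intervals (this is a sketch, not a query builder).
    """
    return [
        # Enable native compression, grouping compressed rows by one column.
        f"ALTER TABLE {table} SET (timescaledb.compress, "
        f"timescaledb.compress_segmentby = '{segment_by}');",
        # Compress chunks once they are older than compress_after.
        f"SELECT add_compression_policy('{table}', INTERVAL '{compress_after}');",
        # Drop chunks once they are older than drop_after.
        f"SELECT add_retention_policy('{table}', INTERVAL '{drop_after}');",
    ]


# Hypothetical table and column names, chosen only for this example.
statements = policy_statements("sensor_data", "asset_id", "7 days", "90 days")
for statement in statements:
    print(statement)
```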

What about performance? Relational databases are very inefficient

Yes, plain relational databases can be inefficient for time series workloads. But TimescaleDB is not a plain relational database. It is an extension for time series data on top of (the relational) PostgreSQL, and this combination makes querying and inserting time series data very efficient.

What about support? My company only deals with large players and we cannot rely on random people on the internet!

TimescaleDB is currently (2022-08-26) funded with 181.1M USD. Their entire business model relies on providing a reliable database as a service.

By the way, for years Microsoft used exactly the argument that Open-Source is not worthy, insecure and you cannot get support for it.

However, the IT market has taken a clear stance here and the entire IT server landscape is now dominated by Open-Source tools like Linux, Docker and Kubernetes.

Microsoft realized that it had lost the battle against Open-Source. For this reason, it bought GitHub (the world's largest open source community) four years ago for $7.5 billion and is now using it to actively promote Open-Source.

What about data modeling?

Data modeling in the United Manufacturing Hub is done directly in the Unified Namespace instead of at the database level. This allows working with contextualized data in real time, without having to wait until it is available in the database.
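
The idea can be illustrated with a small sketch: a value is given its ISA95 context at the moment it is published, so any consumer can interpret it without a database lookup. The topic layout, the payload fields and all names below are assumptions for illustration only and do not describe the actual topic structure of the United Manufacturing Hub.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Isa95Location:
    """Position of a data source in the ISA95 hierarchy."""
    enterprise: str
    site: str
    area: str
    production_line: str
    work_cell: str

    def topic(self, tag):
        """Build a hierarchical topic string (hypothetical layout)."""
        return "/".join([self.enterprise, self.site, self.area,
                         self.production_line, self.work_cell, tag])


def contextualize(location, tag, value, timestamp_ms):
    """Return (topic, JSON payload) for one reading, ready to be published."""
    payload = json.dumps({"timestamp_ms": timestamp_ms, "value": value})
    return location.topic(tag), payload


# Hypothetical asset and reading.
oven = Isa95Location("acme", "cologne", "baking", "line1", "oven3")
topic, payload = contextualize(oven, "temperature", 181.5, 1661500800000)
print(topic)    # acme/cologne/baking/line1/oven3/temperature
print(payload)
```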

What about connectors?

The United Manufacturing Hub offers a variety of connectors for newer and older industrial equipment and systems, from PLCs, sensors and cameras to ERP/MES systems.

I am still skeptical.

You should be! Especially in the area of Industrial IoT and Industry 4.0.

However, if you take a look at our architecture and the decisions that underlie it, you'll find that our approaches to data processing are actually more reliable than those of a traditional historian.

For example, take a look at our article on "Tools & Techniques for scalable data processing in Industrial IoT".

Or look at the extent to which traditional IT databases like PostgreSQL (and TimescaleDB) address the issues of reliability, scalability and maintainability:

Relational databases in particular rely heavily on ACID compliance, which guarantees that a committed transaction is not lost, even after a crash.
There is write-ahead logging (WAL), a family of techniques to ensure two of the four major points of ACID compliance (atomicity and durability).
There is much scientific literature on replicating and partitioning databases to improve scalability and reliability (Web of Science returns 192 papers mentioning the three words "database", "partitioning", and "replication").
More general information about these techniques can be found in the book "Designing data-intensive applications" by Martin Kleppmann.
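
To give an intuition for write-ahead logging, here is a toy sketch in Python: every change is appended to a log file and forced to disk before it is applied in memory, so the state can be rebuilt by replaying the log after a crash. The class `TinyStore` is invented for this example and covers only the durability side; real implementations such as the one in PostgreSQL add checksums, checkpoints, transaction boundaries and log truncation.

```python
import json
import os


class TinyStore:
    """Toy key-value store that logs every write to disk before applying it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()
        self._log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        """Rebuild the in-memory state from the log after a restart or crash."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]

    def put(self, key, value):
        # 1. Append the intended change to the log and force it to disk.
        self._log.write(json.dumps({"key": key, "value": value}) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        # 2. Only then apply the change to the in-memory state.
        self.data[key] = value

    def close(self):
        self._log.close()
```

If the process dies after step 1 but before step 2, the change is still in the log and reappears when a new `TinyStore` is opened on the same file.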

What is missing?

We are currently working on an integration with Grafana Alerting so that you can define thresholds for tags or automatically calculated KPIs, list the resulting alerts and get notified via mail, SMS or PagerDuty.

In addition, we have not found a user-friendly AND reliable way to set up continuous data queries, e.g., add two different tags and derive a third "calculated" tag in real-time.
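
The following sketch shows what such a calculated tag means in practice: it keeps the latest value of each input tag and emits a derived value whenever one of the inputs changes. It only illustrates the logic, not a Node-RED flow or a Benthos pipeline, and the class and tag names are made up.

```python
class CalculatedTag:
    """Derive a new tag from the latest values of several input tags."""

    def __init__(self, name, inputs, combine):
        self.name = name
        self.inputs = tuple(inputs)
        self.combine = combine
        self.latest = {}

    def on_message(self, tag, value):
        """Feed one incoming value; return (name, result) once all inputs are known."""
        if tag not in self.inputs:
            return None  # unrelated tag, ignore it
        self.latest[tag] = value
        if len(self.latest) < len(self.inputs):
            return None  # still waiting for the first value of another input
        return self.name, self.combine(*(self.latest[t] for t in self.inputs))


# Hypothetical tags: the total power is the sum of two measured values.
total_power = CalculatedTag("power_total", ["power_a", "power_b"], lambda a, b: a + b)

stream = [("power_a", 1.5), ("speed", 40), ("power_b", 2.0), ("power_a", 3.0)]
derived = []
for tag, value in stream:
    result = total_power.on_message(tag, value)
    if result is not None:
        derived.append(result)
```

The logic itself is simple. The open question described here is where to run it so that it is both easy to configure and reliable at scale.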

We could rely on SQL queries, but that doesn't allow us to send the data back to the Unified Namespace. There is Node-RED, which is a very common tool in Industrial IoT. It offers a great user experience and is intuitive to use, but it has some scaling issues. Then there are stream processing tools like Benthos, which are very reliable, maintainable and scalable, but nowhere near as user-friendly as Node-RED.

For now, we recommend starting with Node-RED and then scaling up with Benthos once you know what you want.

We believe we have developed a full-featured and Open-Source historian.

Do you want to test it out? Become a beta tester by simply signing up via our website contact form!

Tell us what you think is still missing by commenting under the LinkedIn post or in our Discord channel.
