An Architect's Guide to Data Modeling in Manufacturing in UNS-based Architectures
B) Data Modeling in the Unified Namespace: From Topic Hierarchies via Payload Schemas to MQTT/Kafka
In manufacturing, the Unified Namespace (UNS) is a powerful tool for facilitating communication between nodes in a network. This event-driven architecture operates on the principle that all data should be made available for consumption, regardless of whether there is an immediate consumer. This means that any node in the network can act as either a producer or a consumer, depending on the system's needs at a given time.
Any manufacturing company intending to implement a UNS-based architecture should follow a series of steps:
Step 1: Connecting to Operational Technology (OT)
The data utilized in manufacturing can be broadly categorized into three types: relational, time-series, and semi-structured or unstructured.
Time-series data can be dispatched through two primary methods:
- Transmitting at consistent intervals to the message broker. Because data arrives continuously, this method implicitly conveys details such as device uptime, but it can generate a substantial volume of data.
- The "report-by-exception" approach, where data is transmitted only when values change, effectively reducing the data volume. However, this method necessitates additional uptime information: it must be possible to tell whether a value simply didn't change or whether the device was offline (see the sketch below).
Converting relational data into events for transmission via the UNS is recommended. This entails subscribing to all changes in the existing relational tables (inserts, updates, deletes) and transmitting these changes to the UNS. Examples of such events are "addOrder," "changeOrder," and "deleteOrder."
SQL triggers or Change Data Capture (CDC) tools like Debezium can be utilized to generate data change events. In some cases, Programmable Logic Controllers (PLCs) or shop floor systems may directly emit these change events. This strategy enables event replay, allowing for the reproduction of the exact SQL table state.
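For example, an insert into an order table could be emitted as an "addOrder" event like the following; the field names and values are illustrative, not part of a fixed schema:

```json
{
  "event": "addOrder",
  "timestamp_ms": 1680698839098,
  "order-id": "ORD-1234",
  "product-type": "widget-A",
  "target-amount": 100
}
```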
Just like relational data, semi-structured / unstructured data should be transmitted when changes occur. Contrary to popular belief, there are no restrictions on sending images through MQTT. The MQTT standard allows up to 256 MB per payload, while Kafka's default is 1 MB, which can be extended to 10 MB without significant performance issues. This capacity is sufficient for most semi-structured or unstructured data.
If data size exceeds these limits, consider splitting larger data, like videos, into smaller segments to ensure compatibility. Alternatively, if large payloads are required, store the data in blob storage, such as a file system or an S3-compatible object store, and then process only a reference to the data, such as the file path, via Kafka.
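A reference message for a video stored in S3-compatible storage could look like this; the bucket, key, and content type are illustrative:

```json
{
  "timestamp_ms": 1680698839098,
  "bucket": "camera-recordings",
  "key": "plant1/line4/2023-04-05/clip-0042.mp4",
  "content-type": "video/mp4"
}
```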
Ensure that no data modification occurs between the OT producer and the message broker to maintain modularity. Some industries, like pharmaceuticals, even mandate this due to regulatory requirements, such as GxP compliance. Downsampling or altering the data before sending it to the UNS could create issues. For example, if someone wants to access the raw data for a different analysis or a new AI model, modification would render this impossible.
Step 2: Topic Hierarchy
The second step in the process requires the creation of a topic hierarchy. Ideally, this hierarchy should closely mirror the physical structure of the manufacturing plant or any existing asset naming system for two main reasons:
- Enhancing data point visibility and facilitating data browsing for OT engineers.
- Bolstering security. For instance, in multi-plant scenarios, a device's access can be restricted to specific plants or even more granular levels.
A large number of enterprises adopt the ISA-95 model and form their topic structures based on this standard. However, it's important to note the existence of Sparkplug-B, a standard that outlines potential payload and topic structures. Despite its benefits, Sparkplug-B focuses solely on "Device Management" and doesn't align with the ISA-95 model. Also, it relies on Protobuf, which, while straightforward for IT professionals, may complicate matters for those in OT. Given that most manufacturing devices are connected via Ethernet cables, the bandwidth savings provided by Sparkplug-B hardly justify the increased complexity.
Consider including the following elements in your topic structure for enhanced functionality:
- A version number to facilitate future changes.
- The client ID for granular access control, allowing a device to send data only to its own topics.
- The client ID for tracing and lineage purposes to pinpoint the exact source of each message.
- The tag name and, for larger quantities, tag groups, which can improve performance. For example, if a device sends large amounts of varied data, a specific microservice may only require a subset of it. Filtering by tag group spares consumers from processing messages they don't need.
Example from the United Manufacturing Hub
In the United Manufacturing Hub (UMH), we use an ISA-95-compliant topic structure:
umh/v1/enterprise/site/area/productionLine/workCell/originID/_usecase/tag
This structure observes certain rules. All topic names are case-insensitive and only permit the characters `a-z`, `A-Z`, `0-9`, `-`, and `_`. Characters such as `.`, `+`, `#`, or `/` are reserved by either MQTT or Kafka, and cannot be used.
The prefix `umh/v1` is obligatory and allows for versioning and future changes. The terms `enterprise`, `site`, `area`, `productionLine`, and `workCell` align with the ISA-95 model. The only mandatory term is `enterprise`; the rest are optional and can be omitted to model data outside the traditional ISA-95 model, e.g., a room temperature sensor for a specific area.
The `originID` signifies the data source. It could be a unique device ID like a serial number or a MAC address, or the name of a Docker container extracting information from an MES. Multiple origins in the ID should be separated with underscores. Examples of originIDs: `E588974`, `00-80-41-ae-fd-7e`, `VM241_nodered_mes`.
The `_usecase` field starts with an underscore and indicates the use case modelled here. The UMH ships with three default use cases (users can add as many as they want): `_historian`, `_analytics`, and `_local`. Messages sent to `_historian` and `_analytics` must follow the UMH payload schema and will be discarded if they do not. Messages in `_local` remain local to the message broker and are not parsed, checked against a schema, or processed by any default microservice. This approach is suitable when the data format of the messages can't be controlled.
The `tag` is optional but generally recommended. It can be used to specify a certain message type or to create topic groups.
Note regarding the underscore: because elements of the ISA-95 model can be omitted (such as `workCell` when modeling the entire production line), it can be difficult to detect whether the current part of the topic is a `workCell` or a use case. To reduce parsing complexity and increase resiliency, the use case always starts with an underscore.
Step 3: Payload Structure
This step involves determining the structure of your payload data, which is subject to your production environment's specific needs such as speed, compatibility, simplicity, and capacity to handle complex data structures. Payload data can be defined using various data formats, including JSON, Protobuf, XML, or any other suitable data format.
Binary formats like Protobuf or Avro offer a structured and compressed approach, saving transmission bandwidth. However, where direct readability is crucial, XML, JSON, or YAML structures are preferred.
We recommend JSON payloads, as bandwidth is rarely a concern in manufacturing. JSON makes it easier for OT professionals to work with and understand messages: you can directly open and interpret them in tools like MQTT Explorer or Node-RED. With binary formats like Protobuf, such direct inspection isn't possible.
Example from the United Manufacturing Hub
For the two use cases `_historian` and `_analytics`, the payload must follow the UMH payload schema. The schema in the UMH data model is influenced by the `_usecase` and the `tag`:
1. `_historian`
This is recommended when you want to use the Historian Feature of the UMH (https://umh.docs.umh.app/docs/features/historian/), which allows for reliable storage and analysis of your time-series data.
The payload is a JSON object with at least two keys:
- `timestamp_ms`: an int64 representing the Unix timestamp in milliseconds upon message creation.
- One additional key of type int64 or float64, e.g., `"temperature": 56.3`.
You can group tags together using three methods:
- Use underscores in the key name, like `spindle_axis_x`. This will show the tag `x` in the group `axis`, which is a sub-group of `spindle`.
- Use a `tag` inside the topic. It will be put before the key name. Multiple groups can be formed using the topic delimiter (`/` for MQTT and `.` for Kafka).
- Combine both methods. For example, sending a message to a topic called `.../_historian/spindle/axis` with the key name `x_pos` will store the tag `pos` in the group `x`, which is a sub-group of `axis` and `spindle`.
Example 1: if you send a message to the topic `umh/v1/dcc/aachen/shopfloor/wristband/warping/_historian/spindle/axis` with the following payload:
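(This payload is a sketch consistent with the result described next; the numeric tag values are illustrative.)

```json
{
  "timestamp_ms": 1680698839098,
  "x_pos": 12.5,
  "x_speed": 100,
  "y_pos": 7.2,
  "y_speed": 50
}
```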
You will save
- for the equipment `warping` in the `wristband` production line, located in the area `shopfloor` of the site `aachen` belonging to the enterprise `dcc`,
- four tags: two tags called `pos` and `speed` in the groups / sub-groups `spindle_axis_x`, as well as two tags with the same names in the groups / sub-groups `spindle_axis_y`,
- for the Unix timestamp `1680698839098`
into the database, from which it can then be retrieved via the API. More information about how it is stored and how it can be retrieved follows in the subsequent chapters.
Example 2: if you send a message to the topic `umh/v1/dcc/aachen/_historian` with the following payload:
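(Again a sketch; the temperature value is illustrative.)

```json
{
  "timestamp_ms": 1680698839098,
  "temperature": 23.5
}
```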
You will save
- for the site `aachen` belonging to the enterprise `dcc`,
- one tag called `temperature`,
- for the Unix timestamp `1680698839098`
into the database, from which it can then be retrieved via the API.
2. `_analytics`
The `_analytics` use case should be used when you wish to leverage the Analytics Feature of the UMH (https://umh.docs.umh.app/docs/features/analytics/). It can be used to create production dashboards with automatically calculated OEEs, drill-downs into stop reasons, order overviews, machine states, and more. It is recommended for certain time-series data as well as for most relational data.
The payload always comes in JSON format, but the schema varies based on the `tag` specified.
Jobs
According to ISA-95, a job is an order to produce a target amount of a certain product type (which has a target cycle time). A job can be added, started, and ended. Products, when produced, are linked to the job, so that one can later get an overview of the produced and scrapped pieces or amounts. In the process industry, this is also called a batch.
- tag: `job/add` (previously `addOrder`)
    - `job-id`
    - `product-type`
    - `target-amount`
- tag: `job/delete`
    - `job-id`
- tag: `job/start` (previously `startOrder`)
    - `job-id`
    - `timestamp-begin`
- tag: `job/end` (previously `endOrder`)
    - `job-id`
    - `timestamp-end`
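As a sketch, a `job/add` message could carry the fields above as plain JSON keys; the values are illustrative:

```json
{
  "job-id": "J-2023-0815",
  "product-type": "widget-A",
  "target-amount": 100
}
```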
Product type
- tag: `product-type/add` (previously `addProduct`)
    - `product-id`
    - `cycle-time-in-seconds`
Product
- tag: `product/add` (previously `count`)
    - `product-type-id`
        - The producing device usually does not know the product-type-id, as it is already specified in the job, but it is needed when inserting into the database.
    - `timestamp-end`
        - `timestamp-end` is the "primary key" identifying a product for a workCell; note: there can only be one product produced per millisecond.
    - (optional) `id`
    - (optional) `timestamp-begin`
    - (optional) `total-amount`
    - (optional) `scrap`
- tag: `product/overwrite` (previously `modifyProducedPieces`)
    - `timestamp-end`
    - (optional) `id`
    - (optional) `timestamp-begin`
    - (optional) `total-amount`
    - (optional) `scrap`
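A corresponding `product/add` sketch, again assuming the field names above map directly to JSON keys, with illustrative values:

```json
{
  "product-type-id": "widget-A",
  "timestamp-end": 1680698839098,
  "total-amount": 1,
  "scrap": 0
}
```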
Shifts
- tag: `shift/add` (previously `addShift`)
    - `timestamp-begin`
    - `timestamp-end`
- tag: `shift/delete` (previously `deleteShift`)
    - `timestamp-begin`
States
- tag: `state/add` (previously `state`)
    - `timestamp-begin`
    - `state` (see also our state list, which stays the same)
- tag: `state/overwrite` (previously `modifyState`)
    - `timestamp-begin`
    - `timestamp-end`
    - `state`
- tag: `state/activity` (previously `activity`)
    - `timestamp-begin`
    - `activity`
    - note: a microservice needs to automatically calculate the state from activity and detectedAnomaly
- tag: `state/reason` (previously `detectedAnomaly`)
    - `timestamp-begin`
    - `reason`
    - note: a microservice needs to automatically calculate the state from activity and detectedAnomaly
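For instance, a `state/add` message might look like the following; the numeric state code is an illustrative value from the state list:

```json
{
  "timestamp-begin": 1680698839098,
  "state": 10000
}
```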
Note: items related to the digital shadow (`uniqueProduct`, `scrapUniqueProduct`, `addParentToChild`, `productTag`, `productTagString`) are removed for now. Recommendation (`recommendation`) is also removed for now.
3. Other use cases
For other use cases, the payload should follow a JSON format containing the timestamp as Unix milliseconds under the key `timestamp_ms`.
Note: if a message does not comply with the topic and message schema, it will be discarded and logged. The mqtt-to-kafka bridge will output a warning message in its logs; kafka-to-postgresql will also output a warning and send the message to a separate "putback queue". More details on this behavior can be found in the documentation.
Step 4: Enforcement
The final step in this process is enforcing the established standards and conventions throughout your system. This consistency ensures system integrity. The enforcement involves outlining security requirements, managing state and data flow, setting boundaries, and confirming that each device has the necessary authorization to exchange information.
Currently, enforcing message and topic schemas is not a feature available by default in MQTT. However, efforts are underway to introduce this functionality. HiveMQ, an MQTT broker, is actively working on providing such features. For more details, you can refer to their documentation: https://docs.hivemq.com/hivemq/4.15/data-governance-hub/index.html.
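Until broker-side enforcement is widely available, schemas can be enforced at the edge: the producer validates each payload before publishing. A minimal Python sketch using the `jsonschema` and `paho-mqtt` packages; the schema mirrors the `_historian` rules above, and the broker and topic are assumptions:

```python
import json

import paho.mqtt.publish as publish               # pip install paho-mqtt
from jsonschema import ValidationError, validate  # pip install jsonschema

# Schema mirroring the _historian rules: timestamp_ms plus at least one numeric tag.
HISTORIAN_SCHEMA = {
    "type": "object",
    "properties": {"timestamp_ms": {"type": "integer"}},
    "required": ["timestamp_ms"],
    "additionalProperties": {"type": "number"},
    "minProperties": 2,
}

def publish_validated(topic: str, payload: dict) -> None:
    """Reject non-compliant payloads before they ever reach the broker."""
    try:
        validate(instance=payload, schema=HISTORIAN_SCHEMA)
    except ValidationError as err:
        print(f"discarded non-compliant message: {err.message}")
        return
    publish.single(topic, json.dumps(payload), hostname="localhost")

publish_validated(
    "umh/v1/acme/plant1/_historian",  # illustrative topic
    {"timestamp_ms": 1680698839098, "temperature": 56.3},
)
```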
For Kafka, this functionality is known as a 'schema registry'. It is not part of the Kafka protocol itself, but a separate, well-established component: producers validate each message against the registered schema before sending, so a message that doesn't comply with the established standard is rejected and never reaches the broker.
A few other MQTT features, such as Last Will and Testament or session state, are less relevant in manufacturing contexts. They are typically more applicable in use cases involving globally scattered devices, such as connected cars. We recommend focusing on the features that most directly apply to your specific manufacturing requirements to ensure the optimal setup for your system.