New OPC-UA Connection released

David_A, RM Research
edited November 2021 in Help


The OPC Unified Architecture (OPC-UA) is a widely used protocol for machine-to-machine communication. It connects industrial equipment and IoT (Internet of Things) devices with data-collecting servers and can give deep insights into shop-floor-level data. IoT devices can be connected to a variety of equipment, such as temperature and pressure sensors in a chemical plant tank or conveyor belt speed sensors in a packaging plant. The new RapidMiner OPC-UA Connector Extension, released together with RapidMiner Studio 9.10, allows analysts and data scientists to tap into this vast pool of industrial plant data. The extension provides capabilities to create and manage connections to an OPC-UA server, as well as new operators that help engineers discover and integrate useful data sources into a RapidMiner process.

Technical Overview

The OPC-UA extension uses the open-source OPC-UA stack from the Eclipse Foundation's Milo project to establish a connection to an OPC-UA server. The server endpoint URL is stored in a new connection object, and the connection can be tested to make sure that the local Studio client can reach the server.
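A basic precondition for a successful connection test is that the server's host and port are reachable from the client machine. The Python sketch below checks only that: it opens and closes a plain TCP connection. It exchanges no OPC-UA messages, so a positive result does not guarantee that an OPC-UA session can be created. The function name `tcp_reachable` is hypothetical and is not part of the extension.

```python
import socket


def tcp_reachable(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This is a network reachability check only: no OPC-UA handshake or
    security negotiation takes place.
    """
    try:
        # create_connection resolves the host and connects; the context
        # manager closes the socket again immediately.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, timed out, unresolvable host, ...
        return False
```

For the public demo server used later in this post, the call would be `tcp_reachable("opcuademo.sterfive.com", 26543)`.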

The Browse Nodes operator crawls through all connected nodes on the server and returns a list of them, including their data types and, if selected, a sample value. There are a few things to consider:
  • The node structure is quite complex: it is hierarchical, and the same node can be referenced from multiple places. On large servers, crawling can therefore take a long time and return duplicate entries.
  • Not all nodes are human-interpretable; many of them, especially in namespace 0, are meant only for diagnostics and internal settings.
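Because the same node can be reached through several references, a crawler has to remember which node IDs it has already visited, otherwise it reports duplicates or loops forever on cyclic references. The Python sketch below shows this idea as a breadth-first traversal over a hypothetical in-memory reference map (`references`) that stands in for the server's browse service. It is not the extension's actual implementation, and the node IDs in the usage example are made up.

```python
from collections import deque
from typing import Dict, List, Optional


def browse(references: Dict[str, List[str]], root: str,
           max_nodes: Optional[int] = None) -> List[str]:
    """Breadth-first crawl from `root`, returning each reachable node ID once.

    `references` maps a node ID to the node IDs it references. A node that is
    reachable via several paths is only reported the first time it is seen.
    """
    seen = {root}
    order = [root]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in references.get(node, []):
            if child in seen:
                continue  # already reached via another reference: skip duplicate
            if max_nodes is not None and len(order) >= max_nodes:
                return order  # optional early stop for very large servers
            seen.add(child)
            order.append(child)
            queue.append(child)
    return order
```

With `refs = {"ns=1;s=Tank": ["ns=1;s=Pressure", "ns=1;s=Temperature"], "ns=1;s=Pressure": ["ns=1;s=Temperature"], "ns=1;s=Temperature": ["ns=1;s=Tank"]}`, `browse(refs, "ns=1;s=Tank")` lists each of the three nodes exactly once, even though `Temperature` is referenced twice and the references form a cycle.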

The Read OPC-UA operator connects to a specific node and collects new incoming data. It requests values at a specified time interval for a specified duration. Note: the operator waits until the duration has elapsed before returning any results. We therefore recommend running this operator frequently with short durations (for example, with an AI Hub scheduler) rather than waiting a long time for one large result.
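The relationship between interval, duration, and number of readings can be sketched as a polling loop: with interval `interval_s` and duration `duration_s`, about `duration_s / interval_s` readings are taken, and nothing is returned before the duration ends. For example, 20 measurements over 10 seconds correspond to one reading every 0.5 seconds. The Python sketch below assumes a hypothetical `read_value` callable standing in for a single OPC-UA read; it illustrates the timing only and is not the operator's internal implementation.

```python
import time
from typing import Callable, List, Tuple


def poll(read_value: Callable[[], float], interval_s: float, duration_s: float,
         clock: Callable[[], float] = time.monotonic,
         sleep: Callable[[float], None] = time.sleep) -> List[Tuple[float, float]]:
    """Sample `read_value` every `interval_s` seconds for `duration_s` seconds.

    Returns (elapsed seconds, value) pairs, but only after the whole
    duration has passed, like the behaviour described above.
    """
    # Small epsilon guards against float rounding, e.g. 0.3 / 0.1 = 2.999...
    n_samples = int(duration_s / interval_s + 1e-9)
    samples: List[Tuple[float, float]] = []
    start = clock()
    for i in range(n_samples):
        # Sleep until the i-th scheduled time so delays do not accumulate.
        delay = start + i * interval_s - clock()
        if delay > 0:
            sleep(delay)
        samples.append((clock() - start, read_value()))
    # Wait out the rest of the duration before handing back the results.
    remaining = start + duration_s - clock()
    if remaining > 0:
        sleep(remaining)
    return samples
```

Passing `clock` and `sleep` as parameters keeps the timing logic testable with a simulated clock instead of real waiting.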

The Read OPC-UA History operator retrieves stored historical events from a node. Given a specific time window, the operator collects up to the specified number of data points. Note: OPC-UA reads data in reverse chronological order, so it goes from the Start Date ‘backwards’ to the End Date. If you are retrieving high-frequency or slowly changing data, we recommend skipping some values, for example reading only every other data point. To make sure you get all values for a specific period, simply supply a large enough number of data points; the operator stops when no more data is available.
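The reading order, value skipping, and early stop described above can be mimicked on an in-memory list of timestamped values. The Python sketch below uses hypothetical names (`read_history`, `stride`, `max_points`) and a local list instead of a real server call; it only illustrates the semantics and is not the operator's code.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Record = Tuple[datetime, float]


def read_history(records: List[Record], start: datetime, end: datetime,
                 max_points: int, stride: int = 1) -> List[Record]:
    """Return up to `max_points` values between `end` and `start`, newest first.

    `start` is the later timestamp and `end` the earlier one, matching the
    'backwards' reading order. `stride=2` keeps every other value.
    """
    if end > start:
        raise ValueError("end must not be later than start")
    # Keep only values inside the window, ordered newest -> oldest.
    window = sorted((r for r in records if end <= r[0] <= start),
                    key=lambda r: r[0], reverse=True)
    # Skip values, then stop at max_points or when the window is exhausted.
    return window[::stride][:max_points]
```

If the raw values arrive every 0.2 seconds (an assumed sampling rate), `stride=10` returns one value every 2 seconds, which matches the resolution arithmetic in the demonstration below. A generous `max_points` is harmless because the slice simply ends when the window runs out of data.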
Note: not all nodes store historical data; this capability is indicated by the "HistoryRead" property. If you try to read stored data from a node without this property, you will see an error message like this:

Figure 1: Error message when trying to read historic data from a node that does not store data.

Practical Demonstration

Let us look at a practical example of how to work with the OPC-UA Connector.
First, we create a connection to a publicly available demo server:


and then check if the connection can be established:

Figure 2: Successful connection test to a public OPC-UA server

We could start by scanning for all available nodes, but instead we take a shortcut and focus only on the node in namespace 1 with node ID 1184. This is a demo asset of the server: a pressure tank with some sensors.
The output of the Browse Nodes operator for this node (ns=1;i=1184) looks like this:

Figure 3: Result of Browse Nodes, showing all sub-nodes that are linked to that asset.

For us, the most interesting attribute is Pressure (ns=1;i=1185), which provides a numerical value of the current pressure in the tank.
We now use the Read OPC-UA operator to start collecting new incoming sensor readings and use them right away in RapidMiner. We configure the operator to collect 20 measurements for a duration of 10 seconds:

Figure 4: Live data values of a pressure sensor.

Now that we have some understanding of the data, we want to analyze more data points without waiting for new data to arrive. Using the Read OPC-UA History operator, we can collect stored data from past events. Again, not all servers and nodes support this feature, but in this example the pressure node has historical data for the past several minutes (remember that this is a public server that stores events only for a short time; in live systems, data can reach back months or years).
We select the current time (at the time of writing) as the start time and a time a few hours earlier as the end time. We also choose to retrieve only every 10th measurement, which still gives us a 2-second time resolution:

Figure 5: Collected historical data of the same pressure sensor for a longer period.

With historical data, we can now, for example, build an anomaly detection model that compares new events to past data and checks whether they fall within the expected distribution. We train a univariate outlier detection model (from the Operator Toolbox extension) and store it in a RapidMiner repository for the next step, deployment, to see whether new sensor values behave differently from previously observed values.
So now we collect new data within a short duration, say 10 seconds, and calculate their respective outlier scores:

Figure 6: Newly collected live data with outlier scores, based on the model built on the historical data.

We see that the pressure values during this period are considered normal. However, we could calculate a maximum acceptable outlier score to use as a threshold and then monitor the scores in real time, raising an alarm if a score goes above it. To accomplish this, we could place the operators inside a loop and run the process locally, or use AI Hub for a more scalable and reliable solution. We could schedule the scoring process to run regularly or deploy it on a Real-Time Scoring Agent (RTSA) for low-latency, on-demand scoring.
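The train, score, and alarm flow above can be sketched with a simple z-score model. It fits the mean and standard deviation on historical values, scores each new value by its absolute z-score, and uses the largest score seen in the historical data as the alarm threshold. This technique is swapped in for illustration only: the univariate outlier detection operator from the Operator Toolbox may compute its scores differently, and the class and method names below are hypothetical.

```python
import statistics
from typing import List, Sequence


class ZScoreModel:
    """Univariate outlier model: score = |x - mean| / stdev of training data."""

    def __init__(self, history: Sequence[float]):
        self.mean = statistics.fmean(history)
        self.stdev = statistics.pstdev(history)
        if self.stdev == 0:
            raise ValueError("history must contain more than one distinct value")
        # Threshold = largest score seen in the historical (assumed normal) data.
        self.threshold = max(self.score(x) for x in history)

    def score(self, x: float) -> float:
        return abs(x - self.mean) / self.stdev

    def alarms(self, new_values: Sequence[float]) -> List[int]:
        """Return indices of new values whose score exceeds the threshold."""
        return [i for i, x in enumerate(new_values) if self.score(x) > self.threshold]
```

In a monitoring loop, the model would be fitted once on the history and `alarms` called on each freshly polled batch of values.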


OPC-UA Server
The instance that manages the OPC-UA infrastructure. It hosts all management components and stores the data collected from associated assets. The exact architecture and available features can vary, especially because many commercial vendors sell their own proprietary versions.
Endpoint URL
URL pointing to a specific OPC-UA server. URLs typically start with the protocol "opc.tcp://", followed by an IP address or host name, a port number, and optionally further path identifiers, for example:
    • opc.tcp://localhost:26543/UA/MyLittleServer
    • opc.tcp://opcuademo.sterfive.com:26543
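The parts of an endpoint URL can be separated with Python's standard `urllib.parse.urlsplit`, which accepts the `opc.tcp` scheme like any other. In the sketch below, the helper name `parse_endpoint` and the fallback to port 4840 (the IANA-registered port for OPC UA) when a URL omits the port are assumptions of this example, not RapidMiner behaviour.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class Endpoint(NamedTuple):
    host: str
    port: int
    path: str


def parse_endpoint(url: str, default_port: int = 4840) -> Endpoint:
    """Split an opc.tcp endpoint URL into host, port and path."""
    parts = urlsplit(url)
    if parts.scheme != "opc.tcp":
        raise ValueError(f"expected an opc.tcp:// URL, got {url!r}")
    if not parts.hostname:
        raise ValueError(f"missing host in {url!r}")
    # parts.port is None when the URL has no explicit port.
    return Endpoint(parts.hostname, parts.port or default_port, parts.path)
```

Applied to the two examples above, this yields `Endpoint(host='localhost', port=26543, path='/UA/MyLittleServer')` and `Endpoint(host='opcuademo.sterfive.com', port=26543, path='')`.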

Namespace ID
Integer index of a top-level namespace on a server.
Namespace 0 is typically used for the system level of the server (status, diagnostics, references), whereas namespaces 1, 2, ... are user-defined namespaces for registering assets such as machines.

Node ID
The Node ID uniquely identifies a node, such as an asset, within a namespace.
Node IDs can have the following types:
  • Numeric
  • String
  • GUID
  • Byte String
Numeric and string types are used most often.

Node ID Naming Schema
The common naming schema for Node IDs is the following:
ns=0;i=1158 or ns=1;s=VesselPressure
The first part is the namespace prefix "ns=", followed by the integer index of the namespace. The second part is the Node ID, preceded by its type prefix:
  • i = numeric
  • s = string
  • g = GUID
  • b = byte string
In RapidMiner, you can either enter the namespace and node ID as separate integer values or enter the full ID directly as a string.
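The naming schema above can be parsed with a short regular expression. The Python sketch below splits a node ID string into namespace index, identifier type, and identifier, flags namespace 0 as the system namespace, and builds the string form from separate integer inputs. The function and class names are hypothetical, and the sketch only accepts strings that include the explicit `ns=` part shown in the schema.

```python
import re
from typing import NamedTuple

# Type prefixes from the naming schema above.
ID_TYPES = {"i": "numeric", "s": "string", "g": "GUID", "b": "byte string"}
NODE_ID_RE = re.compile(r"^ns=(\d+);([isgb])=(.+)$")


class NodeId(NamedTuple):
    namespace: int
    id_type: str
    identifier: object  # int for numeric IDs, str otherwise

    @property
    def is_system(self) -> bool:
        # Namespace 0 typically holds server status, diagnostics, references.
        return self.namespace == 0


def parse_node_id(text: str) -> NodeId:
    """Parse strings such as 'ns=1;i=1185' or 'ns=1;s=VesselPressure'."""
    m = NODE_ID_RE.match(text.strip())
    if m is None:
        raise ValueError(f"expected ns=<int>;<i|s|g|b>=<value>, got {text!r}")
    ns, prefix, value = m.groups()
    identifier = int(value) if prefix == "i" else value
    return NodeId(int(ns), ID_TYPES[prefix], identifier)


def format_node_id(namespace: int, numeric_id: int) -> str:
    """Build the full string form from separate integer inputs."""
    return f"ns={namespace};i={numeric_id}"
```

For example, `parse_node_id("ns=1;i=1185")` gives `NodeId(namespace=1, id_type='numeric', identifier=1185)`, and `format_node_id(1, 1184)` gives `'ns=1;i=1184'`.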


  • David_A
    edited August 2021
    Quick Update:

    The data structure of the demo server seems to have changed.

    Just update the node IDs to these two new ones:
    • Pressure Vessel: ns=1;i=1184
    • Pressure: ns=1;i=1185

  • David_A
    New version released

    Read History now supports continuation points to retrieve larger amounts of stored data.
    From the user's perspective nothing changes, but results are no longer limited by the server's per-request response limit.
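Continuation points work much like pagination. Each history-read response carries at most a limited number of values plus an opaque continuation point, and the client repeats the request with that continuation point until none is returned. The Python sketch below shows this loop against a hypothetical paged read function (`make_fake_server`). It is not the Milo or RapidMiner API, and the byte-string encoding of the continuation point is an arbitrary choice for the example.

```python
from typing import Callable, List, Optional, Tuple

# A read call takes an optional continuation point and returns
# (values for this page, next continuation point or None when finished).
ReadPage = Callable[[Optional[bytes]], Tuple[List[float], Optional[bytes]]]


def read_all_history(read_page: ReadPage, max_pages: int = 10_000) -> List[float]:
    """Keep requesting pages until no continuation point is returned."""
    values: List[float] = []
    continuation: Optional[bytes] = None
    for _ in range(max_pages):  # guard against a server that never finishes
        page, continuation = read_page(continuation)
        values.extend(page)
        if not continuation:  # None or empty bytes both mean "done"
            return values
    raise RuntimeError("history read did not finish within max_pages requests")


def make_fake_server(data: List[float], page_size: int) -> ReadPage:
    """Hypothetical server returning at most `page_size` values per request."""
    def read_page(continuation: Optional[bytes]) -> Tuple[List[float], Optional[bytes]]:
        offset = int(continuation.decode()) if continuation else 0
        end = offset + page_size
        nxt = str(end).encode() if end < len(data) else None
        return data[offset:end], nxt
    return read_page
```

With a per-request limit of 10 values, `read_all_history(make_fake_server(data, 10))` still returns all 25 values of a 25-element `data` list, in three requests.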