Summary
Data contextualization is key to understanding and preventing the impact of bad factory floor data on downstream applications.
When my IT colleagues talk about data lineage, they are trying to understand the upstream and downstream connections of a particular dataset and who is affected by that data. They want to understand the origins of their data and the transformations it undergoes to reach its desired destination, such as an enterprise data lake. Unfortunately, this can be a difficult task for those of us working with industrial data.
Factory floor data is diverse. Most sites generate telemetry data from machines and sensors, as well as transactional, time series, historical, and file data. These diverse data streams challenge manufacturers to not only extract meaningful context across the production line, but also to transform disparate data from disparate sources and factories into actionable insights. If these challenges are not addressed, poor data quality can lead to inaccurate performance assessments, potential problems along the production line, and the inability of manufacturers to proactively prevent machine failures and inefficiencies.
So how do I solve this? The first step is developing a comprehensive data strategy, starting with helping manufacturers clean up and make their data usable. This overhaul of data management processes requires a deeper understanding of data lineage: where it comes from, where it goes, and how it is used over time.
Connecting data lineage to data quality
Lineage and quality are intertwined concepts. Data lineage is critical to enabling manufacturers to address key questions raised by poor data quality, including:
If the data you receive is bad, where did it come from? What went wrong, and why? How can you be notified in real time when data quality degrades, instead of weeks later by a business unit that needs the data for a project or a regulatory requirement?
With the right data lineage and observability tools, manufacturers can answer these questions and properly maintain data quality throughout the production process.
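To make that concrete, here is a minimal sketch of an edge-side data quality check that flags a bad reading as it is collected rather than weeks later. The reading structure, the range limits, and the alert() hook are all hypothetical names for illustration, not any specific product's API.

```python
# A minimal sketch of an edge-side data quality check. All names here are
# illustrative assumptions, not a vendor API.
from datetime import datetime, timezone

EXPECTED_RANGES = {"temperature_c": (10.0, 85.0)}  # assumed plant-specific limits

def check_reading(tag: str, value: float, source: str) -> dict:
    """Validate a reading at the edge and flag it before it leaves the site."""
    low, high = EXPECTED_RANGES.get(tag, (float("-inf"), float("inf")))
    status = "ok" if low <= value <= high else "out_of_range"
    event = {
        "tag": tag,
        "value": value,
        "source": source,                                    # where the data came from
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "status": status,
    }
    if status != "ok":
        alert(event)                                         # notify in real time, not weeks later
    return event

def alert(event: dict) -> None:
    # Placeholder: in practice this might publish to MQTT, a webhook, or a ticketing system.
    print(f"DATA QUALITY ALERT: {event}")
```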
Of course, new AI solutions on the factory floor make high-quality data more important than ever. Today's AI chatbots and agents are still imperfect at tasks that demand a definitive “yes” or “no” answer. While AI can help detect data quality issues, manufacturers cannot feed “garbage” data to AI tools without risking hallucinations and unpredictable results. AI assistants and agents require high-quality, contextualized data that is purposefully curated so they can complete tasks accurately.
The importance of data context
If your IT colleague in Seattle, Washington simply receives a “temperature 33.4” data point from a factory in Atlanta, Georgia, he will likely have no context as to what that data point refers to: what machine acquired it, when it was collected, or whether the temperature is within an acceptable range. And in reality, you don’t have a single data point to solve for; you have terabytes of data points.
Manufacturers need to clean data at the edge, as close to the source as possible, and add the context needed for proper data lineage to avoid information gaps and ensure data is used correctly throughout the production chain.
In most Industry 4.0 use cases, the context for a data point resides in another system, which means data must be collected from a variety of sources to contextualize it properly. For example, a predictive asset maintenance use case might require collecting raw machine data from one system, work order and planning information from a second, and operator information from a third. This data comes in many different formats and is exposed through different interfaces, making it difficult to integrate into one cohesive view of the factory floor.
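As a rough illustration of that predictive maintenance example, the sketch below joins a raw machine reading with work order and operator records pulled from two other systems. Every source name and field here is an assumption made for the example, not a real schema.

```python
# A hedged sketch of edge contextualization: raw machine data is merged with
# the work order and operator context that lives in other systems.
def contextualize(raw: dict, work_orders: dict, operators: dict) -> dict:
    """Merge a raw machine reading with the context held in other systems."""
    order = work_orders.get(raw["machine_id"], {})
    operator = operators.get(order.get("operator_id"), {})
    return {
        "machine_id": raw["machine_id"],
        "metric": raw["metric"],
        "value": raw["value"],
        "timestamp": raw["timestamp"],
        "work_order": order.get("order_number"),   # from the planning system
        "product": order.get("product"),           # what the machine was producing
        "operator": operator.get("name"),          # who was running the machine
        "site": "ATL-01",                          # assumed site identifier
    }

# Toy records standing in for the three source systems
raw = {"machine_id": "press-7", "metric": "temperature_c", "value": 33.4,
       "timestamp": "2024-05-01T14:03:00Z"}
work_orders = {"press-7": {"order_number": "WO-1182", "product": "bracket-A",
                           "operator_id": "op-42"}}
operators = {"op-42": {"name": "J. Rivera"}}

print(contextualize(raw, work_orders, operators))
```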
Traditionally, manufacturers have taken the approach of “vacuuming” all raw data into a data lake and transforming it as needed to suit their purposes. However, this approach often fails because raw manufacturing data is incredibly heterogeneous and lacks the context required for proper data lineage. Then there’s the issue of personas: data lake users typically lack the domain knowledge needed to add context to raw data.
These examples demonstrate why industrial data must be merged and contextualized at the edge by domain experts.
Data lineage in practice
Revisiting the temperature readings from the Atlanta factory, the manufacturer’s data lineage model must provide context from the moment a particular data point was collected: what machine it came from, what factory it was in, what the machine was producing, and who was running the machine. With the right context, Seattle analysts can interpret temperature readings with more confidence and train machine learning models to predict when assets will require maintenance. This is especially important when data appears inaccurate or has gaps, because manufacturers need to be able to trace the data back through each collection step to its origin.
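One way to picture such a record is as a lineage-aware payload that carries both the context and a provenance trail back to the source. The structure below is a hypothetical sketch of that idea; the field names and steps are assumptions, not a standard model.

```python
# A sketch of a lineage-aware record for the Atlanta temperature reading,
# with a provenance trail so analysts can trace a suspect value to its origin.
from dataclasses import dataclass, field

@dataclass
class LineageStep:
    system: str   # e.g. the PLC, the edge gateway, the broker
    action: str   # what happened to the data at this step
    at: str       # ISO 8601 timestamp

@dataclass
class ContextualizedReading:
    value: float
    unit: str
    machine: str
    factory: str
    product: str
    operator: str
    lineage: list[LineageStep] = field(default_factory=list)

reading = ContextualizedReading(
    value=33.4, unit="degC",
    machine="press-7", factory="Atlanta", product="bracket-A", operator="J. Rivera",
    lineage=[
        LineageStep("plc-line-3", "sampled", "2024-05-01T14:03:00Z"),
        LineageStep("edge-gateway-1", "contextualized", "2024-05-01T14:03:02Z"),
    ],
)
```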
Data lineage tools are primarily applied after these diverse data streams reach the data lake, but we are building data lineage into the factory floor itself. Manufacturers are beginning to adopt industrial DataOps solutions and tools like OpenTelemetry, an observability standard that many IT systems use to monitor and manage data pipelines, to add as much context as possible to data before it leaves the factory. This approach requires leveraging the people at the factory who know the process best and ensuring that data is properly tracked throughout production.
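For example, an edge collector might use the OpenTelemetry Python SDK to attach factory context to telemetry before it leaves the site. The sketch below assumes the opentelemetry-sdk package is installed; attribute names such as "factory.site" are illustrative assumptions, not an official schema, and the fixed reading stands in for a real PLC poll.

```python
# A hedged sketch: tagging a machine temperature with factory context using
# the OpenTelemetry Python SDK (requires the opentelemetry-sdk package).
from opentelemetry import metrics
from opentelemetry.metrics import CallbackOptions, Observation
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import ConsoleMetricExporter, PeriodicExportingMetricReader
from opentelemetry.sdk.resources import Resource

# Context that applies to everything this edge gateway emits
resource = Resource.create({"service.name": "edge-gateway-1", "factory.site": "atlanta"})
provider = MeterProvider(
    resource=resource,
    metric_readers=[PeriodicExportingMetricReader(ConsoleMetricExporter())],
)
metrics.set_meter_provider(provider)
meter = metrics.get_meter("factory.telemetry")

def read_temperature(options: CallbackOptions):
    # In practice this would poll the PLC or historian; here it yields a fixed value.
    yield Observation(33.4, {"machine.id": "press-7", "product": "bracket-A"})

meter.create_observable_gauge("machine.temperature", callbacks=[read_temperature])
```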
Data that lacks context can disrupt factory floor processes. But by overhauling data infrastructure, turning pools of raw data into networks of machines and sensors that clearly communicate data provenance, manufacturers can empower their factories and evolve their operations into the AI era.
About the author
Aron Semle is HighByte’s Chief Technology Officer. HighByte Intelligence Hub is an edge-native, no-code solution that securely collects, models, and delivers payloads to target applications across the enterprise, unlocking the value of industrial data.