The use of open datasets in predictive maintenance fault detection


This post on predictive maintenance is based on a presentation delivered by Piotr Herbut and Jakub Pietrucha at the Wrocław AI Team (WAIT) meetup. If you’d like to learn more, you can follow us or them on LinkedIn: slashdev / Wrocław AI Team (WAIT).

Background

In today’s industrial landscape, unplanned downtime due to equipment failures can cost manufacturers millions of dollars annually. The emergence of predictive maintenance technologies promises to revolutionize how we maintain industrial equipment by identifying potential failures before they occur. At the heart of these technologies lie machine learning and advanced analytics, which require quality data to develop accurate and robust models.

While many organizations are investing heavily in predictive maintenance solutions, they often face a critical challenge at the beginning of their journey: the lack of sufficient failure data from their own operations. This is where open datasets come into play, offering a valuable resource that can accelerate development and deployment of predictive maintenance solutions.

Problem: demonstrating value and ROI

One of the biggest challenges in implementing predictive maintenance solutions is demonstrating their value and return on investment (ROI) to stakeholders. Decision-makers want to understand the potential risks and gains before committing resources to a full-scale implementation. The result is a catch-22:

to prove the value of predictive maintenance, you need deployed models that accurately predict failures;
to build accurate models, you need sufficient historical failure data;
to collect sufficient failure data from your own equipment, you may need years, especially for critical components that rarely fail.

This is a significant barrier to entry, as organizations cannot afford to wait years before seeing any return on their predictive maintenance investments.

Solution: leveraging open datasets in predictive maintenance models

Open datasets related to predictive maintenance offer a compelling solution to this problem. These datasets contain recordings of various equipment conditions – including normal operation and different fault types – that can be used to develop and test predictive maintenance models without waiting for failures to occur in the factory’s equipment.

Several high-quality open datasets are available for rotating equipment, including:

Fordatis Imbalance Dataset: contains vibration data from rotating machinery with various imbalance conditions;
Paderborn University Bearing Dataset: features recordings from rolling bearings under different damage scenarios;
MAFAULDA: the Machinery Fault Database containing various fault types in rotating machinery;
COMFAULDA: a comprehensive fault dataset with multiple sensor types and fault conditions;
IEEE Three Phase Induction motor: a fault dataset for induction motors.

These datasets provide a foundation for developing predictive maintenance solutions, allowing organizations to:

Prototype and validate approaches: test different algorithms and methodologies without operational data;
Build initial models: develop baseline models that can later be fine-tuned with organization-specific data;
Demonstrate value: showcase the potential of predictive maintenance technologies to stakeholders using real-world fault examples.

Finding open datasets

Several GitHub repositories offer comprehensive collections of predictive maintenance datasets:

PredictiveMaintenance-and-Vibration-Resources: A curated repository of datasets, papers, and tools focused on vibration analysis and predictive maintenance
Predictive Maintenance Resources: Collection of datasets and resources specifically for predictive maintenance applications
Awesome Industrial Datasets: A comprehensive list of open datasets for industrial applications, including many relevant to predictive maintenance

These repositories not only provide access to datasets but often include example code, implementation references, and related research papers.

Standardizing data for effective use

A common challenge when working with open datasets is the variety of formats and structures they come in. Each dataset might use different file formats, sampling rates, sensor configurations, and metadata structures.

To effectively leverage these resources, implementing a standardized approach to data management is crucial:

Storage standardization: Converting diverse formats into consistent structures using parquet files for sensor data and JSON for metadata
Runtime processing standardization: Implementing pandas MultiIndex DataFrames that combine data and metadata in one structured format. In addition, we require every processing step to accept and return data in the specified format
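The two points above can be sketched roughly as follows. This is a minimal illustration and not the exact layout used in the presentation: the index levels (dataset, recording_id, label, sample), the column levels (sensor, axis), the helper names, and the 4096 Hz sampling rate are all assumptions made for the example.

```python
import json

import numpy as np
import pandas as pd


def build_recording(signals, metadata):
    """Combine sensor signals and key metadata in one MultiIndex DataFrame.

    signals:  dict mapping (sensor, axis) -> 1-D array, all of equal length
    metadata: dict with at least 'dataset', 'recording_id' and 'label'
    """
    columns = pd.MultiIndex.from_tuples(list(signals), names=["sensor", "axis"])
    values = np.column_stack([np.asarray(v, dtype=float) for v in signals.values()])
    index = pd.MultiIndex.from_product(
        [
            [metadata["dataset"]],
            [metadata["recording_id"]],
            [metadata["label"]],
            range(values.shape[0]),
        ],
        names=["dataset", "recording_id", "label", "sample"],
    )
    return pd.DataFrame(values, index=index, columns=columns)


def to_storage(frame, metadata):
    """Split a recording into a flat table (for parquet) and a JSON string."""
    flat = frame.droplevel(["dataset", "recording_id", "label"]).copy()
    # Flatten (sensor, axis) into plain string column names for storage.
    flat.columns = [f"{sensor}_{axis}" for sensor, axis in flat.columns]
    return flat.reset_index(), json.dumps(metadata, sort_keys=True)


def save(frame, metadata, stem):
    """Write <stem>.parquet and <stem>.json (parquet needs pyarrow or fastparquet)."""
    flat, meta_json = to_storage(frame, metadata)
    flat.to_parquet(f"{stem}.parquet", index=False)
    with open(f"{stem}.json", "w", encoding="utf-8") as fh:
        fh.write(meta_json)


# Hypothetical recording: two accelerometer axes sampled at an assumed 4096 Hz.
t = np.arange(1000) / 4096.0
signals = {
    ("acc", "x"): np.sin(2 * np.pi * 50 * t),
    ("acc", "y"): np.cos(2 * np.pi * 50 * t),
}
metadata = {
    "dataset": "example",
    "recording_id": "rec-001",
    "label": "imbalance",
    "sampling_rate_hz": 4096,
}
frame = build_recording(signals, metadata)
flat, meta_json = to_storage(frame, metadata)
```

Keeping the stored table flat and rebuilding the MultiIndex at load time is one possible design choice; it avoids relying on how a particular parquet engine handles multi-level columns.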

This standardization offers several benefits:

Facilitates easier integration of multiple datasets
Enables consistent preprocessing pipelines
Ensures metadata (equipment specifications, operating conditions) is preserved alongside sensor data
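One way to enforce that every processing step takes and returns data in the agreed format is a small decorator that validates the frame on the way in and on the way out. The sketch below assumes an index layout of (dataset, recording_id, label, sample) and column levels (sensor, axis); these names, and the helpers `check_format` and `processing_step`, are hypothetical.

```python
import functools

import numpy as np
import pandas as pd

REQUIRED_INDEX = ["dataset", "recording_id", "label", "sample"]  # assumed layout
REQUIRED_COLUMNS = ["sensor", "axis"]                             # assumed layout


def check_format(frame):
    """Raise if the frame does not follow the agreed structure."""
    if not isinstance(frame, pd.DataFrame):
        raise TypeError("processing steps must exchange pandas DataFrames")
    if list(frame.index.names) != REQUIRED_INDEX:
        raise ValueError(f"index levels must be {REQUIRED_INDEX}")
    if list(frame.columns.names) != REQUIRED_COLUMNS:
        raise ValueError(f"column levels must be {REQUIRED_COLUMNS}")
    return frame


def processing_step(func):
    """Validate both the input and the output of a pipeline step."""
    @functools.wraps(func)
    def wrapper(frame, *args, **kwargs):
        check_format(frame)
        return check_format(func(frame, *args, **kwargs))
    return wrapper


@processing_step
def remove_offset(frame):
    """Subtract each channel's mean (DC offset) over the whole frame."""
    values = frame.to_numpy()
    return pd.DataFrame(
        values - values.mean(axis=0), index=frame.index, columns=frame.columns
    )


@processing_step
def drop_index(frame):
    """A faulty step: it discards the agreed index, so the check rejects it."""
    return frame.reset_index(drop=True)


index = pd.MultiIndex.from_product(
    [["example"], ["rec-001"], ["normal"], range(4)], names=REQUIRED_INDEX
)
columns = pd.MultiIndex.from_tuples([("acc", "x"), ("acc", "y")], names=REQUIRED_COLUMNS)
frame = pd.DataFrame(np.arange(8.0).reshape(4, 2), index=index, columns=columns)

centered = remove_offset(frame)   # passes both checks
# drop_index(frame) would raise ValueError because the index levels are lost
```

With a contract like this, steps written for one dataset can be chained on another without checking its layout by hand.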

Problem: choosing the right analytical approach

Another challenge in predictive maintenance is selecting the most appropriate analytical approach from the wide range of available techniques. The options typically fall into two main categories: analytical approaches and deep learning approaches.

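Analytical approaches: domain knowledge-based methods, statistical techniques, digital signal processing, and classical machine learning models.

Deep learning approaches: Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), transfer learning, and Transformers.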

Each approach has its strengths and limitations, and the optimal choice depends on factors like data availability, fault types, and deployment constraints.
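To make the analytical side more concrete, here is a minimal sketch of signal-processing features that are often computed from a vibration signal before any model is trained. The signals are synthetic (a 50 Hz sine, plus periodic impulses standing in for a fault), the 1000 Hz sampling rate is an assumption, and the function name `vibration_features` is hypothetical; none of this comes from the datasets listed earlier.

```python
import numpy as np


def vibration_features(signal, sampling_rate_hz):
    """Return a few common time- and frequency-domain features of a signal."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                                   # remove DC offset
    rms = np.sqrt(np.mean(x ** 2))                     # overall vibration energy
    crest_factor = np.max(np.abs(x)) / rms             # peakiness of the signal
    kurtosis = np.mean(x ** 4) / np.mean(x ** 2) ** 2  # sensitive to impulses
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / sampling_rate_hz)
    return {
        "rms": rms,
        "crest_factor": crest_factor,
        "kurtosis": kurtosis,
        "dominant_frequency_hz": freqs[np.argmax(spectrum)],
    }


# Synthetic example: one second of data at an assumed 1000 Hz sampling rate.
sampling_rate_hz = 1000
t = np.arange(sampling_rate_hz) / sampling_rate_hz
healthy = np.sin(2 * np.pi * 50 * t)

# Crude stand-in for an impulsive fault: a spike every 100 samples.
faulty = healthy.copy()
faulty[::100] += 5.0

healthy_features = vibration_features(healthy, sampling_rate_hz)
faulty_features = vibration_features(faulty, sampling_rate_hz)
print(healthy_features)
print(faulty_features)
```

Features like these can be inspected directly with domain knowledge or passed to a classical machine learning model, whereas deep learning approaches typically learn their own representation from the raw or lightly processed signal.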

Effect: accelerated implementation and improved outcomes

By leveraging open datasets in predictive maintenance initiatives, we can achieve several significant benefits:

Domain adaptation through transfer learning

Perhaps the most powerful application of open datasets is in domain adaptation, which follows this progression:

Pre-train models using large, diverse open datasets
Fine-tune with specific open datasets relevant to your equipment type
Further refine with limited organizational data to adapt to your specific operating conditions

This approach dramatically reduces the data requirements from your own operations while still providing models that perform well in your specific context.
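The mechanics of this progression can be sketched with a deliberately simple model. The example below uses a logistic regression trained by gradient descent as a stand-in for whatever network you would use in practice, and entirely synthetic data in which a `shift` parameter mimics a change in operating conditions. All names and numbers are assumptions for illustration; whether adaptation actually helps depends on your data.

```python
import numpy as np


def make_domain(rng, n, shift):
    """Synthetic two-feature data; `shift` mimics different operating conditions."""
    labels = rng.integers(0, 2, size=n)                 # 0 = healthy, 1 = faulty
    centers = np.where(labels[:, None] == 1, 1.5, -1.5) + shift
    features = centers + rng.normal(size=(n, 2))
    return features, labels.astype(float)


def train(features, labels, weights, learning_rate=0.1, epochs=200):
    """Gradient-descent logistic regression, starting from the given weights."""
    x = np.column_stack([features, np.ones(len(features))])   # append bias column
    w = weights.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w)))
        w -= learning_rate * (x.T @ (p - labels)) / len(labels)
    return w


def accuracy(features, labels, weights):
    """Fraction of samples classified correctly by the linear model."""
    x = np.column_stack([features, np.ones(len(features))])
    return float(np.mean((x @ weights > 0) == (labels == 1)))


rng = np.random.default_rng(0)
open_x, open_y = make_domain(rng, 5000, shift=0.0)       # large, diverse open data
similar_x, similar_y = make_domain(rng, 500, shift=0.5)  # open data, similar equipment
own_x, own_y = make_domain(rng, 40, shift=1.0)           # limited organizational data
test_x, test_y = make_domain(rng, 1000, shift=1.0)       # held-out data from own domain

# 1. Pre-train from scratch, 2. fine-tune, 3. refine: each stage starts
#    from the weights of the previous one instead of from zero.
w_pretrained = train(open_x, open_y, np.zeros(3))
w_finetuned = train(similar_x, similar_y, w_pretrained, epochs=50)
w_adapted = train(own_x, own_y, w_finetuned, epochs=50)

print("pre-trained only:", accuracy(test_x, test_y, w_pretrained))
print("after adaptation:", accuracy(test_x, test_y, w_adapted))
```

The point of the sketch is the warm start: the final stage sees only 40 samples, yet it does not have to learn the task from nothing.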

Edge vs. cloud deployment considerations

A crucial decision when implementing a predictive maintenance system is how to deploy the model. Edge and cloud deployment involve different trade-offs:

Edge deployment: Lower latency, operates without connectivity, but has computational constraints
Cloud deployment: Greater computational resources, easier updates, but requires connectivity

By prototyping with open datasets, you can make a better-informed choice of deployment strategy for your specific use case.

Summary of benefits

Incorporating open datasets into your predictive maintenance strategy delivers multiple advantages:

Access to valuable datasets: Gain immediate access to diverse failure patterns that might take years to observe in your own operations
Standardized data practices: Establish consistent data handling practices that will benefit your entire data pipeline
Rapid prototyping: Quickly evaluate different modeling approaches without waiting for internal data collection
Effective domain adaptation: Leverage transfer learning to make models relevant to your specific equipment
Balanced approach: Don’t overlook simple correlations and domain knowledge while exploring complex models

Conclusion

Open datasets represent an invaluable resource. By leveraging these datasets effectively – through proper standardization, thoughtful analytical approaches, and strategic domain adaptation – you can accelerate your predictive maintenance journey and demonstrate value much earlier than would otherwise be possible.

While open datasets aren’t a complete replacement for organization-specific data, they provide the foundation needed to overcome the cold-start problem in predictive maintenance. As models initially trained on open data begin making successful predictions, they generate the proof points needed to secure broader organizational buy-in and support for more comprehensive predictive maintenance implementations.

In an era where equipment reliability directly impacts the bottom line, open datasets offer a practical pathway to developing effective predictive maintenance capabilities and shorten the time to first results. For organizations looking to enhance operational reliability while demonstrating clear ROI, open datasets represent an opportunity that should not be overlooked.

