Process Analytics in the Age of Big Data and the Internet of Things
What is Process Analytics?
The primary goal of process analytics is to discover, monitor and improve processes of any kind. The term is often used synonymously with process mining. Personally, however, I prefer process analytics because it reflects a broader perspective and helps to think beyond classical data mining problems.
Process analytics is a rather young field that combines many disciplines (see Figure 1). First, there are business-oriented topics like Business Process Management (BPM). A major element here is process models, which make processes explicit and document them. There are numerous process modeling techniques, and many of them, like flow charts, Petri nets or Gantt charts, have been around for a long time. Today’s process modeling languages like UML or BPMN are more advanced and can model even very complex structures.
However, process analytics encompasses more than just process modeling. By combining approaches from business intelligence and analytics (BIA), it overcomes limitations of abstract process models and makes them come alive with data. In this context, BIA brings in a broad set of useful approaches, such as data integration and ETL (extract, transform and load), which process analytics can use to integrate data sources throughout a process. Similarly, process analytics can use data mining and visual analytics to generate useful insights and to build predictive and optimization models that improve business processes.
Figure 1. Process analytics combines many disciplines
Three Approaches to think about Process Analytics
At a high level, there are three major approaches to process analytics (see Figure 2). The first one is to run a process model (“play-out”) across multiple business scenarios and generate data to help find bottlenecks and process improvements.
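To make the play-out idea more concrete, here is a minimal sketch. The order process, its activity names and the branching probabilities are made up for illustration; a real process model would typically be a BPMN diagram or Petri net executed by a simulation tool.

```python
import random

# Hypothetical order process (an assumption for illustration): each activity
# maps to its possible successors and their probabilities. "END" closes a case.
MODEL = {
    "receive order": [("check credit", 1.0)],
    "check credit": [("ship goods", 0.8), ("reject order", 0.2)],
    "ship goods": [("send invoice", 1.0)],
    "send invoice": [("END", 1.0)],
    "reject order": [("END", 1.0)],
}


def play_out(model, start, n_cases, seed=0):
    """Generate n_cases traces by walking the model from start until END."""
    rng = random.Random(seed)
    traces = []
    for _ in range(n_cases):
        trace, activity = [], start
        while activity != "END":
            trace.append(activity)
            successors, weights = zip(*model[activity])
            activity = rng.choices(successors, weights=weights)[0]
        traces.append(trace)
    return traces


traces = play_out(MODEL, "receive order", n_cases=1000)
rejected = sum(trace[-1] == "reject order" for trace in traces)
print(f"{rejected / len(traces):.1%} of simulated cases end in rejection")
```

Changing the probabilities or adding activities corresponds to running a different business scenario; the generated traces can then be analyzed like any other event data.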
Conversely, existing process data can be examined to find patterns that imply process structures, which can then be abstracted into new process models. This concept of process discovery resembles the search for desire lines in data that reveal what processes really look like.
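One simple starting point for discovery is to count how often one activity directly follows another within the same case. The sketch below does this for a tiny, made-up hospital log; the case ids, activities and timestamps are assumptions, and the resulting directly-follows counts are only a first approximation of a process model, not a full discovery algorithm.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case id, activity, timestamp).
EVENT_LOG = [
    ("c1", "register", 1), ("c1", "examine", 2), ("c1", "discharge", 3),
    ("c2", "register", 1), ("c2", "examine", 3), ("c2", "treat", 5),
    ("c2", "discharge", 8),
    ("c3", "register", 2), ("c3", "examine", 4), ("c3", "discharge", 6),
]


def directly_follows(event_log):
    """Count how often activity a is directly followed by b within a case."""
    cases = defaultdict(list)
    for case_id, activity, timestamp in event_log:
        cases[case_id].append((timestamp, activity))
    counts = Counter()
    for events in cases.values():
        ordered = [activity for _, activity in sorted(events)]
        counts.update(zip(ordered, ordered[1:]))
    return counts


dfg = directly_follows(EVENT_LOG)
for (a, b), n in dfg.most_common():
    print(f"{a} -> {b}: {n}")
```

Frequent pairs hint at the main path through the process, while rare pairs point to exceptions or optional steps.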
Finally, if you have a process model and related process data, it is possible to run the data through the process model (“replay”) to validate the process and derive information about certain scenarios, such as exceptions where the data shows that the actual process deviates from the envisaged process model.
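A heavily simplified version of replay can be sketched as follows. The model is assumed to be just a set of allowed transitions between activities, and the two cases are invented; established techniques such as token-based replay on Petri nets are considerably more elaborate.

```python
# Hypothetical model (an assumption): the allowed successors of each activity.
ALLOWED = {
    "START": {"receive order"},
    "receive order": {"check credit"},
    "check credit": {"ship goods", "reject order"},
    "ship goods": {"send invoice"},
    "send invoice": {"END"},
    "reject order": {"END"},
}

# Hypothetical observed traces, one per case.
OBSERVED = {
    "case 1": ["receive order", "check credit", "ship goods", "send invoice"],
    "case 2": ["receive order", "ship goods", "send invoice"],  # no credit check
}


def replay(trace, allowed):
    """Return the steps of a trace that the model does not allow."""
    path = ["START", *trace, "END"]
    return [(a, b) for a, b in zip(path, path[1:]) if b not in allowed.get(a, set())]


for case_id, trace in OBSERVED.items():
    deviations = replay(trace, ALLOWED)
    status = "conforms" if not deviations else f"deviates at {deviations}"
    print(f"{case_id}: {status}")
```

Cases with an empty deviation list follow the envisaged model; the others are the exceptions worth a closer look.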
Figure 2. Abstract scenarios of process analytics
These scenarios are just abstract ways to start thinking about process analytics. Real-life scenarios usually mix these approaches and combine them with other methods, e.g., discovering process models and then replaying test data on them to derive a predictive model.
The relation of Big Data, the Internet of Things and Process Analytics
Big data and the Internet of Things (IoT) primarily affect the available data that can be used for process analytics, e.g. with massive amounts of sensor data coming from IoT components. Besides, IoT sensor data helps to bridge the gaps between virtual and physical parts of business processes.
For instance, classical supply chains have always suffered from gaps and inconsistencies in the data due to digital blind spots. IoT components equipped with RFID tags or other “smart” technology, in contrast, can continuously track their status, position and other data throughout an entire supply chain. Besides providing an uninterrupted view of a process, such smart objects can also report additional information, e.g., about their environment. In a pharmaceutical supply chain, for example, shippers can use real-time data from RFID-enabled drug packages to record temperature, humidity or other environmental factors to better ensure product quality.
Considering all this new data, process analytics quickly becomes a Big Data use case, with all its challenges. For instance, there is a need for an adequate infrastructure to store and process the large data sets. This becomes even more challenging when processes are highly decentralized and span many physically and logically distributed systems. Besides, more data also increases the chance of spurious correlations and makes cleaning and processing more complex.
The use cases below illustrate applications of process analytics in the context of Big Data and the Internet of Things.
- Real Time Process Monitoring and Operational Support
One obvious use case for process analytics is real-time process monitoring. The idea is not entirely new and has been discussed comprehensively under the term business activity monitoring (BAM). BAM seeks to observe process activity and performance with dashboards that show certain KPIs and process events. Classical application domains of BAM are the banking and insurance sectors, telecommunications, or logistics, but the advancing digitalization of the world allows this concept to be adapted to new processes of a more physical nature.
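As a small illustration of such a KPI, the sketch below computes how long each open case has been running and flags the ones that exceed a limit. The delivery events, the "delivered" end activity and the 24-hour limit are all assumptions; a real BAM setup would read a live event stream rather than a fixed list.

```python
from datetime import datetime, timedelta

# Hypothetical delivery events: (case id, activity, ISO timestamp).
EVENTS = [
    ("order-1", "picked", "2024-05-01T08:00"),
    ("order-1", "delivered", "2024-05-01T15:30"),
    ("order-2", "picked", "2024-05-01T09:00"),
    ("order-3", "picked", "2024-05-02T07:00"),
]


def open_case_ages(events, now, final_activity="delivered"):
    """Return the running time of every case that has not finished yet."""
    first_seen, closed = {}, set()
    for case_id, activity, stamp in events:
        ts = datetime.fromisoformat(stamp)
        first_seen[case_id] = min(first_seen.get(case_id, ts), ts)
        if activity == final_activity:
            closed.add(case_id)
    return {c: now - t for c, t in first_seen.items() if c not in closed}


now = datetime(2024, 5, 2, 12, 0)  # fixed "current time" for the example
ages = open_case_ages(EVENTS, now)
late = {case for case, age in ages.items() if age > timedelta(hours=24)}
print("open cases:", ages)
print("running longer than 24h:", late)
```

A dashboard would refresh such figures continuously and show them next to the underlying process events.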
One great example is agriculture, where processes generate more and more data thanks to new tools and machines with digital interfaces, sensor data and other trends like open data. This makes it possible to track processes digitally and visualize them in dashboards that show their status and support operational decisions, such as when to harvest depending on resource allocation, business data and external data (e.g., weather). Agriculture can also be used to illustrate process discovery based on sensor data (e.g., data coming from various sources on a farm, like tractors) as well as process confirmation and prediction, which are discussed further below.
- Process Prediction Models
Predictive analytics is a heavily discussed topic today and is clearly connected to process analytics. The underlying idea is to build prediction models that forecast how a process (or the entities in it) will behave at a certain time or under certain conditions. One way to obtain the necessary prediction models is to replay event logs on existing process models to identify various scenarios and abstract them into mathematical models.
An often-discussed application in this context is predictive maintenance. In industrial manufacturing processes, for example, it is possible to predict when machines in a process will fail or exceed a certain quality threshold, based on data gathered from sensors during the process. A company can use that information to send out service workers to fix potential problems before they arise and thereby avoid downtime or defective goods.
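A very rough sketch of the idea: fit a straight line to a sensor reading over time and estimate when it will cross a failure threshold. The vibration values, the threshold and the assumption of a linear trend are all made up for illustration; real predictive maintenance models are usually far more sophisticated.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return slope, mean_y - slope * mean_x


def hours_until_threshold(hours, readings, threshold):
    """Extrapolate the trend; return None if the reading is not increasing."""
    slope, intercept = fit_line(hours, readings)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - hours[-1])


# Hypothetical vibration readings (mm/s) of one machine, by operating hour.
hours = [0, 10, 20, 30, 40]
vibration = [2.0, 2.5, 3.0, 3.5, 4.0]

remaining = hours_until_threshold(hours, vibration, threshold=6.0)
print(f"estimated operating hours until threshold: {remaining:.1f}")
```

If the estimate drops below the lead time for a service visit, that would be the signal to schedule maintenance.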
- Big Data Process Discovery
A very powerful part of process analytics is the exploration and assessment of entirely new processes in organizations. New IoT data sources and their large data sets provide a completely new way of doing this almost automatically.
However, even though the basic idea of process discovery is quite simple (finding process patterns in event logs that usually contain a timestamp), in practice it is very hard to separate the signal from the noise and to deal with incomplete data and other challenges. Fortunately, there are tools and algorithms that can help with that.
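One common and simple way to reduce noise is to keep only the trace variants that occur often enough. The sketch below assumes made-up traces and an arbitrary 5% cut-off; it is not the method of any particular tool, and rare variants are not always noise, so the threshold needs care.

```python
from collections import Counter


def frequent_variants(traces, min_share=0.05):
    """Keep trace variants that account for at least min_share of all cases."""
    variants = Counter(tuple(trace) for trace in traces)
    total = sum(variants.values())
    return {v: n for v, n in variants.items() if n / total >= min_share}


# Hypothetical log: two common variants plus a few incomplete traces.
traces = (
    [["register", "examine", "discharge"]] * 60
    + [["register", "examine", "treat", "discharge"]] * 38
    + [["register", "discharge"]] * 2
)

kept = frequent_variants(traces)
for variant, count in kept.items():
    print(count, " -> ".join(variant))
```

Discovery can then run on the remaining variants, while the filtered cases are kept aside for a separate look.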
A great illustration of process discovery is the exploration of workflows in hospitals based on a data set containing the location and status of patients at certain timestamps. The information can be used to confirm existing process models and optimize processes, e.g., by eliminating bottlenecks. IoT can extend this scenario by adding many new data sources like patients’ vital data, power consumed during a treatment, machine logs of medical devices and many others. With all this, it becomes feasible to get a holistic 360° view of workflows and identify new ways to optimize processes and resource consumption.
Process analytics is powerful because it combines the idea of process modeling with methods from data analytics. The advancing convergence of the physical and digital worlds gives the topic additional momentum by expanding its scope to numerous new use cases.
In my opinion, thinking in processes is also valuable because considering business processes often helps to tie an analytics scenario to its actual business value. Furthermore, many upcoming analytics scenarios take place in complex ecosystems that span many systems and organizations, and the only common denominator is often a process.
Nevertheless, process analytics also poses new requirements for BIA landscapes. On the one hand, it fuels the Big Data debate, for instance, with the data stores needed for large and complex event data sets. On the other hand, integration becomes even more important and very challenging, especially if systems are heavily distributed and real-time integration is necessary. Consequently, process analytics requires agile architectures that can flexibly adapt to arbitrary processes, e.g., ones based on data virtualization.
van der Aalst, W. et al. (2011): Process Mining Manifesto, http://link.springer.com/chapter/10.1007/978-3-642-28108-2_19
Process Mining: Desire Lines or Cow Paths? http://wwwis.win.tue.nl/~wvdaalst/old/etc/desire-lines-or-cowpaths.htm
van der Aalst, W. (2016): Process Mining: Data Science in Action.
Smith, S. J. (2016): What is “Predictive Analytics”? http://eckerson.com/articles/what-is-predictive-analytics
E.g., http://www.fluxicon.com/
Lismont, J. et al. (2016): A guide for the application of analytics on healthcare processes, http://dl.acm.org/citation.cfm?id=3009306
Ereth, J. (2016): Data Virtualization in Business Intelligence and Analytics, http://eckerson.com/articles/data-virtualization-in-business-intelligence-and-analytics