A Tour in Process Mining: From Practice to Algorithmic Challenges
Petri Nets 2018 in Bratislava, Slovakia
by Wil van der Aalst, Josep Carmona, and Thomas Chatain
Tuesday, June 26th, Austria Trend Hotel Bratislava, Vysoka 2A, 811 06 Bratislava, Slovakia
Data science is the profession of the future, because organizations that are unable to use (big) data in a smart way will not survive. It is not sufficient to focus on data storage and data analysis. The data scientist also needs to relate data to process analysis. Process mining bridges the gap between traditional model-based process analysis (e.g., simulation and other business process management techniques) and data-centric analysis techniques such as machine learning and data mining. Process mining seeks the confrontation between event data (i.e., observed behavior) and process models (hand-made or discovered automatically). This technology has become available only recently, but it can be applied to any type of operational process (organizations and systems). Example applications include: analyzing treatment processes in hospitals, improving customer service processes in a multinational, understanding the browsing behavior of customers using a booking site, analyzing failures of a baggage handling system, and improving the user interface of an X-ray machine. All of these applications have in common that dynamic behavior needs to be related to process models.
Hence, process mining considers the discovery of process models from real process executions. Discovered models may deviate from reality, so a functionality as important as discovery is the automatic assessment of how well a process model represents reality, a discipline known as conformance checking. Taken together, discovery and conformance checking offer a powerful toolbox to organizations for improving their processes.
Process mining provides not only a bridge between data mining and business process management; it also helps to address the classical divide between “business” and “IT”. Evidence-based business process management based on process mining helps to create a common ground for business process improvement and information systems development.
In the research community related to the topics of the Petri Nets conference, process mining can be the killer application for many other disciplines such as formal methods, concurrency and distributed systems. In particular, the use of Petri nets has grown considerably because they are the most popular representation for process mining algorithms.
The motivation of this tutorial is to provide an introductory tour of the field, followed by the necessary background and practice so that attendees can understand the challenges the field of process mining currently faces. Unlike some of the current textbooks or online courses on process mining, the tutorial will pay special attention to the conformance checking dimension, where some interesting challenges can be addressed.
- Module I: A Practical Introduction to Process Mining (1h) – Introduction to process mining with demos of commercial tools (highlighting the incredible relevance and the limitations of existing tools). (WvdA, 9.15-10.15)
- Module II: Discovering Process Models (2h) – Process discovery with inductive mining as an example of a scalable discovery technique. Process discovery with region-based approaches. (WvdA-JC, 10.45-12.45)
- Module III: The Challenge of Alignments (1.5h) – Complexity issues for relating observed and modeled behavior. Formal definition of alignments. Selected techniques for the computation of alignments. Applications of alignments. (JC, 14.15-15.45)
- Module IV: Evidence-based Quality Metrics for Process Models (2h) – Current metrics for measuring the quality of process models with respect to observed behavior. Limitations of some of the discovery approaches. (TC-JC, 16.15-18.15)
Preparing for the course
To get the most out of the course, it is recommended to download software and event logs before the course starts. This is not mandatory, but it will increase the likelihood that you can apply process mining in your own research or organization afterwards!
- Download ProM Lite 1.2 from http://www.promtools.org/doku.php?id=promlite12.
- Download some datasets to be able to apply the algorithms discussed in the tutorial:
  - Start from the simple ones on http://www.processmining.org/logs/start.
  - Then download event logs from http://data.4tu.nl/repository/collection:event_logs (these include several real-life event logs; note that some logs may be too complex or too large for an introduction to process mining).
- It is also recommended to play with some commercial tools. Many have an academic program. Two examples:
  - Disco (Fluxicon) is an easy-to-use tool and will help to lower the threshold to get started on real-life data. Check out their academic program on https://fluxicon.com/academic/. Disco is free of charge for students and research staff members of partner universities.
  - Celonis is a more extensive process mining tool that also allows for conformance checking, dashboards, data mining, etc. Check out their academic program on http://www.celonis.com/en/academic-alliance/. Celonis is also free of charge for students and research staff members.
- To get deeper into the topic, also after the course:
  - Register for the free online Coursera course on process mining (https://www.coursera.org/learn/process-mining). This course has been taken by over 100,000 participants. The course “Process Mining: Data science in Action” provides exercises, software, datasets, etc.
  - Read the Springer book “Process Mining: Data science in Action” (http://www.springer.com/gp/book/9783662498507). The book is available online via SpringerLink in most institutes. Selected parts of the book can be obtained by registering for the Coursera course with the same title.
Bring your own data
By following the above steps, you will see that it is easy to get started with process mining. Even better, you can immediately apply process mining to your own datasets. Participants are invited to bring their own data. During and after the tutorial, we will be happy to help you analyze your own data. Please take a look at the slides http://www.processmining.org/_media/presentations/event_logs_the_input_for_process_mining.pdf to see what the data should look like. A simple CSV file with three columns will do the job!
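As a rough illustration of what such a three-column file might look like and how it can be turned into traces, here is a minimal Python sketch. It assumes the three columns are a case identifier, an activity name, and a timestamp; the column names `case_id`, `activity`, and `timestamp`, the sample rows, and the helper functions are hypothetical and not taken from the slides or from any tool mentioned above. The sketch groups events per case, orders them by time, and counts which activity directly follows which, a simple summary that many discovery techniques start from.

```python
import csv
import io
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical three-column event log; the column names are assumptions.
SAMPLE_CSV = """case_id,activity,timestamp
1,register,2018-06-01T09:00:00
2,register,2018-06-01T09:05:00
1,check,2018-06-01T09:30:00
2,reject,2018-06-01T10:15:00
1,approve,2018-06-01T10:00:00
"""


def read_traces(csv_file):
    """Group events by case and order each case by timestamp."""
    events_by_case = defaultdict(list)
    for row in csv.DictReader(csv_file):
        when = datetime.fromisoformat(row["timestamp"])
        events_by_case[row["case_id"]].append((when, row["activity"]))
    # Sort on the timestamp only, so events with equal timestamps
    # keep the order in which they appear in the file.
    return {
        case: [activity for _, activity in sorted(events, key=lambda e: e[0])]
        for case, events in events_by_case.items()
    }


def directly_follows(traces):
    """Count how often one activity is directly followed by another."""
    counts = Counter()
    for trace in traces.values():
        counts.update(zip(trace, trace[1:]))
    return counts


traces = read_traces(io.StringIO(SAMPLE_CSV))
dfg = directly_follows(traces)

for case, trace in traces.items():
    print(case, trace)
for (source, target), count in dfg.items():
    print(f"{source} -> {target}: {count}")
```

To try this on your own data, replace `io.StringIO(SAMPLE_CSV)` with an opened file (for example `open("my_log.csv", newline="")`, where the file name is a placeholder) and adapt the column names and timestamp parsing to your export.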
Wil van der Aalst
RWTH Aachen University, web page: www.vdaalst.com
Prof.dr.ir. Wil van der Aalst is a full professor at RWTH Aachen University leading the Process and Data Science (PADS) group. He is also part-time affiliated with the Technische Universiteit Eindhoven (TU/e). Until December 2017, he was the scientific director of the Data Science Center Eindhoven (DSC/e) and led the Architecture of Information Systems group at TU/e. Since 2003, he has held a part-time position at Queensland University of Technology (QUT). Currently, he is also a visiting researcher at Fondazione Bruno Kessler (FBK) in Trento and a member of the Board of Governors of Tilburg University. His research interests include process mining, Petri nets, business process management, workflow management, process modeling, and process analysis. Wil van der Aalst has published over 200 journal papers, 20 books (as author or editor), 450 refereed conference/workshop publications, and 65 book chapters. Many of his papers are highly cited (he is one of the most cited computer scientists in the world; according to Google Scholar, he has an H-index of 136 and has been cited over 80,000 times) and his ideas have influenced researchers, software developers, and standardization committees working on process support. Next to serving on the editorial boards of over ten scientific journals, he also plays an advisory role for several companies, including Fluxicon, Celonis, Processgold, and Bright Cape. Van der Aalst received honorary degrees from the Moscow Higher School of Economics (Prof. h.c.), Tsinghua University, and Hasselt University (Dr. h.c.). He is also an elected member of the Royal Netherlands Academy of Arts and Sciences, the Royal Holland Society of Sciences and Humanities, and the Academy of Europe. In 2017, he was awarded a Humboldt Professorship.
Josep Carmona
Universitat Politècnica de Catalunya, Barcelona, Spain, web page: http://www.cs.upc.edu/~jcarmona/
Josep Carmona is an associate professor at the Department of Computer Science at Universitat Politècnica de Catalunya. In 2004, he received a PhD in Computer Science from the same university. He is a member of the ALBCOM research group, a multidisciplinary group that holds a distinction from the Government of Catalunya. Furthermore, he is a founding member of the IEEE Task Force on Process Mining. Josep has published around 80 articles in journals such as Data Mining and Knowledge Discovery, IEEE TKDE, IEEE Transactions on Computers, and Information Systems, and in highly competitive conferences like ECML/PKDD, BPM, ATVA, EMNLP, LREC, DAC and ICCAD. He served as PC Co-chair of ACSD 2011 in Newcastle, and has organized the ATAED workshop since 2011. Josep was the General Chair of the BPM conference in 2017, where he also served as a PC Co-chair. He co-organizes the Process Discovery Contest.
Thomas Chatain
LSV/ENS Paris-Saclay, Cachan, France, web page: http://www.lsv.fr/~chatain
Thomas Chatain is an associate professor at ENS Paris-Saclay, France. He received his PhD in 2006 from University of Rennes I (France) and did a postdoc at Aalborg University (Denmark). His research focuses on formal methods for design, verification, control and supervision of distributed and real-time systems.