Tag: python

What orchestrator to use for your ETL jobs?

In today’s data-driven world, organizations face ever-increasing complexity in their data pipelines. Handling diverse data formats, sizes, sources and processing requirements can call for different types of transformation blocks, ranging from SQL to PySpark to low/no-code solutions. When complex table dependencies also enter the picture, you get high, […]

Feature Store

Anyone who has come into contact with data science has heard of the features used in its models. One aspect that can become quite challenging is reusing features consistently across team members, projects and environments. In this article, I will explain the most commonly used way to resolve these […]

Pandas, Koalas and PySpark in Python

If you landed on this page to learn more about animals, I have to disappoint you. Pandas, Koalas and PySpark are all packages that serve a similar purpose in the Python programming language. Python has steadily gained traction over the past years, as illustrated in the Stack Overflow trends. Originally designed as a general-purpose […]

Process Mining: Understanding Simple Process Discovery Techniques using Python

Hi and welcome to this blog on process mining. Process mining is a set of techniques used in the field of process management and improvement that supports the analysis of processes based on event logs. Process mining can run different algorithms on an event log to identify patterns and trends in your […]