Tag: spark

What orchestrator to use for your ETL jobs?

In today’s data-driven world, organizations face the ever-increasing complexity of data pipelines. Handling diverse data formats, sizes, sources and processing requirements can call for different types of transformation blocks, ranging from SQL to PySpark to low/no-code solutions. When complex table dependencies also enter the picture, you get high, […]

Kimball in a data lake? Come again?

Most companies are already familiar with data modelling (be it Kimball or another modelling technique) and data warehousing with a classical ETL (Extract-Transform-Load) flow. In the age of big data, an increasing number of companies are moving towards a data lake, using Spark to store massive amounts of data. However, we often see that […]

Transfer learning in Spark for image recognition

Transfer learning in Spark, demystified in less than 3 minutes of reading. Businesses that want to classify a huge set of images in a daily batch can do so by leveraging the parallel processing power of PySpark and the accuracy of models trained on huge image sets through transfer learning. Let’s first explain the […]

Managed Big Data: DataBricks, Spark as a Service

The title accompanying this blog post is quite a mouthful. This blog post explains why you should be using Spark. If a use case makes sense, we will then introduce you to the DataBricks product, which is available on Azure. Being recognised as a Leader in the Magic Quadrant emphasizes the operational […]