30 Jun 25 · Mark · Postgres Hidden Features That Make MongoDB Completely Obsolete (From an Ex-NoSQL Evangelist) · PostgreSQL had evolved into a database that could do everything MongoDB could, and it…
5 Jun 25 · Mark · Optimizing Spark Aggregations: How We Slashed Runtime from 4 Hours to 40 Minutes by Fixing GroupBy Slowness & Avoiding Spark's Expand Operator · Handling massive datasets efficiently is critical in big data processing, but it’s not uncommon to…
3 May 25 · Mark · Kafka without ZooKeeper: Building Single- and Multi-Broker Clusters in KRaft Mode · In my previous story, we covered the basics of Apache Kafka. In this story,…
26 Apr 25 · Mark · What Is the Future of Apache Spark in Big Data Analytics? · Started in 2009 as a research project at UC Berkeley, Apache Spark transformed how data…
24 Apr 25 · Mark · The End of Docker? The Reasons Behind Developers Changing Their Runtimes · Docker once led the container revolution, but times have changed. Developers are embracing faster, leaner, and…
8 Apr 25 · Mark · Real-Time Use Case: Fraud Detection in Financial Transactions with Kafka and Spark Streaming · Let’s face it: fraud is a big problem for financial institutions. With a staggering…
4 Apr 25 · Mark · Top 10 Java Optimization Techniques for High-Performance Code · Java applications can become slow due to inefficient memory usage, unnecessary object creation, or poor…
4 Apr 25 · Mark · Is Apache Spark Really Dying? Let’s Talk · The world of data engineering moves fast. Every few months, a new tool emerges, claiming…
31 Mar 25 · Mark · Handling Large Data Volumes (100 GB–1 TB) in PySpark: Best Practices & Optimizations · Processing large datasets efficiently is critical for modern data-driven businesses, whether for analytics, machine learning,…
27 Mar 25 · Mark · Apache Spark & Airflow in Docker: A Step-by-Step Guide · Want to understand the nuances of setting up Apache Spark and Airflow? Or…