KaniniPro – Page 4

spark

Unique key in spark DataFrame

Published by Arulraj Gopal on July 21, 2025

Creating a unique key within a data pipeline is essential for reliably identifying individual records, especially in scenarios where the source dataset lacks a natural primary key and where record traceability is required in later stages of processing. In distributed processing frameworks like Apache Spark, which operate in-memory and leverage…
spark

Different ways of removing duplicates in spark

Published by Arulraj Gopal on July 13, 2025

Removing duplicates is essential in any data processing system, and like other systems, Spark offers several good ways to do it. We will look at the different ways of removing duplicates in Spark and where each one applies. Distinct & Drop duplicates. Distinct and drop duplicates are the most common ways and…
spark

Union vs UnionAll in spark

Published by Arulraj Gopal on July 5, 2025

Unlike in traditional SQL databases, the difference between union and unionAll in Spark is unusual and not very intuitive. Below is the exercise: two DataFrames are created with some duplicate values. Ideally, as in any traditional database, union removes the duplicates from both datasets (i.e., tables) and returns only unique…
Data Projects

Stock Price Streaming using Apache Kafka

Published by Arulraj Gopal on July 17, 2024

In today’s fast-paced and highly volatile financial markets, having access to real-time stock quotes is crucial for making informed and precise decisions. Traditional methods of obtaining stock quotes often involve delays, which can lead to missed opportunities. This project aims to develop a robust and scalable pipeline that ingests live…
Basics of Computers

What is Machine Language?

Published by Arulraj Gopal on July 12, 2024

Is it something that machines speak? Are C, C++, and Java machine languages? Machine language is the language that a machine understands. So, what does a machine understand? Obviously, it is 0 and 1. Machines understand only digital values. So, if we need to interact with a machine, the only way to communicate is…
