Get Real-Time IoT Data Analytics Using Apache Kafka and Apache Spark

The Need for IoT Data Analytics

In today’s data-driven landscape, the ability to extract meaningful insights from relevant data has become essential for survival. And we don’t mean just for data analytics companies or technology firms. 

A vast majority of us interact with some form of IoT device or application, sharing, communicating, and functioning as a cluster of interconnected lives. In fact, without access to the ‘right information at the right time,’ we’d have entire countries, large business enterprises, and even our own individual lives spiraling out of control.

The universe of IoT data is huge, and frankly, its sheer volume can be daunting. But it is also bursting with potential to transform our lives for the better if leveraged correctly. In this article, we’ll try to demystify the world of IoT analytics and show you how you can tap into real-time IoT data insights using Apache Kafka and Apache Spark.

Tap into the MQTT Telemetry Flow

One of the most essential concepts in IoT analytics is the configuration of data reception points for different devices. Based on the nature of the endpoints, there are multiple data transportation mechanisms that can be used, and MQTT (Message Queuing Telemetry Transport) is one of the most popular protocols in use today.

MQTT is a lightweight, efficient protocol that enables devices to publish their telemetry data. To move forward with our case, we’ll use MQTT as the source of our IoT data.

So, step one is to create MQTT connectors. These are essentially MQTT clients that subscribe to the specific topics where the IoT devices publish their data. By creating MQTT connectors, we establish real-time access to the telemetry data flowing through the MQTT broker. The connectors usually do not perform analytics themselves; their job is to cleanse the data and push it to downstream systems such as Kafka, where it is processed and analyzed.
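To make this concrete, here is a minimal sketch of such a connector in Scala, using the Eclipse Paho MQTT client and a Kafka producer. The broker URLs, topic names, and the trivial cleansing step are assumptions made purely for illustration.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
    import org.eclipse.paho.client.mqttv3.{IMqttDeliveryToken, MqttCallback, MqttClient, MqttMessage}

    object MqttToKafkaConnector {
      def main(args: Array[String]): Unit = {
        // Kafka producer configuration (broker address is an assumption)
        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092")
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        val producer = new KafkaProducer[String, String](props)

        // MQTT client that subscribes to the device telemetry topic (broker URL and topic are assumptions)
        val mqttClient = new MqttClient("tcp://localhost:1883", MqttClient.generateClientId())
        mqttClient.setCallback(new MqttCallback {
          override def connectionLost(cause: Throwable): Unit = ()
          override def deliveryComplete(token: IMqttDeliveryToken): Unit = ()
          override def messageArrived(topic: String, message: MqttMessage): Unit = {
            // Minimal cleansing: trim the payload, then push it to Kafka for downstream processing
            val payload = new String(message.getPayload).trim
            producer.send(new ProducerRecord[String, String]("iot-telemetry", payload))
          }
        })
        mqttClient.connect()
        mqttClient.subscribe("devices/+/telemetry")
      }
    }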

How to Use Apache Kafka to Manage the Data Flow

The next step is to use a distributed messaging infrastructure like Apache Kafka to queue up the cleansed, transformed data for real-time analytics. The reasons for preferring Kafka are two-fold. One, it is powerful enough to handle the high volume of IoT data and ensure a fault-tolerant data flow. Two, Kafka topics can be partitioned, enabling distributed and parallel processing of incoming data and enhancing the performance of our IoT analytics pipeline.

Unlike traditional messaging systems, Kafka supports multiple subscribers and automatically rebalances the data flow to subscribers during failures. In our case, Kafka acts as an intermediary between the MQTT connectors and the IoT analytics platform. Not only does it guarantee seamless data ingestion, but its inherent distributed design also makes it very easy to scale.
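As a small illustration, a partitioned topic for the telemetry stream can be created programmatically through Kafka's AdminClient; the topic name, partition count, and broker address below are assumptions.

    import java.util.{Collections, Properties}
    import org.apache.kafka.clients.admin.{AdminClient, AdminClientConfig, NewTopic}

    object CreateTelemetryTopic {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        // Broker address is an assumption; point this at your own Kafka cluster
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
        val admin = AdminClient.create(props)

        // Four partitions let up to four consumers in the same group read the stream in parallel
        val topic = new NewTopic("iot-telemetry", 4, 1.toShort)
        admin.createTopics(Collections.singletonList(topic)).all().get()
        admin.close()
      }
    }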

Plug into Apache Spark for Real-Time Data Magic

Now that we’ve established a smooth flow of data from devices to MQTT to Kafka, it’s time to add our main analytical engine to the flow. Our first choice for live stream data processing is Apache Spark. Not only does it connect beautifully with Kafka, but it is also known for its continuous data processing capabilities.

In this example, we will integrate with Spark Streaming, which provides an API for processing real-time streaming data and is just one of the many libraries Spark offers for parallel data analysis. This integration enables us to perform stream analytics on the data coming from Kafka topics, or to push it down to a database or file system for further processing.

Here, once the connection is established, Spark Streaming receives the cleansed and transformed data and makes it available in DStream format. These DStreams are then pushed to file systems and databases so that we can perform complex analytics and derive valuable insights.

The Code Snippet

It is now time to look into the code. 

We have used Spark 3.0.1, and the code samples below are written in Scala.

We have used Maven for dependency management and builds. Before getting into the code, below are the dependencies used in the pom.xml file.
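A representative set of dependency entries for Spark 3.0.1 with Scala 2.12 is sketched below; the artifact versions are assumptions and should be aligned with your own environment.

    <dependencies>
      <!-- Spark Core -->
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>3.0.1</version>
      </dependency>
      <!-- Spark Streaming -->
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.12</artifactId>
        <version>3.0.1</version>
      </dependency>
      <!-- Spark SQL -->
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>3.0.1</version>
      </dependency>
      <!-- Spark Streaming integration for Kafka -->
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka-0-10_2.12</artifactId>
        <version>3.0.1</version>
      </dependency>
      <!-- Kafka client library -->
      <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>2.4.1</version>
      </dependency>
      <!-- Hadoop client, used when writing to HDFS -->
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.7.4</version>
      </dependency>
    </dependencies>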

This pom.xml covers the dependencies for Spark Core, Spark Streaming, Spark SQL, Kafka, and Hadoop.

Additional dependencies can be included based on project requirements.

Let’s start a sample program with the main method.
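A minimal sketch of this program is shown below; the object name and application name are placeholders, and the placeholder comment marks where the Kafka-specific code from the next section goes.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object IotStreamProcessor {
      def main(args: Array[String]): Unit = {
        // The Kafka topic name is passed as a program argument, so the job can attach to any topic
        val kafkaTopic = args(0)

        // Batch interval: every 10 seconds the data received so far is processed as one DStream batch
        val streamInterval = Seconds(10)

        // Spark configuration; local[*] runs the job on the local machine using all available cores
        val sparkConf = new SparkConf()
          .setAppName("IotStreamProcessor")
          .setMaster("local[*]")

        // Everything that runs in the program is based on this Spark context
        val sc = new SparkContext(sparkConf)

        // The StreamingContext initiates Spark Streaming with the 10-second batch interval
        val ssc = new StreamingContext(sc, streamInterval)

        // ... Kafka connection and per-batch processing go here (see the next snippet) ...

        ssc.start()
        ssc.awaitTermination()
      }
    }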

Here is an explanation of what the above code does.

  • The main method accepts the Kafka topic name as a parameter so that the program can connect to the topic dynamically when it starts.
  • The Spark Streaming batch interval variable is initialized to 10 seconds (val streamInterval = Seconds(10)). Spark Streaming works as a continuous series of DStream batches, so every 10 seconds Spark processes the data received up to that point as one DStream.
  • Next, it creates the Spark config object; the call .setMaster("local[*]") runs the job on the local machine. A SparkContext is then created from this config object, and everything that runs in the Spark program is based on this context.
  • Finally, a StreamingContext object is created to initiate Spark Streaming, with the streaming interval set to 10 seconds.

The next step is to establish the Kafka connection and transform the JSON data in the DStream into a DataFrame.
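Continuing the sketch, the following would slot in at the placeholder comment in the main method above; the broker address and consumer group ID are assumptions, and the imports belong at the top of the file.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

    // Kafka connection settings; the bootstrap server address and group ID are assumptions
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "iot-analytics-group",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Subscribe to the topic passed in on the command line; messages arrive as a DStream
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Array(kafkaTopic), kafkaParams)
    )

    // For each 10-second batch, parse the JSON payloads into a DataFrame
    stream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        val spark = SparkSession.builder().config(rdd.sparkContext.getConf).getOrCreate()
        import spark.implicits._
        val streamDF = spark.read.json(rdd.map(_.value()).toDS())
        streamDF.show()
      }
    }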

  • A kafkaParams object is defined with the necessary configuration to connect to the Kafka server. Here we specify the Kafka bootstrap server address, the consumer group ID, and the key and value deserializer classes.
  • The KafkaUtils.createDirectStream method is called to subscribe to our Kafka topic. This method collects the messages received on the topic and returns them as a DStream object.
  • The foreachRDD method is used to read each batch of the DStream and perform the necessary transformations on the data.

Here, the data in each RDD is in JSON format, and it is transformed into a DataFrame that matches the structure of the JSON. streamDF.show() will display the current RDD data as a DataFrame.

Once the input JSON is transformed into a DataFrame, it can be used for further analysis or saved into a file or database. There are multiple options for processing the data further, and the choice depends on the particular analytical requirement.

For our example, we are sharing the below code sample to show how to save a DataFrame into HDFS storage in Parquet format. Alternatively, the file can be uploaded to cloud storage such as AWS S3, Azure Data Lake Storage, and the like.
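A sketch of that write, assuming a hypothetical HDFS path, would be a single statement inside the foreachRDD block:

    // Append the current batch to a Parquet dataset on HDFS (the path is an assumption)
    streamDF.write
      .mode("append")
      .parquet("hdfs://namenode:9000/iot/telemetry")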

Below is the sample code to save the DataFrame into a PostgreSQL table.
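A sketch of that JDBC write, with assumed connection details and table name (this also requires the PostgreSQL JDBC driver on the classpath):

    // Write the current batch to a PostgreSQL table over JDBC (URL, credentials, and table are assumptions)
    streamDF.write
      .format("jdbc")
      .option("url", "jdbc:postgresql://localhost:5432/iotdb")
      .option("dbtable", "telemetry")
      .option("user", "postgres")
      .option("password", "postgres")
      .option("driver", "org.postgresql.Driver")
      .mode("append")
      .save()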