Day 46 : Storming the network party!

Hi Folks!

Let's step aside from building the SDN architecture and explore some ways to process streaming network packets.
Yes! This post focuses on building a streaming preprocessor that will feed data into the machine learning component. Let's revisit the real-time architecture we were building for a quick recap.


We have completed the first module, i.e., building a Kafka pub-sub model to stream real-time network packets into a topic. The topic holds all of the raw data. In order to process it, we need to:
1. Build a consumer that will subscribe to the topic and consume the data
2. Process the data to get the required features
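The two steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the in-memory list `RAW_TOPIC` stands in for the Kafka topic, the JSON message layout is made up, and the fields returned by `extract_features` are hypothetical, not the features used elsewhere in this series. A real consumer would use a Kafka client library instead of reading a list.

```python
import json
from typing import Dict, Iterator, List

# Hypothetical stand-in for the Kafka topic: an append-only list of raw messages.
RAW_TOPIC: List[bytes] = [
    json.dumps({"src": "10.0.0.1", "dst": "10.0.0.2", "proto": "TCP", "length": 60}).encode(),
    json.dumps({"src": "10.0.0.3", "dst": "10.0.0.2", "proto": "UDP", "length": 128}).encode(),
]


def consume(topic: List[bytes], offset: int = 0) -> Iterator[bytes]:
    """Step 1: yield each raw message from the topic, starting at the given offset."""
    for message in topic[offset:]:
        yield message


def extract_features(message: bytes) -> Dict[str, object]:
    """Step 2: decode one raw message and keep only the (assumed) fields of interest."""
    packet = json.loads(message.decode("utf-8"))
    return {
        "proto": packet["proto"],
        "length": packet["length"],
        "is_tcp": packet["proto"] == "TCP",
    }


features = [extract_features(m) for m in consume(RAW_TOPIC)]
for row in features:
    print(row)
```

Keeping consumption and feature extraction as separate functions mirrors the split between the data source and the processing steps described in the rest of this post.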

Let's start by understanding the basics.

What Is Storm?

Storm is a distributed, reliable, fault-tolerant system for processing streams of data. The work is delegated to different types of components, each responsible for a simple, specific processing task.
The input stream of a Storm cluster is handled by a component called a spout. The spout passes the data to a component called a bolt, which transforms it in some way.
A bolt either persists the data in some sort of storage, or passes it to some other bolt. You can imagine a Storm cluster as a chain of bolt components that each make some kind of transformation on the data exposed by the spout.
In the diagram, the tap represents the spout and the lightning represents the bolt.
Both emit data in the form of tuples.
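To make the spout-to-bolt chain concrete, here is a minimal sketch in plain Python. It does not use Storm itself: the spout and bolts are ordinary generator functions, and the names `packet_spout`, `split_bolt`, `filter_bolt` and `run_topology`, as well as the comma-separated input format, are all hypothetical choices for illustration.

```python
from typing import Callable, Iterable, Iterator, List

Stream = Iterator[tuple]


def packet_spout(lines: Iterable[str]) -> Stream:
    """Spout: expose each raw input line as a one-field tuple."""
    for line in lines:
        yield (line,)


def split_bolt(stream: Stream) -> Stream:
    """Bolt: transform a raw line into a (src, dst, length) tuple."""
    for (line,) in stream:
        src, dst, length = line.split(",")
        yield (src, dst, int(length))


def filter_bolt(stream: Stream, min_length: int = 100) -> Stream:
    """Bolt: pass along only the tuples whose length is at least min_length."""
    for src, dst, length in stream:
        if length >= min_length:
            yield (src, dst, length)


def run_topology(source: Stream, bolts: List[Callable[[Stream], Stream]]) -> List[tuple]:
    """Chain the bolts after the spout and collect what the last bolt emits."""
    stream = source
    for bolt in bolts:
        stream = bolt(stream)
    return list(stream)


raw_lines = ["10.0.0.1,10.0.0.2,60", "10.0.0.3,10.0.0.2,128"]
result = run_topology(packet_spout(raw_lines), [split_bolt, filter_bolt])
print(result)
```

Each stage only sees tuples from the stage before it, which is the same idea as a bolt passing its output to another bolt. In a real Storm cluster these stages would run as separate distributed components rather than in one process.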

Let's focus on the first component now,

Since we have a Kafka pub-sub model that needs to be integrated with the Storm topology, we need to introduce a Kafka + spout (quite literally, a KAFKA SPOUT) as the consumer of the topic. This Kafka spout will in turn act as the data source for the Storm topology.
To put it simply,
Reference: https://dzone.com/articles/storm-kafka-integration-with-configurations-and-co
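As a rough illustration of a spout acting as the consumer of a topic, the toy class below reads an in-memory log by offset and hands out one tuple at a time. The method names loosely mirror the nextTuple/ack/fail callbacks of Storm's spout interface, but `InMemoryKafkaSpout` itself is a hypothetical stand-in, not the real Kafka spout: there are no brokers, partitions or committed offsets here, and replaying a failed offset is a simplifying assumption about how reliability could work.

```python
from collections import deque
from typing import Deque, List, Optional, Set, Tuple


class InMemoryKafkaSpout:
    """Toy stand-in for a Kafka spout: reads a topic log by offset and emits tuples."""

    def __init__(self, topic_log: List[str]) -> None:
        self.topic_log = topic_log
        self.next_offset = 0                 # next unread position in the log
        self.retry: Deque[int] = deque()     # offsets that failed and must be replayed
        self.acked: Set[int] = set()         # offsets fully processed downstream

    def next_tuple(self) -> Optional[Tuple[int, str]]:
        """Emit (offset, message); failed offsets are replayed before new ones."""
        if self.retry:
            offset = self.retry.popleft()
        elif self.next_offset < len(self.topic_log):
            offset = self.next_offset
            self.next_offset += 1
        else:
            return None                      # nothing left to emit right now
        return (offset, self.topic_log[offset])

    def ack(self, offset: int) -> None:
        """Called when the topology has finished processing this offset."""
        self.acked.add(offset)

    def fail(self, offset: int) -> None:
        """Called when processing failed; schedule the offset for replay."""
        self.retry.append(offset)


spout = InMemoryKafkaSpout(["pkt-a", "pkt-b"])
first = spout.next_tuple()
spout.fail(first[0])            # pretend a bolt failed on this tuple
replayed = spout.next_tuple()   # the same offset is emitted again
spout.ack(replayed[0])
second = spout.next_tuple()
done = spout.next_tuple()
```

The point of the sketch is only the shape of the integration: the topic remains the source of truth, and the spout is the single place where the topology pulls data from it.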

The Storm cluster here forms a Storm topology.

In the next post, we shall explore what the topology will consist of, how many bolts it will contain, and what each bolt will do.

Comments

  1. Very easily understandable :)
    One question that pops to the head: what sort of transformation should be performed on the data on the bolts? Is this the component which extracts the 8 features you had mentioned in your previous post?

    Replies
    1. Yes. Bolts are primarily used for processing. Here, we could use them initially to preprocess the data and get it into the right format before feeding it into the ML algorithm.
