Posts

Showing posts from March, 2019

Day 70 Part 2: Getting MVP ready for ML part

To-do for this week:
> Getting a basic ML DDoS detection algorithm up and running

The features extracted so far:
1. Total Inflow Packets
2. Average Number of inflow packets (Total/Number of flows)
3. Total Number of flows
4. Total Packet length
5. Average packet length (Total/Number of packets)

Machine learning algorithm in focus: K-means clustering. How will the ML algorithm fit into the project?
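For readers wondering what the K-means step could look like, here is a minimal sketch (assuming scikit-learn; the feature ordering and the sample numbers are made up purely for illustration, not real measurements):

# Minimal sketch: cluster one feature vector per observation window.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([
    # [total_inflow_pkts, avg_inflow_pkts, total_flows, total_pkt_len, avg_pkt_len]
    [1200, 40.0, 30, 96000, 80.0],        # benign-looking window (made up)
    [90000, 4500.0, 20, 5400000, 60.0],   # attack-like window (made up)
    [1500, 50.0, 30, 120000, 80.0],       # benign-looking window (made up)
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # which cluster each window fell into
print(kmeans.cluster_centers_)  # centroids; the one with much higher packet
                                # counts would correspond to attack traffic

In practice the features would need to be scaled (they live on very different ranges), and the choice of two clusters is itself something to validate.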

Day 70: Demo

The video below shows the DoS attack detection, the syncing between controllers, and how I have decided to mitigate these attacks. The video doesn't show the mitigation process correctly due to a few recording issues; I shall upload another video covering only the mitigation process shortly. Refer to the previous and next posts here.
Author: Shravanya
Co-author: Swati

Day 68: Syncing between master and slave controllers Part 2

We decided in our previous post that we need to keep our slave controllers in sync. What is the scenario we are looking at? The master controller gets connected to all switches and contributes to installing flows in them; thus the master controller knows all the flows in all switches. The slave controllers mostly get connected only to the switches in their domain, so whenever a switch migrates from one slave to another, the second slave controller has to spend bandwidth and time learning about the flows installed in it. To avoid the overhead that exists during switch migration, we need to ensure that all controllers are in sync about which flows exist in which switches. I had initially decided to follow the approach below: since the master controller knows all flows at all times, the slave controller would contact the master controller just after switch migration. The master controller, which is aware of the switch migration, would give the necessary details of the flows in the switch for
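The excerpt cuts off before the details, but the idea can be sketched roughly as follows (the REST endpoint, addresses and field names are hypothetical; this is not the project's actual code):

# Hypothetical sketch of the sync idea: just after a switch migrates to this
# slave, ask the master controller for the flows it already knows for it.
import requests

MASTER_API = "http://10.0.0.1:8080"   # assumed REST endpoint on the master

def fetch_flows_from_master(dpid):
    """Return the flow entries the master has recorded for switch `dpid`."""
    resp = requests.get("%s/flows/%s" % (MASTER_API, dpid), timeout=2)
    resp.raise_for_status()
    return resp.json()   # e.g. a list of {match, actions, priority} dicts

def on_switch_migrated(dpid, install_flow):
    # install_flow(flow) would push one entry via the slave's Ryu app
    for flow in fetch_flows_from_master(dpid):
        install_flow(flow)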

Day 67: Syncing between master and slave controllers Part 1

A few assumptions stated in my previous posts turned out to be wrong. The assumptions were: (1) the flows installed in the switches are periodically refreshed or re-installed, and (2) the flows in the switch are retained only as long as it is connected to the controller. After experimenting with the switches, trying to analyze whether any timeouts exist on the flows and whether the switch configurations are set properly, I realized the following: the flows, once installed in the switch, remain in it until the switch is turned off, and the flows remain in the switch even when the controller port is not connected to any controller. So, why did I wrongly make the former assumptions? It so happens that my Zodiac FX switches (not all of them, a few of them) automatically switch off and back on sometimes. And out of these, only a few times do the switches retain all flows; other times they lose some or all flows. This appears as an error on my controller: DPSET: Multiple connections from <switch
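For anyone who wants to repeat the experiment, here is a small sketch (OpenFlow 1.3 assumed; not my exact test code) of how the flow table can be dumped from a Ryu app to check whether flows survive a switch restart or a controller disconnect:

from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import MAIN_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_3

class FlowDumper(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPStateChange, MAIN_DISPATCHER)
    def _on_switch_connected(self, ev):
        dp = ev.datapath
        # Ask the switch for every flow currently installed in it
        dp.send_msg(dp.ofproto_parser.OFPFlowStatsRequest(dp))

    @set_ev_cls(ofp_event.EventOFPFlowStatsReply, MAIN_DISPATCHER)
    def _on_flow_stats(self, ev):
        for stat in ev.msg.body:
            self.logger.info("dpid=%016x priority=%d match=%s",
                             ev.msg.datapath.id, stat.priority, stat.match)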

Day 65: DoS detection from Ryu controller

Working on from where we left off on Day 63, we have to start writing the code to detect the DoS attack. I shall be considering only one feature: the bitrate of the packets that go in and out of the switch ports. This has been sufficient for a basic thresholding mechanism for identifying DoS attacks. Based on my observations, I have set a bitrate threshold of 200 for identifying a DoS attack. This is a slightly higher threshold considering the average bitrate numbers. The reason I have set a higher bitrate is to avoid false positives. At the same time, I am fine with a few DoS attacks getting to my controller side, as the controllers are designed to sustain the attacks in a round-robin sort of fashion. As of now, I have let my DoS attack run for 5 - 10 minutes without affecting the functionality of my controllers. The ML algorithm being built would take care of more critical details and better identification of DoS attacks. The one built on the Ryu contr
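A minimal sketch of this thresholding idea, assuming the port counters are polled periodically (the helper names are illustrative, and the exact unit of the 200 threshold depends on how the monitoring code reports the counters):

POLL_INTERVAL = 10          # seconds between port-stats polls
BITRATE_THRESHOLD = 200     # threshold used for flagging a DoS attack

last_bytes = {}             # (dpid, port_no) -> byte count at previous poll

def check_dos(dpid, port_no, tx_bytes, rx_bytes):
    """Return True if the bitrate on this port since the last poll looks like a DoS."""
    key = (dpid, port_no)
    prev = last_bytes.get(key, 0)
    curr = tx_bytes + rx_bytes
    last_bytes[key] = curr
    bitrate = (curr - prev) * 8.0 / POLL_INTERVAL   # bits per second since last poll
    if bitrate > BITRATE_THRESHOLD:
        print("Possible DoS: dpid=%s port=%s bitrate=%.1f" % (dpid, port_no, bitrate))
        return True
    return False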

Day 63: Building the Ryu code for DoS detection

As explained in Day 59's post, we shall be continuing the project by detecting DoS attacks on the controller by the switch. We can monitor all switches from the controller and keep observing the bandwidth usage and bitrates of OpenFlow communications between switch and controller. The monitoring happens completely from the controller side. Each controller, irrespective of whether it is responsible for installing flows in a particular switch, can monitor it. A new TCP connection is established from the Ryu controller to the switch. This TCP connection is first used to get all switch statistics, including flows. Then it monitors the packets that flow in and out of a switch. This is the strategy behind monitoring. Once monitoring is done, we will concentrate on DoS attack detection. We shall do this in the next post. I've borrowed the monitoring code from the Ryu controller GitHub page itself. I found this link helpful in understanding how to go about integrating the code.
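The monitoring pattern, in the spirit of Ryu's simple_monitor example (a trimmed-down sketch assuming OpenFlow 1.3, not the exact code I integrated): a green thread polls every connected datapath for port statistics.

from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import MAIN_DISPATCHER, DEAD_DISPATCHER, set_ev_cls
from ryu.lib import hub
from ryu.ofproto import ofproto_v1_3

class SimpleMonitor(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    def __init__(self, *args, **kwargs):
        super(SimpleMonitor, self).__init__(*args, **kwargs)
        self.datapaths = {}
        self.monitor_thread = hub.spawn(self._monitor)

    @set_ev_cls(ofp_event.EventOFPStateChange, [MAIN_DISPATCHER, DEAD_DISPATCHER])
    def _state_change_handler(self, ev):
        dp = ev.datapath
        if ev.state == MAIN_DISPATCHER:
            self.datapaths[dp.id] = dp        # switch connected
        elif ev.state == DEAD_DISPATCHER:
            self.datapaths.pop(dp.id, None)   # switch disconnected

    def _monitor(self):
        while True:
            for dp in self.datapaths.values():
                # Request counters for every port on the switch
                dp.send_msg(dp.ofproto_parser.OFPPortStatsRequest(
                    dp, 0, dp.ofproto.OFPP_ANY))
            hub.sleep(10)

    @set_ev_cls(ofp_event.EventOFPPortStatsReply, MAIN_DISPATCHER)
    def _port_stats_reply_handler(self, ev):
        for stat in ev.msg.body:
            self.logger.info("dpid=%s port=%s rx_bytes=%s tx_bytes=%s",
                             ev.msg.datapath.id, stat.port_no,
                             stat.rx_bytes, stat.tx_bytes)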

Day 62: Building the Storm topology

Hey guys! I finally finished building the basic Storm topology for extracting the following features. It took me 2.5 weeks to integrate Kafka and to build the topology. The features extracted so far:
1. Total Inflow Packets
2. Average Number of inflow packets (Total/Number of flows)
3. Total Number of flows
4. Total Packet length
5. Average packet length (Total/Number of packets)
What next? I want to set up an ML model for analyzing the features in real time. What are the hiccups along the way?
1. Are the input features enough? What more can be extracted?
2. Outflow features can't be utilized directly since there is a cyclic dependency with the outflow features.
3. Which ML model to build such that it doesn't slow down the decision process? Should it be supervised or unsupervised?
4. How to improve accuracy once the model is built?
I am reading up on existing real-time ML models for networks. I shall be updating the solution soon on the bl
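To make the five features concrete, here is a plain-Python sketch of how they could be computed over one window of inflow packet records (the record format is an assumption for this sketch; in the project this work is done by the Storm bolts):

def extract_features(records):
    """records: list of (flow_id, packet_length) tuples for one time window."""
    total_packets = len(records)
    flows = set(flow_id for flow_id, _ in records)
    total_length = sum(length for _, length in records)
    return {
        "total_inflow_packets": total_packets,
        "avg_inflow_packets": total_packets / float(len(flows)) if flows else 0.0,
        "total_flows": len(flows),
        "total_packet_length": total_length,
        "avg_packet_length": total_length / float(total_packets) if total_packets else 0.0,
    }

print(extract_features([("f1", 60), ("f1", 1500), ("f2", 60)]))
# {'total_inflow_packets': 3, 'avg_inflow_packets': 1.5, 'total_flows': 2, ...}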

Day 59: What next!

The features built into the architecture so far:
- high sustainability of controllers when a DoS attack occurs, due to the switchover mechanism (high availability)
- avoiding single points of failure in the load balancer and master controller
- dynamic learning of new flows
- scalability
- synchronized controllers
Thus, in case any DoS attack happens from switch to controller, the distributed controller architecture can sustain this attack for a very long time. This was the purpose of the architecture from when we conceived the idea of building it. What other features should we implement?
- an ML server for detection and decisions regarding malicious packets (watch out for Swati's blogs regarding this)
- this ML server must receive packets from the load balancer using a pub-sub model like the Kafka and Zookeeper architecture (a rough producer sketch is shown below)
What other features can make our already existing architecture better? As a further enhancement to the architectural design, I have intended to add a component to detect
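As a rough illustration of the pub-sub idea referenced above (kafka-python assumed; the broker address and topic name are placeholders, not the project's actual values):

import json
from kafka import KafkaProducer   # kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",              # placeholder broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"))

def publish_packet_summary(summary):
    # summary is a small dict describing one captured packet,
    # e.g. {"src": "10.0.0.5", "dst": "10.0.0.1", "length": 60}
    producer.send("openflow-packets", summary)       # placeholder topic name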

Day 57: Scaling up the network

The controller architecture is now almost complete. It is highly available, since the switch-over mechanism takes care of the longevity of the running controllers even under a DoS attack. I tried to drop a few packets in the load balancer based on the threshold decided in Day 45's post. This would only mean dropping even legitimate packets and would increase the false positives. Thus I have decided not to pursue a thresholding mechanism before contacting the ML server. Suppose I decide to drop a few packets before contacting the ML server: the data sent to the ML server would be incomplete and thus incorrect. Also, if a threshold is applied even before sending the data to the ML server, there will be some bias introduced in the dataset which the ML model wouldn't be aware of. Thus, I am dropping the idea of a threshold from the implementation of a secure distributed SDN controller architecture. The only important phase left in the project is to check for scalability. If the

Day 55: Back-up master controller

From the previous posts we can see that we have achieved a huge amount of the architecture we had set out to build! There was one scenario in the last article which we have not attended to yet: what if the master controller (which connects to all switches at some point or the other) faces a single point of failure? This is completely probable, as the master controller is connected to all switches and installs flows in all of them, even the ones that cause DoS attacks on the controllers. Thus, we need to protect the master controller from a single point of failure. We shall use the same approach as used in Day 45 to protect the load balancer from a single point of failure. Thus, we are going to need an additional computer in our network architecture that can act as a back-up master controller. We are going to use the keepalived tool for the same. My GitHub page has the code. You could refer to Day 45's code and experiment yourself regarding what changes to be made to the code t

Day 53: Switch-over mechanism implementation on Load balancer

Below is the newly implemented configuration for switch-over and synchronization between controllers:

global
    log /dev/log    local0
    log /dev/log    local1 notice
    chroot /var/lib/haproxy
    stats socket /var/run/haproxy.stat mode 600 level admin
    stats timeout 2m
    user haproxy
    group haproxy
    daemon

    # Default SSL material locations
    ca-base /etc/ssl/certs
    crt-base /etc/ssl/private

    # Default ciphers to use on SSL-enabled listening sockets.
    # For more information, see ciphers(1SSL). This list is from:
    #  https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
    # An alternative list with additional directives can be obtained from
    #  https://mozilla.github.io/server-side-tls/ssl-config-generator/?server=haproxy
    ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
    ssl-default-bind-options no-sslv3

defaults
    log    global
    mode    http
    option
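The excerpt above is cut off before the frontend/backend sections. Purely as an illustration of what a TCP switch-over shape could look like in HAProxy (server names, addresses and limits are placeholders, not my actual configuration):

# Illustrative only: OpenFlow traffic proxied over TCP, with c1/c2 in
# rotation and a backup server that takes over when both are down.
frontend openflow_in
    bind *:6633
    mode tcp
    default_backend slave_controllers

backend slave_controllers
    mode tcp
    balance roundrobin
    server c1 192.168.1.11:6633 check maxconn 50
    server c2 192.168.1.12:6633 check maxconn 50
    server master 192.168.1.10:6633 check backup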

Day 52: Switchover mechanism vs tcpliveplay

In Day 50's article we familiarized ourselves with tcpliveplay and how to use it to redirect OpenFlow packets to the master controller. Yesterday, I tried to create TCP connections from the master controller to the switch from which the duplicated packets reached the master controller. These are a few findings that made me rethink my tcpliveplay approach:
1. A Zodiac FX switch cannot forward packets to the required port if it is not connected to any controller. This is irrespective of whether the flows are already installed in it or not.
2. Whenever a Zodiac FX switch contacts any controller, it informs the controller of all the flows it already has installed in it. Only new flows get installed from then on. Thus the controller automatically becomes aware of the previously installed flows. This does not take a perceivable amount of time.
3. The flows installed in the FX switch keep getting updated, and this is the reason the FX switch needs to be connected to a controller at all times. F

Day 50: Tcpreplay and tcpliveplay approach

Today we shall be exploring how to copy and redirect the OpenFlow packets to more than one controller from the load balancer. The load balancer is anyway sending the OpenFlow packets to one of the slave controllers, c1 or c2. The packets sent to these two controllers also need to be sent to the master controller so that the master controller can update itself with the new flows added by c1 or c2 on the switches. This ensures synchronization between controllers. There are two implementation methods to achieve this:
1. Copy all packets generated from the load balancer to the controller side. Only the packets having a destination IP address of c1 or c2 need to be redirected to the master controller. Thus we can prevent the packets which already reach the master controller from being sent as duplicates.
2. Copy and redirect packets from the slave controllers, c1 and c2. We need not take care of duplicate packets reaching the master controller. This will reduce the time taken to redirect packets.
Since the second a

Day 49: Kafka and Zookeeper vs tcpreplay

So far we've built the basic architecture required. Now, we need to take care of how the switch can talk not only to the slave (hierarchy 2) controller but also to the master (hierarchy 1) controller. The approaches discussed in the previous post include:
1. A pub-sub model, i.e. Kafka and Zookeeper
2. The tcpreplay tool to record and replay TCP packets to the master controller
Day 48 was spent completely in trying out Kafka and Zookeeper and analyzing how good a choice it would be for our purpose. The Kafka + Zookeeper combo acts as a messaging tool. The packets seen on the load balancer or on any slave controller can definitely reach the master controller, but they would arrive on the Kafka port of the master controller, not the Ryu port. They would then need to be processed through Storm for further analysis. Another option could be redirecting the traffic from the Kafka port to the Ryu port 6633. This seems like too much of an overhead. Thus, I am dropping th

Day 47: Way Forward - Phase 3 of architecture implementation

Coming back to the architecture we have been building, let us look at what has been completed:
- The switch can communicate with the controller for installing flows
- This communication happens over a load balancer
- The load balancer takes care of the failover mechanism
- The load balancer also takes care of the switch-controller mapping
- The load balancer is protected from a single point of failure through keepalived
- The controllers are protected from crashing by limiting the number of connections they can handle; this is also the responsibility of the load balancer (could be fine-tuned)
So, the major chunk of work left to complete building a highly available, reliable and secure SDN controller architecture is as follows:
- We need to take care of assigning the master controller hierarchy 1 by forwarding to it all packets that enter the controller network
- We need to protect the master controller from a single point of failure
Strategies to achieve the same: today, I looked at a publisher-subscriber model which

Day 46: Storming the network party!

Hi Folks! Let's sidestep from building the SDN architecture and explore some ways to process the streaming network packets. Yes! This post focuses on building a streaming preprocessor that will feed data into the machine learning component. Let's revisit the real-time architecture we were building for a quick recap. We have completed the first module, i.e. building a Kafka pub-sub model to stream real-time network packets into a topic. The topic holds all of the raw data. In order to process it, we need to:
1. Build a consumer that will subscribe to the topic and consume the data (see the sketch at the end of this excerpt)
2. Process the data to get the required features
Let's start with understanding the basics. What is Storm? Storm is a distributed, reliable, fault-tolerant system for processing streams of data. The work is delegated to different types of components that are each responsible for a simple, specific processing task. The input stream of a Storm cluster is handled by a component calle
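As referenced above, here is a minimal consumer sketch (kafka-python assumed; the topic name is a placeholder; in the project itself the consuming is done by the Storm topology):

import json
from kafka import KafkaConsumer   # kafka-python

consumer = KafkaConsumer(
    "openflow-packets",                      # placeholder topic name
    bootstrap_servers="localhost:9092",      # placeholder broker
    value_deserializer=lambda v: json.loads(v.decode("utf-8")))

window = []
for msg in consumer:
    window.append(msg.value)
    if len(window) >= 1000:                  # process in fixed-size windows
        # hand the window to the feature-extraction stage (Storm bolts in the project)
        print("window ready: %d packets" % len(window))
        window = []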

Day 45: I managed to keep it alived!!!

In one of the previous articles, I had tried to make keepalived work on my load balancer and terribly failed at it. Once I implemented HAProxy successfully, I realized what the mistake was. In my previous few configurations, my HAProxy was not working properly and thus not receiving any packets. Since my keepalived was dependent on my HAProxy, it too failed to run successfully. These are the steps to follow to implement keepalived successfully:

> sudo apt-get install keepalived
> sudo gedit /etc/keepalived/keepalived.conf

Then type the below configuration settings in the open file (in the master load balancer):

global_defs {
    # Keepalived process identifier
    lvs_id haproxy_DH
}

# Script used to check if HAProxy is running
vrrp_script check_haproxy {
    script "killall -0 haproxy"
    interval 2
    weight 2
}

# Virtual interface
# The priority specifies the order in which the assigned interface to take over in a failover
vrrp_instan