
Showing posts from May, 2019

Machine Learning component part 1

The design choices to be made while building the machine learning server were:

1. Train the model in real time.
2. Train the model in batches beforehand and predict in real time.
3. Keep samples of data for training; train and predict in real time.

The factors that help decide which approach to use are:

1. Volume of training data: how much time does it take to train the model?
   If training takes too long, there is no point in building the model in real time. We would need to build the model a priori, load it at runtime and use the real-time streams as test flows. The machine learning algorithm chosen in this project takes little time to train (~2-5 seconds), hence we can afford to learn, train and predict in real time.
2. Criticality of accuracy: how much error is reasonable?
   If the accuracy obtained is greater than 85%, it's reasonable to go ahead and predict using the model. If not, we will need ...
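A minimal sketch of how the two factors above could drive the choice, assuming each can be reduced to a single measurement: training time in seconds and accuracy as a fraction. The 85% bar comes from the post; the 60-second cutoff for "too long to train in real time" and the names `choose_strategy` and `Decision` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

# The 85% accuracy bar is from the post. The 60-second cutoff for
# "too long to train in real time" is a hypothetical assumption.
ACCURACY_THRESHOLD = 0.85
MAX_REALTIME_TRAIN_SECONDS = 60.0


@dataclass
class Decision:
    strategy: str
    use_predictions: bool


def choose_strategy(train_seconds: float, accuracy: float,
                    max_train_seconds: float = MAX_REALTIME_TRAIN_SECONDS,
                    min_accuracy: float = ACCURACY_THRESHOLD) -> Decision:
    """Pick a training strategy from measured training time and accuracy."""
    if train_seconds <= max_train_seconds:
        # Training is cheap (e.g. the ~2-5 s reported in the post),
        # so we can learn, train and predict on the live stream.
        strategy = "train and predict in real time"
    else:
        # Training is too slow for the stream: build the model a priori,
        # load it at runtime and use the live stream only as test flows.
        strategy = "train in batches beforehand, predict in real time"
    # Only act on predictions if accuracy is strictly greater than the bar.
    return Decision(strategy, accuracy > min_accuracy)
```

For example, `choose_strategy(3.0, 0.90)` selects real-time training and allows predictions, while a model that needs ten minutes to train falls back to batch training.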

Setting up Kafka bolt

Once the Storm topology was set up, there were two ways in which machine learning could be incorporated into the project:

1. ML inside the Storm topology
2. ML outside the Storm topology

In order to choose the design for implementation, the factors considered were:

1. Scalability of components
   If the Storm component is separated from the machine learning component, each component can be scaled independently. This gives us more flexibility in programming each component.
2. Ease of debugging
   Errors that crop up in each individual component are raised and identified more easily when the components are disaggregated. We can provide workarounds for bugs and fix them more efficiently in the suggested setup.
3. Options and research scope that can be explored in each choice
   Machine learning in Storm has restrictions of its own. The types ...
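To illustrate why keeping ML outside the topology helps with scaling and debugging, here is a small sketch in which a `queue.Queue` stands in for the Kafka topic between a Storm bolt and a standalone ML component. This is an assumption for illustration only: `storm_side`, `ml_side`, `run_pipeline` and the `toy_predict` rule are hypothetical, and a real deployment would use a Kafka client and a trained model instead.

```python
import queue
import threading
from typing import Callable, Dict, List

SENTINEL = None  # marks the end of the stream for the consumer


def storm_side(records: List[Dict], topic: queue.Queue) -> None:
    """Stand-in for the Kafka bolt: publish each processed tuple to the topic."""
    for record in records:
        topic.put(record)
    topic.put(SENTINEL)


def ml_side(topic: queue.Queue, predict: Callable[[Dict], str],
            results: List[str]) -> None:
    """Independent ML component: consume records and predict on each one."""
    while True:
        record = topic.get()
        if record is SENTINEL:
            break
        results.append(predict(record))


def toy_predict(record: Dict) -> str:
    """Hypothetical placeholder rule standing in for a trained model."""
    return "anomaly" if record["bytes"] > 1000 else "normal"


def run_pipeline(records: List[Dict], predict: Callable[[Dict], str]) -> List[str]:
    topic: queue.Queue = queue.Queue()
    results: List[str] = []
    # The ML side runs in its own thread, decoupled from the producer.
    consumer = threading.Thread(target=ml_side, args=(topic, predict, results))
    consumer.start()
    storm_side(records, topic)
    consumer.join()
    return results
```

Because the two sides share only the topic, either one can be restarted, debugged or scaled on its own. Running several consumers would require one sentinel per consumer.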