SVM, Decision Trees, Bayesian Methods, Random Forests, Markov Chains: Classical Machine Learning vs. Deep Learning
In this blog post, we’ll cover the three W’s (what, when and where) of SVMs (Support Vector Machines), decision trees, Bayesian methods and Markov chains in comparison with deep learning.
Recently, someone asked a very bright question (at least to me it was bright :)): “If deep learning is the future, what is the need for machine learning methods like SVM, decision trees, Bayesian methods, Markov chains, etc.?”
Well, to start off, I would like to talk about the “No Free Lunch Theorem” (see “No free lunch in search and optimization” on Wikipedia), which roughly states that no single algorithm works best in all cases. So keep in mind that these algorithms all differ from one another and were each created for different kinds of real-world problems.
For the purpose of this blog post, I’ll create two teams: one has SVM, decision trees, Bayesian methods and Markov chains, i.e. the classical machine learning algorithms, and the other is deep learning. Here is why the first team is still needed.
Interpretability – Other machine learning methods can be more interpretable and predictable than deep learning. For instance, a decision tree can often give a clear heuristic for why it does what it does.
Data and training requirements – Deep learning takes a lot of data and training to get reasonable results. Without lots of data, lots of time and plenty of tweaking, you often won’t get good results with deep learning, whereas many of the other methods listed above can get reasonable results on small data sets with limited training and little hyper-parameter tuning.
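To make the decision tree point concrete, here is a minimal sketch in plain Python. The tree is written by hand rather than learned, and the loan scenario, feature names (`income`, `debt_ratio`) and thresholds are all made up for illustration. The idea is only that every prediction comes with the list of tests that led to it.

```python
# Hypothetical hand-written tree for a loan decision. The features and
# thresholds are illustrative assumptions, not learned from any data.
def predict_with_reasons(applicant):
    """Walk a small fixed decision tree and return (decision, reasons)."""
    reasons = []
    if applicant["income"] >= 50_000:
        reasons.append("income >= 50000")
        if applicant["debt_ratio"] < 0.4:
            reasons.append("debt_ratio < 0.4")
            return "approve", reasons
        reasons.append("debt_ratio >= 0.4")
        return "review", reasons
    reasons.append("income < 50000")
    return "reject", reasons


decision, reasons = predict_with_reasons({"income": 62_000, "debt_ratio": 0.25})
print(decision, "because", " and ".join(reasons))
```

A learned tree works the same way: the path from the root to a leaf is the explanation. A deep network offers no comparably direct readout.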
Speed might be more important than accuracy – there are many applications where response time and processor time are constraints – you might be able to get lower response times for good enough accuracy with other machine learning methods.
Benchmarking – it is often useful to throw a number of different machine learning algorithms at a problem to see how difficult the problem is overall, and which particular items and classes are hard to learn.
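As a rough sketch of that benchmarking idea, the snippet below scores two very simple classifiers on the same toy data with leave-one-out evaluation. The data set, the two models and all function names are assumptions made up for this example. In practice you would put your real candidate models into the same loop.

```python
from collections import Counter


def majority_classifier(train_X, train_y, x):
    """Baseline: always predict the most common training label."""
    return Counter(train_y).most_common(1)[0][0]


def nearest_neighbour(train_X, train_y, x):
    """Predict the label of the closest training point (squared distance)."""
    def dist(a):
        return sum((p - q) ** 2 for p, q in zip(a, x))
    best = min(range(len(train_X)), key=lambda i: dist(train_X[i]))
    return train_y[best]


def leave_one_out_accuracy(model, X, y):
    """Hold out each sample once, train on the rest, and count the hits."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if model(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Made-up toy data: four points of class "a" and two of class "b"
X = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9)]
y = ["a", "a", "a", "a", "b", "b"]

for name, model in [("majority", majority_classifier),
                    ("1-NN", nearest_neighbour)]:
    print(name, round(leave_one_out_accuracy(model, X, y), 2))
```

If even the trivial baseline scores well, the problem is easy or the classes are unbalanced. If nothing beats the baseline, the features probably carry little signal.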
Ensembling – it is often the case that combining predictions from different models gives more robust results than predictions from a single model.
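A minimal sketch of the simplest kind of ensembling, a majority vote, is shown below. The three prediction lists are made-up outputs of three hypothetical models, each wrong on a different sample.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine several models' label lists into one label per sample.

    `predictions` is a list of equal-length label lists, one per model.
    Ties go to the label seen first, so use an odd number of models
    for two-class problems.
    """
    combined = []
    for labels in zip(*predictions):
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


# Made-up predictions on five samples; each model makes one mistake
truth       = ["spam", "ham", "spam", "ham", "ham"]
tree_preds  = ["spam", "ham", "ham",  "ham", "ham"]   # wrong on sample 3
svm_preds   = ["spam", "spam", "spam", "ham", "ham"]  # wrong on sample 2
bayes_preds = ["ham",  "ham", "spam", "ham", "ham"]   # wrong on sample 1

ensemble = majority_vote([tree_preds, svm_preds, bayes_preds])
print(ensemble == truth)  # True: the errors do not overlap, so the vote fixes them
```

The vote only helps when the models make different mistakes, which is one reason to keep several kinds of algorithm in the toolkit.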
Size of the data – Deep learning generally doesn’t work well on small data (minus an R&D paper from MIT), and a lot of data out there is small. A deep network is unlikely to train well on 500 observations, but trees and random forests likely will. A trained network also often doesn’t generalize well to other problems, so a new architecture may need to be built and tuned if the problem changes even a little bit.
Finally, there are two ways to look at it.
There is a group of people who want to apply deep learning to everything. If things do not work as expected then they will play around with the architecture, and invent new techniques to make it work. There is an immense advantage in the long run because people have a single set of tools to attack a problem.
The other group of people will take the dataset they have and iterate over the various techniques in their toolkit. They want to squeeze the last bit of accuracy out of their models because there is a huge benefit in doing so. This group tends to work on specific application areas such as computational pathology, computational biology, etc.
Then there is the question of how much data is available. Deep learning (DL) is trained with mini-batch gradient descent, so its training cost grows roughly linearly with the number of examples, while training a kernel SVM typically scales at least quadratically with the number of samples. That makes DL a natural choice for huge amounts of data. Come to think of it, DL is loosely a scaled-up, multi-layer version of logistic regression.
Hope that helps you understand which model to choose. Again, the answer is situational!
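The "scaled-up logistic regression" remark can be sketched in a few lines of plain Python. The weights below are arbitrary numbers chosen for illustration, and the sigmoid hidden units are an assumption made to keep the analogy exact; modern networks usually use other activations such as ReLU.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_regression(x, weights, bias):
    """One weighted sum of the inputs followed by a sigmoid."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def two_layer_network(x, hidden_layer, output_weights, output_bias):
    """Each hidden unit is a logistic regression on x; the output unit
    is a logistic regression on the hidden activations."""
    hidden = [logistic_regression(x, w, b) for w, b in hidden_layer]
    return logistic_regression(hidden, output_weights, output_bias)


x = [0.5, -1.0]

# Single logistic regression: z = 2.0*0.5 + 1.0*(-1.0) + 0.0 = 0, so p = 0.5
p_single = logistic_regression(x, [2.0, 1.0], 0.0)

# Two hidden units, each given as (weights, bias), feeding one output unit
hidden_layer = [([2.0, 1.0], 0.0), ([-1.0, 0.5], 0.1)]
p_network = two_layer_network(x, hidden_layer, [1.5, -2.0], 0.2)

print(p_single, p_network)
```

Stacking more such layers, and learning all the weights jointly by gradient descent, is what turns the single unit into a deep network.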