A question that often comes up is: "What is the difference between Machine Learning and Data Mining Science (now Data Science)?" Newcomers often confuse the two, not to mention the businesses that use these terms to make their products more appealing to customers and investors. Here I attempt to describe the difference between them. For more detail, you can read my new book.
Machine learning and pattern recognition are very close in principle, and both are established fields that have been taught in top universities for a long time. Both try to address the problem of learning, with the difference that ML has its roots in computer science while pattern recognition has its roots in engineering. Statistical learning theory is a sub-field of ML that formalizes the problem of learning, focusing on the theoretical aspects. It explains why machine learning techniques work in general, something practitioners had already experienced through empirical results. Traditional statisticians have always been skeptical of machine learning applications, even though these techniques are now part of many statistical toolsets.
Historically, ML techniques had to deal with much more challenging problems, and as such they do not make prior assumptions about the problem; instead, they use the power of the computer to search and optimize for "a good" solution, often using heuristics. In contrast, traditional statistics imposes severe assumptions on the data to get its best solution, but that solution is only best if those assumptions hold. With much real-world machine- and human-generated data, those assumptions are rarely correct, so the resulting solutions can be mediocre at best. ML generally does not seek or claim to find the best possible solution. Given the complexity of the problems, the best solution, even if it exists, may not be achievable or worth the time and resources needed to find it.
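To make the idea of searching for "a good" solution with a heuristic concrete, here is a minimal sketch of random-restart hill climbing. It is only an illustration, not a method taken from any particular ML library: the `objective` function, the step size, and the number of restarts are all made up for the example.

```python
import math
import random

def objective(x):
    # Hypothetical bumpy function with many local minima (an assumption
    # chosen for illustration, not a real ML loss).
    return x * x + 10 * math.sin(3 * x)

def hill_climb(start, rng, step=0.1, iters=500):
    # Heuristic local search: try a small random move and keep it
    # only if it lowers the objective.
    best_x, best_f = start, objective(start)
    for _ in range(iters):
        cand = best_x + rng.uniform(-step, step)
        f = objective(cand)
        if f < best_f:
            best_x, best_f = cand, f
    return best_x, best_f

rng = random.Random(42)
# Restart from several random points and keep the best local result.
starts = [rng.uniform(-10, 10) for _ in range(20)]
results = [hill_climb(s, rng) for s in starts]
good_x, good_f = min(results, key=lambda r: r[1])

print(f"a good solution: x = {good_x:.3f}, objective = {good_f:.3f}")
```

Nothing here guarantees the global minimum. The search simply spends a fixed amount of computation and returns the best point it found, which is the "good enough" trade-off described above.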
Another fundamental pillar of machine learning is the concept of "generalization" and the trade-off between accuracy and robustness. ML addresses this empirically, using a training/validation/test approach, while traditional statistics addresses it with statistical tests on the training data (coupled with tight initial assumptions).
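The training/validation/test approach can be sketched in a few lines. This is a toy example under stated assumptions: the data is synthetic, the model is a hand-written nearest-neighbor average, and the split sizes and candidate values of `k` are arbitrary choices for illustration.

```python
import random

def make_data(n, rng):
    # Hypothetical synthetic data: y = x^2 plus Gaussian noise.
    xs = [rng.uniform(-3, 3) for _ in range(n)]
    return [(x, x * x + rng.gauss(0, 1.0)) for x in xs]

def knn_predict(train, x, k):
    # Average the targets of the k training points closest to x.
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    # Mean squared error of the k-NN model on `data`.
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(0)
data = make_data(300, rng)
train, valid, test = data[:180], data[180:240], data[240:]

# Pick the model complexity (k) on the validation set, never on the test set.
candidates = [1, 3, 5, 9, 15, 31, 61]
valid_errors = {k: mse(train, valid, k) for k in candidates}
best_k = min(valid_errors, key=valid_errors.get)

# With k = 1 the model memorizes the training data, so its training error
# says nothing about how well it generalizes.
train_error_k1 = mse(train, train, 1)

# The test set is used once, at the end, to estimate generalization.
test_error = mse(train, test, best_k)

print("validation errors:", valid_errors)
print("chosen k:", best_k)
print("training error at k=1:", train_error_k1)
print("test error at chosen k:", test_error)
```

The point of the three-way split is that the training error can be driven to zero by a sufficiently flexible model, so the choice of model is made on held-out validation data and the final estimate comes from test data that played no part in any decision.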
As a practice, data mining (and data science) has four phases of equal importance:
Business Understanding: Data mining starts with a full understanding of the business problem, where both business domain knowledge and data mining knowledge have to be leveraged. The final result of this phase is an ROI expectation and a formulation of the business problem as a data mining problem.
With the explosion of data and the popularity of analytics and ML in general, all players in the market are using the terms "Data Science" and "Data Scientist." The data platform vendors (MPP, NoSQL, Hadoop vendors) use the terms to emphasize the database/data store and basic analytics aspects. Outside the context of big data, I do not consider basic analytics even close to what a data scientist has to do. Startups and big web companies may put more emphasis on the programming skills required of a data scientist.
In my opinion, Data Science is mainly a sexier and more appropriate name for "Data Mining Science." "Mining" does not portray the right image because it is generally associated with dangerous and hard manual labor. Also, Data Science as a practice is more focused on creating new products (data products) and tends to sit much higher in the organization's leadership.
In the third piece of this topic, I try to enumerate what skills are required for a data scientist, what skills one needs in a data science team, and what needs to be taught in a data science curriculum.