Friday, June 6, 2014

Data Mining, Data Science, and Machine Learning (1)

See the book site for "High-Performance Data Mining and Big Data Analytics: The Story of Insight from Big Data."

People often ask me about data science and its relationship with data mining science (often referred to simply as data mining) and machine learning. In my book, I provide a viewpoint on this topic. For those like me who have spent their entire careers developing and promoting data mining, machine learning, and data analytics, “data science” is nothing but a new term.

The use of machine learning and data mining to create value from corporate or public data is nothing new, and this is not the first time these technologies have been in the spotlight. Many remember the late ’80s and early ’90s, when machine learning techniques, neural networks in particular, became very popular. Data mining was on the rise. There was talk everywhere about advanced analysis of data for decision making. Even the popular android character in “Star Trek: The Next Generation” was aptly named “Data.”

Data mining science has been the cornerstone of many applications for more than two decades, e.g., in finance and retail. However, the popularity of web products from the likes of Google, LinkedIn, Amazon, and Facebook has helped analytics become a household name. A decade ago, the masses did not know how their detailed data were being used by corporations for decision making; today they are fully aware of it. Many people, especially the millennial generation, voluntarily provide detailed information about themselves. Today people know that any mouse click they generate, any comment they write, any transaction they perform, and any location they visit may be captured and analyzed for some business purpose.

All of this has finally helped bring analytics to the forefront of many conversations, even among regular people. A decade ago, we could not comfortably tell a customer how we anonymously analyze their detailed transactions in real time, even when the purpose was to protect them from fraud (see Chapter 9 of this book).