Friday, October 24, 2014

"The Pale Blue Dot" Effect and Big Data


Many of you may have heard of the “Pale Blue Dot,” a photograph of planet Earth taken in 1990 by the Voyager 1 spacecraft as it was leaving the solar system.[1] The picture was taken from a distance of about 3.7 billion miles from Earth. In the photograph, Earth with all its magnificence (i.e., life) appears as only a fraction of a pixel against the vastness of space, hence the name “Pale Blue Dot.”

This “Pale Blue Dot” effect is a good metaphor for the insights that can be extracted in some big data explorations. In such contexts, the insight itself may seem very small relative to the vastness of the data collected, processed, and analyzed. However, its value could be enormous once discovered. For example, answers to potential cures for diseases may be hidden in DNA sequencing data, but it is extremely difficult and expensive to analyze this data and correlate it with known diseases, given its sheer volume. Yet if an insight is found and leveraged for a cure, it will have huge value for society as a whole.


See the book site for "High-Performance Data Mining and Big Data Analytics: The Story of Insight from Big Data" (http://bigdataminingbook.info).


[1] Subsequently, the title of the photograph was used by Sagan as the main title of his 1994 book, Pale Blue Dot (Sagan, 1994).

Friday, October 10, 2014

My book titled "High-Performance Data Mining and Big Data Analytics: The Story of Insight from Big Data" is published


My book titled "High-Performance Data Mining and Big Data Analytics: The Story of Insight from Big Data" has been published.

Order at CreateSpace

Order at Amazon

Here is the Book Site.

Description:
The use of machine learning and data mining to create value from corporate or public data is nothing new. It is not the first time that these technologies have been in the spotlight. Many remember the late '80s and the early '90s, when machine learning techniques, in particular neural networks, had become very popular. Data mining was on the rise. There was talk everywhere about advanced analysis of data for decision making. Even the popular android character in "Star Trek: The Next Generation" was aptly named "Data." Data mining science has been the cornerstone of many data products and applications for more than two decades, e.g., in finance and retail. Credit scores have been in use for decades to assess the creditworthiness of people applying for credit or a loan. Sophisticated real-time fraud scores based on an individual's transaction spending patterns have been used since the early '90s to protect credit card holders from a variety of fraud schemes. However, the popularity of web products from the likes of Google, LinkedIn, Amazon, and Facebook has helped analytics become a household name. While a decade ago the general public did not know how their detailed data were being used by corporations for decision making, today they are fully aware of that fact. Many people, especially the millennial generation, voluntarily provide detailed information about themselves. Today people know that any mouse click they generate, any comment they write, any transaction they perform, and any location they go to may be captured and analyzed for some business purpose.

Every new technology comes with lots of hype and many new buzzwords. Often, fact and fiction get mixed up, making it impossible for outsiders to assess the technology's true relevance. I wrote this book to provide an objective view of analytics trends today. I have written it in complete independence, and solely as a personal passion. As a result, the views expressed in this book are those of the author and do not necessarily represent the views of, and should not be attributed to, any vendor or employer.

Due to the exponential growth of data, today there is an ever-increasing need to process and analyze big data. High-performance computing architectures have been devised to address the need for handling big data, not only from a transaction processing standpoint but also from a tactical and strategic analytics viewpoint. The success of big data analytics in large web companies has created a rush toward understanding the impact of new big data technologies in classic analytics environments that already employ a multitude of legacy analytics technologies. There is a wide variety of readings about big data, high-performance computing for analytics, massively parallel processing (MPP) databases, Hadoop and its ecosystem, algorithms for big data, in-memory databases, implementation of machine learning algorithms for big data platforms, and big data analytics. However, none of these readings provides an overview of these topics in a single document. The objective of this book is to provide a historical and comprehensive view of the recent trend toward high-performance computing technologies, especially as they relate to big data analytics and high-performance data mining. The book also emphasizes how big data requires rethinking every aspect of the analytics life cycle, from data management, to data mining and analysis, to deployment.

Through interactions with different stakeholders in classic organizations, I realized there was a need for a more holistic view of the impact of big data analytics across such organizations, and of the impact of high-performance computing techniques on legacy data mining. Whether you are an executive, manager, data scientist, analyst, or member of the sales or IT staff, the broad, holistic overview provided in the book will help you grasp the important topics in big data analytics and their potential impact on your organization.