6. In many problems, one has to deal with transactional data, which requires creating time-based variables that give the learner a short- and long-term memory of the past behavior of the entities being modeled. Depending on the problem, these variables must be computed from each entity's transaction history and updated either in real time, with every transaction, or at specific time intervals.
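One common way to give a learner short- and long-term memory is to keep exponentially decayed aggregates per entity, each with a different half-life, and update them incrementally as each transaction arrives. This is an illustrative assumption, not necessarily the author's method. The sketch assumes time is measured in days and that a 1-day and a 30-day half-life stand for "short" and "long" term; the class `DecayedProfile`, the helper `process`, and the half-life values are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical half-lives (in days) for the short- and long-term memories.
HALF_LIVES = {"short": 1.0, "long": 30.0}


class DecayedProfile:
    """Exponentially decayed sums of transaction amounts for one entity."""

    def __init__(self, half_lives=HALF_LIVES):
        # Convert each half-life into a decay rate: value *= exp(-rate * dt).
        self.rates = {k: math.log(2) / h for k, h in half_lives.items()}
        self.values = {k: 0.0 for k in half_lives}
        self.last_t = None

    def update(self, t, amount):
        """Decay the stored sums from the last transaction time to t, then add amount."""
        if self.last_t is not None:
            dt = t - self.last_t
            if dt < 0:
                raise ValueError("transactions must arrive in time order")
            for k, rate in self.rates.items():
                self.values[k] *= math.exp(-rate * dt)
        for k in self.values:
            self.values[k] += amount
        self.last_t = t
        return dict(self.values)


# One profile per entity, created on first sight of the entity id.
profiles = defaultdict(DecayedProfile)


def process(entity_id, t, amount):
    """Update and return the memory variables for one incoming transaction."""
    return profiles[entity_id].update(t, amount)


process("card_1", 0.0, 100.0)
print(process("card_1", 1.0, 50.0))  # short ≈ 100.0, long ≈ 147.7
```

Because each update touches only the stored state of one entity, the same logic can run per transaction in real time or be replayed over a batch of history at fixed intervals.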
7. For a fixed model complexity, as the number of rows (observations) in the ADS increases, the training and test errors converge toward each other.
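The convergence claim can be checked with a small simulation, sketched here under assumed settings: a cubic polynomial (fixed complexity) is fit to noisy samples of sin(2πx), and training and test mean squared errors are compared as the training set grows. The data-generating function, noise level, and the `learning_curve` helper are illustrative choices, not taken from the source.

```python
import numpy as np


def learning_curve(sizes, degree=3, noise=0.3, n_test=20_000, seed=0):
    """Train/test MSE of a fixed-degree polynomial fit for growing training sizes."""
    rng = np.random.default_rng(seed)

    def sample(n):
        # Assumed ground truth: sin(2*pi*x) plus Gaussian noise.
        x = rng.uniform(0.0, 1.0, n)
        y = np.sin(2 * np.pi * x) + rng.normal(0.0, noise, n)
        return x, y

    x_test, y_test = sample(n_test)
    results = []
    for n in sizes:
        x_tr, y_tr = sample(n)
        coefs = np.polyfit(x_tr, y_tr, degree)  # model complexity held fixed
        train_mse = np.mean((np.polyval(coefs, x_tr) - y_tr) ** 2)
        test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
        results.append((n, train_mse, test_mse))
    return results


for n, tr, te in learning_curve([10, 100, 1_000, 100_000]):
    print(f"n={n:>7}  train={tr:.4f}  test={te:.4f}  gap={te - tr:.4f}")
```

With few rows, the training error is typically well below the test error; as rows are added the gap closes, and both errors approach the noise variance plus the bias of the cubic model.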
8. For some problems, the entire population must be represented in the ADS (e.g., social network analysis, long-tail problems, high-cardinality recommenders, and search). For all other problems, sampling continues to be valid. For a subset of these, sampling is mandatory, e.g., for highly unbalanced datasets, segmented modeling, micro-modeling, and campaign groups. For the remainder, sampling is optional and no longer a limiting factor; historically, it had to be done for these problems to speed up processing or to reduce storage cost.
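For highly unbalanced datasets, one widely used scheme (shown as an example, not necessarily the one the author has in mind) keeps every rare-class row, samples the majority class at a fixed rate, and gives each sampled majority row a weight of 1/rate so that weighted counts still estimate the full population. The function name, the `label` and `weight` fields, and the 5% rate are assumptions.

```python
import random


def downsample_majority(rows, label_key="label", keep_rate=0.05, seed=42):
    """Keep every positive row and a random fraction of negative rows.

    Each kept negative gets weight 1/keep_rate, so weighted counts remain
    unbiased estimates of the full-population counts.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    sample = []
    for row in rows:
        if row[label_key] == 1:
            sample.append({**row, "weight": 1.0})
        elif rng.random() < keep_rate:
            sample.append({**row, "weight": 1.0 / keep_rate})
    return sample


# Hypothetical data: 1% positives.
rows = [{"id": i, "label": 1 if i % 100 == 0 else 0} for i in range(100_000)]
sample = downsample_majority(rows)
print(len(sample), "rows kept out of", len(rows))
```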
9. For big data, it is desirable to use the same platform and interface for data understanding, data preparation, and model development, with minimal data movement and the fewest passes through the data needed to reach the result.
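As one illustration of reducing passes over the data, the summary statistics used in data understanding can be computed in a single streaming pass. The sketch below uses Welford's online algorithm for mean and variance; the `RunningStats` class and the `profile` helper are hypothetical and assume rows arrive as dictionaries.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Single-pass count, mean, variance, min, and max (Welford's algorithm)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean
    lo: float = math.inf
    hi: float = -math.inf

    def add(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        self.lo = min(self.lo, x)
        self.hi = max(self.hi, x)

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator); 0.0 for fewer than two values."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def profile(rows):
    """Profile every numeric column of an iterable of dicts in one pass."""
    stats = {}
    for row in rows:
        for col, value in row.items():
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                stats.setdefault(col, RunningStats()).add(float(value))
    return stats


s = profile({"x": v} for v in [2, 4, 4, 4, 5, 5, 7, 9])
print(s["x"].n, s["x"].mean, s["x"].variance)
```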
10. In the transition from model development to deployment, automatic code generation for computing variables and models is essential for quality control. It is mandatory in applications that require a large number of models.
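A minimal sketch of model code generation, assuming a logistic-regression model stored as an intercept plus a mapping from column names to weights: the function emits a SQL scoring query directly from the fitted parameters, so nobody re-types coefficients by hand during deployment. The function name, table and column names, and the assumption that the target database provides an `EXP` function are all hypothetical; column names are not quoted or escaped, so they must be trusted identifiers.

```python
def generate_sql_scorer(intercept, coefficients, table, id_col="id"):
    """Emit a SQL query that scores every row with a logistic-regression model.

    coefficients maps column names to weights. Names are assumed to be
    trusted, valid SQL identifiers (no quoting or escaping is done here).
    """
    terms = [repr(float(intercept))]
    # Sort columns so the generated code is deterministic and diff-friendly.
    for col, weight in sorted(coefficients.items()):
        terms.append(f"({float(weight)!r} * {col})")
    linear = " + ".join(terms)
    return (
        f"SELECT {id_col},\n"
        f"       1.0 / (1.0 + EXP(-({linear}))) AS score\n"
        f"FROM {table};"
    )


print(generate_sql_scorer(-1.5, {"tenure": -0.02, "balance": 0.0003}, "customers"))
```

When many models are involved, the same generator could be called in a loop over stored model specifications, so every deployed scorer comes from one tested code path.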
- Building models on the fly for each SKU (millions of them), based on store characteristics, for one of the largest retailers,
- And dozens of other applications in customer experience, risk, marketing, and fraud.

Whatever machine learning or pattern recognition technique was used, the above always held true.
For more information and orders, visit the book site for "High-Performance Data Mining and Big Data Analytics: The Story of Insight from Big Data" (http://bigdataminingbook.info).