Abstract
Data scientists use statistical models and methods along with algorithmic (machine-learning) approaches to solve problems of classification, forecasting, pattern recognition, inference, and interpretation of results. Practical difficulties include dealing with enormous datasets whose complex structures require substantial computational support. Moreover, in real-world business applications the challenges for data science multiply, because practice adds a further level of complexity: solutions must be optimized with respect to cost, time, and other specific regulatory, financial, or environmental constraints. Here it is important to complement scientific methods with a common-sense approach, practical heuristics, and judicious decision making during all important phases of developing data-driven solutions. We focus on two complementary yet representative data science topics: (1) periodic automated sales forecasting for a large number of industrial products, where we survey forecasting models and methods along with practical procedures for evaluating their predictive performance and weigh the empirical evidence of their relative merits in model selection; and (2) cluster analysis as a basis for personalization and segmentation of customers and products in a typical e-commerce application. Disclaimer: This article expresses the personal views and opinions of the authors, which may not coincide with the policies or positions of their employers. The examples were chosen to illustrate the main ideas and do not necessarily or directly refer to the authors' work on any specific project.
Original language | English |
---|---|
Pages (from-to) | 468-481 |
Number of pages | 14 |
Journal | Applied Methods of Statistical Analysis |
Publication status | Published - 2019 |
Externally published | Yes |
Event | 5th International Workshop on Applied Methods of Statistical Analysis. Statistical Computation and Simulation, AMSA 2019 - Novosibirsk, Russian Federation. Duration: 18 Sep 2019 → 20 Sep 2019 |
Keywords
- angular distance measure
- clustering
- forecasting intermittent demand
- personalization
- segmentation