Author: Admin | 2025-04-28
Related papers

The Lure of Statistics in Data Mining
Journal of Statistics Education
The field of Data Mining like Statistics concerns itself with "learning from data" or "turning data into information". For statisticians the term "Data mining" has a pejorative meaning. Instead of finding useful patterns in large volumes of data as in the case of Statistics, data mining has the connotation of searching for data to fit preconceived ideas. Here we try to discuss the similarities and differences as well as the relationships between statisticians and data miners. This article is intended to bridge some of the gap between the people of these two communities.

Data Mining: Statistics and More?
The American Statistician, 1998
Data mining is a new discipline lying at the interface of statistics, database technology, pattern recognition, machine learning, and other areas. It is concerned with the secondary analysis of large databases in order to find previously unsuspected relationships which are of interest or value to the database owners. New problems arise, partly as a consequence of the sheer size of the data sets involved, and partly because of issues of pattern matching. However, since statistics provides the intellectual glue underlying the effort, it is important for statisticians to become involved. There are very real opportunities for statisticians to make significant contributions.

Statistical Perspectives on Data Mining
This document identifies statistical issues that can be and commonly are important for data mining problems. As far as possible, it will avoid the technical language of mathematical statistics. Key issues for any data analysis are:
1. Why are we undertaking this investigation?
2. What is the intended use of results?
3. What limitations, arising from the manner of collection or from the incompleteness of the information, may constrain that intended use?
Finally, when results are presented, the data analyst should be well placed to answer the question: "What is the relevance of these results?" Part I discusses statistical issues and ideas. Examples, most of them taken from published data, highlight the importance of these issues. Remaining parts of the document summarize important results and concepts from classical statistics. More recent data mining methodologies supplement rather than