How to Use Teradata Warehouse Miner at the University of Arkansas
By: Sarang • Research Paper • 2,233 Words • May 9, 2010 • 833 Views
Teradata University Network (TUN) members, both faculty and students, can take advantage of Teradata Warehouse Miner (TWM) housed at the University of Arkansas (UA). This document illustrates the use of TWM by stepping through a decision tree example and an association analysis example. The purpose is not to explain decision trees or other data mining algorithms but rather to show how to carry out decision tree and association analysis data mining tasks using TWM. Before getting started, a brief overview of data mining is provided.
Data Mining
Data mining has many definitions and may be called by other names, such as knowledge discovery. It is generally considered to be part of the umbrella of tasks, tools, and techniques within Business Intelligence (BI). Many corporate managers consider BI to be the heart of all the processes that support decision making at all levels. A definition of data mining typically includes three elements: large datasets, the discovery of previously unknown knowledge and patterns, and the requirement that this knowledge be actionable. In other words, what is discovered is not trivial and can be usefully applied. BI and its data mining component are receiving considerable attention as companies use BI for competitive advantage.
Different authors may describe the data mining tasks slightly differently, but the following terminology provides a useful basis for discussing data mining. The data mining tasks are:
• Description
• Estimation
• Classification
• Prediction
• Association Analysis
• Clustering
Description: uses descriptive statistics to better understand and profile areas of interest. A variety of well-known statistical tools and methods are used for this task, including frequency charts and other graphical output, as well as measures of central tendency and variation.
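The Description task can be illustrated outside of TWM with a minimal Python sketch. This is not TWM output or a TWM API; the income values, the region labels, and the helper name describe are hypothetical and serve only to show a frequency count alongside measures of central tendency and variation.

```python
from collections import Counter
from statistics import mean, median, pstdev

# Hypothetical household incomes (in thousands); illustrative values only.
incomes = [32, 45, 45, 58, 61, 75, 120]
# Hypothetical categorical attribute for the same seven households.
regions = ["North", "South", "South", "East", "South", "North", "East"]


def describe(values):
    """Return measures of central tendency and variation for a numeric list."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "std_dev": pstdev(values),  # population standard deviation
        "min": min(values),
        "max": max(values),
    }


summary = describe(incomes)
frequency = Counter(regions)  # frequency table for a categorical column

print(summary)
print(frequency.most_common())
```

In practice the frequency table would feed a frequency chart, and the numeric summary would be computed for each variable of interest.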
Data Mining Tasks with a Target or Dependent Variable
Estimation, classification and prediction are data mining tasks that have a target (dependent) variable. These are sometimes referred to as predictive analysis; however, many authors reserve the term Prediction for the use of models to forecast the future. The terms supervised and directed apply to these data mining tasks. Estimation data mining tasks have an interval-level target variable, whereas classification data mining tasks have a categorical (symbolic) target variable. An example of an estimation data mining task would be estimating family income based on a number of attributes, whereas a model that places families into the three income brackets of Low, Medium or High would be an example of a classification data mining task. Thus, the difference between the two tasks is the type of target variable.
When either an estimation or a classification data mining task is used to predict future outcomes, the data mining task becomes one of Prediction. Estimation and classification models are often referred to as predictive models because prediction is the typical application of models built for these data mining tasks.
In summary, the most important concept is that estimation and classification data mining tasks both require a target variable. The difference between them lies in the data type of that target variable.
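The distinction can be made concrete with a short Python sketch built on the family income example. The bracket cutoffs, the sample incomes, and the function names income_bracket and task_type are assumptions introduced for illustration; they do not come from TWM. The type check is also a simplification, since a categorical variable stored as numeric codes would be misread as an interval variable.

```python
from numbers import Real

# Hypothetical cutoffs (in thousands) for the Low / Medium / High brackets.
LOW_MAX = 40
MEDIUM_MAX = 80


def income_bracket(income):
    """Map an interval-level income to a categorical bracket."""
    if income <= LOW_MAX:
        return "Low"
    if income <= MEDIUM_MAX:
        return "Medium"
    return "High"


def task_type(target_values):
    """Name the directed task implied by the data type of the target."""
    numeric = all(
        isinstance(v, Real) and not isinstance(v, bool) for v in target_values
    )
    return "estimation" if numeric else "classification"


# Hypothetical family incomes (in thousands); illustrative values only.
incomes = [32, 45, 58, 75, 120]
brackets = [income_bracket(x) for x in incomes]

print(task_type(incomes), incomes)    # interval target
print(task_type(brackets), brackets)  # categorical target
```

The same families appear in both cases; only the form of the target changes, and with it the data mining task.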
Data Mining Algorithms for Directed/Supervised Data Mining Tasks
Linear regression models are the most common data mining algorithms for estimation data mining tasks, and linear regression is a well-known and familiar technique. A number of data mining algorithms can be used for classification data mining tasks, including logistic regression, decision trees, neural networks, memory-based reasoning (k-nearest neighbor), and Naïve Bayes.
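As a rough illustration of one algorithm for each task, the following Python sketch fits a simple least-squares line (estimation) and applies a k-nearest-neighbor majority vote (classification) to a single predictor. It is a minimal sketch under stated assumptions, not how TWM implements these algorithms: the data are hypothetical, and the names fit_line and knn_classify are introduced here for illustration.

```python
from collections import Counter


def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def knn_classify(train, query, k=3):
    """Majority vote among the k training points closest to the query.

    train is a list of (feature, label) pairs with a single numeric feature.
    """
    nearest = sorted(train, key=lambda pair: abs(pair[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical data: years of education and family income (in thousands).
years = [10, 12, 14, 16, 18]
income = [30, 40, 50, 60, 70]

# Estimation: interval-level target.
slope, intercept = fit_line(years, income)
estimate = slope * 13 + intercept

# Classification: categorical target for the same hypothetical families.
labeled = [(10, "Low"), (12, "Low"), (14, "Medium"), (16, "Medium"), (18, "High")]
predicted = knn_classify(labeled, 11, k=3)

print(slope, intercept, estimate)
print(predicted)
```

Both models use the same predictor; the regression returns a number, while the nearest-neighbor vote returns one of the bracket labels.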
Data Mining Tasks without a Target or Dependent Variable
Association Analysis and Clustering are data mining tasks that do not have a target (dependent) variable. Affinity analysis is another term that refers to association analysis and is typically used for market basket analysis (MBA) although association analysis