[Data Mining] Basics Summary – Data Types, Analysis Methods

By | 2014-10

Data Types, Analysis Methods

* Data Mining > Explaining the Past

* Data Mining > Explaining the Past > Data Exploration > Univariate Analysis

  1. Categorical Variables:
    A categorical or discrete variable is one that has two or more categories (values).

    1.  Types
      1. Nominal: No intrinsic ordering to its categories (e.g.: Gender – male/female)
      2. Ordinal: Variables that have a clear ordering (e.g.: temperature as a variable with three ordered categories: low, medium and high)
    2. Methods
      1. Count
      2. Count%
  2. Numerical (Continuous) Variables:
    A variable (attribute) that may take on any value within a finite or infinite interval
    (e.g.: height, weight)

    1. Types
      1. Interval: Variable whose differences between values are interpretable but which has no true zero (e.g.: temperature in degrees Celsius)
      2. Ratio: Variable that has a true zero, so its values can be added, subtracted, multiplied or divided (e.g.: weight)
    2. Methods
      1. Minimum/Maximum/Mean/Median/Mode
      2. Quantile/Range/Variance/Standard Deviation/Coefficient of Variation
      3. Skewness/Kurtosis
      4. Histogram/Boxplot
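
The univariate methods listed above can be sketched with Python's standard library. This is a minimal illustration, not a full exploration toolkit: the sample data and the helper names `categorical_summary` and `numerical_summary` are hypothetical, and the coefficient of variation is assumed to mean standard deviation divided by mean.

```python
from collections import Counter
import statistics

# Hypothetical sample data, for illustration only.
temperature = ["low", "high", "medium", "low", "low", "high"]  # ordinal
weights = [52.0, 61.5, 58.0, 70.2, 66.3, 58.0]                 # ratio

def categorical_summary(values):
    """Return {category: (count, percent)} for a categorical variable."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100.0 * n / total) for cat, n in counts.items()}

def numerical_summary(values):
    """Return basic descriptive statistics for a numerical variable."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance
        "stdev": stdev,
        # Assumed definition: standard deviation relative to the mean.
        "coefficient_of_variation": stdev / mean,
        "quartiles": statistics.quantiles(values, n=4),
    }

cat_stats = categorical_summary(temperature)
num_stats = numerical_summary(weights)
```

Skewness, kurtosis, histograms and boxplots are left out here because they usually come from a statistics or plotting library rather than the standard library.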

* Data Mining > Explaining the Past > Data Exploration > Bivariate Analysis

  1. Categorical & Categorical:
    1. Methods
      • Stacked Column Chart
      • Combination Chart
      • Chi-square Test
  2. Categorical & Numerical:
    1. Methods
      • Line Chart with Error Bars
      • Combination Chart
      • Z-test and t-test
      • Analysis of Variance (ANOVA)
  3. Numerical & Numerical:
    1. Methods
      • Scatter Plot
      • Linear Correlation
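
Two of the bivariate methods above reduce to short formulas, sketched here in plain Python. The function names `chi_square_statistic` and `pearson_r` are hypothetical, and the sketch assumes every expected cell count is non-zero and neither numerical variable is constant. It computes only the test statistic and the correlation coefficient; turning the chi-square statistic into a p-value needs a chi-square distribution, which is not shown.

```python
import math

def chi_square_statistic(table):
    """Chi-square statistic for a contingency table (list of rows of counts).

    Used for categorical & categorical pairs: compares observed counts
    with the counts expected if the two variables were independent.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def pearson_r(xs, ys):
    """Pearson linear correlation coefficient for numerical & numerical pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data, for illustration only.
chi2 = chi_square_statistic([[10, 20], [20, 10]])
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
```

A statistic of zero means the observed table matches the independence assumption exactly, and a correlation near +1 or -1 indicates a strong linear relationship.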

* Data Mining > Predicting the Future

* Data Mining > Predicting the Future > Modeling
Predictive modeling is the process by which a model is created to predict an outcome.

  1. Classification: If the outcome is CATEGORICAL
    1. Frequency Table
    2. Covariance Matrix
    3. Similarity Functions
    4. Others
  2. Regression: If the outcome is NUMERICAL
    1. Frequency Table
    2. Covariance Matrix
    3. Similarity Functions
    4. Others
  3. Clustering: The assignment of observations into clusters so that observations in the same cluster are similar
    1. Hierarchical
    2. Partitive
  4. Association: Finding interesting associations among the observations
    1. AIS Algorithm
    2. SETM Algorithm
    3. Apriori Algorithm
    4. AprioriTid Algorithm
    5. AprioriHybrid Algorithm
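
As an illustration of the association algorithms listed above, here is a minimal sketch of the frequent-itemset step of Apriori. It is a simplified teaching version under stated assumptions: `min_support` is an absolute transaction count, the function name `apriori` and the basket data are hypothetical, and rule generation (confidence, lift) is not covered.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    min_support is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    def count_support(candidates):
        # Count how many transactions contain each candidate itemset,
        # then keep only the candidates that reach min_support.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = count_support({frozenset([i]) for i in items})
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = count_support(candidates)
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical market baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = apriori(baskets, min_support=2)
```

The prune step is what distinguishes Apriori from naive enumeration: a candidate is counted against the data only if all of its smaller subsets were already found frequent.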

 
