multivariate data analysis

Multivariate Data Analysis SETIA PRAMANA

Upload: setio-pramono

Post on 26-Jun-2015





TRANSCRIPT

Page 1: Multivariate data analysis

Multivariate Data Analysis
SETIA PRAMANA

Page 2: Multivariate data analysis

Survival Data Analysis 2

Course Outline

• Introduction
◦ Overview of multivariate data analysis
◦ The applications
• Matrix Algebra and Random Vectors
• Sample Geometry
• Multivariate Normal Distribution
• Inference About a Mean Vector
• Comparison of Several Mean Vectors


Page 3: Multivariate data analysis


Course Outline

• Principal Component Analysis
• Factor Analysis
• Cluster Analysis
• Discriminant Analysis
• Canonical Correlations


Page 4: Multivariate data analysis


Course Workload

• 40% theory, 60% practice
• Group project (4 students per group)
• Group presentation in ENGLISH every week
• The software used is mainly R; other software is allowed
• R code will be provided
• Slides can be seen at: http://www.slideshare.net/hafidztio/


Page 5: Multivariate data analysis


Reference Books


Page 7: Multivariate data analysis

Data Types

Page 8: Multivariate data analysis

Type of Analysis

Page 9: Multivariate data analysis

Type of Analysis

Page 10: Multivariate data analysis

What is Multivariate?

• Univariate analysis?
• Some describe multivariate analysis as any statistical technique used to analyze data that arise from more than one variable.
• Multivariable vs. multivariate analysis: http://www.youtube.com/watch?v=KhA_PCMPZZo

Page 11: Multivariate data analysis

Example of MV Data

Page 12: Multivariate data analysis

Other Examples?

Page 13: Multivariate data analysis

What is Multivariate Data Analysis?

The statistical analysis of data collected on more than one (response) variable.

• We want to analyze the variables simultaneously.
• The variables may be correlated with each other.
• The dependence among the variables is taken into account.
• It is more complex than univariate analysis.
• In the real world, most data are multivariate.
• It provides the basic statistical analysis for data mining.
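To make the point about correlated variables concrete, here is a minimal sketch in plain Python (the course itself works in R) that computes the sample mean vector, covariance matrix, and correlation matrix for a small data set. The data values and the function names are made up for illustration and do not come from the slides.

```python
# Hypothetical data: each row is one subject,
# columns are (height_cm, weight_kg, age_yr).
data = [
    (160.0, 55.0, 30.0),
    (170.0, 65.0, 40.0),
    (180.0, 75.0, 35.0),
    (175.0, 72.0, 50.0),
]


def mean_vector(rows):
    """Sample mean of each variable (column)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows):
    """Sample covariance matrix with divisor n - 1."""
    n = len(rows)
    means = mean_vector(rows)
    p = len(means)
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    return [
        [sum(r[j] * r[k] for r in centered) / (n - 1) for k in range(p)]
        for j in range(p)
    ]


def correlation_matrix(rows):
    """Sample correlation matrix derived from the covariance matrix."""
    cov = covariance_matrix(rows)
    p = len(cov)
    sd = [cov[j][j] ** 0.5 for j in range(p)]
    return [[cov[j][k] / (sd[j] * sd[k]) for k in range(p)] for j in range(p)]


means = mean_vector(data)
cov = covariance_matrix(data)
corr = correlation_matrix(data)

print("mean vector:", means)
for row in corr:
    print([round(r, 3) for r in row])
```

The off-diagonal entries of the correlation matrix are what a set of separate univariate analyses would ignore; a multivariate analysis uses them.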

Page 14: Multivariate data analysis

Types of MVA

Exploratory Data Analysis (EDA): Sometimes called data mining, this area is useful for gaining deeper insights into large, complex data sets.

Regression analysis: Develops models to predict new and future events. It is useful for predictive analytics applications.

Classification for identifying new or existing classes: This area is useful in research, development, market analysis, etc.

Page 15: Multivariate data analysis

MVD objectives

1. Data reduction or structural simplification. To simplify the data without losing any valuable information, and to make interpretation easier.

2. Sorting and grouping. Similar objects or variables are grouped based on their characteristics. Rules are defined for classifying objects into well-defined groups.

3. Investigation of the dependence among variables. The nature of the relationships among variables is of interest. Are all the variables mutually independent, or do some depend on others?
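As one illustration of a rule for classifying objects into well-defined groups (objective 2), the sketch below assigns a new object to the group whose centroid is nearest in Euclidean distance. This nearest-centroid rule is only one simple choice and is not taken from the slides; the training data, the group labels, and the function names are hypothetical.

```python
import math

# Hypothetical training data: two measurements per object, with known groups.
training = {
    "A": [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5)],
    "B": [(6.0, 7.0), (7.0, 6.0), (6.5, 6.5)],
}


def centroid(points):
    """Mean vector of a group of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def euclidean(u, v):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# One centroid per group, estimated from the training data.
centroids = {label: centroid(pts) for label, pts in training.items()}


def classify(x):
    """Assign x to the group with the nearest centroid."""
    return min(centroids, key=lambda label: euclidean(x, centroids[label]))


print(classify((2.0, 2.0)))
print(classify((5.0, 6.0)))
```

The rule treats all variables as equally scaled and uncorrelated; methods such as discriminant analysis, listed in the course outline, refine this idea.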

Page 16: Multivariate data analysis

MVD objectives

4. Prediction. Relationships between variables must be determined for the purpose of predicting the values of one or more variables on the basis of observations on the other variables.

5. Hypothesis construction and testing. Specific statistical hypotheses are formulated and tested.
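The prediction objective can be sketched in its simplest form: one response predicted from one other variable by a least-squares line. This is a deliberately reduced illustration in plain Python, not the course's method; the data values and the names `fit_line` and `predict` are made up for the example.

```python
# Hypothetical paired observations: x is the predictor, y is the response.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]


def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, new_x):
    """Predicted response for a new predictor value."""
    return intercept + slope * new_x


intercept, slope = fit_line(x, y)
print("intercept:", round(intercept, 2), "slope:", round(slope, 2))
print("prediction at x = 6:", round(predict(intercept, slope, 6.0), 2))
```

With several predictors or several responses the same least-squares idea is written in matrix form, which is one reason matrix algebra comes early in the course outline.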

Page 17: Multivariate data analysis

Examples of Multivariate Data http://www.youtube.com/watch?v=eEpxN0htRKI

Page 18: Multivariate data analysis

Software

1. SAS

2. R

3. SPSS

4. Herodes

5. etc.

Page 19: Multivariate data analysis

Applications

• Petrochemical and refining operations, including early fault detection and gasoline blending and optimisation
• Food and beverage applications, particularly for consumer segmentation and new product development
• Agricultural analysis, including real-time analysis of protein and moisture in wheat, barley and other crops
• Business intelligence and marketing, for predicting changes in dynamic markets or better product placement
• Oil and gas and mining, including analysis of machinery performance and locating new sources of commodities

Page 20: Multivariate data analysis

Applications

Data reduction or simplification

• Using data on several variables related to cancer patient responses to radiotherapy, a simple measure of patient response to radiotherapy was constructed.

• Multispectral image data collected by a high-altitude scanner were reduced to a form that could be viewed as images (pictures) of a shoreline in two dimensions.

• Data on several variables relating to yield and protein content were used to create an index to select parents of subsequent generations of improved bean plants.

Page 21: Multivariate data analysis

Applications

Sorting and grouping

• Data on several variables related to computer use were employed to create clusters of categories of computer jobs that allow a better determination of existing (or planned) computer utilization.

• Measurements of several physiological variables were used to develop a screening procedure that discriminates alcoholics from nonalcoholics.

• Data related to responses to visual stimuli were used to develop a rule for separating people suffering from a multiple-sclerosis-caused visual pathology from those not suffering from the disease.

Page 22: Multivariate data analysis

Applications

Investigation of the dependence among variables

• Data on several variables were used to identify factors that were responsible for client success in hiring external consultants.

• Measurements of variables related to innovation, on the one hand, and variables related to the business environment and business organization, on the other hand, were used to discover why some firms are product innovators and some firms are not.

• Measurements of pulp fiber characteristics and subsequent measurements of characteristics of the paper made from them are used to examine the relations between pulp fiber properties and the resulting paper properties. The goal is to determine those fibers that lead to higher quality paper.

Page 23: Multivariate data analysis

Applications

Prediction

• The associations between test scores and several high school performance variables, and several college performance variables, were used to develop predictors of success in college.

• Data on several variables related to the size distribution of sediments were used to develop rules for predicting different depositional environments.

• Measurements on several accounting and financial variables were used to develop a method for identifying potentially insolvent property-liability insurers.

• cDNA microarray experiments (gene expression data) are increasingly used to study the molecular variations among cancer tumors. A reliable classification of tumors is essential for successful diagnosis and treatment of cancer.

Page 24: Multivariate data analysis

Applications

Hypothesis testing

• Several pollution-related variables were measured to determine whether levels for a large metropolitan area were roughly constant throughout the week, or whether there was a noticeable difference between weekdays and weekends.

• Experimental data on several variables were used to see whether the nature of the instructions makes any difference in perceived risks, as quantified by test scores.

• Data on many variables were used to investigate the differences in structure of American occupations to determine the support for one of two competing sociological theories.

Page 25: Multivariate data analysis

Other Applications?

In groups, discuss multivariate data in the following fields:

1. Biomedical

2. Economic

3. Government Policy

4. Health

5. Social

6. Demography

7. Business

8. Telecommunication

9. Education

10. Psychology

Page 26: Multivariate data analysis

Data Structure

Page 27: Multivariate data analysis

Descriptive Statistics

Page 28: Multivariate data analysis

Descriptive Statistics

Page 29: Multivariate data analysis

Descriptive Statistics

Page 30: Multivariate data analysis

Descriptive Statistics

Page 31: Multivariate data analysis

Visualization: Two-Dim Scatter Plots

Page 32: Multivariate data analysis

Visualization: Two-Dim Scatter Plots

Page 33: Multivariate data analysis

Visualization: Growth Curves

Page 34: Multivariate data analysis

Visualization: Growth Curves

Page 35: Multivariate data analysis

Visualization: Stars

Page 36: Multivariate data analysis

Visualization: Stars

Page 37: Multivariate data analysis

Visualization: Chernoff Faces

Page 38: Multivariate data analysis

Chernoff Faces

Page 39: Multivariate data analysis

Visualizations

Page 40: Multivariate data analysis

Other Visualizations

Page 41: Multivariate data analysis

Other Visualizations

Page 42: Multivariate data analysis

Other Visualizations

Page 43: Multivariate data analysis

Distance

Page 44: Multivariate data analysis

Distance
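As one illustration of distance between multivariate observations, the sketch below compares ordinary Euclidean distance with a variance-scaled (standardized) distance, in which each squared difference is divided by the variance of that variable. It assumes these are the kinds of distance meant here; the points, the variances, and the function names are hypothetical.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance; every variable gets the same weight."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def standardized_distance(u, v, variances):
    """Distance with each squared difference scaled by that variable's variance.

    Assumes the variables are uncorrelated; correlated variables would need
    the full covariance matrix (Mahalanobis distance).
    """
    return math.sqrt(
        sum((a - b) ** 2 / s for a, b, s in zip(u, v, variances))
    )


# Hypothetical points and variances.
origin = (0.0, 0.0)
point = (3.0, 4.0)
variances = (9.0, 16.0)

print(euclidean_distance(origin, point))
print(standardized_distance(origin, point, variances))
```

A variable with large variance contributes less to the standardized distance, so the result does not depend on the units in which each variable is measured.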

Page 45: Multivariate data analysis

Next Week: Matrix Algebra

Page 46: Multivariate data analysis