AN OVERVIEW AND EXPLORATION OF JMP – A DATA DISCOVERY SYSTEM IN DAIRY SCIENCE

A.P. Ruhil and Tara Chand

National Dairy Research Institute, Karnal-132001

JMP, commonly pronounced “Jump”, is statistical software that provides a graphical interface for displaying and analyzing data interactively. JMP is a standalone application that was conceived and developed by John Sall and launched in October 1989. JMP is not a part of the SAS System, though portions of JMP were adapted from routines in the SAS System, particularly for linear algebra and probability calculations. It is supported on the Windows, Macintosh and Linux operating systems, with emphasis on combining dynamic visual data discovery with powerful analytical tools. Besides basic statistical analysis, JMP is also capable of many advanced analytic techniques, including state-of-the-art methods for design of experiments and computer simulations. A few important features of JMP are as follows:

• A spreadsheet for viewing, editing, entering, and manipulating data
• A broad range of graphical and statistical methods for data analysis
• Extensive design of experiments capabilities
• Options to select and display subsets of the data
• Data management tools for sorting and combining tables
• A formula editor for each table column to compute values
• A way to group data and compute summary statistics
• Special plots, charts, and communication capabilities for quality improvement techniques
• Tools for moving analysis results between applications
• A scripting language for saving frequently used routines
• Advanced statistical and data mining tools such as DOE, recursive partitioning and neural networks
• Interfaces to SAS

JMP is easy to learn and user friendly. Statistics are organized into logical areas with appropriate graphs and tables, which help users to find patterns in data, identify outlying points, or fit models. Based on the types of the variables and the roles assigned to them, JMP automatically selects an appropriate analysis to be performed on these variables. JMP offers descriptive statistics and simple analyses for beginning statisticians and complex model fitting for advanced researchers. Standard statistical analyses and specialty platforms for design of experiments, statistical quality control, ternary and contour plotting, and survival analysis provide the tools you need to analyze data and see results quickly. JMP is designed to be problem centric rather than tool centric. Users can start with a simple exploration of the data and progressively move into deeper analysis. The output of each step can be refined further to discover new patterns. The red spots displayed in the graphs, also known as hot spots, are used to refine and analyze the data further, as shown in fig. 1.

JMP analysis integrates graphical presentation of the results with the statistical output. These graphs are dynamically linked to the data table and to one another. Selecting a point in a graph automatically highlights that observation in the data table as well as in the associated graphical output, as shown in fig. 2.

JMP Script Language: The tasks performed during an analysis can be saved in the form of scripts generated by JMP, so that the same steps can be rerun to reproduce the analysis. A script can be saved in the data table to keep a record of the analysis along with the data file, as shown in fig. 3. The scripts can also be saved in the form of a journal for publication.

Formula Editor: It is used to create new columns based on computations. It includes a rich library of built-in functions and operators for building expressions graphically and interactively, as shown in fig. 4.

Context Sensitive help: JMP has extensive user assistance options to help users and guide them. The context sensitive help provides a unique way to access help topics from within analysis platforms. JMP enables users to access context sensitive help using the toolbar button with a question mark icon (the help tool). When this tool is selected, clicking in any field of an output window links you directly to the relevant topic in the help documentation as shown in fig. 5.

Starting JMP: Click on Start → Program Files → JMP8 → JMP8, or double-click the JMP8 icon on the desktop. The screen shown in Fig. 6 is displayed. Two windows open at the same time, namely “JMP Starter” and “Tip of the Day”. The “Tip of the Day” window gives users some important tips for using the JMP software. The “JMP Starter” window displays all the options of JMP in graphical form in different categories, e.g. file, script, analysis, graph etc. These windows can be closed temporarily by clicking on the close button, or permanently by changing the default settings through the Preferences option.

Data Management: JMP arranges data in the form of a spreadsheet. A new data file can be created in JMP, or data can be imported from other sources. JMP can read data from different file formats such as xls, CSV, txt, Access database files, dat, sas etc. JMP supports three modeling types for columns, namely 1: Continuous (the values are numeric measurements), 2: Ordinal (the values are ordered categories, which have either character or numeric values), 3: Nominal (the values are numeric or character classifications).

Open Excel File: To open an existing file, click on File → Open, then select the path, file name and type of file as shown in Fig. 7 below. The file will be opened in a datasheet view. JMP will try to detect the data type of each column automatically. Save the file in JMP format. You can try opening the file named “Sample data sets.xlsx” and its first worksheet “First Lactation Length$”, then click on “OK”.

Data View: The opened data will be displayed in tabular form. Attributes are displayed in columns and records in rows. A view of the data table is shown in Fig. 7. There are four components of a data table, namely:

Data Grid: Organizes data into columns (variables or attributes) and rows (observations).

Column panel: Lists all the columns along with their properties and shows which columns are selected.

Row panel: Indicates the number of rows in the data table and how many are selected/hidden/excluded/labeled.

Table panel: Indicates the name of the table and presents a list of any table variables and properties. A table variable has a constant character value that is always available in the table.

Data Exploration and Manipulation:
• Column Menu – used to perform a number of actions on variables, such as changing the name or modeling type, creating new columns or formulas, label/unlabel, hide/unhide, lock/unlock, exclude/unexclude from analysis, and validation to enforce checks on the values entered in columns.
  • Label/Unlabel: the labels, i.e. the column values, will be displayed on data points in a graph. Select the column name and then Cols – Label
  • Hide: Select the column name and then Cols – Hide
  • Validation: Select the column name “Farm” and then Cols – Validation – List Check…
• Row Menu – select, hide, mark, color, label, add rows, use the data filter to create more complicated logical selections etc.
• Creating new tables and subsets:
  • Create a new table as a subset of an existing table
  • Select Rows – Data Filter to select rows
  • Select Tables – Subset
  • Change the name of the output table, review the other options, then click OK
• Creating new columns:
  • Select Cols – New Column (there are other methods also)
  • Type the name of the new column, such as Concentrate Ratio
  • Select Column Properties – Formula
  • Type the formula in the new window, then click OK and OK again to close the window
  • Note the new sign that appears in front of the new column
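The subset and formula-column steps above are carried out through JMP menus. As a rough illustration of what they compute, the following Python sketch filters rows and adds a derived column. The table contents, the column names and the definition of the ratio (concentrate divided by total feed) are assumptions made only for this example; they are not taken from the workshop data file.

```python
# Hypothetical rows standing in for a fodder consumption table; the column
# names and values are illustrative and not taken from the workshop file.
rows = [
    {"Year": 2008, "Concentrate": 4.0, "Dry": 6.0, "Green": 20.0},
    {"Year": 2009, "Concentrate": 5.0, "Dry": 5.0, "Green": 25.0},
    {"Year": 2010, "Concentrate": 3.0, "Dry": 9.0, "Green": 18.0},
]


def subset(table, predicate):
    """Return copies of the rows for which predicate(row) is true."""
    return [dict(row) for row in table if predicate(row)]


def add_formula_column(table, name, formula):
    """Return a new table with one extra column computed row by row."""
    return [{**row, name: formula(row)} for row in table]


def concentrate_ratio(row):
    """Assumed definition: concentrate as a share of total feed."""
    total = row["Concentrate"] + row["Dry"] + row["Green"]
    return row["Concentrate"] / total


# Rough analogue of Rows - Data Filter followed by Tables - Subset.
recent = subset(rows, lambda r: r["Year"] >= 2009)

# Rough analogue of Cols - New Column with a formula property.
with_ratio = add_formula_column(rows, "Concentrate Ratio", concentrate_ratio)
```

Both helpers return new tables and leave the original rows untouched, in the same way that Tables – Subset produces a separate output table.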

• Create Summary Tables:
  • Select Tables – Tabulate
  • Select variables and drop them on the appropriate places
  • Set the decimal places by selecting Change Format
  • After that click on Done
  • Save the summary table as fodder summary.jrp
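A summary table of this kind groups the rows by a categorical variable and reports statistics for each group. The Python sketch below shows the idea with a count and a mean per group; the records and the rounding to two decimal places are assumptions for illustration, and the code is not a description of how Tabulate is implemented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (group, value) records, e.g. year and dry fodder consumed.
records = [
    ("2008", 6.0),
    ("2008", 8.0),
    ("2009", 5.0),
    ("2009", 7.0),
    ("2009", 9.0),
]


def summarize(pairs, decimals=2):
    """Group values by key and report the count and rounded mean per group."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {
        key: {"N": len(values), "Mean": round(mean(values), decimals)}
        for key, values in sorted(groups.items())
    }


summary = summarize(records)
for key, stats in summary.items():
    print(key, stats["N"], stats["Mean"])
```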

Examples of Graphical Data Exploration

JMP software can be used for exploration and analysis of data using graphical features and statistical methods. The following exercises use the data sets available in the file “Sample data sets.xlsx”. The important features of JMP software are as follows:

• Create, examine and interact with JMP graphs to begin data discovery.
• Use options and commands to improve images for user understanding.
• Save scripts to a data table.

• Frequency Distribution and Histograms
  • File – Open – First lactation length$.jmp from the desired directory
  • Select Analyze – Distribution
  • Select Days – Y, Columns
  • Select OK
  • Select a bar in the graph; the corresponding rows are also selected
  • Save the script in the data table by red triangle – Script – Save Script to Data Table
  • Close the file

• Multiple Histograms
  • Select File – Open – Fodder Consumtion$.jmp
  • Select columns from Mineral to Green
  • Select Analyze – Distribution
  • Select these columns – Y, Columns
  • Select OK
  • Play with various options
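The histogram exercises rest on two ideas: counting observations in equal-width bins, and mapping a selected bar back to the rows it contains. A minimal Python sketch of both follows. The lactation lengths and the bin width of 20 days are assumed values, and JMP chooses its own bin boundaries, so this is only an analogue of what the Distribution platform displays.

```python
import math
from collections import Counter


def frequency_table(values, bin_width):
    """Count values in equal-width bins [lower, upper), skipping empty bins."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    return [(k * bin_width, (k + 1) * bin_width, counts[k]) for k in sorted(counts)]


def rows_in_bin(values, lower, upper):
    """Return the row indices whose value falls in [lower, upper)."""
    return [i for i, v in enumerate(values) if lower <= v < upper]


# Hypothetical first lactation lengths in days.
days = [285, 301, 310, 296, 322, 305, 340, 298]

table = frequency_table(days, 20)
selected = rows_in_bin(days, 280, 300)  # analogue of clicking the first bar
```

With these values the first bin (280 to 300 days) holds three observations, and `selected` lists their row positions.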

• Scatter Plots
  • Select Analyze – Fit Y by X
  • Select Dry – Y, Response
  • Select Mineral, Concentrate and Green as X, Factor
  • Select OK
  • Change the names of the graphs, e.g. double-click on Bivariate Fit of Dry By Mineral and rename it as Dry Fodder versus Mineral

• Numerical and Graphical Results
  • Select Graph – Chart
  • Select Year – Categories, X, Levels
  • Select Mineral and Dry – Statistics – Mean
  • Select OK
  • Play with other options
  • Select the red triangle of Chart – Script – Relaunch Analysis

• Regression Analysis
  • Select File – Open – Body Weight of Lamb$.jmp
  • Analyze – Fit Y by X
  • Select the variables for the appropriate axes
  • Select OK
  • Click on the red triangle of Bivariate Fit… and select a model from the line-fitting options
  • To see the results, scroll down the window
  • Save the analysis in the data table or a journal
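For the simplest of the fitting options, a straight line, the reported slope, intercept and R-square come from ordinary least squares. The Python sketch below works through that calculation. The ages and body weights are made-up numbers, not values from the lamb data set, and the function name is hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_square).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_square = 1 - ss_resid / ss_total
    return slope, intercept, r_square


# Hypothetical lamb ages (weeks) and body weights (kg).
age_weeks = [0, 2, 4, 6, 8]
weight_kg = [3.0, 5.1, 6.9, 9.2, 10.8]

slope, intercept, r_square = fit_line(age_weeks, weight_kg)
print(f"weight = {intercept:.3f} + {slope:.3f} * age, R-square = {r_square:.3f}")
```

The sketch assumes at least two distinct x values and some variation in y; otherwise the divisions by `sxx` or `ss_total` would fail.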

• Bubble Chart
  • Select File – Open – Bacteriological Quality of Milk$.jmp
  • Graph – Bubble Plot
  • Select the variables for the appropriate axes
  • Select OK
  • Click on the red triangle of Bubble Plot and select All Labels
  • To change the selection, click on the red triangle of Bubble Plot and select Show Roles
  • Save the bubble chart in the data table or a journal

• Tree Map: A substitute for a bar graph when there are many components. Two parameters can be represented in a single graph in rectangular form.
  • Select File – Open – Human-Animal Population$.jmp
  • Exclude the row of India from the data set
  • Click on Graph – Tree Map
  • Select variable Area in the role of Sizes
  • Select variable State of Union Territory in the role of categories
  • Select variable population in the role of categories
  • Select OK
  • Save the analysis in the data table or a journal

• Recursive Partitioning (Decision tree):

The Partition platform recursively partitions data according to a relationship between the X and Y values, creating a tree of partitions. It finds a set of cuts or groupings of X values that best predict a Y value. It does this by exhaustively searching all possible cuts or groupings. These splits (or partitions) of the data are done recursively, forming a tree of decision rules, until the desired fit is reached. Variations of this technique go by many names and brand names: decision trees, CART™, CHAID™, C4.5, C5, and others. The technique is often taught as a data mining technique, because
• it is good for exploring relationships without having a good prior model,
• it handles large problems easily, and
• the results are very interpretable.

The classic application is where you want to turn a data table of symptoms and diagnoses of a certain illness into a hierarchy of questions to ask new patients in order to make a quick initial diagnosis.

The factor columns (X's) can be either continuous or categorical (nominal or ordinal). If an X is continuous, then the splits (partitions) are created by a cutting value. The sample is divided into values below and above this cutting value. If the X is categorical, then the sample is divided into two groups of levels.

The response column (Y) can also be either continuous or categorical (nominal or ordinal). If Y is continuous, then the platform fits means, and creates splits which most significantly separate the means by examining the sums of squares due to the differences in means. If Y is categorical, then the response rates (the estimated probability for each response level) become the fitted value, and the most significant split is determined by the largest likelihood-ratio chi-square statistic. In either case, the split is chosen to maximize the difference in the responses between the two branches of the split.

The resulting tree has the appearance of artificial intelligence produced by machine learning. This is a powerful platform, since it examines a very large number of possible splits and picks the optimum one.

Procedure:
• Select File – Open – Animal Activities Score$.jmp
• Analyze – Modeling – Partition
• Select appropriate variables for Y and X
• Select OK
• Click on Split to improve R-square
• Try other options such as tree view, leaf report, split history, save columns etc.
• Save the analysis in the data table or a journal
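To make the splitting rule for a continuous X and a continuous Y concrete, the following Python sketch searches every candidate cutting value and keeps the one that most reduces the within-group sum of squares. It illustrates the sum-of-squares criterion described above under simplifying assumptions (a single X column, one split, cut points taken midway between neighbouring X values) and is not JMP's implementation. The data and function names are hypothetical.

```python
def sum_of_squares(values):
    """Sum of squared deviations of the values from their mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return (cut, reduction) for the single best cut on a continuous x.

    Every midpoint between neighbouring distinct x values is tried; the cut
    kept is the one with the largest drop in the sum of squares of y.
    Returns None if x has no two distinct values.
    """
    pairs = sorted(zip(x, y))
    total_ss = sum_of_squares([yi for _, yi in pairs])
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut is possible between equal x values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [yi for _, yi in pairs[:i]]
        right = [yi for _, yi in pairs[i:]]
        reduction = total_ss - sum_of_squares(left) - sum_of_squares(right)
        if best is None or reduction > best[1]:
            best = (cut, reduction)
    return best


# Hypothetical factor and response values with an obvious break near x = 6.5.
x_values = [1, 2, 3, 10, 11, 12]
y_values = [5, 6, 5, 20, 21, 19]

cut, reduction = best_split(x_values, y_values)
print(f"split at x < {cut}, sum of squares reduced by {reduction:.2f}")
```

Applying the same search again within each of the two resulting groups gives the recursive tree of decision rules; each click on Split in the platform adds one more such partition.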