data mining sas


Post on 07-Oct-2014

Praise from the Experts

Dr. Patricia Cerrito, professor of mathematics at the University of Louisville, has written a very useful introduction to SAS Enterprise Miner and data mining. Dr. Cerrito is a well-known expert in data mining, and is especially well known for her research in applications of text mining. Her experience in the classroom is clearly evident in the examples and motivation of her material.

She begins the text with an easy-to-follow introduction to SAS Enterprise Miner. She follows the introductory chapter with an important overview and discussion of data visualization techniques, providing clear illustrations conducted in SAS Enterprise Miner. A unique feature of this book is the early introduction of text mining, which Dr. Cerrito does in Chapter 3. The following chapters examine data summarization, association (or market basket analysis), the use of text mining in combination with association analysis, and cluster analysis. The emphasis on these exploratory or pattern recognition techniques in data mining is a helpful addition to the literature illustrating the application of SAS Enterprise Miner. Dr. Cerrito's expertise in these areas is evident.

The book concludes with an excellent chapter on predictive modeling techniques and on time series techniques. The final chapter is particularly unique: Dr. Cerrito provides the student with an excellent review of technical preparation. Many students often fail to understand the importance of communicating the results of their analyses either to other scientists or senior managers. This chapter provides the student with clear examples of good report preparation.

This is a wonderful introductory text to data mining and to SAS Enterprise Miner. I am confident that faculty, students, and business analysts will find this book to be an invaluable resource.

J. Michael Hardin, Ph.D.
Associate Dean for Research
Culverhouse College of Commerce
The University of Alabama

Introduction to Data Mining Using SAS Enterprise Miner is an excellent introduction for students in a classroom setting, or for people learning on their own or in a distance learning mode. The book contains many screen shots of the software during the various scenarios used to exhibit basic data and text mining concepts. In this way, the student obtains validation of correct procedures while performing the steps of the narrative or for parallel processes to complete the assignments. The author uses a varied and interesting set of databases for demonstration of the many capabilities of SAS Enterprise Miner. The alternation of examples based on structured data and text data sets reinforces the awareness of the need to consider all sources of information and the fact that one can process text with SAS Enterprise Miner in a quite straightforward manner.

James Mentele, Senior Research Fellow
Central Michigan University Research Corporation

Introduction to Data Mining Using SAS Enterprise Miner is a useful introduction and guide to the data mining process using SAS Enterprise Miner. Initially the product can be overwhelming, but this book breaks the system into understandable sections. Examples of all the major tools of SAS Enterprise Miner are presented, including useful data analysis techniques of association, clustering, and text mining, as well as data exploration tools. The demonstrations use real-world data (provided on the CD that is included with the book) with a focus on the input settings needed for analysis and understanding the outputs produced.

Jeff G. Zeanah
President, Z Solutions, Inc.

Introduction to Data Mining Using SAS Enterprise Miner by Patricia B. Cerrito, a well-known and highly regarded statistician-practitioner and educator, provides a crisp and lively presentation of the subject with neither undue attention nor neglect for technical detail. The book's case study approach is its best strength. Section exercises are presented in timely and contextual fashion. This approach forces the student to think that data has meaning, message, and then the analysis makes more sense.

Ravija Badarinathi
Professor, Statistics and Management Science
Cameron School of Business
University of North Carolina Wilmington

Introduction to

Data Mining

Using SAS Enterprise Miner

Patricia B. Cerrito

The correct bibliographic citation for this manual is as follows: Cerrito, Patricia B. 2006. Introduction to Data Mining Using SAS Enterprise Miner. Cary, NC: SAS Institute Inc.

Introduction to Data Mining Using SAS Enterprise Miner

Copyright © 2006, SAS Institute Inc., Cary, NC, USA

ISBN-13: 978-1-59047-829-5
ISBN-10: 1-59047-829-0

All rights reserved. Produced in the United States of America.

For a hard-copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.

For a Web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication.

U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related documentation by the U.S. government is subject to the Agreement with SAS Institute and the restrictions set forth in FAR 52.227-19, Commercial Computer Software-Restricted Rights (June 1987).

SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513.

1st printing, November 2006

SAS Publishing provides a complete selection of books and electronic products to help customers use SAS software to its fullest potential. For more information about our e-books, e-learning products, CDs, and hard-copy books, visit the SAS Publishing Web site at support.sas.com/pubs or call 1-800-727-3228.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are registered trademarks or trademarks of their respective companies.

Contents

Preface
Acknowledgments

Chapter 1 The Basics of SAS Enterprise Miner 5.2
1.1 Introduction to Data Mining
1.2 Introduction to SAS Enterprise Miner 5.2
1.3 Exploring the Data Set
1.4 Analyzing a Sample Data Set
1.5 Presenting Additional SAS Enterprise Miner Icons
1.6 Exercises
1.7 References

Chapter 2 Data Visualization
2.1 Using the StatExplore Node
2.2 Understanding Kernel Density Estimation
2.3 Student Survey Example
2.4 Discussion
2.5 Exercises

Chapter 3 SAS Text Miner Software
3.1 Analyzing Unstructured Text
3.2 Introducing the SAS Text Miner Node
3.3 Crawling the Web with the %TMFILTER Macro
3.4 Clustering Text Documents
3.5 Using Concept Links
3.6 Interpreting Open-Ended Survey Questions
3.7 Revisiting the Federalist Papers
3.8 Exercises
3.9 Reference

Chapter 4 Data Summary
4.1 Introduction to Data Summary
4.2 Analyzing Continuous Data
4.3 Compressing Data
4.4 Analyzing Discrete Data
4.5 Using the Princomp Node
4.6 Additional Example
4.7 Exercises

Chapter 5 Association Node
5.1 Introduction to Association Rules
5.2-5.8 ...
