MIS 696 Final Presentation
Fall 2008
Mary Burns, Katherine Carl, Jiesi Cheng, Soomi Cheong, Koren Elder, Li Fan, Chun-neng Huang, Brent Langhals, Matthew Pickard, Nathan Twyman, Shuo Zeng, Xinlei Zhao
What is MIS?
As a discipline? As a field of research?
MIS: A Conventional Definition
Management Information
(Computer Science)
Systems
(Engineering)
The Quest: From the Seven Pillars to the Tree of Decision
1998: Seven Pillars
1999: A Simple Model and Key Researchers
2000: Additional Pillars
2001: Another 2D Model, A Timeline of Researchers
2002: Researchers, More of the Same
2003: A 3D Model, Timeline, Endnote Library
2004: A 2D Model, Research Institutions
2005: Another Model, Publication Trends
2006: Methodological Approach
2007: A Normative Approach, Decision Tree
2008: An IS approach to MIS?
The Brainstorm
“Discovery consists of seeing what everybody has seen and thinking what nobody has thought.” –Albert Szent-Gyorgyi
Nathan
The Ideagora
Validation of 2007 Decision Tree
Web of Science
Graphical Representation
Clustering
Journal Trends
The Realization
We are a large and intelligent group of people, but can we deliver all of these analyses in a semester?
We need a way to manage a large quantity of data.
Contribution: Database
• Basic article info
• Category
• Web of Knowledge and Google Citations
Our Contribution: A Database
• Article Dimensions
– Rigor vs. Relevance
– Theoretical vs. Applied
– Innovation vs. Review
– Behavioral vs. Technical
Analysis 1: Statistical Analysis of the Corpus
[Figure: Percentage of Articles in Corpus by Year Published. x-axis: publication year, 1937 to 2008; y-axis: 0.00% to 7.00%]
[Figure: Articles by category and decade. x-axis: 1930-1939, 1950-1959, 1960-1969, 1970-1979, 1980-1989, 1990-1999, 2000-2009; y-axis: 0 to 14. Legend: Artificial Intelligence, Collaboration, Data Management, Decision Sciences, eCommerce, Economics of Information, HCI, Information Assurance, Knowledge Management, Operations Management, OTHER, Social Informatics, Supply Chain Management, System Analysis & Design, Workflow/Business Process Management]
Top 10 Categories by % of WK Citations
OTHER, 16.12%
Data Management, 14.67%
System Analysis & Design, 12.14%
Knowledge Management, 10.77%
Collaboration, 9.38%
Economics of Information, 8.96%
Decision Sciences, 8.84%
Workflow/Business Process Management, 8.60%
Artificial Intelligence, 6.30%
Social Informatics, 4.22%
Top 10 Categories by % of Article Count
Data Management, 22.67%
System Analysis & Design, 19.77%
Collaboration, 9.88%
OTHER, 9.30%
Decision Sciences, 8.14%
Artificial Intelligence, 7.56%
Economics of Information, 6.40%
HCI, 5.81%
Social Informatics, 5.81%
Knowledge Management, 4.65%
Top 10 Articles by Web of Knowledge Citations
Title | Author | Year | Category | WK Citations | WK Citations per Year
Dynamic capabilities and strategic management | Teece et al | 1997 | OTHER | 1532 | 139
A Relational Model of Data for Large Shared Data Banks | Codd | 1970 | Data Management | 1269 | 33
Organizational Information Requirements, Media Richness and Structural Design | Daft and Lengel | 1986 | Workflow/Business Process Management | 1100 | 50
A Dynamic Theory of Organizational Knowledge Creation | Nonaka | 1994 | Knowledge Management | 1098 | 78
On the Criteria To Be Used in Decomposing Systems into Modules | Parnas | 1972 | System Analysis & Design | 680 | 19
Machine learning in automated text categorization | Sebastiani | 2002 | Artificial Intelligence | 663 | 111
The Lagrangian Relaxation Method for Solving Integer Programming Problems | Fisher | 1981 | Decision Sciences | 658 | 24
Electronic Markets and Electronic Hierarchies | Malone et al | 1987 | Economics of Information | 575 | 27
A Foundation for the Study of Group Decision Support Systems | DeSanctis and Gallupe | 1987 | Collaboration | 543 | 26
Internet paradox: A social technology that reduces social involvement and psychological well-being? | Kraut et al | 1998 | Social Informatics | 509 | 51
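The "per Year" columns in these tables look consistent with dividing total citations by the number of years between publication and 2008. The slides do not state the formula, so the sketch below is an assumption that happens to reproduce the listed figures; the name `citations_per_year` and the half-up rounding are illustrative choices.

```python
import math

SNAPSHOT_YEAR = 2008  # assumption: citation counts were collected in 2008


def citations_per_year(citations: int, year_published: int,
                       snapshot_year: int = SNAPSHOT_YEAR) -> int:
    """Average citations per year since publication, rounded half up."""
    years = max(snapshot_year - year_published, 1)  # guard against zero years
    return math.floor(citations / years + 0.5)


# (author, year, Web of Knowledge citations) taken from the table above
rows = [
    ("Teece et al", 1997, 1532),
    ("Codd", 1970, 1269),
    ("Sebastiani", 2002, 663),
]
for author, year, cites in rows:
    print(author, citations_per_year(cites, year))
```

Normalizing by age is what lets recent papers such as Sebastiani (2002) rank alongside much older ones.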
Top 10 Articles by Google Citations
Title | Author | Year | Category | Google Citations | Google Citations per Year
The Nature of the Firm | Coase | 1937 | OTHER | 11151 | 157
Dynamic capabilities and strategic management | Teece et al | 1997 | OTHER | 5536 | 503
A Dynamic Theory of Organizational Knowledge Creation | Nonaka | 1994 | Knowledge Management | 5121 | 366
The entity-relationship model toward a unified view of data | Chen | 1976 | Data Management | 4583 | 143
A Relational Model of Data for Large Shared Data Banks | Codd | 1970 | Data Management | 4197 | 110
As We May Think | Bush | 1945 | Knowledge Management | 2972 | 47
Organizational Information Requirements, Media Richness and Structural Design | Daft and Lengel | 1986 | Workflow/Business Process Management | 2853 | 130
A Spiral Model of Software Development and Enhancement | Boehm | 1988 | System Analysis & Design | 2848 | 142
On the Criteria To Be Used in Decomposing Systems into Modules | Parnas | 1972 | System Analysis & Design | 2729 | 76
A Note on Two Problems in Connection with Graphs | Dijkstra | 1959 | OTHER | 2190 | 45
Top 10 Articles by Average Web of Knowledge Citations per Year
Title | Author | Year | Category | WK Citations | WK Citations per Year
Review: Knowledge Management and Knowledge Management Systems: Conceptual Foundations and Research Issues | Alavi and Leidner | 2005 | Knowledge Management | 418 | 139
Dynamic capabilities and strategic management | Teece et al | 1997 | OTHER | 1532 | 139
Machine learning in automated text categorization | Sebastiani | 2002 | Artificial Intelligence | 663 | 111
A Dynamic Theory of Organizational Knowledge Creation | Nonaka | 1994 | Knowledge Management | 1098 | 78
User acceptance of information technology: Toward a unified view | Venkatesh et al | 2003 | HCI | 379 | 76
Internet paradox: A social technology that reduces social involvement and psychological well-being? | Kraut et al | 1998 | Social Informatics | 509 | 51
Organizational Information Requirements, Media Richness and Structural Design | Daft and Lengel | 1986 | Workflow/Business Process Management | 1100 | 50
A Relational Model of Data for Large Shared Data Banks | Codd | 1970 | Data Management | 1269 | 33
Frictionless Commerce? A Comparison of Internet and Conventional Retailers | Brynjolfsson and Smith | 2000 | eCommerce | 263 | 33
Design Science in Information Systems Research | Hevner et al | 2004 | OTHER | 131 | 33
Top 10 Articles by Average Google Citations per Year
Title | Author | Year | Category | Google Citations | Google Citations per Year
Review: Knowledge Management and Knowledge Management Systems: Conceptual Foundations and Research Issues | Alavi and Leidner | 2005 | Knowledge Management | 1593 | 531
Dynamic capabilities and strategic management | Teece et al | 1997 | OTHER | 5536 | 503
A Dynamic Theory of Organizational Knowledge Creation | Nonaka | 1994 | Knowledge Management | 5121 | 366
Machine learning in automated text categorization | Sebastiani | 2002 | Artificial Intelligence | 2056 | 343
User acceptance of information technology: Toward a unified view | Venkatesh et al | 2003 | HCI | 1055 | 211
The Nature of the Firm | R. H. Coase | 1937 | OTHER | 11151 | 157
The entity-relationship model toward a unified view of data | Peter Pin-Shan Chen | 1976 | Data Management | 4583 | 143
A Spiral Model of Software Development and Enhancement | Boehm | 1988 | System Analysis & Design | 2848 | 142
Frictionless Commerce? A Comparison of Internet and Conventional Retailers | Brynjolfsson and Smith | 2000 | eCommerce | 1129 | 141
Organizational Information Requirements, Media Richness and Structural Design | Daft and Lengel | 1986 | Workflow/Business Process Management | 2853 | 130
Attribute-Based Clustering & Analysis of MIS Papers
Purpose and Methodology
• Purpose
– Classify the MIS papers from a different perspective: the general attributes of the papers
– Provide useful information to assist the trend analysis and prediction about MIS research
• Methodology
– Clustering: Use Fuzzy k-Means Clustering Algorithm
– Validation: Use Partition Index (SC) to determine the best number of clusters
– Cluster Evaluation: Label the papers with cluster numbers
– Analysis: Analyze the clustering results
Attributes of Papers
• 8 Attributes / 4 Attribute Pairs
– Theoretical vs. Applied
– Rigor vs. Relevance
– Review vs. Innovation
– Technical vs. Behavior
• Scoring and Data Processing
– Every attribute of a paper is given a score from 1 to 5
– The score of one attribute is treated as the reverse (negated) score of the other attribute in the pair (e.g., scoreTheoretical = 3 is equivalent to scoreApplied = -3)
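One way to read the scoring rule is that each pair collapses to a single signed number: negate the second attribute's score and average it with the first. The sketch below follows that reading. Which member of each pair counts as positive is not stated on the slides, so the orientation in `PAIRS`, the function name, and the example scores are all assumptions.

```python
from typing import Dict, List

# Attribute pairs from the slide. Treating the first member of each pair as
# the positive direction is an assumption made for illustration only.
PAIRS = [
    ("Theoretical", "Applied"),
    ("Rigor", "Relevance"),
    ("Review", "Innovation"),
    ("Technical", "Behavior"),
]


def paper_coordinates(scores: Dict[str, int]) -> List[float]:
    """Map eight 1-5 attribute scores to one signed coordinate per pair."""
    coords = []
    for positive, negative in PAIRS:
        # Reverse the second attribute's score, then average the pair.
        coords.append((scores[positive] + (-scores[negative])) / 2)
    return coords


# Hypothetical scores for a single paper.
example = {
    "Theoretical": 4, "Applied": 2,
    "Rigor": 3, "Relevance": 3,
    "Review": 1, "Innovation": 5,
    "Technical": 5, "Behavior": 1,
}
print(paper_coordinates(example))  # [1.0, 0.0, -2.0, 2.0]
```

Under this reading every coordinate lies between -2 and 2.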
MIS-Paper Space Definition: A 4-dimensional space
[Diagram: four axes, X1 to X4, one per attribute pair: Theoretical vs. Applied, Rigor vs. Relevance, Review vs. Innovation, Technical vs. Behavior]
Fuzzy k-Means Clustering
• The average of the scores in each pair is used as one coordinate of the paper in the 4-dimensional MIS-Paper Space
• The coordinates of all papers are used as the raw data in the clustering procedure
• Because the best number of clusters cannot be decided in advance, the clustering procedure is run several times, with the number of clusters preset to each value from 3 to 15
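The procedure above can be sketched with the standard fuzzy k-means (fuzzy c-means) update rules. This is a minimal pure-Python illustration, not the implementation used for the presentation. The fuzzifier `m = 2`, the random initialization, the stopping tolerance, and the toy points are all assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Return (centers, u), where u[i][j] is point j's membership in cluster i."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalized so each point's memberships sum to 1.
    u = [[rng.random() for _ in range(n)] for _ in range(c)]
    for j in range(n):
        total = sum(u[i][j] for i in range(c))
        for i in range(c):
            u[i][j] /= total
    centers = []
    for _ in range(iters):
        # Centers are means of the points weighted by membership ** m.
        centers = []
        for i in range(c):
            w = [u[i][j] ** m for j in range(n)]
            sw = sum(w)
            centers.append([sum(w[j] * points[j][d] for j in range(n)) / sw
                            for d in range(dim)])
        # Memberships are recomputed from relative squared distances.
        shift = 0.0
        for j in range(n):
            d2 = [max(sq_dist(points[j], centers[i]), 1e-12) for i in range(c)]
            for i in range(c):
                new = 1.0 / sum((d2[i] / d2[k]) ** (1.0 / (m - 1.0))
                                for k in range(c))
                shift = max(shift, abs(new - u[i][j]))
                u[i][j] = new
        if shift < tol:  # stop once memberships no longer change
            break
    return centers, u


# Toy 4-dimensional data with two obvious groups (hypothetical, not the corpus).
points = [
    [1.0, 1.0, 1.0, 1.0], [1.1, 0.9, 1.0, 1.2], [0.9, 1.0, 1.1, 1.0],
    [-1.0, -1.0, -1.0, -1.0], [-1.1, -0.9, -1.0, -1.2], [-0.9, -1.0, -1.1, -1.0],
]
centers, u = fuzzy_c_means(points, c=2)
print(centers)
```

To mirror the slide, the same function would be called once for each cluster count from 3 to 15 and the results compared with a validation index.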
Validation
• Goal of clustering
– Group the papers with as many similarities as possible
– Separate different groups as far away from each other as possible
• Choice of validation index
– Partition Index: The ratio of the sum of compactness and separation of the clusters
– The lower the ratio, the better
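A commonly cited form of the Partition Index sums, over clusters, the membership-weighted squared distances to the cluster center (compactness) divided by the cluster's fuzzy size times its summed squared distance to all centers (separation). The slides do not give the formula, so the sketch below assumes that form. The fuzzifier `m = 2` and the toy data are also assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def partition_index(points, centers, u, m=2.0):
    """Partition Index (SC): lower means compact, well-separated clusters.

    u[i][j] is the membership of point j in cluster i.
    """
    total = 0.0
    for i, v in enumerate(centers):
        compactness = sum((u[i][j] ** m) * sq_dist(x, v)
                          for j, x in enumerate(points))
        fuzzy_size = sum(u[i])  # N_i, taken here as the sum of memberships
        separation = sum(sq_dist(vk, v) for vk in centers)
        total += compactness / (fuzzy_size * separation)
    return total


# Two hypothetical partitions with identical spread but different separation.
u = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
tight = partition_index([[0, 0], [0, 1], [10, 0], [10, 1]],
                        [[0, 0.5], [10, 0.5]], u)
loose = partition_index([[0, 0], [0, 1], [2, 0], [2, 1]],
                        [[0, 0.5], [2, 0.5]], u)
print(tight, loose)  # the well-separated partition scores lower
```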
Validation (Cont’d)
[Figure: Validation curve using Partition Index (SC). x-axis: Number of Clusters (2 to 16); y-axis: Partition Index Value (1.4 to 3). Marked point: X = 7, Y = 1.47]
Validation (Cont’d)
• Best number of clusters: 7
• Reasons
– It is the “elbow” point: the improvement after 7 is not as prominent as that before 7
– Although 12 has the lowest index value, too many clusters (too few papers in each cluster) would make it harder to generalize the characteristics of each cluster
Cluster Evaluation
• For each paper, choose the largest membership value and label the paper with that cluster number
• Center (v) and number of papers (N) of every cluster
Cluster | Center v | N
1 | 0.6052 0.5752 0.8217 1.1602 | 28
2 | 1.2773 0.5368 0.2232 1.4327 | 24
3 | 0.2477 0.0712 0.3298 -1.3236 | 29
4 | -0.9300 -0.3595 0.9106 -1.7341 | 26
5 | -0.1861 0.6770 -0.8188 -0.4133 | 28
6 | 1.0888 0.6974 0.9997 -1.6909 | 29
7 | 0.1839 0.9297 -0.8653 1.1222 | 21
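The labeling step can be sketched as taking, for each paper, the cluster with the largest fuzzy membership and then counting papers per cluster. The membership matrix below is hypothetical and the function name `harden` is illustrative.

```python
from collections import Counter


def harden(u):
    """Label each paper with the 1-based cluster of its largest membership.

    u[i][j] is the membership of paper j in cluster i.
    """
    n_clusters, n_papers = len(u), len(u[0])
    return [max(range(n_clusters), key=lambda i: u[i][j]) + 1
            for j in range(n_papers)]


# Hypothetical memberships: 3 clusters (rows) by 4 papers (columns).
u = [
    [0.7, 0.2, 0.1, 0.5],
    [0.2, 0.5, 0.3, 0.4],
    [0.1, 0.3, 0.6, 0.1],
]
labels = harden(u)
sizes = Counter(labels)  # number of papers per cluster, like the N column
print(labels, dict(sizes))
```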
Characteristics Table
Cluster | Theoretical | Applied | Rigorous | Relevant | Review | Innovation | Technical | Behavior
Cluster1 | --- | Moderate | --- | Moderate | --- | Moderate | --- | Normal
Cluster2 | Normal | --- | --- | Moderate | --- | Moderate | --- | Normal
Cluster3 | --- | Moderate | Slight | Slight | --- | Moderate | Normal | ---
Cluster4 | Moderate | --- | Moderate | --- | --- | Moderate | Extreme | ---
Cluster5 | Moderate | --- | --- | Moderate | Moderate | --- | Moderate | ---
Cluster6 | --- | Normal | --- | Moderate | --- | Moderate | Extreme | ---
Cluster7 | --- | Moderate | --- | Moderate | Moderate | --- | --- | Normal
Data Visualization
Domain-Cluster Paper Distribution
Domain | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 | Cluster 7 | Sum
Artificial Intelligence | 0 | 1 | 5 | 1 | 2 | 4 | 0 | 13
Collaboration | 4 | 5 | 1 | 1 | 0 | 1 | 5 | 17
Data Management | 1 | 1 | 12 | 10 | 7 | 8 | 0 | 39
Decision Sciences | 2 | 1 | 1 | 4 | 3 | 0 | 3 | 14
eCommerce | 2 | 0 | 1 | 0 | 1 | 0 | 1 | 5
Economics of Information | 4 | 2 | 0 | 0 | 4 | 0 | 1 | 11
HCI | 3 | 2 | 1 | 0 | 2 | 1 | 1 | 10
Information Assurance | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1
Knowledge Management | 1 | 5 | 1 | 0 | 0 | 0 | 1 | 8
Operations Management | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 2
Social Informatics | 3 | 1 | 0 | 0 | 2 | 0 | 4 | 10
Supply Chain Management | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 2
System Analysis & Design | 6 | 0 | 4 | 4 | 3 | 13 | 4 | 34
Workflow/Business Process Management | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 3
OTHER | 0 | 5 | 3 | 2 | 4 | 1 | 1 | 16
Sum | 28 | 24 | 29 | 26 | 28 | 29 | 21 | 185
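A distribution table of this shape can be built by counting (domain, cluster) pairs. The sketch below uses a handful of hypothetical assignments rather than the actual corpus, and the function name `cross_tab` is illustrative.

```python
from collections import Counter


def cross_tab(assignments, n_clusters):
    """Count papers per (domain, cluster); clusters are numbered from 1.

    Returns {domain: [count for cluster 1..n_clusters] + [row sum]}.
    """
    counts = Counter(assignments)
    domains = sorted({domain for domain, _ in assignments})
    table = {}
    for domain in domains:
        row = [counts[(domain, k)] for k in range(1, n_clusters + 1)]
        table[domain] = row + [sum(row)]  # last entry is the row sum
    return table


# Hypothetical (domain, cluster) pairs, not the actual corpus.
assignments = [
    ("Data Management", 3), ("Data Management", 3), ("Data Management", 4),
    ("Collaboration", 1), ("Collaboration", 7),
]
table = cross_tab(assignments, n_clusters=7)
for domain, row in table.items():
    print(domain, row)
```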
Possible Analysis Results
• By analyzing the paper distribution across domains and clusters, we can generate
– Authors’ research maps
– Universities’ research maps
– Journals’ preferences for paper types
• By analyzing the above results as a time series, we can generate
– Trends and predictions of authors’ and universities’ research
– Journals’ preferences
Benefits
• Catch the latest research hotspots in every domain
• Follow changes in journals’ preferences
• Acquire real-time information about the changing roles of universities and professors in the MIS community
• Discover unexplored domains in the MIS area
Discussion & Future Work
• Two difficulties
– Need information from additional perspectives to reasonably explain the results
– Attribute scores may contain bias, which will affect the performance of the clustering
• Future work
– Select new attributes to evaluate papers
– Examine the effect of score bias and design a better approach
– Replace manual analysis with automatic processes, such as Text Mining and Social Network Analysis
Text Mining
SQL 2005 Data Mining
Data Mining Algorithms
[Table: algorithms (Decision Trees, Naïve Bayes, Clustering, Seq. Clustering, Time Series, Association Rules, Neural Network) against tasks (Classification, Regression, Segmentation, Assoc. Analysis, Anomaly Detect., Seq. Analysis, Time Series); each cell is marked √ as either first choice or second choice]
Naïve Bayesian
• Based on Bayes’ theorem with a “naïve” assumption
• The fastest algorithm, and gives reasonable accuracy
• Best used for
– Advanced data exploration (correlation, attribute discrimination, etc.)
– Manual feature selection
• Parallel correlation counting
• Parameters:
– MAXIMUM_INPUT_ATTRIBUTES
– MAXIMUM_OUTPUT_ATTRIBUTES
– MINIMUM_NODE_SCORE
– MAXIMUM_STATES
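The “naïve” assumption is that attributes are independent given the class, so a class score is the prior times a product of per-attribute likelihoods. The sketch below is a generic categorical Naïve Bayes with add-one smoothing, written only to illustrate the idea. It is not the SQL Server implementation, and the function names, attribute names, and training rows are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """samples: list of (attribute_dict, label) pairs."""
    class_counts = Counter(label for _, label in samples)
    value_counts = defaultdict(Counter)  # (label, attribute) -> value counts
    values = defaultdict(set)            # attribute -> distinct values seen
    for attrs, label in samples:
        for name, value in attrs.items():
            value_counts[(label, name)][value] += 1
            values[name].add(value)
    return class_counts, value_counts, values


def predict(model, attrs):
    """Return the label with the highest log posterior score."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for name, value in attrs.items():
            seen = value_counts[(label, name)][value]
            # Add-one smoothing keeps unseen values from zeroing the product.
            score += math.log((seen + 1) / (count + len(values[name])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training rows: coarse article attributes -> category.
samples = [
    ({"technical": "high", "review": "no"}, "Data Management"),
    ({"technical": "high", "review": "no"}, "Data Management"),
    ({"technical": "low", "review": "yes"}, "Knowledge Management"),
    ({"technical": "low", "review": "no"}, "Knowledge Management"),
]
model = train(samples)
print(predict(model, {"technical": "high", "review": "no"}))
```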
Decision Trees
• Best accuracy for classification, regression, association prediction in many cases.
• Multiple internal algorithms
– Bayesian with K2 prior, Uniform prior
– Entropy-based
– Bayesian Gaussian for regression trees
– Complete/simple-binary splits
• Patent-pending technologies
– Automatic feature-selection
– High cardinality attribute handling
– Continuous attribute handling
• Parallel correlation counting
• Parameters:
– COMPLEXITY_PENALTY
– MAXIMUM_INPUT_ATTRIBUTES
– MAXIMUM_OUTPUT_ATTRIBUTES
– MINIMUM_LEAF_CASES
– FORCE_REGRESSORS
– SCORE_METHOD
– SPLIT_METHOD
Clustering
• Segmentation, profiling
• Multiple internal algorithms
– K-means
– EM
• Automatic feature selection on input attributes, automatic high cardinality attribute handling
• Parameters
– CLUSTER_COUNT
– MAXIMUM_INPUT_ATTRIBUTES
– CLUSTER_METHOD
– MAXIMUM_STATES
– MINIMUM_CLUSTER_CASES
– MODELLING_CARDINALITY
– STOPPING_TOLERANCE
Neural Network
• Classification, segmentation, association prediction.
• Conjugate gradient method
• 0 or 1 hidden layers
• Early stopping criteria
• Automatic feature selection
• Parameters
– MAXIMUM_INPUT_ATTRIBUTES
– MAXIMUM_OUTPUT_ATTRIBUTES
– MAXIMUM_STATES
– HIDDEN_NODE_RATIO
– HOLDOUT_PERCENTAGE
MIS Space Data Model
Improve the Corpus
• Make it more Representative
– Consider citations
– Bring up to date
– Broaden Horizons
[Figure: Number of Articles (y-axis) by Number of Citations (x-axis)]
CACM & AI: 400+ articles with 100+ citations
Improve the Corpus
• The Corpus is the most critical
– Consider citations
– Bring up to date
– Broaden horizons
Improve the Corpus
Magnify the Human Judgment
Enhance the Analyses
Expand the Database
Expand the Database
• Citations
• Abstract
• Journal Rankings
• Impact Factors
Computer Science
Engineering
Management
MIS
Rigor: What % of the paper is (A), (B), (C)
Magnify the Human Judgment
• Article Categorization
– Improve or Replace the Decision Tree
  • Inflexible: Articles can have more than one category, but there is no “well, sort of…” allowed in a decision tree
  • Create a rubric for each category and rank each article against each rubric
• Article Dimensions
– Expand the Scale: 7 pt.
– Add structure to the Decision Process
Knowledge Mgmt
Collaboration
Applied: What % of the paper is (A), (B), (C)
Theoretical: What % of the paper is (A), (B), (C)
Enhance the Analyses
• Linguistic Analysis
• Citation Analysis
– See what disciplines are most influential to MIS
– See what disciplines are most influenced by MIS
– Inbreeding
• Fusion of Categorization Methods: Human Judgment, Journal Analysis, Author Analysis, Citation Analysis, Linguistic Analysis, Others
Our Experience, Concluded
Learn to Love the Question
Choose an appropriate definition of MIS
As a student
As a teacher
As a researcher
As a journal editor
What is MIS?
Computer Science
Engineering
Management
MIS
Thank You