


Hierarchical Classification of Web Content

Susan Dumais

Microsoft Research

One Microsoft Way

Redmond, WA 98052 USA

[email protected]

Hao Chen

Computer Science Division
University of California at Berkeley
Berkeley, CA 94720-1776 USA

[email protected]@sims.berkeley.edu

ABSTRACT

This paper explores the use of hierarchical structure for classifying a large, heterogeneous collection of web content. The hierarchical structure is initially used to train different second-level classifiers. In the hierarchical case, a model is learned to distinguish a second-level category from other categories within the same top level. In the flat non-hierarchical case, a model distinguishes a second-level category from all other second-level categories. Scoring rules can further take advantage of the hierarchy by considering only second-level categories that exceed a threshold at the top level.

We use support vector machine (SVM) classifiers, which have been shown to be efficient and effective for classification, but not previously explored in the context of hierarchical classification. We found small advantages in accuracy for hierarchical models over flat models. For the hierarchical approach, we found the same accuracy using a sequential Boolean decision rule and a multiplicative decision rule. Since the sequential approach is much more efficient, requiring only 14%-16% of the comparisons used in the other approaches, we find it to be a good choice for classifying text into large hierarchical structures.

KEYWORDS

Text classification, text categorization, classification, support vector machines, machine learning, hierarchical models, web hierarchies

INTRODUCTION

With the exponential growth of information on the internet and intranets, it is becoming increasingly difficult to find and organize relevant materials. More and more, simple text retrieval systems are being supplemented with structured organizations. Since the 19th century, librarians have used classification systems like Dewey and Library of Congress subject headings to organize vast amounts of information. More recently, web directories such as Yahoo! and LookSmart have been used to classify web pages. Structured directories support browsing and search, but the manual nature of the directory compiling process makes it difficult to keep pace with the ever increasing amount of information. Our work looks at the use of automatic classification methods to supplement human effort in creating structured knowledge hierarchies.


A wide range of statistical and machine learning techniques have been applied to text categorization, including multivariate regression models [8,22], nearest neighbor classifiers [26], probabilistic Bayesian models [13, 16], decision trees [16], neural networks [22, 25], symbolic rule learning [1, 4], and support vector machines [7,12]. These approaches all depend on having some initial labeled training data from which category models are learned. Once category models are trained, new items can be added with little or no additional human effort.

Although many real world classification systems have complex hierarchical structure (e.g., MeSH, U.S. Patents, Yahoo!, LookSmart), few learning methods capitalize on this structure. Most of the approaches mentioned above ignore hierarchical structure and treat each category or class separately, thus in effect “flattening” the class structure. A separate binary classifier is learned to distinguish each class from all other classes. The binary classifiers can be considered independently, so an item may fall into none, one, or more than one category. Or they can be considered as an m-ary problem, where the best matching category is chosen. Such simple approaches work rather well on small problems, but they are likely to be difficult to train when there are a large number of classes and a very large number of features. By utilizing known hierarchical structure, the classification problem can be decomposed into a set of smaller problems corresponding to hierarchical splits in the tree. Roughly speaking, one first learns to distinguish among classes at the top level, then lower level distinctions are learned only within the appropriate top level of the tree. Each of these sub-problems can be solved much more efficiently, and hopefully more accurately as well.

The use of a hierarchical decomposition of a classification problem allows for efficiencies in both learning and representation. Each sub-problem is smaller than the original problem, and it is sometimes possible to use a much smaller set of features for each [13]. The hierarchical structure can also be used to set the negative set for discriminative training and, at classification time, to combine information from different levels. In addition, there is some evidence that decomposing the problem can lead to more accurate specialized classifiers. Intuitively, many potentially good features are not useful discriminators in non-hierarchical representations. Imagine a hierarchy with two top-level categories (“Computers” and “Sports”), and three subcategories within each (“Computers/Hardware”, “Computers/Software”, “Computers/Chat”, “Sports/Chat”, “Sports/Soccer”, “Sports/Football”). In a non-hierarchical model, a word like “computer” is not very discriminating since it is associated with items in “Computers/Software”, “Computers/Hardware”, and “Computers/Chat”. In a hierarchical model, the word “computer” would be very discriminating at the first level. At the second level, more specialized words could be used as features within the top-level “Computers” category. And the same features could be used at the second level for two different top-level classes (e.g., “chat” might be a useful feature for both “Sports/Chat” and “Computers/Chat”). Informal failure analyses of classification errors for non-hierarchical models support this intuition. Many of the classification errors are for related categories (e.g., a page about “Sports/Soccer” might be confused with “Sports/Football”), so category-specific features should improve accuracy.

Recently several researchers have investigated the use of hierarchies for text classification, with promising results. Our work differs from earlier work in a couple of important respects. First, we test the approach on a large collection of very heterogeneous web content, which we believe is increasingly characteristic of information organization problems. Second, we use a learning model, support vector machine (SVM), that has not previously been explored in the context of hierarchical classification. SVMs have been found to be more accurate for text classification than popular approaches like naïve Bayes, neural nets, and Rocchio [7, 12, 27]. We use a reduced-dimension binary-feature version of the SVM model that is very efficient for both initial learning and real-time classification, thus making it applicable to large dynamic collections. We will briefly review the earlier work and contrast it with ours.

RELATED WORK

Reuters

Much of the previous work on hierarchical methods for text classification uses the Reuters-22173 or Reuters-21578 articles. This is a rather small and tidy collection, and this alone is problematic for understanding how the approaches generalize to larger, more complex internet applications. In addition, the Reuters articles are organized into 135 topical categories with no hierarchical structure. To study hierarchical models, researchers have added one level of hierarchical structure manually. Koller and Sahami [13] generated a small hierarchical subset of Reuters-22173 by identifying labels that tended to subsume other labels (e.g., corn and wheat are subsumed by grain). The largest of their hierarchies consisted of 939 documents organized into 3 top-level and 6 second-level categories. They compared naïve Bayes and two limited-dependency Bayes net classifiers on flat and hierarchical models. Test documents were classified into the hierarchy by first filtering them through the single best matching first-level class and then sending them to the appropriate second level. Note that errors made at the first level are not recoverable, so the system has to make k correct classifications for a k-level hierarchy. They found advantages for the hierarchical models when a very small number of features (10) were used per class. For larger numbers of features (which will be required in many complex domains) no advantages for the hierarchical models were found.

Weigend et al. [25] also used the Reuters-22173 collection. An exploratory cluster analysis was first used to suggest an implicit hierarchical structure, and this was then verified by human assignments. They created 4 top-level categories (agriculture, energy, foreign exchange, and metals) and a 5th miscellaneous category that was not used for evaluation. The final evaluations were on 37 categories with at least 16 positive training examples. They used a probabilistic approach that frames the learning problem as one of function approximation for the posterior probability of the topic vector given the input vector. They used a neural net architecture and explored several input representations. Information from each level of the hierarchy is combined in a multiplicative fashion, so no hard decisions have to be made except at the leaf nodes. They found a 5% advantage in average precision for the hierarchical representation when using words, but not when using latent semantic indexing (LSI) features.

D’Alessio et al. [6] used the Reuters-21578 articles. The hierarchy they use comes from Hayes and Weinstein’s original Reuters experiments [9]. It consists of 5 meta-category codes (economic indicator, currency, corporate, commodity, energy) that include all but three of the original categories. For their experiments they only considered articles that had a single category tag, and the 37 categories with more than 20 positive training examples. Their model requires hard assignment at each branch of the tree. The hierarchical model showed 2-4% improvements in precision and recall over the flat model, and modifications to the hierarchy led to advantages of 2-9%.

Ng et al. [19] also used a hierarchical version of Reuters that consisted of countries at the top level and two topical levels below that, but they did not compare their hierarchical model against a flat model. While all of this work is encouraging, the Reuters collection is small and very well organized compared with many realistic applications.

MeSH, U.S. patents

Some researchers have investigated text classification in domains that have rich hierarchical taxonomies (e.g., MeSH, ICD codes for diseases, U.S. patent codes). The size and complexity of the medical and patent hierarchies are like those used for web content. These hierarchies were designed for controlled vocabulary tagging and they have been extensively refined over the years. Ruiz and Srinivasan [21] used a hierarchical mixture of experts to classify abstracts within the MeSH sub-category Heart. They learned classifiers at different levels of granularity, with the top-level distinctions serving as “gates” to the lower-level “experts”. They found small advantages for the hierarchical approach compared with a flat approach. Larkey [15] compared hierarchical and flat approaches for classifying patents in the Speech Signal Processing sub-category. She found no multilevel algorithm that performed significantly better than a flat one that chooses among all the speech classes.

Web

McCallum et al. [17] describe some interesting experiments on three hierarchical collections -- Usenet news, the Science sub-category of Yahoo!, and company web pages. The Yahoo! sub-collection is most closely related to our experiments, although it is less diverse since all items come from the same top-level category. They used a probabilistic Bayesian framework (naïve Bayes) and a technique called “shrinkage” to improve parameter estimates for the probability of words given classes. The idea is to smooth the parameter estimate of a node by interpolation with all its parent nodes (given by the hierarchical organization of classes). This is especially useful for classes with small numbers of training examples. For the Usenet collection, the best performance and largest advantages of the hierarchical model over the flat model are observed for large numbers of features (10,000). For the Yahoo! sub-collection, accuracy is low and there is a much smaller difference between the two approaches.

Mladenic and Grobelnik [18] examined issues of feature selection for hierarchical classification of web content, but they did not compare hierarchical models with flat non-hierarchical models. Chakrabarti et al. [2] also experimented with web content as well as patent and Reuters data. They use hierarchical structure both to select features and to combine class information from multiple levels of the hierarchy. For the non-hierarchical case, they use a simple vector method with term weighting and no feature selection. They find no advantage for the hierarchical approach on Reuters and a small advantage on the patents collection, and they did not test the difference for the Yahoo! sub-collection. But, because they used different representations, it is difficult to know whether differences are due to the number of features, feature selection, or the hierarchical classification algorithm.

Our work explores the use of hierarchies for classifying very heterogeneous web content. We use SVMs, which have been found to be efficient and effective for text classification, but not previously explored in the context of hierarchical classification. The efficiency of SVMs for both initial learning and real-time classification make them applicable to large dynamic collections like web content. We use the hierarchical structure for two purposes. First, we train second-level category models using different contrast sets (either within the same top-level category in the hierarchical case, or across all categories in the flat non-hierarchical case). Second, we combine scores from the top- and second-level models using different combination rules, some requiring a threshold to be exceeded at the top level before second level comparisons are made. The sequential approach is especially efficient for large problems, but how it compares in accuracy is not known. All of our models allow for each test item to be assigned to multiple categories.

WEB DATA SET AND APPLICATION

Application – Classifying web search results

We are interested in moving beyond today’s ranked lists of results by organizing web search results into hierarchical categories. HCI researchers have shown usability advantages of structuring content hierarchically [3, 10, 14]. For information as varied and voluminous as the web it is necessary to create these structures automatically. Several groups have explored methods for clustering search results. Zamir and Etzioni [29] grouped web search results using suffix tree clustering, and Hearst and Pedersen [11] used Scatter/Gather to organize and browse search results. While these methods show some promise, the computational complexity of clustering algorithms limits the number of documents that can be processed, and generating names for the resulting categories is a challenge.

The basic idea behind our work is to use classification techniques to automatically organize search results into existing hierarchical structures. Classification models are learned offline using a training set of human-labeled documents and web categories. Classification offers two advantages compared to clustering – run time classification is very efficient, and manually generated category names are easily understood. To support our goal of automatically classifying web search results two constraints are important in understanding the classification problems we studied. First, we used just the short summaries returned from web search engines since it takes too long to retrieve the full text of pages in a networked environment. These automatically generated summaries are much shorter than the texts used in most classification experiments, and they are much noisier than other document surrogates like abstracts that some have worked with. Second, we focused on the top levels of the hierarchy since we believe that many search results can be usefully disambiguated at this level. We also developed an interface that tightly couples search results and category structure, and found large preference and performance advantages for automatically classified search results (see Chen and Dumais [3] for details). For the experiments reported in this paper, we focused on the top two levels of the hierarchy, and used short automatically generated summaries of web pages.

LookSmart Web Directory

For our experiments, we used a large heterogeneous collection of pages from LookSmart’s web directory [30]. At the time we conducted our experiments in May 1999, the collection consisted of 370597 unique pages that had been manually classified into a hierarchy of categories by trained professional web editors. There were a total of 17173 categories organized into a 7-level hierarchy. We focused on the 13 top-level and 150 second-level categories.

Training/Test Selection

We selected a random 16% sample of the pages for our experiments. We picked this sampling proportion to ensure that even the smallest second-level categories would have some examples in our training set. We used 50078 pages for training, and 10024 for testing. Table 1 shows the number of pages in each top level category. Top level categories had between 578 and 11163 training examples, and second-level categories had between 3 and 3141 training examples. The wide range in number of pages in each category reflects the distribution in the LookSmart directory. Pages could be classified into more than one category. On average, pages were classified into 1.20 second-level categories and 1.07 first-level categories.

Category Name             Total    Train    Test
Automotive                 4982      578     114
Business & Finance        31599     3508     703
Computers & Internet      46000     5718    1126
Entertainment & Media     88697    11163    2159
Health & Fitness          25380     3500     722
Hobbies & Interests       22959     3227     682
Home & Family             11484     1373     280
People & Chat             35157     3309     682
Reference & Education     58002     5574    1175
Shopping & Services       19667     2122     423
Society & Politics        38968     4855     946
Sports & Recreation       23559     3081     640
Travel & Vacations        43685     5409    1091
Total Unique             370597    50078   10024

Table 1 - Training and Test Samples by Category

Pre-processing

A pre-processing module extracted plain text from each web page. In addition, the title, description and keywords fields from the META tag, and the ALT field from the IMG tag were also extracted if they existed because they provide useful descriptions of the web page. If a web page contained frames, text from each frame was extracted.

Since we were interested in classifying web search results, we needed to work with just short summary descriptions of web pages. We automatically generated summaries of each page and used these summaries for evaluating our classification algorithms. Our summaries consisted of the title, the keywords, and either the description tag if it existed or the first 40 words of the body otherwise.

We did minimal additional processing on the text. We translated upper to lower case, used white space and punctuation to separate words, and used a small stop list to omit the most common words. No morphological or syntactic analyses were used.

After preprocessing, a binary vector was created for each page indicating whether each term appeared in the page summary or not. For each category 1000 terms were selected based on the mutual information between the term and category (details below). Unlike many text retrieval applications, term frequency and document length are not taken into account in this representation, because earlier experiments showed good performance with the binary representation for SVMs [7], and this representation improves efficiency.
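As a concrete illustration, the summary construction and binary-vector representation described above could be sketched as follows (the function names, the stop list, and the vocabulary argument are our own illustrative choices, not the authors' code):

```python
import re

STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in"}  # small illustrative stop list

def make_summary(title, keywords, description, body, max_body_words=40):
    """Title + keywords + the description tag if present, else the first 40 words of the body."""
    tail = description if description else " ".join(body.split()[:max_body_words])
    return " ".join([title, keywords, tail])

def tokenize(text):
    """Lower-case, split on whitespace and punctuation, drop stop-list words."""
    words = re.split(r"[^a-z0-9]+", text.lower())
    return {w for w in words if w and w not in STOP_WORDS}

def binary_vector(summary, vocabulary):
    """1 if a selected term appears in the page summary, 0 otherwise
    (no term frequency or document length normalization)."""
    present = tokenize(summary)
    return [1 if term in present else 0 for term in vocabulary]
```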

TEXT CLASSIFICATION USING SVMs

Text classification involves a training phase and a testing phase. During the training phase, a large set of web pages with known category labels are used to train a classifier. An initial model is built using a subset of the labeled data, and a holdout set is used to identify optimal model parameters. During the testing or operational phase, the learned classifier is used to classify new web pages.

A support vector machine (SVM) algorithm was used as the classifier, because it has been shown in previous work to be both very fast and effective for text classification problems [7, 12, 27]. Vapnik introduced SVMs in his work on structural risk minimization [23, 24], and there has been a recent surge of interest in SVM classifiers in the learning community. In its simplest form, a linear SVM is a hyperplane that separates a set of positive examples from a set of negative examples with maximum margin. The margin is the distance from the hyperplane to the nearest of the positive and negative examples.

Figure 1 shows an example of a simple two-dimensional problem that is linearly separable.


Figure 1 - Graphical Representation of a Linear SVM. (The figure shows positive and negative examples in the space of possible inputs, separated by a hyperplane that maximizes the distance to the nearest points.)

In the linearly separable case, maximizing the margin can be expressed as an optimization problem:

\[
\min_{\vec{w},\, b} \ \tfrac{1}{2}\lVert \vec{w} \rVert^{2}
\quad \text{subject to} \quad
y_i \, (\vec{w} \cdot \vec{x}_i - b) \ge 1, \ \forall i
\]

where \(\vec{x}_i\) is the i-th training example and \(y_i\) is the correct output of the SVM for the i-th training example.

In cases where points are not linearly separable, slack variables are introduced that permit, but penalize, points that fall on the wrong side of the decision boundary. For problems that are not linearly separable, kernel methods can be used to transform the input space so that some non-linear problems can be learned. We used the simplest linear form of the SVM because it provided good classification accuracy, and is fast to learn and apply.
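For completeness, the standard soft-margin form of this problem, with slack variables \(\xi_i\) (not written out in the paper, but implied by the penalty parameter C discussed under SVM Parameters below), is:

\[
\min_{\vec{w},\, b,\, \xi} \ \tfrac{1}{2}\lVert \vec{w} \rVert^{2} + C \sum_i \xi_i
\quad \text{subject to} \quad
y_i \, (\vec{w} \cdot \vec{x}_i - b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \ \forall i.
\]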

Platt [20] developed a very efficient method for learning the SVM weights. His sequential minimal optimization (SMO) algorithm breaks the large quadratic programming (QP) problem down into a series of small QP problems that can be solved analytically. Additional efficiencies can be realized because the training sets used for text classification are sparse and binary. Thus the SMO algorithm is nicely applicable for large feature and training sets.

Once the weights are learned, new items are classified by computing \(\vec{w} \cdot \vec{x}\), where \(\vec{w}\) is the vector of learned weights and \(\vec{x}\) is the input vector for a new document. With the binary representation we use, this amounts to taking the sum of the weights for features present in a document, which is very efficient.

After training the SVM, we fit a sigmoid to the output of the SVM using regularized maximum likelihood fitting, so that the SVM produces posterior probabilities that are directly comparable across categories.
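A minimal sketch of this scoring step (the weight dictionary, sigmoid parameters, and feature representation are illustrative assumptions, not the authors' implementation):

```python
import math

def svm_score(weights, present_features):
    """With binary features, the dot product w.x reduces to summing the weights
    of the features present in the document."""
    return sum(weights.get(f, 0.0) for f in present_features)

def posterior(score, a, b):
    """Sigmoid fit to the raw SVM output, giving a posterior probability
    that is comparable across categories."""
    return 1.0 / (1.0 + math.exp(a * score + b))
```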

Feature Selection

For reasons of both efficiency and efficacy, feature selection is often used when applying machine learning methods to text categorization. We reduce the feature space by eliminating words that appear in only a single document (hapax legomena), then selecting the 1000 words with highest mutual information with each category. Yang and Pedersen [28] compared a number of methods for feature selection. We used a mutual information measure (Cover and Thomas [5]) that is similar to their information gain measure. The mutual information MI(F, C) between a feature, F, and a category, C, is defined as:

\[
MI(F, C) \;=\; \sum_{f \in \{F, \bar{F}\}} \ \sum_{c \in \{C, \bar{C}\}} P(f, c)\, \log \frac{P(f, c)}{P(f)\, P(c)}
\]

We compute the mutual information between every pair of features and categories.
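A sketch of how this score could be computed from document counts and used to pick the top 1000 terms per category (the count bookkeeping and function names are our own assumptions):

```python
import math

def mutual_information(n_fc, n_f, n_c, n):
    """MI between a binary feature F and a binary category C, estimated from document counts:
    n_fc = docs containing the term and in the category, n_f = docs containing the term,
    n_c = docs in the category, n = total docs."""
    mi = 0.0
    for f in (True, False):
        for c in (True, False):
            joint = (n_fc if f and c
                     else n_f - n_fc if f
                     else n_c - n_fc if c
                     else n - n_f - n_c + n_fc)
            p_joint = joint / n
            p_f = (n_f if f else n - n_f) / n
            p_c = (n_c if c else n - n_c) / n
            if p_joint > 0:
                mi += p_joint * math.log(p_joint / (p_f * p_c))
    return mi

def select_features(term_counts, n_c, n_docs, k=1000):
    """term_counts: {term: (n_fc, n_f)} for one category; returns the k terms with largest MI."""
    scored = {t: mutual_information(n_fc, n_f, n_c, n_docs)
              for t, (n_fc, n_f) in term_counts.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]
```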

For the non-hierarchical case, we select the 1000 features with the largest MI for each of the 150 categories (vs. all the other categories). For the hierarchical case, we select 1000 features with the largest MI for each of the 13 top-level categories, and also select the 1000 features for each of the 150 second-level categories (vs. the categories that share the same top-level category). We did not rigorously explore the optimum number of features for this problem, but these numbers provided good results on a training validation set so they were used for testing. In all cases, the selected features are used as input to the SVM algorithm that learns the optimum weights for the features, \(\vec{w}\).

SVM Parameters

In addition to varying the number of features, SVM performance is governed by two parameters, C (the penalty imposed on training examples that fall on the wrong side of the decision boundary), and p (the decision threshold). The default C parameter value (0.01) was used.

The decision threshold, p, can be set to control precision and recall for different tasks. Increasing p results in fewer test items meeting the criterion, and this usually increases precision but decreases recall. Conversely, decreasing p typically decreases precision but increases recall. We chose p so as to optimize performance on the F1 measure on a training validation set. For the flat non-hierarchical models, p=0.5. For the hierarchical models, p=0.2 for the multiplicative decision rule, and p1=0.2 (first level) and p2=0.5 (second level) for the Boolean decision rule.

RESULTS

As just described, decision thresholds were established on a training validation set. For each category, if a test item exceeds the decision threshold, it is judged to be in the category. A test item can be in zero, one, or more than one category. From this we compute precision (P) and recall (R). These are micro-averaged to weight the contribution of each category by the number of test examples in it. We used the F measure to summarize the effects of both precision and recall. With equal weights assigned to precision and recall, F1 = 2*P*R/(P+R).
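A minimal sketch of the micro-averaged computation (variable names are ours):

```python
def micro_f1(per_category_counts):
    """per_category_counts: list of (true_positives, predicted_positives, actual_positives),
    one tuple per category. Micro-averaging pools the counts across categories before
    computing P, R, and F1."""
    tp = sum(c[0] for c in per_category_counts)
    predicted = sum(c[1] for c in per_category_counts)
    actual = sum(c[2] for c in per_category_counts)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```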

For each test example, we compute the probability of it being in each of the 13 top-level categories and each of the 150 second-level categories. Recall that for the non-hierarchical case, models were learned (and features chosen) to distinguish each category from all other categories, and that for the hierarchical case, models were learned (and features chosen) to distinguish each category from only those categories within the same top-level category. We explored two general ways to combine probabilities from the first and second level for the hierarchical approach. In one case we first set a threshold at the top level and only match second-level categories that pass this test – that is we compute a Boolean function P(L1)&&P(L2). In order to be correctly classified, a test instance must satisfy both constraints. This method is quite efficient, since large numbers of second level categories do not need to be tested. In the other case, we compute P(L1)*P(L2). This approach allows matches even though the scores at one level fall below threshold. Although the non-hierarchical models are not trained to use top-level information, we can compute the same probabilities for this case as well.
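The two combination rules can be sketched as follows (the probability dictionaries, category tree, and thresholds are illustrative inputs; only the decision logic follows the description above):

```python
def boolean_rule(p_top, p_second, children, p1=0.2, p2=0.5):
    """P(L1) && P(L2): assign a second-level category only if its parent's top-level probability
    exceeds p1 and its own probability exceeds p2. Second-level models whose parent fails the
    top-level test never need to be evaluated."""
    assigned = []
    for top, prob1 in p_top.items():
        if prob1 < p1:
            continue
        assigned.extend(cat for cat in children[top] if p_second[cat] >= p2)
    return assigned

def multiplicative_rule(p_top, p_second, parent, p=0.2):
    """P(L1) * P(L2): assign a second-level category if the product of its own probability and
    its parent's top-level probability exceeds the threshold, even when one level falls below
    its individual threshold."""
    return [cat for cat, prob2 in p_second.items() if p_top[parent[cat]] * prob2 >= p]
```

In the Boolean case, second-level probabilities only need to be computed for the children of top-level categories that pass the threshold, which is the source of the efficiency gain discussed in the Efficiency section.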

Learned Features

There are many differences in the learned features for the hierarchical and non-hierarchical models. For example, the feature “sports” is useful in distinguishing the top-level category Sports & Recreation from the other top-level categories. The feature “sports” is also useful for distinguishing between some second-level categories -- it has a high weight for News & Magazines (under Society & Politics), Sporting Goods (under Shopping & Services), and Collecting (under Hobbies & Interests). But the feature “sports” is not selected for any categories using the flat non-hierarchical approach. It is difficult to study this systematically, but the different learning approaches are serving an important role in feature selection and weighting.

F1 Accuracy

Top Level

The overall F1 value for the 13 top-level categories is .572. Classifying short summaries of very heterogeneous web pages is a difficult task, so we did not expect to have exceptionally high accuracy. Performance on the original training set is only .649, so this is a difficult learning task and generalization to the test set is quite reasonable. This level of accuracy is sufficient to support some web search tasks [3], and could also be used as an aid to human indexers. In addition, we know that we can improve the absolute level of performance by 15%-20% using the full text of pages, and by optimizing the C parameter. What is of more interest to us here is the comparative performance of hierarchical and non-hierarchical approaches.

Second Level, Non-Hierarchical (Baseline)

The overall F1 value for the 150 second-level categories, treated as a flat non-hierarchical problem, is .476. There is a drop in performance in going from 13 to 150 categories. Performance for the 150 categories is better than the .364 value reported by McCallum et al. [17], but it is difficult to compare precisely because they use average precision not F1, the categories themselves are different, and they use the full text of pages. The .476 non-hierarchical value will serve as the baseline for looking at the use of hierarchical models.

Performance varies widely across categories. The five categories with highest and lowest F1 scores (based on at least 30 test instances) are shown in Table 2.

The most difficult categories to learn models for are those based, at least in part, on non-content distinctions. It might be useful to learn models for genre (e.g., for kids, magazines, web) in addition to our current use of content-based models.

Second Level, Hierarchical

As described above, we examined both a multiplicative scoring function, P(L1)*P(L2), and a Boolean scoring function, P(L1)&&P(L2). The former allows for matches that fall below individual thresholds, and the latter requires that the threshold be exceeded at every level. Both scoring rules allow items to be classified into multiple classes.

The overall F1 value for the P(L1)*P(L2) scoring function is .495, at the threshold of p=0.20 established on the validation set. This represents a 4% improvement over the baseline flat models, and is statistically significant using a sign test, p<0.01. The overall F1 value for the P(L1)&&P(L2) scoring function is .497, at the thresholds of p1=0.20 and p2=0.50 established on the validation set. This again is a small improvement over the non-hierarchical model, and is significantly different from the baseline using a sign test, p<0.01. Somewhat surprisingly, there is no difference between the sequential Boolean rule and the multiplicative scoring rule. The added efficiencies that can be obtained with the hierarchical model and the Boolean rule make it a good overall choice, as we discuss in more detail below.

An upper bound on the performance of these hierarchical models can be obtained by assuming that the top-level classification is correct (i.e., P(L1)=1). In this case, performance is quite good with an F1 value of .711, so there is considerable room for improvement.

Best F1
0.841   Health & Fitness/Drugs & Medicines
0.797   Home & Family/Real Estate
0.781   Reference & Education/K-12 Education
0.750   Sports & Recreation/Fishing
0.741   Reference & Education/Higher & Cont. Ed.

Worst F1
0.034   Society & Politics/World Cultures
0.088   Home & Family/For Kids
0.122   Computers & Internet/News & Magazines
0.131   Computers & Internet/Internet & the Web
0.133   Business & Finance/Business Professions

Table 2 - Best and worst F1 score by category

Efficiency

Offline training

As noted earlier, linear SVM models can be learned quite efficiently with Platt’s SMO algorithm. The total training time for all the models described in the paper was only about 30 minutes. Training time for all the non-hierarchical models on 50078 examples in each of 150 categories was 729 CPU seconds. Training time for all 150 hierarchical second-level models was 128 CPU seconds. Training is faster here because the negative set is smaller, including only items in the same top-level category. Training time for the 13 top-level categories was 1258 CPU seconds. All times were computed on a standard 266MHz Pentium II PC running Windows NT.

Online evaluation

At run time we need to take the dot product of the learned weight vector for each category and the feature vector for the test item. The cost of this operation depends on the number of features and the number of categories. Efficiency can be improved if some category comparisons can be omitted without hurting accuracy. As we have seen above, the Boolean decision function accomplishes this with no decrease in classification accuracy. For the non-hierarchical model, each test instance must be compared with all 150 second-level categories. For the hierarchical model with the multiplicative scoring rule, each test instance must be compared to all 13 first-level categories and all 150 second-level categories. For the Boolean scoring rule, each test instance must be compared to all 13 first-level categories but only to second-level categories that pass the first-level criterion. For the top-level threshold value of p=0.20, only 7.4% of the second-level categories need to be examined. Top-level comparisons are still required for the Boolean decision rule, so overall 14.8% of the comparisons used for the multiplicative rule and 16.1% of the comparisons used for the non-hierarchical model are required, both representing a considerable savings at evaluation time.
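Spelling out the arithmetic behind these percentages: with all 13 top-level models evaluated for every item and, on average, only 7.4% of the 150 second-level models evaluated,

\[
13 + 0.074 \times 150 \approx 24.1 \ \text{comparisons per item}, \qquad
\frac{24.1}{13 + 150} \approx 14.8\%, \qquad
\frac{24.1}{150} \approx 16.1\%,
\]

which is the 14%-16% range quoted in the abstract and conclusion.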

CONCLUSION

The research described in this paper explores the use of hierarchical structure for classifying a large, heterogeneous collection of web content to support classification of search results. We used SVMs, which have been found to be an efficient and effective learning method for text classification, but not previously explored for hierarchical problems. We use the hierarchical structure for two purposes. First, we train second-level category models using different contrast sets (either within the same top-level category in the hierarchical case, or across all categories in the non-hierarchical case). Second, we combine scores from the top- and second-level models using different combination rules, some requiring a threshold to be exceeded at the top level before comparisons at the second level are made.

We found small advantages in the F1 accuracy score for the hierarchical models, compared with a baseline flat non-hierarchical model. We found no difference between a multiplicative scoring function and a sequential Boolean function for combining scores from the two levels of the hierarchy. Since the sequential Boolean approach is much more efficient, requiring only 14%-16% of the number of comparisons, we find it to be a good choice.

There are a number of interesting directions for future research. We should be able to obtain further advantages in efficiency in the hierarchical approach by reducing the number of features needed to discriminate between categories within the same top-level category. Koller and Sahami [13] have shown that one can dramatically reduce the number of features for simple Reuters categories without decreasing accuracy too much, and this is definitely worth trying with our content and learning algorithm. There are some preliminary indications that this will work in our application. The SVM model assigns a weight of 0 to features that are not predictive (and could thus be omitted). We find a larger number of features with 0 weights in the hierarchical case (137 per 1000) than in the non-hierarchical case (85 per 1000). There are also many near-zero weights that contribute little to the overall score. Varying the number of features to obtain further efficiency gains therefore seems promising.

The sequential decision model provides large efficiency gains, so it will be important to see how it generalizes to more than two levels, where error cascading may be more of an issue. There are at least two ways to address the problem. One involves improved classifiers and methods for setting appropriate decision thresholds at multiple levels. Another approach is through the use of interactive interfaces so that people are involved in making some of the critical decisions. This is a rich space for further exploration.

Our research adds to a growing body of work exploring how hierarchical structures can be used to improve the efficiency and efficacy of text classification. We worked with a large, heterogeneous collection of web content -- a scenario that is increasingly characteristic of the information management tasks that individuals and organizations face. And, we successfully extended SVM models to an application that takes advantage of hierarchical structure, for both category learning and run time efficiencies.

ACKNOWLEDGMENTS

We are grateful to John Platt for help with the Support Vector Machine code, and to four anonymous reviewers for their comments.

REFERENCES

1. Apte, C., Damerau, F. and Weiss, S. Automated learning of decision rules for text categorization. ACM Transactions on Information Systems, 12(3), 233-251, 1994.

2. Chakrabarti, S., Dom, B., Agrawal, R. and Raghavan, P. Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies. The VLDB Journal 7, 163-178, 1998.

3. Chen, H. and Dumais, S. Bringing order to the web: Automatically categorizing search results. Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI’00), 145-152, 2000.

4. Cohen, W.W. and Singer, Y. Context-sensitive learning methods for text categorization. Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’96), 307-315, 1996.

5. Cover, T. and Thomas, J. Elements of Information Theory. Wiley, 1991.

6. D'Alessio, S., Murray, M., Schiaffino, R. and Kershenbaum, A. Category levels in hierarchical text categorization. Proceedings of EMNLP-3, 3rd Conference on Empirical Methods in Natural Language Processing, 1998.

7. Dumais, S. T., Platt, J., Heckerman, D. and Sahami, M. Inductive learning algorithms and representations for text categorization. Proceedings of the Seventh International Conference on Information and Knowledge Management (CIKM’98), 148-155, 1998.

8. Fuhr, N., Hartmann, S., Lustig, G., Schwantner, M., and Tzeras, K. AIR/X – A rule-based multi-stage indexing system for large subject fields. Proceedings of RIAO’91, 606-623, 1991.

9. Hayes, P.J. and Weinstein, S.P. CONSTRUE: A System for Content-Based Indexing of a Database of News Stories. Second Annual Conference on Innovative Applications of Artificial Intelligence, 1990.

10. Hearst, M., and Karadi, C. Searching and browsing text collections with large category hierarchies. Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI’97), Conference Companion, 1997.

11. Hearst, M. and Pedersen, J. Reexamining the cluster hypothesis: Scatter/Gather on retrieval results. Proceedings of 19th Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval (SIGIR’96), 1996.

12. Joachims, T. Text categorization with support vector machines: Learning with many relevant features. Proceedings of the European Conference on Machine Learning (ECML’98), 1998.

13. Koller, D. and Sahami, M. Hierarchically classifying documents using very few words. Proceedings of the Fourteenth International Conference on Machine Learning (ICML’97), 170-178, 1997.

14. Landauer, T., Egan, D., Remde, J., Lesk, M., Lochbaum, C., and Ketchum, D. Enhancing the usability of text through computer delivery and formative evaluation: The SuperBook project. Hypertext – A Psychological Perspective. Ellis Horwood, 1993.

15. Larkey, L. Some issues in the automatic classification of U.S. patents. In Working Notes for the AAAI-98 Workshop on Learning for Text Categorization, 1998.

16. Lewis, D.D. and Ringuette, M. A comparison of two learning algorithms for text categorization. Third Annual Symposium on Document Analysis and Information Retrieval (SDAIR’94), 81-93, 1994.

17. McCallum, A., Rosenfeld, R., Mitchell, T. and Ng, A. Improving text classification by shrinkage in a hierarchy of classes. Proceedings of the Fifteenth International Conference on Machine Learning, (ICML-98), 359-367, 1998.

18. Mladenic, D. and Grobelnik, M. Feature selection for classification based on text hierarchy. Proceedings of the Workshop on Learning from Text and the Web, 1998.

19. Ng, H.T., Goh, W.B. and Low, K.L. Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’97), 67-73, 1997.

20. Platt, J. Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods –Support Vector Learning. B. Schölkopf, C. Burges, and A. Smola, eds., MIT Press, 1999.

21. Ruiz, M.E. and Srinivasan, P. Hierarchical neural networks for text categorization. Proceedings of the 22nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’99), 281-282, 1999.

22. Schütze, H., Hull, D. and Pedersen, J.O. A comparison of classifiers and document representations for the routing problem. Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’95), 229-237, 1995.

23. Vapnik, V., Estimation of Dependencies Based on Data [in Russian], Nauka, Moscow, 1979. (English translation: Springer Verlag, 1982.)

24. Vapnik, V., The Nature of Statistical Learning Theory, Springer-Verlag, 1995.

25. Weigend, A.S., Wiener, E.D. and Pedersen, J.O. Exploiting hierarchy in text categorization. Information Retrieval, 1(3), 193-216, 1999.

26. Yang, Y. Expert network: Effective and efficient learning from human decisions in text categorization and retrieval. Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’94), 13-22, 1994.

27. Yang, Y. and Liu, X. A re-examination of text categorization methods. Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’99), 42-49, 1999.

28. Yang, Y. and Pedersen, J.O. A comparative study on feature selection in text categorization. Proceedings of the Fourteenth International Conference on Machine Learning (ICML’97), 412-420, 1997.

29. Zamir, O. and Etzioni, O. Web document clustering: A feasibility demonstration. Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’98), 46-54, 1998.

30. http://www.looksmart.com

� EMBED Excel.Sheet.8 ���

w

� EMBED Excel.Sheet.8 ���

Space of possible inputs

Maximize distances to nearest points

Negative Examples

Positive Examples

Category Name

Best F1

0.841

Health & Fitness/Drugs & Medicines

0.797

Home & Family/Real Estate

0.781

Reference & Education/K-12 Education

0.750

Sports & Recreation/Fishing

0.741

Reference & Education/Higher & Cont. Ed.

Worst F1

0.034

Society & Politics/World Cultures

0.088

Home & Family/For Kids

0.122

Computers & Internet/News & Magazines

0.131

Computers & Internet/Internet & the Web

0.133

Business & Finance/Business Professions

_956404784.unknown
_1018252115.unknown
_1018262858.unknown
_1018264478.xls

LookSmartStats

Two-Level Analysis

13 top level

150 second level

missed items in 0.7 and 0.8 - redone 7/15/99

sum2 data 01/14/2000

MSLookSmart>250

TopLevelCategory NameTotalTrainTest

NumberNameNtotaltraintestAutomotive4982578114

0.0automotive49825781140.11601766360.0228823766Business & Finance315993508703

0.1business&finance3159935087030.11101617140.0222475395Computers & Internet4600057181126

0.2computers&internet46000571811260.12430434780.0244782609Entertainment & Media88697111632159

0.3entertainment&media886971116321590.12585544040.0243412968Health & Fitness253803500722

0.4health&fitness2538035007220.13790386130.0284475965Hobbies & Interests229593227682

0.5hobbies&interests2295932276820.14055490220.0297051265Home & Family114841373280

0.6home&family1148413732800.11955764540.0243817485People & Chat351573309682

0.7people&chat3515733096820.09412065880.0193986973Reference & Education5800255741175

0.8reference&education58002557411750.09610013450.0202579221Shopping & Services196672122423

0.9shopping&services1966721224230.10789647630.02150811Society & Politics389684855946

0.10society&politics3896848559460.12458940670.0242763293Sports & Recreation235593081640

0.11sports&recreation2355930816400.13077804660.027165839Travel & Vacations4368554091091

0.12travel&vacations43685540910910.12381824420.0249742475Total Unique3705975007810024

total - l14501395341710743

unique3705975007810024

1.06667598551.0717278532

total - l25584911257

1.11524022521.1230047885

SecondLevel0.0.131

0.0.08890.0.240

0.0.1310.0.740

0.0.2400.0.1150

0.0.3810.0.1062

0.0.4156380.0.381

0.0.577260.4.10102

0.0.6162240.7.4144

0.0.7400.3.10204

0.0.82230.4.7203

0.0.92140.7.7205

0.0.10620.7.11204

0.0.11500.0.9214

0.0.1236100.0.8223

0.1.0116300.1.7234

0.1.17561590.7.2258

0.1.23410.8.6259

0.1.3450900.4.9316

0.1.414722950.2.0336

0.1.5146290.6.33313

0.1.6125260.1.2341

0.1.72340.4.1349

0.1.892190.0.123610

0.1.9289550.7.5384

0.1.1096190.7.83914

0.2.03360.11.8394

0.2.113402510.7.04312

0.2.29221740.5.8449

0.2.36051210.12.95113

0.2.48011730.6.25411

0.2.5394770.7.35811

0.2.6481980.12.56016

0.2.7120260.9.106314

0.2.8235510.4.126414

0.2.9142330.6.7647

0.2.10112210.12.76414

0.2.119881960.0.57726

0.3.011312290.6.07812

0.3.17131440.11.11829

0.3.222254020.6.58421

0.3.3285540.3.58519

0.3.4195440.0.0889

0.3.585190.7.18818

0.3.616683150.4.38915

0.3.731416170.1.89219

0.3.8434770.10.59517

0.3.916773370.1.109619

0.3.102040.4.119723

0.4.0275620.10.110323

0.4.13490.9.610515

0.4.219914220.6.610724

0.4.389150.9.211025

0.4.4326580.11.111017

0.4.5244440.5.411125

0.4.6259570.2.1011221

0.4.72030.1.011630

0.4.8468820.2.712026

0.4.93160.9.912121

0.4.101020.9.012219

0.4.1197230.9.412221

0.4.1264140.11.612333

0.5.0152390.1.612526

0.5.1269670.9.312621

0.5.2147380.5.912716

0.5.37461450.11.413933

0.5.4111250.2.914233

0.5.5341790.5.614223

0.5.6142230.1.514629

0.5.7311710.5.214738

0.5.84490.5.015239

0.5.9127160.8.115426

0.5.107541700.0.415638

0.5.11171280.7.615729

0.6.078120.10.216139

0.6.1241530.0.616224

0.6.254110.9.816224

0.6.333130.11.716436

0.6.4395760.5.1117128

0.6.584210.10.917637

0.6.6107240.12.117639

0.6.76470.9.1218029

0.6.8353700.10.1118242

0.7.043120.8.518839

0.7.188180.3.419544

0.7.22580.9.719844

0.7.358110.11.921342

0.7.41440.12.821657

0.7.53840.2.823551

0.7.6157290.9.523561

0.7.72050.12.323944

0.7.839140.6.124153

0.7.925545200.11.024162

0.7.10266530.4.524444

0.7.112040.4.625957

0.8.012362510.10.326256

0.8.1154260.7.1026653

0.8.216623460.5.126967

0.8.3484940.8.727364

0.8.49592070.4.027562

0.8.5188390.3.328554

0.8.62590.1.928955

0.8.7273640.5.731171

0.8.84591100.10.631353

0.8.9321680.8.932168

0.9.0122190.9.132365

0.9.1323650.4.432658

0.9.2110250.12.232850

0.9.3126210.5.534179

0.9.4122210.11.534469

0.9.5235610.11.234770

0.9.6105150.6.835370

0.9.7198440.11.336162

0.9.8162240.10.737075

0.9.9121210.2.539477

0.9.1063140.6.439576

0.9.11424900.9.1142490

0.9.12180290.3.843477

0.10.0465920.1.345090

0.10.1103230.8.8459110

0.10.2161390.10.046592

0.10.3262560.4.846882

0.10.414012560.2.648198

0.10.595170.8.348494

0.10.6313530.2.3605121

0.10.7370750.3.1713144

0.10.87631440.10.10741156

0.10.9176370.5.3746145

0.10.107411560.5.10754170

0.10.11182420.1.1756159

0.11.0241620.10.8763144

0.11.1110170.2.4801173

0.11.2347700.2.2922174

0.11.3361620.8.4959207

0.11.4139330.12.4968204

0.11.5344690.2.11988196

0.11.6123330.11.101075233

0.11.7164360.3.01131229

0.11.83940.8.01236251

0.11.9213420.2.11340251

0.11.1010752330.10.41401256

0.11.118290.1.41472295

0.12.016123320.12.01612332

0.12.1176390.8.21662346

0.12.2328500.3.61668315

0.12.3239440.3.91677337

0.12.49682040.12.61921375

0.12.560160.4.21991422

0.12.619213750.3.22225402

0.12.764140.7.92554520

0.12.8216570.3.73141617

0.12.95113

5584911257

Top-m1000-Sum2.Sum2

SUM2-SUM2

Hier-TopLevel

53k training

>256-1000

default c=0.01

CategTrainTestSMOX 0.5

05007810024TrainTest

Net1Targ1Net1Targ0Net0Targ1Net0Targ0precrecallbeFNet1Targ1Net1Targ0Net0Targ1Net0Targ0precrecallbeF

0.0automotive5781140.0.p0.5.smox.train.sum.sum.out32270256494300.82142857140.55709342560.68926099850.66391752580.0.p0.5.smox.test.sum.sum.out74144098960.84090909090.6491228070.7450159490.7326732673

0.1business&finance35087030.1.p0.5.smox.train.sum.sum.out11844762324460940.7132530120.33751425310.52538363260.45820433440.1.p0.5.smox.test.sum.sum.out2049549992260.68227424750.29018492180.48622958460.4071856287

0.2computers&internet571811260.2.p0.5.smox.train.sum.sum.out28697032849436570.80319148940.50174886320.65247017630.61765339070.2.p0.5.smox.test.sum.sum.out55016257687360.77247191010.48845470690.63046330850.598476605

0.3entertainment&media1116321590.3.p0.5.smox.train.sum.sum.out674913534414375620.83300419650.60458658070.71879538860.70064884510.3.p0.5.smox.test.sum.sum.out125829490175710.81056701030.58267716540.69662208780.6779843708

0.4health&fitness35007220.4.p0.5.smox.train.sum.sum.out20483701452462080.8469809760.58514285710.71606191660.69212571810.4.p0.5.smox.test.sum.sum.out4087931492230.83778234090.56509695290.70143964690.6749379653

0.5hobbies&interests32276820.5.p0.5.smox.train.sum.sum.out9342642293465870.77963272120.28943290980.53453281550.42214689270.5.p0.5.smox.test.sum.sum.out1786750492750.72653061220.26099706740.49376383980.38403452

0.6home&family13732800.6.p0.5.smox.train.sum.sum.out407158966485470.72035398230.29643117260.50839257750.42002063980.6.p0.5.smox.test.sum.sum.out764320497010.63865546220.27142857140.45504201680.380952381

0.7people&chat33096820.7.p0.5.smox.train.sum.sum.out11063532203464160.75805346130.33423995160.54614670650.46392617450.7.p0.5.smox.test.sum.sum.out1998648392560.6982456140.29178885630.49501723520.411582213

0.8reference&education557411750.8.p0.5.smox.train.sum.sum.out25386353036438690.79987393630.455328310.62760112320.58031325030.8.p0.5.smox.test.sum.sum.out51513766087120.78987730060.43829787230.61408758650.5637657362

0.9shopping&services21224230.9.p0.5.smox.train.sum.sum.out4692131653477430.68768328450.22101790760.4543505960.33452211130.9.p0.5.smox.test.sum.sum.out894933495520.64492753620.21040189130.42766471370.3172905526

0.1society&politics48559460.10.p0.5.smox.train.sum.sum.out17255233130447000.76734875440.35530381050.56132628250.48571026330.10.p0.5.smox.test.sum.sum.out27912566789530.69059405940.29492600420.49276003180.4133333333

0.11sports&recreation30816400.11.p0.5.smox.train.sum.sum.out16952911386467060.85347432020.55014605650.70181018840.66903493190.11.p0.5.smox.test.sum.sum.out3048033693040.79166666670.4750.63333333330.59375

0.12travel&vacations540910910.12.p0.5.smox.train.sum.sum.out37354311674442380.8965434470.6905158070.7935296270.7801566580.12.p0.5.smox.test.sum.sum.out7019039088430.88621997470.64252978920.76437488190.7449521785

257815840276365917570.81531260870.48263661380.64897461130.60634069474835132159081182480.78541260560.45006050450.61773655510.5722232085

0.0automotive5780.66391752580.7326732673correlation=

0.6home&family13730.42002063980.3809523810.3466701657

0.9shopping&services21220.33452211130.3172905526

0.11sports&recreation30810.66903493190.59375

0.5hobbies&interests32270.42214689270.38403452

0.7people&chat33090.46392617450.411582213

0.4health&fitness35000.69212571810.6749379653

0.1business&finance35080.45820433440.4071856287

0.1society&politics48550.48571026330.4133333333

0.12travel&vacations54090.7801566580.7449521785

0.8reference&education55740.58031325030.5637657362

0.2computers&internet57180.61765339070.598476605

0.3entertainment&media111630.70064884510.6779843708

F-trainF-test

Top-m1000-Sum2.Sum2

training
test
Number of Training Examples
F1
F1 Perf as Fcn of Training

Hier012.02-m1000.Sum2.Sum2

Compute using netscore.pl

SUM2-SUM2

53k training

>256-1000

copied from first sheet

SUM2-SUM2 (top level only)

default c=0.01

CategTrainTestSMOX 0.5

05007810024TrainTest

Net1Targ1Net1Targ0Net0Targ1Net0Targ0precrecallbeNet1Targ1Net1Targ0Net0Targ1Net0Targ0precrecallbe

[Per-category rows for all 13 top-level categories (0.0 automotive through 0.12 travel&vacations): train and test confusion counts with precision, recall, and break-even, copied from the first sheet. Micro-averaged totals over 53,417 training and 10,743 test category labels: train P = 0.815, R = 0.483, break-even = 0.649; test P = 0.785, R = 0.450, break-even = 0.618.]
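The classifiers behind these numbers are linear SVMs (SMOX, default C = 0.01) whose outputs are thresholded at p = 0.5. As a rough, modern stand-in -- not the SMOX implementation used here -- one binary classifier per category can be trained and its margin mapped to a probability with a logistic function (Platt-style scaling; the fixed sigmoid parameters below are placeholders, whereas Platt's method fits them on held-out data):

import numpy as np
from sklearn.svm import LinearSVC  # stand-in for the SMOX linear SVM

def train_category_classifier(X_train, y_train, C=0.01):
    # One binary linear SVM per category (one-vs-rest), C as in the sheet notes.
    clf = LinearSVC(C=C)
    clf.fit(X_train, y_train)
    return clf

def margin_to_probability(clf, X, a=-1.0, b=0.0):
    # Map SVM margins to (0, 1) with a logistic function; a and b are
    # illustrative constants, not fitted Platt parameters.
    margins = clf.decision_function(X)
    return 1.0 / (1.0 + np.exp(a * margins + b))

# A document is assigned to the category when its probability clears 0.5:
# assigned = margin_to_probability(clf, X_test) >= 0.5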

Second level, n = 150 categories, rolled up with netscore.pl. The rollup does not take into account duplication of items -- e.g., an item in both 0.0.0 and 0.0.1 gets counted twice. Each block in the source sheet lists, for one decision rule, the per-top-level confusion counts with precision, recall, and break-even; only the test-set totals are kept here:

Decision rule (thresholds as labeled)   Precision  Recall  Break-even
L2 score only, p = 0.5                  0.138      0.623   0.380
L2 score only, p = 0.9                  0.302      0.433   0.367
L2 score only, p = 0.975                0.403      0.309   0.356
L1 && L2 score                          0.655      0.351   0.503
L1 && L2 score (0.50, 0.25)             0.584      0.379   0.481
L1 && L2 score (0.25, 0.25)             0.516      0.460   0.488
L1 && L2 score (0.25, 0.50)             0.592      0.419   0.506
L1 && L2 score (0.20, 0.50)             0.570      0.440   0.505
L1 * L2 score, .75                      0.773      0.257   0.515
L1 * L2 score, .50 (= $p)               0.686      0.333   0.510
L1 * L2 score, .35                      0.574      0.426   0.500
L1 * L2 score, .25 (= $p*$p)            0.574      0.426   0.500
L1 * L2 score, .20 (= $p*$p)            0.542      0.455   0.499
L1 * L2 score, .10                      0.430      0.555   0.492

(The .25 and .35 multiplicative blocks record identical counts in the source sheet.)
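The rules compared above combine the two levels of classifiers in different ways: thresholding the second-level (L2) probability alone, a Boolean rule that requires both the top-level (L1) and second-level probabilities to clear their thresholds (which can be evaluated sequentially, scoring second-level categories only under top levels that pass), and a multiplicative rule that thresholds the product of the two probabilities. A minimal sketch of the three rules (names and structure are ours, not netscore.pl):

def assign_second_level(p_top, p_second, rule="multiplicative",
                        t1=0.5, t2=0.5, t_prod=0.25):
    # p_top:    {top_category: P(top)} from the first-level classifiers
    # p_second: {(top_category, second_category): P(second)} from the second-level classifiers
    assigned = []
    for (top, second), p2 in p_second.items():
        p1 = p_top.get(top, 0.0)
        if rule == "l2_only":            # threshold the second-level score alone
            ok = p2 >= t2
        elif rule == "boolean":          # L1 && L2: both levels must clear their thresholds
            ok = p1 >= t1 and p2 >= t2
        elif rule == "multiplicative":   # L1 * L2: threshold the product of the two scores
            ok = p1 * p2 >= t_prod
        else:
            raise ValueError("unknown rule: %s" % rule)
        if ok:
            assigned.append((top, second))
    return assigned

Consistent with the totals above, raising the product threshold trades recall for precision (0.75 gives P = 0.773, R = 0.257; 0.10 gives P = 0.430, R = 0.555), while L2-only scoring is the least precise because every second-level classifier is applied regardless of the top-level decision.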

SecondLevel (0-12) w/ test on 0-2 (full 50078/10024 files): per-leaf (second-level category) test confusion counts for the L2-only rule at p = 0.5, one row per category from 0.0.0 through 0.12.9, with per-top-level subtotals. These rows roll up to the L2-only (p = 0.5) totals above (test P = 0.138, R = 0.623, break-even = 0.380). [Raw per-leaf counts omitted.]

BAD-Hier012.02-m1000.Sum2.Sum2: a superseded sheet (marked "BAD" in the source). It repeats the sheet notes and the top-level table above verbatim and then records a different second-level rollup (L2 score p = 0.5 and L2 score = 0.975 blocks) that disagrees with the totals kept above; the sheet is cut off mid-row in the source. [Contents omitted.]