TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visual)-Words Approaches

Schmiedeke, Kelm and Sikora, Communication Systems Group, Technische Universität Berlin, 4 October 2012

TRANSCRIPT

Page 1: TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visual)-Words Approaches

Schmiedeke, Kelm and Sikora

Communication Systems Group

Technische Universität Berlin

4 October, 2012

Feature Selection Methods for Bag-of-(visual)-Words Approaches

Page 2: Motivation

Schmiedeke: “Feature Selection Methods for BoW Approaches”

[Figure, with the label “sports”]

Page 3: Lessons from last year

- Features derived from metadata (esp. tags) outperform visual and ASR ones
  • Metadata: Naive Bayes (non-translated)
  • Visual features: SVM (avg. pooled histograms)
  • ASR transcripts: kNN (JSD)

- Uploaders mainly contribute to a single category
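The "kNN (JSD)" entry above can be illustrated with a minimal sketch, assuming JSD stands for the Jensen-Shannon divergence between normalised term histograms and that a plain majority vote is used. The function names and the toy histograms are hypothetical and not taken from the slides.

```python
import math
from collections import Counter


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence between two normalised histograms (0 = identical, 1 = disjoint)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def knn_predict(query, training, k=3):
    """Majority label among the k training histograms closest to the query under JSD.

    training is a list of (histogram, label) pairs; this voting rule is an assumption.
    """
    neighbours = sorted(training, key=lambda item: jsd(query, item[0]))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


# Toy data (made up for illustration only)
training = [
    ([0.9, 0.1], "sports"),
    ([0.8, 0.2], "sports"),
    ([0.1, 0.9], "politics"),
]
prediction = knn_predict([0.85, 0.15], training, k=1)
```

The histograms must sum to 1 before they are passed to `jsd`; raw term counts would need to be normalised first.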


Page 4: This year's question

- Does feature selection improve the results achieved with the BoW model?

Page 5: Feature Selection/Transformation

- Mutual information:

- Term Frequency:

- PCA (Eigenvalue decomposition):
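The formulas for these scores did not survive in the transcript. As a rough illustration of the first two, the sketch below computes a class-specific mutual information score from a 2x2 document count table and a plain term frequency count. Treating MI as the dependence between "term occurs in the document" and "document belongs to the class" is an assumption; the slides may use a different definition. All names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(n11, n10, n01, n00):
    """Mutual information (bits) between term presence and class membership.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class that lack the term
    n00: documents outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    # Each cell: (joint count, term marginal, class marginal)
    cells = [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]
    score = 0.0
    for joint, term_marginal, class_marginal in cells:
        if joint > 0:  # empty cells contribute zero
            score += (joint / n) * math.log2(n * joint / (term_marginal * class_marginal))
    return score


def term_frequencies(documents):
    """Total number of occurrences of each term over all documents (lists of terms)."""
    counts = Counter()
    for document in documents:
        counts.update(document)
    return counts
```

A term that is independent of the class scores 0, while a term that occurs in exactly the documents of a class that covers half the collection scores 1 bit.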


Page 6: Feature Selection

- Concepts for term selection:

Top terms for religion: bibl (0.0897), jesu (0.0797), god (0.0796), unleaven (0.0782), eeli (0.0782), davideel (0.0781), ministri (0.0780)
Bottom terms for religion: daytripp (0.0), adagio (0.0), acustica (0.0)

Top terms for politics: lunch (0.1200), obama (0.1113), polit (0.0982), grittv (0.0881), flander (0.0861), laura (0.0855), economi (0.0747)
Bottom terms for politics: sonnet (0.0), screenplai (0.0), acustica (0.0)

Top terms for health: jama (0.0495), health (0.0378), report (0.0357), harta (0.0227), exceric (0.0211), yoga (0.0203), study (0.0192)
Bottom terms for health: ilsr (0.0), resystem (0.0), acustica (0.0)

Page 7: Feature Selection

- Top-k-Union:

Page 8: Feature Selection

- Top-k:

Page 9: Feature Selection

- Union>th:

0.0002 0.0002 0.0001

Page 10: Feature Selection

- Intersection>th:

0.0002 0.0002 0.0001

Terms for religion: bibl (0.0897), jesu (0.0797), god (0.0796), …, web, python, xbox, big, expo, …, daytripp (0.0), adagio (0.0), acustica (0.0)

Terms for politics: lunch (0.1200), obama (0.1113), polit (0.0982), …, appl, googl, teen, music, tv, …, sonnet (0.0), screenplai (0.0), acustica (0.0)

Terms for health: jama (0.0495), health (0.0378), report (0.0357), …, gossip, interview, iphon, san, texa, …, ilsr (0.0), resystem (0.0), acustica (0.0)
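The slides name the selection strategies without defining them. The sketch below shows one plausible reading, assuming each class has its own term-to-score mapping: Top-k-Union keeps the k best terms of every class, Union>th keeps terms scoring above the threshold in at least one class, and Intersection>th keeps terms scoring above the threshold in every class. The function names and the toy scores are made up for illustration and may not match the authors' definitions.

```python
def top_k_union(scores, k):
    """Union over all classes of each class's k highest-scoring terms."""
    selected = set()
    for term_scores in scores.values():
        ranked = sorted(term_scores, key=term_scores.get, reverse=True)
        selected.update(ranked[:k])
    return selected


def union_above(scores, th):
    """Terms whose score exceeds th for at least one class."""
    return {
        term
        for term_scores in scores.values()
        for term, score in term_scores.items()
        if score > th
    }


def intersection_above(scores, th):
    """Terms whose score exceeds th for every class."""
    per_class = [
        {term for term, score in term_scores.items() if score > th}
        for term_scores in scores.values()
    ]
    return set.intersection(*per_class) if per_class else set()


# Toy per-class scores (hypothetical values, not the ones from the slides)
scores = {
    "religion": {"bibl": 0.09, "god": 0.08, "obama": 0.0003, "acustica": 0.0},
    "politics": {"bibl": 0.0004, "god": 0.0001, "obama": 0.11, "acustica": 0.0},
}
vocabulary = union_above(scores, th=0.0002)
```

Under this reading, Union>th yields the largest vocabulary and Intersection>th the smallest, since a term must be informative for every class to survive the intersection.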

Page 11: Official runs

- Bag of clustered SURF features transformed using PCA
  • Result does not benefit from the transformation

       official run   without FS/FT
mAP    0.2301         0.2309
CA     41.63 %        41.71 %
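For readers unfamiliar with the two measures in these tables, here is a minimal sketch of how they are commonly computed. It assumes that mAP is the mean over categories of the average precision of a ranked result list, and that CA denotes classification accuracy; the slides do not spell out either definition, and the official evaluation may differ in detail.

```python
def average_precision(ranked_relevance):
    """Average precision of one ranked list.

    ranked_relevance[i] is True if the item at rank i + 1 is relevant.
    Assumes the list contains every relevant item, so the sum is divided
    by the number of relevant items found.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of the average precision over several ranked lists (e.g. one per category)."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


def classification_accuracy(predicted, actual):
    """Fraction of items whose predicted label equals the true label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```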

Page 12: Official runs

- Bag of filtered ASR transcript terms (Union>th)
  • Result benefits from the selection

       official run   without FS/FT
mAP    0.1035         0.0522
CA     32.53 %        26.54 %

Page 13: Official runs

- Bag of clustered SURF features filtered using MI and the Intersection>th strategy
  • Result benefits slightly from the selection

       official run   without FS/FT
mAP    0.2259         0.2221
CA     40.80 %        40.78 %

Page 14: Official runs

- Bag of filtered terms derived from tags, titles and descriptions (Union>th)
  • Result benefits from the selection

       official run   without FS/FT
mAP    0.5225         0.4146
CA     58.18 %        55.70 %

Page 15: Official runs

- Bag of clustered SURF features transformed using PCA, with decision fusion using the uploader
  • Result benefits from the transformation

       official run   without FS/FT
mAP    0.3304         0.2988
CA     52.14 %        49.19 %
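The slides do not say how the uploader information is fused with the classifier decision. Purely as an illustration, the sketch below estimates a per-uploader category distribution from training data and mixes it linearly with the classifier scores. The fusion rule, the `weight` parameter and all names are assumptions, not the authors' method.

```python
from collections import Counter, defaultdict


def uploader_priors(training):
    """Per-uploader category distribution from (uploader, category) training pairs."""
    counts = defaultdict(Counter)
    for uploader, category in training:
        counts[uploader][category] += 1
    return {
        uploader: {c: n / sum(cnt.values()) for c, n in cnt.items()}
        for uploader, cnt in counts.items()
    }


def fuse(classifier_scores, uploader, priors, weight=0.5):
    """Pick the category with the best weighted sum of classifier score and uploader prior.

    Uploaders unseen in training get an empty prior, so the classifier decides alone.
    """
    prior = priors.get(uploader, {})
    fused = {
        category: (1 - weight) * score + weight * prior.get(category, 0.0)
        for category, score in classifier_scores.items()
    }
    return max(fused, key=fused.get)


# Toy data (hypothetical uploaders and scores)
priors = uploader_priors([("alice", "sports"), ("alice", "sports"), ("bob", "politics")])
decision = fuse({"sports": 0.4, "politics": 0.6}, "alice", priors)
```

In this toy case the uploader prior overrides a weak visual decision, which matches the earlier observation that uploaders mainly contribute to a single category.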

Page 16: Conclusion & Future Work

- FS showed potential for improving the results

- The choice between MI and TF is not critical; both methods achieve roughly the same results

  • Metadata (mAP): MI12004 (0.5277) vs. TF14976 (0.5275)

- Investigation of different scaling schemes (NB)

- Use of a class-independent selection score (MI)

Page 17: Backup

Thank you! Questions?