TRANSCRIPT
EUROCAST’01
Marta E. Zorrilla, José L. Crespo and Eduardo Mora
Department of Applied Mathematics and Computer Science
University of Cantabria
An Online Information Retrieval System by means of Artificial Neural Networks
Introduction I

What is an ‘Information Retrieval System’?
• structured field search
• full-text search
General process

[Diagram] Documents → Document transfer → Indexing and Storing → Indexes → Search Interface → Relevance classification → Documents
Indexing and storing

[Diagram] Original documents → Text extraction → ‘Pure text’ files → Filtering → Files to index → Indexation → Storing → Indexes
Filtering applies:
• Stopwords
• Stemming
• Thesaurus
• List of terms
The original documents are kept in a documents database, with links from the indexes back to the documents.
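The filtering stage above (stopword removal plus stemming) can be sketched in Python; the stopword list and the suffix-stripping rules here are illustrative assumptions, not the ones used in the system.

```python
# Sketch of the filtering stage: stopword removal and a naive suffix
# stemmer. Stopword list and suffix rules are illustrative assumptions.
import re

STOPWORDS = {"the", "a", "of", "and", "to", "in"}
SUFFIXES = ("ing", "ed", "es", "s")

def extract_terms(text):
    """Lowercase, tokenise, drop stopwords, strip common suffixes."""
    tokens = re.findall(r"[a-z]+", text.lower())
    terms = []
    for tok in tokens:
        if tok in STOPWORDS:
            continue
        for suf in SUFFIXES:
            # only strip when a reasonable stem remains
            if tok.endswith(suf) and len(tok) > len(suf) + 2:
                tok = tok[: -len(suf)]
                break
        terms.append(tok)
    return terms

print(extract_terms("The plants growing in the garden"))
# → ['plant', 'grow', 'garden']
```

A real system would use a proper stemmer for the target language rather than this toy suffix list.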
Classification

Classification of Information Retrieval Systems:
• Free dictionary
  • Clustering: statistics, self-organising ANN
  • Latent Semantic Indexing
• Pre-established dictionary
Term representation: in words, in n-grams
Index structures: inverse indexes, vectorial representation
Inverse index

Each word in the ordered word list has entries in two structures:
• Text indexes: one record (WORD, d, p, s, w) per appearance, where d = document code, p = paragraph number, s = sentence number, w = word position in the sentence
• Dictionary: one record (WORD, D, A) per word, where D = number of documents and A = number of appearances
Example: ‘Plant’ has D = 2, A = 3 (three appearances spread over two documents), with a link to the text files.
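A minimal Python sketch of this inverse index: a postings list of (d, p, s, w) tuples per term, plus the dictionary counters D and A. Paragraph and sentence splitting are deliberately simplified (newlines and full stops).

```python
# Sketch of the inverse index: per-term postings (document code,
# paragraph, sentence, word position) and the (D, A) dictionary.
# Tokenisation is simplified for illustration.
from collections import defaultdict

def build_index(docs):
    postings = defaultdict(list)          # term -> [(d, p, s, w), ...]
    for d, doc in enumerate(docs):
        for p, para in enumerate(doc.split("\n")):
            for s, sent in enumerate(para.split(".")):
                for w, term in enumerate(sent.lower().split()):
                    postings[term].append((d, p, s, w))
    # D = number of documents containing the term, A = total appearances
    dictionary = {t: (len({e[0] for e in entries}), len(entries))
                  for t, entries in postings.items()}
    return postings, dictionary

postings, dictionary = build_index(["the plant grows", "a plant. the plant"])
print(dictionary["plant"])   # → (2, 3), matching the D = 2, A = 3 example
```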
Self-organising ANN

[Diagram: inputs, best matching unit (BMU), neighbourhood radius]
• Kohonen’s topological map
• Fritzke’s growing topological maps
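One Kohonen training step can be sketched as follows: find the best matching unit (BMU) for an input vector, then pull the BMU and its grid neighbours towards the input. Map size, learning rate and neighbourhood radius are illustrative choices, and the Gaussian neighbourhood is one common variant.

```python
# One Kohonen SOM training step on a small topological map.
# Map size, learning rate and neighbourhood radius are illustrative.
import numpy as np

rng = np.random.default_rng(0)
rows, cols, dim = 4, 4, 3
weights = rng.random((rows, cols, dim))   # one weight vector per map unit

def train_step(x, lr=0.5, radius=1.0):
    # BMU: the unit whose weight vector is closest to the input
    dists = np.linalg.norm(weights - x, axis=2)
    bi, bj = np.unravel_index(np.argmin(dists), dists.shape)
    # Gaussian neighbourhood measured on the map grid, centred on the BMU
    ii, jj = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    h = np.exp(-((ii - bi) ** 2 + (jj - bj) ** 2) / (2 * radius ** 2))
    # move the BMU and its neighbours towards the input
    weights[:] = weights + lr * h[:, :, None] * (x - weights)
    return bi, bj

x = np.array([0.2, 0.8, 0.5])
for _ in range(20):
    bmu = train_step(x)   # the BMU's weights converge towards x
```

In full training the learning rate and radius would decay over time; Fritzke's growing variants instead insert new units where the map's accumulated error is largest.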
Clustering statistics

[Dendrogram of merge distances from 0.14 to 0.99; cutting it yields 7 clusters]
LSI

A is the m x n term-by-document matrix: rows T_1 … T_m are the terms, columns D_1 … D_n are the documents, and entry f_ij is the frequency of term T_i in document D_j.

Singular Value Decomposition: A = U Σ V^t, with U (m x r) holding the term vectors, Σ (r x r) the singular values and V^t (r x n) the document vectors. Keeping only the k largest singular values gives the rank-k approximation A_k = U_k Σ_k V_k^t.

Folding new vectors into the k-dimensional space:
• Query: q̂ = q^t U_k Σ_k^{-1}
• New documents: d̂ = d^t U_k Σ_k^{-1}
• New terms: t̂ = t^t V_k Σ_k^{-1}
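The decomposition and the query fold-in can be sketched with NumPy; the term-by-document matrix values and the choice k = 2 are made up for illustration.

```python
# LSI sketch: SVD of a small term-by-document matrix, rank-k
# truncation, and folding a query in via q_hat = q^t U_k S_k^{-1}.
# Matrix values and k are illustrative.
import numpy as np

A = np.array([[2., 0., 1.],          # rows = terms T1..T4
              [0., 1., 1.],          # columns = documents D1..D3
              [1., 1., 0.],
              [0., 2., 1.]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk, Sk, Vk = U[:, :k], np.diag(s[:k]), Vt[:k, :].T   # Vk rows = doc vectors

q = np.array([1., 0., 1., 0.])        # query as a term-frequency vector
q_hat = q @ Uk @ np.linalg.inv(Sk)    # query folded into the k-space

# rank documents by cosine similarity to the projected query
sims = (Vk @ q_hat) / (np.linalg.norm(Vk, axis=1) * np.linalg.norm(q_hat))
ranking = np.argsort(-sims)           # document indices, most similar first
```

New documents and new terms fold in the same way, using U_k for documents and V_k for terms as in the formulas above.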
ANN for classification

• Competitive networks (e.g. self-organising): one processor in the output layer gives a non-null response
• Radial basis networks: a continuous response, generally in one layer
• Multilayer perceptrons: similar to radial networks, except in the activation function and the operations made at the connections
Proposal

Information Retrieval System: a neural network whose input layer receives dictionary words in binary representation (w1, w2, w3, w4, …, wn) and whose output layer has one processor per document (doc1, doc2, doc3, doc4, …, doc n).
• Dictionary: COES, a Spanish dictionary developed by Santiago Rodriguez and Jesús Carretero
• Documents: Spanish Civil Code articles
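A toy sketch of the proposed architecture: binary word codes at the input, one output unit per document, and one tanh hidden layer. The sizes match the toy problem used in the tests below (14 dictionary words, 10 documents); plain gradient descent on the mean squared error stands in here for the conjugate-gradient and quasi-Newton optimisers actually used.

```python
# Toy sketch of the proposed word-to-document network (14 inputs,
# 5 hidden tanh units, 10 outputs). Plain gradient descent replaces
# the CG / quasi-Newton optimisers used in the talk.
import numpy as np

rng = np.random.default_rng(1)
n_words, n_hidden, n_docs = 14, 5, 10

W1 = rng.normal(0.0, 0.5, (n_words, n_hidden))
W2 = rng.normal(0.0, 0.5, (n_hidden, n_docs))

X = np.eye(n_words)[:n_docs]          # toy binary word representations
Y = np.eye(n_docs)                    # target: one document per input

def forward(X):
    H = np.tanh(X @ W1)
    return H, H @ W2

def mse():
    _, out = forward(X)
    return float(((out - Y) ** 2).mean())

err_before = mse()
for _ in range(1000):
    H, out = forward(X)
    err = (out - Y) / len(X)
    gW2 = H.T @ err                              # output-layer gradient
    gW1 = X.T @ ((err @ W2.T) * (1 - H ** 2))    # hidden layer, tanh derivative
    W2 -= 0.1 * gW2
    W1 -= 0.1 * gW1
err_after = mse()
print(err_before, err_after)                     # the error drops with training
```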
Test and Results I

Neural network with Radial Basis Functions
Error function: mean squared error, entropy
Nº documents: 93; Nº words in dictionary: 140
Results:
• The error function tends to spurious minima (the gradient is essentially zero)
• The neural network needs a processor for each word in the dictionary, i.e. the network isn’t compact
Conclusion: the ordinary RBF approach is not appropriate; a change of approach or a change of network is needed. We present another network: the MLP.
Test and Results II

Multilayer Perceptron with tanh activation function
Error function: mean squared error, entropy
Nº documents: 10; Nº words in dictionary: 14
Architecture: 10x5x10; 10x7x10; 10x10x10
Optimisation methods: Conjugate Gradient, Quasi-Newton with linear and parabolic minimisation.
Results:
• A 10x5x10 architecture can learn the training set
• The optimisation method can be of decisive importance
• The same method, in different programs, offers different results
• The error function does not make much of a difference
Conclusion: in order to gain insight into the optimisation process, we programmed the network ourselves.
Results (after programming the network ourselves):
• A 10x5x10 architecture almost learns the training set; with 10x10x10 learning is perfect
• Quasi-Newton with parabolic minimisation is the most efficient method
• Mean squared error offers better results than entropy
• Sorting the training set by number of occurrences, or scaling the output between 0 and 1, doesn’t offer better results
Test and Results III

Future:
Growing output layer with more documents. It will be necessary to increase the number of hidden neurons when the error becomes high.

[Plot: mean squared error (0 to 0.5) against the number of hidden neurons (5 to 10), for Quasi-Newton (Golden), Quasi-Newton (Brent), CG (Golden) and CG (Brent)]
Conclusions

• What is an Information Retrieval System
• How does it work?
• Classification of IRS
• Proposal: a neural network in which each output-layer processor represents a document and the input layer receives words in binary representation
• Results: promising results of the MLP on a toy problem