

Human Translation - Machine Translation

Natural Language Processing (NLP) and Translation

Anca Christine Pascu
Université de Bretagne Occidentale, LabSTICC, Brest, France

A. P. Genova, May 2015

Outline

Cognition – Language – Translation

Natural Language Processing (NLP) and Translation
Modelling in Translation
Computational Logic
Logic and Translation
Computation and Translation
Concepts and Objects in Translation
The Text Structure
The Lattice Structure of a Text
Formal Concept Analysis and the Text Structure

Human Translation – Machine Translation


Cognition – Language – Translation

Some Basic Ideas

It is true that we can express the same meaning (thought) in different languages; but the psychological trappings, the dressing of the thought, will often be different. That is why learning foreign languages is useful for education in logic. We learn to better distinguish the verbal peel from the kernel to which it is organically linked in any language. This is how the differences between natural languages can facilitate our apprehension of that which is logical.

G. Frege, Nachgelassene Schriften (Posthumous Writings), Hamburg, Meiner, 1969, in Desclés, J.-P. (1998), « Les Langues sont-elles des représentations du monde », Essais sur le langage, logique et sens commun, Editions universitaires, Fribourg. Dedicated to Evandro Agazzi.


Cognition – Language - Translation

Cognition: a set of processes related to knowledge:
attention, memory (psychology)
judgement, reasoning, « computation », problem solving, decision making (logic, computer science)
comprehension and production of language (linguistics, psychology)

[Diagram: Cognition, linked to attention and memory (psychology); to reasoning, judgement, computation, problem solving and decision making (logic, CS); and to language comprehension and language production (linguistics, psychology).]

Some Questions about Language and Cognition

Are natural languages representations of the world?

Can each natural language project itself onto the external world?

Can each natural language construct its own cognitive representations?

Do natural languages refer to a universal system of mental representations?

Jean-Pierre Desclés, « Les Langues sont-elles des représentations du monde », Essais sur le langage, logique et sens commun, Editions universitaires, Fribourg, 1998.

Three Epistemological Hypotheses

Relativistic hypothesis – Sapir-Whorf (Whorf, 1966)

Anti-relativistic hypothesis – Fodor (Fodor, 1975), Shaumyan (Shaumyan, 1977)

Anti-anti-relativistic hypothesis – Desclés (Desclés, 1998)

Translation: general schema

SOURCE Language → Transfer → TARGET Language

Vauquois Triangle


Natural Language Processing (NLP) and Translation


Linguistics - Logic

Natural Language – Language

Linguistics: Lexis, Morphology, Syntax, Semantics – Discourse – Text

Logic: Hypotheses, Inferences, Conclusions – Reasoning
Inferences: Deduction, Induction, Abduction

Meaning Item (Unit) – Translation Item (Unit) (Ballard, 2004)

Ordered Structure of a Text: Argumentative Structure, Descriptive Structure

NLP Fields via Linguistics

Lexical level: error detection and correction; automatic documentation, indexing, search engines

Morphological level: morphological annotation

Syntactic level: grammars and parsers

Semantic level: automatic processing of meaning; automatic text comprehension

Machine translation

NLP fields via applications

Automatic annotation of corpora: morphological annotation, semantic annotation
Text mining; indexing
Automatic summarizing
Text generation
Machine translation: automatic translation, computer-assisted translation

Definition

Natural Language Processing (NLP): a multidisciplinary field studying a set of:
Theories (linguistics, mathematics, logic, ...);
Methods (procedures, algorithms, ...);
Computer science systems (languages, procedures, ...)
for analysis and synthesis in natural languages, in order to solve problems related to language and natural languages.

Lexical Level: Word Processing

Spell checker
Lexical labeling: labeling words with linguistic labels
Concordancers: a concordancer is a computer program that searches a text for all occurrences of a word, together with their contexts (http://ecolore.leeds.ac.uk/xml/materials/overview/tools/concordancer.xml?lang=fr). Concordancers are used to build linguistic corpora.
The form of the word: lemma, inflected form, ...
Lemmatizers: lemma – inflected form
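The concordancer described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `concordance`, the `width` parameter and the sample sentence are assumptions made for the example, not part of the slides or of any particular tool.

```python
import re

def concordance(text, word, width=30):
    """Return (left context, occurrence, right context) for every occurrence of `word`."""
    lines = []
    # Whole-word, case-insensitive search for the keyword
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append((left, match.group(0), right))
    return lines

# Hypothetical sample text
sample = "The officer of the watch kept watch all night. The watch ended at dawn."
for left, hit, right in concordance(sample, "watch", width=15):
    print(f"{left:>15} [{hit}] {right}")
```

Each printed line shows one occurrence in its context (the "keyword in context" layout). A real concordancer would work on tokenised corpora rather than on raw character offsets.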

Syntactic Level: Grammars and Parsers

The techniques of analysis are almost the same as those used for formal languages.

Formal Grammar = a system of rules which allows, starting from a vocabulary:
to analyse a string
to generate a string

Formal Language = a set (finite or infinite) of words over a vocabulary

Word = a finite string obtained by concatenating elements of a vocabulary.
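As a minimal sketch of the two uses of a formal grammar (generating strings and analysing a string), consider the toy grammar below. The grammar, its symbols and the function names `generate` and `analyse` are hypothetical. The grammar is deliberately non-recursive so that its language is finite and can be enumerated.

```python
from itertools import product

# Hypothetical toy grammar: each nonterminal maps to a list of right-hand sides.
GRAMMAR = {
    "S": [["N", "V", "N"]],
    "N": [["Jean"], ["Marie"]],
    "V": [["aime"]],
}

def generate(symbol):
    """Yield every word sequence derivable from `symbol` (grammar must be non-recursive)."""
    if symbol not in GRAMMAR:          # terminal: an element of the vocabulary
        yield [symbol]
        return
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion of each right-hand-side symbol
        for parts in product(*(list(generate(s)) for s in rhs)):
            yield [tok for part in parts for tok in part]

def analyse(sentence):
    """Return True if the sentence can be derived from the start symbol S."""
    return sentence.split() in list(generate("S"))

print([" ".join(words) for words in generate("S")])
print(analyse("Jean aime Marie"), analyse("aime Jean Marie"))
```

Enumerating the whole language only works for finite languages. Grammars with recursive rules need a proper parsing algorithm instead of this membership test.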

Grammars and Parsers: Types of Formal Grammars

Chomsky's classification: L3 ⊂ L2 ⊂ L1 ⊂ L0

Categorial Grammar (CG)

Lexical Functional Grammar (LFG)

Generalized Phrase Structure Grammar (GPSG)

Tree-Adjoining Grammar (TAG)

Head-Driven Phrase Structure Grammar (HPSG)

Dependency Grammar (DG)

Grammars and Parsers

The steps of a syntactic analysis:

Segmentation (tagger);

Lemmatisation (identifying words in their canonical form);

Labeling (identifying the morpho-syntactic category).

The Syntax – Semantics relation:

Surface structure – Deep structure

Typing lexical units (categorial grammars).
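The three steps (segmentation, lemmatisation, labeling) can be illustrated with a very small sketch. The mini-lexicon and the function names below are hypothetical. Real systems rely on large lexicons or trained taggers rather than a hand-written dictionary.

```python
import re

# Hypothetical mini-lexicon: inflected form -> (lemma, morpho-syntactic category)
LEXICON = {
    "jean": ("Jean", "N"),
    "marie": ("Marie", "N"),
    "aime": ("aimer", "V"),
}

def segment(text):
    """Segmentation: split the text into word tokens."""
    return re.findall(r"\w+", text)

def analyse(text):
    """Return (token, lemma, category) triples; unknown words keep their form and get '?'."""
    result = []
    for token in segment(text):
        lemma, category = LEXICON.get(token.lower(), (token, "?"))
        result.append((token, lemma, category))
    return result

print(analyse("Jean aime Marie."))
```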

Example of CG

Jean    aime       Marie
N       (S\N)/N    N

Types: N, S are basic types; (S\N)/N is a derived type

CG Rules

Right Application (>):

OPER : T1/T2    OP : T2
------------------------ >
(OPER OP) : T1

Left Application (<):

OP : T2    OPER : T1\T2
------------------------ <
(OPER OP) : T1

Analysis:

Jean    aime       Marie
N       (S\N)/N    N
        ------------------ >
        S\N
-------------------------- <
S
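The analysis of "Jean aime Marie" can be reproduced with a small type reducer. This is a sketch under assumptions: the encoding of a derived type as a tuple `(result, slash, argument)` and the leftmost-first reduction strategy are choices made for the example, not something the slides prescribe.

```python
# A basic type is a string; a derived type is a tuple (result, slash, argument).
N, S = "N", "S"
AIME = ((S, "\\", N), "/", N)   # the derived type (S\N)/N

LEXICON = {"Jean": N, "aime": AIME, "Marie": N}

def reduce_once(types):
    """Apply the first possible application rule to two adjacent types; None if none applies."""
    for i in range(len(types) - 1):
        left, right = types[i], types[i + 1]
        # Right application (>): T1/T2 followed by T2 gives T1
        if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
            return types[:i] + [left[0]] + types[i + 2:]
        # Left application (<): T2 followed by T1\T2 gives T1
        if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
            return types[:i] + [right[0]] + types[i + 2:]
    return None

def analyse(sentence):
    """Reduce the types of the words until no rule applies; return the remaining types."""
    types = [LEXICON[word] for word in sentence.split()]
    while len(types) > 1:
        reduced = reduce_once(types)
        if reduced is None:
            break
        types = reduced
    return types

print(analyse("Jean aime Marie"))
```

The sentence is accepted when the result is the single type S. A greedy leftmost reduction is enough for this example, but a full categorial parser would have to explore alternative reduction orders.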

Computer Text Comprehension

Meaning problem: there are two main positions on the formalisation of meaning:
An independent linguistic level
The interdependence between the linguistic level and the level of mind (which implies a degree of dependence)

Computer Text Comprehension and Automatic Processing

Semantics:

Truth-functional (truth conditions);

Intensional (based on corresponding concepts);

Extensional (based on corresponding objects);

Componential (word decomposition into primitive units of meaning);

Procedural (an expression is a procedure containing a set of actions);

Argumentative (the chain of speech acts).

Computer Text Comprehension and Automatic Processing

Structural Approaches to the Text

Text Grammars (D. Rumelhart, 1975):
Story = Exposition + Theme + Plot + Resolution

Rhetorical Structure Theory (W. Mann, S. Thompson, 1987):
A text is a set of units linked by relations

Computer Text Comprehension and Automatic Processing

Text Thematic Analysis:

Analysis based on knowledge representation (semantic networks, concept maps);

Analysis using statistical tools.

Computer Text Comprehension and Automatic Processing

Concept maps: http://en.wikipedia.org/wiki/Concept_map

WORDNET: http://wordnet.princeton.edu

Ontology = a network of objects and concepts linked by relations; it is specific to a domain

Computer Text Comprehension and Automatic Processing

Argumentative Structure of a Text: the text is organised in « argumentation units »
Hypothesis
Conclusion
Rules of inference
Elements outside the text

Semantic Annotation

Text Annotation: labeling the text according to a set of categories defined a priori.

Semantic Annotation: the categories are semantic classes (classes of meaning based on relations):
Causality
Definition
Utterance
Quotation

Problems in Translation related to Modelling for Machine Translation

Translation unit

Translation Unit (TU) (Ballard, 2004): an elementary unit of meaning in the source language (Ls) which can be transferred into the target language (Lt).

Computer Science: the form of a source file after it has been passed through the C preprocessor; in this case the output is deterministic and depends only on the input and the rules.

Translation: a pair (TUs, TUt) such that there is an « equivalence » between TUs and TUt. It depends on:
Concepts
Sentence, phrase, paragraph

Concepts, concept network, ontologies

Concept (C): a set of specific features (more primitive than the notion) (Int C);
The concept is expressed in a natural language by a word;
Some authors denote this pair by term (T). We consider it as a concept with its « language code » (the word).

C = (Int C, W).

Concepts, concept network, ontologies

A concept in a language depends on that language, i.e. on the cognitive representations in this language

Concepts are organised in networks

They do not all have the same status (position)

The network in one language is different from the network in another (Desclés, 2006)

Int C as a network (Desclés, Pascu, 2011):

Two intensions of the same concept

[Diagram: two intension networks, Int s and Int c, for the same concept: « officier de quart » (officier, quart, surveiller) and "officer of the watch" / "officer to watch" (quarter).]

Il est logique d'interpréter cette assertion par ...... / It makes sense to interpret this statement by ......

Examples

Computer Science: cloud computing – traitement des données hautement distribuées

Mathematics: rough set – ensemble approximatif (ensemble grossier)

[Diagram: a concept E with Ext E, Int E and Fr E.]

The Logic of Determination of Objects (LDO)

Concepts
Links between concepts – global network
Inheritance – comprehension relation

The Logic of Determination of Objects (LDO)

Objects
Links between objects – local network
Determination – relation between objects

[Diagram: determination network, labelled σ.]

The Logic of Determination of Objects (LDO)

The link between objects and concepts

[Diagram: mappings labelled f between the two ordered sets below.]

ordered set – filter

ordered set – ideal

FORMAL CONCEPT ANALYSIS (FCA)


FCA example

      A1   A2   A3
o1    1    1    1
o2    1    0    0
o3    1    1    1
o4    1    0    1

FCA

OBJ – the set of objects

ATT – the set of attributes

R – binary relation between OBJ and ATT

K = (OBJ, ATT, R) – formal context

O ⊆ OBJ: O↑ is the set of all attributes common to all objects in O

A ⊆ ATT: A↓ is the set of all objects that have all the attributes in A

Formal Concept

Formal Concept: (Ext, Int) such that:
Ext↑ = Int
Int↓ = Ext

Subconcept – superconcept: (A1, B1) ≤ (A2, B2) iff A1 ⊆ A2 (equivalently, B2 ⊆ B1)
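The derivation operators and the definition of a formal concept can be checked on the small example context. The sketch below assumes the incidence relation given by the context lattice (A1: o1, o2, o3, o4; A2: o1, o3; A3: o1, o3, o4). The names `CONTEXT`, `up`, `down` and `formal_concepts` are chosen for the example, and the brute-force enumeration over attribute subsets is only reasonable for very small contexts.

```python
from itertools import combinations

# Example context (assumed incidence: o2 has A1 only; o4 has A1 and A3)
CONTEXT = {
    "o1": {"A1", "A2", "A3"},
    "o2": {"A1"},
    "o3": {"A1", "A2", "A3"},
    "o4": {"A1", "A3"},
}
OBJ = set(CONTEXT)
ATT = set().union(*CONTEXT.values())

def up(objects):
    """O↑: the attributes common to all objects in O."""
    return {a for a in ATT if all(a in CONTEXT[o] for o in objects)}

def down(attributes):
    """A↓: the objects that have all the attributes in A."""
    return {o for o in OBJ if attributes <= CONTEXT[o]}

def formal_concepts():
    """Collect the pairs (Ext, Int) with Ext↑ = Int and Int↓ = Ext by closing every attribute subset."""
    found = set()
    for k in range(len(ATT) + 1):
        for subset in combinations(sorted(ATT), k):
            ext = down(set(subset))
            found.add((frozenset(ext), frozenset(up(ext))))
    return found

for ext, intent in sorted(formal_concepts(), key=lambda c: len(c[1])):
    print(sorted(ext), sorted(intent))
```

Under this assumed incidence relation, the closure condition leaves three pairs: ({o1, o2, o3, o4}, {A1}), ({o1, o3, o4}, {A1, A3}) and ({o1, o3}, {A1, A2, A3}). A pair such as ({o1, o3}, {A1, A2}) does not pass the test, because {o1, o3}↑ = {A1, A2, A3}.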

Example Concepts

Formal context: (OBJ, ATT, R)
C1 = ({o1, o3}, {A1, A2})
C2 = ({o1, o3}, {A1, A2, A3})
C3 = ({o1, o3, o4}, {A1, A3})
C4 = ({o1, o2, o3, o4}, {A1})
C5 = ({o1, o3}, {A2})
C6 = ({o1, o3, o4}, {A3})

Galois Lattice

Two ordered sets: (OBJ, ≤_OBJ), (ATT, ≤_ATT)

Two mappings: φ: OBJ → ATT, ψ: ATT → OBJ such that

If o1 ≤_OBJ o2 then φ(o1) ≥_ATT φ(o2)

If A1 ≤_ATT A2 then ψ(A1) ≥_OBJ ψ(A2)

o ≤_OBJ ψ(φ(o)) and A ≤_ATT φ(ψ(A))
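The Galois conditions can be checked mechanically on a small context. The sketch below makes two assumptions that go beyond the slide: the two ordered sets are taken to be the sets of objects and the sets of attributes ordered by inclusion, and φ and ψ are taken to be the FCA operators ↑ and ↓. The context is the small example context, with the incidence relation assumed from the context lattice.

```python
from itertools import chain, combinations

# Example context (assumed incidence: o2 has A1 only; o4 has A1 and A3)
CONTEXT = {
    "o1": {"A1", "A2", "A3"},
    "o2": {"A1"},
    "o3": {"A1", "A2", "A3"},
    "o4": {"A1", "A3"},
}
OBJ = set(CONTEXT)
ATT = set().union(*CONTEXT.values())

def subsets(items):
    """All subsets of `items`, as a list of sets."""
    items = sorted(items)
    combos = chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))
    return [set(c) for c in combos]

def phi(objects):
    """φ: attributes shared by all the given objects."""
    return {a for a in ATT if all(a in CONTEXT[o] for o in objects)}

def psi(attributes):
    """ψ: objects that have all the given attributes."""
    return {o for o in OBJ if attributes <= CONTEXT[o]}

def is_galois_connection():
    """Check order reversal of φ and ψ and the two extensivity conditions."""
    objs, atts = subsets(OBJ), subsets(ATT)
    phi_reverses = all(phi(o1) >= phi(o2) for o1 in objs for o2 in objs if o1 <= o2)
    psi_reverses = all(psi(a1) >= psi(a2) for a1 in atts for a2 in atts if a1 <= a2)
    extensive = all(o <= psi(phi(o)) for o in objs) and all(a <= phi(psi(a)) for a in atts)
    return phi_reverses and psi_reverses and extensive

print(is_galois_connection())
```

Here `<=` and `>=` on Python sets are subset and superset tests, so the three conditions are checked exhaustively over all subsets.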

The Context Lattice

∅: o1, o2, o3, o4
A1: o1, o2, o3, o4
A2: o1, o3
A3: o1, o3, o4
A1, A2: o1, o3
A1, A3: o1, o3, o4
A2, A3: o1, o3
A1, A2, A3: o1, o3

The Great Gatsby – the last paragraph

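       P1  P2  P3  P4  P5  P6  P7  P8  P9
O1     1   1   1   0   0   0   0   0   0
O2     1   0   0   1   0   0   1   0   0
O3     1   0   0   0   0   0   0   0   0
O4     1   1   0   0   0   0   0   0   0
O5     0   1   0   0   0   0   0   0   0
O6     0   1   0   0   0   0   0   0   0
O7     0   1   1   0   0   0   0   0   0
O8     0   0   1   0   0   0   0   0   0
O9     0   0   1   1   0   0   1   0   0
O10    0   0   1   0   1   1   0   0   0
O11    0   0   1   1   0   0   0   0   0
O12    0   0   0   1   0   0   0   0   0
O13    0   0   0   0   1   0   0   0   0
O14    0   0   0   0   1   0   0   0   0
O15    0   0   0   0   0   1   0   0   0
O16    0   0   0   0   0   0   0   0   1
O17    0   0   0   1   0   0   1   0   1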

P1: 1, 2, 3, 4
P2: 1, 4, 5, 6, 7
P3: 1, 7, 8, 9, 10, 11
P4: 2, 9, 11, 12, 17
P5: 10, 13, 14
P6: 10, 15
P7: 2, 9, 17
P8: ∅
P9: 16, 17

P1P2: 1, 4
P1P3: 1
P1P4...: 2
P2P3...: 1, 7
P3P4...: 9, 11
P4P7...: 2, 9, 17
P7P9: 17

P1P2P3: 1
P1P4P7: 2
P3P4P7: 9
P4P7P9: 17

............................

P1P2P3P4P5P6P7P8P9: ∅

Interpretation

No differences between the two lattices

The idea of « the pursuit of happiness »

Applications of the FCA Model to Translation

Objects            Attributes         Independent/Together
Semantic classes   Segments of text   Independent
Segments of text   Semantic classes   Independent
Segments of text   Semantic classes   Independent
Segments of text   Semantic classes   Together

Conclusions about FCA

It gives the lattice structure of a text, depending on the choice of objects and attributes

The lattice structure can be used to model the translation unit and to implement it in a translation engine

The choice of objects:
semantic classes
style elements

The choice of attributes:
segments of text; type of segmentation

To apply the FCA model in an appropriate manner to a corpus of texts

Human Translation-Machine Translation


Translation Engine Types

Rule-based – Grammars

Learning-model-based – Statistics

DISCUSSION

Modelling

Define: Translation Unit – Meaning Unit and their computer model
Transfer rules based on these primitives
Linguistic architecture versus computer architecture – to give a degree of unification

Architecture

Translation systems containing:
Semantic annotator
Key word searcher
Domain ontology of source language – target language
Appropriate tools for translation data mining

References

BALLARD M. (2004), « La théorisation comme structuration de l'action du traducteur », La Linguistique, n. 40, Linguistique et traductologie, 2004/1, pp. 51-65. http://www.cairn.info/revue-la-linguistique-2004-1-page-51.htm

BAKER M. (1992), In Other Words: A Coursebook on Translation, London/New York, Routledge.

CURRY H. B., FEYS R. (1958), Combinatory Logic, vol. 1, North Holland.

DESCLES J.-P. (1998), « Les Langues sont-elles des représentations du monde », Essais sur le langage, logique et sens commun, Editions universitaires, Fribourg.

DESCLES J.-P. (2003), « La grammaire Applicative et Cognitive construit-elle des représentations universelles ? », http://linx.revues.org/226

ENGLAND R., HANSON S. (2008), « Technical Translation and a Role for FCA », International Conference on Advanced Language Processing and Web Information Technology, IEEE, pp. 99-103.

FODOR J. A. (1975), The Language of Thought, Harvard University Press, Cambridge, Mass.

GANTER B., STUMME G., WILLE R. (2005), Formal Concept Analysis: Foundations and Applications, Springer.

PASCU A., DESCLES J.-P. (2005), « Modélisation sémantique et logique de la catégorisation », LALICC, Paris-Sorbonne, http://lalic.paris-sorbonne.fr/AXESRECHERCHE/operation5.html

SHAUMYAN S. (1977), Applicational Grammar as a Semantic Theory of Natural Language, Chicago University Press.

WHORF B. L. (1966), Linguistique et anthropologie, Payot, Paris (Language, Thought and Reality, Wiley and Sons, New York, 1956).


Fred Sommers, The Logic of Natural Languages, Oxford University Press, 1984

« There is as much truth in beauty as is beauty in truth. »


THANK YOU !