

Language technologies for semantic markup in mathematics

Olga Caprotti∗

Department of Mathematics and Statistics, P.O. Box 68 (Gustaf Hällströmin katu 2b), FI-00014 University of Helsinki, Helsinki, Finland.

The Web Advanced Learning Technologies project showed how, by applying state-of-the-art language technologies to semantic markup of mathematics, it is possible to automatically produce renderings of mathematical content in a variety of languages. The intended application area of this work is computer-assisted assessment and testing in mathematics, where standardized entry examinations for prospective students could be used independently of country and language. A similar multilingual approach can be developed for any situation in which it is possible to identify a specific mathematical jargon.

1 Introduction

There are many reasons why semantic markup of mathematics ought to be preferred to pure presentation markup when dealing with communication and manipulation of mathematics on the computer. For one, presentation markup can be derived from a semantic representation by conventional stylesheet transformations, which browsers nowadays support natively. Moreover, the unambiguous nature of semantic representations allows for manipulation by computer algebra systems, for semantic searches, and for automated translation of the content into a number of natural languages.

In our increasingly connected world, in which we run the risk of losing our cultural identities, the possibility of maintaining linguistic differences becomes important. Mathematical markup languages like OpenMath and MathML [1, 2] offer the possibility to represent mathematical content at a level of abstraction that does not depend on localized information, by focusing on the semantics of the mathematical object. All localization aspects of mathematics, such as those influenced by notation, by culture, and by language, can be postponed to the rendering phase in charge of displaying the content's markup. While typesetting of mathematical markup has been the object of numerous efforts, from MathML-presentation to SVG converters, the rendering of mathematics in a "verbalized" jargon in natural language has not yet received similar attention. The WebALT EU eContent project [3] concentrated on the application of language technologies to the automatic generation of text from mathematical markup. The work focused on creating software tools and solutions for interactive mathematical exercises and drills used in online testing and assessment.

Mathematical jargon is an important aspect of the education of students. Not only does a teacher train pupils in problem-solving skills, but she also makes sure that they acquire a proper way of expressing mathematical concepts. To our knowledge, digital e-learning resources have used representations in which text is intermixed with mathematical expressions even in situations where the actual abstract representation, for instance of the statement of a theorem, can be reduced to a single mathematical object. One reason for this representation choice is that the rendering process would otherwise produce a symbolic, typeset mathematical formula that might prove too difficult to understand for the students or simply just too hard to read. However, by representing this kind of mathematical text in a language-independent format such as the one provided by markup languages, language technologies are able to generate the same text in a variety of languages including English, Spanish, Finnish, Swedish, French and Italian.

Note that, in the case of e-learning materials, the multilingual methodology multiplies the value of the language-independent content many times over; moreover, it allows the production of content that can be used across borders, thus contributing to standardizing curricula.

2 Language technologies for mathematics

Computational linguistics is usually most effective in constrained domains that adopt a specific jargon. Mathematics is one example of such a domain; in particular, the expressions used when formulating exercises and drill questions are of a few simple types [4, 5] that can be effectively treated by language technologies. Moreover, mathematics has the property of being exact and unambiguous when its representation captures the meaning of the mathematical object and not a format solely intended for typesetting. Hence, given a representation of the mathematical meaning of an expression, it should be possible to produce natural language presentations without loss of information.

∗ Corresponding author E-mail: olga.caprotti@helsinki.fi

PAMM · Proc. Appl. Math. Mech. 7, 1010503–1010504 (2007) / DOI 10.1002/pamm.200700256

© 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

As it turns out, interactive exercises, to be executed in a computer environment, are specified in a way that separates the logical and algorithmic aspects from the statement of the question [6] in different layers, each embedded in the outer one:

Mathematics: captures the semantic representation of the mathematical concept in OpenMath or Content MathML

Sentence: encodes the statement of the question or problem and provides multilinguality by a natural language extension to OpenMath

Problem: represents the algorithmic flow and parametric instantiation
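The nesting of the three layers can be pictured with a small data model. The following is only a minimal sketch in Python: the class names mirror the layer names, but the field names (cd, symbol, arguments, attributes, parameters) and the helper layers are hypothetical and do not reproduce the WebALT or MathQTI formats. The values are borrowed from the collinearity example discussed in this section.

```python
from dataclasses import dataclass


@dataclass
class Mathematics:
    """Innermost layer: the semantic object itself."""
    cd: str           # content dictionary name (hypothetical field)
    symbol: str       # symbol name inside that dictionary
    arguments: tuple  # names of the arguments of the symbol


@dataclass
class Sentence:
    """Middle layer: linguistic attributes wrapped around the mathematics."""
    attributes: dict  # e.g. mood, tense, directive (hypothetical field)
    mathematics: Mathematics


@dataclass
class Problem:
    """Outer layer: parameters and flow wrapped around the sentence."""
    parameters: dict  # hypothetical placeholder for parametric instantiation
    sentence: Sentence


def layers(problem):
    """Return the layer names from the outermost to the innermost one."""
    return [
        type(problem).__name__,
        type(problem.sentence).__name__,
        type(problem.sentence.mathematics).__name__,
    ]


collinear = Problem(
    parameters={"points": ("A", "B", "C")},
    sentence=Sentence(
        attributes={"mood": "imperative", "tense": "present", "directive": "determine"},
        mathematics=Mathematics("plangeo1", "are_on_line", ("A", "B", "C")),
    ),
)

print(layers(collinear))
```

Keeping the mathematics as the innermost value means that the same object can be reused unchanged when only the linguistic attributes or the problem parameters vary.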

This layered representation supports multilinguality when the sentence fragment used for posing the problem is encoded in a language-independent format that unambiguously captures both the meaning and the flavor of the natural language to be generated. This technique applies only to problem sentences that are strictly related to a mathematical object or notion, for instance problems about computing the value of an integral or a limit of a function. Other types of mathematical exercises, involving a story or a narrative question, cannot be tackled by this approach, since this would amount to natural language translation in an arbitrary domain. As an example, consider the semantic representation in OpenMath

attrib([nlg:mood nlg:imperative, nlg:tense nlg:present, nlg:directive nlg:determine], plangeo1:are_on_line(A,B,C)).
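For readers unfamiliar with the compact notation, the expression can be spelled out in the OpenMath XML encoding, where an attribution (OMATTR) pairs attribute symbols with values (OMATP) and wraps the attributed object, here an application (OMA). The sketch below builds such an object with Python's standard xml.etree.ElementTree module. It is an illustration under assumptions: the element names are those of the OpenMath 2.0 XML encoding, but the helper names oms and build_collinear_question are hypothetical, the XML namespace declaration is omitted for brevity, and the exact serialization used by WebALT may differ.

```python
import xml.etree.ElementTree as ET


def oms(cd, name):
    """Build an OpenMath symbol element <OMS cd="..." name="..."/>."""
    return ET.Element("OMS", {"cd": cd, "name": name})


def build_collinear_question():
    """Assemble the attributed application as an OMOBJ element tree."""
    obj = ET.Element("OMOBJ")
    attribution = ET.SubElement(obj, "OMATTR")

    # Attribute pairs: each key symbol is followed by its value symbol.
    pairs = ET.SubElement(attribution, "OMATP")
    for key, value in [
        ("mood", "imperative"),
        ("tense", "present"),
        ("directive", "determine"),
    ]:
        pairs.append(oms("nlg", key))
        pairs.append(oms("nlg", value))

    # The attributed object: are_on_line applied to the variables A, B, C.
    application = ET.SubElement(attribution, "OMA")
    application.append(oms("plangeo1", "are_on_line"))
    for point in ("A", "B", "C"):
        ET.SubElement(application, "OMV", {"name": point})
    return obj


print(ET.tostring(build_collinear_question(), encoding="unicode"))
```

The linguistic attributes sit entirely in the OMATP part, so the mathematical object inside the OMA element remains a plain OpenMath application.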

The representation above generates the following verbalizations, which show slight linguistic differences:

• Determine if A, B and C are collinear.

• Määritä ovatko A, B ja C suoralla.

• Determina si A, B y C son colineales.

• Déterminer si A, B et C sont sur une droite.

• Determina se A, B e C sono su una linea.

• Bestäm om A, B och C är på en linje.
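The principle behind these verbalizations, one abstract representation with a separate linearization rule per language, can be imitated with a toy example. The Python sketch below reproduces the six sentences above (with diacritics written out) for the collinearity question only. It is not the WebALT grammar library, which consists of GF grammars: the tables TEMPLATE and CONJUNCTION and the functions list_points and verbalize are hypothetical names introduced for illustration.

```python
# Hypothetical per-language linearization tables for a single question type.
CONJUNCTION = {
    "en": "and", "fi": "ja", "es": "y", "fr": "et", "it": "e", "sv": "och",
}

TEMPLATE = {
    "en": "Determine if {points} are collinear.",
    "fi": "Määritä ovatko {points} suoralla.",
    "es": "Determina si {points} son colineales.",
    "fr": "Déterminer si {points} sont sur une droite.",
    "it": "Determina se {points} sono su una linea.",
    "sv": "Bestäm om {points} är på en linje.",
}


def list_points(points, lang):
    """Join point names with commas and the conjunction of the language."""
    if len(points) < 2:
        raise ValueError("at least two points are needed")
    *init, last = points
    return f"{', '.join(init)} {CONJUNCTION[lang]} {last}"


def verbalize(points, lang):
    """Linearize the language-independent question in the given language."""
    return TEMPLATE[lang].format(points=list_points(points, lang))


for lang in TEMPLATE:
    print(verbalize(("A", "B", "C"), lang))
```

Fixed templates like these do not scale beyond a single sentence pattern. A grammar formalism such as GF instead composes sentences from reusable rules for each language, which is what makes broader coverage feasible.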

The WebALT mathematical grammar libraries have been developed as components for the GF Grammatical Framework software [7] for computational linguistics. They can be used via a web service and have been integrated in applications like editors and testing and assessment platforms [8]. The current version of the mathematical grammar library handles English, Swedish, Finnish, Spanish, Italian, Catalan and French with varying degrees of accuracy.

3 Concluding Remarks

Preserving the semantic content of the mathematical fragments used in online documents allows for unexpected applications of technologies that might, at first, look very distant from mathematics. We have shown how, by choosing a content-preserving and language-independent representation for interactive exercises, software for computational linguistics can derive localized presentations of the exercises in a natural language of choice. The WebALT project has produced a showcase consisting of a collection of multilingual interactive exercises together with supporting software applications for editing and for playing the exercises. The mathematical grammar library for GF is maintained by Oy WebALT and will be extended to a wider coverage in the future. A similar approach will work well for any linguistic domain which employs a strictly codified jargon, typically any scientific field.

References

[1] S. Buswell, O. Caprotti, D. Carlisle, M. Dewar, M. Gaetano, and M. Kohlhase, The OpenMath Standard 2.0, The OpenMath Consortium, http://www.openmath.org/standard, 2004.

[2] Mathematical Markup Language (MathML) Version 3.0.

[3] WebALT, Web advanced learning technologies, http://www.webalt.net, 2005-2006, EDC-22253.

[4] L. Carlson, J. Saludes, and A. Strotmann, Study of the state of the art in multilingual and multicultural creation of digital mathematical content, Deliverable D1.2, WebALT Project EDC-22253, 2005.

[5] O. Caprotti, L. Carlson, M. Seppälä, and A. Strotmann, Web advanced learning technologies for assessment in mathematics, in: Proceedings of the III International Conference on multimedia and ICT's in Education, edited by Formatex, Recent Research Developments in Learning Technologies Vol. 3 ().

[6] M. Mavrikis, MathQTI Draft Specification, http://www.maths.ed.ac.uk/mathqti/, 2005.

[7] A. Ranta, The Journal of Functional Programming 14(2), 145–189 (2004), http://www.cs.chalmers.se/~aarne/GF/.

[8] M. Seppälä, S. Xambó, and O. Caprotti, Novel aspects of the use of ICT in mathematics education, in: Proceedings of the International Conference on Engineering Education, Instructional Technology, Assessment, and E-learning (EIAE 06) (December 2006).


ICIAM07 Minisymposia – 01 Computing 1010504