Language technologies for semantic markup in mathematics
Olga Caprotti∗
Department of Mathematics and Statistics, P.O. Box 68 (Gustaf Hällströmin katu 2b), FI-00014 University of Helsinki, Helsinki, Finland.
The Web Advanced Learning Technologies (WebALT) project showed how, by applying state-of-the-art language technologies to semantic markup of mathematics, it is possible to automatically produce renderings of mathematical content in a variety of languages. The intended application area of this work is computer-assisted assessment and testing in mathematics, where standardized entry examinations for prospective students could be used independently of country and language. A similar multilingual approach can be developed for any situation in which a specific mathematical jargon can be identified.
1 Introduction
There are many reasons why semantic markup of mathematics ought to be preferred to pure presentation markup when dealing with the communication and manipulation of mathematics on the computer. For one, presentation markup can be derived from a semantic representation by conventional stylesheet transformations, which are nowadays supported natively by browsers. Moreover, because semantic representations are unambiguous, they allow manipulation by computer algebra systems, semantic searches, and automated translation of the content into a number of natural languages.
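To make the contrast concrete, here is a minimal sketch, written in Python rather than as a stylesheet, of how one semantic tree can serve two purposes: it is turned into a presentation-style string and it is evaluated numerically, as a computer algebra system could do. It assumes a namespace-free Content MathML fragment and supports only the associative operators plus and times; the functions `render` and `evaluate` are hypothetical illustrations, not part of any MathML tool.

```python
from __future__ import annotations

import math
import xml.etree.ElementTree as ET

# Content MathML for 2 * (x + 3); the MathML namespace is omitted for brevity.
CONTENT = """
<apply><times/>
  <cn>2</cn>
  <apply><plus/><ci>x</ci><cn>3</cn></apply>
</apply>
"""

# Infix symbol and binding strength for the operators this sketch supports.
OPS = {"plus": ("+", 1), "times": ("*", 2)}


def render(node: ET.Element, parent_prec: int = 0) -> str:
    """Derive a linear, presentation-style string from the content tree."""
    if node.tag in ("cn", "ci"):
        return node.text.strip()
    op, *args = list(node)  # the first child of <apply> is the operator element
    symbol, prec = OPS[op.tag]
    text = f" {symbol} ".join(render(arg, prec) for arg in args)
    # Parentheses are added only where the structure would otherwise be lost.
    return f"({text})" if prec < parent_prec else text


def evaluate(node: ET.Element, env: dict[str, float]) -> float:
    """Compute the value of the same tree, as a computer algebra system could."""
    if node.tag == "cn":
        return float(node.text)
    if node.tag == "ci":
        return env[node.text.strip()]
    op, *args = list(node)
    values = [evaluate(arg, env) for arg in args]
    if op.tag == "plus":
        return sum(values)
    if op.tag == "times":
        return math.prod(values)
    raise ValueError(f"unsupported operator: {op.tag}")


tree = ET.fromstring(CONTENT.strip())
print(render(tree))              # 2 * (x + 3)
print(evaluate(tree, {"x": 4}))  # 14.0
```

The point of the design is that the parentheses in the rendered string are a presentation decision recomputed from the tree, whereas a presentation-only encoding would have to be parsed and disambiguated before it could be evaluated.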
In our increasingly connected world, in which we run the risk of losing our cultural identities, the possibility of maintaining linguistic differences becomes important. Mathematical markup languages like OpenMath and MathML [1, 2] make it possible to represent mathematical content at a level of abstraction that does not depend on localized information, by focusing on the semantics of the mathematical object. All localization aspects of mathematics, such as those influenced by notation, culture, and language, can be postponed to the rendering phase, which is in charge of displaying the content's markup. While typesetting of mathematical markup has been the object of numerous efforts, from MathML presentation to SVG converters, the rendering of mathematics in a “verbalized” jargon in natural language has not yet received similar attention. The WebALT EU eContent project [3] concentrated on the application of language technologies to the automatic generation of text from mathematical markup. The work focused on creating software tools and solutions for interactive mathematical exercises and drills used in online testing and assessment.
Mathematical jargon is an important aspect of the education of students. Not only does a teacher train pupils in problem-solving skills, but she also makes sure that they acquire a proper way of expressing mathematical concepts. To our knowledge, digital e-learning resources have so far used representations in which text is intermixed with mathematical expressions, even in situations where the actual abstract representation, for instance of the statement of a theorem, can be reduced to a single mathematical object. One reason for this choice is that the rendering process would otherwise produce a symbolic, typeset mathematical formula that might prove too difficult for students to understand, or simply too hard to read. However, when this kind of mathematical text is represented in a language-independent format such as the one provided by markup languages, language technologies are able to generate the same text in a variety of languages, including English, Spanish, Finnish, Swedish, French and Italian.
Note that, in the case of e-learning materials, the multilingual methodology multiplies the value of the language-independent content many times over; moreover, it makes it possible to produce content that can be used across borders, thus contributing to the standardization of curricula.
2 Language technologies for mathematics
Computational linguistics is usually most effective in constrained domains that adopt a specific jargon. Mathematics is one example of such a domain; in particular, the expressions used when formulating exercises and drill questions are of a few simple types [4, 5] that can be treated effectively by language technologies. Moreover, mathematics has the property of being exact and unambiguous when its representation captures the meaning of the mathematical object, rather than a format intended solely for typesetting. Hence, given a representation of the mathematical meaning of an expression, one can expect to produce natural-language presentations of it without loss of information.
As it turns out, interactive exercises to be executed in a computer environment are specified in a way that separates the logic and algorithmic aspects from the statement of the question [6], in different layers, each embedded in the outer one:
∗ Corresponding author E-mail: [email protected]
PAMM · Proc. Appl. Math. Mech. 7, 1010503–1010504 (2007) / DOI 10.1002/pamm.200700256
© 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
• Mathematics: captures the semantic representation of the mathematical concept in OpenMath or Content MathML.
• Sentence: encodes the statement of the question or problem and provides multilinguality through a natural-language extension of OpenMath.
• Problem: represents the algorithmic flow and the parametric instantiation.
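The nesting of the three layers can be pictured with a few data types. The sketch below is a hypothetical Python model, not the MathQTI or WebALT exercise format: `Mathematics` holds an OpenMath-like term as a symbol name and an argument tuple, `Sentence` wraps it with language-independent attributes, and `Problem` adds parameters that are instantiated before the sentence is rendered. All class and field names are illustrative assumptions.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Mathematics:
    """Innermost layer: a semantic term, e.g. plangeo1:are_on_line(A, B, C)."""
    symbol: str
    args: tuple[str, ...]


@dataclass
class Sentence:
    """Middle layer: language-independent attributes wrapped around the term."""
    attributes: dict[str, str]
    body: Mathematics


@dataclass
class Problem:
    """Outer layer: parameters whose values are chosen when the exercise is run."""
    parameters: dict[str, list[str]]  # parameter name -> admissible values
    sentence: Sentence

    def instantiate(self, rng: random.Random) -> Sentence:
        """Pick a value for every parameter and substitute it into the term."""
        chosen = {name: rng.choice(values) for name, values in self.parameters.items()}
        body = self.sentence.body
        new_args = tuple(chosen.get(arg, arg) for arg in body.args)
        # The template sentence is left untouched; a new Sentence is returned.
        return replace(self.sentence, body=replace(body, args=new_args))


exercise = Problem(
    parameters={"A": ["(0, 0)"], "B": ["(1, 1)", "(1, 2)"], "C": ["(2, 2)"]},
    sentence=Sentence(
        attributes={"nlg:mood": "nlg:imperative", "nlg:tense": "nlg:present",
                    "nlg:directive": "nlg:determine"},
        body=Mathematics("plangeo1:are_on_line", ("A", "B", "C")),
    ),
)
print(exercise.instantiate(random.Random(1)).body.args)
```

Only the Problem layer knows about randomness and parameters; the Sentence layer it returns is still language-independent, so the same instantiated exercise can be verbalized in any supported language.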
This layered representation supports multilinguality when the sentence fragment used for posing the problem is encoded in a language-independent format that captures both the meaning, unambiguously, and the flavor of the natural-language sentence to be generated, such as its mood and tense. This technique applies only to problem sentences that are strictly related to a mathematical object or notion, for instance problems about computing the value of an integral or the limit of a function. Other types of mathematical exercises, involving a story or a narrative question, cannot be tackled by this approach, since this would amount to natural-language translation in an arbitrary domain. As an example, consider the following semantic representation in OpenMath, in which the statement that A, B and C lie on a line is attributed with the mood, tense and directive of the sentence to be generated:
    attrib([nlg:mood nlg:imperative, nlg:tense nlg:present, nlg:directive nlg:determine],
           plangeo1:are_on_line(A, B, C))
From this representation, the following verbalizations are generated (in English, Finnish, Spanish, French, Italian and Swedish, respectively); they show slight linguistic differences:
• Determine if A, B and C are collinear.
• Määritä ovatko A, B ja C suoralla.
• Determina si A, B y C son colineales.
• Déterminer si A, B et C sont sur une droite.
• Determina se A, B e C sono su una linea.
• Bestäm om A, B och C är på en linje.
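How one abstract representation yields these six sentences can be imitated with a toy linearizer. The Python sketch below is neither GF nor the WebALT grammar library: it hard-codes a hypothetical six-language lexicon and uses string templates where GF would use a full grammar with inflection and agreement. It does show the split between an abstract syntax (the attributed term) and one concrete syntax per language, and why word-by-word substitution would not suffice: in Finnish the interrogative verb ovatko comes before the list of points instead of after it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Attributed:
    """Abstract syntax: a simplified form of the attributed OpenMath object above."""
    mood: str        # e.g. "imperative"
    tense: str       # e.g. "present"
    directive: str   # e.g. "determine"
    predicate: str   # e.g. "are_on_line"
    args: tuple[str, ...]


# Hypothetical concrete syntaxes: a conjunction, the directive verb in the
# imperative, and a template for the embedded yes/no question.
LEXICON = {
    "en": {"and": "and", "determine": "Determine", "are_on_line": "if {xs} are collinear"},
    "fi": {"and": "ja", "determine": "Määritä", "are_on_line": "ovatko {xs} suoralla"},
    "es": {"and": "y", "determine": "Determina", "are_on_line": "si {xs} son colineales"},
    "fr": {"and": "et", "determine": "Déterminer", "are_on_line": "si {xs} sont sur une droite"},
    "it": {"and": "e", "determine": "Determina", "are_on_line": "se {xs} sono su una linea"},
    "sv": {"and": "och", "determine": "Bestäm", "are_on_line": "om {xs} är på en linje"},
}


def coordinate(items: tuple[str, ...], conj: str) -> str:
    """Linearize arguments as 'A, B and C' using the language's conjunction."""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + f" {conj} " + items[-1]


def linearize(term: Attributed, lang: str) -> str:
    """Produce one verbalization of the abstract term in the given language."""
    if (term.mood, term.tense) != ("imperative", "present"):
        raise NotImplementedError("this toy lexicon covers only the imperative present")
    lex = LEXICON[lang]
    clause = lex[term.predicate].format(xs=coordinate(term.args, lex["and"]))
    return f"{lex[term.directive]} {clause}."


question = Attributed("imperative", "present", "determine", "are_on_line", ("A", "B", "C"))
for lang in LEXICON:
    print(linearize(question, lang))
```

The loop prints one sentence per language, in the order English, Finnish, Spanish, French, Italian, Swedish. Adding a language means adding a lexicon entry only; the abstract term never changes, which is the property the layered representation relies on.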
The WebALT mathematical grammar libraries have been developed as components for the GF (Grammatical Framework) software [7] for computational linguistics. They can be used via a web service and have been integrated into applications such as editors and testing and assessment platforms [8]. The current version of the mathematical grammar library handles English, Swedish, Finnish, Spanish, Italian, Catalan and French, with varying degrees of accuracy.
3 Concluding Remarks
Preserving the semantic content of the mathematical fragments used in online documents allows for unexpected applications of technologies that might, at first, seem very distant from mathematics. We have shown how, by choosing a content-preserving and language-independent representation for interactive exercises, software for computational linguistics can derive localized presentations of the exercises in a natural language of choice. The WebALT project has produced a showcase consisting of a collection of multilingual interactive exercises, together with supporting software applications for editing and playing the exercises. The mathematical grammar library for GF is maintained by Oy WebALT and will be extended to wider coverage in the future. A similar approach will work well for any linguistic domain that employs a strictly codified jargon, typically any scientific field.
References
[1] S. Buswell, O. Caprotti, D. Carlisle, M. Dewar, M. Gaëtano, and M. Kohlhase, The OpenMath Standard 2.0, The OpenMath Consortium, http://www.openmath.org/standard, 2004.
[2] Mathematical Markup Language (MathML) Version 3.0.
[3] WebALT, Web advanced learning technologies, http://www.webalt.net, 2005-2006, EDC-22253.
[4] L. Carlson, J. Saludes, and A. Strotmann, Study of the state of the art in multilingual and multicultural creation of digital mathematical content, Deliverable D1.2, WebALT Project EDC-22253, 2005.
[5] O. Caprotti, L. Carlson, M. Seppälä, and A. Strotmann, Web advanced learning technologies for assessment in mathematics, in: Proceedings of the III International Conference on Multimedia and ICT's in Education, edited by Formatex, Recent Research Developments in Learning Technologies Vol. 3 ().
[6] M. Mavrikis, MathQTI Draft Specification, http://www.maths.ed.ac.uk/mathqti/, 2005.
[7] A. Ranta, The Journal of Functional Programming 14(2), 145–189 (2004), http://www.cs.chalmers.se/~aarne/GF/.
[8] M. Seppälä, S. Xambó, and O. Caprotti, Novel aspects of the use of ICT in mathematics education, in: Proceedings of the International Conference on Engineering Education, Instructional Technology, Assessment, and E-learning (EIAE 06), December 2006.
ICIAM07 Minisymposia – 01 Computing 1010504