THE DATA TRANSCRIPTION AND ANALYSIS (DTA) TOOL. HANDS ON WORKSHOP

Development of Linguistic Linked Open Data (LLOD) Resources for Collaborative Data-Intensive Research in the Language Sciences

María Blume, Pontificia Universidad Católica del Perú

Isabelle Barrière, Long Island University & Yeled V'Yalda Early Childhood Center

Cristina Dye, Newcastle University

Ted Caldwell, Gorges

With the invaluable help of Carissa Kang and Jonathan Masci, Cornell University

July 25th, 2015

This tool was funded by

National Science Foundation. CI-TEAM program. “Transforming the Primary Research Process Through Cybertool Dissemination: An Implementation of a Virtual Center for the study of language acquisition”. (María Blume and Barbara Lust). NSF OCI-0753415

Purpose and goals

To create a culture of national and international collaboration among researchers and their labs.

To create shared principles and methods of data documentation, management and collaboration.

To enable the practice of these principles and methods through the use of cybertools.

To provide a new generation of researchers and students, including those with diverse disciplinary, geographical and cultural backgrounds, with a solid foundation in these principles and methods through the use of these new cybertools.

To create a tool for collaboration that can allow for the management, documentation and analysis of crosslinguistic language data.

To provide a resource that allows users to manage data across datasets and projects, including the ability to reuse previously collected data.

Virtual Center for the Study of Language Acquisition (VCLA)

http://vcla.clal.cornell.edu/

The Virtual Center for Language Acquisition Research

A community of researchers that are linked in their assumption that the most fundamental questions of language acquisition require interdisciplinary collaboration, both theoretical and empirical methods, and a cross-linguistic approach.

Eight founding member institutions. One international collaborator in Peru.

The VCLA website

A center that unites through the web a series of research labs across the country and the world.

[Its] mission is to foster collaborative research among researchers working in the area of language acquisition, collaborations which are potentially interdisciplinary, which may be at a distance geographically and which may involve the comparative study of multiple languages, and interactions on shared data, as well as a variety of lab methods.

The VCLA

List of projects by VCLA members to give undergraduate and graduate students and other researchers ideas for future research and collaboration.

Courses

We have created a series of courses centered on research methodology, best practices and intensive hands-on experience with cybertools such as the Experiment Bank and Web DTA.

The Web Conferences: Elluminate

Cornell and UTEP students meeting during our first course.

The Web Conferences: Elluminate

A UTEP student presents her research proposal to peers and faculty at UTEP and Pontificia Universidad Católica del Perú.

The Virtual Linguistics Lab (VLL)

http://clal.cornell.edu/vll

The Virtual Linguistics Lab

The VLL portal provides structured access to the components of a virtual linguistic lab:

Materials for the scientific and collaborative study of language acquisition.

Web-based courses, integrating synchronous and asynchronous forms of interactive information distribution.

Meeting the Challenges through a Virtual Linguistics Lab

The VLL includes a series of web-based courses, integrating synchronous and asynchronous forms of interactive information distribution,

a web-based experiment bank and data transcription and analysis tool, with an associated set of data collected over 20 years by the Cornell Language Acquisition Lab and other labs across the USA,

a series of structured audio-visual demonstrations and related learning modules.

These materials are integrated into a university-supported cyberinfrastructure to meet the high-availability needs of a distance learning program.

VLL Components

Laboratory methods: Research methods manual. Standards.

Courses

Teaching materials.

Audio/visual samples (lessons, assignments, data).

Web conferences.

Discussion board.

VLL Portal Topics

Teaching Modules

Provide graduate and undergraduate students with a set of interactive web-based lessons which teach them the specific procedures of investigating language knowledge.

These link to:

• Audio/video examples.

• Glossary.

• The experiment bank.

• The methods manual.

• The Data Transcription and Analysis tool.

• Published or unpublished papers.

• Specific exercises/homework.

The modules

provide students with selected excerpts of language data to be studied and analyzed

give students a virtual experience of an interview of a subject and real experience of analysis of the subject's language.

allow students to learn a method to use in their own research or practice

allow students to learn how to analyze previously collected data

A teaching module

The teaching modules give students access to:

• Audio/Visual examples.

• PowerPoint presentations explaining the methods.

• Readings.

• Interactive assignments.

Audio/Visual Materials

Teaching the procedure for the Act Out task.

Audio/Visual Materials

An experimental study showing the Elicited Imitation task done with a 2-year-old in Peru.

An interactive assignment

Elicited Imitation assignment comparing monolingual and bilingual children.

These assignments train students to transcribe and analyze data, and compare their results to the original paper’s results.

An interactive assignment

A child subject enjoys the experiment.

These samples give the student a virtual experience of data collection.

Cybertools

Multilingualism questionnaire.

Data Transcription and Analysis Tool (DTA): includes an Experiment Bank and gives access to libraries of comparable data.

DTA User’s Manual.

Virtual workshops.

Cybertool access through VLL

Data quality: the opportunities

Technology can enable:

Precision and completeness in data-capture procedures

Capacity for many levels of structural description and analysis

Capacity to link points of data along multiple dimensions

Why do we need the DTA tool in the study of language acquisition and use?

Multiple languages.

Multiple formats.

Multiple methods of data collection: observational vs. experimental, cross-sectional or longitudinal.

Multiple aspects of metadata: age and/or developmental/cognitive stage of speaker, social and pragmatic context, culture.

Data management and use

Different labs practice distinct forms of data management.

The scientific use of any single record requires access to many levels of data, ranging from raw (establishing provenance) to structured and analyzed data (establishing intellectual worth).

Data Transcription and Analysis Tool (WebDTA)

http://webdta.clal.cornell.edu/site/login

A primary research tool that provides the user with a web interface guiding him/her through the steps for generating, storing and accessing data.

Users contribute data in a structured, uniform manner.

Users access calibrated data from a shared relational database.

Diverse data become comparable at many levels.

The Data Transcription and Analysis Tool (WebDTA)

Collects all information related to a study (experimental or observational) in the same location.

Makes all information about the study available to the public:

• Researchers seeking to replicate or criticize it.

• Students studying the particular method or research topic.

Trains researchers and students on how to organize research data.

WebDTA Tool

It stores its data in a relational database on a centralized server (other systems store flat text files).

It supports both Natural Speech and Experimental data.

It can be used for both Research and Education (structured teaching modules).

It is open-ended. New specialized coding screens can be added.

It has robust query capabilities based on its relational database structure.
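The contrast between a relational store and flat text files can be sketched as follows. This is a minimal illustration in Python using the standard-library sqlite3 module, not the actual WebDTA schema: the table names, column names and sample rows are assumptions, loosely modeled on the project, dataset, session and utterance levels that appear in the tool's URLs.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with a hypothetical four-level hierarchy."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE project   (id INTEGER PRIMARY KEY, title TEXT, kind TEXT);
        CREATE TABLE dataset   (id INTEGER PRIMARY KEY,
                                project_id INTEGER REFERENCES project(id),
                                language TEXT);
        CREATE TABLE session   (id INTEGER PRIMARY KEY,
                                dataset_id INTEGER REFERENCES dataset(id),
                                age_months INTEGER);
        CREATE TABLE utterance (id INTEGER PRIMARY KEY,
                                session_id INTEGER REFERENCES session(id),
                                text TEXT);
    """)
    # Invented sample rows, for illustration only.
    con.executemany("INSERT INTO project VALUES (?, ?, ?)",
                    [(1, "Demo natural speech", "natural"),
                     (2, "Demo imitation", "experimental")])
    con.executemany("INSERT INTO dataset VALUES (?, ?, ?)",
                    [(10, 1, "Spanish"), (20, 2, "English")])
    con.executemany("INSERT INTO session VALUES (?, ?, ?)",
                    [(100, 10, 24), (101, 10, 36), (200, 20, 30)])
    con.executemany("INSERT INTO utterance VALUES (?, ?, ?)",
                    [(1, 100, "quiero agua"),
                     (2, 101, "el perro corre"),
                     (3, 200, "the dog runs")])
    return con

def utterances_by_language(con, language):
    """Join all four levels in one query, ordered by the speaker's age."""
    return con.execute("""
        SELECT p.title, s.age_months, u.text
        FROM utterance u
        JOIN session s ON u.session_id = s.id
        JOIN dataset d ON s.dataset_id = d.id
        JOIN project p ON d.project_id = p.id
        WHERE d.language = ?
        ORDER BY s.age_months
    """, (language,)).fetchall()
```

Calling utterances_by_language(build_demo_db(), "Spanish") returns every Spanish utterance together with its project title and the child's age at the session. With flat text files, the same question would typically require parsing each file and matching the metadata by hand.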

Brief development history

The Virtual Linguistics Lab (VLL), its Data Transcription and Analysis Tool (DTA, WebDTA) and the proprietary methodology that supports these were developed over 30 years through the personal effort of Prof. Lust and contributions from students and peers.

Several rudimentary versions of the DTA were sketched out and crafted in old software. When user-friendly relational databases became commonplace, research and student users were able to define a new approach, and a more powerful version of the DTA was developed using FoxPro as the engine.

Katharina Boser, Reiko Mazuka, Julie Eisele, Paul Navarre, David Parkinson, Shamitha Somashekar, and María Blume.

Brief development history

Cliff Crawford prompted the CLAL's development of a web-based interface for the DTA tool and held major responsibility for programming the first web-based interface, using PostgreSQL.

The current version of the DTA tool, unifying the previously independent cybertool Experiment Bank with the DTA, was developed by Ted Caldwell and Greg Kops at Gorges, Web Development and Internet Solutions (http://www.gorges.us/) with María Blume and Barbara Lust, and input from students of the Cornell Language Acquisition Lab (Natalia Buitrago, Gabriel Clandorf, Poornima Guna, Jennie Lin, and Jordan Whitlock) and UTEP (Marina Kalashnikova and Martha Rayas).

DTA Schema

Structure

The current version of the WebDTA tool is built on Yii, a PHP web development framework that uses the "Model-View-Controller" pattern to structure the application and the "Active Record" pattern to manage records from the database.

MySQL is used for the database platform. All are open source technologies.
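The "Active Record" pattern ties each database row to an object that knows how to save and load itself. The sketch below illustrates the idea in Python with sqlite3. It is not Yii's PHP API and not WebDTA code: the Utterance class, its columns and its method names are hypothetical.

```python
import sqlite3

class Utterance:
    """Hypothetical Active Record class: one instance wraps one table row."""

    # A single shared in-memory connection keeps the sketch self-contained.
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE utterance (id INTEGER PRIMARY KEY, speaker TEXT, text TEXT)"
    )

    def __init__(self, speaker, text, id=None):
        self.speaker = speaker
        self.text = text
        self.id = id  # None until the record has been saved once

    def save(self):
        """Insert the row if it is new, otherwise update it in place."""
        if self.id is None:
            cur = self.con.execute(
                "INSERT INTO utterance (speaker, text) VALUES (?, ?)",
                (self.speaker, self.text),
            )
            self.id = cur.lastrowid
        else:
            self.con.execute(
                "UPDATE utterance SET speaker = ?, text = ? WHERE id = ?",
                (self.speaker, self.text, self.id),
            )
        return self

    @classmethod
    def find(cls, id):
        """Load one record by primary key, or return None if it is absent."""
        row = cls.con.execute(
            "SELECT speaker, text, id FROM utterance WHERE id = ?", (id,)
        ).fetchone()
        return cls(*row) if row else None
```

In use, application code never writes SQL directly: Utterance("CHI", "more juice").save() creates a row, and changing an attribute followed by another save() updates it. The controller and view layers of a Model-View-Controller application would work with such objects rather than with raw query results.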

External links

We are collaborating with Cornell University’s Albert Mann Library in their current pilot program, DataStaR (Data Staging Repository), “intended to help researchers create high quality metadata in the formats required by external repositories…” (Steinhart 2010: 1). (Funded by the National Science Foundation, Grant No. III-0712989.)

The program adopts a semantic web approach to metadata. At present, one VCLA dataset (Sinhala language) from more than 400 children studied in Sri Lanka has been entered in DataStaR, linking the VCLA database to the Library staging repository, and is available for collaborative use through this repository.

DataStaR uses RDF (Resource Description Framework) statements and OWL (Web Ontology Language) classes in order to integrate different metadata frameworks across disciplines.
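An RDF statement is a subject-predicate-object triple, and classes let resources from different disciplines be grouped under a shared type. The sketch below models this with plain Python tuples and the standard rdf:type and rdfs:subClassOf predicates. Every ex: identifier and class name is hypothetical and chosen only for illustration; none of it is DataStaR's actual vocabulary or data.

```python
# Each RDF statement is a (subject, predicate, object) triple.
# All "ex:" names below are invented for this sketch.
TRIPLES = {
    ("ex:SinhalaDataset", "rdf:type", "ex:LanguageDataset"),
    ("ex:LanguageDataset", "rdfs:subClassOf", "ex:Dataset"),
    ("ex:SinhalaDataset", "ex:language", "Sinhala"),
    ("ex:SoilSurvey", "rdf:type", "ex:Dataset"),
}

def superclasses(cls, triples):
    """Return cls plus every class reachable through rdfs:subClassOf."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(
                o for s, p, o in triples
                if s == current and p == "rdfs:subClassOf"
            )
    return seen

def instances_of(cls, triples):
    """List resources typed as cls directly or through a subclass."""
    return sorted(
        s for s, p, o in triples
        if p == "rdf:type" and cls in superclasses(o, triples)
    )
```

Asking for instances_of("ex:Dataset", TRIPLES) finds both the linguistic and the non-linguistic resource, because the language-specific class is declared a subclass of the general one. This is the sense in which a class hierarchy can bridge metadata frameworks from different fields.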

http://datastar.mannlib.cornell.edu/display/n6291 and http://www.news.cornell.edu/stories/Oct11/SinhalaTools.html

Project sample

An Experimental Project: https://webdta.clal.cornell.edu/projects/151/overview

A Natural Speech Corpus: https://webdta.clal.cornell.edu/projects/15/datasets/40/sessions

Coding Gesture: https://webdta.clal.cornell.edu/projects/235/datasets/249/sessions/2648/transcriptions/1669/utterances

Code-switching project: https://webdta.clal.cornell.edu/projects/230/overview

Queries: https://webdta.clal.cornell.edu/queries

Accessing the DTA

Permission is required because of human subjects considerations.

To request access, contact Barbara Lust ([email protected]) or María Blume ([email protected]).

Go to https://webdta.clal.cornell.edu/. The link is in a document called "DTA address", which we e-mailed you along with documents containing data from children that you can use for practice. We have removed the identifying data.

Acknowledgments

María Blume and Barbara Lust. 2008. Transforming the Primary Research Process Through Cybertool Dissemination: An Implementation of a Virtual Center for the Study of Language Acquisition. NSF OCI-0753415.

Lust, Barbara. 2003. Planning Grant: A Virtual Center for Child Language Acquisition Research. National Science Foundation. NSF BCS-0126546.

VCLA founding members:

Cornell: Marianella Casasola, Claire Cardie, James Gair, and Qi Wang.

NeuroFocus: Elise Temple.

Boston College: Claire Foley.

Rutgers University at New Brunswick: Liliana Sánchez.

Rutgers University at Newark: Jennifer Austin.

California State University at San Bernardino: YuChin Chien.

Southern Illinois University at Carbondale: Usha Lakshmanan.

Acknowledgments

VCLA affiliates:

City University of New York: Gita Martohardjono, Valerie Shafer, and Isabelle Barrière.

Newcastle University: Cristina Dye.

Ben Gurion University at the Negev: Yarden Kedar.

Tyndale University College and Seminary: Sujin Yang.

Columbia University: Joy Hirsch.

University of Texas at El Paso: Ellen Courtney and Alfredo Urzúa.

University of California at San Diego: Sarah Callahan.

Pontificia Universidad Católica Del Perú: Jorge Iván Pérez Silva.

Kyungsung University: Kwee Ock Lee.

Central Institute of English and Foreign Languages: R. Amritavalli.

Osmania University: A. Usha Rani.

Acknowledgments

Janet McCue and Barbara Lust 2004-2006. National Science Foundation Award: Planning Information Infrastructure Through a New Library-Research Partnership. (SGER=Small Grant for Exploratory Research)

American Institute for Sri Lankan Studies, Cornell University Einaudi Center.

Cornell University Faculty Innovation in Teaching Awards, Cornell Institute for Social and Economic Research (CISER).

New York State Hatch grant.

Our application developers Ted Caldwell and Greg Kops (GORGES).

Our consultants Cliff Crawford and Tommy Cusick.

Our student RAs: Darlin Alberto, Gabriel Clandorf, Natalia Buitrago, Poornima Guna, Jennie Lin, and Jordan Whitlock (formerly at Cornell, now MIT), and Marina Kalashnikova, Martha Rayas Tanaka, Lizzeth Pattison, María Jiménez, and Mónica Martínez at UTEP.

The students at all the participating institutions that helped us with comments and suggestions.

References

Berners-Lee, Tim. 3/2009. TED Lecture. Tim Berners-Lee on the next Web. http://en.wikipedia.org/wiki/Linked_data .

Bickel, Balthasar, Bernard Comrie, and Martin Haspelmath. 2008. Leipzig Glossing Rules: Conventions for Interlinear Morpheme-by-Morpheme Glosses. Available online at http://www.eva.mpg.de/lingua/resources/glossing-rules.php .

Blume, María and Barbara Lust, 2011a and in prep. Data Transcription and Analysis Tool User’s Manual. (with the collaboration of Shamitha Somashekar, and Tina Ogden).

Blume, María and Barbara Lust. 2011b. Presentation to the National Science Foundation. CI Team Principal Investigator’s Meeting. University of Illinois at Urbana Champaign, Ill. May 24-26. Transforming the Primary Research Process Through Cybertool Dissemination: An Implementation of a Virtual Center for the Study of Language Acquisition. NSF OCI-0753415.

Farrar, S.O. and Langendoen, D.T. 2003. A linguistic ontology for the semantic web. GLOT International 7(3): 97-100.

Khan, Huda, Brian Caruso, Brian Lowe, Jon Corson-Rikert, Diane Dietrich and Gail Steinhart. 2011. DataStaR: Using the Semantic Web approach for Data Curation. International Journal of Digital Curation 6(2): 209-221.

Lowe, Brian. 2009. DataStaR: Bridging XML and OWL in Science Metadata Management. Metadata and Semantics Research 46: 141-150. http://www.springerlink.com/content/q0825vj78ul38712/

References

Lust, Barbara, Suzanne Flynn, María Blume, Elaine Westbrooks, and Theresa Tobin. (2010). Constructing Adequate Documentation for Multi-faceted Cross Linguistic Language Data: A Case Study from a Virtual Center for Study of Language Acquisition. In Grenoble, Lenore and Louanna Furbee, (eds.), Language Documentation: Theory, Practice and Values. pp. 127-152. Amsterdam/Philadelphia: John Benjamins.

Open Archives Initiative (OAI), http://www.openarchives.org/ (15 Mar. 2005).

Open Language Archives Community (OLAC), http://www.language-archives.org/ (24 Feb. 2011).

Simons, G., Farrar, S., Fitzsimons, B., Lewis, W., Langendoen, D.T. and Gonzalez, H. 2004a. The semantics of markup: Mapping legacy markup schemas to a common semantics.

Simons, G., Fitzsimons, B., Langendoen, D.T., Lewis, Wm., Farrar, S., Lanham, A., Basham, R. and Gonzalez H. 2004b. http://emeld.org/workshop/2004/langendoen-paper.html

Steinhart, Gail. 2010. DataStaR: A Data Staging Repository to Support the Sharing and Publication of Research Data. 31st Annual IATUL Conference - The Evolving World of e-Science: Impact and Implications for Science and Technology Libraries. June 20-24, 2010. West Lafayette, IN. http://docs.lib.purdue.edu/iatul2010/conf/day2/8/.

DTA: Project list

DTA: Project info

Metadata on Experimental or Naturalistic research.

These screens help students and researchers save/access the basic information for a research study and also keep track of publications, presentations, related studies, and bibliography related to a research project.

DTA Metadata: Subject info

Subject information that allows for one subject’s data to be used in multiple datasets.
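One common way to let a single subject record serve several datasets is a many-to-many link table, so the subject's details are stored once and only referenced from each dataset. The sketch below shows that design in Python with sqlite3. It is an assumption about how such reuse could be modeled, not the actual DTA schema, and all table names, column names and sample values are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE subject (id INTEGER PRIMARY KEY, code TEXT, birth_date TEXT);
    CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT);
    -- Link table: one row per (dataset, subject) pairing.
    CREATE TABLE dataset_subject (
        dataset_id INTEGER REFERENCES dataset(id),
        subject_id INTEGER REFERENCES subject(id),
        PRIMARY KEY (dataset_id, subject_id)
    );
    -- Invented sample data: one subject enrolled in two datasets.
    INSERT INTO subject VALUES (1, 'S001', '2010-03-14');
    INSERT INTO dataset VALUES (1, 'Longitudinal sample'), (2, 'Imitation task');
    INSERT INTO dataset_subject VALUES (1, 1), (2, 1);
""")

def datasets_for(subject_code):
    """Return the names of all datasets that include the given subject."""
    rows = con.execute("""
        SELECT d.name
        FROM dataset d
        JOIN dataset_subject ds ON ds.dataset_id = d.id
        JOIN subject s ON s.id = ds.subject_id
        WHERE s.code = ?
        ORDER BY d.id
    """, (subject_code,))
    return [name for (name,) in rows]
```

Because the subject table holds a single row for S001, a correction to that subject's metadata is made once and is seen by every dataset that includes the subject.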

DTA: Research Design

These screens help students and researchers save/access the research study’s design.

DTA: Summary Report

This report shows the data at the project level.

DTA: Summary Report

From the project report one can access the summary reports for the different datasets of the project.

DTA Metadata: Session info

This screen provides information about each session, that is, each time a subject was recorded for a given dataset.

DTA: Recordings

One can include several “recordings” for each session, including audio, video, and previous transcripts.

DTA: Transcription

This screen allows one to transcribe, switch between recordings, and time-align recordings and transcripts.

DTA: Basic Natural Speech Coding

Basic levels of linguistic coding to train students.

Additional levels of general or project-specific coding can be created by users.

DTA: Experimental Coding for Grammaticality Judgment Task

An example of project-specific coding created for an experimental task.

DTA: Query

A multi-condition query.

Different queries can be created and saved by users as needed.
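The idea of a saved multi-condition query can be sketched in a few lines of Python. Here a query is a named list of (field, operator, value) conditions that must all hold for a record to match. The field names, operators, query name and sample records are hypothetical and do not reflect the DTA's actual query screen or data.

```python
import operator

# Supported comparison operators for this sketch.
OPS = {
    "=": operator.eq,
    "<": operator.lt,
    ">=": operator.ge,
    "contains": lambda field_value, needle: needle in field_value,
}

# Invented sample records standing in for coded utterances.
RECORDS = [
    {"speaker": "CHI", "age_months": 26, "text": "want cookie"},
    {"speaker": "CHI", "age_months": 40, "text": "I want a cookie"},
    {"speaker": "MOT", "age_months": 26, "text": "do you want a cookie"},
]

SAVED_QUERIES = {}

def save_query(name, conditions):
    """Store a list of (field, operator, value) conditions under a name."""
    SAVED_QUERIES[name] = list(conditions)

def run_query(name, records):
    """Return the records that satisfy every condition of a saved query."""
    conditions = SAVED_QUERIES[name]
    return [
        record for record in records
        if all(OPS[op](record[field], value) for field, op, value in conditions)
    ]

save_query("young_child_want", [
    ("speaker", "=", "CHI"),
    ("age_months", "<", 36),
    ("text", "contains", "want"),
])
```

Running run_query("young_child_want", RECORDS) keeps only the child utterance recorded before 36 months that contains "want". Since the conditions are stored under a name, the same query can be rerun later on new records without being rebuilt.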

The Virtual Workshops

The Virtual Workshops Topics

Virtual Workshops teach users how to navigate our cybertools.

The Virtual Workshops: The DTA Manual

It allows users to take notes and has quizzes to check for understanding of the cybertool.

The Virtual Workshops: A video demonstration

Prof. Lust explains the purpose and motivation of the cybertool to students and researchers at Cornell, Rutgers New Brunswick, MIT and UTEP.

The Virtual Workshops: A video demonstration

The DTA tool programmer, Ted Caldwell, shows students the different DTA screens and their purpose, with added commentary by María Blume and Barbara Lust.
