Max-Planck-Institute for Psycholinguistics Research and Technical Facilities

Post on 18-Dec-2015

Page 1: Max-Planck-Institute for Psycholinguistics Research and Technical Facilities

Max-Planck-Institute for Psycholinguistics

Research and Technical Facilities

Page 2: Max-Planck-Institute for Psycholinguistics Research and Technical Facilities

Structure and Tasks (Research)

Technical Group

Acquisition Group

Comprehension Group

Production Group

Language & Cognition

Neurocognition

Interfacultaire Werkgroep Taal & Spraak

FC Donders Center for Neuro-imaging

Directorate

Acquisition Group: principles underlying the acquisition of languages by adults and children
Comprehension Group: principles that let us understand what people are saying
Production Group: principles that allow us to turn ideas into utterances
Language & Cognition: the underlying relation between language and thought
Neurocognition Group: functional architecture of the brain; processes observed with MRI, MEG, EEG
IWTS: group established at the KUN with a complementary task
FCDC: brand-new center for neuroimaging, in collaboration with MPI

The institute is strongly oriented toward experimental and observational work.
It follows a data-driven approach (similar to physics).

TG: support for all technical and methodological aspects; development of tools, setups, methods

Page 3: Max-Planck-Institute for Psycholinguistics Research and Technical Facilities

Structure and Tasks (Technical Group)

TG

Overhead Development Support

Information Services

Desktop Services

AV Services

Electronic Services

Experiment Services

Server Network

Scripts, Programs

NESU2

Browsable Corpus

EUDICO Tool Set

Animations, Artwork

Digital Media

Corpus Management

Information Services: • Linguistics DB • Equipment DB • MEID Intranet • Picture DB • PrePrint Server • Web-Site • scientific DBs

Experiment Services: • in house setups • out of house • eye tracker labs • gesture lab • ERP labs • child labs • MRI exp • ext collaborators • exp devices

Electronic Services: • av maintenance • electronic boxes • fieldwork support • PC hardware • PC setup • mechanical work

AV Services: • video lab • observation labs • av copying • av editing

Desktop Services: • helpdesk • SW support • PC Images • printer support • guest support • test new tech

Server Network: • helpdesk • NT server • Unix server • storage sys • backup • email, web • network HW • network services

Digital Media: • digi setups • digi SW • conversion • cutting • editing • standards

Corpus Management: • workflow • metadata org • archive org • scripts • conversion • copying • DVD burning

Scripts, Programs: • scientific scripts • scientific programs

EUDICO Tool Set: • multimedia annotation • multimedia visualization • multimedia search • UNICODE

Browsable Corpus: • metadata standard • MD editor • MD browser • conversion • DC mapping

NESU2: • NESU 2 builder • NESU 2 runner • graphics • X technology • NESU HW

Animations, Artwork: • 3D animations • life-like creatures • posters • graphic effects • photos

Overhead: • administration • organization • training • management • BAR member • EC councils • intern. Org.

32% 62% 6%

external money

0.8 2.3 2.1 0.6 2.3 0.6 3.2 1.6 1.7 1.3 0.3 1.7 1.0 1.1

Page 4: Max-Planck-Institute for Psycholinguistics Research and Technical Facilities

MEID Intranet (Technical Group)

MEID (Max-Planck-Institute’s Electronic Information Desk): the central source for various sorts of information

Calendar Information

Room Reservation

Absence Information

Various Forms to Request Technical Services

Electronic Journals and Scientific Info DB

Research Information (Experiment Schedule, Picture DB, Experiment Design DB, …)

Technical Information of all sorts

Access to Archives (Preprint Server, …)

MEID is a highly automated, interactive information system; behind the user interface elements are universal databases such as the personnel database.

Page 5: Max-Planck-Institute for Psycholinguistics Research and Technical Facilities

Server, Storage and Network Systems (Technical Group)

[Diagram labels: MPI-Net 2001; MPI-Storage 2001; MPI-Computers 2001; Fat Servers; Power Clients/Small Servers; Low Power Clients; Gigabit Switch (twice); FastEthernet 10Mbit/s (twice)]

Network highlights

SURFnet to/from MPI: 155Mbit/s

MPI internal

• Gigabit Switch (12x 1000 Mbit/s, 240x 100 Mbit/s, n x 10 Mbit/s)

• 10 Mbit/s used by thin clients

• 100 Mbit/s used by power users

• 1000 Mbit/s used by fat servers

Network security

Network security is mainly achieved at the moment by port filtering on the router.

Storage highlights

On most UNIX fat server systems, JBODs are used to configure volumes of different sizes for different kinds of data. The main categories are programs, user data, archive and corpora data.

Two systems are part of a SAN. The other components of this SAN are a Fibre Channel switch and a RAID storage system.

The various categories of data are stored on different servers. One server is mainly used as file server for user data and programs; the other stores archive and corpora data.

A third server functions as backup master for all client systems (UNIX and NT).

Backup system

Tape Library ETL 7/3500: 4x DLT 7000 (35-70 GB), 95 slots, max total capacity = 3.5 TB, robot handling mechanism with barcode reader, SCSI interface, transfer rate of 36 GB/hr.

This tape library was also part of the Hierarchical Storage Management (HSM) system, which will be replaced this year by a more sophisticated system.
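
As a rough cross-check of the tape library figures above, the following minimal Python sketch multiplies slots by cartridge capacity and estimates streaming time at the quoted 36 GB/hr. The function names are illustrative only. Note that 95 slots at the native 35 GB come to about 3.3 TB, a little under the 3.5 TB on the slide, so the slide figure is presumably rounded or assumes some compression.

```python
# Figures taken from the slide; everything else is an illustrative assumption.
SLOTS = 95
NATIVE_GB_PER_TAPE = 35       # DLT 7000, uncompressed
COMPRESSED_GB_PER_TAPE = 70   # DLT 7000, upper bound with compression
TRANSFER_GB_PER_HOUR = 36


def library_capacity_gb(slots: int, gb_per_tape: int) -> int:
    """Total capacity if every slot holds one cartridge of the given size."""
    return slots * gb_per_tape


def hours_to_stream(volume_gb: float, rate_gb_per_hour: float = TRANSFER_GB_PER_HOUR) -> float:
    """Time needed to write (or read) a volume at a constant transfer rate."""
    return volume_gb / rate_gb_per_hour


native = library_capacity_gb(SLOTS, NATIVE_GB_PER_TAPE)
compressed = library_capacity_gb(SLOTS, COMPRESSED_GB_PER_TAPE)
print(f"native capacity:     {native} GB")
print(f"compressed capacity: {compressed} GB")
print(f"full native pass:    {hours_to_stream(native):.1f} h")
```

The streaming estimate ignores tape changes and robot handling time, so real backup windows would be longer.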

Computer highlights

Fat servers:

UNIX server

3 SUN E450, 1 SUN E250, 1 HP D275

• All SUN servers are SPARC II systems with 2 or 4 CPUs (300 MHz or 400 MHz), 1-2 GB RAM, a 1000 Mbit/s network interface, and 150-440 GB local disk space

• SAN solution with RAID 5 volumes (5x6x72 GB disks = 2.1 TB gross) connected to 2 SUN servers via Fibre Channel host adapter

NT-server

3 transtec 2500, 1 transtec 2600

• All transtec servers are INTEL III systems with 2 CPUs (400 MHz), 750 MB RAM, 1000 Mbit/s network interfaces, and 100 GB local disk space

• SAN solution with RAID 5 volume on one NT server (6x72 GB disks = 430 GB)
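
The gross (brutto) figures above can be reproduced, and a net figure estimated, with a short calculation. This sketch assumes that each group of six disks forms one RAID 5 set that gives up one disk's worth of space to parity; the slide does not state how the sets are actually laid out, and the function name is hypothetical.

```python
from typing import Tuple


def raid5_capacity_gb(groups: int, disks_per_group: int, disk_gb: int) -> Tuple[int, int]:
    """Return (gross, net) capacity in GB, assuming one RAID 5 set per group."""
    if disks_per_group < 3:
        raise ValueError("RAID 5 needs at least three disks per set")
    gross = groups * disks_per_group * disk_gb
    # RAID 5 spends the equivalent of one disk per set on parity.
    net = groups * (disks_per_group - 1) * disk_gb
    return gross, net


unix_gross, unix_net = raid5_capacity_gb(groups=5, disks_per_group=6, disk_gb=72)
nt_gross, nt_net = raid5_capacity_gb(groups=1, disks_per_group=6, disk_gb=72)
print(f"UNIX SAN: {unix_gross} GB gross, {unix_net} GB net (assumed layout)")
print(f"NT SAN:   {nt_gross} GB gross, {nt_net} GB net (assumed layout)")
```

The gross values (2160 GB and 432 GB) match the roughly 2.1 TB and 430 GB quoted on the slide; the net values depend entirely on the assumed layout.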

Page 6: Max-Planck-Institute for Psycholinguistics Research and Technical Facilities

Experimental Facilities (Technical Group)

NESU: Nijmegen Experiment Setup, Version 2

universal experiment builder and runner

Stimulus-Response Experiments

Group Experiment: 10 parallel subjects

Group Experiment: 4 parallel subjects

11 single subject rooms

many portable setups (notebooks)

Eye Tracking Experiments

Single Reflection System

Eyelink System 1

Eyelink System 2

Eyeview System

Child Experiments

Child Exp Setup

Child Exp Setup with EEG

Cognition Experiments

ERP Lab 1

ERP Lab 2

out of house: MRI Setup, MEG Setup

Gesture Experiments

Gesture Lab

Major NESU characteristics
• Win 2000 support
• realtime guarantees (< 1 ms)
• fast audiovisual stimuli from computer
• experiment browser
• reverse experiment designing
• graphical experiment builder (simple design by mouse)
• easy to use experiment runner
• short prototyping cycle
• hardwareless prototyping
• orthogonal design (separation of timing and structure)
• special hardware for high accuracy measurements
• easily adaptable to external equipment such as MRI
• adaptable object-oriented code (Smalltalk)
• application of DirectX technology
• fast hardware drivers
• included performance analysis

[Screenshots: Experiment Builder, Experiment Runner, Experiment Performance Analyzer]

Page 7: Max-Planck-Institute for Psycholinguistics Research and Technical Facilities

Eye Tracking Labs (Technical Group)

• 2 PCs as tracker and subject display systems, connected by Ethernet adaptors
• Subject PC runs under NESU and shows stimuli to the subject; it also controls the tracker PC
• Eye tracker extension for NESU controls the flow of the experiment
• Subject PC delivers visual and auditory stimuli
• Tracker PC records both eyes (selectable) and saves the data to a binary file
• Eye illumination with an infrared light source (LED array), seen by CCD chip cameras
• Tracking possible at max. 250 Hz (selectable)
• Automatic correction for head movements
• Automatic detection of events (fixations, saccades etc.)
• Tracking based on the center of a circle representing the pupil
• Measurement of pupil diameter possible
• Interactive fixation analysis (not automatic)
• Interactive association between fixations, mouse movements, and graphical objects
• Feedback experiments where screen contents are manipulated depending on gaze

Eye Tracking Setup

Subject Screen

Subject PC

Tracker PC

Head Movement Compensation

Headband

Dedicated Network

Measurement Principles

Two miniature high speed (250 Hz) IR-cameras record pupil position and shape. Fast hardware is used for geometrical calculations of gaze. During calculation head movements are compensated by means of recording marker positions.

Data Flow & Analysis Key Points

Reaction Responses

Gaze Data

NESU Exp Runner

Eyelink System

Responses

Controls

Feedback

Fixation Analysis

Statistical Analysis

Typical graphical representation of fixation patterns during interactive fixation analysis. In the sample experiment mouse trails were also recorded (see black points).
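
The automatic detection of fixations mentioned on this slide can be illustrated with a dispersion-threshold method (often called I-DT). This is a generic textbook technique, not the algorithm used by the Eyelink system or NESU; the thresholds, the sample layout and all names are assumptions chosen for the example. Only the 250 Hz sampling rate comes from the slide.

```python
from typing import List, Sequence, Tuple

SAMPLE_RATE_HZ = 250  # maximum tracking rate named on the slide

Point = Tuple[float, float]
Fixation = Tuple[int, int, float, float, float]  # first idx, last idx, x, y, duration in ms


def dispersion(points: Sequence[Point]) -> float:
    """Horizontal plus vertical extent of a group of gaze samples."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(samples: Sequence[Point],
                     max_dispersion: float = 30.0,      # assumed threshold, in pixels
                     min_duration_ms: float = 100.0     # assumed minimum fixation length
                     ) -> List[Fixation]:
    """Group consecutive samples into fixations using a dispersion threshold."""
    min_len = max(1, int(round(min_duration_ms * SAMPLE_RATE_HZ / 1000.0)))
    fixations: List[Fixation] = []
    start, n = 0, len(samples)
    while start + min_len <= n:
        end = start + min_len  # exclusive end of the candidate window
        if dispersion(samples[start:end]) <= max_dispersion:
            # Grow the window while the samples stay close together.
            while end < n and dispersion(samples[start:end + 1]) <= max_dispersion:
                end += 1
            window = samples[start:end]
            cx = sum(p[0] for p in window) / len(window)
            cy = sum(p[1] for p in window) / len(window)
            duration_ms = len(window) * 1000.0 / SAMPLE_RATE_HZ
            fixations.append((start, end - 1, cx, cy, duration_ms))
            start = end
        else:
            start += 1  # sample belongs to a saccade or noise; slide on
    return fixations


# Synthetic gaze trace: fixation, short saccade, second fixation.
trace = ([(100.0, 100.0)] * 30
         + [(160.0, 140.0), (220.0, 180.0), (280.0, 220.0), (340.0, 260.0)]
         + [(400.0, 300.0)] * 30)
for fix in detect_fixations(trace):
    print(fix)
```

Samples that never fit into a qualifying window are treated as saccades; a real analysis would also handle blinks and missing data.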

Page 8: Max-Planck-Institute for Psycholinguistics Research and Technical Facilities

temporary storage

Digitization Process

Corpus Building Workflow (Technical Group)

from audio/video recordings to the multi-media archive

tape library 25 TB

RAID system 3 TB

Hierarchical Storage Management System

Field Recordings

Metadata Browse & Search Universe

User

2nd copies

• The user sees only the metadata universe, i.e. he operates in a conceptual browse and search domain
• The corpus manager has to organize the digitization process, the corpus storage and the MD domain
• The system manager is responsible for reliable storage mechanisms, enough capacity and fast access

video tape

audio tape

digitization computer

System Manager

Corpus Manager

4 video and 4 audio setups

Page 9: Max-Planck-Institute for Psycholinguistics Research and Technical Facilities

Fieldwork & Expedition Support (Technical Group)

Expedition Schedule and Planning

In 2001 about 25 field trips were prepared and equipped. Each field trip is entered in a planning document.

Typical Field Equipment Set

A typical equipment set for an expedition includes various power supply devices, recording and annotation equipment.

Equipment Database

The Technical Group has a central database which covers all equipment we have ordered and all persons at the institute.

This DB is used to control the flow of equipment and the status of every unit.

Software setup for Field Trips

The screenshot indicates the type of software installed on a field notebook. It contains digitization tools, media inspection tools, experiment tools, and tools to create metadata descriptions, notes, and annotations.

Equipment Check & Maintenance Cycle

before-trip check & maintenance

after-trip check & maintenance

storeroom

When equipment returns from the field, it is briefly checked for severe damage. The before-trip maintenance is done very carefully, since guarantees have to be given to the field researchers.

Miniaturization & Robustness

Miniaturization and robustness are the two major requirements for field work. The MPI is therefore always looking for the newest technology. However, only experience in the field can tell us whether the two go together.

Page 10: Max-Planck-Institute for Psycholinguistics Research and Technical Facilities

Browsable Corpus Tools (Technical Group)

SESSION METADATA project participant content ... ... media file

transcription file

media carrier

media file metadata

transcription file metadata

media carrier metadata

URL

URL

tape archive reference

ISLE Metadata Initiative

Open International Standard

[Figure labels: child speakers, adult speakers, male child speakers, male adult speakers, female child speakers, female adult speakers]

Metadata Vision MPI 98

[Figure labels: Lund, MPI Leipzig, Helsinki, Childes, AIATSIS, ICE, SIL, Lancaster, ???, LDC, MPI Nijmegen, ELRA, Japan, China, Lacito]

connection by simple URL mechanism!!

User-friendly generation of metadata descriptions which adhere to open standards; creation of a conceptual domain

Metadata browsers let the user operate in a conceptual domain that includes all metadata descriptions adhering to standards

world-wide interconnected domain

typical browsable hierarchy

immediate execution of a useful tool on the chosen set of files

IMDI
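
To make the idea of a browsable metadata hierarchy more concrete, here is a minimal Python sketch. It groups session descriptions by speaker category and then hands the media URLs of the chosen node to a tool. The class and field names are hypothetical and are not the IMDI schema; the URLs are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Participant:
    code: str
    sex: str        # "male" or "female" (illustrative vocabulary)
    age_group: str  # "child" or "adult" (illustrative vocabulary)


@dataclass
class Session:
    name: str
    project: str
    media_url: str          # metadata points to resources by URL, as on the slide
    transcription_url: str
    participants: List[Participant] = field(default_factory=list)


def build_hierarchy(sessions: List[Session]) -> Dict[Tuple[str, str], List[Session]]:
    """Index sessions under every (age group, sex) category they contain."""
    tree: Dict[Tuple[str, str], List[Session]] = {}
    for session in sessions:
        categories = {(p.age_group, p.sex) for p in session.participants}
        for key in sorted(categories):
            tree.setdefault(key, []).append(session)
    return tree


def media_urls(tree: Dict[Tuple[str, str], List[Session]],
               age_group: str, sex: str) -> List[str]:
    """Files a tool would be started on after the user picks one node."""
    return [s.media_url for s in tree.get((age_group, sex), [])]


s1 = Session("s1", "demo", "http://example.org/media/s1.mpg", "http://example.org/trans/s1.xml",
             [Participant("A", "female", "child"), Participant("B", "male", "adult")])
s2 = Session("s2", "demo", "http://example.org/media/s2.mpg", "http://example.org/trans/s2.xml",
             [Participant("C", "male", "child")])
tree = build_hierarchy([s1, s2])
print(media_urls(tree, "child", "male"))
```

Because sessions only carry URLs, the same index could in principle span descriptions hosted at different sites.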

Page 11: Max-Planck-Institute for Psycholinguistics Research and Technical Facilities

EUDICO Tool Set (Technical Group)

time scales, independent streams, partial time alignment, large nr. tiers, hierarchies, (labeled) references

[Figure labels: time axis t; video camera 1 (t1); video camera 2 (t2); eye tracker 1 (t3); transcription; morphology; left eye; r.h. gesture; r.h. gesture phase; r. hand; easily > 50 tiers; t1*, t2*, t3*]

Complexity of Multi-Modal Annotations

EUDICO Architecture Enabling Distributed Operation

XML

EUDICO

CHAT

GDB

Tipster

AIF

GATE

ATLASAppl.

Abstract Corpus Model: the Nucleus of EUDICO

The ACM was designed to represent many of the current annotation formats in order to achieve format independence

flexible tier definition

EUDICO Annotation Tool

EUDICO Visualization Tool

Subtitle Viewer

Compact Viewer

Time Line Viewer

Grid Viewer

Different user-adjustable viewers allow the user to view his data in a flexible way. Other useful viewers will be added.

EUDICO Visualization Tool

EUDICO Search Tool

EUDICO Annotation Format: flexible XML-based format

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ANNOTATION_DOCUMENT >
<ANNOTATION_DOCUMENT DATE="July 17, 2001" AUTHOR="Hennie Brugman" VERSION="1.0">
    <HEADER TIME_UNITS="milliseconds" MEDIA_FILE="file:/server/media.mpg"/>
    <TIME_ORDER>
        <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="1000"/>
        <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="2000"/>
        <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="3000"/>
        <TIME_SLOT TIME_SLOT_ID="ts4"/>
        <TIME_SLOT TIME_SLOT_ID="ts5" TIME_VALUE="5000"/>
        <TIME_SLOT TIME_SLOT_ID="ts6" TIME_VALUE="6000"/>
    </TIME_ORDER>
    <TIER TIER_ID="t1" LINGUISTIC_TYPE_REF="orthography" PARTICIPANT="jan" DEFAULT_LOCALE="IPA-96">
        <ANNOTATION>
            <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts3">
                <ANNOTATION_VALUE>utterance 1</ANNOTATION_VALUE>
            </ALIGNABLE_ANNOTATION>
        </ANNOTATION>
        <ANNOTATION>
            <ALIGNABLE_ANNOTATION>

powerful audio and video segment definition

input methods for various character sets such as IPA

character selection support for Chinese

The annotation tool combines all modern concepts of defining segments in audio and video signals, has input methods for many languages and character sets, and generates Unicode and XML-structured files.

EUDICO generates XML-structured files following the flexible EAF schema
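
As an illustration of how such an XML-structured file could be consumed, the sketch below reads time slots and aligned annotations with Python's standard `xml.etree.ElementTree`. The embedded document repeats the element and attribute names from the excerpt on this slide; the closing tags and the second annotation are added here so that the sample is well-formed and are not taken from the slide. The function name is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

EAF_SAMPLE = """
<ANNOTATION_DOCUMENT DATE="July 17, 2001" AUTHOR="Hennie Brugman" VERSION="1.0">
  <HEADER TIME_UNITS="milliseconds" MEDIA_FILE="file:/server/media.mpg"/>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="1000"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="3000"/>
    <TIME_SLOT TIME_SLOT_ID="ts4"/>
    <TIME_SLOT TIME_SLOT_ID="ts5" TIME_VALUE="5000"/>
  </TIME_ORDER>
  <TIER TIER_ID="t1" LINGUISTIC_TYPE_REF="orthography" PARTICIPANT="jan" DEFAULT_LOCALE="IPA-96">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>utterance 1</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <!-- hypothetical second annotation, not on the slide -->
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts4" TIME_SLOT_REF2="ts5">
        <ANNOTATION_VALUE>utterance 2</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
"""


def read_annotations(xml_text: str) -> List[Dict[str, object]]:
    """Resolve each annotation's time slot references to millisecond values."""
    root = ET.fromstring(xml_text.strip())
    slots: Dict[str, Optional[int]] = {}
    for slot in root.find("TIME_ORDER"):
        value = slot.get("TIME_VALUE")
        # A slot without TIME_VALUE is only ordered, not aligned to the media.
        slots[slot.get("TIME_SLOT_ID")] = int(value) if value is not None else None
    rows = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            rows.append({
                "tier": tier.get("TIER_ID"),
                "id": ann.get("ANNOTATION_ID"),
                "start_ms": slots.get(ann.get("TIME_SLOT_REF1")),
                "end_ms": slots.get(ann.get("TIME_SLOT_REF2")),
                "value": ann.findtext("ANNOTATION_VALUE", default=""),
            })
    return rows


for row in read_annotations(EAF_SAMPLE):
    print(row)
```

The slot without a `TIME_VALUE` shows up as `None`, which is one way to represent the partial time alignment the tool set allows.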

Page 12: Max-Planck-Institute for Psycholinguistics Research and Technical Facilities

Exchanging Documents with Unicode and XML (Technical Group)

• No mixture of fonts and character sets anymore
• No font conversion necessary anymore
• No double usage of ordinal positions anymore
• Unifying characters as much as possible
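
The point about double usage of ordinal positions can be shown in a few lines of Python. In legacy 8-bit encodings the same byte value stands for different characters depending on the code page, whereas Unicode gives each character its own code point. The choice of byte and code pages is just an example.

```python
import unicodedata
from typing import Dict, List, Tuple

LEGACY_BYTE = b"\xe1"  # one ordinal position, reused by different 8-bit code pages


def decode_in(encodings: List[str], raw: bytes) -> Dict[str, str]:
    """Decode the same bytes under several legacy encodings."""
    return {enc: raw.decode(enc) for enc in encodings}


def describe(text: str) -> List[Tuple[str, str]]:
    """Unicode code point and official character name for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


readings = decode_in(["latin-1", "iso8859_7"], LEGACY_BYTE)
for encoding, char in readings.items():
    print(encoding, repr(char), describe(char))

# An IPA symbol has one fixed code point and survives a UTF-8 round trip.
schwa = "\u0259"
print(describe(schwa), schwa.encode("utf-8"))
```

With Unicode text no font or code page switch is needed to tell these characters apart.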

EUDICO programs support Unicode through a special editor. A wide variety of languages is supported through virtual keyboards.

Unicode Character Allocation taken from “The Unicode Standard Version 3.0”, Addison-Wesley

<adlf>
    <block>
        <sentence name="transcription">aaaaaa aaaaaa a ‡ a2a2a2a2a2a2a2a2 ‡Œ </sentence>
        <sentence name="english">bbbbbb bb bbbbbbb </sentence>
        <sentence name="morpho">dddd ddddddddd </sentence>
    </block>
    <block>
        <sentence name="transcription">111 111 11111</sentence>
        <sentence name="english">2222 222 2222</sentence>
        <sentence name="morpho">3333 3333 33 3333</sentence>
    </block>
</adlf>

aaaaaa aaaaaa a ‡
bbbbbb bb bbbbbbb
dddd ddddddddd
a2a2a2a2a2a2a2a2 ‡Œ

111 111 11111
2222 222 2222
3333 3333 33 3333

+

ADL

BLOCK = WRAPED TIER transcription, BOLD TIER english, ITALIC TIER morpho

Word file

ADL file

XML file

For some languages, such as Chinese, lookup windows are shown on screen that offer all characters as selectable items.

Goals of Unicode

Unicode Overview

Unicode Usage

Conversion of MS Word documents to XML

Transcripts and other linguistic documents are often written in MS Word using idiosyncratic structures, although this format is not open and not suitable for archiving and further analysis. At MPI we developed a flexible converter which allows the user to describe his file structure in simple terms (Annotation Description Language - ADL) such that XML files following the EUDICO Annotation Format (EAF) are created.
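
The conversion idea can be sketched in Python as follows. The sketch assumes the Word paragraphs have already been extracted as (style, text) pairs, that a blank paragraph ends a block, and that the formatting-to-tier rules mirror the ADL line shown on this slide. It emits XML in the `<adlf>` shape shown above; it is not the MPI converter, and all names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical rule table mirroring the ADL line on the slide:
# plain (wrapped) text -> transcription, bold -> english, italic -> morpho.
TIER_BY_STYLE = {"plain": "transcription", "bold": "english", "italic": "morpho"}
TIER_ORDER = ["transcription", "english", "morpho"]

Paragraph = Optional[Tuple[str, str]]  # (style, text); None marks a blank line


def paragraphs_to_adlf(paragraphs: Iterable[Paragraph]) -> ET.Element:
    """Group styled paragraphs into blocks and emit one sentence per tier."""
    root = ET.Element("adlf")
    current: Dict[str, List[str]] = {}

    def flush() -> None:
        if not current:
            return
        block = ET.SubElement(root, "block")
        for tier in TIER_ORDER:
            if tier in current:
                sentence = ET.SubElement(block, "sentence", name=tier)
                # Wrapped lines of the same tier are joined into one sentence.
                sentence.text = " ".join(current[tier])
        current.clear()

    for para in paragraphs:
        if para is None:
            flush()
            continue
        style, text = para
        current.setdefault(TIER_BY_STYLE[style], []).append(text.strip())
    flush()
    return root


doc = [
    ("plain", "aaaaaa aaaaaa a"), ("bold", "bbbbbb bb bbbbbbb"),
    ("italic", "dddd ddddddddd"), ("plain", "a2a2a2a2"),
    None,
    ("plain", "111 111 11111"), ("bold", "2222 222 2222"),
    ("italic", "3333 3333 33 3333"),
]
adlf = paragraphs_to_adlf(doc)
print(ET.tostring(adlf, encoding="unicode"))
```

Keeping the rule table separate from the grouping logic is what would let a user describe a different file structure without touching the converter code.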