max-planck-institute for psycholinguistics research and technical facilities
Post on 18-Dec-2015
Max-Planck-Institute for Psycholinguistics
Research and Technical Facilities
Structure and Tasks (Research)
Directorate
Technical Group
Acquisition Group
Comprehension Group
Production Group
Language & Cognition
Neurocognition
Interfacultaire Werkgroep Taal & Spraak (IWTS)
FC Donders Center for Neuro-imaging (FCDC)
Acquisition Group: principles underlying the acquisition of languages by adults and children
Comprehension Group: principles that let us understand what people are saying
Production Group: principles that allow us to form ideas into utterances
Language & Cognition: underlying relation between language and thought
Neurocognition Group: functional architecture of the brain; processes observed with MRI, MEG, EEG
IWTS: group established at the KUN with a complementary task
FCDC: brand-new center for neuroimaging, a collaboration with the MPI
The institute is strongly oriented toward experimental and observational work; it takes a data-driven approach (similar to physics).
TG: support for all technical and methodological aspects; development of tools, setups, and methods
Structure and Tasks (Technical Group)
TG: Overhead, Development, Support
• Information Services: Linguistics DB, Equipment DB, MEID Intranet, Picture DB, PrePrint Server, Web-Site, scientific DBs
• Experiment Services: in house setups, out of house, eye tracker labs, gesture lab, ERP labs, child labs, MRI exp, ext collaborators, exp devices
• Electronic Services: av maintenance, electronic boxes, fieldwork support, PC hardware, PC setup, mechanical work
• AV Services: video lab, observation labs, av copying, av editing
• Desktop Services: helpdesk, SW support, PC images, printer support, guest support, test new tech
• Server Network: helpdesk, NT server, Unix server, storage sys, backup, email, web, network HW, network services
• Digital Media: digi setups, digi SW, conversion, cutting, editing, standards
• Corpus Manag: workflow, metadata org, archive org, scripts, conversion, copying, DVD burning
• Scripts, Programs: scientific scripts, scientific programs
• EUDICO Tool Set: multimedia annotation, multimedia visualization, multimedia search, UNICODE
• Browsable Corpus: metadata standard, MD editor, MD browser, conversion, DC mapping
• NESU2: NESU 2 builder, NESU 2 runner, graphics, X technology, NESU HW
• Animations, Artwork: 3D animations, life-like creatures, posters, graphic effects, photos
• Overhead: administration, organization, training, management, BAR member, EC councils, intern. org.
32% 62% 6%
external money
0.8 2.3 2.1 0.6 2.3 0.6 3.2 1.6 1.7 1.3 0.3 1.7 1.0 1.1
MEID Intranet (Technical Group)
MEID (Max-Planck-Institute’s Electronic Information Desk): the central source for various sorts of information
• Calendar Information
• Room Reservation
• Absence Information
• Various Forms to Request Technical Services
• Electronic Journals and Scientific Info DB
• Research Information (Experiment Schedule, Picture DB, Experiment Design DB, …)
• Technical Information of all sorts
• Access to Archives (Preprint Server, …)
MEID is a highly automated, interactive information system; behind the user-interface elements are universal databases such as the personnel database.
Server, Storage and Network Systems (Technical Group)
MPI-Net 2001, MPI-Storage 2001, MPI-Computers 2001
[Network diagram labels: Fat Servers, Power Clients/Small Servers, Low Power Clients, two Gigabit Switches, FastEthernet, 10 Mbit/s]
Network highlights
SURFnet to/from MPI: 155Mbit/s
MPI internal
• Gigabit Switch (12 x 1000 Mbit/s, 240 x 100 Mbit/s, n x 10 Mbit/s)
• 10 Mbit/s used by thin clients
• 100 Mbit/s used by power users
• 1000 Mbit/s used by fat servers
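To give a feel for what the three internal link speeds and the SURFnet uplink mean in practice, the following minimal sketch computes nominal transfer times. It assumes decimal units (1 GB = 8000 Mbit) and ignores protocol overhead and contention, so real times would be longer; the function name and the 1 GB example file are illustrative and not taken from the poster.

```python
# Nominal link rates in Mbit/s, as listed under "Network highlights".
LINK_RATES_MBIT_S = {
    "thin client (10 Mbit/s)": 10,
    "power user (100 Mbit/s)": 100,
    "fat server (1000 Mbit/s)": 1000,
    "SURFnet uplink (155 Mbit/s)": 155,
}


def transfer_seconds(size_gb: float, rate_mbit_s: float) -> float:
    """Idealised transfer time: decimal units, no protocol overhead."""
    size_mbit = size_gb * 8000  # 1 GB = 1000 MB = 8000 Mbit
    return size_mbit / rate_mbit_s


if __name__ == "__main__":
    for name, rate in LINK_RATES_MBIT_S.items():
        print(f"{name}: {transfer_seconds(1, rate):.1f} s for a 1 GB file")
```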
Network security
Network security is mainly achieved at the moment by port filtering on the router.
Storage highlights
On most UNIX fat server systems, JBODs are used to configure volumes of different sizes for different kinds of data. The main categories are programs, user data, archive, and corpora data.
Two systems are part of a SAN. The other components of this SAN are a Fibre Channel switch and a RAID storage system.
The various categories of data are stored on different servers. One server is mainly used as a file server for user data and programs; the other stores archive and corpora data.
A third server functions as backup master for all client systems (UNIX and NT).
Backup system
Tape Library ETL 7/3500: 4x DLT 7000 (35-70 GB), 95 slots, max. total capacity = 3.5 TB, robot handling mechanism with barcode reader, SCSI interface, transfer rate of 36 GB/hr.
This tape library was also part of the Hierarchical Storage Management (HSM) system, which will be replaced this year by a more sophisticated system.
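The figures above allow a rough capacity check. The sketch below estimates how many tapes and how many hours one full backup pass would take. It rests on several assumptions the poster does not state: 35 GB native capacity per DLT 7000 tape (no compression), the 36 GB/hr rate being sustained for the whole library, and an example data volume of 2160 GB (the gross SAN size mentioned under "Computer highlights"). The function name is hypothetical.

```python
import math


def backup_plan(data_gb: float, tape_gb: float = 35,
                rate_gb_per_hr: float = 36, slots: int = 95) -> dict:
    """Rough estimate for one full backup pass (assumed native tape capacity)."""
    tapes = math.ceil(data_gb / tape_gb)   # whole tapes needed
    hours = data_gb / rate_gb_per_hr       # at the stated transfer rate
    return {"tapes": tapes, "hours": hours, "fits": tapes <= slots}


if __name__ == "__main__":
    plan = backup_plan(2160)
    print(plan)
```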
Computer highlights
Fat servers:
UNIX servers
3 SUN E450, 1 SUN E250, 1 HP D275
• SUN servers are all SPARC II systems with 2 or 4 CPUs (300 MHz or 400 MHz), 1-2 GB RAM, 1000 Mbit/s network interface, 150 GB - 440 GB local disk space
• SAN solution with RAID 5 volumes (5x6x72 GB disks = 2.1 TB gross) connected to 2 SUN servers via Fibre Channel host adapters
NT servers
3 transtec 2500, 1 transtec 2600
• transtec servers are all INTEL III systems with 2 CPUs (400 MHz), 750 MB RAM, 1000 Mbit/s network interfaces, 100 GB local disk space
• SAN solution with a RAID 5 volume on one NT server (6x72 GB disks = 430 GB)
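The disk arithmetic above can be checked with a short calculation. The sketch assumes that "5x6x72 GB" means five RAID 5 sets of six 72 GB disks each, and it uses the general RAID 5 property that one disk's worth of capacity per set is consumed by parity. The usable figures are therefore derived estimates, not numbers from the poster.

```python
def raid5_capacity_gb(sets: int, disks_per_set: int, disk_gb: int) -> tuple:
    """Return (gross, usable) capacity in GB for identical RAID 5 sets."""
    gross = sets * disks_per_set * disk_gb
    # RAID 5 stores one disk's worth of parity per set.
    usable = sets * (disks_per_set - 1) * disk_gb
    return gross, usable


if __name__ == "__main__":
    # UNIX SAN: assumed layout of 5 sets x 6 disks x 72 GB (about 2.1 TB gross).
    print("UNIX SAN:", raid5_capacity_gb(5, 6, 72))
    # NT server: 6 disks x 72 GB in a single set (about 430 GB gross).
    print("NT SAN:  ", raid5_capacity_gb(1, 6, 72))
```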
Experimental Facilities (Technical Group)
NESU (Nijmegen Experiment Setup, Version 2): universal experiment builder and runner
Stimulus-Response Experiments
• Group experiment, 10 parallel subjects
• Group experiment, 4 parallel subjects
• 11 single-subject rooms
• many portable setups (notebooks)
Eye Tracking Experiments
• Single Reflection System
• Eyelink System 1
• Eyelink System 2
• Eyeview System
Child Experiments
• Child Exp Setup
• Child Exp Setup with EEG
Cognition Experiments
• ERP Lab 1
• ERP Lab 2
• out of house: MRI Setup, MEG Setup
Gesture Experiments
• Gesture Lab
Major NESU characteristics
• Win 2000 support
• realtime guarantees (< 1 ms)
• fast audiovisual stimuli from computer
• experiment browser
• reverse experiment designing
• graphical experiment builder (simple design by mouse)
• easy-to-use experiment runner
• short prototyping cycle
• hardwareless prototyping
• orthogonal design (separation of timing and structure)
• special hardware for high-accuracy measurements
• easily adaptable to external equipment such as MRI
• adaptable object-oriented code (Smalltalk)
• application of DirectX technology
• fast hardware drivers
• included performance analysis
[Screenshots: Experiment Builder, Experiment Runner, Experiment Performance Analyzer]
Eye Tracking Labs (Technical Group)
• 2 PCs serve as tracker and subject display systems, connected by Ethernet adaptors
• Subject PC runs under NESU and shows stimuli to the subject; it also controls the tracker PC
• Eye tracker extension for NESU controls the flow of the experiment
• Subject PC delivers visual and auditory stimuli
• Tracker PC records both eyes (selectable) and saves the data to a binary file
• Eyes are lit by an infrared light source (LED array) and seen by CCD chip cameras
• Tracking possible at max. 250 Hz (selectable)
• Automatic correction for head movements
• Automatic detection of events (fixations, saccades etc.)
• Tracking based on the center of a circle representing the pupil
• Measurement of pupil diameter possible
• Interactive fixation analysis (not automatic)
• Interactive association between fixations, mouse movements, and graphical objects
• Feedback experiments where screen contents are manipulated depending on gaze
Eye Tracking Setup
[Diagram labels: Subject Screen, Subject PC, Tracker PC, Headband, Head Movement Compensation, Dedicated Network]
Measurement Principles
Two miniature high-speed (250 Hz) IR cameras record pupil position and shape. Fast hardware performs the geometrical calculation of gaze. During this calculation, head movements are compensated for by recording marker positions.
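The poster mentions automatic detection of fixations and saccades but does not say which algorithm the tracker uses. As an illustration only, the sketch below applies a common dispersion-threshold approach (often called I-DT) to gaze samples recorded at 250 Hz. The thresholds (25 pixels, 100 ms), the function names, and the synthetic data are assumptions chosen for the example.

```python
def dispersion(window):
    """Spread of a window of (x, y) samples: x-range plus y-range."""
    xs = [x for x, _ in window]
    ys = [y for _, y in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(samples, rate_hz=250, max_dispersion=25.0, min_duration_ms=100):
    """Return fixations as (start_ms, duration_ms, mean_x, mean_y) tuples."""
    min_len = int(min_duration_ms * rate_hz / 1000)  # samples per minimal fixation
    fixations = []
    start, n = 0, len(samples)
    while start + min_len <= n:
        end = start + min_len
        if dispersion(samples[start:end]) <= max_dispersion:
            # Grow the window while the gaze stays within the dispersion limit.
            while end < n and dispersion(samples[start:end + 1]) <= max_dispersion:
                end += 1
            window = samples[start:end]
            mean_x = sum(x for x, _ in window) / len(window)
            mean_y = sum(y for _, y in window) / len(window)
            fixations.append((start * 1000 / rate_hz,
                              (end - start) * 1000 / rate_hz, mean_x, mean_y))
            start = end
        else:
            start += 1  # slide past saccade samples
    return fixations


if __name__ == "__main__":
    # Synthetic gaze: 200 ms at one point, a short saccade, 200 ms at another point.
    gaze = ([(100.0, 100.0)] * 50
            + [(100.0 + 50 * i, 100.0 + 33 * i) for i in range(1, 6)]
            + [(400.0, 300.0)] * 50)
    for fixation in detect_fixations(gaze):
        print(fixation)
```

A dispersion threshold is only one option; velocity-based detection is equally common, and the appropriate thresholds depend on screen geometry and viewing distance.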
Data Flow & Analysis Keypoints
[Diagram labels: NESU Exp Runner, Eyelink System, Controls, Feedback, Responses, Reaction Responses, Gaze Data, Fixation Analysis, Statistical Analysis]
Typical graphical representation of fixation patterns during interactive fixation analysis. In the sample experiment, mouse trails were also recorded (see black points).
Corpus Building Workflow (Technical Group)
from audio/video recordings to the multi-media archive
[Diagram labels: Field Recordings, video tape, audio tape, Digitization Process, digitization computer (4 video and 4 audio setups), temporary storage, Hierarchical Storage Management System, RAID system 3 TB, tape library 25 TB, 2nd copies, Metadata Browse & Search Universe, User, Corpus Manager, System Manager]
• The user sees only the metadata universe, i.e. he operates in a conceptual browse and search domain.
• The corpus manager has to organize the digitization process, the corpus storage, and the MD domain.
• The system manager is responsible for reliable storage mechanisms, enough capacity, and fast access.
Fieldwork & Expedition Support (Technical Group)
Expedition Schedule and Planning
In 2001 about 25 field trips were prepared and equipped. Each field trip is entered in a planning document.
Typical Field Equipment Set
A typical equipment set for an expedition includes various power supply devices as well as recording and annotation equipment.
Equipment Database
The Technical Group has a central database which covers all equipment we have ordered and all persons at the institute.
This DB is used to control the flow of equipment and the status of every unit.
Software Setup for Field Trips
The screen shot indicates the type of software which is installed on a field notebook. It contains digitization tools, media inspection tools, experiment tools, and tools to create metadata descriptions, notes, and annotations.
Equipment Check & Maintenance Cycle
[Diagram labels: before-trip check & maintenance, after-trip check & maintenance, storeroom ("magazin")]
When equipment is returned from the field, it is briefly checked for severe damage. The before-trip maintenance is done very carefully, since guarantees have to be given to the field researchers.
Miniaturization & Robustness
Miniaturization and robustness are the two major requirements for field work. The MPI is therefore always looking for the newest technology. However, only experience in the field can tell us whether the two go together.
Browsable Corpus Tools (Technical Group)
SESSION METADATA: project, participant, content, ...
• media file metadata: URL to the media file
• transcription file metadata: URL to the transcription file
• media carrier metadata: tape archive reference to the media carrier
ISLE Metadata Initiative
Open International Standard
[Tree diagram: nodes labelled C, grouped into child speakers and adult speakers, then into male child, male adult, fem. child, and fem. adult speakers]
Metadata Vision
[Diagram labels: MPI 98, Lund, MPI Leipzig, Helsinki, Childes, AIATSIS, ICE, SIL, Lancaster, ???, LDC, MPI Nijmegen, ELRA, Japan, China, Lacito]
connection by simple URL mechanism!
User-friendly generation of metadata descriptions which adhere to open standards; creation of a conceptual domain
Metadata browsers let the user operate in a conceptual domain that includes all metadata descriptions adhering to the standards
world-wide interconnected domain
typical browsable hierarchy
immediate execution of a useful tool on the chosen set of files
IMDI
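The idea of browsing a metadata domain and then running a tool on the chosen set of files can be sketched with a few plain data classes. This is not the IMDI schema: every class name, field name, and URL below is a hypothetical stand-in, chosen only to mirror the labels on the poster (session, project, participant, content, media file, transcription file).

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Participant:
    name: str
    age_group: str  # "child" or "adult" (illustrative categories)
    sex: str        # "male" or "female"


@dataclass
class Session:
    name: str
    project: str
    content: str
    participants: List[Participant] = field(default_factory=list)
    media_url: str = ""          # simple URL link to the media file
    transcription_url: str = ""  # simple URL link to the transcription file


def select_urls(sessions: List[Session],
                wanted: Callable[[Participant], bool]) -> List[str]:
    """Collect transcription URLs of sessions with at least one matching participant."""
    return [s.transcription_url for s in sessions
            if any(wanted(p) for p in s.participants)]


if __name__ == "__main__":
    corpus = [
        Session("s1", "demo", "play session", [Participant("A", "child", "female")],
                "http://example.org/s1.mpg", "http://example.org/s1.eaf"),
        Session("s2", "demo", "interview", [Participant("B", "adult", "male")],
                "http://example.org/s2.mpg", "http://example.org/s2.eaf"),
    ]
    # Browse to "child speakers", then hand the resulting file set to a tool.
    print(select_urls(corpus, lambda p: p.age_group == "child"))
```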
EUDICO Tool Set (Technical Group)
Complexity of Multi-Modal Annotations
time scales, independent streams, partial time alignment, large number of tiers, hierarchies, (labeled) references
[Diagram labels: t; video camera 1 (t1), video camera 2 (t2), eye tracker 1 (t3); tiers: transcription, morphology, left eye, r.h. gesture, r.h. gesture phase, r. hand; easily > 50 tiers; t1*, t2*, t3*]
EUDICO Architecture Enabling Distributed Operation
Abstract Corpus Model: the Nucleus of EUDICO
ACM was designed to represent many of the current annotation formats in order to achieve format independence.
[Diagram labels: EUDICO, XML, CHAT, GDB, Tipster, AIF, GATE, ATLAS, Appl.]
flexible tier definition
[Screenshots: EUDICO Annotation Tool, EUDICO Visualization Tool, EUDICO Search Tool]
Viewers: Subtitle Viewer, Compact Viewer, Time Line Viewer, Grid Viewer
Different user-adjustable viewers allow the user to view his data in a flexible way. Other useful viewers will be added.
EUDICO Annotation Format: flexible XML-based format
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ANNOTATION_DOCUMENT >
<ANNOTATION_DOCUMENT DATE="July 17, 2001" AUTHOR="Hennie Brugman" VERSION="1.0">
  <HEADER TIME_UNITS="milliseconds" MEDIA_FILE="file:/server/media.mpg"/>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="1000"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="2000"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="3000"/>
    <TIME_SLOT TIME_SLOT_ID="ts4"/>
    <TIME_SLOT TIME_SLOT_ID="ts5" TIME_VALUE="5000"/>
    <TIME_SLOT TIME_SLOT_ID="ts6" TIME_VALUE="6000"/>
  </TIME_ORDER>
  <TIER TIER_ID="t1" LINGUISTIC_TYPE_REF="orthography" PARTICIPANT="jan" DEFAULT_LOCALE="IPA-96">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>utterance 1</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION>
• powerful audio and video segment definition
• input methods for various character sets such as IPA
• character selection support for Chinese
The annotation tool combines all modern concepts of defining segments in audio and video signals, has input methods for many languages and character sets, and generates Unicode and XML-structured files.
EUDICO generates XML-structured files following the flexible EAF schema.
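To illustrate how such a file can be consumed, the sketch below reads time slots and aligned annotations with Python's standard `xml.etree.ElementTree`. The sample document is a shortened version of the fragment on the poster, closed off so that it is well-formed; the closing tags and the function name are additions for this example. A time slot without `TIME_VALUE` is kept as `None`, which is one simple way to represent partial time alignment.

```python
import xml.etree.ElementTree as ET

# Shortened sample based on the poster fragment; closing tags added so it parses.
EAF_SAMPLE = """<ANNOTATION_DOCUMENT DATE="July 17, 2001" AUTHOR="Hennie Brugman" VERSION="1.0">
  <HEADER TIME_UNITS="milliseconds" MEDIA_FILE="file:/server/media.mpg"/>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="1000"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="2000"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="3000"/>
    <TIME_SLOT TIME_SLOT_ID="ts4"/>
  </TIME_ORDER>
  <TIER TIER_ID="t1" LINGUISTIC_TYPE_REF="orthography" PARTICIPANT="jan" DEFAULT_LOCALE="IPA-96">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>utterance 1</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_annotations(xml_text):
    """Return (time_slots, rows); rows are (tier_id, begin_ms, end_ms, value)."""
    root = ET.fromstring(xml_text)
    slots = {}
    for slot in root.iter("TIME_SLOT"):
        value = slot.get("TIME_VALUE")
        # Unaligned slots carry no TIME_VALUE and are stored as None.
        slots[slot.get("TIME_SLOT_ID")] = int(value) if value is not None else None
    rows = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            rows.append((tier.get("TIER_ID"),
                         slots[ann.get("TIME_SLOT_REF1")],
                         slots[ann.get("TIME_SLOT_REF2")],
                         ann.findtext("ANNOTATION_VALUE")))
    return slots, rows


if __name__ == "__main__":
    time_slots, annotations = read_annotations(EAF_SAMPLE)
    print(time_slots)
    print(annotations)
```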
Exchanging Documents with Unicode and XML (Technical Group)
• No mixture of fonts and character sets anymore
• No font conversion necessary anymore
• No double usage of ordinal positions anymore
• Unifying characters as much as possible
EUDICO programs support Unicode by means of a special editor. A wide variety of languages is supported through virtual keyboards.
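The practical effect of these goals can be shown in a few lines. In the sketch below, IPA symbols and a Chinese character sit in the same string, each with its own code point, so no font switching is needed and no ordinal position is used twice. The sample characters are arbitrary choices for illustration.

```python
import unicodedata

# IPA symbols and a Chinese character in one string, no font switching needed.
sample = "ʃ ə 语"

for ch in sample.split():
    # Every character has exactly one code point and one standard name.
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# The same text survives a round trip through UTF-8, the encoding declared
# in the XML examples on this poster.
encoded = sample.encode("utf-8")
assert encoded.decode("utf-8") == sample
```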
Unicode Character Allocation, taken from “The Unicode Standard, Version 3.0”, Addison-Wesley
Word file:
  aaaaaa aaaaaa a ‡
  bbbbbb bb bbbbbbb
  dddd ddddddddd
  a2a2a2a2a2a2a2a2 ‡Œ
  111 111 11111
  2222 222 2222
  3333 3333 33 3333
+ ADL file:
  BLOCK = WRAPED TIER transcription, BOLD TIER english, ITALIC TIER morpho
XML file:
<adlf>
  <block>
    <sentence name="transcription">aaaaaa aaaaaa a ‡ a2a2a2a2a2a2a2a2 ‡Œ </sentence>
    <sentence name="english">bbbbbb bb bbbbbbb </sentence>
    <sentence name="morpho">dddd ddddddddd </sentence>
  </block>
  <block>
    <sentence name="transcription">111 111 11111</sentence>
    <sentence name="english">2222 222 2222</sentence>
    <sentence name="morpho">3333 3333 33 3333</sentence>
  </block>
</adlf>
For some languages, such as Chinese, lookup windows are generated on screen that offer all characters as selectable items.
Goals of Unicode
Unicode Overview
Unicode Usage
Conversion of MS Word documents to XML
Transcripts and other linguistic documents are often written in MS Word using idiosyncratic structures, although this format is not open and not suitable for archiving and further analysis. At the MPI we developed a flexible converter which allows the user to describe his file structure in simple terms (Annotation Description Language, ADL), such that XML files following the EUDICO Annotation Format (EAF) are created.
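The conversion idea can be sketched as follows. This is not the MPI converter and does not parse real ADL or Word files: it assumes the document has already been reduced to blocks of (style, text) lines, and it uses a plain dictionary as a hypothetical stand-in for an ADL description that maps text styles to tier names. Lines of a wrapped tier are joined into one sentence. For brevity the output uses the simple block/sentence layout shown on the poster rather than full EAF.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for an ADL description: text style -> tier name.
TIER_BY_STYLE = {"plain": "transcription", "bold": "english", "italic": "morpho"}


def blocks_to_adlf(blocks):
    """Build an <adlf> element from blocks of (style, text) lines."""
    root = ET.Element("adlf")
    for block in blocks:
        tiers = {}
        for style, text in block:
            # Lines of the same tier (e.g. a wrapped transcription) are collected.
            tiers.setdefault(TIER_BY_STYLE[style], []).append(text.strip())
        block_el = ET.SubElement(root, "block")
        for name, parts in tiers.items():
            sentence = ET.SubElement(block_el, "sentence", name=name)
            sentence.text = " ".join(parts)
    return root


if __name__ == "__main__":
    word_blocks = [
        [("plain", "aaaaaa aaaaaa a"), ("bold", "bbbbbb bb bbbbbbb"),
         ("italic", "dddd ddddddddd"), ("plain", "a2a2a2a2")],
        [("plain", "111 111 11111"), ("bold", "2222 222 2222"),
         ("italic", "3333 3333 33 3333")],
    ]
    print(ET.tostring(blocks_to_adlf(word_blocks), encoding="unicode"))
```

Keeping the style-to-tier mapping in data rather than in code mirrors the point of ADL: a new document layout needs a new description, not a new converter.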