PC in TB: Manfred Thaller, PLANETS TB meeting, Den Haag, Sept. 28th '06


PC in TB

Manfred Thaller

PLANETS TB meeting, Den Haag, Sept. 28th '06


PC* in TB

* as represented by PC/2, PC/4 and PP/5

or: The XCEL / XCDL concept.


Building block I

A language which allows a program to read "any file specification" based on the

==> "eXtensible Characterisation Extraction Language" (XCEL)

Formulate the human-readable specifications of TIFF, RTF, WAV … in a language which a general-purpose program can read.

General enough that any existing format specification can be expressed in it. (LaTeX, MAX, VRML …)


Building block I - Warning

After the alphabet had been designed ... somebody still had to write all those books.


Building block II

A language which allows a program to describe "any file content" using an

==> "eXtensible Characterisation Definition Language" (XCDL)

Formulate the content of any file in an abstract language, which captures the complete information contained in it.

General enough that any existing content can be expressed in it.


Building block III

A program which interprets a format description in XCEL and, using it, extracts from any file of that format an XCDL description of its content.

Production-level quality. Indicative performance: <= 1 second / file.
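A minimal sketch of the idea behind building blocks I and III: one generic routine that knows nothing about a particular format and is driven entirely by a declarative description. The dictionary layout, the toy format and all names (`TOY_IMAGE_FORMAT`, `extract_properties`) are assumptions made for illustration; this is not XCEL syntax and not the PLANETS extractor.

```python
import struct

# Hypothetical toy format description (NOT real XCEL syntax).
# Each field: (property name, struct type code, byte offset in the file).
TOY_IMAGE_FORMAT = {
    "byte_order": "<",  # little-endian
    "fields": [
        ("magic", "4s", 0),
        ("width", "I", 4),
        ("height", "I", 8),
        ("bitDepth", "B", 12),
    ],
}


def extract_properties(description, data):
    """Read every field named in the description from the raw bytes."""
    order = description["byte_order"]
    properties = {}
    for name, code, offset in description["fields"]:
        # unpack_from returns a tuple; each field here holds one value.
        (value,) = struct.unpack_from(order + code, data, offset)
        properties[name] = value
    return properties


# Build a 13-byte sample file that follows the toy description.
sample = struct.pack("<4sIIB", b"TOY1", 640, 480, 8)
props = extract_properties(TOY_IMAGE_FORMAT, sample)
print(props)
```

Supporting another format then means writing another description, not another program, which is the division of labour the building blocks describe.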


Building block IV

A program which takes two XCDL descriptions and delivers a statement about the similarity of the information described.


Relationship to DOW

PC/2 defines the languages. (Starting: month 1 – [ finished month 18 ].) Deliverable: end of month 5. Reuses PRONOM / DROID.

PC/4 implements the extraction mechanism. (Starting: month 1, oops, 4 – [ finished month 18 ].) Reuses any existing tools.

PP/5 implements the comparison mechanism and metrics of similarity of "information". (Starting: month 15.)


Metadata Derivation

File format A: # of color bands

File format B: depth

<xsd:complexType name="bitDepth">
  <xsd:complexContent>
    <xsd:extension base="symbolType">
      <xsd:sequence>
        <xsd:element name="validValues" type="integerList"/>
      </xsd:sequence>
    </xsd:extension>
  </xsd:complexContent>
</xsd:complexType>
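As a rough illustration of what the schema fragment expresses, the sketch below mirrors `bitDepth` as a symbol carrying a list of valid integer values. It assumes that `integerList` is a whitespace-separated list of integers; the class name `BitDepthSymbol`, its methods and the sample values are hypothetical and not taken from the PLANETS schemas.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BitDepthSymbol:
    """Mirrors the bitDepth type: a symbol with a list of valid integers."""

    valid_values: List[int]

    @classmethod
    def from_text(cls, text: str) -> "BitDepthSymbol":
        # Assumption: integerList is serialised as whitespace-separated integers.
        return cls(valid_values=[int(token) for token in text.split()])

    def accepts(self, observed: int) -> bool:
        """True if the value observed in a file is one of the valid values."""
        return observed in self.valid_values


# Illustrative values only; a real format description would supply its own.
bit_depth = BitDepthSymbol.from_text("1 2 4 8 16")
print(bit_depth.accepts(8))
print(bit_depth.accepts(7))
```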


Metadata Derivation

From observed file properties

==> Property Ontology


Schema Architecture

Basic Elements: Byte Order, Encodings, Position Types, ...

Structuring Elements: Item (logical unit that contains at least one sub-item), Symbol (smallest logical unit)

Processing Instructions: filepointers, symbol-counters, …

Image Schema: Colour Type, Width, Height, Bit Depth, …

Text Schema: Font-Style, Font-Family, Size, Language, …

Multimedia Schema: Pitch, Samplerate, Channels, Framerate, ...

Instances: PNG Instance, RTF Instance, TIFF Instance, PDF Instance, WAV Instance, MPEG4 Instance


Metrics of Comparison I

"Information" will be grouped according to three levels:

– Descriptive (width, height, photometric interpretation, aka “1 = red”)

– History (compression, photometric interpretation, aka “1 = red”)

– Content (bytestream)


Metrics of Comparison II

– Descriptive (width, height, photometric interpretation, aka “1 = red”): Can this be the same object?

– History (compression, photometric interpretation, aka “1 = red”): Can this have been the same object?

– Content (bytestream): Is this the same object?
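The three questions can be sketched as a per-level comparison of two descriptions. This assumes each description has already been split into a descriptive part, a history part and a normalised content bytestream; the `Description` class, the `compare` function and the sample values are hypothetical, and plain equality stands in for whatever similarity metric PP/5 actually defines.

```python
from dataclasses import dataclass


@dataclass
class Description:
    descriptive: dict  # e.g. width, height
    history: dict      # e.g. compression used in the source file
    content: bytes     # normalised bytestream


def compare(a: Description, b: Description) -> dict:
    """Answer one yes/no question per level, using simple equality."""
    return {
        "descriptive": a.descriptive == b.descriptive,
        "history": a.history == b.history,
        "content": a.content == b.content,
    }


# Hypothetical case: an image migrated to a format with different compression.
original = Description({"width": 2, "height": 1}, {"compression": "none"}, b"\x01\x02")
migrated = Description({"width": 2, "height": 1}, {"compression": "deflate"}, b"\x01\x02")
result = compare(original, migrated)
print(result)
```

Keeping the levels separate lets a caller decide which differences matter: here the history differs although the descriptive properties and the content agree.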


Metrics of Comparison III

– Is the sequence of (UTF-16) characters the same?

– Are properties with the same symbolic name applied to the same areas within the UTF-16 sequence?

– Are the properties related to the same objects?
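For text, the first two questions could be checked roughly as follows. The sketch assumes each description supplies the text plus a list of `(property name, start, end)` spans, with offsets counted in UTF-16 code units; both function names and the span layout are assumptions, and the third question (relation to objects) is not covered.

```python
def same_utf16_sequence(a: str, b: str) -> bool:
    """Question 1: compare the texts as UTF-16 code unit sequences."""
    return a.encode("utf-16-le") == b.encode("utf-16-le")


def same_property_areas(spans_a, spans_b) -> bool:
    """Question 2: same symbolic names applied to the same areas.

    Each span is (name, start, end); order of the spans is ignored.
    """
    return set(spans_a) == set(spans_b)


text_a = "Hello world"
text_b = "Hello world"
spans_a = [("bold", 0, 5), ("italic", 6, 11)]
spans_b = [("italic", 6, 11), ("bold", 0, 5)]  # same spans, different order
spans_c = [("bold", 6, 11)]                    # bold moved to another area

print(same_utf16_sequence(text_a, text_b))
print(same_property_areas(spans_a, spans_b))
print(same_property_areas(spans_a, spans_c))
```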


XCDL: Observation

An XCDL description at the content level is actually a "universal virtual file format" …

… though inflated to about 210 % of the original size.


PC (XCEL/XCDL) ==> TB

Provide:

comparison tool.

[ profiling tool. ]

[ validation. ]

[ identification. ]


TB ==> PC (XCEL/XCDL)

Quis custodiet ipsos custodes?

Or: Who tests the testing tool?

Or: Beta (and possibly pre-Beta) “testing”.

Behaviour.

Performance.

Calibration.

Reference objects.


The end
