version 17svoboda/courses/201-b4m36ds... · 2020. 9. 19. · lecture 2 - data formats \(5. 10....
TRANSCRIPT
![Page 1: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/1.jpg)
B4M36DS2, BE4M36DS2: Database Systems 2h p://www.ksi.mff.cuni.cz/~svoboda/courses/201-B4M36DS2/
Lecture 2
Data FormatsMar n Svobodamar [email protected]
5. 10. 2020
Charles University, Faculty of Mathema cs and PhysicsCzech Technical University in Prague, Faculty of Electrical Engineering
![Page 2: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/2.jpg)
Lecture OutlineData formats
• XML – Extensible Markup Language• JSON – JavaScript Object Nota on• BSON – Binary JSON• RDF – Resource Descrip on Framework• CSV – Comma-Separated Values
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 2
![Page 3: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/3.jpg)
XMLExtensible Markup Language
![Page 4: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/4.jpg)
Introduc onXML = Extensible Markup Language
• Representa on and interchange of semi-structured data+ a family of related technologies, languages, specifica ons, …
• Derived from SGML, developed byW3C, started in 1996• Design goals
Simplicity, generality and usability across the Internet• File extension: *.xml, content type: text/xml• Versions: 1.0 and 1.1• W3C recommenda on
h p://www.w3.org/TR/xml11/
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 4
![Page 5: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/5.jpg)
Example
<?xml version="1.1" encoding="UTF-8"?><movie year="2007">
<title>Medvídek</title><actors>
<actor><firstname>Jiří</firstname><lastname>Macháček</lastname>
</actor><actor>
<firstname>Ivan</firstname><lastname>Trojan</lastname>
</actor></actors><director>
<firstname>Jan</firstname><lastname>Hřebejk</lastname>
</director></movie>
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 5
![Page 6: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/6.jpg)
Document StructureDocument
• Prolog: XML version + some other stuff• Exactly one root element
Contains other nested elements and/or other content
<?xml<?xml versionversion == "" versionversion "" ...... ?>?> ......
elementelement
Example<?xml version="1.1" encoding="UTF-8"?><movie>
...</movie>
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 6
![Page 7: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/7.jpg)
ConstructsElement
• Marked using opening and closing tags… or just an abbreviated tag in case of empty elements
• Each element can be associated with a set of a ributes
<< namename
attributeattribute
>> element contentelement content << // namename >>
<< namename
attributeattribute
// >>
Examples<title>...</title><actors/>
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 7
![Page 8: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/8.jpg)
ConstructsTypes of element content
• Empty content• Text content• Element content
Sequence of nested elements• Mixed content
Elements arbitrarily interleaved with text values
texttext
elementelement
elementelement
texttext
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 8
![Page 9: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/9.jpg)
ConstructsA ribute
• Name-value pair
namename == "" valuevalue ""
Escaping sequences (predefined en es)• Used within values of a ributes or text content of elements• E.g.:
< for <> for >" for "…
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 9
![Page 10: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/10.jpg)
XML Conclusion
XML constructs• Basic: element, a ribute, text• Addi onal: comment, processing instruc on, …
Schema languages• DTD, XSD (XML Schema), RELAX NG, Schematron
Query languages• XPath, XQuery, XSLT
XML formats = par cular languages• XSD, XSLT, XHTML, DocBook, ePUB, SVG, RSS, SOAP, …
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 10
![Page 11: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/11.jpg)
JSONJavaScript Object Nota on
![Page 12: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/12.jpg)
Introduc onJSON = JavaScript Object Nota on
• Open standard for data interchange• Design goals
Simplicity: text-based, easy to read and writeUniversality: object and array data structures
– Supported by majority of modern programming languages– Based conven ons of the C-family of languages
(C, C++, C#, Java, JavaScript, Perl, Python, …)
• Derived from JavaScript (but language independent)• Started in 2002• File extension: *.json• Content type: application/json• h p://www.json.org/
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 12
![Page 13: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/13.jpg)
Example
{"title" : "Medvídek","year" : 2007,"actors" : [
{"firstname" : "Jiří","lastname" : "Macháček"
},{
"firstname" : "Ivan","lastname" : "Trojan"
}],"director" : {
"firstname" : "Jan","lastname" : "Hřebejk"
}}
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 13
![Page 14: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/14.jpg)
Data StructureObject
• Unordered collec on of name-value pairs (proper es)Correspond to structures such as objects, records, structs,dic onaries, hash tables, keyed lists, associa ve arrays, …
• Values can be of different types, names should be unique
{{
stringstring :: valuevalue
,,
}}
Examples• { "name" : "Ivan Trojan", "year" : 1964 }• { }
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 14
![Page 15: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/15.jpg)
Data StructureArray
• Ordered collec on of valuesCorrespond to structures such as arrays, vectors, lists,sequences, …
• Values can be of different types, duplicate values are allowed[[
valuevalue
,,
]]
Examples• [ 2, 7, 7, 5 ]• [ "Ivan Trojan", 1964, -5.6 ]• [ ]
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 15
![Page 16: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/16.jpg)
Data StructureValue
• Unicode stringEnclosed with double quotesBackslash escaping sequencesExample: "a \n b \" c \\ d"
• NumberDecimal integers or floatsExamples: 1, -0.5, 1.5e3
• Nested object• Nested array• Boolean value: true, false• Missing informa on: null
stringstring
numbernumber
objectobject
arrayarray
truetrue
falsefalse
nullnull
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 16
![Page 17: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/17.jpg)
JSON Conclusion
JSON constructs• Collec ons: object, array• Scalar values: string, number, boolean, null
Schema languages• JSON Schema
Query languages• JSONiq, JMESPath, JAQL, …
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 17
![Page 18: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/18.jpg)
BSONBinary JSON
![Page 19: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/19.jpg)
Introduc onBSON = Binary JSON
• Binary-encoded serializa on of JSON documentsExtends the set of basic data types of values offered by JSON(such as a string, …) with a few new specific ones
• Design characteris cs: lightweight, traversable, efficient• Used byMongoDB
Document NoSQL database for JSON documentsData storage and network transfer format
• File extension: *.bson• h p://bsonspec.org/
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 19
![Page 20: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/20.jpg)
ExampleJSON
{ "title" : "Medvídek", "year" : 2007 }
BSON24 00 00 00 02 74 69 74 6C 65 00 0A 00 00 00 4D 65 64 76 C3 AD 64 65 6B00 10 79 65 61 72 00 D7 07 00 00 00
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 20
![Page 21: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/21.jpg)
ExampleJSON
{"title" : "Medvídek","year" : 2007
}
BSON24 00 00 0002 74 69 74 6C 65 00 0A 00 00 00 4D 65 64 76 C3 AD 64 65 6B 0010 79 65 61 72 00 D7 07 00 0000
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 21
![Page 22: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/22.jpg)
Document StructureDocument = serializa on of one JSON object or array
• JSON object is serialized directly• JSON array is first transformed to a JSON object
Property names derived from numbers of posi onsE.g.:[ "Trojan", "Svěrák" ] →{ "0" : "Trojan", "1" : "Svěrák" }
• StructureDocument size (total number of bytes)Sequence of elements (encoded JSON proper es)Termina ng hexadecimal 00 byte
int32int32 elementelement 0000
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 22
![Page 23: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/23.jpg)
Document StructureElement = serializa on of one JSON property
......
0202 namename int32int32 bytebyte 0000
0101 namename doubledouble
1010 namename int32int32
1212 namename int64int64
0303 namename documentdocument
0404 namename documentdocument
0808 namename 0000
0101
0A0A namename
0909 namename int64int64
1111 namename int64int64
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 23
![Page 24: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/24.jpg)
Document StructureElement = serializa on of one JSON property
• StructureType selector
– 02 (string)– 01 (double), 10 (32-bit integer), 12 (64-bit integer)– 03 (object), 04 (array)– 08 (boolean)– 0A (null)– 09 (date me), 11 ( mestamp)– …
Property name– Unicode string terminated by 00
bytebyte 0000
Property value
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 24
![Page 25: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/25.jpg)
RDFResource Descrip on Framework
![Page 26: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/26.jpg)
Introduc onRDF = Resource Descrip on Framework
• Language for represen ng informa on about resourcesin the World Wide Web
+ a family of related technologies, languages, specifica ons, …Used in the context of the Seman c Web, Linked Data, …
• Developed byW3C• Started in 1997• Versions: 1.0 and 1.1• W3C recommenda ons
h ps://www.w3.org/TR/rdf11-concepts/– Concepts and Abstract Syntax
h ps://www.w3.org/TR/rdf11-mt/– Seman cs
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 26
![Page 27: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/27.jpg)
StatementsResource
• Any real-world en tyReferents = resources iden fied by IRI
– E.g. physical things, documents, abstract concepts, …Values = resources for literals
– E.g. numbers, strings, …
Statement about resources = one RDF triple• Three components: subject, predicate, and object
Examples<http://db.cz/movies/medvidek><http://db.cz/terms#actor><http://db.cz/actors/trojan> .
<http://db.cz/movies/medvidek><http://db.cz/terms#year>"2007" .
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 27
![Page 28: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/28.jpg)
StatementsTriple components
• SubjectDescribes a resource the given statement is aboutIRI or blank node iden fier
• PredicateDescribes the property or characteris c of the subjectIRI
• ObjectDescribes the value of that propertyIRI or blank node iden fier or literal
Although triples are inspired by natural languages, they havenothing to do with processing of natural languages
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 28
![Page 29: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/29.jpg)
Example
<http://db.cz/movies/medvidek><http://db.cz/terms#actor> <http://db.cz/actors/machacek> .
<http://db.cz/movies/medvidek><http://db.cz/terms#actor> <http://db.cz/actors/trojan> .
<http://db.cz/movies/medvidek><http://db.cz/terms#year> "2007" .
<http://db.cz/movies/medvidek><http://db.cz/terms#director> _:n18 .
_:n18<http://db.cz/terms#firstname> "Jan" .
_:n18<http://db.cz/terms#lastname> "Hřebejk" .
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 29
![Page 30: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/30.jpg)
Iden fiers and LiteralsIRI = Interna onalized Resource Iden fier
• Absolute (not rela ve) IRIs with op onal fragment iden fiers• RFC 3987• Unicode characters• Examples
http://db.cz/movies/medvidekhttp://db.cz/terms#actormailto:[email protected]:issn:0167-6423
• URLs are o en used in prac ce→ informa on about givenresources are then intended to be published / retrieved viastandard HTTP
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 30
![Page 31: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/31.jpg)
Iden fiers and LiteralsLiterals
• Plain valuesE.g.: "Medvídek", "2007"
• Typed valuesE.g.: "Medvídek"^^xs:string, "2007"^^xs:integerXML Schema simple data types are adopted and used
• Strings with language tagsE.g.: "Medvídek"@cs
• Types and language tags cannot be mutually combined
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 31
![Page 32: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/32.jpg)
Iden fiers and LiteralsBlank node iden fiers
• Blank nodes (anonymous resources)Allow to express statements about resources without explicitlynaming (iden fying) them
• Blank node iden fiers only have local scope of validityE.g. within a given file, query expression, …
• Par cular syntax depends on a serializa on formatE.g.: _:node18
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 32
![Page 33: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/33.jpg)
Data ModelDirected labeled mul graph
• Ver cesOne vertex for each IRI or literal value
• EdgesOne edge for each individual tripleEdges are directed subject predicate−−−−−→ objectProperty names (predicate IRIs) are used as edge labels
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 33
![Page 34: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/34.jpg)
Example
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 34
![Page 35: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/35.jpg)
Serializa onAvailable approaches
• N-Triples nota onh ps://www.w3.org/TR/n-triples/
• Turtle nota on (Terse RDF Triple Language)h ps://www.w3.org/TR/turtle/
• RDF/XML nota onXML syntax for RDFh ps://www.w3.org/TR/rdf-syntax-grammar/
• JSON-LD nota onJSON-based serializa on for Linked Datah ps://www.w3.org/TR/json-ld/
• …
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 35
![Page 36: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/36.jpg)
N-Triples Nota onRDF N-Triples nota on = A line-based syntax for an RDF graph
• Simple, line-based, plain text format• File extension: *.rdf• h ps://www.w3.org/TR/n-triples/
Example• Already presented…
Document• Statements are terminated by dots, delimited by EOL
tripletriple
0D 0A0D 0A tripletriple
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 36
![Page 37: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/37.jpg)
N-Triples Nota onStatement
• Individual triple components are delimited by spaces
subjectsubject predicatepredicate objectobject ..
Triple components: subject, predicate, object
IRI referenceIRI reference
blank node idblank node id
IRI referenceIRI reference IRI referenceIRI reference
blank node idblank node id
literalliteral
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 37
![Page 38: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/38.jpg)
N-Triples Nota onIRI reference
• IRIs are enclosed in angle brackets<< IRIIRI >>
Blank node iden fier
__ :: labellabel
Literal• Literals are enclosed in double quotes
"" valuevalue ""
^^^^ IRI referenceIRI reference
@@ language taglanguage tag
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 38
![Page 39: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/39.jpg)
Turtle Nota onTurtle = Terse RDF Triple Language
• Compact text format,various abbrevia ons for common usage pa erns
• File extension: *. l• Content type: text/turtle• h ps://www.w3.org/TR/turtle/
Example@prefix i: <http://db.cz/terms#> .@prefix m: <http://db.cz/movies/> .@prefix a: <http://db.cz/actors/> .m:medvidek
i:actor a:machacek , a:trojan ;i:year "2007" ;i:director [ i:firstname "Jan" ; i:lastname "Hřebejk" ] .
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 39
![Page 40: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/40.jpg)
Turtle Nota onDocument
• Contains a sequence of triples and/or declara ons• Prefix declara ons
Prefixed names can then be used instead of full IRI references• Groups of triples
Individual groups are terminated by dots
@prefix@prefix prefixprefix :: IRI referenceIRI reference ..
triplestriples
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 40
![Page 41: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/41.jpg)
Turtle Nota onTriples
• Triples sharing the same subject and object or at leastthe same subject can be grouped together
object list for a shared subject and predicatepredicate-object list for a shared subject
• Brackets can be used to define blank nodessubjectsubject predicatepredicate objectobject
,,
;;
[[ predicatepredicate objectobject
,,
;;
]]
predicatepredicate objectobject
,,
;;
..
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 41
![Page 42: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/42.jpg)
Turtle Nota onTriple components: subject, predicate, object
IRI referenceIRI reference
prefixed nameprefixed name
blank node idblank node id
IRI referenceIRI reference
prefixed nameprefixed name
IRI referenceIRI reference
prefixed nameprefixed name
blank node idblank node id
literalliteral
[[ predicate-object listpredicate-object list ]]
IRI reference / prefixed name
<< IRIIRI >> prefixprefix :: local namelocal name
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 42
![Page 43: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/43.jpg)
Turtle Nota onLiteral
• Tradi onal literals+ new abbreviated forms of numeric and boolean literals
"" valuevalue ""
^^^^ IRI referenceIRI reference
prefixed nameprefixed name
@@ language taglanguage tag
truetrue
falsefalse
integer / decimal / doubleinteger / decimal / double
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 43
![Page 44: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/44.jpg)
ExampleExample revisited
@prefix i: <http://db.cz/terms#> .@prefix m: <http://db.cz/movies/> .@prefix a: <http://db.cz/actors/> .m:medvidek
i:actor a:machacek , a:trojan ;i:year "2007" ;i:director [ i:firstname "Jan" ; i:lastname "Hřebejk" ] .
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 44
![Page 45: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/45.jpg)
RDF Conclusion
RDF statements• Subject, predicate, and object components
Schema languages• RDFS (RDF Schema)• OWL (Web Ontology Language)
Query languages• SPARQL (SPARQL Protocol and RDF Query Language)
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 45
![Page 46: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/46.jpg)
CSVComma-Separated Values
![Page 47: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/47.jpg)
Introduc onCSV = Comma-Separated Values
• Unfortunately not fully standardizedDifferent field separators (commas, semicolons)Different escaping sequencesNo encoding informa on
• RFC 4180, RFC 7111• File extension: *.csv• Content type: text/csv
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 47
![Page 48: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/48.jpg)
Example
firstname,lastname,yearIvan,Trojan,1964Jiří,Macháček,1966Jitka,Schneiderová,1973Zdeněk,Svěrák,1936Anna,Geislerová,1976
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 48
![Page 49: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/49.jpg)
Document StructureDocument
• Op onal header + list of records
namename ,, namename 0D 0A0D 0A
recordrecord 0D 0A0D 0A recordrecord
Record• Comma separated list of fields
valuevalue ,, valuevalue
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 49
![Page 50: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/50.jpg)
![Page 51: Version 17svoboda/courses/201-B4M36DS... · 2020. 9. 19. · Lecture 2 - Data Formats \(5. 10. 2020\) - Version 17 Author: Martin Svoboda - martin.svoboda@fel.cvut.cz - Czech Technical](https://reader033.vdocuments.mx/reader033/viewer/2022052816/60aac7c9d75f875b3138e864/html5/thumbnails/51.jpg)
Lecture ConclusionData formats
• Tree: XML, JSON• Graph: RDF• Rela onal: CSV
Binary serializa ons• BSON
B4M36DS2, BE4M36DS2: Database Systems 2 | Lecture 2: Data Formats | 5. 10. 2020 51