![Page 1: Efficient Schemas in Motion with Kafka and Schema Registry](https://reader033.vdocuments.mx/reader033/viewer/2022051521/5a6ed43b7f8b9a42298b5881/html5/thumbnails/1.jpg)
Efficient Schemas in Motion with Kafka and Schema Registry
Pat Patterson
Community Champion
@metadaddy
![Page 2: Efficient Schemas in Motion with Kafka and Schema Registry](https://reader033.vdocuments.mx/reader033/viewer/2022051521/5a6ed43b7f8b9a42298b5881/html5/thumbnails/2.jpg)
Enterprise Data DNA
Commercial Customers Across Verticals
250,000+ downloads
50+ of the Fortune 100
Doubling each quarter
Strong Partner Ecosystem
Open Source Success
Mission: empower enterprises to harness their data in motion.
Who is StreamSets?
![Page 3: Efficient Schemas in Motion with Kafka and Schema Registry](https://reader033.vdocuments.mx/reader033/viewer/2022051521/5a6ed43b7f8b9a42298b5881/html5/thumbnails/3.jpg)
Avro
Schema Registry
Demo
Agenda
![Page 4: Efficient Schemas in Motion with Kafka and Schema Registry](https://reader033.vdocuments.mx/reader033/viewer/2022051521/5a6ed43b7f8b9a42298b5881/html5/thumbnails/4.jpg)
Joined ASF as a Hadoop subproject in 2009
Record-oriented serialization format
Binary (most common) and JSON (human readable) encodings
Apache Avro
![Page 5: Efficient Schemas in Motion with Kafka and Schema Registry](https://reader033.vdocuments.mx/reader033/viewer/2022051521/5a6ed43b7f8b9a42298b5881/html5/thumbnails/5.jpg)
Avro Prehistory
![Page 6: Efficient Schemas in Motion with Kafka and Schema Registry](https://reader033.vdocuments.mx/reader033/viewer/2022051521/5a6ed43b7f8b9a42298b5881/html5/thumbnails/6.jpg)
Schema defined in JSON
• Relatively readable
Schema evolution
• Can add new fields, rename fields in schema
• Existing data can still be read under the new schema
Untagged binary data
• Space-efficient!
Avro Advantages
![Page 7: Efficient Schemas in Motion with Kafka and Schema Registry](https://reader033.vdocuments.mx/reader033/viewer/2022051521/5a6ed43b7f8b9a42298b5881/html5/thumbnails/7.jpg)
{
  "type": "record",
  "namespace": "com.example",
  "name": "Person",
  "fields": [
    { "name": "first_name", "type": "string" },
    { "name": "last_name", "type": "string" }
  ]
}
Avro Schema Definition
![Page 8: Efficient Schemas in Motion with Kafka and Schema Registry](https://reader033.vdocuments.mx/reader033/viewer/2022051521/5a6ed43b7f8b9a42298b5881/html5/thumbnails/8.jpg)
• null: 0 bytes
• boolean: 1 byte
• int/long: variable-length, zig-zag encoded
• float/double: 4/8 bytes
• bytes: length as long, then data
• string: length as long, then UTF-8-encoded data
Avro Binary Encoding - Simple Types
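The simple-type rules above can be sketched in a few lines of Python. This is an illustrative sketch, not the Avro library's code: the function names are hypothetical, and it assumes values fit in 64 bits and that floats and doubles are written little-endian, as the Avro specification describes.

```python
import struct

def zigzag(n: int) -> int:
    """Interleave negatives and positives so small magnitudes get small codes."""
    return (n << 1) ^ (n >> 63)

def encode_long(n: int) -> bytes:
    """Zig-zag the value, then emit 7 bits per byte, low-order group first.
    The high bit of each byte says whether another byte follows."""
    z = zigzag(n)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)

def encode_boolean(b: bool) -> bytes:
    """A single byte: 1 for true, 0 for false."""
    return b"\x01" if b else b"\x00"

def encode_double(x: float) -> bytes:
    """Eight bytes, little-endian IEEE 754."""
    return struct.pack("<d", x)

def encode_bytes(data: bytes) -> bytes:
    """Length as a long, then the raw bytes."""
    return encode_long(len(data)) + data

def encode_string(s: str) -> bytes:
    """Length of the UTF-8 form as a long, then the UTF-8 bytes."""
    return encode_bytes(s.encode("utf-8"))
```

Small numbers of either sign fit in one byte (1 encodes as `02`, -1 as `01`), and a string costs only its UTF-8 length plus a short length prefix.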
![Page 9: Efficient Schemas in Motion with Kafka and Schema Registry](https://reader033.vdocuments.mx/reader033/viewer/2022051521/5a6ed43b7f8b9a42298b5881/html5/thumbnails/9.jpg)
• Record: concatenate the field encodings
• Enum: zero-based index of symbol, as int
• Array: blocks of items, each preceded by a long count; zero count terminates array
• Map: blocks of K-V pairs, each preceded by a long count; zero count terminates map
• Union: position of item in schema as a long, then the item
• Fixed: the number of bytes defined in the schema
Avro Binary Encoding - Complex Types
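As a rough illustration of the complex-type rules, the sketch below encodes a `Person` record, an array of longs, and a `["null", "string"]` union. The helper names are hypothetical, and the array encoder always writes a single block, which is only one of the layouts the format allows.

```python
def encode_long(n: int) -> bytes:
    """Zig-zag plus variable-length encoding (64-bit values assumed)."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)

def encode_string(s: str) -> bytes:
    data = s.encode("utf-8")
    return encode_long(len(data)) + data

def encode_person(first_name: str, last_name: str) -> bytes:
    """Record: field encodings concatenated in schema order, no tags or names."""
    return encode_string(first_name) + encode_string(last_name)

def encode_long_array(items: list) -> bytes:
    """Array: one block (count, then items), then a zero count to terminate."""
    if not items:
        return encode_long(0)
    body = b"".join(encode_long(i) for i in items)
    return encode_long(len(items)) + body + encode_long(0)

def encode_optional_string(value) -> bytes:
    """Union ["null", "string"]: branch position, then the value (null is 0 bytes)."""
    if value is None:
        return encode_long(0)
    return encode_long(1) + encode_string(value)
```

Because the record carries no field names or tags, `encode_person("Pat", "Patterson")` is 14 bytes: two one-byte lengths plus 12 bytes of text. A reader can only make sense of those bytes if it knows the schema.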
![Page 10: Efficient Schemas in Motion with Kafka and Schema Registry](https://reader033.vdocuments.mx/reader033/viewer/2022051521/5a6ed43b7f8b9a42298b5881/html5/thumbnails/10.jpg)
{
  "type": "record",
  "namespace": "com.example",
  "name": "Person",
  "fields": [
    { "name": "first_name", "type": "string" },
    { "name": "last_name", "type": "string" },
    { "name": "age", "type": "int", "default": -1 }
  ]
}
Avro Schema Evolution
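To see why the default matters, here is a minimal sketch of reading data written with the two-field schema under the three-field schema. It is a simplification rather than Avro's full schema resolution: it handles only flat records of strings and ints, ignores aliases and type promotion, and the names `WRITER`, `READER`, and `read_record` are made up for the example.

```python
import io

WRITER = {"fields": [{"name": "first_name", "type": "string"},
                     {"name": "last_name", "type": "string"}]}
READER = {"fields": [{"name": "first_name", "type": "string"},
                     {"name": "last_name", "type": "string"},
                     {"name": "age", "type": "int", "default": -1}]}

def decode_long(buf: io.BytesIO) -> int:
    """Read a variable-length value, then undo the zig-zag mapping."""
    acc, shift = 0, 0
    while True:
        byte = buf.read(1)[0]
        acc |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1)

def decode_string(buf: io.BytesIO) -> str:
    return buf.read(decode_long(buf)).decode("utf-8")

DECODERS = {"string": decode_string, "int": decode_long}

def read_record(data: bytes, writer: dict, reader: dict) -> dict:
    """Decode with the writer's schema, then shape the result to the reader's."""
    buf = io.BytesIO(data)
    written = {f["name"]: DECODERS[f["type"]](buf) for f in writer["fields"]}
    record = {}
    for field in reader["fields"]:
        if field["name"] in written:
            record[field["name"]] = written[field["name"]]
        elif "default" in field:
            record[field["name"]] = field["default"]  # absent in old data
        else:
            raise ValueError(f"no value or default for {field['name']}")
    return record
```

Reading the old 14-byte `Person` encoding with `read_record(data, WRITER, READER)` yields the two names plus `age` set to -1. Without the default, the reader would have nothing to put in the new field.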
![Page 11: Efficient Schemas in Motion with Kafka and Schema Registry](https://reader033.vdocuments.mx/reader033/viewer/2022051521/5a6ed43b7f8b9a42298b5881/html5/thumbnails/11.jpg)
Compatibility Rules:
• New fields must have a default
• Deleted field must have had a default
• Doc/Order can be added/removed/changed
• Field default can be added/changed
• Field/type aliases can be added/removed
• Non-union can be converted to union with just that type, or vice versa
General rule is that old data can be read under the new schema
Avro Schema Evolution
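The first two rules lend themselves to a mechanical check. The sketch below is a deliberately narrow illustration: the function name `violations` is hypothetical, and it looks only at added and removed fields of a flat record, ignoring aliases, unions, and type changes.

```python
def violations(old: dict, new: dict) -> list:
    """Return the ways `new` breaks the add/delete rules relative to `old`."""
    old_fields = {f["name"]: f for f in old["fields"]}
    new_fields = {f["name"]: f for f in new["fields"]}
    problems = []
    for name, field in new_fields.items():
        if name not in old_fields and "default" not in field:
            problems.append(f"added field '{name}' has no default")
    for name, field in old_fields.items():
        if name not in new_fields and "default" not in field:
            problems.append(f"removed field '{name}' had no default")
    return problems

V1 = {"fields": [{"name": "first_name", "type": "string"},
                 {"name": "last_name", "type": "string"}]}
V2 = {"fields": V1["fields"] + [{"name": "age", "type": "int", "default": -1}]}
V2_BAD = {"fields": V1["fields"] + [{"name": "age", "type": "int"}]}
```

Here `violations(V1, V2)` is empty, while `violations(V1, V2_BAD)` reports the missing default on `age`.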
![Page 12: Efficient Schemas in Motion with Kafka and Schema Registry](https://reader033.vdocuments.mx/reader033/viewer/2022051521/5a6ed43b7f8b9a42298b5881/html5/thumbnails/12.jpg)
Avro Schema Serialization
Various options, depending on file/message orientation, but, generally:
• Metadata, including the schema
• Data
Great for files - schema is sent just once, but what about messages?
• Send just once? Periodically?
• Send per message?
• Agree out of band?
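A back-of-the-envelope calculation shows what is at stake in the per-message option. The sketch below uses the `Person` schema and a 14-byte encoded record; it counts only schema and payload bytes, ignores any container or protocol framing, and the function name `bytes_on_wire` is an assumption of the example.

```python
import json

SCHEMA = {
    "type": "record",
    "namespace": "com.example",
    "name": "Person",
    "fields": [
        {"name": "first_name", "type": "string"},
        {"name": "last_name", "type": "string"},
    ],
}

# Size of the schema as compact JSON text.
schema_bytes = len(json.dumps(SCHEMA, separators=(",", ":")).encode("utf-8"))

# Binary encoding of first_name="Pat", last_name="Patterson".
payload = b"\x06Pat\x12Patterson"

def bytes_on_wire(messages: int, schema_per_message: bool) -> int:
    """Total bytes sent for `messages` copies of the payload."""
    per_message = len(payload) + (schema_bytes if schema_per_message else 0)
    return messages * per_message

overhead_ratio = bytes_on_wire(1000, True) / bytes_on_wire(1000, False)
```

For a record this small, the schema text is several times larger than the data it describes, so sending it with every message multiplies the traffic accordingly.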
![Page 13: Efficient Schemas in Motion with Kafka and Schema Registry](https://reader033.vdocuments.mx/reader033/viewer/2022051521/5a6ed43b7f8b9a42298b5881/html5/thumbnails/13.jpg)
Schema Overhead
Demo
![Page 14: Efficient Schemas in Motion with Kafka and Schema Registry](https://reader033.vdocuments.mx/reader033/viewer/2022051521/5a6ed43b7f8b9a42298b5881/html5/thumbnails/14.jpg)
Online schema repository
• Simple REST API
Each schema has an ID
• Unique within the repository
Schemas versioned within subjects
• Supports schema evolution
• Subject loosely corresponds to topic
• Subject + version -> ID
Schema Registry
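The relationship between subjects, versions, and IDs can be modeled with a toy in-memory class. This is not the Schema Registry's API or implementation: `ToyRegistry` and its methods are hypothetical, and the choice to hand back the existing ID when an identical schema is registered again is an assumption of the sketch.

```python
class ToyRegistry:
    """In-memory stand-in: IDs are repository-wide, versions are per subject."""

    def __init__(self):
        self._schemas = []    # list position + 1 is the schema ID
        self._ids = {}        # schema text -> ID
        self._subjects = {}   # subject -> list of IDs; position + 1 is the version

    def register(self, subject: str, schema: str) -> int:
        """Store the schema under the subject and return its ID."""
        schema_id = self._ids.get(schema)
        if schema_id is None:
            self._schemas.append(schema)
            schema_id = len(self._schemas)
            self._ids[schema] = schema_id
        versions = self._subjects.setdefault(subject, [])
        if schema_id not in versions:
            versions.append(schema_id)
        return schema_id

    def schema_by_id(self, schema_id: int) -> str:
        """What a recipient does with the ID it finds in a message."""
        return self._schemas[schema_id - 1]

    def id_for(self, subject: str, version: int) -> int:
        """Subject + version -> ID."""
        return self._subjects[subject][version - 1]
```

Registering a second schema under the same subject creates version 2 with a new ID, while the first schema stays retrievable by its original ID.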
![Page 15: Efficient Schemas in Motion with Kafka and Schema Registry](https://reader033.vdocuments.mx/reader033/viewer/2022051521/5a6ed43b7f8b9a42298b5881/html5/thumbnails/15.jpg)
Register schema, registry returns an ID
Sender passes schema ID in each message
Recipient looks up ID in registry
Solves the Avro-by-Message Problem
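Passing the ID in each message amounts to a small framing step. The sketch below is modeled on the layout used by Confluent's serializers (a zero magic byte, a 4-byte big-endian schema ID, then the Avro binary payload); treat that layout and the function names `frame` and `unframe` as assumptions of the example rather than a specification.

```python
import struct

MAGIC_BYTE = 0  # marks the framing version in this sketch

def frame(schema_id: int, avro_payload: bytes) -> bytes:
    """Sender side: prefix the encoded record with a 5-byte header."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload

def unframe(message: bytes):
    """Recipient side: split a message into (schema_id, avro_payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte {magic}")
    return schema_id, message[5:]
```

The recipient would then look up the returned ID in the registry, ideally caching the result, and decode the payload with that schema. The per-message cost is 5 bytes instead of the full schema text.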
![Page 16: Efficient Schemas in Motion with Kafka and Schema Registry](https://reader033.vdocuments.mx/reader033/viewer/2022051521/5a6ed43b7f8b9a42298b5881/html5/thumbnails/16.jpg)
Schema By Reference
Demo
![Page 17: Efficient Schemas in Motion with Kafka and Schema Registry](https://reader033.vdocuments.mx/reader033/viewer/2022051521/5a6ed43b7f8b9a42298b5881/html5/thumbnails/17.jpg)
Just register a new (compatible) schema under the same subject (topic)
Schema is assigned a new ID
Evolution with Schema Registry
![Page 18: Efficient Schemas in Motion with Kafka and Schema Registry](https://reader033.vdocuments.mx/reader033/viewer/2022051521/5a6ed43b7f8b9a42298b5881/html5/thumbnails/18.jpg)
Schema Evolution
Demo
![Page 19: Efficient Schemas in Motion with Kafka and Schema Registry](https://reader033.vdocuments.mx/reader033/viewer/2022051521/5a6ed43b7f8b9a42298b5881/html5/thumbnails/19.jpg)
Landoop schema-registry-ui
https://github.com/Landoop/schema-registry-ui
Bonus Feature: Web UI
![Page 20: Efficient Schemas in Motion with Kafka and Schema Registry](https://reader033.vdocuments.mx/reader033/viewer/2022051521/5a6ed43b7f8b9a42298b5881/html5/thumbnails/20.jpg)
Schema Evolution Part Deux
Demo
![Page 21: Efficient Schemas in Motion with Kafka and Schema Registry](https://reader033.vdocuments.mx/reader033/viewer/2022051521/5a6ed43b7f8b9a42298b5881/html5/thumbnails/21.jpg)
Conclusion
Avro: a row-oriented, self-describing format for data serialization
Avro's default of sending the schema along with the data is inefficient in a message-passing setting
Referencing schema by ID dramatically reduces the volume of network traffic