We will demonstrate the ability to both serialize and deserialize using Avro with a simple Contact schema.
The following is the schema definition for a language-independent Contact object:
Let us understand the Avro schema shown in Listing.1 above:
The schema is defined using JSON notation
The schema is defined in a file named Contact.avsc
The type field identifies the schema type. For a structured object such as Contact, the top-level type is record
The namespace field is similar to a Java package
The name field identifies the name of the data object (similar to a Java class name)
fields defines the actual schema layout. It is an array of fields, where each field is an object with a name and a type
For our simple Contact schema, we have defined the fields First, Last, Email, Work, Home, and Mobile
Notice the type definition for the fields Work, Home, and Mobile; it is defined as an array of two values ["string", "null"]. This is a union type, indicating that the value is either a string or a null
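The points above can be put together in a small sketch. The schema text below is reconstructed from the description in this section and is not copied from Contact.avsc: the namespace com.example.avro is a placeholder, and the class name ContactSchemaSketch is hypothetical. The schema is held in a Java string only to keep the example self-contained.

```java
class ContactSchemaSketch {
    // Schema text reconstructed from the description above.
    // The namespace value is a placeholder, not taken from the original file.
    static final String CONTACT_SCHEMA = """
        {
          "type": "record",
          "namespace": "com.example.avro",
          "name": "Contact",
          "fields": [
            {"name": "First",  "type": "string"},
            {"name": "Last",   "type": "string"},
            {"name": "Email",  "type": "string"},
            {"name": "Work",   "type": ["string", "null"]},
            {"name": "Home",   "type": ["string", "null"]},
            {"name": "Mobile", "type": ["string", "null"]}
          ]
        }
        """;

    public static void main(String[] args) {
        // With the Avro library on the classpath, this text could be handed to
        // org.apache.avro.Schema.Parser#parse(String) instead of being printed.
        System.out.print(CONTACT_SCHEMA);
    }
}
```

The three union-typed fields are the ones that may legitimately hold a null; First, Last, and Email are plain strings and must always be present.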
The following Java test class, Contact, demonstrates how to serialize and deserialize data (in both binary and JSON form) for the schema defined in Listing.1 using the Avro API, without code generation:
Compiling and executing the test code from Listing.2 produces the following output:
Output.1
=> Serialized binary data size: 44
=> Serialized json: {"First":"John","Last":"Doe","Email":"john.doe@space.com","Work":null,"Home":null,"Mobile":{"string":"123-456-7890"}}
=> Serialized json data size: 117
=> Binary deserialized record: John Doe - john.doe@space.com
=> Json deserialized record: John Doe - 123-456-7890
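The 44-byte figure can be checked by hand. Avro's binary encoding writes a string as a zigzag variable-length length prefix followed by its UTF-8 bytes, and a union as the zero-based index of the chosen branch followed by that branch's value; a null contributes no bytes of its own. The sketch below applies those rules using only the JDK, assuming the six fields are declared in the order listed earlier and the unions are written as ["string", "null"]. The class and method names are hypothetical, and this is a hand-rolled illustration, not a call into the Avro library.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class AvroBinarySketch {
    // Avro writes int/long values as zigzag-encoded variable-length integers.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);               // zigzag: small magnitudes stay small
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));    // low 7 bits plus continuation bit
            z >>>= 7;
        }
        out.write((int) z);
    }

    // A string is its UTF-8 byte length (as a long) followed by the bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Union ["string", "null"]: branch 0 is string, branch 1 is null.
    static void writeOptionalString(ByteArrayOutputStream out, String s) {
        if (s == null) {
            writeLong(out, 1);                       // null branch, no payload
        } else {
            writeLong(out, 0);                       // string branch, then the string
            writeString(out, s);
        }
    }

    // A record is the concatenation of its fields in declaration order.
    static byte[] encodeContact(String first, String last, String email,
                                String work, String home, String mobile) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, first);
        writeString(out, last);
        writeString(out, email);
        writeOptionalString(out, work);
        writeOptionalString(out, home);
        writeOptionalString(out, mobile);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encodeContact("John", "Doe", "john.doe@space.com",
                                     null, null, "123-456-7890");
        System.out.println("=> Hand-encoded binary data size: " + bytes.length); // 44
    }
}
```

The total breaks down as 5 + 4 + 19 bytes for First, Last, and Email, 1 byte each for the null Work and Home, and 14 bytes for Mobile. The JSON form is larger (117 bytes) because it repeats every field name, and it wraps the non-null union value as {"string":"123-456-7890"} to record which branch was chosen.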
We have successfully demonstrated our first Avro example.
Next, we will demonstrate how to serialize and deserialize data whose schema contains nested records. We will use a simple Customer schema in this example.
The following is the schema definition for a language-independent Customer object:
Notice the type definition for the field named Contacts; it is an inner record.
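An inner record is declared by giving a field a complete record definition as its type. The sketch below is reconstructed from the field names that appear in this example's output, not copied from the original listing: the namespace com.example.avro is a placeholder, the inner record's name (here Contacts, the same as the field) is an assumption, and the class name CustomerSchemaSketch is hypothetical.

```java
class CustomerSchemaSketch {
    // Schema text reconstructed from the description; the namespace and the
    // name of the inner record are assumptions made for illustration.
    static final String CUSTOMER_SCHEMA = """
        {
          "type": "record",
          "namespace": "com.example.avro",
          "name": "Customer",
          "fields": [
            {"name": "First", "type": "string"},
            {"name": "Last",  "type": "string"},
            {"name": "Contacts", "type": {
              "type": "record",
              "name": "Contacts",
              "fields": [
                {"name": "Email",  "type": "string"},
                {"name": "Work",   "type": ["string", "null"]},
                {"name": "Home",   "type": ["string", "null"]},
                {"name": "Mobile", "type": ["string", "null"]}
              ]
            }}
          ]
        }
        """;

    public static void main(String[] args) {
        // Printed here only to keep the sketch free of library dependencies.
        System.out.print(CUSTOMER_SCHEMA);
    }
}
```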
The following Java test class, Customer, demonstrates how to serialize and deserialize data (in both binary and JSON form) for the schema defined in Listing.3 using the Avro API, without code generation:
Compiling and executing the test code from Listing.4 produces the following output:
Output.2
=> Serialized binary data size: 44
=> Serialized json: {"First":"John","Last":"Doe","Contacts":{"Email":"john.doe@space.com","Work":null,"Home":null,"Mobile":{"string":"123-456-7890"}}}
=> Serialized json data size: 130
=> Binary deserialized record: John Doe - john.doe@space.com
=> Json deserialized record: John Doe - 123-456-7890
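The sizes in this output follow from the two encodings. In Avro's binary encoding a record is simply the concatenation of its fields' encodings, so moving Email, Work, Home, and Mobile into an inner record adds no bytes and the binary size stays at 44. The JSON text does grow, by the characters needed for the extra field name and braces. A quick check of the JSON figures using only the JDK (the class name JsonSizeCheck is hypothetical):

```java
import java.nio.charset.StandardCharsets;

class JsonSizeCheck {
    // JSON text from the flat Contact example.
    static final String FLAT = """
        {"First":"John","Last":"Doe","Email":"john.doe@space.com","Work":null,"Home":null,"Mobile":{"string":"123-456-7890"}}""";

    // JSON text from the nested Customer example.
    static final String NESTED = """
        {"First":"John","Last":"Doe","Contacts":{"Email":"john.doe@space.com","Work":null,"Home":null,"Mobile":{"string":"123-456-7890"}}}""";

    static int utf8Size(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Size(FLAT));                    // 117
        System.out.println(utf8Size(NESTED));                  // 130
        // The difference is the wrapper text: "Contacts":{ plus the closing }
        System.out.println(utf8Size("\"Contacts\":{") + 1);    // 13
    }
}
```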
We have successfully demonstrated our second Avro example, with a nested inner record.
Finally, we will demonstrate schema evolution, where the serializer uses version 1 of a schema and the deserializer uses an updated version 2 of that schema. We will use the Customer schema from the earlier example.
Avro supports schema evolution by allowing separate schemas for the serializer and the deserializer.
For our example, the serializer will use the schema from Listing.3.
The following is the updated schema definition for the Customer object:
The serializer will use the schema from Listing.3, while the deserializer will use the schema from Listing.5.
A schema can evolve as a result of:
New field(s) being added
Existing field(s) being renamed
Existing field(s) being deleted
Comparing Listing.3 with Listing.5, it is clear that the changes are in the inner record Contacts.
The field Zipcode has been added (a new field). Since the serialized data will not contain this field, what value should the deserialized record have? This is why the new field specifies the default attribute, here with an empty string as its value.
The field Mobile has been renamed to Primary. To indicate this change in the new schema, we specify the aliases attribute on Primary, listing the old name Mobile.
Similarly, the field Home has been renamed to Secondary.
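The effect of the default and aliases attributes can be imitated with plain Java maps. The sketch below is not Avro's resolver; it is a simplified stand-in that applies the same rules to the inner Contacts record: a reader field is filled from a written field of the same name, otherwise from a written field named by one of its aliases, otherwise from its default. Written fields that the reader does not declare are ignored. The ReaderField type, the resolve and demo methods, and the field order of version 2 are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ResolutionSketch {
    // Hypothetical stand-in for one field of the reader (version 2) schema.
    record ReaderField(String name, List<String> aliases,
                       boolean hasDefault, Object defaultValue) {}

    // Version 2 of the inner Contacts record as described above (order assumed).
    // In schema JSON the last three would carry attributes along the lines of
    //   "aliases": ["Mobile"], "aliases": ["Home"], and "default": "".
    static final List<ReaderField> CONTACTS_V2 = List.of(
        new ReaderField("Email", List.of(), false, null),
        new ReaderField("Work", List.of(), false, null),
        new ReaderField("Primary", List.of("Mobile"), false, null),
        new ReaderField("Secondary", List.of("Home"), false, null),
        new ReaderField("Zipcode", List.of(), true, ""));

    static Map<String, Object> resolve(Map<String, Object> written,
                                       List<ReaderField> readerFields) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (ReaderField field : readerFields) {
            String source = null;
            if (written.containsKey(field.name())) {
                source = field.name();                 // same name in both versions
            } else {
                for (String alias : field.aliases()) {
                    if (written.containsKey(alias)) {  // renamed field
                        source = alias;
                        break;
                    }
                }
            }
            if (source != null) {
                result.put(field.name(), written.get(source));
            } else if (field.hasDefault()) {
                result.put(field.name(), field.defaultValue()); // newly added field
            } else {
                throw new IllegalStateException(
                    "No value or default for field " + field.name());
            }
        }
        return result;
    }

    // Resolves the version 1 Contacts values used in the earlier examples.
    static Map<String, Object> demo() {
        Map<String, Object> v1 = new LinkedHashMap<>();
        v1.put("Email", "john.doe@space.com");
        v1.put("Work", null);
        v1.put("Home", null);
        v1.put("Mobile", "123-456-7890");
        return resolve(v1, CONTACTS_V2);
    }

    public static void main(String[] args) {
        Map<String, Object> v2 = demo();
        System.out.println("Primary: " + v2.get("Primary")
            + ", Zip: '" + v2.get("Zipcode") + "'");
    }
}
```

Without the default on Zipcode, the sketch throws, which mirrors the reason the attribute is needed: the reader has no other source for a value the writer never produced.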
The following Java test class, SchemaChange, demonstrates how to serialize using the schema in Listing.3 and deserialize using the schema in Listing.5 using the Avro API, without code generation:
Compiling and executing the test code from Listing.6 produces the following output:
Output.3
=> Serialized json (V1): {"First":"John","Last":"Doe","Contacts":{"Email":"john.doe@space.com","Work":null,"Home":null,"Mobile":{"string":"123-456-7890"}}}
=> Serialized json (V1) data size: 130
=> Json (V2) deserialized record: John Doe - 123-456-7890, Zip:
We have successfully demonstrated our final Avro example, on schema evolution.