Data Serialization Formats
Bhaskar S | 12/02/2010 |
Overview
In today's enterprises, we see heterogeneous computing environments with various applications running on different hardware and software platforms. In practice, we are often faced with the challenge of integrating some of these applications to solve a critical business need. To solve such integration problems, the current trend is to expose each well-defined business function as a service that lives and operates in its own environment, independent of the others. To invoke the business function, the service exposes a well-defined interface through which one can exchange information using a predefined message format.
For Java applications, this means exchange of object(s) by invoking one or more remote (over the network) services. In order to exchange Java object(s) over the network, Java object(s) need to be serialized by the invoker (client) and deserialized by the service provider (server).
In this article, we will look at some of the popular data serialization formats, namely XML, JSON, Java Serialization, and Hessian, and study their performance characteristics.
XML
XML stands for Extensible Markup Language. XML is a flexible, human-readable, text-based representation of structured information.
The following are some of the most important terms that are used to describe XML data:
Document (the entire XML data)
Tag (the name between the left angle bracket < and the right angle bracket >)
Start Tag (a left angle bracket <, a name, and a right angle bracket >)
End Tag (a left angle bracket <, a slash /, a name, and a right angle bracket >)
Element (a Start Tag, an End Tag and everything in between the tags)
Attribute (a name-value pair inside a Start Tag)
A well-formed XML document contains a single element, called the root element, which in turn contains all other elements.
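To make these terms concrete, here is a minimal sketch that parses a small document with the DOM parser bundled with the JDK and reads back the root element, an attribute, and a child element. The sample document and the class name XmlTermsDemo are made up for illustration and are not part of the tests later in this article.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class XmlTermsDemo {
    // Hypothetical sample: <range> is the root element, currency is an
    // attribute of its start tag, and <low>/<high> are child elements.
    static final String XML =
        "<range currency=\"USD\"><low>17.05</low><high>17.5</high></range>";

    // Parses the XML text and returns the root element of the document.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns the text content of the first child element with the given tag name.
    static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) {
        Element root = parseRoot(XML);
        System.out.println("Root tag:  " + root.getTagName());
        System.out.println("Attribute: " + root.getAttribute("currency"));
        System.out.println("low:       " + childText(root, "low"));
        System.out.println("high:      " + childText(root, "high"));
    }
}
```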
XML is completely language independent and has parsers available for most of the popular programming languages.
JSON
JSON stands for JavaScript Object Notation. JSON is a human-readable, lightweight, text-based object representation.
In JSON, an object is represented as a collection of name-value pairs enclosed in curly braces {}. JSON supports the following data types:
Boolean (true or false)
Numbers (Integer or Real)
String (Unicode characters enclosed in double quotes, with backslash escapes)
Array (An ordered sequence of comma separated values enclosed in square brackets [])
Null (null)
Though it has its roots in JavaScript, it is completely language independent and has parsers available for most of the popular programming languages.
JSON is simpler and more compact than XML. As a result, JSON is generally faster to parse.
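As a rough illustration of how the JSON data types map onto text, the following sketch renders Java maps, lists, strings, numbers, booleans, and null as JSON. It is a hand-written toy (the class name JsonSketch and its methods are our own invention), not how Gson or any other library works, and it does not guard against values such as NaN that JSON cannot represent.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class JsonSketch {
    // Renders a value as JSON text: Map -> object, List -> array,
    // String -> quoted string, Number/Boolean -> literal, null -> null.
    static String toJson(Object value) {
        if (value == null) return "null";
        if (value instanceof Boolean || value instanceof Number) return value.toString();
        if (value instanceof String) return quote((String) value);
        if (value instanceof List) {
            StringBuilder sb = new StringBuilder("[");
            for (Object item : (List<?>) value) {
                if (sb.length() > 1) sb.append(',');
                sb.append(toJson(item));
            }
            return sb.append(']').toString();
        }
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (sb.length() > 1) sb.append(',');
                sb.append(quote(String.valueOf(e.getKey())))
                  .append(':')
                  .append(toJson(e.getValue()));
            }
            return sb.append('}').toString();
        }
        throw new IllegalArgumentException("Unsupported type: " + value.getClass());
    }

    // Wraps a string in double quotes and escapes the characters JSON requires.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        Map<String, Object> stock = new LinkedHashMap<>();
        stock.put("symbol", "ABCD");
        stock.put("targetPrice", 25.0);
        stock.put("active", true);
        stock.put("prices", Arrays.asList(10.25, 11.5));
        stock.put("note", null);
        System.out.println(toJson(stock));
    }
}
```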
Java Serialization
Java Serialization is a mechanism that is built into the core Java library that allows any Java object to be represented as a binary sequence of bytes. For a Java object to be serialized, that object must implement either the Serializable or the Externalizable interface.
Since Java Serialization is a built-in mechanism, it supports all the data types supported in Java, including objects.
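A minimal round trip using the standard ObjectOutputStream and ObjectInputStream classes might look like the following sketch. The Quote class and the helper names are hypothetical and exist only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class JavaSerializationSketch {
    // Hypothetical sample class; implementing Serializable is all that is required.
    static class Quote implements Serializable {
        private static final long serialVersionUID = 1L;
        final String symbol;
        final float price;

        Quote(String symbol, float price) {
            this.symbol = symbol;
            this.price = price;
        }
    }

    // Writes the object graph rooted at obj into a byte array.
    static byte[] serialize(Serializable obj) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rebuilds the object graph from the bytes produced by serialize().
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(new Quote("ABCD", 17.25f));
        Quote copy = (Quote) deserialize(data);
        System.out.println("Size in bytes: " + data.length);
        System.out.println(copy.symbol + ", " + copy.price);
    }
}
```

Every stream produced this way begins with the bytes ac ed 00 05 (a magic number followed by the stream version), which is also how the Java Serialization output shown later in this article begins.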
Hessian
Hessian (pronounced Hesh-en) is a simple and compact binary object representation.
Hessian supports the following data types:
Boolean ('T' or 'F')
Int (32-bit signed integer)
Long (64-bit signed long)
Double (64-bit double)
Date (time in UTC as milliseconds since epoch)
String (UTF-8 encoded)
Null ('N')
List (for lists and arrays)
Map (for maps and dictionaries)
Object (for objects)
Ref (for shared and circular object references)
Raw binary data
Hessian is both self-describing and portable across most of the popular programming languages.
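Part of the compactness comes from variable-length encodings for small values. The following sketch hand-codes the 32-bit integer encoding as we read it in the Hessian 2.0 serialization specification; the byte ranges are our reading of that specification and should be checked against it, and the class HessianIntSketch is hypothetical rather than part of the Caucho library.

```java
class HessianIntSketch {
    // Encodes a 32-bit int using the shortest form we believe Hessian 2.0 allows.
    static byte[] encodeInt(int value) {
        if (value >= -0x10 && value <= 0x2f) {
            // One octet: 0x80..0xbf, where 0x90 represents zero.
            return new byte[] { (byte) (0x90 + value) };
        }
        if (value >= -0x800 && value <= 0x7ff) {
            // Two octets: leading byte 0xc0..0xcf, where 0xc8 represents zero.
            return new byte[] { (byte) (0xc8 + (value >> 8)), (byte) value };
        }
        if (value >= -0x40000 && value <= 0x3ffff) {
            // Three octets: leading byte 0xd0..0xd7, where 0xd4 represents zero.
            return new byte[] {
                (byte) (0xd4 + (value >> 16)), (byte) (value >> 8), (byte) value
            };
        }
        // Full form: the tag 'I' followed by four big-endian bytes.
        return new byte[] {
            (byte) 'I', (byte) (value >> 24), (byte) (value >> 16),
            (byte) (value >> 8), (byte) value
        };
    }

    // Formats bytes as space-separated lowercase hex for display.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (int v : new int[] { 0, 5, 47, 1000, 262143, 262144 }) {
            System.out.println(v + " -> " + hex(encodeInt(v)));
        }
    }
}
```

Under this reading, the value 5 occupies the single byte 95, which is the byte that follows the Stock class name (a class with five fields) in the Hessian output later in this article.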
Performance
Now that we have briefly described some of the popular data serialization formats, let us study the performance characteristics of each of those formats.
To study the performance characteristics, we will use a fictitious object called Stock that represents the market data price (stock price) of a fictitious symbol.
The following Java object Range encapsulates the low and high price of a stock:
The following Java object Price encapsulates the open, close, bid, and ask price of a stock in addition to the volume and the date. Note that the Price object uses the Range object as its member:
And, finally the following Java object Stock encapsulates all the necessary market data details pertaining to the fictitious symbol. Note that the Stock object uses the Range object as well as an array of Price objects as its members:
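The listings for Range, Price, and Stock do not appear in this copy of the article. The following is a sketch reconstructed from the test output shown later: the field names follow the XML and JSON output, and the field types (float for prices, long for volume) and the serialVersionUID of 1 follow the Java Serialization dump. The constructors, accessors, and toString() methods are our own assumptions and may differ from the original classes; the package declaration (com.polarsparc.data.interchange in the output) is omitted so the sketch fits in one file.

```java
import java.io.Serializable;
import java.util.Date;

// Low and high price of a stock.
class Range implements Serializable {
    private static final long serialVersionUID = 1L;
    private final float low;
    private final float high;

    Range(float low, float high) { this.low = low; this.high = high; }
    float getLow() { return low; }
    float getHigh() { return high; }
    @Override public String toString() { return low + ", " + high; }
}

// Open, close, bid, and ask price plus volume, date, and the price range.
class Price implements Serializable {
    private static final long serialVersionUID = 1L;
    private final float open;
    private final float close;
    private final float bid;
    private final float ask;
    private final long volume;
    private final Date date;
    private final Range range;

    Price(float open, float close, float bid, float ask,
          long volume, Date date, Range range) {
        this.open = open; this.close = close; this.bid = bid; this.ask = ask;
        this.volume = volume; this.date = date; this.range = range;
    }
    long getVolume() { return volume; }
    Range getRange() { return range; }
    @Override public String toString() {
        return open + ", " + close + ", " + bid + ", " + ask + ", "
            + volume + ", " + date + ", " + range;
    }
}

// Market data for one symbol: today's price, 52-week range, and yearly prices.
class Stock implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String symbol;
    private final float targetPrice;
    private final Price todaysPrice;
    private final Range fiftyTwoWeekRange;
    private final Price[] fiveYearPrices;

    Stock(String symbol, float targetPrice, Price todaysPrice,
          Range fiftyTwoWeekRange, Price[] fiveYearPrices) {
        this.symbol = symbol; this.targetPrice = targetPrice;
        this.todaysPrice = todaysPrice;
        this.fiftyTwoWeekRange = fiftyTwoWeekRange;
        this.fiveYearPrices = fiveYearPrices;
    }
    String getSymbol() { return symbol; }
    Price[] getFiveYearPrices() { return fiveYearPrices; }
    @Override public String toString() {
        StringBuilder sb = new StringBuilder();
        sb.append(symbol).append(", ").append(targetPrice).append(", ")
          .append(todaysPrice).append('\n');
        sb.append(fiftyTwoWeekRange).append('\n');
        for (Price p : fiveYearPrices) {
            sb.append('<').append(p).append(">\n");
        }
        return sb.toString();
    }
}
```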
We will use the Stock object listed above in our study of the performance characteristics of the above mentioned serialization formats, namely, XML, JSON, Java Serialization, and Hessian.
In our tests, we will measure the data size, the time for 1000 serialize operations, and the time for 1000 deserialize operations. So without further ado, let's get started.
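The three measurements can be captured with a small harness along the following lines. This is a sketch of the general approach, not the code used to produce the numbers in this article: the TimingSketch class, its Result type, and the measure() method are hypothetical, and a UTF-8 string codec stands in for a real serializer. Simple wall-clock loops like this include JIT warm-up, so the figures are indicative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

class TimingSketch {
    // Holds the three figures reported for each format.
    static final class Result {
        final int size;
        final long serializeMillis;
        final long deserializeMillis;

        Result(int size, long serializeMillis, long deserializeMillis) {
            this.size = size;
            this.serializeMillis = serializeMillis;
            this.deserializeMillis = deserializeMillis;
        }
    }

    // Times `iterations` serialize calls, then `iterations` deserialize calls.
    static <T> Result measure(T value,
                              Function<T, byte[]> serializer,
                              Function<byte[], T> deserializer,
                              int iterations) {
        byte[] data = serializer.apply(value); // untimed pass to obtain the size

        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            data = serializer.apply(value);
        }
        long serializeMillis = (System.nanoTime() - start) / 1_000_000;

        start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            deserializer.apply(data);
        }
        long deserializeMillis = (System.nanoTime() - start) / 1_000_000;

        return new Result(data.length, serializeMillis, deserializeMillis);
    }

    public static void main(String[] args) {
        // Stand-in codec: a String to and from its UTF-8 bytes.
        Result r = TimingSketch.<String>measure(
            "ABCD",
            s -> s.getBytes(StandardCharsets.UTF_8),
            b -> new String(b, StandardCharsets.UTF_8),
            1000);
        System.out.println("Data size: " + r.size);
        System.out.println("Time for 1000 serialize operations: " + r.serializeMillis + " ms");
        System.out.println("Time for 1000 deserialize operations: " + r.deserializeMillis + " ms");
    }
}
```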
XML Serialization
For XML serialization, we will use the popular and flexible open source framework called XStream (http://xstream.codehaus.org/).
The following is the code listing for the Java class called XmlTest that performs the XML serialization tests using XStream:
Executing the Java class XmlTest produces the following results:
XML Output:
<Stock>
<symbol>ABCD</symbol>
<targetPrice>25.0</targetPrice>
<todaysPrice>
<open>17.25</open>
<close>17.2</close>
<bid>17.27</bid>
<ask>17.25</ask>
<volume>1000</volume>
<date>1975-02-01 10:32:45.761 EST</date>
<range>
<low>17.05</low>
<high>17.5</high>
</range>
</todaysPrice>
<fiftyTwoWeekRange>
<low>10.25</low>
<high>19.25</high>
</fiftyTwoWeekRange>
<fiveYearPrices>
<Price>
<open>10.25</open>
<close>11.5</close>
<bid>10.5</bid>
<ask>10.5</ask>
<volume>100</volume>
<date>1970-03-01 10:32:45.762 EST</date>
<range>
<low>10.25</low>
<high>11.5</high>
</range>
</Price>
<Price>
<open>11.25</open>
<close>12.5</close>
<bid>11.5</bid>
<ask>11.5</ask>
<volume>200</volume>
<date>1971-05-02 10:32:45.762 EDT</date>
<range>
<low>11.25</low>
<high>12.5</high>
</range>
</Price>
<Price>
<open>12.25</open>
<close>13.5</close>
<bid>12.5</bid>
<ask>12.5</ask>
<volume>300</volume>
<date>1972-07-03 10:32:45.762 EDT</date>
<range>
<low>12.25</low>
<high>13.5</high>
</range>
</Price>
<Price>
<open>14.25</open>
<close>15.5</close>
<bid>14.5</bid>
<ask>14.5</ask>
<volume>400</volume>
<date>1973-09-04 10:32:45.762 EDT</date>
<range>
<low>14.25</low>
<high>15.5</high>
</range>
</Price>
<Price>
<open>15.25</open>
<close>16.5</close>
<bid>15.5</bid>
<ask>15.5</ask>
<volume>700</volume>
<date>1974-11-05 10:32:45.762 EST</date>
<range>
<low>15.25</low>
<high>16.5</high>
</range>
</Price>
</fiveYearPrices>
</Stock>
XML Data size: 1837
Time for 1000 xml serialize operations: 973 ms
Stock Output:
ABCD, 25.0, 17.25, 17.2, 17.27, 17.25, 1000, Sat Feb 01 10:32:45 EST 1975, 17.05, 17.5
10.25, 19.25
<10.25, 11.5, 10.5, 10.5, 100, Sun Mar 01 10:32:45 EST 1970, 10.25, 11.5>
<11.25, 12.5, 11.5, 11.5, 200, Sun May 02 10:32:45 EDT 1971, 11.25, 12.5>
<12.25, 13.5, 12.5, 12.5, 300, Mon Jul 03 10:32:45 EDT 1972, 12.25, 13.5>
<14.25, 15.5, 14.5, 14.5, 400, Tue Sep 04 10:32:45 EDT 1973, 14.25, 15.5>
<15.25, 16.5, 15.5, 15.5, 700, Tue Nov 05 10:32:45 EST 1974, 15.25, 16.5>
Time for 1000 xml deserialize operations: 1154 ms
From the XML serialization test, we gather that the data size is about 1840 bytes, the time for 1000 serialize operations is about 970 ms, and the time for 1000 deserialize operations is about 1150 ms.
JSON Serialization
For JSON serialization, we will use the popular and flexible open source framework called Google Gson (http://code.google.com/p/google-gson/).
The following is the code listing for the Java class called JsonTest that performs the JSON serialization tests using Google Gson:
Executing the Java class JsonTest produces the following results:
JSON Output:
{"symbol":"ABCD","targetPrice":25.0,"todaysPrice":{"open":17.25,"close":17.2,"bid":17.27,
"ask":17.25,"volume":1000,"date":"Feb 1, 1975 10:35:52 AM","range":{"low":17.05,"high":17.5}},
"fiftyTwoWeekRange":{"low":10.25,"high":19.25},"fiveYearPrices":[{"open":10.25,"close":11.5,
"bid":10.5,"ask":10.5,"volume":100,"date":"Mar 1, 1970 10:35:52 AM","range":{"low":10.25,
"high":11.5}},{"open":11.25,"close":12.5,"bid":11.5,"ask":11.5,"volume":200,
"date":"May 2, 1971 10:35:52 AM","range":{"low":11.25,"high":12.5}},{"open":12.25,
"close":13.5,"bid":12.5,"ask":12.5,"volume":300,"date":"Jul 3, 1972 10:35:52 AM",
"range":{"low":12.25,"high":13.5}},{"open":14.25,"close":15.5,"bid":14.5,"ask":14.5,
"volume":400,"date":"Sep 4, 1973 10:35:52 AM","range":{"low":14.25,"high":15.5}},
{"open":15.25,"close":16.5,"bid":15.5,"ask":15.5,"volume":700,"date":"Nov 5, 1974 10:35:52 AM",
"range":{"low":15.25,"high":16.5}}]}
JSON Data size: 899
Time for 1000 json serialize operations: 865 ms
Stock Output:
ABCD, 25.0, 17.25, 17.2, 17.27, 17.25, 1000, Sat Feb 01 10:35:52 EST 1975, 17.05, 17.5
10.25, 19.25
<10.25, 11.5, 10.5, 10.5, 100, Sun Mar 01 10:35:52 EST 1970, 10.25, 11.5>
<11.25, 12.5, 11.5, 11.5, 200, Sun May 02 10:35:52 EDT 1971, 11.25, 12.5>
<12.25, 13.5, 12.5, 12.5, 300, Mon Jul 03 10:35:52 EDT 1972, 12.25, 13.5>
<14.25, 15.5, 14.5, 14.5, 400, Tue Sep 04 10:35:52 EDT 1973, 14.25, 15.5>
<15.25, 16.5, 15.5, 15.5, 700, Tue Nov 05 10:35:52 EST 1974, 15.25, 16.5>
Time for 1000 json deserialize operations: 921 ms
From the JSON serialization test, we gather that the data size is about 900 bytes, the time for 1000 serialize operations is about 865 ms, and the time for 1000 deserialize operations is about 920 ms.
Java Serialization
The following is the code listing for the Java class called JavaTest that performs the Java Serialization tests:
Executing the Java class JavaTest produces the following results:
Java Serialization Output:
ac ed 00 05 sr 00 %com.polarsparc.data.interchange.Stock 00 00 00 00 00 00 00 01 02
00 05 F 00 0b targetPriceL 00 11 fiftyTwoWeekRanget 00
'Lcom/polarsparc/data/interchange/Range;[ 00 0e fiveYearPricest 00
([Lcom/polarsparc/data/interchange/Price;L 00 06 symbolt 00 12 Ljava/lang/String;L 00 0b
todaysPricet 00 'Lcom/polarsparc/data/interchange/Price;xpA c8 00 00 sr 00
%com.polarsparc.data.interchange.Range 00 00 00 00 00 00 00 01 02 00 02 F 00 04
highF 00 03 lowxpA 9a 00 00 A$ 00 00 ur 00 ([Lcom.polarsparc.data.interchange.Price; f9
ae 3 ca ae a5 de af 02 00 00 xp 00 00 00 05 sr 00
%com.polarsparc.data.interchange.Price 00 00 00 00 00 00 00 01 02 00 07 F 00 03
askF 00 03 bidF 00 05 closeF 00 04 openJ 00 06 volumeL 00 04 datet 00 10
Ljava/util/Date;L 00 05 rangeq 00 ~ 00 01 xpA( 00 00 A( 00 00 A8 00 00 A$ 00 00 00
00 00 00 00 00 00 dsr 00 0e java.util.Datehj 81 01 KYt 19 03 00 00 xpw 08 00 00
00 01 3308xsq 00 ~ 00 06 A8 00 00 A$ 00 00 sq 00 ~ 00 0a A8 00 00 A8 00 00 AH 00 00
A4 00 00 00 00 00 00 00 00 00 c8 sq 00 ~ 00 0d w 08 00 00 00 09 c9 f7 b5
b8 xsq 00 ~ 00 06 AH 00 00 A4 00 00 sq 00 ~ 00 0a AH 00 00 AH 00 00 AX 00 00 AD 00
00 00 00 00 00 00 00 01 ,sq 00 ~ 00 0d w 08 00 00 00 12 f 19 85 b8 xsq 00 ~ 00
06 AX 00 00 AD 00 00 sq 00 ~ 00 0a Ah 00 00 Ah 00 00 Ax 00 00 Ad 00 00 00 00 00
00 00 00 01 90 sq 00 ~ 00 0d w 08 00 00 00 1b 02 ;U b8 xsq 00 ~ 00 06 Ax 00 00
Ad 00 00 sq 00 ~ 00 0a Ax 00 00 Ax 00 00 A 84 00 00 At 00 00 00 00 00 00 00
00 02 bc sq 00 ~ 00 0d w 08 00 00 00 # 99 m b8 8xsq 00 ~ 00 06 A 84 00 00 At 00
00 t 00 04 ABCDsq 00 ~ 00 0a A 8a 00 00 A 8a ( f6 A 89 99 9a A 8a 00 00 00 00
00 00 00 00 03 e8 sq 00 ~ 00 0d w 08 00 00 00 %^ 9d X7xsq 00 ~ 00 06 A 8c 00
00 A 88 ff
Java Serialization Data size: 941
Time for 1000 java serialize operations: 178 ms
Stock Output:
ABCD, 25.0, 17.25, 17.2, 17.27, 17.25, 1000, Sat Feb 01 10:39:19 EST 1975, 17.05, 17.5
10.25, 19.25
<10.25, 11.5, 10.5, 10.5, 100, Sun Mar 01 10:39:19 EST 1970, 10.25, 11.5>
<11.25, 12.5, 11.5, 11.5, 200, Sun May 02 10:39:19 EDT 1971, 11.25, 12.5>
<12.25, 13.5, 12.5, 12.5, 300, Mon Jul 03 10:39:19 EDT 1972, 12.25, 13.5>
<14.25, 15.5, 14.5, 14.5, 400, Tue Sep 04 10:39:19 EDT 1973, 14.25, 15.5>
<15.25, 16.5, 15.5, 15.5, 700, Tue Nov 05 10:39:19 EST 1974, 15.25, 16.5>
Time for 1000 java deserialize operations: 314 ms
From the Java serialization test, we gather that the data size is about 940 bytes, the time for 1000 serialize operations is about 180 ms, and the time for 1000 deserialize operations is about 310 ms.
Hessian Serialization
For Hessian serialization, we will use the popular and flexible open source framework called Caucho Hessian (http://hessian.caucho.com/).
The following is the code listing for the Java class called HessianTest that performs the Hessian serialization tests using Caucho Hessian:
Executing the Java class HessianTest produces the following results:
Hessian Output:
p 02 00 C0%com.polarsparc.data.interchange.Stock 95 06 symbol 0b targetPrice 0b todaysPrice
11 fiftyTwoWeekRange 0e fiveYearPrices` 04 ABCD] 19 C0%com.polarsparc.data.interchange.Price
97 04 open 05 close 03 bid 03 ask 06 volume 04 date 05 rangea_ 00 00 CbD@133@ 00 00 00
D@1E 1e c0 00 00 00 _ 00 00 Cb fb e8 J 00 00 00 %^ a1 ab e2
C0%com.polarsparc.data.interchange.Range 92 03 low 04 highbD@1 0c cc c0 00 00 00 _ 00
00 D\b_ 00 00 ( 0a _ 00 00 K2u0&[com.polarsparc.data.interchange.Pricea_ 00 00 ( 0a _ 00
00 , ec _ 00 00 ) 04 _ 00 00 ) 04 f8 dJ 00 00 00 01 37 83 e4 b_ 00 00 ( 0a _ 00 00 ,
ec a_ 00 00 + f2 _ 00 00 0 d4 _ 00 00 , ec _ 00 00 , ec f8 c8 J 00 00 00 09 c9 fc
09 db_ 00 00 + f2 _ 00 00 0 d4 a_ 00 00 / da _ 00 00 4 bc _ 00 00 0 d4 _ 00 00 0 d4 f9
,J 00 00 00 12 f 1d d9 db_ 00 00 / da _ 00 00 4 bc a_ 00 00 7 aa _ 00 00 < 8c _ 00
00 8 a4 _ 00 00 8 a4 f9 90 J 00 00 00 1b 02 ? a9 db_ 00 00 7 aa _ 00 00 < 8c a_ 00
00 ; 92 _ 00 00 @t_ 00 00 < 8c _ 00 00 < 8c fa bc J 00 00 00 # 99 r 0b e4 b_ 00 00 ;
92 _ 00 00 @tz
Hessian Data size: 567
Time for 1000 hessian serialize operations: 98 ms
Stock Output:
ABCD, 25.0, 17.25, 17.2, 17.27, 17.25, 1000, Sat Feb 01 10:44:03 EST 1975, 17.05, 17.5
10.25, 19.25
<10.25, 11.5, 10.5, 10.5, 100, Sun Mar 01 10:44:03 EST 1970, 10.25, 11.5>
<11.25, 12.5, 11.5, 11.5, 200, Sun May 02 10:44:03 EDT 1971, 11.25, 12.5>
<12.25, 13.5, 12.5, 12.5, 300, Mon Jul 03 10:44:03 EDT 1972, 12.25, 13.5>
<14.25, 15.5, 14.5, 14.5, 400, Tue Sep 04 10:44:03 EDT 1973, 14.25, 15.5>
<15.25, 16.5, 15.5, 15.5, 700, Tue Nov 05 10:44:03 EST 1974, 15.25, 16.5>
Time for 1000 hessian deserialize operations: 148 ms
From the Hessian serialization test, we gather that the data size is about 570 bytes, the time for 1000 serialize operations is about 100 ms, and the time for 1000 deserialize operations is about 150 ms.
Summary
Now to summarize our test results:
XML :: Size (in bytes) = 1840, Serialization (ms) = 970, Deserialization (ms) = 1150
JSON :: Size (in bytes) = 900, Serialization (ms) = 865, Deserialization (ms) = 920
Java :: Size (in bytes) = 940, Serialization (ms) = 180, Deserialization (ms) = 310
Hessian :: Size (in bytes) = 570, Serialization (ms) = 100, Deserialization (ms) = 150
From the test results, it is clear that, among the four formats tested, Hessian is the most efficient for object serialization in terms of both size and speed.
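To put the differences in proportion, the following small sketch expresses each format's figures as a multiple of the Hessian figures. The numbers are the rounded values from the summary above; the class name SummaryRatios is our own.

```java
class SummaryRatios {
    static final String[] FORMATS = { "XML", "JSON", "Java", "Hessian" };

    // Rounded figures from the summary: { size in bytes, serialize ms, deserialize ms }.
    static final int[][] RESULTS = {
        { 1840, 970, 1150 },
        { 900, 865, 920 },
        { 940, 180, 310 },
        { 570, 100, 150 },
    };

    // Returns how many times larger value is than baseline.
    static double ratio(int value, int baseline) {
        return (double) value / baseline;
    }

    public static void main(String[] args) {
        int[] hessian = RESULTS[3];
        for (int i = 0; i < FORMATS.length; i++) {
            System.out.printf("%-8s size x%.2f, serialize x%.2f, deserialize x%.2f%n",
                FORMATS[i],
                ratio(RESULTS[i][0], hessian[0]),
                ratio(RESULTS[i][1], hessian[1]),
                ratio(RESULTS[i][2], hessian[2]));
        }
    }
}
```

For example, the XML payload is a little over three times the size of the Hessian payload (1840 / 570), and XML serialization takes almost ten times as long (970 / 100).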
Note that the XML and JSON serialization formats are human-readable and text-based, while Java and Hessian serialization are binary. If you prefer text-based serialization, then JSON may be the better choice. On the other hand, if you prefer a more compact binary serialization, then Hessian may be the way to go.
NOTE :: For Java Serialization, we could have implemented the Externalizable interface to optimize the size and speed of serialization.
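As a sketch of what that could look like, the following Range variant implements Externalizable and writes its two float fields by hand, so the stream no longer needs to carry the field names and types. This was not part of the tests above; the class and helper names are hypothetical, and any size or speed gain would need to be measured.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

class ExternalizableSketch {
    // Hypothetical Externalizable version of the Range object.
    public static class Range implements Externalizable {
        private float low;
        private float high;

        // Externalizable requires a public no-argument constructor.
        public Range() { }

        public Range(float low, float high) {
            this.low = low;
            this.high = high;
        }

        public float getLow() { return low; }
        public float getHigh() { return high; }

        // Writes only the raw field values, in a fixed order.
        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeFloat(low);
            out.writeFloat(high);
        }

        // Reads the field values back in the same order they were written.
        @Override
        public void readExternal(ObjectInput in) throws IOException {
            low = in.readFloat();
            high = in.readFloat();
        }
    }

    static byte[] serialize(Object obj) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object deserialize(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(new Range(17.05f, 17.5f));
        Range copy = (Range) deserialize(data);
        System.out.println("Size in bytes: " + data.length);
        System.out.println(copy.getLow() + ", " + copy.getHigh());
    }
}
```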