CSC/ECE 517 Spring 2014/ch1a 1d mm: Difference between revisions

From Expertiza_Wiki
(Created page with "== '''Serialization''' == '''Serialization'''[http://en.wikipedia.org/wiki/Serialization] is the process of converting an object or a group of objects into a stream of bytes or...")
 
No edit summary
Line 2: Line 2:




'''Serialization'''[http://en.wikipedia.org/wiki/Serialization] is the process of converting an object or a group of objects into a stream of bytes or string to facilitate storage in memory or transmission over a network. The process of Serialization is also referred to as ''Marshalling''. The stream of data has to be in a format that can be understood by both ends of a communication channel so that the object can be marshaled and reconstructed easily.  
'''Serialization'''[http://en.wikipedia.org/wiki/Serialization] is a process of converting a data structure or an object into a stream of bytes or string to facilitate storage in memory, file(persistence storage) or transmission over a network. The process of Serialization is also referred to as ''Marshalling''[http://en.wikipedia.org/wiki/Marshalling_%28computer_science%29]. The stream of data has to be in a format that can be understood by both ends of a communication channel so that the object can be marshaled and reconstructed easily.  


'''Basic Advantages of Serialization''':
'''Basic Advantages of Serialization''':


1.Facilitates the easy transportation of an object through a network.
1. Communication between two or more processes on same machine. Object state can be saved and shared in a persistent or in-memory store.


2.Creates a clone of the object at the receiving end.
2. Communication between processes on different machines. Serialization facilitates the transmission of an object through a network.


3.Ability to transmit data across the network in a cross-platform-compatible format.
3. Creating a clone of an object.
 
4. Cross-platform compatibility. Object can be serialized in a common format that is understood by multiple platforms. Eg. JSON, XML.


4.Saving the data in a persistent or non-persistent storage medium in a non-proprietary format.


'''De-serialization''' is the process of converting the stream of bytes or string back to objects in memory. It is the process of reconstructing the object later.This process of de-serialization is also referred to as ''Unmarshalling''.
'''De-serialization''' is the process of converting the stream of bytes or string back to objects in memory. It is the process of reconstructing the object later.This process of de-serialization is also referred to as ''Unmarshalling''.
'''Few Practical Applications for Serialization'''
1. HTTP Session Replication by sharing session objects across web servers for handling failover scenarios
2. Serialization facilitates communication in Remote Method Invocation or Remote procedure calls
3. Rails Cookie Handling[http://techbrahmana.blogspot.com/2012/03/rails-cookie-handling-serialization-and.html]. Cookies are stored marshalled/unmarshalled to and from client machines.


[[File:SerializationChart.jpg]]
[[File:SerializationChart.jpg]]
Line 20: Line 31:
== '''Serialization in Ruby:''' ==
== '''Serialization in Ruby:''' ==


Let us consider a situation where two Ruby programs have to communicate with each other. One of the simplest way to do this is to convert the Ruby objects in the first programs into strings and writing these strings into a file. This is nothing but serialization. The second program can read this file and convert the strings back into Ruby objects. This is de-serialization.
Ruby provides serialization capabilities through its module, [http://www.ruby-doc.org/core-2.0.0/Marshal.html Marshal]. There are also other libraries like YAML and JSON which can be used in Ruby to generate serialized objects for purposes like platform independence and human readable formats.


== '''Types of Serialization''' ==
== '''Types of Serialization''' ==
Line 31: Line 42:
==== Converting Ruby Objects to YAML format ====
==== Converting Ruby Objects to YAML format ====


YAML[http://yaml.org/] format is a human friendly data serialization standard for all programming languages. YAML (YAML Ain't Markup Language) is perhaps the most common form of serialization in Ruby applications. It is used for configuration files in Rails and other projects, and is nearly ubiquitous. YAML is a plaintext format, as opposed to Marshal's[http://www.ruby-doc.org/core-2.0.0/Marshal.html] binary format. Immediately, this makes things easier. Objects stored as YAML are completely transparent and editable with nothing more than a text editor. It also has a simple, spartan syntax that's easy to look at and easy to type. It is not encumbered by excessive wordage and symbols seen in XML. Any Ruby object can easily be serialized into YAML format. Let us consider the below code,<
YAML[http://yaml.org/] format is a human friendly data serialization standard for all programming languages. YAML (YAML Ain't Markup Language) is perhaps the most common form of serialization in Ruby applications. It is used for configuration files in Rails and other projects, and is nearly ubiquitous. YAML is a plaintext format, as opposed to Marshal's[http://www.ruby-doc.org/core-2.0.0/Marshal.html] binary format. Objects stored as YAML are completely transparent and editable with nothing more than a text editor. It also has a simple, spartan syntax that's easy to look at and easy to type. It is not encumbered by excessive wordage and symbols seen in XML. In order to use it in Ruby, the yaml.rb file is required to be loaded which provides methods for converting objects into yaml format and creating .yml files.


     <nowiki>require "yaml"
'''Examples of serialization using YAML:'''
 
     <nowiki> #Serialization using YAML's to_yaml method
  require "yaml"
     class First
     class First
     def initialize(name, age, country)
     def initialize(name, age, country)
Line 40: Line 55:
@country=country
@country=country
     end
     end
 
      def to_s
    def to_s
"In First:\n#{@name}, #{@age}, #{@country}\n"
"In First:\n#{@name}, #{@age}, #{@country}\n"
    end
  end
  class Second
    def initialize(address, details)
@address = address
@details = details
    end
    def to_s
"In Second:\n#{@details.to_s}#{@address}\n"
     end
     end
   end
   end
   
   
   x = First.new("Tom", 25, "USA")
   x = First.new("Tom", 25, "USA")
   y = Second.new("St. Marks Street", x)
   puts x
   puts y</nowiki>
   puts x.to_yaml</nowiki>
   
   
We get the string representation of the object tree(object hierarchy) as the Output (because we have used the function to_s[http://ruby-doc.org/core-2.0.0/Object.html#method-i-to_s]).
 
   
   
'''Output''':
'''Output''':
   
   
<code>
<code>
In Second:<br>
  In First:
In First:<br>
  Tom, 25, USA
Tom, 25, USA<br>
  --- !ruby/object:First
St. Marks Street</code>
  name: Tom
  age: 25
  country: USA
</code>
   
   
We use the below code to serialize out object tree.
The above code displays the object x, first as a string and then in the yaml format.
 
'''Saving YAML data into a file:'''
 
<nowiki> # Serialization using YAML::dump
 
require 'yaml'
 
f = File.open( 'first.yml', 'w' )
YAML.dump( ["Tom", 25, "USA"], f )
f.close
File.open( 'first.yml' ){ |f|
    $arr= YAML.load(f)
}
 
p( $arr )</nowiki>
 
   
   
<code>serialized_object = YAML::dump(y)
'''Output''':
puts serialized_object</code>
 
<code>
  ["Tom", 25, "USA"]
</code>
 
 
The dump function can be used to serialize the data and save it into a file in YAML format. As shown in the above example the data in YAML format can be de-serialized using the load function.
 
YAML libraries also provides an option of selecting only those variables of the object that are needed to be serialized. This is done using the to_yaml_properties method as shown in the below example.
 
 
  <nowiki>  #Custom Serialization using YAML
   
   
The dump function serializes the object tree and stores the data in the YAML format in the variable serialized_object.
  require "yaml"
    class First
    def initialize(name, age, country)
@name = name
@age = age
@country=country
    end
    def to_s
"In First:\n#{@name}, #{@age}, #{@country}\n"
    end
      def to_yaml_properties
  ["@name","@age"]  #@country will not be serialized
    end
  end
   
   
Data in the serialized (YAML) format looks like this:
  x = First.new("Tom", 25, "USA")
  puts x
<code>
  puts x.to_yaml
--- !ruby/object:Second
  </nowiki>
address: St. Marks Street
details: !ruby/object:First
name: Tom
age: 25
country: USA</code>
Now, to de-serialize the data, we use load function.
<code>puts YAML::load(serialized_object)</code>
The data is converted back to Ruby object tree.


'''Output:'''
'''Output''':


<code>In Second:
<code>
In First:
  In First:
Tom, 25, USA
  Tom, 25, USA
St. Marks Street
  --- !ruby/object:First
</code>  
  name: Tom
  age: 25
</code>


Thus we get back our original Object tree.


==== Converting Ruby Objects to JSON format: ====
==== Converting Ruby Objects to JSON format: ====


   
   
JSON[http://www.json.org/] is a light-weight data interchange format. JSON is typically generated by web applications and can be quite daunting, with deep hierarchies that are difficult to navigate. Any Ruby object can easily be serialized into JSON format. On Ruby 1.8.7, you'll need to install a gem. However, in Ruby 1.9.2, the json gem is bundled with the core Ruby distribution. So, if you're using 1.9.2, you're probably all set. If you're on 1.8.7, you'll need to install a gem.[http://ruby.about.com/od/tasks/a/The-Json-Gem.htm]
JSON[http://www.json.org/] is a light-weight data interchange format. JSON is typically generated by web applications and can be quite daunting, with deep hierarchies that are difficult to navigate. Any Ruby object can easily be serialized into JSON format. Ruby 1.8.7 distribution is not bundled with json gem. However, in Ruby 1.9.2, the json gem is bundled with the core Ruby distribution.
The JSON library can be installed using Ruby Gems[http://rubygems.org/] like shown below:
The JSON library can be installed using Ruby Gems[http://rubygems.org/] as shown below:
   
   
<code># gem install json</code>
<code># gem install json</code>
   
   
We can create a JSON string for serialization by using the JSON.generate method as below:
A JSON string for serialization can be created by using the JSON.generate method:
   
   
<code>
<code>
Line 124: Line 161:
<code>{"{\"Welcome\":\"Ruby\"}"=>"{\"WELCOME\":\"RUBY\"}"}</code>
<code>{"{\"Welcome\":\"Ruby\"}"=>"{\"WELCOME\":\"RUBY\"}"}</code>
   
   
We can parse the JSON string received from another program by using JSON.parse
A JSON string received from another program can be parsed by using JSON.parse
Ruby thus converts String to Hash.
Ruby thus converts String to Hash.
   
   
Line 134: Line 171:


=== Converting Ruby Objects to Binary Formats ===
=== Converting Ruby Objects to Binary Formats ===
Binary Serialization is another form of serialization in Ruby which is not in human readable form. It is similar to YAML Serialization. Binary Serialization  is done using Marshal[http://www.ruby-doc.org/core-2.0.0/Marshal.html].  Binary Serialization is used when high performance serialization and de-serialization process is required and when the contents are not required to be in readable format.
Binary Serialization is another form of serialization in Ruby which is not in human readable form. Binary Serialization is used when high performance serialization and de-serialization process is required and when the contents are not required to be in readable format. Binary Serialization  is done using [http://www.ruby-doc.org/core-2.0.0/Marshal.html Marshal] which is built into Ruby and the code for it is written in Ruby's Marshal module(marshal.c) and thus no additional files are required in order to use it. The _dump and _load methods defined in marshal are used for serialization. Using marshal module the following types of objects can not be serialized: bindings, procedure objects, singleton objects, instances of IO objects and interfaces.


Since the Binary Serialized data is not in human readable form, there are two essential guidelines that need to be followed. They are :
Since the Binary Serialized data is not in human readable form, the following guidelines need to be followed.
     1.Use print[http://ruby-doc.org/core-2.0.0/ARGF.html#method-i-print] instead of puts[http://ruby-doc.org/core-2.0.0/ARGF.html#method-i-puts] when serialized objects are written to a file in order to avoid new line characters to be written  
     1.Use print instead of puts[http://ruby-doc.org/core-2.0.0/ARGF.html#] when serialized objects are written to a file in order to avoid new line characters to be written  
       in the file.
       in the file.
      
      
     2.Use a record separator in order to differentiate between two objects.
     2.Use a record separator in order to differentiate between two objects.


<code>Binary Serialization Example </code>
 
'''Binary Serialization Example:'''


<code>
<code>
     class Animal
     class Animal
    def initialize  name, age
      def initialize  name, age
      @name = name
      @name = name
      @age=age
      @age=age
      puts "#{self.class.name}"
    end
     end
     end
    class Cat < Animal
  end
    def to_s
  class Cat < Animal
      "In Cat C: #{@name} \t #{@age}"
    def to_s
    end
    "In Cat C: #{@name} \t #{@age}"
     end
     end
    class Dog < Animal
  end
    def to_s
  c = Cat.new("Kitty Kat",5)
      puts "In Dog D: #{@name} \t #{@age}"
  puts "Before Serialization"
    end
  puts c
    end
  #puts d
  d = Dog.new("Doggy Dig", 4)
  serialize_cat= Marshal.dump(c) #dumps the serialized cat object into serialize_cat
  c = Cat.new("Kitty Kat",5)
  puts "\nAfter Serialization:\n #{serialize_cat}"
  puts "Before Serialization"
  deserialize_cat= Marshal::load(serialize_cat) #deserializes the cat object and loads it back into deserialize_cat
  puts c
  puts "\nAfter Deserialization\n #{deserialize_cat}"
  puts d
  serialize_cat= Marshal.dump(c) #dumps the serialized cat object into serialize_cat
  serialize_dog= Marshal.dump(d) #dumps the serialized dog object into serialize_dog
  deserialize_cat= Marshal::load(serialize_cat) #deserializes the cat object and loads it back into deserialize_cat
  deserialize_dog= Marshal::load(serialize_dog) #deserializes the dog object and loads it back into deserialize_dog
  puts "After Serialization #{deserialize_cat}"
  puts "After Dog Serialization #{deserialize_dog}"
</code>
</code>


Output
'''Output:'''
 
<code>
<code>
   Before Serialization
   Before Serialization
   In Cat C: Kitty Kat 5
   In Cat C: Kitty Kat 5
   In Dog D: Doggy Dig 4
   After Serialization:
   After Serialization In Cat C: Kitty Kat 5
  oCat:
   After Dog Serialization In Dog D: Doggy Dig 4
  @nameI"Kitty Kat:ET: @agei
   After Deserialization
  In Cat C: Kitty Kat 5
</code>
 
Similar to YAML, Marshal can also be used to dump data into a file. The above example showing the serialization using YAML::dump can be written using marshal as shown below.
 
<code>
    f = File.open( 'first.yml', 'w' )
    Marshal.dump( ["Tom", 25, "USA"], f )
    f.close
    File.open( 'first.yml' ){ |f|
    $arr= Marshal.load(f)
    }
    p( $arr )
</code>
'''Output''':
 
<code>
   ["Tom", 25, "USA"]
</code>
 
Notice that in the above example there is no "require" statement as opposed to the earlier example of writing serialized data into files using YAML. This is because unlike YAML, marshal is built-in Ruby and  no external libraries is required in order to use its functionality.
 
Marshal can also be used to custom serialize an object i.e. it provides an option to omit the variables of an object that are not required in the serialized data. The following program shows the use of marshal_dump method for achieving custom serialization.
 
<code>
 
      class First
      def initialize(name, age, country)
        @name = name
        @age = age
        @country=country
      end
      def to_s
            "In First: #{@name}, #{@age}, #{@country}"
      end
      def marshal_dump
            [@name,@age]  #@country will not be serialized
      end
      def marshal_load(data)
      @name=data[0]
      @age=data[1]
      @country="United States of America"
      end
      end
      x = First.new("Tom", 25, "USA")
      puts x
 
      marshal_data = Marshal.dump( x )
      y = Marshal.load( marshal_data )
      p( y.to_s )
 
 
</code>
 
'''Output:'''
 
<code>
    In First: Tom, 25, USA
    "In First: Tom, 25, United States of America"
</code>
</code>
== '''Serialization Performance in Ruby''' ==
The serialization formats described above differ in how efficiently they serialize and deserialize data, so performance should be taken into account when serializing large amounts of data. The report[http://www.ruby-doc.org/stdlib-1.9.3/libdoc/benchmark/rdoc/Benchmark.html] method of Ruby's Benchmark module can be used to measure the performance of each format. Marshal, which handles data in a binary format, is typically the fastest form of serialization in Ruby. The following example compares the performance of Marshal, JSON and YAML.
<code>
  require 'benchmark'
  require 'rubygems'
  require 'json'
  require 'yaml'
  include Benchmark
  class First
    def initialize(name, age, country)
      @name=name
      @age=age
      @country=country
    end
  end 
    x = First.new("Tom", 25, "USA")
  benchmark do |t|
    print "Marshal:"
    t.report{1000.times do; Marshal.load(Marshal.dump(x));end}
    print "JSON:"
    t.report{1000.times do; JSON.load(JSON.dump(x));end}
    print "YAML:"
    t.report{1000.times do; YAML.load(YAML.dump(x));end}
  end
</code>
'''Output:'''
<code>
  Marshal:  0.020000  0.000000  0.020000 (  0.015473)
  JSON:  0.020000  0.000000  0.020000 (  0.023934)
  YAML:  0.460000  0.010000  0.470000 (  0.476826)
</code>
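The output above shows that, for 1000 round trips of this small object, Marshal is the fastest, JSON is close behind, and YAML is more than an order of magnitude slower. Marshal is therefore a comparatively efficient format for serialization in Ruby.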
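The comparison above can also be sketched with a plain hash, so that all three formats round-trip exactly the same data. This is only an illustrative variant: the `sample` hash, the iteration count and the use of `Benchmark.realtime` are assumptions made for this sketch, and the timings will differ between machines and Ruby versions.

```ruby
require "benchmark"
require "json"
require "yaml"

# Plain hash (an assumption of this sketch) so every format handles the same data.
sample = { "name" => "Tom", "age" => 25, "country" => "USA" }
iterations = 1_000 # arbitrary count chosen for illustration

# Benchmark.realtime returns the elapsed wall-clock time of the block in seconds.
timings = {
  "Marshal" => Benchmark.realtime { iterations.times { Marshal.load(Marshal.dump(sample)) } },
  "JSON"    => Benchmark.realtime { iterations.times { JSON.parse(JSON.generate(sample)) } },
  "YAML"    => Benchmark.realtime { iterations.times { YAML.load(YAML.dump(sample)) } }
}

timings.each { |name, seconds| puts format("%-8s %.6f s", name, seconds) }
```

A hash is used here instead of a custom object because the generic JSON conversion of an arbitrary object is likely to fall back to its string representation, which would make the JSON measurement hard to compare with the other two.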


== Serialization in OOLS Languages: Comparison ==
== Serialization in OOLS Languages: Comparison ==
{| class="wikitable"
{| class="wikitable"
|-
|-
! Sl.No !! Ruby !! Java !! .Net Framework
! Sl.No !! Ruby !! Java !! .Net Framework !! C++
|-
|-
| 1 || Ruby provides a module called [http://www.ruby-doc.org/core-2.0.0/Marshal.html Marshal] for serialization || Java uses an Interface named [http://docs.oracle.com/javase/6/docs/api/java/io/Serializable.html Serializable] interface for classes to implement || .Net provides a [http://msdn.microsoft.com/en-us/library/ms973893.aspx Serializable] Attribute  
| 1 || Ruby provides a built-in module called [http://www.ruby-doc.org/core-2.0.0/Marshal.html Marshal] for serialization || Java provides an interface named [http://docs.oracle.com/javase/6/docs/api/java/io/Serializable.html Serializable] for classes to implement || .Net provides a [http://msdn.microsoft.com/en-us/library/ms973893.aspx Serializable] attribute || Although there is no built-in support for serialization in C++, it can be achieved by using the Boost libraries[http://www.boost.org/doc/libs/1_36_0/libs/serialization/example/demo.cpp]
|-
|-
| 2 || Ruby uses JSON to make it platform independent || An object can be serialized in one platform and de-serialized in another platform || .Net used Remoting technology to make it platform independent.
| 2 || Ruby's built-in module [http://www.ruby-doc.org/core-2.0.0/Marshal.html (Marshal)] does not support platform independence; however, it can be achieved by using external libraries like [http://yaml.org YAML] and [http://www.w3schools.com/json/ JSON] || Similarly, Java's built-in serialization is not platform independent; to exchange serialized Java objects with Ruby, the jruby library should be used. || .Net uses Remoting technology to make it platform independent. || Serialization using the Boost libraries is not platform independent.
|-
|-
| 3 || Ruby serializes an Object as a whole.|| Provides an option for serializing only the required methods/attributes to be serialized for an object. Use the keyword [http://docs.oracle.com/javase/7/docs/platform/serialization/spec/serial-arch.html Transient] to ignore certain methods that doesn’t need to be serialized || [http://msdn.microsoft.com/en-us/library/ms973893.aspx XML Serializer] sets  [http://msdn.microsoft.com/en-us/library/system.xml.serialization.xmlattributes.xmlignore.aspx XmlIgnoreProperty] to true to ignore the default serialization of a field or a property
| 3 || YAML provides a method [http://www.ruby-doc.org/stdlib-1.9.3/libdoc/syck/rdoc/Object.html (to_yaml_properties)] which can be used to select the variables whose values need to be serialized. With Marshal, a method named marshal_dump must be written that returns the variables of an object that have to be serialized. || Provides an option for serializing only the required attributes of an object. Use the keyword [http://docs.oracle.com/javase/7/docs/platform/serialization/spec/serial-arch.html transient] to exclude fields that do not need to be serialized || [http://msdn.microsoft.com/en-us/library/ms973893.aspx XML Serializer] sets  [http://msdn.microsoft.com/en-us/library/system.xml.serialization.xmlattributes.xmlignore.aspx XmlIgnoreProperty] to true to ignore the default serialization of a field or a property || Serialization using the Boost libraries is custom and thus the user can specify the parts of the objects to be serialized.
|-
| 4 || Bindings, procedure objects, singleton objects and instances of IO cannot be serialized. Attempting to serialize these objects raises a TypeError. || Thread, OutputStream and its subclasses, and Socket are not serializable in Java.[http://www.oracle.com/technetwork/articles/java/javaserial-1536170.html] || Objects like DataRow are not serializable in .NET. Whether an object is serializable or not can be checked using [http://msdn.microsoft.com/en-us/library/system.type.isserializable(v=vs.110).aspx?cs-save-lang=1&cs-lang=cpp#code-snippet-1 Type.IsSerializable] || There are no such objects in C++.
|}
|}


Line 203: Line 335:


3. [http://gregmoreno.wordpress.com/2011/01/27/preventing-model-explosion-via-rails-serialization/ Article on Rails Serialization]
3. [http://gregmoreno.wordpress.com/2011/01/27/preventing-model-explosion-via-rails-serialization/ Article on Rails Serialization]
4. [http://en.wikipedia.org/wiki/Google_Protocol_Buffers Protocol Buffers]
5. [https://rubygems.org/gems/ruby-protocol-buffers Ruby Protocol Buffers Gem]
6. [http://en.wikipedia.org/wiki/Apache_Avro Avro]
7. [http://rubygems.org/gems/avro Ruby Avro Gem]
== References ==
== References ==


1. [http://en.wikipedia.org/wiki/Serialization Serialization in General]
1. [http://en.wikipedia.org/wiki/Serialization Serialization in General]


2. [http://yaml.org YAML]
2. [http://en.wikipedia.org/wiki/Marshalling_%28computer_science%29 Marshaling]
 
3. [http://techbrahmana.blogspot.com/2012/03/rails-cookie-handling-serialization-and.html Rails Cookie Handling]
 
4. [http://yaml.org/ YAML]
 
5. [http://www.ruby-doc.org/core-2.1.0/Marshal.html Marshal module]
 
6. [http://json.org/ JSON]
 
7. [http://rubygems.org/ Ruby Gems]
 
8. [http://ruby-doc.org/core-2.0.0/ARGF.html# Ruby ARGF Documentation]
 
9. [http://www.ruby-doc.org/stdlib-1.9.3/libdoc/benchmark/rdoc/Benchmark.html Ruby Documentation for Benchmark Gem]
 
10. [http://www.boost.org/doc/libs/1_36_0/libs/serialization/example/demo.cpp Boost Libraries]


3. [http://www.w3schools.com/json/ JSON]
11. [http://www.oracle.com/technetwork/articles/java/javaserial-1536170.html Java Serialization]


4. [http://www.skorks.com/2010/04/serializing-and-deserializing-objects-with-ruby/ Serializing and De-serializing in Ruby]
12. [http://www.w3schools.com/json/ JSON Tutorial]


5. [http://www.codeproject.com/Articles/33296/Serialization-and-De-serialization Serialization and De-serialization]
13. [http://www.skorks.com/2010/04/serializing-and-deserializing-objects-with-ruby/ Serializing and De-serializing in Ruby]


6. [http://www.ruby-doc.org/core-2.0.0/Marshal.html Marshal]
14. [http://www.codeproject.com/Articles/33296/Serialization-and-De-serialization Serialization and De-serialization]


7. [http://www.waset.org/journals/waset/v60/v60-39.pdf Object Serialization Techniques]
15. [http://msdn.microsoft.com/en-us/library/182eeyhh.aspx XML Serialization]


8. [http://msdn.microsoft.com/en-us/library/182eeyhh.aspx XML Serialization]
16. [http://www.tutorialspoint.com/java/java_serialization.htm Java Serialization Tutorial]


9. [http://www.tutorialspoint.com/java/java_serialization.htm Java Serialization]
17. [http://www.sapphiresteel.com/ruby-programming/The-Book-Of-Ruby The Book of Ruby]


10.[http://www.codeproject.com/Articles/20962/Introducing-Serialization-in-NET Serialization in .Net]
18. [http://www.codeproject.com/Articles/20962/Introducing-Serialization-in-NET Serialization in .Net]


11.[http://json.org JSON]
19. [http://ruby.about.com/od/advancedruby/ss/Serialization-In-Ruby-Yaml.html Serialization in Ruby YAML]


12.[http://ruby.about.com/od/advancedruby/ss/Serialization-In-Ruby-Yaml.html Serialization in Ruby YAML]
20. [http://ruby.about.com/od/tasks/a/The-Json-Gem.html Installing JSON gem]


13.[http://ruby.about.com/od/tasks/a/The-Json-Gem.html Installing JSON gem]
21. [http://www.ruby-doc.org/stdlib-2.0.0/libdoc/json/rdoc/JSON.html Serialization in Ruby JSON]

Revision as of 03:38, 11 February 2014

Serialization

Serialization[1] is the process of converting a data structure or an object into a stream of bytes or a string to facilitate storage in memory or in a file (persistent storage), or transmission over a network. Serialization is also referred to as Marshalling[2]. The stream of data has to be in a format that can be understood by both ends of a communication channel so that the object can be marshalled and reconstructed easily.

Basic Advantages of Serialization:

1. Communication between two or more processes on the same machine. Object state can be saved and shared in a persistent or in-memory store.

2. Communication between processes on different machines. Serialization facilitates the transmission of an object through a network.

3. Creating a clone of an object.

4. Cross-platform compatibility. An object can be serialized in a common format that is understood by multiple platforms, e.g. JSON or XML.


De-serialization is the process of converting the stream of bytes or string back into objects in memory, that is, reconstructing the object at a later time. De-serialization is also referred to as Unmarshalling.
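The "clone of an object" advantage listed above can be sketched as a serialize/de-serialize round trip. The hash and variable names below are made up for illustration; the sketch only assumes Ruby's built-in Marshal module.

```ruby
# Illustrative data; the keys and values are not taken from the article.
original = { "name" => "Tom", "tags" => ["a", "b"] }

shallow = original.dup                          # copies only the outer hash
deep    = Marshal.load(Marshal.dump(original))  # serialize, then rebuild a full copy

original["tags"] << "c"

p shallow["tags"]  # still shares the inner array with original
p deep["tags"]     # independent copy, unaffected by the change
```

Because the round trip rebuilds every nested object, the deep copy no longer shares state with the original, which a plain `dup` does not guarantee.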


A Few Practical Applications of Serialization

1. HTTP session replication: session objects are shared across web servers to handle failover scenarios.

2. Remote Method Invocation and remote procedure calls: serialization is used to pass arguments and results between processes.

3. Rails cookie handling[3]: cookie data is marshalled when it is sent to the client machine and unmarshalled when it is received back.


Serialization in Ruby:

Ruby provides serialization capabilities through its module, Marshal. There are also other libraries like YAML and JSON which can be used in Ruby to generate serialized objects for purposes like platform independence and human readable formats.

Types of Serialization

Serialization in Ruby can be done in two ways. During serialization, the object in memory can be converted into Human Readable formats like YAML (YAML Ain’t Markup Language) and JSON (JavaScript Object Notation), or the object can be converted into binary format.

Converting Ruby Objects in Human Readable Formats

The conversion of Ruby objects into YAML and JSON formats is explained below.

Converting Ruby Objects to YAML format

YAML[4] is a human-friendly data serialization standard for all programming languages. YAML (YAML Ain't Markup Language) is perhaps the most common form of serialization in Ruby applications. It is used for configuration files in Rails and other projects, and is nearly ubiquitous. YAML is a plaintext format, as opposed to Marshal's[5] binary format. Objects stored as YAML are completely transparent and editable with nothing more than a text editor. It also has a simple, spartan syntax that is easy to read and easy to type, and it is not encumbered by the verbose tags and symbols seen in XML. To use it in Ruby, the yaml library must be loaded with require "yaml"; it provides methods for converting objects into YAML format and creating .yml files.

Examples of serialization using YAML:

    # Serialization using YAML's to_yaml method

    require "yaml"

    class First
      def initialize(name, age, country)
        @name = name
        @age = age
        @country = country
      end

      def to_s
        "In First:\n#{@name}, #{@age}, #{@country}\n"
      end
    end

    x = First.new("Tom", 25, "USA")
    puts x
    puts x.to_yaml


Output:

  In First:
  Tom, 25, USA
  --- !ruby/object:First
  name: Tom
  age: 25
  country: USA

The above code displays the object x, first as a string and then in YAML format.

Saving YAML data into a file:

 # Serialization using YAML.dump

 require 'yaml'

 # Write the array to a file in YAML format
 File.open('first.yml', 'w') do |f|
   YAML.dump(["Tom", 25, "USA"], f)
 end

 # Read the file back and de-serialize its contents
 arr = File.open('first.yml') { |f| YAML.load(f) }

 p arr


Output:

  ["Tom", 25, "USA"]


The dump function serializes the data and, when given a file, saves it in YAML format. As shown in the above example, the YAML data can be de-serialized using the load function.
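The same dump/load pair also works without a file. As a minimal sketch (the hash contents are made up for illustration), calling YAML.dump with a single argument returns the YAML text as a string, which YAML.load turns back into Ruby data:

```ruby
require "yaml"

# Illustrative data built only from basic types (strings and an integer).
data = { "name" => "Tom", "age" => 25, "country" => "USA" }

yaml_text = YAML.dump(data)      # no IO argument: returns a String
restored  = YAML.load(yaml_text) # parse the text back into a Hash

puts yaml_text
p restored
```

Keeping the data to basic types such as strings, numbers, arrays and hashes makes the YAML text easy to exchange with programs written in other languages.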

The YAML library also provides an option to select only those variables of the object that need to be serialized. This is done using the to_yaml_properties method, as shown in the example below.


    # Custom serialization using YAML

    require "yaml"

    class First
      def initialize(name, age, country)
        @name = name
        @age = age
        @country = country
      end

      def to_s
        "In First:\n#{@name}, #{@age}, #{@country}\n"
      end

      def to_yaml_properties
        ["@name", "@age"]  # @country will not be serialized
      end
    end

    x = First.new("Tom", 25, "USA")
    puts x
    puts x.to_yaml
  

Output:

  In First:
  Tom, 25, USA
  --- !ruby/object:First
  name: Tom
  age: 25


Converting Ruby Objects to JSON format:

JSON[6] is a light-weight data interchange format. JSON is typically generated by web applications and can be quite daunting, with deep hierarchies that are difficult to navigate. Ruby objects such as hashes, arrays, strings and numbers can easily be serialized into JSON format. The Ruby 1.8.7 distribution does not include the json library, so it has to be installed as a gem; from Ruby 1.9.2 onward, json is bundled with the Ruby distribution. The JSON library can be installed using Ruby Gems[7] as shown below:

# gem install json

A JSON string for serialization can be created by using the JSON.generate method:

       require 'json'
       my_hash = {:Welcome => "Ruby"}
       puts JSON.generate(my_hash)

Output:

{"Welcome":"Ruby"}

A JSON string received from another program can be parsed by using JSON.parse, which converts the String into a Hash.

       require 'json'
       my_hash = JSON.parse('{"Welcome": "Ruby"}')
       puts my_hash["Welcome"]   # prints Ruby
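For a custom object, one common approach is to convert it to a hash before generating JSON and to rebuild it from the parsed hash afterwards. The sketch below assumes a hypothetical Employee class with a to_h method; neither is part of the JSON library.

```ruby
require "json"

# Hypothetical class, used only for illustration.
class Employee
  attr_reader :name, :age

  def initialize(name, age)
    @name = name
    @age = age
  end

  # Expose the state as a hash of basic types that JSON can represent.
  def to_h
    { name: @name, age: @age }
  end
end

json_text = JSON.generate(Employee.new("Tom", 25).to_h)
puts json_text

# symbolize_names: true returns symbol keys instead of string keys.
parsed   = JSON.parse(json_text, symbolize_names: true)
restored = Employee.new(parsed[:name], parsed[:age])
puts "#{restored.name}, #{restored.age}"
```

Unlike Marshal or YAML, the JSON text carries no class information, so the receiving program decides which class to rebuild from the parsed data.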

Converting Ruby Objects to Binary Formats

Binary serialization is another form of serialization in Ruby; its output is not human readable. It is used when high-performance serialization and de-serialization are required and the contents do not need to be readable. Binary serialization is done using Marshal, which is built into Ruby (it is implemented in marshal.c), so no additional files need to be loaded in order to use it. The Marshal.dump and Marshal.load methods are used for serialization and de-serialization. The following types of objects cannot be serialized with Marshal: bindings, procedure or method objects, instances of class IO, and singleton objects.
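The restriction on unserializable objects can be observed directly. The following sketch tries to dump one object of each kind and records whether a TypeError is raised; the labels and the singleton_obj helper object are made up for illustration.

```ruby
# An object with a singleton method, created only for this illustration.
singleton_obj = Object.new
def singleton_obj.greet
  "hi"
end

candidates = {
  "proc"      => proc { 1 },
  "method"    => 1.method(:+),
  "io"        => $stdout,
  "binding"   => binding,
  "singleton" => singleton_obj
}

# Try to dump each candidate and note the outcome.
results = candidates.map do |label, obj|
  begin
    Marshal.dump(obj)
    [label, "dumped"]
  rescue TypeError
    [label, "TypeError"]
  end
end.to_h

p results
```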

Since binary serialized data is not human-readable, the following guidelines should be followed when writing it to a file.

    1. Use print instead of puts[8] when serialized objects are written to a file, so that no extra newline characters are 
       written to the file.
    
    2. Use a record separator in order to mark the boundary between two objects.
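
The two guidelines can be sketched as follows. The separator string and the sample records are assumptions made for this illustration; any separator works as long as it cannot occur inside the serialized data.

```ruby
require 'tempfile'

# Hypothetical separator; it must never appear inside a serialized record.
RECORD_SEPARATOR = "\n--RECORD--\n"

records = [["Tom", 25, "USA"], { "name" => "Kitty Kat", "age" => 5 }]

file = Tempfile.new('records')
file.binmode                          # Marshal output is binary, not text
records.each do |record|
  file.print Marshal.dump(record)     # print adds no trailing newline
  file.print RECORD_SEPARATOR
end
file.close

# Read everything back, split on the separator and load each chunk.
raw = File.binread(file.path)
restored = raw.split(RECORD_SEPARATOR).map { |chunk| Marshal.load(chunk) }
file.unlink

p restored
```

Note that the second record contains a newline byte of its own (Marshal encodes the integer 5 as that byte), which is why a bare newline would not be a safe separator here.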


Binary Serialization Example:

   class Animal
     def initialize(name, age)
       @name = name
       @age = age
     end
   end
   class Cat < Animal
     def to_s
       "In Cat C: #{@name} \t #{@age}"
     end
   end
   c = Cat.new("Kitty Kat", 5)
   puts "Before Serialization"
   puts c
   serialize_cat = Marshal.dump(c) # dumps the serialized cat object into serialize_cat
   puts "\nAfter Serialization:\n #{serialize_cat}"
   deserialize_cat = Marshal.load(serialize_cat) # deserializes the cat object and loads it back into deserialize_cat
   puts "\nAfter Deserialization\n #{deserialize_cat}"

Output:

  Before Serialization
  In Cat C: Kitty Kat 	 5
  After Serialization:
  oCat:
  @nameI"Kitty Kat:ET:	@agei
  After Deserialization
  In Cat C: Kitty Kat 	 5

Like YAML, Marshal can also dump data into a file. The earlier example that used YAML::dump can be rewritten with Marshal as shown below.

   f = File.open('first.yml', 'wb')   # binary mode, since Marshal output is not text
   Marshal.dump(["Tom", 25, "USA"], f)
   f.close
   arr = File.open('first.yml', 'rb') { |file| Marshal.load(file) }
   p arr

Output:

  ["Tom", 25, "USA"]

Notice that the above example has no "require" statement, unlike the earlier example that wrote serialized data to a file using YAML. This is because, unlike YAML, Marshal is built into Ruby and no external library is required in order to use it.

Marshal also supports custom serialization of an object, i.e. it provides an option to omit the variables of an object that are not required in the serialized data. The following program uses the marshal_dump method to achieve this, together with marshal_load to restore the object.

     class First
       def initialize(name, age, country)
         @name = name
         @age = age
         @country = country
       end
       def to_s
         "In First: #{@name}, #{@age}, #{@country}"
       end
       def marshal_dump
         [@name, @age]  # @country will not be serialized
       end
       def marshal_load(data)
         @name = data[0]
         @age = data[1]
         @country = "United States of America"  # default, since @country was not serialized
       end
     end

     x = First.new("Tom", 25, "USA")
     puts x

     marshal_data = Marshal.dump(x)
     y = Marshal.load(marshal_data)
     p(y.to_s)


Output:

   In First: Tom, 25, USA
   "In First: Tom, 25, United States of America"

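Besides marshal_dump and marshal_load, Marshal also honours the _dump and _load hooks, which work with a String instead of an arbitrary object. The sketch below shows one way to use them; the class Point and its colon-separated encoding are assumptions made for this illustration.

```ruby
# Point is a hypothetical class used only for this illustration.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Called by Marshal.dump; must return a String describing the object.
  def _dump(_level)
    "#{@x}:#{@y}"
  end

  # Called by Marshal.load with the String returned by _dump.
  def self._load(data)
    x, y = data.split(":").map(&:to_i)
    new(x, y)
  end
end

copy = Marshal.load(Marshal.dump(Point.new(3, 4)))
puts "#{copy.x}, #{copy.y}"   # prints 3, 4
```

Unlike marshal_load, which fills in an already allocated instance, _load is a class method and is responsible for constructing the object itself.
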
Serialization Performance in Ruby

The serialization formats described above differ in how efficiently they serialize and deserialize data, so efficiency should be taken into account when serializing large amounts of data. The report[9] method of Ruby's Benchmark module can be used to measure the performance of these formats. Marshal, as it handles data in binary format, is generally the most efficient form of serialization in Ruby. The following example compares the performance of the Marshal, JSON and YAML formats.

  require 'benchmark'
  require 'rubygems'
  require 'json'
  require 'yaml'
  include Benchmark
  class First
    def initialize(name, age, country)
      @name = name
      @age = age
      @country = country
    end
  end
  x = First.new("Tom", 25, "USA")
  benchmark do |t|
    print "Marshal:"
    t.report { 1000.times { Marshal.load(Marshal.dump(x)) } }
    print "JSON:"
    t.report { 1000.times { JSON.load(JSON.dump(x)) } }
    print "YAML:"
    t.report { 1000.times { YAML.load(YAML.dump(x)) } }
  end

Output:

  Marshal:   0.020000   0.000000   0.020000 (  0.015473)
  JSON:   0.020000   0.000000   0.020000 (  0.023934)
  YAML:   0.460000   0.010000   0.470000 (  0.476826)

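The output shows that, over 1000 round trips of a small object, Marshal is the fastest of the three formats, JSON is close behind, and YAML is roughly 20 to 30 times slower than the other two. Note that First does not define to_json, so JSON.dump(x) falls back to serializing the string returned by x.to_s rather than the object's instance variables; the JSON figure is therefore only a rough indication.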
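
For a like-for-like comparison, each format can be given the same plain data to round-trip. The following is a sketch under that assumption, using a Hash with string keys in place of the First object; the variable names are illustrative, and timings will vary between machines and Ruby versions, so no output is shown.

```ruby
require 'benchmark'
require 'json'
require 'yaml'

# The same plain data is used for every format.
data = { "name" => "Tom", "age" => 25, "country" => "USA" }

# One round trip (serialize, then deserialize) per format.
round_trips = {
  "Marshal" => -> { Marshal.load(Marshal.dump(data)) },
  "JSON"    => -> { JSON.parse(JSON.generate(data)) },
  "YAML"    => -> { YAML.load(YAML.dump(data)) }
}

Benchmark.bm(8) do |t|
  round_trips.each do |label, round_trip|
    t.report("#{label}:") { 1000.times { round_trip.call } }
  end
end
```

Since every round trip returns a Hash equal to the original, the three timings measure the same amount of work.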

Serialization in OOLS Languages: Comparison

{| class="wikitable"
! Sl.No !! Ruby !! Java !! .Net Framework !! C++
|-
| 1 || Ruby provides a built-in module called Marshal for serialization. || Java provides an interface named Serializable for classes to implement. || .Net provides a Serializable attribute. || There is no built-in support for serialization in C++, but it can be achieved by using the Boost libraries[10].
|-
| 2 || Ruby's built-in module (Marshal) is not platform independent; platform independence can be achieved by using external libraries such as YAML and JSON. || Java's built-in serialization is likewise not platform independent; to use Java serialization from the Ruby platform, the jruby library should be used. || .Net uses Remoting technology to make it platform independent. || Serialization using the Boost libraries is not platform independent.
|-
| 3 || YAML provides a method (to_yaml_properties) that selects the variables whose values are to be serialized. With Marshal, a method named marshal_dump has to be written that defines the variables of an object to be serialized. || Provides an option to serialize only the required attributes of an object: the keyword transient marks data that does not need to be serialized. || The XML Serializer skips the default serialization of a field or a property when XmlIgnore is set to true. || Serialization using the Boost libraries is custom, so the user can specify which parts of an object are serialized.
|-
| 4 || Bindings, procedure objects, objects with singleton methods and instances of IO cannot be serialized. Serializing these objects raises a TypeError exception. || Thread, OutputStream and its subclasses, and Socket are not serializable in Java.[11] || Objects like DataRow are not serializable in .NET. Whether an object is serializable can be checked using Type.IsSerializable. || There are no such objects in C++.
|}

See Also

1. Serialization in Ruby JSON

2. Serialization in Rails

3. Article on Rails Serialization

4. Protocol Buffers

5. Ruby Protocol Buffers Gem

6. Avro

7. Ruby Avro Gem

References

1. Serialization in General

2. Marshaling

3. Rails Cookie Handling

4. YAML

5. Marshal module

6. JSON

7. Ruby Gems

8. Ruby ARGF Documentation

9. Ruby Documentation for Benchmark Gem

10. Boost Libraries

11. Java Serialization

12. JSON Tutorial

13. Serializing and De-serializing in Ruby

14. Serialization and De-serialization

15. XML Serialization

16. Java Serialization Tutorial

17. The Book of Ruby

18. Serialization in .Net

19. Serialization in Ruby YAML

20. Installing JSON gem

21. Serialization in Ruby JSON