CSC/ECE 517 Spring 2015/ch1a 9 RA: Difference between revisions

From Expertiza_Wiki
Jump to navigation Jump to search
(Created page with "Lazy Enumerators")
 
 
(81 intermediate revisions by 2 users not shown)
Line 1: Line 1:
Lazy Enumerators
<font size="5"><b>Lazy Enumerators</b></font>
 
=='''Introduction'''==
In [http://en.wikipedia.org/wiki/Ruby_%28programming_language%29#Ruby_1.8 Ruby 1.8] and [http://en.wikipedia.org/wiki/Ruby_%28programming_language%29#Ruby_1.9 Ruby 1.9] the problem with enumeration that generate infinite sequences is that we have to write special, non-greedy, versions of methods. But, if you’re using [http://en.wikipedia.org/wiki/Ruby_(programming_language)#Ruby_2.0 Ruby 2.0] or later, you have this support built in. When you call lazy a lot happens inside of Ruby to give you just the values you need, Ruby automatically creates and uses many different types of internal Ruby objects.
 
If you call lazy Enumerator on any Ruby Enumerator, you get back an instance of class [http://www.ruby-doc.org/core-2.0.0/Enumerator/Lazy.html Enumerator::Lazy]. Many basic class like Array and Hash has lazy method through Enumerator class.<ref>http://www.amazon.com/Programming-Ruby-1-9-2-0-Programmers/dp/1937785491</ref>'lazy' version of some methods such as map and select never returns an array. Instead, they yield the transformed/filtered elements one by one. So you can use them for -
*Huge data which cannot be processed at a time,
*Data stream which you want to process in real-time,
*Infinite list, which has no end.
 
 
__TOC__
 
=='''How a lazy enumerator works'''==
A lazy enumerator in Ruby works as both a data consumer and a data producer. To understand how it’s done, we have to understand what an Enumerator::Generator and an Enumerator::Yielder is.
 
A simple way of creating an enumerable series of data in Ruby is to create an array. However if we want more flexibility in handling each element of the series while generating it, we can do so by using an Enumerator::Generator.Consider the example below to create an Enumerator:
 
<pre>
even_natural_numbers = Enumerator.new do |yielder|
number = 0
loop do
number += 2
yielder.yield number
end
end
</pre>
 
Here a block with an yielder is passed to the constructor of the enumerator. Every time the next method of the enumerator is called, the block is executed till the point the yielder is called in the line “yielder.yield number”. The control is passed to the yielder and returned back to the enumerator generator block when the “next” method is called on the enumerator again.
 
[[File:Enumerator-constructor.png‎]]
 
To understand how the block is executed on each call of “next”, the above code can be modified as follows:
<pre>
 
even_natural_numbers = Enumerator.new do |yielder|
number = 0
loop do
  number += 2
  yielder.yield number
  puts "continuing loop"
end
end
 
puts even_natural_numbers.next
puts even_natural_numbers.next
 
Output:
2
continuing loop
4
 
</pre>
 
On the first “next” call to ‘even_natural_numbers’, the value of number is 2 and it’s passed to the yielder. During the second “next” call, the execution of the generator block continues from the line ‘puts “continuing loop”’ and the blocks yields again with the value 4.
 
Printing the above even_natural_numbers object gives the output:
#<Enumerator: #<Enumerator::Generator:0x007fbb040b2e28>:each>
 
This shows that the enumerator object is created with a reference to an internal object called Enumerator::Generator and the “each” method is setup to call that generator.
 
In the above example, the enumerator acts as a lazy generator i.e. generating new numbers of the series only on demand. By cascading multiple such lazy enumerators, each enumerator can be made to act as a data generator as well as a consumer. Consider an example where the first 10 even numbers which are also multiples of 5 are to be generated:
 
If the “select” method is used on the enumerator “even_natural_numbers” as follows, an infinite loop is produced:
 
<pre>
puts even_natural_numbers.select {|v| v%5==0}.first(10)
</pre>
 
The reason behind the infinite loop is that the “select” method tries to iterate over the entire collection and doesn't pass the control to the next method in line unless the collection is exhausted which, in an infinite collection, never happens. Hence it’s necessary to define a lazy version of the select method to do something useful with the infinite range that’s being lazily generated.<ref>http://www.michaelharrison.ws/weblog/?p=163</ref> The code given below adds a lazy version of the “select” method to the existing “Enumerator” class:
 
<pre>
class Enumerator
def lazy_select(&block)
  Enumerator.new do |yielder|
    self.each do |val|
      yielder.yield(val) if block.call(val)
    end
  end
end
end
</pre>
 
The lazy_select method creates an enumerator which yields to the next method in line on selecting each value that satisfies the condition in the block. Using the lazy_select method on even_natural_numbers won't give an infinite loop anymore.
 
<pre>
puts even_natural_numbers.lazy_select {|v| v%5==0}.first(5)
</pre>
 
The above line of code gives the output as:
10
20
30
40
50
 
The flow of numbers in the above chain of lazy methods can be illustrated as below:
[[File:Lazy-select.png‎]]
 
=='''Differences '''==
If you try to iterate over an infinite range in Enumerator then it will run  into an endless loop while with Enumerator::Lazy you can avoid endless loop and get just the value you need.
 
<pre>
 
> range = 1..Float::INFINITY
> range.collect { |n| n**2}.first(10) #endless loop 
 
</pre>
 
but with lazy,
 
<pre>
 
> range = 1..Float::INFINITY
> range.lazy.collect { |n| n**2}.first(10)
=> [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
 
</pre>
 
Enumerator::Lazy acts just like the original, but it re-implements methods such as select and map so that they can work with infinite sequences. The lazy versions of the various methods do not return arrays of data. Instead, it returns a new enumerator that includes its own special processing—below example of map method returns an enumerator.<ref>http://patshaughnessy.net/2013/4/3/ruby-2-0-works-hard-so-you-can-be-lazy</ref>
 
<pre>
 
> range = 1..100
> range.map{|n| n**2}.take(10) #returns array
=> [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
 
</pre>
 
[[File:Enumerator map.png ]]
 
but with lazy,
 
<pre>
 
> range.lazy.map{|n| n**2}.take(10) #returns enumerator
=>#<Enumerator::Lazy:#<Enumerator::Lazy:#<Enumerator::Lazy:1..100>:map>:take(10)>
 
</pre>
 
=='''Examples '''==
 
===Example 1===
Enumerator::lazy manipulate infinite lists wisely. Below example shows without '.lazy', map tries to evaluate an array till last value and then gives first 10 values So, it takes much longer time to evaluate. But, with '.Lazy' method instead of evaluating full length array it evaluates only first 10 values. Thus, it gives answer much quicker than without '.Lazy' method.
 
<pre>
 
require 'date'
require ‘benchmark’
Benchmark.bm(8) do |x|
x.report("Non Lazy :") do
puts (Date.new(2015)..Date.new(9999)).select{|d| d.day == 13 and d.friday?}.first(10)
end
x.report("Lazy :") do
puts (Date.new(2015)..Date.new(9999)).lazy.select{|d| d.day == 13 and d.friday?}.first(10)
end
end
 
</pre>
 
===Example 2===
 
This example give you idea of using lazy enumeration in file read example and comparing with non lazy enumeration way,
 
<pre>
 
require 'benchmark'
fileName = "C:\\sample.txt"
Benchmark.bm(8) do |x|
x.report("Non Lazy :") do
50.times { File.readlines(fileName).detect { |line| line =~ /110216/i} }
end
x.report("Lazy :") do
50.times { File.open(fileName).lazy.detect { |line| line =~ /110216/i} }
end
end
 
</pre>
 
If you deal with large data and start chaining multiple enumeration methods together, the use of lazy evaluation prevents you from using unnecessary amounts of memory in temporary variables which makes lazy evaluation faster than non lazy evaluation.
 
=='''Drawbacks'''==
 
===Example===
We have seen usage and benefits of lazy enumeration in above examples but below simple example will give you drawback of it.
 
<pre>
 
require 'benchmark'
Benchmark.bm(8) do |x|
x.report("Non Lazy :") do
(1..10000000).select { |i| i % 3 == 0 || i % 5 == 0}.reduce(:+)
end
x.report("Lazy :") do
(1..10000000).lazy.select { |i| i % 3 == 0 || i % 5 == 0}.reduce(:+)
end
end
 
</pre>
 
Lazy enumeration is act as a proxy in between program and data so sometimes lazy enumeration gives non negligible performance hit. It is recommended not to use lazy enumeration everywhere as it adds another layer.<ref>https://bugs.ruby-lang.org/issues/6183</ref>
 
=='''[http://www.ruby-doc.org/core-2.0.0/Enumerator/Lazy.html Methods]'''==
Enumerator::Lazy overrides the following methods as 'lazy' version.
# Methods which transform the element:
##map(collect),
##flat_map(collect_concat),
##zip
# Methods which filters the element:
##select(find_all)
##grep
##take_while
##reject
##drop
##drop_while
# These lazy methods returns an instance of Enumerator::Lazy, where the 'normal' version returns an Array (and goes into infinite loop). It means that calling these methods does not generate any result. Actual values are generated when you call the methods like these on Enumerable::Lazy.
##take
##first
##find(detect)
#Enumerator::Lazy does not override these methods:
##chunk, cycle, slicebefore, each_cons, each_entry, each_slice,each_with_index, each_with_object.
##:Because they return an Enumerator.
##any?, include?, take. detect, find_index, first.
##:Because they never cause an infinite loop, because they need only finite number of elements
##inject, count, all?, none?, one?, max, max_by, min, min_by, minmax, minmax_by.
##:Because they need all elements to define the return value, and they are smart enough not to store all the element on the memory.
##entries, group_by, partition, sort, sort_by
##:Because of the size of their return value is linear to the input size.
##reverse_each
##:Because it needs the last element first.
 
=='''References'''==
<references/>

Latest revision as of 16:48, 16 February 2015

Lazy Enumerators

Introduction

In Ruby 1.8 and Ruby 1.9 the problem with enumeration that generate infinite sequences is that we have to write special, non-greedy, versions of methods. But, if you’re using Ruby 2.0 or later, you have this support built in. When you call lazy a lot happens inside of Ruby to give you just the values you need, Ruby automatically creates and uses many different types of internal Ruby objects.

If you call lazy Enumerator on any Ruby Enumerator, you get back an instance of class Enumerator::Lazy. Many basic class like Array and Hash has lazy method through Enumerator class.<ref>http://www.amazon.com/Programming-Ruby-1-9-2-0-Programmers/dp/1937785491</ref>'lazy' version of some methods such as map and select never returns an array. Instead, they yield the transformed/filtered elements one by one. So you can use them for -

  • Huge data which cannot be processed at a time,
  • Data stream which you want to process in real-time,
  • Infinite list, which has no end.


How a lazy enumerator works

A lazy enumerator in Ruby works as both a data consumer and a data producer. To understand how it’s done, we have to understand what an Enumerator::Generator and an Enumerator::Yielder is.

A simple way of creating an enumerable series of data in Ruby is to create an array. However if we want more flexibility in handling each element of the series while generating it, we can do so by using an Enumerator::Generator.Consider the example below to create an Enumerator:

even_natural_numbers = Enumerator.new do |yielder|
	number = 0
	loop do
		number += 2
		yielder.yield number
	end
end

Here a block with an yielder is passed to the constructor of the enumerator. Every time the next method of the enumerator is called, the block is executed till the point the yielder is called in the line “yielder.yield number”. The control is passed to the yielder and returned back to the enumerator generator block when the “next” method is called on the enumerator again.

To understand how the block is executed on each call of “next”, the above code can be modified as follows:


even_natural_numbers = Enumerator.new do |yielder|
 number = 0
 loop do
   number += 2
   yielder.yield number
   puts "continuing loop"
 end
end

puts even_natural_numbers.next
puts even_natural_numbers.next

Output:
2
continuing loop
4

On the first “next” call to ‘even_natural_numbers’, the value of number is 2 and it’s passed to the yielder. During the second “next” call, the execution of the generator block continues from the line ‘puts “continuing loop”’ and the blocks yields again with the value 4.

Printing the above even_natural_numbers object gives the output:

#<Enumerator: #<Enumerator::Generator:0x007fbb040b2e28>:each>

This shows that the enumerator object is created with a reference to an internal object called Enumerator::Generator and the “each” method is setup to call that generator.

In the above example, the enumerator acts as a lazy generator i.e. generating new numbers of the series only on demand. By cascading multiple such lazy enumerators, each enumerator can be made to act as a data generator as well as a consumer. Consider an example where the first 10 even numbers which are also multiples of 5 are to be generated:

If the “select” method is used on the enumerator “even_natural_numbers” as follows, an infinite loop is produced:

puts even_natural_numbers.select {|v| v%5==0}.first(10)

The reason behind the infinite loop is that the “select” method tries to iterate over the entire collection and doesn't pass the control to the next method in line unless the collection is exhausted which, in an infinite collection, never happens. Hence it’s necessary to define a lazy version of the select method to do something useful with the infinite range that’s being lazily generated.<ref>http://www.michaelharrison.ws/weblog/?p=163</ref> The code given below adds a lazy version of the “select” method to the existing “Enumerator” class:

class Enumerator
 def lazy_select(&block)
   Enumerator.new do |yielder|
     self.each do |val|
       yielder.yield(val) if block.call(val)
     end
   end
 end
end

The lazy_select method creates an enumerator which yields to the next method in line on selecting each value that satisfies the condition in the block. Using the lazy_select method on even_natural_numbers won't give an infinite loop anymore.

puts even_natural_numbers.lazy_select {|v| v%5==0}.first(5)

The above line of code gives the output as: 10 20 30 40 50

The flow of numbers in the above chain of lazy methods can be illustrated as below:

Differences

If you try to iterate over an infinite range in Enumerator then it will run into an endless loop while with Enumerator::Lazy you can avoid endless loop and get just the value you need.


> range = 1..Float::INFINITY
> range.collect { |n| n**2}.first(10) #endless loop  

but with lazy,


> range = 1..Float::INFINITY
> range.lazy.collect { |n| n**2}.first(10)
=> [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

Enumerator::Lazy acts just like the original, but it re-implements methods such as select and map so that they can work with infinite sequences. The lazy versions of the various methods do not return arrays of data. Instead, it returns a new enumerator that includes its own special processing—below example of map method returns an enumerator.<ref>http://patshaughnessy.net/2013/4/3/ruby-2-0-works-hard-so-you-can-be-lazy</ref>


> range = 1..100
> range.map{|n| n**2}.take(10) #returns array
=> [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

but with lazy,


> range.lazy.map{|n| n**2}.take(10) #returns enumerator
=>#<Enumerator::Lazy:#<Enumerator::Lazy:#<Enumerator::Lazy:1..100>:map>:take(10)>

Examples

Example 1

Enumerator::lazy manipulate infinite lists wisely. Below example shows without '.lazy', map tries to evaluate an array till last value and then gives first 10 values So, it takes much longer time to evaluate. But, with '.Lazy' method instead of evaluating full length array it evaluates only first 10 values. Thus, it gives answer much quicker than without '.Lazy' method.


require 'date'
require ‘benchmark’
Benchmark.bm(8) do |x|
	x.report("Non Lazy :") do
		puts (Date.new(2015)..Date.new(9999)).select{|d| d.day == 13 and d.friday?}.first(10)
	end
	x.report("Lazy :") do
		puts (Date.new(2015)..Date.new(9999)).lazy.select{|d| d.day == 13 and d.friday?}.first(10)
	end
end

Example 2

This example give you idea of using lazy enumeration in file read example and comparing with non lazy enumeration way,


require 'benchmark'
fileName = "C:\\sample.txt"
Benchmark.bm(8) do |x|
		x.report("Non Lazy :") do
		50.times { File.readlines(fileName).detect { |line| line =~ /110216/i} }
		end
	x.report("Lazy :") do
			50.times { File.open(fileName).lazy.detect { |line| line =~ /110216/i} }
		end
end

If you deal with large data and start chaining multiple enumeration methods together, the use of lazy evaluation prevents you from using unnecessary amounts of memory in temporary variables which makes lazy evaluation faster than non lazy evaluation.

Drawbacks

Example

We have seen usage and benefits of lazy enumeration in above examples but below simple example will give you drawback of it.


require 'benchmark'
Benchmark.bm(8) do |x|
	x.report("Non Lazy :") do
		(1..10000000).select { |i| i % 3 == 0 || i % 5 == 0}.reduce(:+)
	end
	x.report("Lazy :") do
		(1..10000000).lazy.select { |i| i % 3 == 0 || i % 5 == 0}.reduce(:+)
	end
end

Lazy enumeration is act as a proxy in between program and data so sometimes lazy enumeration gives non negligible performance hit. It is recommended not to use lazy enumeration everywhere as it adds another layer.<ref>https://bugs.ruby-lang.org/issues/6183</ref>

Methods

Enumerator::Lazy overrides the following methods as 'lazy' version.

  1. Methods which transform the element:
    1. map(collect),
    2. flat_map(collect_concat),
    3. zip
  2. Methods which filters the element:
    1. select(find_all)
    2. grep
    3. take_while
    4. reject
    5. drop
    6. drop_while
  3. These lazy methods returns an instance of Enumerator::Lazy, where the 'normal' version returns an Array (and goes into infinite loop). It means that calling these methods does not generate any result. Actual values are generated when you call the methods like these on Enumerable::Lazy.
    1. take
    2. first
    3. find(detect)
  4. Enumerator::Lazy does not override these methods:
    1. chunk, cycle, slicebefore, each_cons, each_entry, each_slice,each_with_index, each_with_object.
      Because they return an Enumerator.
    2. any?, include?, take. detect, find_index, first.
      Because they never cause an infinite loop, because they need only finite number of elements
    3. inject, count, all?, none?, one?, max, max_by, min, min_by, minmax, minmax_by.
      Because they need all elements to define the return value, and they are smart enough not to store all the element on the memory.
    4. entries, group_by, partition, sort, sort_by
      Because of the size of their return value is linear to the input size.
    5. reverse_each
      Because it needs the last element first.

References

<references/>