CSC/ECE 517 Spring 2015/ch1a 9 RA: Difference between revisions
No edit summary |
No edit summary |
||
Line 7: | Line 7: | ||
__TOC__ | __TOC__ | ||
=='''How a lazy enumerator works'''== | |||
A lazy enumerator in Ruby works as both a data consumer and a data producer. To understand how it’s done, we have to understand what an Enumerator::Generator and an Enumerator::Yielder is. | |||
A simple way of creating an enumerable series of data in Ruby is to create an array. However if we want more flexibility in handling each element of the series while generating it, we can do so by using an Enumerator::Generator. Consider the example below to create an Enumerator: | |||
<pre> | |||
even_natural_numbers = Enumerator.new do |yielder| | |||
number = 0 | |||
loop do | |||
number += 2 | |||
yielder.yield number | |||
end | |||
end | |||
</pre> | |||
Here a block with an yielder is passed to the constructor of the enumerator. Everytime the next method of the enumerator is called, the block is executed till the point the yielder is called in the line “yielder.yield number”. The control is passed to the yielder and returned back to the enumerator generator block when the “next” method is called on the enumerator again. To understand how the block is executed on each call of “next”, the above code can be modified as follows: | |||
<pre> | |||
even_natural_numbers = Enumerator.new do |yielder| | |||
number = 0 | |||
loop do | |||
number += 2 | |||
yielder.yield number | |||
puts "continuing loop" | |||
end | |||
end | |||
puts even_natural_numbers.next | |||
puts even_natural_numbers.next | |||
Output: | |||
2 | |||
continuing loop | |||
4 | |||
</pre> | |||
On the first “next” call to ‘even_natural_numbers’, the value of number is 2 and it’s passed to the yielder. During the second “next” call, the execution of the generator block continues from the line ‘puts “continuing loop”’ and the blocks yields again with the value 4. | |||
Printing the above even_natural_numbers object gives the output: | |||
#<Enumerator: #<Enumerator::Generator:0x007fbb040b2e28>:each> | |||
This shows that the enumerator object is created with a reference to an internal object called Enumerator::Generator and the “each” method is setup to call that generator. | |||
In the above example, the enumerator acts as a lazy generator i.e. generating new numbers of the series only on demand. By cascading multiple such lazy enumerators, each enumerator can be made to act as a data generator as well as a consumer. Consider an example where the first 10 even numbers which are also multiples of 5 are to be generated: | |||
If the “select” method is used on the enumerator “even_natural_numbers” as follows, an infinite loop is produced: | |||
<pre> | |||
puts even_natural_numbers.select {|v| v%5==0}.first(10) | |||
</pre> | |||
The reason behind the infinite loop is that the “select” method tries to iterate over the entire collection and doesn’t pass the control to the next method in line unless the collection is exhausted which, in an infinite collection, never happens. Hence it’s necessary to define a lazy version of the select method to do something useful with the infinite range that’s being lazily generated. The code given below adds a lazy version of the “select” method to the existing “Enumerator” class: | |||
<pre> | |||
class Enumerator | |||
def lazy_select(&block) | |||
Enumerator.new do |yielder| | |||
self.each do |val| | |||
yielder.yield(val) if block.call(val) | |||
end | |||
end | |||
end | |||
end | |||
</pre> | |||
The lazy_select method creates an enumerator which yields to the next method in line on selecting each value that satisfies the condition in the block. | |||
=='''Differences '''== | =='''Differences '''== | ||
If you try to iterate over an infinite range in Enumerator then it will run into an endless loop while with Enumerator::Lazy you can avoid endless loop and get just the value you need. | If you try to iterate over an infinite range in Enumerator then it will run into an endless loop while with Enumerator::Lazy you can avoid endless loop and get just the value you need. |
Revision as of 23:42, 7 February 2015
Lazy Enumerators
Introduction
In Ruby 1.8 and Ruby 1.9 the problem with enumeration that generate infinite sequences is that we have to write special, non-greedy, versions of methods. But, if you’re using Ruby 2.0 or later, you have this support built in. If you call Enumerator#lazy on any Ruby enumerator, you get back an instance of class Enumerator::Lazy. Many basic class like Array and Hash has lazy method through Enumerator class.
The lazy methods returns a lazy enumerator whose methods Map/collect, flat_map/ collect_caoncat, select/ find_all, reject, grep, zip, take, #take_while, drop, #drop_while, and cycle enumerate values only on as-needed basis.
How a lazy enumerator works
A lazy enumerator in Ruby works as both a data consumer and a data producer. To understand how it’s done, we have to understand what an Enumerator::Generator and an Enumerator::Yielder is.
A simple way of creating an enumerable series of data in Ruby is to create an array. However if we want more flexibility in handling each element of the series while generating it, we can do so by using an Enumerator::Generator. Consider the example below to create an Enumerator:
even_natural_numbers = Enumerator.new do |yielder| number = 0 loop do number += 2 yielder.yield number end end
Here a block with an yielder is passed to the constructor of the enumerator. Everytime the next method of the enumerator is called, the block is executed till the point the yielder is called in the line “yielder.yield number”. The control is passed to the yielder and returned back to the enumerator generator block when the “next” method is called on the enumerator again. To understand how the block is executed on each call of “next”, the above code can be modified as follows:
even_natural_numbers = Enumerator.new do |yielder| number = 0 loop do number += 2 yielder.yield number puts "continuing loop" end end puts even_natural_numbers.next puts even_natural_numbers.next Output: 2 continuing loop 4
On the first “next” call to ‘even_natural_numbers’, the value of number is 2 and it’s passed to the yielder. During the second “next” call, the execution of the generator block continues from the line ‘puts “continuing loop”’ and the blocks yields again with the value 4.
Printing the above even_natural_numbers object gives the output:
#<Enumerator: #<Enumerator::Generator:0x007fbb040b2e28>:each>
This shows that the enumerator object is created with a reference to an internal object called Enumerator::Generator and the “each” method is setup to call that generator.
In the above example, the enumerator acts as a lazy generator i.e. generating new numbers of the series only on demand. By cascading multiple such lazy enumerators, each enumerator can be made to act as a data generator as well as a consumer. Consider an example where the first 10 even numbers which are also multiples of 5 are to be generated:
If the “select” method is used on the enumerator “even_natural_numbers” as follows, an infinite loop is produced:
puts even_natural_numbers.select {|v| v%5==0}.first(10)
The reason behind the infinite loop is that the “select” method tries to iterate over the entire collection and doesn’t pass the control to the next method in line unless the collection is exhausted which, in an infinite collection, never happens. Hence it’s necessary to define a lazy version of the select method to do something useful with the infinite range that’s being lazily generated. The code given below adds a lazy version of the “select” method to the existing “Enumerator” class:
class Enumerator def lazy_select(&block) Enumerator.new do |yielder| self.each do |val| yielder.yield(val) if block.call(val) end end end end
The lazy_select method creates an enumerator which yields to the next method in line on selecting each value that satisfies the condition in the block.
Differences
If you try to iterate over an infinite range in Enumerator then it will run into an endless loop while with Enumerator::Lazy you can avoid endless loop and get just the value you need.
> range = 1..Float::INFINITY > range.collect { |n| n**2}.first(10) #endless loop
but with lazy,
> range = 1..Float::INFINITY > range.lazy.collect { |n| n**2}.first(10) => [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
Enumerator::Lazy acts just like the original, but it re-implements methods such as select and map so that they can work with infinite sequences. The lazy versions of the various methods do not return arrays of data. Instead, it returns a new enumerator that includes its own special processing—below example of map method returns an enumerator.
> range = 1..100 > range.map{|n| n**2}.take(10) #returns array => [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
but with lazy,
> range.lazy.map{|n| n**2}.take(10) #returns enumerator =>#<Enumerator::Lazy:#<Enumerator::Lazy:#<Enumerator::Lazy:1..100>:map>:take(10)>
Examples
1.)
require 'date' require ‘benchmark’ Benchmark.bm(8) do |x| x.report("Non Lazy :") do puts (Date.new(2015)..Date.new(9999)).select{|d| d.day == 13 and d.friday?}.first(10) end x.report("Lazy :") do puts (Date.new(2015)..Date.new(9999)).lazy.select{|d| d.day == 13 and d.friday?}.first(10) end end
2). This example give you idea of using lazy enumeration in file read example and comparing with non lazy enumeration way
require 'benchmark' fileName = "C:\\sample.txt" Benchmark.bm(8) do |x| x.report("Non Lazy :") do 50.times { File.readlines(fileName).detect { |line| line =~ /110216/i} } end x.report("Lazy :") do 50.times { File.open(fileName).lazy.detect { |line| line =~ /110216/i} } end end
If you deal with large data and start chaining multiple enumeration methods together, the use of lazy evaluation prevents you from using unnecessary amounts of memory in temporary variables which makes lazy evaluation faster than non lazy evaluation.
3). We have seen usage and benefits of lazy enumeration in above examples but below simple example will give you drawback of it.
require 'benchmark' Benchmark.bm(8) do |x| x.report("Non Lazy :") do (1..10000000).select { |i| i % 3 == 0 || i % 5 == 0}.reduce(:+) end x.report("Lazy :") do (1..10000000).lazy.select { |i| i % 3 == 0 || i % 5 == 0}.reduce(:+) end end
Lazy enumeration is act as a proxy in between program and data so sometimes lazy enumeration gives non negligible performance hit. It is recommended not to use lazy enumeration everywhere as it adds another layer.