CSC/ECE 517 Spring 2015/ch1a 9 RA
Lazy Enumerators
Introduction
In Ruby 1.8 and Ruby 1.9 the problem with enumeration that generate infinite sequences is that we have to write special, non-greedy, versions of methods. But, if you’re using Ruby 2.0 or later, you have this support built in. If you call Enumerator#lazy on any Ruby enumerator, you get back an instance of class Enumerator::Lazy. Many basic class like Array and Hash has lazy method through Enumerator class.
Methods
Enumerable::Lazy overrides the following methods as 'lazy' version.
- Methods which transform the element:
- map(collect),
- flat_map(collect_concat),
- zip
- Methods which filters the element:
- select(find_all)
- grep
- take_while
- reject
- drop
- drop_while
- These lazy methods returns an instance of Enumerable::Lazy, where the 'normal' version returns an Array (and goes into infinite loop). It means that calling these methods does not generate any result. Actual values are generated when you call the methods like these on Enumerable::Lazy.
- take
- first
- find(detect)
- Enumerable::Lazy does not override these methods:
- chunk, cycle, slicebefore, each_cons, each_entry, each_slice,each_with_index, each_with_object.
- Because they return an Enumerator.
- any?, include?, take. detect, find_index, first.
- Because they never cause an infinite loop, because they need only finite number of elements
- inject, count, all?, none?, one?, max, max_by, min, min_by, minmax, minmax_by.
- Because they need all elements to define the return value, and they are smart enough not to store all the element on the memory.
- entries, group_by, partition, sort, sort_by
- Because of the size of their return value is linear to the input size.
- reverse_each
- Because it needs the last element first.
- chunk, cycle, slicebefore, each_cons, each_entry, each_slice,each_with_index, each_with_object.
How a lazy enumerator works
A lazy enumerator in Ruby works as both a data consumer and a data producer. To understand how it’s done, we have to understand what an Enumerator::Generator and an Enumerator::Yielder is.
A simple way of creating an enumerable series of data in Ruby is to create an array. However if we want more flexibility in handling each element of the series while generating it, we can do so by using an Enumerator::Generator.<ref>33333333333333</ref> Consider the example below to create an Enumerator:
even_natural_numbers = Enumerator.new do |yielder| number = 0 loop do number += 2 yielder.yield number end end
Here a block with an yielder is passed to the constructor of the enumerator. Everytime the next method of the enumerator is called, the block is executed till the point the yielder is called in the line “yielder.yield number”. The control is passed to the yielder and returned back to the enumerator generator block when the “next” method is called on the enumerator again.
To understand how the block is executed on each call of “next”, the above code can be modified as follows:
even_natural_numbers = Enumerator.new do |yielder| number = 0 loop do number += 2 yielder.yield number puts "continuing loop" end end puts even_natural_numbers.next puts even_natural_numbers.next Output: 2 continuing loop 4
On the first “next” call to ‘even_natural_numbers’, the value of number is 2 and it’s passed to the yielder. During the second “next” call, the execution of the generator block continues from the line ‘puts “continuing loop”’ and the blocks yields again with the value 4.
Printing the above even_natural_numbers object gives the output:
#<Enumerator: #<Enumerator::Generator:0x007fbb040b2e28>:each>
This shows that the enumerator object is created with a reference to an internal object called Enumerator::Generator and the “each” method is setup to call that generator.
In the above example, the enumerator acts as a lazy generator i.e. generating new numbers of the series only on demand. By cascading multiple such lazy enumerators, each enumerator can be made to act as a data generator as well as a consumer. Consider an example where the first 10 even numbers which are also multiples of 5 are to be generated:
If the “select” method is used on the enumerator “even_natural_numbers” as follows, an infinite loop is produced:
puts even_natural_numbers.select {|v| v%5==0}.first(10)
The reason behind the infinite loop is that the “select” method tries to iterate over the entire collection and doesn’t pass the control to the next method in line unless the collection is exhausted which, in an infinite collection, never happens. Hence it’s necessary to define a lazy version of the select method to do something useful with the infinite range that’s being lazily generated. The code given below adds a lazy version of the “select” method to the existing “Enumerator” class:
class Enumerator def lazy_select(&block) Enumerator.new do |yielder| self.each do |val| yielder.yield(val) if block.call(val) end end end end
The lazy_select method creates an enumerator which yields to the next method in line on selecting each value that satisfies the condition in the block. Using the lazy_select method on even_natural_numbers won't give an infinte loop anymore.
puts even_natural_numbers.lazy_select {|v| v%5==0}.first(5)
The above line of code gives the output as: 10 20 30 40 50
The flow of numbers in the above chain of lazy methods can be illustrated as below:
Differences
If you try to iterate over an infinite range in Enumerator then it will run into an endless loop while with Enumerator::Lazy you can avoid endless loop and get just the value you need.<ref>http://patshaughnessy.net/2013/4/3/ruby-2-0-works-hard-so-you-can-be-lazy</ref>
> range = 1..Float::INFINITY > range.collect { |n| n**2}.first(10) #endless loop
but with lazy,
> range = 1..Float::INFINITY > range.lazy.collect { |n| n**2}.first(10) => [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
Enumerator::Lazy acts just like the original, but it re-implements methods such as select and map so that they can work with infinite sequences. The lazy versions of the various methods do not return arrays of data. Instead, it returns a new enumerator that includes its own special processing—below example of map method returns an enumerator.<ref>http://patshaughnessy.net/2013/4/3/ruby-2-0-works-hard-so-you-can-be-lazy</ref>
> range = 1..100 > range.map{|n| n**2}.take(10) #returns array => [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
but with lazy,
> range.lazy.map{|n| n**2}.take(10) #returns enumerator =>#<Enumerator::Lazy:#<Enumerator::Lazy:#<Enumerator::Lazy:1..100>:map>:take(10)>
Examples
Example 1
require 'date' require ‘benchmark’ Benchmark.bm(8) do |x| x.report("Non Lazy :") do puts (Date.new(2015)..Date.new(9999)).select{|d| d.day == 13 and d.friday?}.first(10) end x.report("Lazy :") do puts (Date.new(2015)..Date.new(9999)).lazy.select{|d| d.day == 13 and d.friday?}.first(10) end end
Example 2
This example give you idea of using lazy enumeration in file read example and comparing with non lazy enumeration way
require 'benchmark' fileName = "C:\\sample.txt" Benchmark.bm(8) do |x| x.report("Non Lazy :") do 50.times { File.readlines(fileName).detect { |line| line =~ /110216/i} } end x.report("Lazy :") do 50.times { File.open(fileName).lazy.detect { |line| line =~ /110216/i} } end end
If you deal with large data and start chaining multiple enumeration methods together, the use of lazy evaluation prevents you from using unnecessary amounts of memory in temporary variables which makes lazy evaluation faster than non lazy evaluation.
Drawbacks
Example
We have seen usage and benefits of lazy enumeration in above examples but below simple example will give you drawback of it.
require 'benchmark' Benchmark.bm(8) do |x| x.report("Non Lazy :") do (1..10000000).select { |i| i % 3 == 0 || i % 5 == 0}.reduce(:+) end x.report("Lazy :") do (1..10000000).lazy.select { |i| i % 3 == 0 || i % 5 == 0}.reduce(:+) end end
Lazy enumeration is act as a proxy in between program and data so sometimes lazy enumeration gives non negligible performance hit. It is recommended not to use lazy enumeration everywhere as it adds another layer.<ref>https://bugs.ruby-lang.org/issues/6183</ref>
References
<references/>
See Also
Programming Ruby 1.9 & 2.0 4th Edition