CSC/ECE 517 Spring 2015/ch1a 9 RA: Difference between revisions
(43 intermediate revisions by 2 users not shown) | |||
Line 2: | Line 2: | ||
=='''Introduction'''== | =='''Introduction'''== | ||
In Ruby 1.8 and Ruby 1.9 the problem with enumeration that generate infinite sequences is that we have to write special, non-greedy, versions of methods. But, if you’re using Ruby 2.0 or later, you have this support built in. | In [http://en.wikipedia.org/wiki/Ruby_%28programming_language%29#Ruby_1.8 Ruby 1.8] and [http://en.wikipedia.org/wiki/Ruby_%28programming_language%29#Ruby_1.9 Ruby 1.9] the problem with enumeration that generate infinite sequences is that we have to write special, non-greedy, versions of methods. But, if you’re using [http://en.wikipedia.org/wiki/Ruby_(programming_language)#Ruby_2.0 Ruby 2.0] or later, you have this support built in. When you call lazy a lot happens inside of Ruby to give you just the values you need, Ruby automatically creates and uses many different types of internal Ruby objects. | ||
If you call lazy Enumerator on any Ruby Enumerator, you get back an instance of class [http://www.ruby-doc.org/core-2.0.0/Enumerator/Lazy.html Enumerator::Lazy]. Many basic class like Array and Hash has lazy method through Enumerator class.<ref>http://www.amazon.com/Programming-Ruby-1-9-2-0-Programmers/dp/1937785491</ref>'lazy' version of some methods such as map and select never returns an array. Instead, they yield the transformed/filtered elements one by one. So you can use them for - | |||
*Huge data which cannot be processed at a time, | |||
*Data stream which you want to process in real-time, | |||
*Infinite list, which has no end. | |||
__TOC__ | __TOC__ | ||
Line 44: | Line 15: | ||
A lazy enumerator in Ruby works as both a data consumer and a data producer. To understand how it’s done, we have to understand what an Enumerator::Generator and an Enumerator::Yielder is. | A lazy enumerator in Ruby works as both a data consumer and a data producer. To understand how it’s done, we have to understand what an Enumerator::Generator and an Enumerator::Yielder is. | ||
A simple way of creating an enumerable series of data in Ruby is to create an array. However if we want more flexibility in handling each element of the series while generating it, we can do so by using an Enumerator::Generator. | A simple way of creating an enumerable series of data in Ruby is to create an array. However if we want more flexibility in handling each element of the series while generating it, we can do so by using an Enumerator::Generator.Consider the example below to create an Enumerator: | ||
<pre> | <pre> | ||
Line 56: | Line 27: | ||
</pre> | </pre> | ||
Here a block with an yielder is passed to the constructor of the enumerator. | Here a block with an yielder is passed to the constructor of the enumerator. Every time the next method of the enumerator is called, the block is executed till the point the yielder is called in the line “yielder.yield number”. The control is passed to the yielder and returned back to the enumerator generator block when the “next” method is called on the enumerator again. | ||
[[File:Enumerator- | [[File:Enumerator-constructor.png]] | ||
To understand how the block is executed on each call of “next”, the above code can be modified as follows: | To understand how the block is executed on each call of “next”, the above code can be modified as follows: | ||
Line 97: | Line 68: | ||
</pre> | </pre> | ||
The reason behind the infinite loop is that the “select” method tries to iterate over the entire collection and | The reason behind the infinite loop is that the “select” method tries to iterate over the entire collection and doesn't pass the control to the next method in line unless the collection is exhausted which, in an infinite collection, never happens. Hence it’s necessary to define a lazy version of the select method to do something useful with the infinite range that’s being lazily generated.<ref>http://www.michaelharrison.ws/weblog/?p=163</ref> The code given below adds a lazy version of the “select” method to the existing “Enumerator” class: | ||
<pre> | <pre> | ||
Line 111: | Line 82: | ||
</pre> | </pre> | ||
The lazy_select method creates an enumerator which yields to the next method in line on selecting each value that satisfies the condition in the block. Using the lazy_select method on even_natural_numbers won't give an | The lazy_select method creates an enumerator which yields to the next method in line on selecting each value that satisfies the condition in the block. Using the lazy_select method on even_natural_numbers won't give an infinite loop anymore. | ||
<pre> | <pre> | ||
Line 128: | Line 99: | ||
=='''Differences '''== | =='''Differences '''== | ||
If you try to iterate over an infinite range in Enumerator then it will run into an endless loop while with Enumerator::Lazy you can avoid endless loop and get just the value you need. | If you try to iterate over an infinite range in Enumerator then it will run into an endless loop while with Enumerator::Lazy you can avoid endless loop and get just the value you need. | ||
<pre> | <pre> | ||
Line 171: | Line 142: | ||
===Example 1=== | ===Example 1=== | ||
Enumerator::lazy manipulate infinite lists wisely. Below example shows without '.lazy', map tries to evaluate an array till last value and then gives first 10 values So, it takes much longer time to evaluate. But, with '.Lazy' method instead of evaluating full length array it evaluates only first 10 values. Thus, it gives answer much quicker than without '.Lazy' method. | |||
<pre> | <pre> | ||
Line 189: | Line 161: | ||
===Example 2=== | ===Example 2=== | ||
This example give you idea of using lazy enumeration in file read example and comparing with non lazy enumeration way | This example give you idea of using lazy enumeration in file read example and comparing with non lazy enumeration way, | ||
<pre> | <pre> | ||
Line 206: | Line 178: | ||
</pre> | </pre> | ||
If you deal with large data and start chaining multiple enumeration methods together, the use of lazy evaluation prevents you from using unnecessary amounts of memory in temporary variables which makes lazy evaluation faster than non lazy evaluation. | If you deal with large data and start chaining multiple enumeration methods together, the use of lazy evaluation prevents you from using unnecessary amounts of memory in temporary variables which makes lazy evaluation faster than non lazy evaluation. | ||
=='''Drawbacks'''== | =='''Drawbacks'''== | ||
Line 228: | Line 200: | ||
Lazy enumeration is act as a proxy in between program and data so sometimes lazy enumeration gives non negligible performance hit. It is recommended not to use lazy enumeration everywhere as it adds another layer.<ref>https://bugs.ruby-lang.org/issues/6183</ref> | Lazy enumeration is act as a proxy in between program and data so sometimes lazy enumeration gives non negligible performance hit. It is recommended not to use lazy enumeration everywhere as it adds another layer.<ref>https://bugs.ruby-lang.org/issues/6183</ref> | ||
=='''[http://www.ruby-doc.org/core-2.0.0/Enumerator/Lazy.html Methods]'''== | |||
Enumerator::Lazy overrides the following methods as 'lazy' version. | |||
# Methods which transform the element: | |||
##map(collect), | |||
##flat_map(collect_concat), | |||
##zip | |||
# Methods which filters the element: | |||
##select(find_all) | |||
##grep | |||
##take_while | |||
##reject | |||
##drop | |||
##drop_while | |||
# These lazy methods returns an instance of Enumerator::Lazy, where the 'normal' version returns an Array (and goes into infinite loop). It means that calling these methods does not generate any result. Actual values are generated when you call the methods like these on Enumerable::Lazy. | |||
##take | |||
##first | |||
##find(detect) | |||
#Enumerator::Lazy does not override these methods: | |||
##chunk, cycle, slicebefore, each_cons, each_entry, each_slice,each_with_index, each_with_object. | |||
##:Because they return an Enumerator. | |||
##any?, include?, take. detect, find_index, first. | |||
##:Because they never cause an infinite loop, because they need only finite number of elements | |||
##inject, count, all?, none?, one?, max, max_by, min, min_by, minmax, minmax_by. | |||
##:Because they need all elements to define the return value, and they are smart enough not to store all the element on the memory. | |||
##entries, group_by, partition, sort, sort_by | |||
##:Because of the size of their return value is linear to the input size. | |||
##reverse_each | |||
##:Because it needs the last element first. | |||
=='''References'''== | =='''References'''== | ||
<references/> | <references/> | ||
Latest revision as of 16:48, 16 February 2015
Lazy Enumerators
Introduction
In Ruby 1.8 and Ruby 1.9 the problem with enumeration that generate infinite sequences is that we have to write special, non-greedy, versions of methods. But, if you’re using Ruby 2.0 or later, you have this support built in. When you call lazy a lot happens inside of Ruby to give you just the values you need, Ruby automatically creates and uses many different types of internal Ruby objects.
If you call lazy Enumerator on any Ruby Enumerator, you get back an instance of class Enumerator::Lazy. Many basic class like Array and Hash has lazy method through Enumerator class.<ref>http://www.amazon.com/Programming-Ruby-1-9-2-0-Programmers/dp/1937785491</ref>'lazy' version of some methods such as map and select never returns an array. Instead, they yield the transformed/filtered elements one by one. So you can use them for -
- Huge data which cannot be processed at a time,
- Data stream which you want to process in real-time,
- Infinite list, which has no end.
How a lazy enumerator works
A lazy enumerator in Ruby works as both a data consumer and a data producer. To understand how it’s done, we have to understand what an Enumerator::Generator and an Enumerator::Yielder is.
A simple way of creating an enumerable series of data in Ruby is to create an array. However if we want more flexibility in handling each element of the series while generating it, we can do so by using an Enumerator::Generator.Consider the example below to create an Enumerator:
even_natural_numbers = Enumerator.new do |yielder| number = 0 loop do number += 2 yielder.yield number end end
Here a block with an yielder is passed to the constructor of the enumerator. Every time the next method of the enumerator is called, the block is executed till the point the yielder is called in the line “yielder.yield number”. The control is passed to the yielder and returned back to the enumerator generator block when the “next” method is called on the enumerator again.
To understand how the block is executed on each call of “next”, the above code can be modified as follows:
even_natural_numbers = Enumerator.new do |yielder| number = 0 loop do number += 2 yielder.yield number puts "continuing loop" end end puts even_natural_numbers.next puts even_natural_numbers.next Output: 2 continuing loop 4
On the first “next” call to ‘even_natural_numbers’, the value of number is 2 and it’s passed to the yielder. During the second “next” call, the execution of the generator block continues from the line ‘puts “continuing loop”’ and the blocks yields again with the value 4.
Printing the above even_natural_numbers object gives the output:
#<Enumerator: #<Enumerator::Generator:0x007fbb040b2e28>:each>
This shows that the enumerator object is created with a reference to an internal object called Enumerator::Generator and the “each” method is setup to call that generator.
In the above example, the enumerator acts as a lazy generator i.e. generating new numbers of the series only on demand. By cascading multiple such lazy enumerators, each enumerator can be made to act as a data generator as well as a consumer. Consider an example where the first 10 even numbers which are also multiples of 5 are to be generated:
If the “select” method is used on the enumerator “even_natural_numbers” as follows, an infinite loop is produced:
puts even_natural_numbers.select {|v| v%5==0}.first(10)
The reason behind the infinite loop is that the “select” method tries to iterate over the entire collection and doesn't pass the control to the next method in line unless the collection is exhausted which, in an infinite collection, never happens. Hence it’s necessary to define a lazy version of the select method to do something useful with the infinite range that’s being lazily generated.<ref>http://www.michaelharrison.ws/weblog/?p=163</ref> The code given below adds a lazy version of the “select” method to the existing “Enumerator” class:
class Enumerator def lazy_select(&block) Enumerator.new do |yielder| self.each do |val| yielder.yield(val) if block.call(val) end end end end
The lazy_select method creates an enumerator which yields to the next method in line on selecting each value that satisfies the condition in the block. Using the lazy_select method on even_natural_numbers won't give an infinite loop anymore.
puts even_natural_numbers.lazy_select {|v| v%5==0}.first(5)
The above line of code gives the output as: 10 20 30 40 50
The flow of numbers in the above chain of lazy methods can be illustrated as below:
Differences
If you try to iterate over an infinite range in Enumerator then it will run into an endless loop while with Enumerator::Lazy you can avoid endless loop and get just the value you need.
> range = 1..Float::INFINITY > range.collect { |n| n**2}.first(10) #endless loop
but with lazy,
> range = 1..Float::INFINITY > range.lazy.collect { |n| n**2}.first(10) => [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
Enumerator::Lazy acts just like the original, but it re-implements methods such as select and map so that they can work with infinite sequences. The lazy versions of the various methods do not return arrays of data. Instead, it returns a new enumerator that includes its own special processing—below example of map method returns an enumerator.<ref>http://patshaughnessy.net/2013/4/3/ruby-2-0-works-hard-so-you-can-be-lazy</ref>
> range = 1..100 > range.map{|n| n**2}.take(10) #returns array => [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
but with lazy,
> range.lazy.map{|n| n**2}.take(10) #returns enumerator =>#<Enumerator::Lazy:#<Enumerator::Lazy:#<Enumerator::Lazy:1..100>:map>:take(10)>
Examples
Example 1
Enumerator::lazy manipulate infinite lists wisely. Below example shows without '.lazy', map tries to evaluate an array till last value and then gives first 10 values So, it takes much longer time to evaluate. But, with '.Lazy' method instead of evaluating full length array it evaluates only first 10 values. Thus, it gives answer much quicker than without '.Lazy' method.
require 'date' require ‘benchmark’ Benchmark.bm(8) do |x| x.report("Non Lazy :") do puts (Date.new(2015)..Date.new(9999)).select{|d| d.day == 13 and d.friday?}.first(10) end x.report("Lazy :") do puts (Date.new(2015)..Date.new(9999)).lazy.select{|d| d.day == 13 and d.friday?}.first(10) end end
Example 2
This example give you idea of using lazy enumeration in file read example and comparing with non lazy enumeration way,
require 'benchmark' fileName = "C:\\sample.txt" Benchmark.bm(8) do |x| x.report("Non Lazy :") do 50.times { File.readlines(fileName).detect { |line| line =~ /110216/i} } end x.report("Lazy :") do 50.times { File.open(fileName).lazy.detect { |line| line =~ /110216/i} } end end
If you deal with large data and start chaining multiple enumeration methods together, the use of lazy evaluation prevents you from using unnecessary amounts of memory in temporary variables which makes lazy evaluation faster than non lazy evaluation.
Drawbacks
Example
We have seen usage and benefits of lazy enumeration in above examples but below simple example will give you drawback of it.
require 'benchmark' Benchmark.bm(8) do |x| x.report("Non Lazy :") do (1..10000000).select { |i| i % 3 == 0 || i % 5 == 0}.reduce(:+) end x.report("Lazy :") do (1..10000000).lazy.select { |i| i % 3 == 0 || i % 5 == 0}.reduce(:+) end end
Lazy enumeration is act as a proxy in between program and data so sometimes lazy enumeration gives non negligible performance hit. It is recommended not to use lazy enumeration everywhere as it adds another layer.<ref>https://bugs.ruby-lang.org/issues/6183</ref>
Methods
Enumerator::Lazy overrides the following methods as 'lazy' version.
- Methods which transform the element:
- map(collect),
- flat_map(collect_concat),
- zip
- Methods which filters the element:
- select(find_all)
- grep
- take_while
- reject
- drop
- drop_while
- These lazy methods returns an instance of Enumerator::Lazy, where the 'normal' version returns an Array (and goes into infinite loop). It means that calling these methods does not generate any result. Actual values are generated when you call the methods like these on Enumerable::Lazy.
- take
- first
- find(detect)
- Enumerator::Lazy does not override these methods:
- chunk, cycle, slicebefore, each_cons, each_entry, each_slice,each_with_index, each_with_object.
- Because they return an Enumerator.
- any?, include?, take. detect, find_index, first.
- Because they never cause an infinite loop, because they need only finite number of elements
- inject, count, all?, none?, one?, max, max_by, min, min_by, minmax, minmax_by.
- Because they need all elements to define the return value, and they are smart enough not to store all the element on the memory.
- entries, group_by, partition, sort, sort_by
- Because of the size of their return value is linear to the input size.
- reverse_each
- Because it needs the last element first.
- chunk, cycle, slicebefore, each_cons, each_entry, each_slice,each_with_index, each_with_object.
References
<references/>