CSC/ECE 517 Spring 2015/ch1a 9 RA: Difference between revisions

From Expertiza_Wiki
Jump to navigation Jump to search
 
(51 intermediate revisions by 2 users not shown)
Line 2: Line 2:


=='''Introduction'''==
=='''Introduction'''==
In Ruby 1.8 and Ruby 1.9 the problem with enumeration that generate infinite sequences is that we have to write special, non-greedy, versions of methods. But, if you’re using Ruby 2.0 or later, you have this support built in. If you call Enumerator#lazy on any Ruby enumerator, you get back an instance of class Enumerator::Lazy. Many basic class like Array and Hash has lazy method through Enumerator class.
In [http://en.wikipedia.org/wiki/Ruby_%28programming_language%29#Ruby_1.8 Ruby 1.8] and [http://en.wikipedia.org/wiki/Ruby_%28programming_language%29#Ruby_1.9 Ruby 1.9] the problem with enumeration that generate infinite sequences is that we have to write special, non-greedy, versions of methods. But, if you’re using [http://en.wikipedia.org/wiki/Ruby_(programming_language)#Ruby_2.0 Ruby 2.0] or later, you have this support built in. When you call lazy a lot happens inside of Ruby to give you just the values you need, Ruby automatically creates and uses many different types of internal Ruby objects.  


Enumerable::Lazy overrides the following methods as 'lazy' version.
If you call lazy Enumerator on any Ruby Enumerator, you get back an instance of class [http://www.ruby-doc.org/core-2.0.0/Enumerator/Lazy.html Enumerator::Lazy]. Many basic class like Array and Hash has lazy method through Enumerator class.<ref>http://www.amazon.com/Programming-Ruby-1-9-2-0-Programmers/dp/1937785491</ref>'lazy' version of some methods such as map and select never returns an array. Instead, they yield the transformed/filtered elements one by one. So you can use them for -
Methods which transform the element: 
*Huge data which cannot be processed at a time,
* map(collect)
*Data stream which you want to process in real-time,
* flat_map(collect_concat)
*Infinite list, which has no end.
* zip
Methods which filters the element:
* select(find_all)
* grep
* take_while
* reject
* drop
* drop_while


These lazy methods returns an instance of Enumerable::Lazy, where the 'normal' version returns an Array (and goes into infinite loop). It means that calling these methods does not generate any result. Actual values are generated when you call the methods like these on Enumerable::Lazy.
* take
* first
* find(detect)
Enumerable::Lazy does not override these methods:
* chunk cycle slicebefore each(cons|entry|slice|with_index|with_object) Because they return an Enumerator.
* any? include? take detect find_index first Because they never cause an infinit loop, because they need only finite number of elements
* inject count all? none? one? max(_by) min(_by) minmax(_by)
* They need all elements to define the return value, and they are smart enough not to store all the element on the memory.
* entries group_by partition sort(_by)
* Size of their return value is linear to the input size.
* reverse_each
* It needs the last element first.


__TOC__
__TOC__
Line 39: Line 15:
A lazy enumerator in Ruby works as both a data consumer and a data producer. To understand how it’s done, we have to understand what an Enumerator::Generator and an Enumerator::Yielder is.
A lazy enumerator in Ruby works as both a data consumer and a data producer. To understand how it’s done, we have to understand what an Enumerator::Generator and an Enumerator::Yielder is.


A simple way of creating an enumerable series of data in Ruby is to create an array. However if we want more flexibility in handling each element of the series while generating it, we can do so by using an Enumerator::Generator.<ref>33333333333333</ref> Consider the example below to create an Enumerator:
A simple way of creating an enumerable series of data in Ruby is to create an array. However if we want more flexibility in handling each element of the series while generating it, we can do so by using an Enumerator::Generator.Consider the example below to create an Enumerator:


<pre>
<pre>
Line 51: Line 27:
</pre>
</pre>


Here a block with an yielder is passed to the constructor of the enumerator. Everytime the next method of the enumerator is called, the block is executed till the point the yielder is called in the line “yielder.yield number”. The control is passed to the yielder and returned back to the enumerator generator block when the “next” method is called on the enumerator again.  
Here a block with an yielder is passed to the constructor of the enumerator. Every time the next method of the enumerator is called, the block is executed till the point the yielder is called in the line “yielder.yield number”. The control is passed to the yielder and returned back to the enumerator generator block when the “next” method is called on the enumerator again.  


[[File:Enumerator-generator.png‎]]
[[File:Enumerator-constructor.png‎]]


To understand how the block is executed on each call of “next”, the above code can be modified as follows:
To understand how the block is executed on each call of “next”, the above code can be modified as follows:
Line 92: Line 68:
</pre>
</pre>


The reason behind the infinite loop is that the “select” method tries to iterate over the entire collection and doesn’t pass the control to the next method in line unless the collection is exhausted which, in an infinite collection, never happens. Hence it’s necessary to define a lazy version of the select method to do something useful with the infinite range that’s being lazily generated. The code given below adds a lazy version of the “select” method to the existing “Enumerator” class:
The reason behind the infinite loop is that the “select” method tries to iterate over the entire collection and doesn't pass the control to the next method in line unless the collection is exhausted which, in an infinite collection, never happens. Hence it’s necessary to define a lazy version of the select method to do something useful with the infinite range that’s being lazily generated.<ref>http://www.michaelharrison.ws/weblog/?p=163</ref> The code given below adds a lazy version of the “select” method to the existing “Enumerator” class:


<pre>
<pre>
Line 106: Line 82:
</pre>
</pre>


The lazy_select method creates an enumerator which yields to the next method in line on selecting each value that satisfies the condition in the block. Using the lazy_select method on even_natural_numbers won't give an infinte loop anymore.  
The lazy_select method creates an enumerator which yields to the next method in line on selecting each value that satisfies the condition in the block. Using the lazy_select method on even_natural_numbers won't give an infinite loop anymore.  


<pre>
<pre>
Line 123: Line 99:


=='''Differences '''==
=='''Differences '''==
If you try to iterate over an infinite range in Enumerator then it will run  into an endless loop while with Enumerator::Lazy you can avoid endless loop and get just the value you need.<ref>http://patshaughnessy.net/2013/4/3/ruby-2-0-works-hard-so-you-can-be-lazy</ref>
If you try to iterate over an infinite range in Enumerator then it will run  into an endless loop while with Enumerator::Lazy you can avoid endless loop and get just the value you need.


<pre>
<pre>
Line 166: Line 142:


===Example 1===
===Example 1===
Enumerator::lazy manipulate infinite lists wisely. Below example shows without '.lazy', map tries to evaluate an array till last value and then gives first 10 values So, it takes much longer time to evaluate. But, with '.Lazy' method instead of evaluating full length array it evaluates only first 10 values. Thus, it gives answer much quicker than without '.Lazy' method.


<pre>
<pre>
Line 184: Line 161:
===Example 2===
===Example 2===


This example give you idea of using lazy enumeration in file read example and comparing with non lazy enumeration way
This example give you idea of using lazy enumeration in file read example and comparing with non lazy enumeration way,


<pre>
<pre>
Line 201: Line 178:
</pre>
</pre>


If you deal with large data and start chaining multiple enumeration methods together, the use of lazy evaluation prevents you from using unnecessary amounts of memory in temporary variables which makes lazy evaluation faster than non lazy evaluation.  
If you deal with large data and start chaining multiple enumeration methods together, the use of lazy evaluation prevents you from using unnecessary amounts of memory in temporary variables which makes lazy evaluation faster than non lazy evaluation.


=='''Drawbacks'''==
=='''Drawbacks'''==
Line 223: Line 200:


Lazy enumeration is act as a proxy in between program and data so sometimes lazy enumeration gives non negligible performance hit. It is recommended not to use lazy enumeration everywhere as it adds another layer.<ref>https://bugs.ruby-lang.org/issues/6183</ref>
Lazy enumeration is act as a proxy in between program and data so sometimes lazy enumeration gives non negligible performance hit. It is recommended not to use lazy enumeration everywhere as it adds another layer.<ref>https://bugs.ruby-lang.org/issues/6183</ref>
=='''[http://www.ruby-doc.org/core-2.0.0/Enumerator/Lazy.html Methods]'''==
Enumerator::Lazy overrides the following methods as 'lazy' version.
# Methods which transform the element:
##map(collect),
##flat_map(collect_concat),
##zip
# Methods which filters the element:
##select(find_all)
##grep
##take_while
##reject
##drop
##drop_while
# These lazy methods returns an instance of Enumerator::Lazy, where the 'normal' version returns an Array (and goes into infinite loop). It means that calling these methods does not generate any result. Actual values are generated when you call the methods like these on Enumerable::Lazy.
##take
##first
##find(detect)
#Enumerator::Lazy does not override these methods:
##chunk, cycle, slicebefore, each_cons, each_entry, each_slice,each_with_index, each_with_object.
##:Because they return an Enumerator.
##any?, include?, take. detect, find_index, first.
##:Because they never cause an infinite loop, because they need only finite number of elements
##inject, count, all?, none?, one?, max, max_by, min, min_by, minmax, minmax_by.
##:Because they need all elements to define the return value, and they are smart enough not to store all the element on the memory.
##entries, group_by, partition, sort, sort_by
##:Because of the size of their return value is linear to the input size.
##reverse_each
##:Because it needs the last element first.


=='''References'''==
=='''References'''==
<references/>
<references/>
==See Also ==
[https://github.com/yhara/enumerable-lazy]
Programming Ruby 1.9 & 2.0 4th Edition

Latest revision as of 16:48, 16 February 2015

Lazy Enumerators

Introduction

In Ruby 1.8 and Ruby 1.9 the problem with enumeration that generate infinite sequences is that we have to write special, non-greedy, versions of methods. But, if you’re using Ruby 2.0 or later, you have this support built in. When you call lazy a lot happens inside of Ruby to give you just the values you need, Ruby automatically creates and uses many different types of internal Ruby objects.

If you call lazy Enumerator on any Ruby Enumerator, you get back an instance of class Enumerator::Lazy. Many basic class like Array and Hash has lazy method through Enumerator class.<ref>http://www.amazon.com/Programming-Ruby-1-9-2-0-Programmers/dp/1937785491</ref>'lazy' version of some methods such as map and select never returns an array. Instead, they yield the transformed/filtered elements one by one. So you can use them for -

  • Huge data which cannot be processed at a time,
  • Data stream which you want to process in real-time,
  • Infinite list, which has no end.


How a lazy enumerator works

A lazy enumerator in Ruby works as both a data consumer and a data producer. To understand how it’s done, we have to understand what an Enumerator::Generator and an Enumerator::Yielder is.

A simple way of creating an enumerable series of data in Ruby is to create an array. However if we want more flexibility in handling each element of the series while generating it, we can do so by using an Enumerator::Generator.Consider the example below to create an Enumerator:

even_natural_numbers = Enumerator.new do |yielder|
	number = 0
	loop do
		number += 2
		yielder.yield number
	end
end

Here a block with an yielder is passed to the constructor of the enumerator. Every time the next method of the enumerator is called, the block is executed till the point the yielder is called in the line “yielder.yield number”. The control is passed to the yielder and returned back to the enumerator generator block when the “next” method is called on the enumerator again.

To understand how the block is executed on each call of “next”, the above code can be modified as follows:


even_natural_numbers = Enumerator.new do |yielder|
 number = 0
 loop do
   number += 2
   yielder.yield number
   puts "continuing loop"
 end
end

puts even_natural_numbers.next
puts even_natural_numbers.next

Output:
2
continuing loop
4

On the first “next” call to ‘even_natural_numbers’, the value of number is 2 and it’s passed to the yielder. During the second “next” call, the execution of the generator block continues from the line ‘puts “continuing loop”’ and the blocks yields again with the value 4.

Printing the above even_natural_numbers object gives the output:

#<Enumerator: #<Enumerator::Generator:0x007fbb040b2e28>:each>

This shows that the enumerator object is created with a reference to an internal object called Enumerator::Generator and the “each” method is setup to call that generator.

In the above example, the enumerator acts as a lazy generator i.e. generating new numbers of the series only on demand. By cascading multiple such lazy enumerators, each enumerator can be made to act as a data generator as well as a consumer. Consider an example where the first 10 even numbers which are also multiples of 5 are to be generated:

If the “select” method is used on the enumerator “even_natural_numbers” as follows, an infinite loop is produced:

puts even_natural_numbers.select {|v| v%5==0}.first(10)

The reason behind the infinite loop is that the “select” method tries to iterate over the entire collection and doesn't pass the control to the next method in line unless the collection is exhausted which, in an infinite collection, never happens. Hence it’s necessary to define a lazy version of the select method to do something useful with the infinite range that’s being lazily generated.<ref>http://www.michaelharrison.ws/weblog/?p=163</ref> The code given below adds a lazy version of the “select” method to the existing “Enumerator” class:

class Enumerator
 def lazy_select(&block)
   Enumerator.new do |yielder|
     self.each do |val|
       yielder.yield(val) if block.call(val)
     end
   end
 end
end

The lazy_select method creates an enumerator which yields to the next method in line on selecting each value that satisfies the condition in the block. Using the lazy_select method on even_natural_numbers won't give an infinite loop anymore.

puts even_natural_numbers.lazy_select {|v| v%5==0}.first(5)

The above line of code gives the output as: 10 20 30 40 50

The flow of numbers in the above chain of lazy methods can be illustrated as below:

Differences

If you try to iterate over an infinite range in Enumerator then it will run into an endless loop while with Enumerator::Lazy you can avoid endless loop and get just the value you need.


> range = 1..Float::INFINITY
> range.collect { |n| n**2}.first(10) #endless loop  

but with lazy,


> range = 1..Float::INFINITY
> range.lazy.collect { |n| n**2}.first(10)
=> [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

Enumerator::Lazy acts just like the original, but it re-implements methods such as select and map so that they can work with infinite sequences. The lazy versions of the various methods do not return arrays of data. Instead, it returns a new enumerator that includes its own special processing—below example of map method returns an enumerator.<ref>http://patshaughnessy.net/2013/4/3/ruby-2-0-works-hard-so-you-can-be-lazy</ref>


> range = 1..100
> range.map{|n| n**2}.take(10) #returns array
=> [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

but with lazy,


> range.lazy.map{|n| n**2}.take(10) #returns enumerator
=>#<Enumerator::Lazy:#<Enumerator::Lazy:#<Enumerator::Lazy:1..100>:map>:take(10)>

Examples

Example 1

Enumerator::lazy manipulate infinite lists wisely. Below example shows without '.lazy', map tries to evaluate an array till last value and then gives first 10 values So, it takes much longer time to evaluate. But, with '.Lazy' method instead of evaluating full length array it evaluates only first 10 values. Thus, it gives answer much quicker than without '.Lazy' method.


require 'date'
require ‘benchmark’
Benchmark.bm(8) do |x|
	x.report("Non Lazy :") do
		puts (Date.new(2015)..Date.new(9999)).select{|d| d.day == 13 and d.friday?}.first(10)
	end
	x.report("Lazy :") do
		puts (Date.new(2015)..Date.new(9999)).lazy.select{|d| d.day == 13 and d.friday?}.first(10)
	end
end

Example 2

This example give you idea of using lazy enumeration in file read example and comparing with non lazy enumeration way,


require 'benchmark'
fileName = "C:\\sample.txt"
Benchmark.bm(8) do |x|
		x.report("Non Lazy :") do
		50.times { File.readlines(fileName).detect { |line| line =~ /110216/i} }
		end
	x.report("Lazy :") do
			50.times { File.open(fileName).lazy.detect { |line| line =~ /110216/i} }
		end
end

If you deal with large data and start chaining multiple enumeration methods together, the use of lazy evaluation prevents you from using unnecessary amounts of memory in temporary variables which makes lazy evaluation faster than non lazy evaluation.

Drawbacks

Example

We have seen usage and benefits of lazy enumeration in above examples but below simple example will give you drawback of it.


require 'benchmark'
Benchmark.bm(8) do |x|
	x.report("Non Lazy :") do
		(1..10000000).select { |i| i % 3 == 0 || i % 5 == 0}.reduce(:+)
	end
	x.report("Lazy :") do
		(1..10000000).lazy.select { |i| i % 3 == 0 || i % 5 == 0}.reduce(:+)
	end
end

Lazy enumeration is act as a proxy in between program and data so sometimes lazy enumeration gives non negligible performance hit. It is recommended not to use lazy enumeration everywhere as it adds another layer.<ref>https://bugs.ruby-lang.org/issues/6183</ref>

Methods

Enumerator::Lazy overrides the following methods as 'lazy' version.

  1. Methods which transform the element:
    1. map(collect),
    2. flat_map(collect_concat),
    3. zip
  2. Methods which filters the element:
    1. select(find_all)
    2. grep
    3. take_while
    4. reject
    5. drop
    6. drop_while
  3. These lazy methods returns an instance of Enumerator::Lazy, where the 'normal' version returns an Array (and goes into infinite loop). It means that calling these methods does not generate any result. Actual values are generated when you call the methods like these on Enumerable::Lazy.
    1. take
    2. first
    3. find(detect)
  4. Enumerator::Lazy does not override these methods:
    1. chunk, cycle, slicebefore, each_cons, each_entry, each_slice,each_with_index, each_with_object.
      Because they return an Enumerator.
    2. any?, include?, take. detect, find_index, first.
      Because they never cause an infinite loop, because they need only finite number of elements
    3. inject, count, all?, none?, one?, max, max_by, min, min_by, minmax, minmax_by.
      Because they need all elements to define the return value, and they are smart enough not to store all the element on the memory.
    4. entries, group_by, partition, sort, sort_by
      Because of the size of their return value is linear to the input size.
    5. reverse_each
      Because it needs the last element first.

References

<references/>