CSC/ECE 517 Spring 2015/ch1b 18 AS: Difference between revisions

From Expertiza_Wiki
<font size="5"><b>Apache Solr and Rails</b></font>


The topic write up for this page can be found [https://docs.google.com/document/d/1TgBtp7flIPKJwkkShgtcIkt--mtHuwVHsQX6Tpzj1rc here].
[[File: solr.jpg|right]]
Apache Solr<ref>http://lucene.apache.org/solr/</ref> is a standalone, open-source enterprise search server written in Java and created by Yonik Seeley. It is packaged as a <span class="plainlinks">[https://tomcat.apache.org/tomcat-5.5-doc/servletapi/javax/servlet/Servlet.html servlet]</span> that runs within a servlet container such as <span class="plainlinks">[http://tomcat.apache.org/ Apache Tomcat]</span>. It is a popular, fast, and scalable search platform built on top of the <span class="plainlinks">[http://lucene.apache.org/index.html Apache Lucene]</span> search library.


 
Rails is a framework for developing web-based applications that follows the <span class="plainlinks">[http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC]</span> architectural pattern. Websites built with <span class="plainlinks">[http://rubyonrails.org/ Rails]</span> can use the Solr search engine to provide sophisticated and customizable search features. Rails integrates with the Solr search server through the Sunspot<ref>https://rubygems.org/gems/sunspot_rails</ref><ref>https://github.com/sunspot/sunspot</ref> gem.


__TOC__
=='''Introduction'''==
Apache Solr is a search server with a <span class="plainlinks">[http://en.wikipedia.org/wiki/Representational_state_transfer REST]</span>-like API. It is an indexing and searching framework that can be deployed alongside many web frameworks and platforms, such as Rails, Drupal, and Django. Documents are indexed by sending JSON, XML, CSV, or binary data over HTTP, and the index is queried with HTTP GET requests that return JSON, XML, CSV, or binary results.
The sunspot_rails gem, a Solr client, integrates Sunspot into Rails with drop-in ease: it extends ActiveRecord objects for searchability and manages the commit cycle transparently. sunspot_rails works with Rails 2.3 and Rails 3.0.
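
As a rough illustration of the HTTP interface described above, the following sketch builds a GET query URL and a JSON indexing payload in plain Ruby without contacting a server. The base URL, the core name <code>blogs</code>, and the helper names are assumptions made for this example; only the <code>/select</code> and <code>/update</code> handlers and the <code>q</code>, <code>wt</code>, and <code>rows</code> parameters come from Solr itself.

```ruby
require 'uri'
require 'json'

# Hypothetical base URL; host, port and core path depend on your Solr setup.
SOLR_BASE = 'http://localhost:8983/solr/blogs'

# Build the URI for a GET query against Solr's /select handler.
def solr_select_uri(query, format: 'json', rows: 10)
  uri = URI("#{SOLR_BASE}/select")
  uri.query = URI.encode_www_form(q: query, wt: format, rows: rows)
  uri
end

# Build the JSON body that would be POSTed to the /update handler.
def solr_update_body(docs)
  JSON.generate(docs)
end

uri = solr_select_uri('title:ruby')
body = solr_update_body([{ id: 1, title: 'Ruby and Solr' }])
puts uri
puts body
```

In a Rails application these requests are normally issued by Sunspot and RSolr rather than assembled by hand.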
 
===Technology Stack<ref>http://www.slideshare.net/dkeener/rails-and-the-apache-solr-search-engine</ref>===
Solr is built on top of Apache Lucene. Lucene is a toolbox responsible for indexing, searching, spell-checking, and advanced tokenization, whereas Solr is a search server that inherits all of Lucene's features and adds API integration, caching, and, most importantly, a web admin interface. The admin interface makes Solr easy to operate in a production environment. Both technologies require Java 1.4 or above.
[[File: structure_solr.png|center]]
The Sunspot library talks to Solr through RSolr, a low-level Ruby client for the Solr API. On top of RSolr, Sunspot provides drop-in ActiveRecord support and an easy abstraction over the Solr search engine. The installation and use of Solr through the Sunspot gem are discussed below.
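
The indexing and tokenization that Lucene contributes to this stack can be pictured as an inverted index, which maps each token to the documents that contain it. The class below is a toy sketch of that idea in plain Ruby; <code>ToyIndex</code> and its methods are hypothetical names, and the code is not how Lucene is actually implemented.

```ruby
# Toy inverted index: maps each token to the ids of the documents containing it.
class ToyIndex
  def initialize
    @postings = Hash.new { |hash, token| hash[token] = [] }
  end

  # Split text into lowercase word tokens (a stand-in for a real analyzer).
  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/)
  end

  # Record the document id under every distinct token of its text.
  def add(id, text)
    tokenize(text).uniq.each { |token| @postings[token] << id }
  end

  # Return the ids of documents that contain every token of the query.
  def search(query)
    tokenize(query).map { |token| @postings.fetch(token, []) }.reduce(:&) || []
  end
end

index = ToyIndex.new
index.add(1, 'Solr is built on Lucene')
index.add(2, 'Rails talks to Solr through Sunspot')
p index.search('solr')
p index.search('solr rails')
```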
 
=='''Features<ref>http://lucene.apache.org/solr/features.html</ref>'''==
Apache Solr with Rails offers the following features:
 
*Enables full-text matching, powered by Lucene software
*Built to handle high volume traffic
*Provides <span class="plainlinks">[http://en.wikipedia.org/wiki/Faceted_search faceted]</span> searching
*Provides features like suggester, spellcheck, clustering, auto-complete, highlighting
*Extensible through plugins
*Supports statistical and aggregate processing of text
*Supports rich document formats such as PDF, Word, and PowerPoint
 
=='''Installation'''==
 
Sunspot makes it easy to do full text searching through Solr. Sunspot comes as a gem and is installed in the usual way by adding it to the Gemfile and running bundle.
 
<pre>
gem 'sunspot_rails'
gem 'sunspot_solr' # optional pre-packaged Solr distribution for use in development
</pre>
 
<pre>
bundle install
</pre>
 
Once the gem and its dependencies are installed, generate Sunspot's configuration file by running:
 
<pre>
$ rails g sunspot_rails:install
</pre>
 
This command creates a YAML file at config/sunspot.yml. The default settings in this file work without changes.
The sunspot_solr gem bundles a Solr distribution, so there is no need to install Solr separately. It works out of the box, which makes it convenient in development. To start it, run:
 
<pre>
$ rake sunspot:solr:start
</pre>
 
If you’re running OS X Lion and you haven’t installed a Java runtime you’ll be prompted to do so when you run this command. You may also see a deprecation warning but this can be safely ignored. The command will also create some more configuration files for advanced configuration.
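
The generated config/sunspot.yml maps each Rails environment to a Solr host, port, and path. The sketch below parses a sample of such a file and derives the Solr URL for one environment. The YAML values shown are illustrative assumptions (the file generated by your sunspot_rails version may differ), and <code>solr_url_for</code> is a hypothetical helper, not part of Sunspot.

```ruby
require 'yaml'

# Illustrative contents only: these values are assumptions, and the file
# generated by your sunspot_rails version may differ.
SUNSPOT_YML = <<~CONFIG
  development:
    solr:
      hostname: localhost
      port: 8982
      path: /solr/development
  test:
    solr:
      hostname: localhost
      port: 8981
      path: /solr/test
CONFIG

# Return the Solr base URL configured for the given Rails environment.
def solr_url_for(environment, config_text = SUNSPOT_YML)
  solr = YAML.safe_load(config_text).fetch(environment).fetch('solr')
  "http://#{solr['hostname']}:#{solr['port']}#{solr['path']}"
end

puts solr_url_for('development')
```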
 
=='''Usage and Examples<ref>http://outoftime.github.io/sunspot/docs/index.html</ref>'''==
=== Indexing Objects ===
Add a searchable block to the objects you wish to index.
 
<pre>
class Example < ActiveRecord::Base
  searchable do
    text :title, :body
  end
end
</pre>
 
Text fields are full-text searchable. Fields of other types declared in the searchable block (for example integer, string, or time) can be used to scope queries.
 
=== Searching Objects ===
Searching is then done by passing a query block to the search method of the indexed class.
 
<pre>
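Example.search do
  fulltext 'query'            # keywords matched against the text fields
  # with(:field_name, value)  # optional: scope by a non-text field declared in searchable
end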
</pre>
 
Many variations of a search can be built by changing the keywords and the scoping restrictions in the block.
 
 
=== Example of a Blog Search ===
Here we want to index a blog's text fields along with several attributes used for scoping, sorting, and faceting, so all of them are declared inside the searchable block.
 
<pre>
class Blog < ActiveRecord::Base
  searchable do
    text :title, :body
    text :comments do
      comments.map { |comment| comment.body }
    end
 
    boolean :featured
    integer :blog_id
    integer :author_id
    integer :category_ids, :multiple => true
    double  :average_rating
    time    :published_at
    time    :expired_at
 
    string  :sort_title do
      title.downcase.gsub(/^(an?|the)\b/, '') # \b: strip only whole leading articles
    end
  end
end
</pre>
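
The <code>sort_title</code> field above normalizes titles so that leading articles do not affect sort order. The following plain-Ruby sketch shows one way that normalization can behave; the standalone <code>sort_title</code> method is a hypothetical helper, and the word boundary and whitespace handling are choices made for this example.

```ruby
# Normalize a title for sorting: lowercase it and drop one leading article.
# The \b keeps words that merely start with "a", "an" or "the" intact.
def sort_title(title)
  title.downcase.sub(/\A(an?|the)\b\s*/, '')
end

p sort_title('The Art of Search')
p sort_title('An Index')
p sort_title('Answers')
```

Without the word boundary, a title such as "Answers" would lose its first letters and sort in the wrong place.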
 
Now for searching the indexed objects we can use:
<pre>
Blog.search do
  fulltext 'best author'
 
  with :blog_id, 1
  with(:published_at).less_than Time.now
  order_by :published_at, :desc
  paginate :page => 2, :per_page => 15
  facet :category_ids, :author_id
end
</pre>
 
Here the text fields are full-text searchable, blog_id and published_at scope the query, published_at also orders the results, and category_ids and author_id are returned as facets.
 
=='''Extensions to Sunspot'''==
Although Sunspot is used primarily for indexing and searching, its query DSL supports several further features. Some of them are listed below:
 
===<span class="plainlinks"> [http://en.wikipedia.org/wiki/Scope_%28computer_science%29 Scoping]</span>===
Queries can be scoped with positive and negative restrictions, which can be combined using conjunctions and disjunctions.
 
*Positive restrictions
<pre>
# Blogs with a category of 1, 3, or 5
Blog.search do
  with(:category_ids, [1, 3, 5])
end
</pre>
 
*Negative restrictions
<pre>
# Blogs not in category 1 or 3
Blog.search do
  without(:category_ids, [1, 3])
end
</pre>
 
*Disjunctions and Conjunctions
<pre>
# Blogs that do not have an expired time or have not yet expired
Blog.search do
  any_of do
    with(:expired_at).greater_than(Time.now)
    with(:expired_at, nil)
  end
end
</pre>
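
To make the semantics of these restrictions concrete, the sketch below emulates them over an in-memory array in plain Ruby. The sample records and helper names are hypothetical; a real search evaluates the restrictions inside Solr, not in Ruby.

```ruby
# Hypothetical in-memory records standing in for indexed Blog documents.
BLOGS = [
  { id: 1, category_ids: [1, 2], expired_at: nil },
  { id: 2, category_ids: [3],    expired_at: Time.at(0) },
  { id: 3, category_ids: [4],    expired_at: Time.now + 3600 }
].freeze

# with(:category_ids, values): at least one category matches.
def with_categories(blogs, values)
  blogs.select { |blog| (blog[:category_ids] & values).any? }
end

# without(:category_ids, values): no category matches.
def without_categories(blogs, values)
  blogs.reject { |blog| (blog[:category_ids] & values).any? }
end

# any_of with two conditions: expired_at is in the future, or it is nil.
def not_expired(blogs, now = Time.now)
  blogs.select { |blog| blog[:expired_at].nil? || blog[:expired_at] > now }
end

p with_categories(BLOGS, [1, 3, 5]).map { |blog| blog[:id] }
p without_categories(BLOGS, [1, 3]).map { |blog| blog[:id] }
p not_expired(BLOGS).map { |blog| blog[:id] }
```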
 
===<span class="plainlinks"> [http://en.wikipedia.org/wiki/Pagination Pagination]</span>===
By default, Sunspot paginates search results at 30 items per page.
A different page size can be specified with the :per_page option to paginate:
 
<pre>
search = Blog.search do
  fulltext "pizza"
  paginate :page => 1, :per_page => 50
end
</pre>
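
Pagination ultimately comes down to an offset and a page size, which Solr expresses as the <code>start</code> and <code>rows</code> request parameters. The helper below is a minimal sketch of that arithmetic; <code>solr_window</code> is a hypothetical name, and the default of 30 simply mirrors the default page size mentioned above.

```ruby
# Translate a page number and page size into Solr's start/rows parameters.
def solr_window(page:, per_page: 30)
  raise ArgumentError, 'page must be >= 1' if page < 1

  { start: (page - 1) * per_page, rows: per_page }
end

p solr_window(page: 1, per_page: 50)
p solr_window(page: 2, per_page: 15)
```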
 
===Faceting===
 
Faceting is a Solr feature that counts the documents matching a given search together with an additional criterion. This makes it possible to build powerful drill-down interfaces for search.
In field facets, each row represents a particular value of a given field. In query facets, each row represents an arbitrary scope.
 
<pre>
# Blogs that match 'war' returning counts for each :author_id
search = Blog.search do
  fulltext "war"
  facet :author_id
end
 
search.facet(:author_id).rows.each do |facet|
  puts "Author #{facet.value} has #{facet.count} war article(s)!"
end
</pre>
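
Conceptually, a field facet is a count of matching documents grouped by a field's value. The plain-Ruby sketch below emulates that over a hypothetical list of matches; in a real application Solr computes the counts, and <code>facet_rows</code> is not a Sunspot method.

```ruby
# Hypothetical matching documents, reduced to the faceted field.
MATCHES = [
  { id: 1, author_id: 7 },
  { id: 2, author_id: 7 },
  { id: 3, author_id: 9 }
].freeze

# Count matches per value of a field, most frequent first (like facet rows).
def facet_rows(docs, field)
  counts = docs.each_with_object(Hash.new(0)) { |doc, acc| acc[doc[field]] += 1 }
  counts.sort_by { |value, count| [-count, value] }
end

facet_rows(MATCHES, :author_id).each do |value, count|
  puts "Author #{value} has #{count} matching article(s)"
end
```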
 
===Ordering===
By default, Sunspot orders results by "score", the relevancy metric computed by Solr. The order_by method overrides this ordering.
 
<pre>
# Order by average rating, descending
Blog.search do
  fulltext("war")
  order_by(:average_rating, :desc)
end
</pre>
 
===Highlighting===
Highlighting returns snippets of the matched portion of a search result. The field must be stored in the index (declared with :stored => true in the searchable block) for highlights to be produced.
<pre>
search = Blog.search do
  fulltext "war" do
    highlight :body
  end
end
</pre>
 
Solr then returns fragments of the body field with the word "war" marked in each hit; they can be read with hit.highlights(:body).
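
The effect of highlighting can be approximated in plain Ruby by wrapping each match in a marker. This is only a sketch of the idea: the <code>highlight</code> helper is hypothetical, the <code>&lt;em&gt;</code> wrapper is an arbitrary choice for the example, and real highlighting is performed by Solr on the stored field.

```ruby
# Wrap each whole-word, case-insensitive match of term in <em> tags.
def highlight(text, term)
  text.gsub(/\b#{Regexp.escape(term)}\b/i) { |match| "<em>#{match}</em>" }
end

puts highlight('War and the art of war', 'war')
```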
 
===Hits vs Results===
 
Sunspot stores only the type and primary key of each object in Solr (plus any fields explicitly marked as stored). When results are retrieved, those primary keys are used to load the actual objects from the relational database.
 
<pre>
# Using #results pulls in the records from the object-relational mapper
Blog.search.results.each do |result|
  puts result.body
end
</pre>
 
To read stored field values without accessing the database, use hits:
<pre>
Blog.search.hits.each do |hit|
  puts hit.stored(:body)
end
</pre>
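
The difference between the two access paths can be sketched with plain Ruby data structures. Everything here is a hypothetical stand-in: <code>DATABASE</code> plays the relational store, <code>HITS</code> plays what the index returns, and the helper methods are not part of Sunspot.

```ruby
# Hypothetical stand-ins: DATABASE plays the relational store, HITS plays
# what the search index returns (type, primary key and stored fields).
DATABASE = {
  1 => { title: 'Solr basics',  body: 'Full text of the first post' },
  2 => { title: 'Sunspot tips', body: 'Full text of the second post' }
}.freeze

HITS = [
  { class_name: 'Blog', primary_key: 2, stored: { title: 'Sunspot tips' } },
  { class_name: 'Blog', primary_key: 1, stored: { title: 'Solr basics' } }
].freeze

# "results": look every primary key up in the database, keeping hit order.
def load_results(hits, database)
  hits.map { |hit| database.fetch(hit[:primary_key]) }
end

# "hits": read stored fields straight from the index response, no database.
def stored_values(hits, field)
  hits.map { |hit| hit[:stored][field] }
end

p load_results(HITS, DATABASE).map { |record| record[:body] }
p stored_values(HITS, :title)
```

Loading results costs a database round trip but gives access to every attribute, whereas hits expose only what was stored in the index.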
 
===Reindexing===
Objects are automatically indexed in Solr as part of the save callbacks, but after a change to the searchable schema the existing records must be reindexed.
 
<pre>
bundle exec rake sunspot:solr:reindex
</pre>
 
Other interesting reads on the topic and referral links are mentioned below.<ref>http://tech.favoritemedium.com/2010/01/full-text-search-in-rails-with-sunspot.html</ref><ref>http://www.linux-mag.com/id/7341/</ref>




=='''References'''==
<references/>

Latest revision as of 20:46, 22 February 2015
