CSC/ECE 517 Fall 2013/ch1 1w44 s

Non-relational Databases in Rails Applications

Active Record is central to Rails: it lets programs treat records in relational databases as objects. It employs Object-Relational Mapping (ORM), a technique that connects the rich objects of an application to tables in a relational database management system (RDBMS). While Active Record does this job well, the complexities of ORM motivate a different approach to databases in Rails: moving to other database models, namely object-oriented databases and NoSQL databases.

Overview

Rails, as a web development framework, and Ruby, as a programming language, can be used with a choice of operating systems (Linux, Mac OS X, Windows), databases (SQLite, MySQL, PostgreSQL, and others), and web servers (Apache, Nginx, and others). The Rails stack can also include a variety of software libraries (gems) that add features to a website or make development easier. Sometimes the choice of components in a stack is driven by the requirements of an application.

Rails uses the Active Record pattern which describes the mapping of an object instance to a row in a relational database table, using accessor methods to retrieve columns/properties, and the ability to create, update, read, and delete entities from the database.

Active Record is the layer of the system responsible for representing business data and logic. It facilitates the creation and use of business objects whose data requires persistent storage in a database. It employs ORM, a technique that connects the rich objects of an application to tables in a relational database management system. Using ORM, the properties and relationships of the objects in an application can be easily stored and retrieved from a database without writing SQL statements directly and with less overall database access code.

Active Record gives us several mechanisms, the most important being the ability to:

Represent models and their data.
Represent associations between these models.
Represent inheritance hierarchies through related models.
Validate models before they get persisted to the database.
Perform database operations in an object-oriented fashion.
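
As a rough illustration of the pattern itself (not of how Rails implements it), the sketch below maps each object to one row of an in-memory table and validates before saving. The `TinyRecord` and `Person` classes and the `TABLES` hash are hypothetical names invented for this example.

```ruby
# A toy Active Record pattern: one object <-> one row in a table.
# TABLES stands in for a relational database; all names are hypothetical.
TABLES = Hash.new { |tables, name| tables[name] = {} }

class TinyRecord
  attr_reader :id, :attributes

  # Each class gets its own "table", keyed by the class name.
  def self.table
    TABLES[name]
  end

  # Read: rebuild an object from the stored row, or return nil.
  def self.find(id)
    row = table[id]
    row ? new(row, id) : nil
  end

  def initialize(attributes = {}, id = nil)
    @attributes = attributes.dup
    @id = id
  end

  # Subclasses override this to add validations.
  def valid?
    true
  end

  # Create or update: write the attributes back as a row.
  def save
    return false unless valid?
    @id ||= (self.class.table.keys.max || 0) + 1
    self.class.table[@id] = @attributes.dup
    true
  end

  # Delete: remove the row from the table.
  def destroy
    self.class.table.delete(@id)
  end
end

class Person < TinyRecord
  # Accessor method that exposes a "column" as a property.
  def name
    attributes[:name]
  end

  def valid?
    !name.to_s.empty?
  end
end

person = Person.new(name: "Ada")
person.save
puts Person.find(person.id).name
```

The point of the sketch is only the shape of the pattern: class-level finders, instance-level persistence, and validation that runs before a write.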

Object-relational impedance mismatch

The relational model on which Active Record relies takes data and separates it into many interrelated tables that contain rows and columns. Tables reference each other through foreign keys that are stored in columns as well. When looking up data, the desired information needs to be collected from many tables (often hundreds in today’s enterprise applications) and combined before it can be provided to the application. Similarly, when writing data, the write needs to be coordinated and performed on many tables.

As described above, the Active Record pattern maps an object instance to a row in a relational database table. The pattern has numerous limitations, collectively referred to as the object-relational impedance mismatch. Some of these technical difficulties are structural. In object-oriented programming, objects may be composed of other objects. The Active Record pattern maps these sub-objects to separate tables, which introduces issues concerning the representation of relationships and encapsulated data. Active Record uses macros to create relationships between objects and single-table inheritance to represent inheritance.

Active Record is one answer to the object-relational impedance mismatch, but only on the assumption that the datastore is relational. It is also, fundamentally, a hack: it works around the mismatch rather than removing it.

Non Relational Databases

Object Database

An object database, also known as an Object-Oriented Database Management System (OODBMS), is a Database Management System (DBMS) that allows information to be represented in the form of objects as used in object-oriented programming. OODBMSs provide an integrated application development environment by joining object-oriented programming with database technology. OODBMSs enforce object-oriented programming concepts such as encapsulation, polymorphism and inheritance, as well as database management concepts such as Atomicity, Consistency, Isolation and Durability. Since both the programming language and the OODBMS use the same object-oriented model, programmers can easily maintain consistency between the two environments.

Comparing RDBMS with Object Database

Criteria | RDBMS | OODBMS
Object-oriented programming support | Poor | Extensive
Ease of use | Table schemas easy to understand for everyone | Good for programmers only
Ease of development | Independent from application | Objects are a natural way to model; can incorporate types and relationships
Extensibility | Low | High
Data relationships | Hard to model | Easy to handle
Maturity | Very mature | Relatively mature

No-SQL Databases

A NoSQL database provides a mechanism for storage and retrieval of data that is less constrained in its consistency models than traditional Relational Databases.

NoSQL encompasses a wide variety of different database technologies but generally all NoSQL databases have a few features in common.

  • Dynamic Schemas

Relational databases require that schemas be defined before you can add data. This fits poorly with agile development approaches, because each time you complete new features, the schema of your database often needs to change. Furthermore, if the database is large, this can be a very slow process that involves significant downtime. In contrast, NoSQL databases are built to allow the insertion of data without a predefined schema. That makes it easier to make significant application changes without service interruptions, which in turn means faster development, more reliable code integration, and less database administrator time.
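
The idea of inserting data without a predefined schema can be sketched in plain Ruby, with an array of hashes standing in for a collection. This is only an illustration: the `stories` data and the `tag_count` helper are hypothetical, and no database is involved.

```ruby
# A "collection" modelled as a plain array of hashes (documents).
# No schema is declared, so each document may carry different fields.
stories = [
  { "title" => "MongoDB on Rails", "votes" => 3 },
  { "title" => "CouchDB basics", "votes" => 1, "tags" => ["couchdb"] }
]

# Readers supply a default for fields that older documents lack.
def tag_count(document)
  document.fetch("tags", []).size
end

stories.each { |doc| puts "#{doc['title']}: #{tag_count(doc)} tag(s)" }
```

The new `tags` field appears only on documents written after the feature was added; nothing had to be migrated, but every reader now has to tolerate its absence.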

  • Sharding, Replication and Caching

Because of the way they are structured, relational databases usually scale vertically. This gets expensive quickly, places limits on scale and is failure prone. The alternative is to scale horizontally, by adding servers instead of concentrating more capacity in a single server. NoSQL databases usually support auto-sharding and automatic replication, which provide high availability, and many also offer integrated caching.
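
A minimal sketch of the sharding idea follows: each document key is hashed and assigned to one of several servers. The shard names are hypothetical, and CRC32 with a simple modulo is an assumption chosen for brevity, not the partitioning scheme of any particular NoSQL product.

```ruby
require "zlib"

# Hypothetical shard names; a real cluster would hold server addresses.
SHARDS = %w[shard-a shard-b shard-c].freeze

# Pick a shard by hashing the document key. CRC32 is used only because it
# is deterministic across runs; real systems use their own partitioning.
def shard_for(key, shards = SHARDS)
  shards[Zlib.crc32(key.to_s) % shards.size]
end

placement = %w[user:1 user:2 user:3 user:4].group_by { |key| shard_for(key) }
placement.each { |shard, keys| puts "#{shard}: #{keys.join(', ')}" }
```

One limitation of this naive scheme is that changing the number of shards reassigns most keys, which is one reason production systems tend to use range-based or consistent-hashing approaches instead.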

NoSQL Database Types

Document databases
Graph stores
Key-value stores
Wide-column stores

Comparing No-SQL DB with RDBMS<ref>http://www.couchbase.com/why-nosql/nosql-database</ref>

Relational and document data models are very different. For example, a document-oriented database takes the data you want to store and aggregates it into documents using the JSON format. Each JSON document can be thought of as an object to be used by your application. A JSON document might, for example, take all the data stored in a row that spans 20 tables of a relational database and aggregate it into a single document/object. Aggregating this information may lead to duplication of information, but since storage is no longer cost prohibitive, the resulting data model flexibility, the ease of efficiently distributing the resulting documents, and the read and write performance improvements make it an easy trade-off for web-based applications.
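
To make the aggregation concrete, the sketch below starts from two hypothetical relational "tables" (arrays of rows linked by a foreign key) and folds them into a single JSON document. The table contents and the `to_document` helper are assumptions made for this example.

```ruby
require "json"

# Hypothetical relational rows: comments point at stories by foreign key.
stories  = [{ id: 1, title: "MongoDB on Rails" }]
comments = [
  { id: 10, story_id: 1, body: "Revelatory! Loved it!" },
  { id: 11, story_id: 1, body: "Agreed." }
]

# Join the two tables once, then keep the result as one aggregate document.
def to_document(story, comments)
  related = comments.select { |comment| comment[:story_id] == story[:id] }
  {
    "title"    => story[:title],
    "comments" => related.map { |comment| { "body" => comment[:body] } }
  }
end

document = to_document(stories.first, comments)
puts JSON.pretty_generate(document)
```

Once stored in this form, reading a story with its comments is a single lookup rather than a join, at the cost of the duplication mentioned above if the same comment data is needed elsewhere.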

Document Object Model equivalent to Relational tables<ref>http://www.couchbase.com/why-nosql/nosql-database</ref>

Another major difference is that relational databases have rigid schemas. A relational database requires a strict definition of a schema before any data is stored in it. Changing the schema once data has been inserted is a major undertaking: it is extremely disruptive and frequently avoided. In comparison, document databases are schemaless, allowing you to freely add fields to JSON documents without having to first define changes. The format of the data being inserted can be changed at any time, without application disruption.

Examples in Rails Apps

In this section we will see how to incorporate two document databases, MongoDB and CouchDB, into a Rails application. We will also explore embedding and nesting of models to better illustrate their use.

MongoDB<ref>http://docs.mongodb.org/</ref>

MongoDB is a document database that provides high performance, high availability, and easy scalability. Key features:

  • Flexibility

MongoDB stores data in JSON documents (which it serializes to BSON). JSON provides a rich data model that maps naturally to native programming language types, and the dynamic schema makes it easier to evolve your data model than with a system that enforces schemas, such as an RDBMS.

  • Power

MongoDB provides a lot of the features of a traditional RDBMS such as secondary indexes, dynamic queries, sorting, rich updates, upserts (update if document exists, insert if it doesn’t), and easy aggregation. This gives you the breadth of functionality that you are used to from an RDBMS, with the flexibility and scaling capability that the non-relational model allows.

  • Speed/Scaling

By keeping related data together in documents, queries can be much faster than in a relational database, where related data is separated into multiple tables and then needs to be joined later. MongoDB also makes it easy to scale out your database. Autosharding allows you to scale your cluster linearly by adding more machines. It is possible to increase capacity without any downtime, which is very important on the web, where load can increase suddenly and bring down a website, leading to extended maintenance that can cost large amounts of revenue.

  • Ease of use

MongoDB works hard to be very easy to install, configure, maintain, and use. To this end, MongoDB provides few configuration options, and instead tries to automatically do the “right thing” whenever possible. This means that MongoDB works right out of the box, and you can dive right into developing your application, instead of spending a lot of time fine-tuning obscure database configurations.
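
The upsert behaviour mentioned under "Power" (update if the document exists, insert if it doesn't) can be sketched over an in-memory collection. The `upsert` helper below is hypothetical and only mimics the semantics; it is not a MongoDB driver method.

```ruby
# Toy upsert over an in-memory collection (an array of hashes).
# `upsert` is a hypothetical helper, not a MongoDB driver method.
def upsert(collection, selector, changes)
  existing = collection.find do |doc|
    selector.all? { |field, value| doc[field] == value }
  end

  if existing
    existing.merge!(changes)               # update the matching document
  else
    collection << selector.merge(changes)  # insert a new document
  end
  collection
end

items = []
upsert(items, { "item" => "pencil" }, { "qty" => 500 })  # inserts
upsert(items, { "item" => "pencil" }, { "qty" => 450 })  # updates
puts items.length
```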

A MongoDB deployment hosts a number of databases. A database holds a set of collections. A collection holds a set of documents. A document is a set of key-value pairs. Documents have dynamic schema. Dynamic schema means that documents in the same collection do not need to have the same set of fields or structure, and common fields in a collection’s documents may hold different types of data.

MongoDB stores all data in documents, which are JSON-style data structures composed of field-and-value pairs:

{ "item": "pencil", "qty": 500, "type": "no.2" }

MongoDB stores documents on disk in the BSON serialization format. BSON is a binary representation of JSON documents, although it supports more data types than JSON does.
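
One way to see why the extra data types matter is to push a date through plain JSON using Ruby's standard library. JSON has no date type, so the value comes back as a string; BSON includes a dedicated date type to avoid that loss. The `json_round_trip` helper is a hypothetical name used only here.

```ruby
require "json"

# Serialize a value to JSON text and parse it back.
def json_round_trip(value)
  JSON.parse(JSON.generate(value))
end

original = { "item" => "pencil", "created_at" => Time.utc(2013, 10, 7) }
restored = json_round_trip(original)

# The Time value does not survive: JSON can only carry it as text.
puts original["created_at"].class
puts restored["created_at"].class
```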

MongoMapper and Mongoid are the two leading gems that make it possible to use MongoDB as a datastore with Rails. MongoMapper is a simple ORM for MongoDB. Mongoid’s goal is to provide an API familiar to Active Record users, while still leveraging MongoDB’s advantages of schema flexibility, document design, atomic modifiers, and rich query interface.

MongoDB in Rails<ref>http://docs.mongodb.org/ecosystem/tutorial/model-data-for-ruby-on-rails/</ref>

First, to create a new Rails app that uses MongoDB, we generate it without Active Record and install the necessary gems mentioned above.

> rails new my_app --skip-active-record
> cd my_app
> gem install mongo
> gem install bson
> gem install bson_ext
> gem install mongo_mapper
> gem install mongoid
> rails server

Rails model classes no longer inherit from ActiveRecord::Base; instead, each class includes the MongoMapper::Document module and defines its schema in the model file itself. Database migrations are not necessary.

class Story
 include MongoMapper::Document

 key :title,     String
 key :url,       String
 key :voters,    Array
 key :votes,     Integer, :default => 0

 # Cached values.
 key :comment_count, Integer, :default => 0
 key :username,      String

 # Note this: ids are of class ObjectId.
 key :user_id,   ObjectId
 timestamps!

 # Relationships.
 belongs_to :user

 # Validations.
 validates_presence_of :title, :url, :user_id
end

Additionally, you have a number of configuration options available, such as allow_dynamic_fields that allows you to define attributes on an object that aren’t in the model file’s schema. You can then add some logic in your model file if you need to do something different depending on the existence or absence of this field.

For fast lookups, we can create an index on the voters field. In the MongoDB shell:

db.stories.ensureIndex('voters');

To find all courses with a given name (assuming a Course model that defines a name key):

Course.all(:conditions => {:name => @subject.name})
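
To see what an index on an array-valued field such as voters provides, the sketch below builds a lookup table from each voter to the stories they voted for. It illustrates the idea only; the `build_index` helper and the sample data are hypothetical, and this is not how MongoDB stores its indexes internally.

```ruby
# Toy index over an array-valued field: each value -> matching titles.
# This only illustrates the idea; it is not how MongoDB stores indexes.
def build_index(documents, field)
  index = Hash.new { |hash, key| hash[key] = [] }
  documents.each do |doc|
    Array(doc[field]).each { |value| index[value] << doc["title"] }
  end
  index
end

stories = [
  { "title" => "MongoDB on Rails", "voters" => ["matz", "dhh"] },
  { "title" => "CouchDB basics",   "voters" => ["matz"] }
]

voters_index = build_index(stories, "voters")

# One hash lookup replaces a scan over every story.
puts voters_index["matz"].join(", ")
```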

Embedding<ref>http://docs.mongodb.org/ecosystem/tutorial/model-data-for-ruby-on-rails/</ref>

Now we will see how to achieve embedding in a Rails app that uses MongoDB. We do this by example, incorporating comments into the Story example illustrated above. In a relational database, comments are usually given their own table, related by foreign key to some parent table. This approach is occasionally necessary in MongoDB; however, it’s usually best to try embedding first, as this tends to give greater query efficiency.

  • Linear Comments

class Story
  include MongoMapper::Document
  many :comments
end

class Comment
  include MongoMapper::EmbeddedDocument
  key :body, String

  belongs_to :story
end

If we were using the Ruby driver alone, we could save our structure like so:

@stories  = @db.collection('stories')
@document = {:title => "MongoDB on Rails",
             :comments => [{:body     => "Revelatory! Loved it!",
                            :username => "Matz"
                           }
                          ]
            }
@stories.save(@document)

  • Nested Comments

To build threaded comments:

@stories  = @db.collection('stories')
@document = {:title => "MongoDB on Rails",
             :comments => [{:body     => "Revelatory! Loved it!",
                            :username => "Matz",
                            :comments => [{:body     => "Agreed.",
                                           :username => "rubydev29"
                                          }
                                         ]
                           }
                          ]
            }
@stories.save(@document)
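
Reading such a threaded structure back usually means walking it recursively. The following sketch does that over a plain Ruby hash shaped like the document above; `each_comment` is a hypothetical helper, and no database access is involved.

```ruby
# Walk a nested comment tree, yielding each comment with its depth.
# Symbol keys match the hashes used in the example document.
def each_comment(comments, depth = 0, &block)
  (comments || []).each do |comment|
    block.call(comment, depth)
    each_comment(comment[:comments], depth + 1, &block)
  end
end

story = {
  :title    => "MongoDB on Rails",
  :comments => [{ :body     => "Revelatory! Loved it!",
                  :username => "Matz",
                  :comments => [{ :body => "Agreed.", :username => "rubydev29" }] }]
}

# Print the thread with two spaces of indentation per nesting level.
each_comment(story[:comments]) do |comment, depth|
  puts "#{'  ' * depth}#{comment[:username]}: #{comment[:body]}"
end
```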

We can also represent comments as their own collection. Relative to the other options, this incurs a small performance penalty while granting us the greatest flexibility. The tree structure can be represented by storing the unique path for each leaf.

class Comment
  include MongoMapper::Document

  key :body,       String
  key :depth,      Integer, :default => 0
  key :path,       String,  :default => ""

  # Note: we're intentionally storing parent_id as a string
  key :parent_id,  String
  key :story_id,   ObjectId
  timestamps!

  # Relationships.
  belongs_to :story

  # Callbacks.
  after_create :set_path

  private

  # Store the comment's path.
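  def set_path
    # The body is elided in the source; the `end` below closes a block
    # that begins inside the elided part.
    ....
    end
    save
  end
end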
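
The general technique of storing a path per comment can be sketched with plain Ruby structs. This is not the elided set_path body: the `Node` struct, the `reply_to` helper, and the colon-separated path format are all assumptions made for illustration.

```ruby
# Materialized-path sketch with plain Structs (no MongoMapper involved).
# The path format used here is an assumption, not the elided set_path body.
Node = Struct.new(:id, :body, :depth, :path)

# A reply sits one level deeper and extends its parent's path with the
# parent's id.
def reply_to(parent, id, body)
  Node.new(id, body, parent.depth + 1, "#{parent.path}:#{parent.id}")
end

root  = Node.new("a1", "Revelatory! Loved it!", 0, "")
child = reply_to(root, "b2", "Agreed.")
leaf  = reply_to(child, "c3", "Same here.")

# Every descendant of a comment has a path that starts with that comment's
# own path plus its id, so a subtree can be selected by prefix match.
prefix  = "#{root.path}:#{root.id}"
subtree = [leaf, root, child].select { |node| node.path.start_with?(prefix) }
puts subtree.map(&:id).sort.join(", ")
```

A prefix match like this is only safe when ids cannot be prefixes of one another, which holds for fixed-length identifiers but not for arbitrary strings.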

CouchDB

CouchDB is often categorized as a “NoSQL” database. A CouchDB database lacks a schema, or rigid pre-defined data structures such as tables. A CouchDB server hosts named databases, which store documents. Data is stored in CouchDB as JSON documents, whose structure can change dynamically to accommodate evolving needs. Each document is uniquely named in the database, and CouchDB provides a RESTful HTTP API for reading and updating (adding, editing, deleting) database documents. Documents are the primary unit of data in CouchDB and consist of any number of fields and attachments. Documents also include metadata that’s maintained by the database system. Document fields are uniquely named and contain values of varying types (text, number, boolean, lists, etc.), and there is no set limit to text size or element count. In practice, a user can send arbitrary JSON objects to the server and it will store them, with no schema definition or table setup required.


Key Characteristics

  • Documents

A CouchDB document is a JSON object that consists of named fields. Field values may be strings, numbers, dates, or even ordered lists and associative maps. An example of a document would be:

{
    "Subject": "Foo Bar",
    "Author": "John Doe",
    "PostedDate": "10/7/2013",
    "Tags": ["foo", "bar"],
    "Body": "Lorem Ipsum..."
}

A CouchDB database is a flat collection of these documents. Each document is identified by a unique ID.

  • Views

To add structure back to semi-structured data, CouchDB integrates a view model in which views are described in JavaScript. Views are the method of aggregating, joining and reporting on the documents in a database, and are built on demand. Because views are built dynamically and don’t affect the underlying documents, you can have as many different view representations of the same data as you like. Incremental updates to documents do not require full re-indexing of views.

  • Schema Free

Unlike SQL databases, which are designed to store and report on highly structured, interrelated data, CouchDB is designed to store and report on large amounts of semi-structured, document oriented data. CouchDB greatly simplifies the development of document oriented applications, such as collaborative web applications.

In an SQL database, the schema and storage of the existing data must be updated as needs evolve. With CouchDB, no schema is required, so new document types with new meaning can be safely added alongside the old. For applications that require robust validation of new documents, custom validation functions are possible. The view engine is designed to easily handle new document types and disparate but similar documents.

  • Distributed

CouchDB is a peer based distributed database system. Any number of CouchDB hosts (servers and offline-clients) can have independent "replica copies" of the same database, where applications have full database interactivity (query, add, edit, delete). When back online or on a schedule, database changes can be replicated bi-directionally.

CouchDB has built-in conflict detection and management and the replication process is incremental and fast, copying only documents changed since the previous replication. Most applications require no special planning to take advantage of distributed updates and replication.
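
The map/reduce shape of a view, described under "Views" above, can be emulated in plain Ruby. Real CouchDB views are JavaScript functions; the `run_view` helper and the lambda below are hypothetical stand-ins that only show how emitted key/value pairs are grouped and reduced without touching the source documents.

```ruby
# Emulates the map/reduce shape of a CouchDB view in plain Ruby.
# Real views are JavaScript functions; these names are hypothetical.
def run_view(documents, map_fn)
  # Map step: every document may emit any number of [key, value] pairs.
  emitted = documents.flat_map { |doc| map_fn.call(doc) }

  # Reduce step: sum the emitted values for each key.
  emitted.group_by(&:first)
         .map { |key, pairs| [key, pairs.sum { |_, value| value }] }
         .to_h
end

documents = [
  { "_id" => "post-1", "Subject" => "Foo Bar", "Tags" => ["foo", "bar"] },
  { "_id" => "post-2", "Subject" => "Baz",     "Tags" => ["foo"] }
]

# Emit one [tag, 1] pair per tag; documents without tags emit nothing.
map_tags = ->(doc) { doc.fetch("Tags", []).map { |tag| [tag, 1] } }

tag_counts = run_view(documents, map_tags)
tag_counts.each { |tag, count| puts "#{tag}: #{count}" }
```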

CouchDB on Rails<ref>http://www.couchrest.info/</ref>

First, install CouchDB. On Debian or Ubuntu:

sudo apt-get install couchdb -y

Alternatively, use one of the precompiled binaries available from the CouchDB website.

After installation, check that everything works by sending a request with curl:

$ curl http://localhost:5984

{"couchdb":"Welcome","version":"1.1.1"}
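
Because the API is plain HTTP, requests can also be prepared with Ruby's standard library. The sketch below only builds the request that would store a document, so it runs without a server; the `people` database name, the document ID, and the `build_put_request` helper are hypothetical.

```ruby
require "json"
require "net/http"

# Build (but do not send) the HTTP request that would store a document.
# CouchDB addresses a document as /<database>/<document id>.
def build_put_request(database, doc_id, document)
  request = Net::HTTP::Put.new("/#{database}/#{doc_id}")
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(document)
  request
end

request = build_put_request("people", "john-doe", { "first_name" => "John" })
puts "#{request.method} #{request.path}"

# To send it to a running server (assumed to listen on localhost:5984):
#   Net::HTTP.start("localhost", 5984) { |http| puts http.request(request).body }
```

The target database has to exist before a document can be stored in it.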

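We now install the CouchRest Model gem for Ruby, which provides the CouchRest::Model::Base class used below and pulls in the lower-level couchrest gem as a dependency:

gem install couchrest_model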

To create a new model, we define a class that inherits from CouchRest::Model::Base, in much the same way as an ActiveRecord model. Unlike Active Record, which detects columns from the database, CouchRest Model requires you to define the properties or attributes your model uses directly in the class. There are no migrations or automatic column detection in a document database.

class Person < CouchRest::Model::Base
  use_database 'people'
  property :first_name, String
  property :last_name, String
  timestamps!
  belongs_to :person
  collection_of :tags
end

CouchDB is accessed via HTTP requests, so there is no need to create a binary connection to a specific database. The use_database method tells the model which database it should use. This can be provided as a CouchRest::Database object or as a simple string that will be concatenated to the connection prefix. A special property macro called timestamps! creates the created_at and updated_at accessors.

As you’d expect, CouchRest Model supports all the standard CRUD operations you’re used to in other object mappers. CouchRest Model supports a few basic queries for retrieving your data from the database.

Embedding<ref>http://www.couchrest.info/</ref>

The CouchRest Model Embeddable module allows you to take full advantage of CouchDB’s ability to store and retrieve complex documents. Simply include the module in a class and declare any properties you’d like to use.

class CatToy
  include CouchRest::Model::Embeddable

  property :name, String
  property :purchased, Date
end

class Cat < CouchRest::Model::Base
  property :name, String
  property :toys, CatToy, :array => true
end

@cat = Cat.new(:name => 'Felix', :toys => [{:name => 'mouse', :purchased => 1.month.ago}])
@cat.toys.first.class == CatToy
@cat.toys.first.name == 'mouse'

Summary

While Active Record's ORM is the most prevalent approach in Rails, other kinds of databases can offer considerable advantages over an RDBMS. These alternatives help address issues such as scalability and flexibility, and they are more in tune with the cloud model of deploying apps.

See Also

References

<references/>