CSC/ECE 517 Spring 2015/ch1a 13 MA: Difference between revisions

From Expertiza_Wiki
Jump to navigation Jump to search
No edit summary
No edit summary
Line 1: Line 1:
<font size="5"><b>Postgres H-Store</b></font>
<font size="5"><b>Postgres H-Store</b></font>


PostgreSQL is a powerful, open source object-relational database system based on [http://db.cs.berkeley.edu/postgres.html POSTGRES, Version 4.2], developed at the University of California at Berkeley Computer Science Department. It is fully [http://en.wikipedia.org/wiki/ACID ACID(Atomicity, Consistency, Isolation, Durability)] compliant,has full support for foreign keys, joins, views, triggers, and stored procedures (in multiple languages). It includes most SQL:2008 data types, including INTEGER, NUMERIC, BOOLEAN, CHAR, VARCHAR, DATE, INTERVAL, and TIMESTAMP. It also supports storage of binary large objects, including pictures, sounds, or video. It has native programming interfaces for C/C++, Java, .Net, Perl, Python, Ruby, Tcl, ODBC, among others, and exceptional documentation.
PostgreSQL is a powerful, open source object-relational database system based on [http://db.cs.berkeley.edu/postgres.html POSTGRES, Version 4.2], developed at the University of California at Berkeley Computer Science Department. It is fully [http://en.wikipedia.org/wiki/ACID ACID(Atomicity, Consistency, Isolation, Durability)] compliant,has full support for foreign keys, joins, views, triggers, and stored procedures (in multiple languages). It includes most SQL:2008 data types, including INTEGER, NUMERIC, BOOLEAN, CHAR, VARCHAR, DATE, INTERVAL, and TIMESTAMP. It also supports storage of binary large objects, including pictures, sounds, or video. It has native programming interfaces for C/C++, Java, .Net, Perl, Python, Ruby, Tcl, ODBC, among others, and exceptional documentation.

Revision as of 05:13, 8 February 2015

Postgres H-Store


PostgreSQL is a powerful, open source object-relational database system based on POSTGRES, Version 4.2, developed at the University of California at Berkeley Computer Science Department. It is fully ACID(Atomicity, Consistency, Isolation, Durability) compliant,has full support for foreign keys, joins, views, triggers, and stored procedures (in multiple languages). It includes most SQL:2008 data types, including INTEGER, NUMERIC, BOOLEAN, CHAR, VARCHAR, DATE, INTERVAL, and TIMESTAMP. It also supports storage of binary large objects, including pictures, sounds, or video. It has native programming interfaces for C/C++, Java, .Net, Perl, Python, Ruby, Tcl, ODBC, among others, and exceptional documentation.

An enterprise class database, PostgreSQL boasts sophisticated features such as Multiversion concurrency control (MVCC), Point in time recovery, Tablespaces, Savepoints, asynchronous replication, online/hot backups, a sophisticated query planner/optimizer, and write ahead logging for fault tolerance. It supports international character sets, multibyte character encodings, Unicode, and it is locale-aware for sorting, case-sensitivity, and formatting. It is highly scalable both in the sheer quantity of data it can manage and in the number of concurrent users it can accommodate. There are active PostgreSQL systems in production environments that manage in excess of 4 terabytes of data.

One of the great strengths of PostgreSQL is extensibility. Just as the JVM has become more than a way to just run Java—spawning languages such as Clojure and Scala—PostgreSQL has become more than just a home to relational data and the SQL language.

PostgreSQL has a number of ways of storing loosely structured data — documents — in a column on a table.

  • hstore is a data type available as a contrib package that allows the user to store key/value structures just like a dictionary or hash.
  • You can store data in JSON format on a text field, and then use PLV8 to JSON.parse() it right in the database.
  • There is a native xml data type, along with a few interesting query functions that allow you to extract and operate on data that sits deep in an XML structure.

We would be focusing on hstore in this document. hstore is a key value store within Postgres. It can be used similar to a dictionary within another language, but it’s specific to a column on a row.

Background

Hstore is a schemaless key value column in Postgres. It's perfect if the users need to store attributes for an object but aren't quite sure what the schema should be. In a scenario where a user starts selling 'Products' that have an author and an isbn, if he wants to start selling cameras, or computer equipment he'll need to store other things like amount of ram or manufacturer. With hstore, the same column can be used for both sets of data, no migrations or schema changes needed. Hstore is really useful for saving attributes on models. If we need to store all the settings for a particular set of users, this would be typically done in a separate model (or on the User model). Each setting would be an additional column. Instead of adding columns each time a new setting has to be created, a single hstore column could be created and used. It's much more flexible and doesn't require migrations each time something new has to be stored.

hstore enables the users to build better apps faster without sacrificing the power, reliability, and flexibility of the underlying PostgreSQL storage engine. By using hstore, the users will be able to leverage the flexibility and agility of schema-less data stores in existing environments. Although hstore is a mature, stable solution, it has recently been gathering widespread excitement. Support for hstore is available today in many popular languages and frameworks, including plugins for Django, Rails/ActiveRecord, Sequel, and Node.js.

In Rails hstore can be used like a hash, with any key and any value input. Once it's stored the keys or values can be queried. hstore behaves much like a serialized hash column. The main difference is performance, since hstore is storing the keys and values natively in Postgres, queries run much much faster. Rails 4 supports the hstore column type, until then a gem has to be used.

Potential Applications

What should hstore be used for? hstore is suitable for use in cases where most of the users' data is normalized, but there may be bits of data related to models that may need to be changed frequently and the user doesn’t prefer to run database migrations too often, thus changing the structure of the database and possibly requiring the site to come down. For example, maybe a user consumes information from an API that may change its responses without providing a backward-compatible, unchanging structure (inserting new keys in its JSON responses, perhaps). This may not be all that common with public APIs, but with internal projects this can happen rather frequently. Perhaps the user wouldn’t want to discard this data, but without writing a migration and making room for new data, he won’t be able to keep it. Another possibility could be user preferences – Any user may not necessarily want 10-30+ rows per user in a giant join table for user preference data and perhaps may feel that should be stored with the user object, but don’t want to go through the trouble of migrating the database (and incurring possible downtime) when deploying changes to user preference structure.

Enter hstore. With this extension active and in use, the user could simply define changes to the structure without a migration and redeploy to pick up changes immediately. That API could throw whatever it wants because the JSON can be received and it can be pushed into hstore. In order to handle the myriad of user preferences, only the stuff that differs from the defaults has to be stored as key/value pair. It can be changed at will by just committing new code. reference:https://blog.engineyard.com/2013/using-postgresql-hstore-in-a-rails-application-on-engine-yard-cloud

hstore Datatype

hstore Representation

The text representation of an hstore, used for input and output, includes zero or more key => value pairs separated by commas. Some examples:

k => v
foo => bar, baz => whatever
"1-a" => "anything at all"


The order of the pairs is not significant (and may not be reproduced on output). Whitespace between pairs or around the => sign is ignored. Double-quote keys and values that include whitespace, commas, =s or >s. To include a double quote or a backslash in a key or value, escape it with a backslash.

Each key in an hstore is unique. If you declare an hstore with duplicate keys, only one will be stored in the hstore and there is no guarantee as to which will be kept:

SELECT 'a=>1,a=>2'::hstore;
  hstore
----------
 "a"=>"1"

A value (but not a key) can be an SQL NULL. For example:

key => NULL

hstore Operators

hstore Functions

Indexes

hstore has GiST and GIN index support for the @>, ?, ?& and ?| operators. For example:


CREATE INDEX hidx ON testhstore USING GIST (h);
CREATE INDEX hidx ON testhstore USING GIN (h);

hstore also supports btree or hash indexes for the = operator. This allows hstore columns to be declared UNIQUE, or to be used in GROUP BY, ORDER BY or DISTINCT expressions. The sort ordering for hstore values is not particularly useful, but these indexes may be useful for equivalence lookups. Create indexes for = comparisons as follows:


CREATE INDEX hidx ON testhstore USING BTREE (h);
CREATE INDEX hidx ON testhstore USING HASH (h);

Add a key, or update an existing key with a new value:

Other Operations

UPDATE tab SET h = h || ('c' => '3');

Delete a key:

UPDATE tab SET h = delete(h, 'k1');

Convert a record to an hstore:

CREATE TABLE test (col1 integer, col2 text, col3 text);
INSERT INTO test VALUES (123, 'foo', 'bar');

SELECT hstore(t) FROM test AS t;

                   hstore                    
---------------------------------------------
 "col1"=>"123", "col2"=>"foo", "col3"=>"bar"  (1 row)

Convert an hstore to a predefined record type:

CREATE TABLE test (col1 integer, col2 text, col3 text);

SELECT * FROM populate_record(null::test,
                              '"col1"=>"456", "col2"=>"zzz"');
 col1 | col2 | col3 
------+------+------
  456 | zzz  |       (1 row)

Modify an existing record using the values from an hstore:

CREATE TABLE test (col1 integer, col2 text, col3 text);
INSERT INTO test VALUES (123, 'foo', 'bar');

SELECT (r).* FROM (SELECT t #= '"col3"=>"baz"' AS r FROM test t) s;
 col1 | col2 | col3 
------+------+------
  123 | foo  | baz   (1 row)


Using Hstore with Rails 4

Install GEM

  • Install hstore is a PostgreSQL contrib type, check it out first.
  • Then, just add the following to the Gemfile:
    • gem 'activerecord-postgres-hstore'
  • And run the bundler:
    • bundle install

Migrate

  • A migration has to be created that adds hstore support for the PostgreSQL database:
    • rails g hstore:setup
  • Run rake db:migrate
  • Now database tables can be created using the hstore type.

Enable hstore Extension

First, the hstore extension has to be enabled in the PostgreSQL database. This can be done with a migration.

class AddHstore < ActiveRecord::Migration  
  def up
    enable_extension :hstore
  end

  def down
    disable_extension :hstore
  end
end  

Add hstore Column

Next, since hstore is now a natively recognized datatype in Rails, a hstore column can be added to any existing model. A "settings" column to a users table can be added as follows.

class AddSettingsToUser < ActiveRecord::Migration  
  def up
    add_column :users, :settings, :hstore
  end

  def down
    remove_column :users, :settings
  end
end  

Define Accessor

Accessors can be defined for the hstore keys in the model. Validations work just like they would for any other column in your model.

class User < ActiveRecord::Base  
  # setup hstore
  store_accessor :settings, :favorite_color, :time_zone

  # can even run typical validations on hstore fields
  validates :favorite_color,                            
    inclusion: { in: %w{blue, gold, red} }

  validates_inclusion_of :time_zone, 
    in: ActiveSupport::TimeZone.zones_map { |m| m.name }, 
    message: 'is not a valid Time Zone'

end 

A Pitfall

One thing to look out for is storing booleans in hstore. hstore converts booleans to strings. A quick solution to this is overwriting the getter method to convert them back to a boolean on read. Following is an example:

# convert string to boolean. hstore only stores strings
def name_of_some_hstore_key  
  return (super == 'true') if %w{true false}.include? super
  super
end

Hstore vs JSON

Hstore is often compared to JSON. Which one has to be used for what kind of application? If JSON is already being used and the user simply wants to store it in the database then JSON datatype is often the correct pick. However, if the user is looking for flexibility with the data model then hstore is likely the a better path. hstore gives great flexibility as well as a good ability to query data with high performance.

https://mikecoutermarsh.com/using-hstore-with-rails-4/

Conclusion

References

<references/>