CSC/ECE 517 Fall 2007/wiki2 2 d4
Topic
Object-relational mapping. Ruby's ActiveRecord is one attempt to allow an object-oriented program to use a relational database. The Crossing Chasms pattern is another. Look up several approaches to mapping relational databases to o-o programs, include hyperlinks to all of them, and explain how they differ. Report on the strengths of various approaches (making sure to credit the authors for their insights).
Object Relational Mapping (ORM)
Object-relational mapping (ORM) is the technique used to marshal database records into objects of object-oriented (o-o) languages and vice versa. The technique is usually packaged as a tool that relieves the developer of some of the complexity of this mapping. The complexity derives from the inherently different data types used by relational databases and o-o languages. Database data types are typically scalar, while classes are more typically composed of a mix of non-scalar and scalar types.
Figure 1. Position of ORM Layer in Database Architecture
In addition to mapping data structures, an ORM often needs to “bridge the chasm” between the object layer and the database layer for o-o concepts. These concepts include inheritance, aggregation, polymorphism, class associations, and data encapsulation [6]. This conceptual difference between the database structure and o-o data storage is known as the impedance mismatch, which can make the bridge across the chasm fragile. One solution would be to use o-o databases, which do exist, but these databases do not have the structural stability or mathematical foundation that relational databases have. Even if o-o databases were to catch on, the ORM problem would persist for years to come because of the many legacy relational databases.
Many ORM models have been developed to bridge the chasm, and we will compare and contrast a handful of them in terms of their utility for developers. Issues to consider when comparing ORM models include the degree of coupling between the layers, performance, storage space usage, and concurrency and synchronization. Improving one of these often comes at the expense of another; for example, decoupling the layers, perhaps by separating their physical locations, can increase response time and thus degrade performance [6].
Approaches/Patterns
Crossing the Chasm Pattern Language
Crossing Chasms is one of the first ORM approaches, and its name reflects the difficulty involved in mapping between the two disjoint data structures. The Crossing Chasms pattern language is a collection of patterns developed by Kyle Brown and Bruce Whitenack to solve the ORM problem as it relates to “marrying relational databases and Smalltalk”. The patterns are grouped into “static” patterns that deal with setting up the database tables and objects, “dynamic” patterns that relate to the runtime issues involved in the ORM model, and architectural patterns, which are broad patterns that model entire systems.
Representing Objects as Tables with Crossing Chasms
The following "static" patterns can be combined to represent class objects as tables in a relational database.
- Object Identifier - This pattern assigns a unique identifier to each persistent object, usually using a database’s sequence number generator. It allows the identity of class instances to be persisted in a database and avoids duplication of these instances [9].
- Foreign Key - This pattern handles references to other class types by storing the object identifier of the referenced instance. Maintaining these references can incur additional overhead from extra searching and caching when objects are accessed. Inheritance can also be mapped in this manner, with each subclass mapping to a different table and the tables linked by their object identifiers. Again, there is the overhead of multi-table joins. An alternative is to map each concrete class to its own table, which makes maintenance more difficult since a change in a base class could require updating many different tables [8]. If multiple inheritance is involved, another alternative is to put the entire inheritance hierarchy into one table; the downside is wasted storage space from the many NULL fields in the rows.
- Representing Collections - Use a relationship table for each collection that maps the object IDs of the containing objects to those of the contained objects. The relationship table can store additional data as well, such as metadata or positional information for ordered collections [9].
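The three static patterns above can be sketched together in a few lines. This is only an illustration using Python's standard sqlite3 module, not code from the Crossing Chasms papers; the table names (customer, purchase_order, item, order_items) and the helper load_order_items are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Object Identifier: every persistent object gets a generated id.
    CREATE TABLE customer (
        oid  INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL
    );
    -- Foreign Key: an order refers to its customer by that customer's oid.
    CREATE TABLE purchase_order (
        oid          INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_oid INTEGER NOT NULL REFERENCES customer(oid)
    );
    CREATE TABLE item (
        oid   INTEGER PRIMARY KEY AUTOINCREMENT,
        label TEXT NOT NULL
    );
    -- Representing Collections: a relationship table maps the containing
    -- object to each contained object and records its position.
    CREATE TABLE order_items (
        order_oid INTEGER NOT NULL REFERENCES purchase_order(oid),
        item_oid  INTEGER NOT NULL REFERENCES item(oid),
        position  INTEGER NOT NULL,
        PRIMARY KEY (order_oid, position)
    );
""")

customer_oid = conn.execute(
    "INSERT INTO customer (name) VALUES (?)", ("Ada",)).lastrowid
order_oid = conn.execute(
    "INSERT INTO purchase_order (customer_oid) VALUES (?)",
    (customer_oid,)).lastrowid
for position, label in enumerate(["pen", "ink"]):
    item_oid = conn.execute(
        "INSERT INTO item (label) VALUES (?)", (label,)).lastrowid
    conn.execute("INSERT INTO order_items VALUES (?, ?, ?)",
                 (order_oid, item_oid, position))


def load_order_items(order_oid):
    """Rebuild the ordered collection by joining through the relationship table."""
    rows = conn.execute(
        "SELECT item.label FROM order_items "
        "JOIN item ON item.oid = order_items.item_oid "
        "WHERE order_items.order_oid = ? "
        "ORDER BY order_items.position", (order_oid,))
    return [label for (label,) in rows]
```

Calling load_order_items(order_oid) walks the relationship table to recover the collection in its stored order, which is the extra join cost that the Foreign Key discussion mentions.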
The following “dynamic” patterns deal with the flow of data between the layers of the ORM model.
- Broker - This pattern is responsible for writing and reading the class object information to and from the database [10].
- Object Metadata - This pattern maps the relational database schema to the corresponding class object fields and relationships. It provides generic code for producing the mapping, thus avoiding repetitive code.
- Query Object - This is a pattern to wrap the query language making database access consistent with the o-o language being used.
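One way the three dynamic patterns might fit together is sketched below. It is a simplified illustration, not the Smalltalk design from the original papers: the Person class, the PERSON_MAP dictionary, and the Query and Broker classes are all hypothetical names chosen for this example.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Person:
    oid: int
    name: str
    email: str


# Object Metadata: one declarative description of how a class maps to a table.
PERSON_MAP = {
    "cls": Person,
    "table": "person",
    # attribute name -> column name
    "columns": {"oid": "oid", "name": "full_name", "email": "email"},
}


class Query:
    """Query Object: builds a parameterised SELECT from attribute names."""

    def __init__(self, mapping):
        self.mapping = mapping
        self.conditions = []
        self.params = []

    def where(self, attribute, value):
        column = self.mapping["columns"][attribute]
        self.conditions.append(f"{column} = ?")
        self.params.append(value)
        return self

    def to_sql(self):
        columns = ", ".join(self.mapping["columns"].values())
        sql = f"SELECT {columns} FROM {self.mapping['table']}"
        if self.conditions:
            sql += " WHERE " + " AND ".join(self.conditions)
        return sql, self.params


class Broker:
    """Broker: the single place that moves objects to and from the database."""

    def __init__(self, conn):
        self.conn = conn

    def save(self, obj, mapping):
        columns = mapping["columns"]
        names = ", ".join(columns.values())
        marks = ", ".join("?" for _ in columns)
        values = [getattr(obj, attribute) for attribute in columns]
        self.conn.execute(
            f"INSERT INTO {mapping['table']} ({names}) VALUES ({marks})", values)

    def find(self, query):
        sql, params = query.to_sql()
        attributes = list(query.mapping["columns"])
        cls = query.mapping["cls"]
        return [cls(**dict(zip(attributes, row)))
                for row in self.conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (oid INTEGER PRIMARY KEY, full_name TEXT, email TEXT)")
broker = Broker(conn)
broker.save(Person(1, "Ada", "ada@example.org"), PERSON_MAP)
found = broker.find(Query(PERSON_MAP).where("name", "Ada"))
```

The caller filters on the attribute name while the metadata translates it to the full_name column, so neither SQL nor column names leak into the calling code.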
Table Data Gateway
This pattern has a simple structure: a single class acts as the gateway to a table. Its methods take the parameters required by the SQL as function arguments and return the values produced by the SQL. Typical methods are find, update, insert, and delete. This approach is the first step in abstracting the database and the calls to SQL. A simple use of this idea can be seen in Perl's DBI module; although the module exposes most of SQL's features, the idea is the same.
Dealing with the return type is not straightforward. SQL queries usually return multiple values, but methods in programming languages typically return only one. We can use either a Map approach or a RecordSet approach (as in JDBC or .NET), both of which involve some compromises.
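A minimal sketch of a Table Data Gateway follows, using the Map approach for return values. The class name PersonTableGateway and the person table are assumptions for illustration only; the sketch uses Python's standard sqlite3 module.

```python
import sqlite3


class PersonTableGateway:
    """Table Data Gateway: one object fronts all SQL for the person table."""

    def __init__(self, conn):
        self.conn = conn
        # sqlite3.Row lets each result row be converted to a column -> value map.
        self.conn.row_factory = sqlite3.Row

    def insert(self, name, email):
        cursor = self.conn.execute(
            "INSERT INTO person (name, email) VALUES (?, ?)", (name, email))
        return cursor.lastrowid

    def update(self, person_id, name, email):
        self.conn.execute(
            "UPDATE person SET name = ?, email = ? WHERE id = ?",
            (name, email, person_id))

    def delete(self, person_id):
        self.conn.execute("DELETE FROM person WHERE id = ?", (person_id,))

    def find(self, person_id):
        row = self.conn.execute(
            "SELECT id, name, email FROM person WHERE id = ?",
            (person_id,)).fetchone()
        return dict(row) if row else None

    def find_by_name(self, name):
        rows = self.conn.execute(
            "SELECT id, name, email FROM person WHERE name = ? ORDER BY id",
            (name,))
        return [dict(row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
gateway = PersonTableGateway(conn)
ada_id = gateway.insert("Ada", "ada@example.org")
gateway.update(ada_id, "Ada", "ada@example.com")
```

The compromise mentioned above shows up here: callers receive plain dictionaries keyed by column name, so a misspelled key is only caught at run time.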
Row Data Gateway
In this pattern every row of the record structure is treated as an instance of an object. There are two objects: a finder and a gateway. The finder object has methods such as find, and the gateway object has methods such as insert and update. The finder is kept in a separate class so that finds can be polymorphic across separate data stores. This pattern is very similar to the Active Record pattern, except that an active record also holds the domain logic in the class.
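The split between finder and gateway can be sketched as follows. The names PersonRowGateway and PersonFinder and the person table are hypothetical; the sketch again relies only on Python's standard sqlite3 module.

```python
import sqlite3


class PersonRowGateway:
    """Row Data Gateway: one instance stands for one row; no domain logic."""

    def __init__(self, conn, name, email, person_id=None):
        self.conn = conn
        self.id = person_id
        self.name = name
        self.email = email

    def insert(self):
        cursor = self.conn.execute(
            "INSERT INTO person (name, email) VALUES (?, ?)",
            (self.name, self.email))
        self.id = cursor.lastrowid

    def update(self):
        self.conn.execute(
            "UPDATE person SET name = ?, email = ? WHERE id = ?",
            (self.name, self.email, self.id))

    def delete(self):
        self.conn.execute("DELETE FROM person WHERE id = ?", (self.id,))


class PersonFinder:
    """Finder: kept separate so another data store could supply its own."""

    def __init__(self, conn):
        self.conn = conn

    def find(self, person_id):
        row = self.conn.execute(
            "SELECT id, name, email FROM person WHERE id = ?",
            (person_id,)).fetchone()
        if row is None:
            return None
        found_id, name, email = row
        return PersonRowGateway(self.conn, name, email, found_id)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")

ada = PersonRowGateway(conn, "Ada", "ada@example.org")
ada.insert()
ada.email = "ada@example.com"
ada.update()
reloaded = PersonFinder(conn).find(ada.id)
```

Any business rules about a person would live in a separate domain class that uses the gateway, which is the point of difference from Active Record.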
Active Record
- Class objects map to rows in a table and a table represents the class definition.
- Encapsulates database access
- Class objects are responsible for adding, deleting, and updating database tables.
- An easy pattern to implement and use [7].
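The points above can be illustrated with a small sketch in which one class carries the row data, the database access, and a piece of domain logic. This is not the Rails implementation; the Account class, its account table, and the class-level connection are assumptions chosen to keep the example short.

```python
import sqlite3


class Account:
    """Active Record: row data, persistence, and domain logic in one class."""

    # A shared in-memory connection, used here for illustration only.
    conn = sqlite3.connect(":memory:")

    def __init__(self, owner, balance=0, account_id=None):
        self.id = account_id
        self.owner = owner
        self.balance = balance

    @classmethod
    def find(cls, account_id):
        row = cls.conn.execute(
            "SELECT owner, balance, id FROM account WHERE id = ?",
            (account_id,)).fetchone()
        return cls(*row) if row else None

    def save(self):
        """Insert a new row, or update the existing one."""
        if self.id is None:
            cursor = self.conn.execute(
                "INSERT INTO account (owner, balance) VALUES (?, ?)",
                (self.owner, self.balance))
            self.id = cursor.lastrowid
        else:
            self.conn.execute(
                "UPDATE account SET owner = ?, balance = ? WHERE id = ?",
                (self.owner, self.balance, self.id))

    def destroy(self):
        self.conn.execute("DELETE FROM account WHERE id = ?", (self.id,))
        self.id = None

    def withdraw(self, amount):
        """Domain logic lives alongside the persistence methods."""
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


Account.conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT, balance INTEGER)")

account = Account("Ada", balance=100)
account.save()
account.withdraw(30)
account.save()
```

The convenience is that callers only ever touch Account; the cost is that the class is tied to the table layout, so schema changes and business rules change the same code.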
Data Mapper
Popular Products
These patterns in practice can be found in:
Hibernate
- Uses the Data Mapper, Identity Map, and Unit of Work patterns
Oslick
Rails Active Record
Rails DataMapper
Though Active Record is quite popular in the Rails community, it has some drawbacks and is sometimes criticized as being quite slow. DataMapper for Rails uses the following techniques to improve performance. A good comparison, made in favor of DataMapper, is listed in the References.
Summarizing the points:
- Eager Loading: It is common that after retrieving something from the database, you will soon make another retrieval related to that result. DataMapper exploits this by loading the additional results ahead of time. This is an optimistic approach: the system assumes you will need these values and brings them into your program; if you do not need them, they are discarded after some time. If you do need them, performance can improve by almost a factor of two.
- Laziness (sometimes): Dealing with text columns is quite expensive in databases. DataMapper uses a clever strategy and delays the execution of a query on a text column until the last possible moment. The idea is to group a set of changes into one transaction and avoid multiple expensive database calls.
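The general idea behind both techniques can be shown with a query counter. The sketch below is a generic illustration in Python with sqlite3 and does not reproduce DataMapper's actual implementation; the post and comment tables, the run helper, and the LazyPost class are all invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
    CREATE TABLE comment (id INTEGER PRIMARY KEY, post_id INTEGER, text TEXT);
    INSERT INTO post VALUES (1, 'first', 'long text one');
    INSERT INTO post VALUES (2, 'second', 'long text two');
    INSERT INTO comment VALUES (1, 1, 'nice');
    INSERT INTO comment VALUES (2, 1, 'agreed');
    INSERT INTO comment VALUES (3, 2, 'hmm');
""")

stats = {"queries": 0}


def run(sql, params=()):
    """Execute a query and count it, so the two strategies can be compared."""
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()


def comments_one_by_one():
    """Naive loading: one query for the posts, then one more per post."""
    posts = run("SELECT id, title FROM post ORDER BY id")
    result = {}
    for post_id, title in posts:
        rows = run("SELECT text FROM comment WHERE post_id = ? ORDER BY id",
                   (post_id,))
        result[title] = [text for (text,) in rows]
    return result


def comments_eager():
    """Eager loading: fetch the comments for every post in one extra query."""
    posts = run("SELECT id, title FROM post ORDER BY id")
    marks = ", ".join("?" for _ in posts)
    rows = run(
        f"SELECT post_id, text FROM comment WHERE post_id IN ({marks}) ORDER BY id",
        [post_id for post_id, _ in posts])
    grouped = {post_id: [] for post_id, _ in posts}
    for post_id, text in rows:
        grouped[post_id].append(text)
    return {title: grouped[post_id] for post_id, title in posts}


class LazyPost:
    """Laziness: the large body column is fetched only when first read."""

    def __init__(self, post_id, title):
        self.id = post_id
        self.title = title
        self._body = None

    @property
    def body(self):
        if self._body is None:
            rows = run("SELECT body FROM post WHERE id = ?", (self.id,))
            self._body = rows[0][0]
        return self._body
```

With the two posts above, the naive version issues three queries and the eager version two; the gap grows with the number of posts, while LazyPost issues no query for the body at all unless the attribute is read.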
Lafcadio
- Another ORM for Ruby that currently only supports MySQL databases
- Treats each table row as a class object
- See http://lafcadio.rubyforge.org/ and http://www.zenspider.com/dl/rubyconf2003/lafcadio.pdf
Perl's DBIx::Class
Perl's DBIx::Class uses the Table Data Gateway pattern to access the database and return results. Almost all database functionality, including CRUD, transactions, and concurrency control, can be driven through its methods instead of hand-written SQL. The results are returned in a map, and it is not always trivial to extract values from it, especially if the table has many columns.
References
- ORM articles
- Active Record was mentioned in this book
- A video from RailsEnvy giving a quick and clean introduction to Active Record
- A comparison of Data Mapper and Active Record implementations in Ruby
- A blog entry by Dev411 on limitations of Active Record
- Mapping Objects to Tables, Proceedings EuroPLoP, 1997 (2004 update)
- Hibernate vs. Rails: The Persistence Showdown
- Crossing Chasms: The Static Patterns
- A Pattern Language for Relational Databases and Smalltalk
- Relational Database Access Layers: A Pattern Language