CSC/ECE 517 Fall 2007/wiki2 2 d4: Difference between revisions
(60 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
== Topic == | == Topic == | ||
Line 6: | Line 5: | ||
== Object | == Object Relational Mapping (ORM) == | ||
[http://en.wikipedia.org/wiki/Object_Relational_Mapping Object-relational mapping] (ORM) is the technique used to marshal database records into classes of object-oriented (o-o) languages and vice-versa. The technique is usually developed into a tool that can be used by the developer to alleviate some of the complexity in this mapping. The complexity derives from the inherently different data types that are used by relational databases and o-o languages. Database data types are typically scalar in nature while classes are more typically composed of a mix of non-scalar and scalar types. | |||
[[Image:ORM_Flowchart.jpg]] | |||
Figure 1. Position of ORM Layer in Database Architecture | |||
In addition to mapping data structures in ORM, o-o concepts often need to “bridge the chasm” between the object layer and the database layer. These concepts include: inheritance, aggregation, polymorphism, class associations, and data encapsulation[[#References|<sup>6</sup>]]. This conceptual difference between the database structure and o-o data storage is known as the [http://en.wikipedia.org/wiki/Object-Relational_impedance_mismatch impedance mismatch,]which can create a fragile bridge across the chasm. A solution to this problem would be to create [http://en.wikipedia.org/wiki/OODBMS o-o databases], which has been done, but these databases do not have the structural stability or mathematical foundation that relational databases have. Even if o-o databases were to catch on, the ORM problem would still exist for years to come with the many legacy relational databases. | |||
= | There have been [http://en.wikipedia.org/wiki/List_of_object-relational_mapping_software many ORM] models developed to bridge the chasm, and we will compare and contrast a handful of them in terms of their utility for developers. Some issues that should be considered when comparing ORM models are the degree of coupling between the layers, [http://www.realsolve.co.uk/site/tech/orm-performance.php performance issues], [http://www.inconcept.com/jcm/April2000/hallock.html storage space usage], and maybe even [http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/admin/c0005267.htm concurrency and synchronization issues]. Often improving one issue will come at the expense of another issue, e.g., decoupling the layers, maybe by separating their physical locations, can increase response time thus degrading performance[[#References|<sup>6</sup>]]. | ||
== Approaches/Patterns == | |||
=== Crossing the Chasm Pattern Language === | |||
One of the [http://www.agiledata.org/essays/mappingObjects.html first ORMs], the name implies the difficulty involved in mapping between the two disjoint data structures. The Chasm Pattern Language is a collection of patterns developed by Kyle Brown and Bruce Whitenack used to solve the ORM problem as it relates to [http://www.ksc.com/articles/patternlanguage.htm “marrying relational databases and Smalltalk”]. The patterns of Crossing Chasms are grouped into “static” patterns that deal with setting up the database tables and objects, “dynamic” patterns that relate to the runtime issues involved in the ORM model, and architectural patterns that are broad patterns that model entire systems. | |||
= | ==== Representing Objects as Tables with Crossing Chasms ==== | ||
The following "static" patterns can be combined to represent class objects as tables in a relational database. | |||
* <B>Object Identifier</B> - This is a pattern that assigns a unique identifier to each persistent object usually using a database’s sequence number generator. This pattern will allow the identity of instances of class objects to be persisted in a database and avoid duplication of these instances[http://www.ksc.com/articles/patternlanguage.htm<sup>9</sup>]. | |||
* <B>Foreign Key</B> – This is a pattern that handles references to other class types by taking advantage of instances of the other class’s object identifier. Maintenance of these references can incur additional overhead in more searching and caching when accessing objects. Inheritance can also be mapped in this manner with each subclass mapping to a different table linked by using their object identifiers. Again, there is the overhead of having multi-table joins. An alternative is to map the entire hierarchy to one table which makes maintenance more difficult since a change in a base class could require updating many different tables[http://www.ksc.com/article5.htm<sup>8</sup>]. If multiple inheritance is involved, then another alternative would be to put the entire inheritance hierarchy into one table, the downside being that there could be wasted storage space with many NULL fields in the rows. | |||
* <B>Representing Collections</B> - Use a relationship table for each collection that maps the object IDs of the containing objects to the contained objects. The relational table can store additional data as well, such as meta-data or positional information for ordered collections[http://www.ksc.com/articles/patternlanguage.htm<sup>9</sup>]. | |||
The following “dynamic” patterns deal with the flow of data between the layers of the ORM model. | |||
* <B>Broker</B> – This pattern is responsible for writing and reading the class object information to and from the database.[http://www.objectarchitects.de/ObjectArchitects/papers/Published/ZippedPapers/plop_relzs05.pdf<sup>10</sup>] | |||
* <B>Object Metadata</B> - This is a pattern to map the relational database schema to their corresponding class object fields and relationships. This will provide generic code for producing the mapping, thus avoiding repetitive code. | |||
* <B>Query Object</B> - This is a pattern to wrap the query language making database access consistent with the o-o language being used. | |||
=== | === Table Data Gateway === | ||
[[Image:Table_Data_GateWay.jpg]] | |||
This pattern has a simple structure. This class will have methods that take the parameters required in SQL as function arguments and function returns the SQL values. Typical methods would be find, update, insert, delete. This approach is the first step in abstracting the database and calls to SQL. A simple usage of this can be seen in [http://search.cpan.org/~timb/DBI/DBI.pm Perl's DBI module]. Though most of the features of the SQL are implemented in this module, the idea is the same. | |||
# | <br /><br /> Dealing the Return type is not straightforward. Most of the time SQL queries return multiple values, but programming languages methods typically return only one value. We can use either a Map approach or a RecordSet(JDBC, .NET) approach to overcome this disparity, but both have some compromises. | ||
=== Row Data Gateway === | |||
In this pattern every row of the record structure is treated like an instance of an object. There will be two objects one finder and a gateway. The finder object has methods like find and the gateway objects has methods like insert, update. The reason for separating the finder class is to have polymorphisms of finds from separate data stores. This pattern is almost similar to the active record pattern but with the exception that active record also have the domain logic in the class. | |||
=== Active Record === | |||
[[Image:ActiveRecord1.jpg]] | |||
# Class objects map to rows in a table and a table represents the class definition. | |||
# Encapsulates database access | |||
# Class objects are responsible for adding, deleting, and updating database tables. | |||
# Easy to implement and use pattern[[#References|<sup>7</sup>]]. | |||
=== Data Mapper === | === Data Mapper === | ||
[[Image:DataMapper3.jpg]] | |||
There are some difficulties identified with the Active Record pattern. As the implementor of the class we want all business logic to be in the object but doing so will not correspond to the schema very well. It means that the object schema and the relational schema will not match up. Also the data transfer will cost us a lot of memory usage. | |||
The idea of the data mapper pattern is to have a separate layer of software which separates the in memory objects from the database. It acts like a gateway between the two and isolates one from the another. With the presence of this layer, the in memory objects need not even know that a database exists, and in a similar vein the database need not be concerned with the memory objects. Also, this is the place where we can add extra performance gains, e.g., eager loading, lazy evaluation, etc., which are covered in the next section. | |||
== Popular Products == | == Popular Products == | ||
These patterns in practice can be found in | These patterns in practice can be found in: | ||
=== Rails Active Record === | |||
Rails along with ActiveRecord has taken the database driven webdevelopment to a new height. Creating websites with tightly integrated databases has been made simpler and maintainable. In Rails ActiveRecord is basically a class packaged along with core rails. Only after an initial setup (giving passwords, specifying schema) interacting with the database and even migrating is quite simple. | |||
==== Highlights ==== | |||
#ActiveRecord automatically links the tables in the database to the classes in our project. | |||
#We can specify the cardinality in our model reducing a lot of coding. e.g We have all the child tuples to be deleted if the corresponding parent has been deleted. | |||
#Validation of the data can be specified in the model using simple macros. e.g its easy to specify the length of the name from the model with validates_lenght_of | |||
#Built in support for several databases including MySQL, Oracle, PostgreSQL | |||
#Transactional support etc.. | |||
==== Limitations ==== | |||
# No Foreign Key Support: Although RoR lets you define has_many relationships, it makes no effort to create foreign key constraints in the underlying database to ensure relational integrity. | |||
# No Multi-column Primary Key Support: Multi-column primary keys are a staple of relational database schema definition. | |||
=== Rails DataMapper === | |||
Though active record is quite popular among the rails community, it has some drawbacks and is sometimes criticized to be quite [http://discuss.joelonsoftware.com/default.asp?joel.3.309321.127 slow]. DataMapper for rails uses the following techniques to enhance the performance. A very good comparision has been made in favor of DataMapper [http://datamapper.org/why.html here]. <br /> | |||
Summarizing the points: | |||
#Eager Loading: It is a common situation where, if you are making a retrieval from the database, you are likely to make another retrieval near this result. This implementation exploits this fact, and loads additional results into the programming interface. This is a positive minded approach, the system hopes that you will require these values too and bring them in to your program, if you don't require them it will flush them away after some time. But, if you do require them, this will increase the performance by almost twofold. | |||
#[http://en.wikipedia.org/wiki/Lazy_loading Laziness](sometimes): Dealing with text columns is quite expensive in databases. DataMapper uses a clever strategy and delays the execution of a query on text column until the last possible moment. The idea is to group a set of changes as one transaction and avoid multiple expensive database calls. | |||
=== Hibernate === | |||
# This is likely one of the most popular and oldest OR/Ms. This is used in Java and has a large user base. It uses the Data Mapper, Identity Map, and Unit of Work patterns. | |||
=== Lafcadio === | |||
# Another ORM for Ruby that currently only supports MySQL databases | |||
# Treats each table row as a class object | |||
# see http://lafcadio.rubyforge.org/ and http://www.zenspider.com/dl/rubyconf2003/lafcadio.pdf | |||
=== Perl's DBIx::Class === | === Perl's DBIx::Class === | ||
DBIx::Class of Perl that uses the Table Gateway pattern to access the database and provide results. Almost all the functionality of the database includeing CRUD, transactions, concurrency control can be controlled through its methods instead of using SQL code. The results are returned in a map and it not always trivial to extract values from it especially if the table has many columns. | |||
== Conclusion == | |||
Object-oriented languages have almost become 'the' standard way to develop large scale and/or professional software, and relational databases are currently the most popular approach in database design. Bridging the gap between the two worlds is not easy as both differ in their philosophy, performance enhancements techniques, and it seems one cannot live without the other. So, the real burden is on the developer to cope with these two things. The aim of all the patterns described above is to help the developer reduce or even eliminate writing pure SQL code in their development. Since the developer has different requirements and needs in different projects, we can't decide on an ultimate winner in ORMs. Every project or language has its own requirements. We attempted to list the options available, their advantages and disadvantages, and showed some practical utilizations of the these patterns. Unless there is a shift from using relational databases to using object databases in the future, we can anticipate that object relational mapping will be explored in many different contexts. | |||
== References == | == References == | ||
Line 48: | Line 107: | ||
#[http://www.martinfowler.com/eaaCatalog/index.html Active Record was mentioned in this book] | #[http://www.martinfowler.com/eaaCatalog/index.html Active Record was mentioned in this book] | ||
#[http://www.patchedsoftware.com/RailsEnvy-ActiveRecord.mov A video from RailsEnvy giving a quick and clean introduction to Active Record ] | #[http://www.patchedsoftware.com/RailsEnvy-ActiveRecord.mov A video from RailsEnvy giving a quick and clean introduction to Active Record ] | ||
#[http://datamapper.org/why.html A comparison of | #[http://datamapper.org/why.html A comparison of Data Mapper and Active Record implementations in Ruby] | ||
#[http://www.dev411.com/blog/2006/06/04/activerecord-achilles-heel-of-ruby-on-rails A blog entry by Dev411 on limitations of Active Record] | #[http://www.dev411.com/blog/2006/06/04/activerecord-achilles-heel-of-ruby-on-rails A blog entry by Dev411 on limitations of Active Record] | ||
#[http://www.objectarchitects.de/ObjectArchitects/papers/index.htm?Published/index.htm Mapping Objects to Tables, Proceedings EuroPLoP, 1997 (2004 update)] | |||
#[http://www.theserverside.com/tt/articles/content/RailsHibernate/article.html Hybernate vs. Rails: The Persistence Showdown] | |||
#[http://www.ksc.com/article5.htm Crossing Chasms: The Static Patterns] | |||
#[http://www.ksc.com/articles/patternlanguage.htm A Pattern Language for Relational Databases and Smalltalk] | |||
#[http://www.objectarchitects.de/ObjectArchitects/papers/Published/ZippedPapers/plop_relzs05.pdf Relational Database Access Layers: A Pattern Language] |
Latest revision as of 16:41, 29 October 2007
Topic
Object-relational mapping. Ruby's ActiveRecord is one attempt to allow an object-oriented program to use a relational database. The Crossing Chasms pattern is another. Look up several approaches to mapping relational databases to o-o programs, include hyperlinks to all of them, and explain how they differ. Report on the strengths of various approaches (making sure to credit the authors for their insights)
Object Relational Mapping (ORM)
Object-relational mapping (ORM) is the technique used to marshal database records into classes of object-oriented (o-o) languages and vice-versa. The technique is usually developed into a tool that can be used by the developer to alleviate some of the complexity in this mapping. The complexity derives from the inherently different data types that are used by relational databases and o-o languages. Database data types are typically scalar in nature while classes are more typically composed of a mix of non-scalar and scalar types.
Figure 1. Position of ORM Layer in Database Architecture
In addition to mapping data structures in ORM, o-o concepts often need to “bridge the chasm” between the object layer and the database layer. These concepts include: inheritance, aggregation, polymorphism, class associations, and data encapsulation6. This conceptual difference between the database structure and o-o data storage is known as the impedance mismatch,which can create a fragile bridge across the chasm. A solution to this problem would be to create o-o databases, which has been done, but these databases do not have the structural stability or mathematical foundation that relational databases have. Even if o-o databases were to catch on, the ORM problem would still exist for years to come with the many legacy relational databases.
There have been many ORM models developed to bridge the chasm, and we will compare and contrast a handful of them in terms of their utility for developers. Some issues that should be considered when comparing ORM models are the degree of coupling between the layers, performance issues, storage space usage, and maybe even concurrency and synchronization issues. Often improving one issue will come at the expense of another issue, e.g., decoupling the layers, maybe by separating their physical locations, can increase response time thus degrading performance6.
Approaches/Patterns
Crossing the Chasm Pattern Language
One of the first ORMs, the name implies the difficulty involved in mapping between the two disjoint data structures. The Chasm Pattern Language is a collection of patterns developed by Kyle Brown and Bruce Whitenack used to solve the ORM problem as it relates to “marrying relational databases and Smalltalk”. The patterns of Crossing Chasms are grouped into “static” patterns that deal with setting up the database tables and objects, “dynamic” patterns that relate to the runtime issues involved in the ORM model, and architectural patterns that are broad patterns that model entire systems.
Representing Objects as Tables with Crossing Chasms
The following "static" patterns can be combined to represent class objects as tables in a relational database.
- Object Identifier - This is a pattern that assigns a unique identifier to each persistent object usually using a database’s sequence number generator. This pattern will allow the identity of instances of class objects to be persisted in a database and avoid duplication of these instances9.
- Foreign Key – This is a pattern that handles references to other class types by taking advantage of instances of the other class’s object identifier. Maintenance of these references can incur additional overhead in more searching and caching when accessing objects. Inheritance can also be mapped in this manner with each subclass mapping to a different table linked by using their object identifiers. Again, there is the overhead of having multi-table joins. An alternative is to map the entire hierarchy to one table which makes maintenance more difficult since a change in a base class could require updating many different tables8. If multiple inheritance is involved, then another alternative would be to put the entire inheritance hierarchy into one table, the downside being that there could be wasted storage space with many NULL fields in the rows.
- Representing Collections - Use a relationship table for each collection that maps the object IDs of the containing objects to the contained objects. The relational table can store additional data as well, such as meta-data or positional information for ordered collections9.
The following “dynamic” patterns deal with the flow of data between the layers of the ORM model.
- Broker – This pattern is responsible for writing and reading the class object information to and from the database.10
- Object Metadata - This is a pattern to map the relational database schema to their corresponding class object fields and relationships. This will provide generic code for producing the mapping, thus avoiding repetitive code.
- Query Object - This is a pattern to wrap the query language making database access consistent with the o-o language being used.
Table Data Gateway
This pattern has a simple structure. This class will have methods that take the parameters required in SQL as function arguments and function returns the SQL values. Typical methods would be find, update, insert, delete. This approach is the first step in abstracting the database and calls to SQL. A simple usage of this can be seen in Perl's DBI module. Though most of the features of the SQL are implemented in this module, the idea is the same.
Dealing the Return type is not straightforward. Most of the time SQL queries return multiple values, but programming languages methods typically return only one value. We can use either a Map approach or a RecordSet(JDBC, .NET) approach to overcome this disparity, but both have some compromises.
Row Data Gateway
In this pattern every row of the record structure is treated like an instance of an object. There will be two objects one finder and a gateway. The finder object has methods like find and the gateway objects has methods like insert, update. The reason for separating the finder class is to have polymorphisms of finds from separate data stores. This pattern is almost similar to the active record pattern but with the exception that active record also have the domain logic in the class.
Active Record
- Class objects map to rows in a table and a table represents the class definition.
- Encapsulates database access
- Class objects are responsible for adding, deleting, and updating database tables.
- Easy to implement and use pattern7.
Data Mapper
There are some difficulties identified with the Active Record pattern. As the implementor of the class we want all business logic to be in the object but doing so will not correspond to the schema very well. It means that the object schema and the relational schema will not match up. Also the data transfer will cost us a lot of memory usage.
The idea of the data mapper pattern is to have a separate layer of software which separates the in memory objects from the database. It acts like a gateway between the two and isolates one from the another. With the presence of this layer, the in memory objects need not even know that a database exists, and in a similar vein the database need not be concerned with the memory objects. Also, this is the place where we can add extra performance gains, e.g., eager loading, lazy evaluation, etc., which are covered in the next section.
Popular Products
These patterns in practice can be found in:
Rails Active Record
Rails along with ActiveRecord has taken the database driven webdevelopment to a new height. Creating websites with tightly integrated databases has been made simpler and maintainable. In Rails ActiveRecord is basically a class packaged along with core rails. Only after an initial setup (giving passwords, specifying schema) interacting with the database and even migrating is quite simple.
Highlights
- ActiveRecord automatically links the tables in the database to the classes in our project.
- We can specify the cardinality in our model reducing a lot of coding. e.g We have all the child tuples to be deleted if the corresponding parent has been deleted.
- Validation of the data can be specified in the model using simple macros. e.g its easy to specify the length of the name from the model with validates_lenght_of
- Built in support for several databases including MySQL, Oracle, PostgreSQL
- Transactional support etc..
Limitations
- No Foreign Key Support: Although RoR lets you define has_many relationships, it makes no effort to create foreign key constraints in the underlying database to ensure relational integrity.
- No Multi-column Primary Key Support: Multi-column primary keys are a staple of relational database schema definition.
Rails DataMapper
Though active record is quite popular among the rails community, it has some drawbacks and is sometimes criticized to be quite slow. DataMapper for rails uses the following techniques to enhance the performance. A very good comparision has been made in favor of DataMapper here.
Summarizing the points:
- Eager Loading: It is a common situation where, if you are making a retrieval from the database, you are likely to make another retrieval near this result. This implementation exploits this fact, and loads additional results into the programming interface. This is a positive minded approach, the system hopes that you will require these values too and bring them in to your program, if you don't require them it will flush them away after some time. But, if you do require them, this will increase the performance by almost twofold.
- Laziness(sometimes): Dealing with text columns is quite expensive in databases. DataMapper uses a clever strategy and delays the execution of a query on text column until the last possible moment. The idea is to group a set of changes as one transaction and avoid multiple expensive database calls.
Hibernate
- This is likely one of the most popular and oldest OR/Ms. This is used in Java and has a large user base. It uses the Data Mapper, Identity Map, and Unit of Work patterns.
Lafcadio
- Another ORM for Ruby that currently only supports MySQL databases
- Treats each table row as a class object
- see http://lafcadio.rubyforge.org/ and http://www.zenspider.com/dl/rubyconf2003/lafcadio.pdf
Perl's DBIx::Class
DBIx::Class of Perl that uses the Table Gateway pattern to access the database and provide results. Almost all the functionality of the database includeing CRUD, transactions, concurrency control can be controlled through its methods instead of using SQL code. The results are returned in a map and it not always trivial to extract values from it especially if the table has many columns.
Conclusion
Object-oriented languages have almost become 'the' standard way to develop large scale and/or professional software, and relational databases are currently the most popular approach in database design. Bridging the gap between the two worlds is not easy as both differ in their philosophy, performance enhancements techniques, and it seems one cannot live without the other. So, the real burden is on the developer to cope with these two things. The aim of all the patterns described above is to help the developer reduce or even eliminate writing pure SQL code in their development. Since the developer has different requirements and needs in different projects, we can't decide on an ultimate winner in ORMs. Every project or language has its own requirements. We attempted to list the options available, their advantages and disadvantages, and showed some practical utilizations of the these patterns. Unless there is a shift from using relational databases to using object databases in the future, we can anticipate that object relational mapping will be explored in many different contexts.
References
- ORM articles
- Active Record was mentioned in this book
- A video from RailsEnvy giving a quick and clean introduction to Active Record
- A comparison of Data Mapper and Active Record implementations in Ruby
- A blog entry by Dev411 on limitations of Active Record
- Mapping Objects to Tables, Proceedings EuroPLoP, 1997 (2004 update)
- Hybernate vs. Rails: The Persistence Showdown
- Crossing Chasms: The Static Patterns
- A Pattern Language for Relational Databases and Smalltalk
- Relational Database Access Layers: A Pattern Language