CSC/ECE 517 Summer 2008/wiki3 1 th: Difference between revisions

 
(173 intermediate revisions by 2 users not shown)
Line 1: Line 1:
= RDB/OO Patterns =
= RDB/OO Interactions =


:It would be good if OO programs could interact with OO databases, but alas, relational databases have a 99% market share. This has led to many attempts to access them from OO languages. Design patterns for doing this have been developed, starting with [http://c2.com/cgi/wiki?CrossingChasms Crossing Chasms] and extending to [http://en.wikipedia.org/wiki/ActiveRecord_%28Rails%29 Rails' ActiveRecord] [http://wiki.rubyonrails.org/rails/pages/ActiveRecord][http://ar.rubyonrails.com/]. Here, we investigate the various approaches for marrying OO programs to relational databases, comparing them in terms of ease of programming, robustness, and efficiency.
:It would be good if OO programs could interact with OO databases, but alas, relational databases have a 99% market share. This has led to many attempts to access them from OO languages. Design patterns and implementations for doing this have been developed, starting with [http://c2.com/cgi/wiki?CrossingChasms Crossing Chasms] and extending to [http://en.wikipedia.org/wiki/ActiveRecord_(Rails) Rails' ActiveRecord] [http://wiki.rubyonrails.org/rails/pages/ActiveRecord][http://ar.rubyonrails.com/]. Here, we investigate the various approaches for marrying OO programs to relational databases, comparing them in terms of ease of programming, robustness, and efficiency.


= Introduction =
Line 9: Line 9:
One of the primary problems that object-relational mapping (ORM) attempts to solve is that of [http://www.service-architecture.com/object-oriented-databases/articles/transparent_persistence.html transparent object persistence], which allows an object to outlive the process that created it. The state of an object can be stored to disk, and an object with the same state can be re-created in the future. This object data is typically internally stored in a relational database using SQL.


Unfortunately, relational databases lie at the core of most modern enterprise applications, and their tabular representation of data is fundamentally different from the network of objects used in object-oriented applications. ORM allows us to interact with business objects directly in an object-oriented domain model, instead of having to work with rows and columns at the programming level.  For further introduction to ORM and the surrounding issues, refer to [http://www.agiledata.org/essays/mappingObjects.html][http://www.acmqueue.org/modules.php?name=Content&pa=printer_friendly&pid=538&page=1][http://digitalcommons.macalester.edu/context/mathcs_honors/article/1006/type/native/viewcontent/][http://www.rgoarchitects.com/Files/ormappin.pdf].
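To make the rows-versus-objects gap concrete, the following is a minimal sketch of the hand-written mapping that an ORM layer automates. It uses Python's standard <tt>sqlite3</tt> module with an in-memory database; the <tt>events</tt> table, the <tt>Event</tt> class, and the <tt>load_events</tt> helper are hypothetical names chosen for illustration and do not come from any of the frameworks discussed here.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Event:
    """Domain object: one instance corresponds to one row of the events table."""
    id: int
    title: str
    date: str


def load_events(conn, min_id):
    """Hand-written mapping: turn each result row (a tuple) into an Event object."""
    rows = conn.execute(
        "SELECT id, title, date FROM events WHERE id > ? ORDER BY id",
        (min_id,),
    )
    return [Event(*row) for row in rows]


# Illustrative in-memory database; the schema and data are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, title TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(5, "Kickoff", "2008-07-01"), (6, "Demo", "2008-07-15")],
)

events = load_events(conn, 5)
for event in events:
    # The application now works with attributes, not column positions.
    print(event.title, event.date)
```

An ORM framework generates or hides code of this kind, so that the application only ever sees objects such as <tt>Event</tt>.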


= Design Patterns =


Design patterns provide the theoretical underpinnings for the object-relational tools that we use in practice today. One of the first design patterns exploring the bridge between object-oriented domains and relational domains is Crossing Chasms.
Design patterns provide the theoretical underpinnings for the object-relational tools that we use in practice today. One of the first design pattern languages exploring the bridge between object-oriented domains and relational domains is Crossing Chasms.


# The ''Chasm'' in [http://www.ksc.com/articles/patternlanguage.htm Crossing Chasms] refers to the semantic and structural gap between data in an object-oriented application and the representation and access of that data in a database. As a [http://en.wikipedia.org/wiki/Pattern_language Pattern Language], Crossing Chasms captures multiple proven methods for effectively integrating an object-oriented application and a relational database. These solutions are grouped into two broad categories: [http://www.ksc.com/articles/staticpatterns.htm Static Patterns], useful in data modeling and database table design, and [http://www.ksc.com/articles/crossingchasms.htm Architectural Patterns], useful primarily for designing overall systems to be as effective as possible in the dynamic interaction between data and objects. Originally developed to integrate Smalltalk applications with relational databases, and rooted in the theory of object orientation from Jacobson to Rumbaugh, the patterns have been abstracted and refined to capture the best practices of the experts in OO database integration, whatever the platform.  The architectural patterns include ''Four Layer Architecture'', ''Trim And Fit Client'', and ''Phase In Tiers'', which we summarize here.
# The [http://c2.com/cgi/wiki?FourLayerArchitecture Four Layer Architecture] improves upon the classical 'Model-View-Controller' application design by recognizing that the 'M-V-C' architecture, while an evolutionary gain, is inadequate to support every possible application environment, especially given the challenges associated with database integration and communication.  Three of the layers of the 'Four Layer Architecture' closely follow the 'Model', 'View', and 'Controller', with the addition of the 'Infrastructure' layer to handle the database tables and communication links.
# Closely related to the Four Layer Architecture is the [http://c2.com/cgi/wiki?TrimAndFitClient Trim And Fit Client], a pattern describing how to partition the processing workload between the client and server.  The factors driving the decision of how much work the client should do change constantly, given Moore's Law and increases in efficiency for communication links as well as memory, yet the 'Trim And Fit Client' pattern provides a timeless solution.  Leveraging the architectural segments provided by the 'Four Layer Architecture', 'Trim And Fit Client' guides the designer to partition responsibility between the client and server at an arbitrary point either in the Application Models, or between the Application and Domain Model layers.  This scheme gives the designer the flexibility to allocate the workload in an optimal way for the particular constraints in place, without being forced into a design situation with an imbalanced workload.
# [http://c2.com/cgi-bin/wiki/wiki?PhaseInTiers Phase In Tiers] presents best practices to follow when designing a constantly expanding data center. Although it may be difficult for the architect to estimate the capacity required for the system, and nearly impossible for the developers to continuously migrate new applications and data to the growing network of databases, client applications, and servers, the 'Phase In Tiers' pattern helps the enterprise architect stay ahead of the challenge. Incorporating another pattern, the [http://c2.com/cgi/wiki?ThreeTierDistributionArchitecture Three-Tier Distribution Architecture], which provides justification for the now commonplace three-tiered split of enterprise applications into client, server, and database, 'Phase In Tiers' shows how to implement new designs in stages, over time, without breaking existing applications or slowing the roll-out of new features.  The key idea of 'Phase In Tiers' is to build new functionality into clients, and then migrate application layers from the client to servers in stages.


= Concrete Examples =


== Dynamic SQL ==
To motivate our discussions, it is helpful to provide practical, concrete reference examples of object-relational frameworks against traditional dynamic SQL approaches. One can think of these examples as the "Hello World" of DB interfacing.
 
== Embedded (Ad-Hoc) SQL ==


The traditional approach uses simple strings and API calls to connect to databases and return their results. In this example in PHP, the title and date of each event with an ID greater than 5 are echoed:
Line 48: Line 53:
== Stored Procedures ==


Stored procedures are mechanisms that encapsulate SQL queries and other SQL business logic within the database server itself. They have the advantage of decoupling the SQL syntax from the client application, but calls to the stored procedures themselves are still not transparent from within the client application. An [http://www.sqlteam.com/article/stored-procedures-an-overview example of a SQL stored procedure] in Microsoft SQL Server is as follows:


   CREATE PROCEDURE spCaliforniaAuthors
Line 56: Line 61:
     ORDER BY zip


This stored procedure is kept on the database server. The application can then call the stored procedure <tt>spCaliforniaAuthors</tt> directly, without concerning itself with the lower-level implementation details.


== The ORM Approach ==
Line 75: Line 80:
== Microsoft LINQ ==


Microsoft provides a hybrid approach to object-relational mapping, called LINQ, which [http://msdn.microsoft.com/en-us/library/bb425822.aspx interleaves database metadata with object-oriented programming] to allow for language-integrated querying of relational databases.
 
  [Table(Name="Customers")]
  public class Customer
  {
    [Column(IsPrimaryKey=true)]
    public string CustomerID;
    [Column]
    public string City;
  }
 
Unlike other ORM mechanisms, which try to hide non-OO querying as much as possible or use external library calls for direct SQL queries, LINQ to SQL provides a run-time infrastructure for managing relational data as objects without losing the ability to query through SQL-like constructs:
 
  var q =
    from c in Customers
    where c.City == "London"
    select c;
 
  foreach (var cust in q)
    Console.WriteLine("id = {0}, City = {1}",
          cust.CustomerID, cust.City);


These SQL-like constructs are integrated into the language itself, without the need for external library calls, and are available to non-database data structures as well, if they support the appropriate interfaces.


= Comparison =


In this section, we discuss the advantages and disadvantages of using object-relational mapping techniques in application development.
With our examples in hand, we can now discuss the advantages and disadvantages of the various approaches to marrying object-oriented programming and relational databases.


== Ease of Programming ==


ORM applications are difficult to initially configure. But once configured, development in ORM is quite straight-forward.
Those who come from a database background find embedded SQL easy to use because queries are written in the SQL they are already familiar with. The other advantage of embedded SQL is that it allows application developers to exploit the features and functions of their particular database, at the expense of reducing the portability of the application. From a debugging perspective, the database logic is directly embedded within the source code, which avoids the [http://wapedia.mobi/en/Yo-yo_problem yo-yo effect] between the application source and database source [http://www.codinghorror.com/blog/archives/000117.html].
 
Stored procedures fare slightly better and have some benefits over embedded SQL. [http://articles.techrepublic.com.com/5100-10878_11-5766837.html] They decouple the database logic from the application logic, where database programmers can work on database logic and application programmers can independently work on application logic. In addition, stored procedures can be changed without having to recompile the application, leading to more modular implementations.
 
Still, the use of stored procedures is not without criticism. Stored procedures are written in big-iron database languages like PL/SQL or T-SQL, which tend to be archaic in functionality. They are hard to debug because they cannot be debugged in the same process as the application. Finally, stored procedures cannot pass objects and must instead pass primitive data types back and forth, which often means passing an overly large number of parameters to accomplish a task. [http://www.codinghorror.com/blog/archives/000117.html]
 
ORM frameworks have the advantage that developers work with objects, while the mapping tools enable data persistence transparently. For most applications, this provides a natural object-oriented way to access relational databases. When queries are required that cannot easily be expressed in an object-oriented domain, these ORM tools typically provide their own SQL-like query syntax, such as Hibernate's HQL, thereby abstracting the details of the database itself.[http://espresso.cs.up.ac.za/publications/Espresso%20SAICSIT%20ODBMS%20Presentation%20v6.pdf]
 
While most ORM frameworks are easy to program from an application developer perspective, the initial configuration of such frameworks can be daunting due to the framework's generality and [http://www.hibernate.org/hib_docs/reference/en/html/session-configuration.html large number of configuration parameters]. Configuration of ORM is difficult enough that there exist meta code generation tools such as [http://www.hibernate.org/72.html XDoclet] and [http://boss.bekk.no/boss/middlegen/ Middlegen], which can be thought of as compilers in their own right.
 
Finally, let us look at [http://en.wikipedia.org/wiki/Language_Integrated_Query LINQ], a variation of ORM which adds querying capabilities to .NET (it was introduced with .NET Framework 3.5) and provides operations similar to those of SQL. LINQ's major advantage is that it provides consistent domain modeling, while hiding the mundane code (LINQ-to-SQL) that often gets exposed either in configuration of ORM or in embedded SQL. [http://blogs.vertigo.com/personal/petar/Blog/archive/2008/01/04/why-we-notneed-to-use-linq.aspx] LINQ also provides one source and query language for multiple data stores, such as relational data, XML data, and other .NET objects [http://www.eleves.ens.fr/home/rossant/docs/linq.pdf], and it is integrated within the syntax of the language. With respect to ease of programming, LINQ is a clear winner, but its model of embedding database metadata and querying directly within application logic may be met with resistance by individuals who prefer separating database programmers from application developers. [http://reddevnews.com/features/article.aspx?editorialsid=707] LINQ is also specific to .NET languages such as C# and Visual Basic, and is not available in other languages or environments.


== Robustness ==


Using an ORM layer provides database independence.
While embedded SQL is one of the easiest ways to connect to a database, it is also one of the least robust. Embedded SQL is commonly subject to [http://en.wikipedia.org/wiki/SQL_injection SQL injection] attacks, though these attacks can be mitigated through careful programming and through libraries that support parameterized queries. Embedded SQL also reduces the elegance of code by increasing coupling between the database and the application itself. In the PHP sample code provided, for instance, the code is tied to a MySQL database using the <tt>mysql_connect</tt> function.
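The injection risk can be illustrated with a short sketch. It is written in Python with the standard <tt>sqlite3</tt> module rather than PHP and MySQL, so that it is self-contained; the <tt>events</tt> table and both helper functions are hypothetical. The first function builds the query by string concatenation, the second passes the user input as a bound parameter.

```python
import sqlite3

# Illustrative in-memory database; the schema and data are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO events (title) VALUES (?)", [("Launch",), ("Review",)])


def find_unsafe(conn, title):
    # Vulnerable: the user-supplied string becomes part of the SQL text.
    sql = "SELECT id, title FROM events WHERE title = '" + title + "'"
    return conn.execute(sql).fetchall()


def find_safe(conn, title):
    # Parameterized: the driver passes the value separately from the SQL text.
    return conn.execute(
        "SELECT id, title FROM events WHERE title = ?", (title,)
    ).fetchall()


malicious = "x' OR '1'='1"
unsafe_rows = find_unsafe(conn, malicious)  # the WHERE clause is always true
safe_rows = find_safe(conn, malicious)      # treated as a literal title

print(len(unsafe_rows), len(safe_rows))
```

With the concatenated query, the crafted input rewrites the <tt>WHERE</tt> clause and every row comes back; with the bound parameter, the same input is compared as an ordinary string and matches nothing.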
 
Since the SQL code is interspersed between lines of non-SQL code, it can be difficult to maintain and modify if the SQL code needs to be modified. Indeed, there is significant impact to the understandability, testability, adaptability, and other quality aspects of the overall system.[http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/proceedings/&toc=comp/proceedings/scam/2007/2880/00/2880toc.xml&DOI=10.1109/SCAM.2007.23] Despite these issues, embedded SQL remains a popular choice because it works relatively well for many smaller business applications that mainly do basic [http://en.wikipedia.org/wiki/Create,_read,_update_and_delete CRUD] operations, even in languages like Java that support object-oriented design.
 
Stored procedures are certainly more robust than embedded SQL. Since SQL code stays in the database, and application code stays in the application, it is easier to maintain. The [http://msdn.microsoft.com/en-us/library/ms978510.aspx .NET Data Access Architecture Guide] specifies other reasons that stored procedures are more robust:
 
# Stored procedures can be individually secured within the database. A client can be granted permissions to execute a stored procedure without having any permissions on the underlying tables.
# Stored procedures result in easier maintenance because it is generally easier to modify a stored procedure than it is to change a hard-coded SQL statement within a deployed component.
# Stored procedures add an extra level of abstraction from the underlying database schema. The client of the stored procedure is isolated from the implementation details of the stored procedure and from the underlying schema.
 
Finally, we turn to ORM and LINQ. Both of these mechanisms can be considered robust because they [http://www.hibernate.org/hib_docs/reference/en/html/session-configuration.html abstract the database connection details] and handling entirely away from the application developer. One can change databases from, say, Microsoft SQL Server to PostgreSQL with no change in application logic other than the editing of the ORM database configuration files. However, this increased robustness is a trade-off: it requires sacrifices in ease of programming for smaller projects and potential sacrifices in performance and efficiency, as will be discussed in the following section.


== Efficiency ==


The efficiency of the different mechanisms for database access depends on the application, implementation, and a variety of other factors. As such, we describe the differences in efficiency qualitatively rather than quantitatively.
 
At first glance, it may appear that embedded SQL is highly efficient. After all, embedded SQL queries are low-level strings passed directly to the SQL database, and SQL queries can be custom written to take advantage of special capabilities of the particular database. Similarly, others believe that the overhead of processing stored procedures results in a performance penalty. The real answer is not as cut and dried, especially when comparing against stored procedures. Here's why: [http://codebetter.com/blogs/karlseguin/archive/2008/01/02/foundations-of-programming-part-6-nhibernate.aspx]


# In many cases, you can get better performance by looping and filtering data within the database than at the Data Access Layer. This is because databases are intrinsically designed to do this, while application developers have to write their own code.
# Stored procedures can be used to batch common work together or retrieve multiple sets of data. This batching reduces network traffic and consolidates work.
# Many databases can also cache the execution plans of stored procedures, whereas the plan for an embedded SQL statement may have to be recalculated for each request.
# Databases tend to optimize better the closer they are to the bare metal. The easiest way to get performance out of your database is to do everything you can to take advantage of the platform you are running on, which means running queries as close to the database as possible.

ORM approaches face the opposite default criticism: many assume that their efficiency is lower than that of stored procedures or embedded SQL because of the overhead of performing O/R mapping. Again, the answer is not as clear cut. Let us examine one of the more popular ORM frameworks, Hibernate, for evidence. According to the Hibernate project, Hibernate implements an extremely high-concurrency architecture with no resource-contention issues (apart from the obvious one, contention for access to the database), and this architecture scales extremely well as concurrency increases in a cluster or on a single machine. [http://www.hibernate.org/15.html]

LINQ users can achieve optimum performance by leveraging the capabilities of their platform [http://www.singingeels.com/Articles/Improving_Performance_With_LINQ.aspx].  Knowing how to utilize the LINQ data context in conjunction with appropriately crafted queries permits the implementation to avoid unnecessary database activity, such as eliminating write-backs when no data has changed.

For those interested in competitive performance challenges, the [http://www.polepos.org/ PolePosition] benchmark suite provides rigorous comparison of ORM application-implementation pairs, with some test data archived at their site for specific implementations running well-defined test cases.

= Paradigm Mismatches =

ORM has experienced criticism, including the notion that ORM is [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx The Vietnam of Computer Science].

Other problems that ORM attempts to solve are those of paradigm mismatches. These are outlined as follows: [http://www.manning.com/bauer/]

* Problems relating to subtypes. Object-oriented languages implement inheritance through superclasses and subclasses. SQL tables, in contrast, do not generally implement any sort of table inheritance, and they additionally lack an obvious way to implement polymorphism. Mapping class inheritance from the object domain to the relational domain is one of the many goals of ORM.
* Problems relating to associations. In domain models, associations represent the relationship between entities. Object-oriented languages represent associations using object references, but in relational databases, an association is represented through foreign keys. Object-relational mapping bridges these two concepts.
* Problems relating to data navigation. There is also a key difference in the way data is accessed in object-oriented languages and in relational databases. In OOP, one walks the object network, navigating from one object to another. This is not an efficient way to retrieve data from a SQL database, where the goal is to minimize the number of SQL queries. Efficient access in SQL relies on set operations, like joining multiple tables of interest. This mismatch between the way objects are accessed in OOP versus a relational database is the single most common source of performance problems.
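The data navigation mismatch can be sketched as follows. This is an illustration only, written in Python with the standard <tt>sqlite3</tt> module; the <tt>authors</tt> and <tt>books</tt> tables, the <tt>CountingDb</tt> wrapper, and both functions are hypothetical. The first function walks the object network one author at a time, issuing one query per author, while the second retrieves the same data with a single join.

```python
import sqlite3

# Illustrative schema and data; every author here has at least one book.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY,
                        author_id INTEGER REFERENCES authors(id),
                        title TEXT);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO books VALUES (1, 1, 'A1'), (2, 1, 'A2'), (3, 2, 'B1'), (4, 3, 'C1');
""")


class CountingDb:
    """Thin wrapper that counts how many queries are sent to the database."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def titles_by_walking(db):
    # Object-style navigation: one query for the authors, then one per author.
    result = {}
    for author_id, name in db.query("SELECT id, name FROM authors ORDER BY id"):
        rows = db.query(
            "SELECT title FROM books WHERE author_id = ? ORDER BY id", (author_id,)
        )
        result[name] = [title for (title,) in rows]
    return result


def titles_by_join(db):
    # Set-oriented access: a single join returns all author/title pairs.
    # Note: an inner join would omit authors that have no books.
    result = {}
    rows = db.query(
        "SELECT a.name, b.title FROM authors a "
        "JOIN books b ON b.author_id = a.id ORDER BY a.id, b.id"
    )
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result


walk_db, join_db = CountingDb(conn), CountingDb(conn)
walked = titles_by_walking(walk_db)
joined = titles_by_join(join_db)
print(walk_db.queries, join_db.queries)
```

With three authors, the walking version issues four queries and the join version issues one; the gap grows linearly with the number of parent objects, which is why this pattern tends to dominate ORM performance discussions.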


= Implementations =


Languages and environments as diverse as .Net and PHP support ORM [http://en.wikipedia.org/wiki/List_of_object-relational_mapping_software]. We will focus our comparison on specific implementations of ORM in Java, Ruby on Rails, Microsoft .NET, PHP, ASP Classic, and JDBC.
Languages and environments as diverse as .Net and PHP support ORM [http://en.wikipedia.org/wiki/List_of_object-relational_mapping_software].
 
We will focus on specific implementations in Java, Ruby on Rails, Microsoft .NET, PHP, ASP Classic, and JDBC.


== ORM in Java ==
Line 113: Line 164:
In recent years, Java has experienced a paradigm shift from complex heavy-weight frameworks such as [http://java.sun.com/products/ejb/ Enterprise Java Beans] to more light-weight agile frameworks that rely instead on simple Plain Old Java Objects ([http://en.wikipedia.org/wiki/POJO POJOs]). This, in turn, has increased the popularity of ORM for Java developers.


Indeed, object-relational mapping is especially popular in the Java community compared, for example, to the .NET community. [http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/mags/co/&toc=comp/mags/co/2005/01/r1toc.xml&DOI=10.1109/MC.2005.22] Although a [http://java-source.net/open-source/persistence plethora of ORM frameworks exist] for Java, the most popular and widespread ORM layers today include Sun's [http://java.sun.com/jdo/ JDO] and the somewhat entrenched open source O/R mapping framework, [http://www.hibernate.org Hibernate].


We begin with Java Data Objects (JDO), which by itself is not a framework, but a specification. The API is a standard interface-based Java model abstraction of persistence, developed under the auspices of the [http://jcp.org/ Java Community Process]. Frameworks like [http://db.apache.org/jdo/ Apache JDO] then implement this specification. JDO aims to provide implementations for not only relational databases, but also object databases, and file systems.


Hibernate is another ORM implementation, and though it is open source, it is often considered "proprietary" because it does not directly implement the JDO specification or Java Community Process specifications. Still, Hibernate's momentum has resulted in it becoming a de facto standard in the Java industry, furthered by frameworks such as [http://www.springframework.org/ Spring] that use it as a building block.
http://www.kuro5hin.org/story/2006/3/11/1001/81803


== ORM in Ruby on Rails ==


http://wiki.rubyonrails.org/rails/pages/ActiveRecord
Ruby on Rails is an interesting example of the use of ORM: it implements the [http://en.wikipedia.org/wiki/Active_record_pattern ActiveRecord] pattern as [http://wiki.rubyonrails.org/rails/pages/ActiveRecord ActiveRecord in Rails].  In addition to mapping objects to table rows and providing persistence in the traditional ORM sense, Rails builds ORM support into the framework itself through ActiveRecord.  Rails allows database-oriented web services to be configured and deployed with a minimum of development, featuring the classic ''Model'', ''View'', ''Controller'' (M-V-C) architecture.  In this context, ''Rails ActiveRecord'' implements the Model layer of the M-V-C architecture.
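The active record pattern itself is independent of Rails. The following is a minimal sketch of the idea in Python with the standard <tt>sqlite3</tt> module: each object wraps one table row and carries its own <tt>save</tt> and <tt>find</tt> operations. It is not the Rails ActiveRecord API; the <tt>Customer</tt> class, the <tt>customers</tt> table, and the method names are assumptions made for illustration.

```python
import sqlite3


class Customer:
    """Active record sketch: one instance wraps one row of the customers table."""

    # Shared in-memory connection, for illustration only.
    conn = sqlite3.connect(":memory:")

    def __init__(self, name, city, id=None):
        self.name = name
        self.city = city
        self.id = id  # None until the row has been inserted

    def save(self):
        """Insert the row if it is new, otherwise update the existing row."""
        if self.id is None:
            cur = self.conn.execute(
                "INSERT INTO customers (name, city) VALUES (?, ?)",
                (self.name, self.city),
            )
            self.id = cur.lastrowid
        else:
            self.conn.execute(
                "UPDATE customers SET name = ?, city = ? WHERE id = ?",
                (self.name, self.city, self.id),
            )
        self.conn.commit()
        return self

    @classmethod
    def find(cls, id):
        """Load one row by primary key and wrap it in an object, or return None."""
        row = cls.conn.execute(
            "SELECT name, city, id FROM customers WHERE id = ?", (id,)
        ).fetchone()
        return cls(*row) if row else None


Customer.conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)

c = Customer("Ada", "London").save()  # INSERT
c.city = "Paris"
c.save()                              # UPDATE of the same row
reloaded = Customer.find(c.id)
print(reloaded.name, reloaded.city)
```

The application code never writes SQL outside the class, which is the property that lets a framework generate such classes from the table schema.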


== ORM in Microsoft .NET ==
Although ORM has achieved significant penetration of the Java world, there are now a variety of [http://en.wikipedia.org/wiki/Category:.Net_Object-relational_mapping_tools .NET implementations enabling ORM.]
An obvious choice for those already familiar with [http://en.wikipedia.org/wiki/Hibernate_(Java) Hibernate for Java] is the .NET port of ''Hibernate'', known as [http://www.hibernate.org/343.html NHibernate]. ''NHibernate'' supports a number of database platforms [http://www.hibernate.org/361.html] and permits developers to work in native .NET [http://www.hibernate.org/343.html], integrating Plain Old CLR Object ([http://en.wikipedia.org/wiki/POCO POCO]) data with the underlying database.  For a practical getting-started developer guide, read [http://www.developer.com/net/asp/article.php/3709346 Using NHibernate as an ORM Solution for .NET].


== LINQ in Microsoft .NET ==


http://msdn.microsoft.com/en-us/netframework/aa904594.aspx
As described previously, Microsoft's [http://msdn.microsoft.com/en-us/netframework/aa904594.aspx LINQ] provides ''language-integrated query''[http://msdn.microsoft.com/en-us/library/bb308959.aspx#linqoverview_topic1] to the .NET environment.  Supporting C# and Visual Basic, ''LINQ'' provides separate, optimized ''providers'' for access from the programming language level to relational data objects [http://msdn.microsoft.com/en-us/library/bb425822.aspx], XML documents[http://msdn.microsoft.com/en-us/library/bb308960.aspx], and SQL databases [http://msdn.microsoft.com/en-us/library/bb425822.aspx].


== Dynamic SQL in PHP, ASP Classic, and JDBC ==
Line 138: Line 189:
= Summary =


This article has introduced ORM from the user's perspective, motivating the discussion with the design patterns relevant to integrating a relational database with an object-oriented application. We have learned the basic advantages of ORM and the techniques by which it is employed. We have seen examples of ORM usage in several implementations, and we have explored the various approaches to ORM in languages and environments as diverse as Java with Hibernate and Microsoft LINQ.  In addition, we have compared the ease of programming, robustness, and efficiency of the different approaches.  ORM has become ubiquitous as object technology development marches forward in concert with advances in networks and distributed architectures.


= External Links =
This section summarizes important links from the article for those wishing to quickly access the most practical ''ORM'' resources.


* [http://en.wikipedia.org/wiki/Object-relational_mapping Wikipedia on ORM]
* [http://en.wikipedia.org/wiki/ActiveRecord_(Rails) Wikipedia on Active Record in Rails]
* [http://www.agiledata.org/essays/mappingObjects.html Object Mapping Tutorial]
* [http://www.acmqueue.org/modules.php?name=Content&pa=printer_friendly&pid=538&page=1 The ACM on ORM]
* [http://digitalcommons.macalester.edu/context/mathcs_honors/article/1006/type/native/viewcontent/ Object Relational Mapping Tutorial]
* [http://www.rgoarchitects.com/Files/ormappin.pdf When / Why to use ORM]
* [http://www.sqlteam.com/article/stored-procedures-an-overview Stored Procedures Overview]
* [http://www.hibernate.org Hibernate]
* [http://msdn.microsoft.com/en-us/netframework/aa904594.aspx Microsoft on LINQ]
* [http://en.wikipedia.org/wiki/Language_Integrated_Query Wikipedia on LINQ]
* [http://www.hibernate.org/343.html NHibernate]

Latest revision as of 01:10, 31 July 2008

RDB/OO Interactions

It would be good if OO programs could interact with OO databases, but alas, relational databases have a 99% market share. This has led to many attempts to access them from OO languages. Design patterns and implementations for doing this have been developed, starting with Crossing Chasms and extending to Rails' ActiveRecord [1][2]. Here, we investigate the various approaches for marrying OO programs to relational databases, comparing them in terms of ease of programming, robustness, and efficiency.

Introduction

This article explores object-relational mapping (ORM), a programming technique that bridges object-oriented languages and relational databases, and compares it against more traditional approaches to database programming such as stored procedures and dynamic SQL. Along the way, we examine basic design patterns for bridging this gap and evaluate several ORM frameworks found in popular programming languages.

One of the primary problems that object-relational mapping (ORM) attempts to solve is that of transparent object persistence, which allows an object to outlive the process that created it. The state of an object can be stored to disk, and an object with the same state can be re-created in the future. This object data is typically internally stored in a relational database using SQL.

Relational databases lie at the core of most modern enterprise applications, but their tabular representation of data is fundamentally different from the network of objects used in object-oriented applications. ORM allows us to interact with business objects directly in an object-oriented domain model, instead of having to work with rows and columns at the programming level. For further introduction to ORM and the surrounding issues, refer to [3][4][5][6].
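
To make the gap between objects and rows concrete, the following is a minimal Java sketch of the mapping code that an ORM tool writes on the developer's behalf. It is an illustration only: the Event class, its fields, and the use of a Map as a stand-in for a table row are all assumptions made for this example, not part of any framework discussed here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hand-written mapping between an object and a table row (illustrative only). */
public class EventRowMapper {

    /** A plain domain object; the field names are assumptions for this sketch. */
    public static class Event {
        public final long id;
        public final String title;
        public final String date;

        public Event(long id, String title, String date) {
            this.id = id;
            this.title = title;
            this.date = date;
        }
    }

    /** Flattens an Event into column-name/value pairs, as a row would hold them. */
    public static Map<String, Object> toRow(Event e) {
        Map<String, Object> row = new HashMap<>();
        row.put("id", e.id);
        row.put("title", e.title);
        row.put("date", e.date);
        return row;
    }

    /** Rebuilds an Event with the same state from a row. */
    public static Event fromRow(Map<String, Object> row) {
        return new Event((Long) row.get("id"),
                         (String) row.get("title"),
                         (String) row.get("date"));
    }

    public static void main(String[] args) {
        Event original = new Event(7L, "Graduation", "2008-07-31");
        Event restored = fromRow(toRow(original));
        System.out.println(restored.title + " on " + restored.date);
    }
}
```

Every persistent class needs a pair of methods like toRow and fromRow, and they must be kept in step with the schema by hand. Removing that repetitive work is the main attraction of an ORM layer.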

Design Patterns

Design patterns provide the theoretical underpinnings for the object-relational tools that we use in practice today. One of the first design pattern languages exploring the bridge between object-oriented domains and relational domains is Crossing Chasms.

  1. The Chasm in Crossing Chasms refers to the semantic and structural gap between data in an object-oriented application and the representation and access of that data in a database. As a pattern language, Crossing Chasms captures multiple proven methods for effectively integrating an object-oriented application with a relational database. These solutions are grouped into two broad categories: Static Patterns, useful in data modeling and database table design, and Architectural Patterns, useful primarily for designing overall systems to be as effective as possible in the dynamic interaction between data and objects. Originally developed to integrate Smalltalk applications with relational databases, and rooted in the theory of object orientation from Jacobson to Rumbaugh, the patterns have been abstracted and refined to capture the best practices of the experts in OO database integration, whatever the platform. The architectural patterns include Four Layer Architecture, Trim And Fit Client, and Phase In Tiers, which we summarize here.
  2. The Four Layer Architecture improves upon the classical 'Model-View-Controller' application design. It starts from the recognition that the 'M-V-C' architecture, while an evolutionary gain, is inadequate for some application environments, especially given the challenges of database integration and communication. Three of the layers of the 'Four Layer Architecture' closely follow the 'Model', 'View', and 'Controller'; the fourth, the 'Infrastructure' layer, handles the database tables and communication links.
  3. Closely related to the Four Layer Architecture is the Trim And Fit Client, the pattern describing how to partition the processing workload between the client and server. The factors driving the decision of how much work the client should do are very dynamic, given Moore's Law and increases in efficiency for communication links as well as memory, yet the 'Trim And Fit Client' pattern provides a timeless solution. Leveraging the architectural segments provided by the 'Four Layer Architecture', 'Trim And Fit Client' guides the designer to partition responsibility between the client and server at an arbitrary point either in the Application Models, or between the Application and Domain Model layers. This scheme gives the designer the flexibility to allocate the workload in an optimal way for the particular constraints in place, without being forced into a design situation with an imbalanced workload.
  4. Phase In Tiers presents the best practices to apply when designing a constantly expanding data center. Although it may be difficult for the architect to estimate the capacity required for the system, and nearly impossible for the developers to continuously migrate new applications and data to the growing network of databases, client applications, and servers, the 'Phase In Tiers' pattern helps the enterprise architect stay ahead of the challenge. It incorporates another pattern, the Three-Tier Distribution Architecture, which provides justification for the now commonplace three-tiered split of enterprise applications into client, server, and database. 'Phase In Tiers' shows how to implement new designs in stages, over time, without breaking existing applications or slowing the roll-out of new features. The key idea of 'Phase In Tiers' is to build new functionality into clients, and then to migrate application layers from the client to servers in stages.

Concrete Examples

To motivate our discussion, it is helpful to provide practical, concrete reference examples of object-relational frameworks alongside traditional dynamic SQL approaches. One can think of these examples as the "Hello World" of database interfacing.

Embedded (Ad-Hoc) SQL

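The traditional approach uses simple strings and API calls to connect to databases and return their results. In this PHP example, the title and date of every event with an ID greater than 5 are echoed:

 <?php
 // Connect to the MySQL server (the credentials are placeholders).
 $link = mysql_connect('localhost', 'mysql_user',
     'mysql_password');
 if (!$link) {
     die('Could not connect: ' . mysql_error());
 }
 echo 'Connected successfully';
 
 // Select the database to query ('mydb' is a hypothetical name).
 mysql_select_db('mydb', $link);
 
 // The SQL is an ordinary string embedded in the application code.
 $sql = "SELECT title, date
       FROM   events
       WHERE  id > 5";
 
 $result = mysql_query($sql, $link);
 if (!$result) {
     die('Query failed: ' . mysql_error());
 }
 
 // Echo each matching row.
 while ($row = mysql_fetch_assoc($result)) {
   echo $row["title"];
   echo $row["date"];
 }
 
 // Close the connection only after the results have been read.
 mysql_close($link);
 ?>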

Stored Procedures

Stored procedures encapsulate SQL queries and other SQL business logic within the database server itself. They have the advantage of decoupling the SQL syntax from the client application, but the calls to the stored procedures are still not transparent from within the client application. An example of a stored procedure in Microsoft SQL Server is as follows:

 CREATE PROCEDURE spCaliforniaAuthors
 AS
   SELECT * FROM authors
   WHERE state = 'CA'
   ORDER BY zip

This stored procedure is kept on the database. The application can then call the stored procedure spCaliforniaAuthors directly, without concerning itself with the lower-level implementation details.
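
As an illustration of the client side, the sketch below calls the procedure from Java through the standard JDBC CallableStatement interface. It assumes that the caller already holds an open java.sql.Connection to the SQL Server database; the class and method names are hypothetical, and the method only counts the returned rows so that it makes no assumptions about the columns of the authors table.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Calls the spCaliforniaAuthors procedure through JDBC (illustrative sketch). */
public class CaliforniaAuthorsCall {

    /** JDBC escape syntax for invoking a stored procedure by name. */
    public static final String CALL_SQL = "{call spCaliforniaAuthors}";

    /** Runs the procedure on an open connection and counts the rows it returns. */
    public static int countCaliforniaAuthors(Connection conn) throws SQLException {
        try (CallableStatement stmt = conn.prepareCall(CALL_SQL);
             ResultSet rs = stmt.executeQuery()) {
            int rows = 0;
            while (rs.next()) {
                rows++;
            }
            return rows;
        }
    }
}
```

The application names the procedure but contains no SELECT statement of its own, so the query can be changed on the server without touching this code. The call itself is still visible in the application, which is the lack of transparency noted above.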

The ORM Approach

Using object-relational mapping, the internal connection details are hidden, usually in external configuration files. The application developer needs little or no knowledge of SQL programming, and can manipulate and access database objects in an object-oriented domain. In this example using Java and Hibernate, the programmer creates a new event and then saves it to a database:

 Session session = HibernateUtil.getSessionFactory().
     getCurrentSession();
 session.beginTransaction();
 
 Event theEvent = new Event();
 theEvent.setTitle(title);
 theEvent.setDate(date);
 session.save(theEvent);
 
 session.getTransaction().commit();

Microsoft LINQ

Microsoft provides a hybrid approach to object-relational mapping, called LINQ, which interleaves database metadata with object-oriented programming to allow for language-integrated querying of relational databases.

 [Table(Name="Customers")]
 public class Customer
 {
    [Column(IsPrimaryKey=true)]
    public string CustomerID;
    [Column]
    public string City;
 }

Unlike other ORM mechanisms, which try to hide non-OO querying as much as possible or use external library calls for direct SQL queries, LINQ to SQL provides a run-time infrastructure for managing relational data as objects without losing the ability to query through SQL-like constructs:

 var q =
    from c in Customers
    where c.City == "London"
    select c;
 
 foreach (var cust in q)
    Console.WriteLine("id = {0}, City = {1}", 
          cust.CustomerID, cust.City);

These SQL-like constructs are integrated into the language itself, without the need for external library calls, and are available to non-database data structures as well, if they support the appropriate interfaces.

Comparison

With our examples in hand, we can now discuss the advantages and disadvantages of the various approaches to marrying object-oriented programming and relational databases.

Ease of Programming

Those who come from a database background find embedded SQL easy to use because queries are written in a SQL dialect that they already know. Embedded SQL also allows application developers to exploit the properties and functions of their particular database, at the expense of reducing the portability of the application. From a debugging perspective, the database logic is directly embedded within the source code, which avoids the yo-yo effect of switching between the application source and the database source. [7]

Stored procedures fare slightly better and have some benefits over embedded SQL. [8] They decouple the database logic from the application logic, so that database programmers can work on database logic while application programmers independently work on application logic. In addition, stored procedures can be changed without having to recompile the application, leading to more modular implementations.

Still, the use of stored procedures is not without criticism. Stored Procedures are written in big iron database languages like PL/SQL or T-SQL, which tend to be archaic in functionality. Stored procedures cannot be debugged easily because they cannot be debugged in the same process as the application. Finally, stored procedures cannot pass objects, and instead must pass primitive data types back and forth, often resulting in passing back and forth an overly large number of parameters to these procedures to accomplish a task. [9]

ORM frameworks have the advantage that developers work with objects, while the mapping tools provide data persistence transparently. For most applications, this provides a natural object-oriented way to access relational databases. When queries are required that cannot easily be expressed in an object-oriented domain, these ORM tools typically provide their own SQL-like query syntax, such as Hibernate's HQL, thereby abstracting the details of the database itself.[10]

While most ORM frameworks are easy to program from an application developer perspective, the initial configuration of such frameworks can be daunting due to the framework's generality and large number of configuration parameters. Configuration of ORM is difficult enough that there exist meta code generation tools such as XDoclet and Middlegen, which can be thought of as compilers in their own right.

Finally, let us look at LINQ, a variation of ORM which adds querying capabilities to the .NET Framework (starting with version 3.5) and provides operations similar to those of SQL. LINQ's major advantage is that it provides consistent domain modeling, while hiding the mundane code (LINQ-to-SQL) that often gets exposed either in the configuration of ORM or in embedded SQL. [11] LINQ also provides one source and query language for multiple data stores, such as relational data, XML data, and other .NET objects [12], and it is integrated within the syntax of the language. With respect to ease of programming, LINQ is a clear winner, but its model of embedding database metadata and querying directly within application logic may be met with resistance by those who prefer to separate database programmers from application developers. [13] LINQ is also tied to the .NET platform and its languages, such as C# and Visual Basic, and is not available to applications written for other environments.

Robustness

While embedded SQL is one of the easiest ways to connect to a database, it is also one of the least robust. Embedded SQL is commonly subject to SQL injection attacks, though these attacks can be mitigated by careful programming and by libraries that support parameterized queries. Embedded SQL also reduces the elegance of code by increasing coupling between the database and the application itself. In the PHP sample code provided, for instance, the code is tied to a MySQL database by the mysql_connect function.
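
The injection risk can be sketched in a few lines of Java. The example reuses the events query from the PHP sample; the class and method names are hypothetical, and titlesAfter assumes an open JDBC connection supplied by the caller. The first method shows how concatenation lets caller-supplied text become part of the SQL statement, while the second uses the standard JDBC PreparedStatement interface to pass the value as a bound parameter.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Contrasts string-concatenated SQL with a parameterized query (sketch). */
public class EventQueries {

    /** UNSAFE: splices caller-supplied text straight into the SQL string. */
    public static String concatenatedQuery(String minId) {
        return "SELECT title, date FROM events WHERE id > " + minId;
    }

    /** Safer: the SQL text is fixed; the value travels as a bound parameter. */
    public static final String PARAMETERIZED_QUERY =
        "SELECT title, date FROM events WHERE id > ?";

    /** Returns the titles of events whose id exceeds minId. */
    public static List<String> titlesAfter(Connection conn, int minId) throws SQLException {
        List<String> titles = new ArrayList<>();
        try (PreparedStatement stmt = conn.prepareStatement(PARAMETERIZED_QUERY)) {
            stmt.setInt(1, minId);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    titles.add(rs.getString("title"));
                }
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        // Intended input.
        System.out.println(concatenatedQuery("5"));
        // Hostile input: the extra condition becomes part of the statement.
        System.out.println(concatenatedQuery("5 OR 1=1"));
    }
}
```

With the hostile input, the concatenated statement ends in "id > 5 OR 1=1" and therefore matches every row. In the parameterized version the SQL text never changes, so the supplied value cannot alter the structure of the query.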

Since the SQL code is interspersed between lines of non-SQL code, it can be difficult to maintain when the queries need to change. Indeed, there is significant impact on the understandability, testability, adaptability, and other quality aspects of the overall system.[14] Despite these issues, embedded SQL remains a popular choice because it works relatively well for many smaller business applications that mainly perform basic CRUD operations, even in languages like Java that support object-oriented design.

Stored procedures are certainly more robust than embedded SQL. Since SQL code stays in the database, and application code stays in the application, it is easier to maintain. The .NET Data Access Architecture Guide specifies other reasons that stored procedures are more robust:

  1. Stored procedures can be individually secured within the database. A client can be granted permissions to execute a stored procedure without having any permissions on the underlying tables.
  2. Stored procedures result in easier maintenance because it is generally easier to modify a stored procedure than it is to change a hard-coded SQL statement within a deployed component.
  3. Stored procedures add an extra level of abstraction from the underlying database schema. The client of the stored procedure is isolated from the implementation details of the stored procedure and from the underlying schema.

Finally, we turn to ORM and LINQ. Both of these mechanisms can be considered robust because they abstract the database connection details and handling away from the application developer. One can change databases from, say, Microsoft SQL Server to PostgreSQL, ideally with no change in application logic other than editing the ORM database configuration files. However, this increased robustness is a trade-off: it requires sacrifices in ease of programming for smaller projects and potential sacrifices in performance and efficiency, as will be discussed in the following section.

Efficiency

The efficiency of the different mechanisms for database access depends on the application, the implementation, and a variety of other factors. As such, we describe the differences in efficiency qualitatively rather than quantitatively.

At first glance, it may appear that embedded SQL is highly efficient. After all, embedded SQL queries are low-level strings passed directly to the SQL database, and SQL queries can be custom written to take advantage of special capabilities of the particular database. Similarly, others believe that the overhead of processing stored procedures results in a performance penalty. The real answer is not as cut and dried, especially when comparing against stored procedures. Here's why: [15]

  1. In many cases, you can get better performance by looping and filtering data within the database than at the Data Access Layer. This is because databases are intrinsically designed to do this, while application developers have to write their own code.
  2. Stored procedures can be used to batch common work together or retrieve multiple sets of data. This batching reduces network traffic and consolidates work.
  3. Many databases can also cache the execution plans of stored procedures, whereas the plan for an embedded SQL statement may have to be recalculated for each request.
  4. Databases tend to optimize better the closer they are to the bare metal. The easiest way to get performance out of your database is to do everything you can to take advantage of the platform you are running on, which means running queries as close to the database as possible.

ORM approaches face the opposite default criticism: many assume that they are less efficient than stored procedures or embedded SQL because of the overhead of performing O/R mapping. Again, the answer is not as clear cut. Let us examine one of the more popular ORM frameworks, Hibernate, for evidence: Hibernate implements an extremely high-concurrency architecture with no resource-contention issues (apart from the obvious - contention for access to the database). This architecture scales extremely well as concurrency increases in a cluster or on a single machine. [16]

LINQ users can achieve optimum performance by leveraging the capabilities of their platform [17]. Knowing how to utilize the LINQ data context in conjunction with appropriately crafted queries permits the implementation to avoid unnecessary database activity, such as eliminating write-backs when no data has changed.
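
The idea of eliminating write-backs when no data has changed can be sketched independently of LINQ. The Java classes below are hypothetical and do not reproduce the LINQ data context API; they only illustrate change tracking, in which an object records whether it has been modified and the persistence layer skips the update when it has not.

```java
/** Change tracking that skips write-backs for unmodified objects (sketch). */
public class DirtyTrackingSketch {

    /** A customer that remembers whether its state has changed since the last save. */
    public static class TrackedCustomer {
        private String city;
        private boolean dirty = false;

        public TrackedCustomer(String city) { this.city = city; }

        public String getCity() { return city; }

        /** Marks the object dirty only if the new value differs from the old one. */
        public void setCity(String newCity) {
            if (!city.equals(newCity)) {
                city = newCity;
                dirty = true;
            }
        }

        public boolean isDirty() { return dirty; }

        void markClean() { dirty = false; }
    }

    /** Stand-in for a persistence layer; it counts simulated UPDATE statements. */
    public static class UnitOfWork {
        private int writes = 0;

        /** Writes the customer back only if it has actually changed. */
        public void submitChanges(TrackedCustomer c) {
            if (c.isDirty()) {
                writes++;        // a real implementation would issue an UPDATE here
                c.markClean();
            }
        }

        public int getWrites() { return writes; }
    }
}
```

Submitting an unchanged customer costs no database round trip in this sketch, and setting a field to the value it already holds does not count as a change.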

For those interested in competitive performance challenges, the PolePosition benchmark suite provides rigorous comparison of ORM application-implementation pairs, with some test data archived at their site for specific implementations running well-defined test cases.

Implementations

Languages and environments as diverse as .NET and PHP support ORM [18].

We will focus on specific implementations in Java, Ruby on Rails, Microsoft .NET, PHP, ASP Classic, and JDBC.

ORM in Java

In recent years, Java has experienced a paradigm shift from complex, heavyweight frameworks such as Enterprise JavaBeans to more lightweight, agile frameworks that rely instead on simple Plain Old Java Objects (POJOs). This, in turn, has increased the popularity of ORM among Java developers.

Indeed, object-relational mapping is especially popular in the Java community compared, for example, to the .NET community. [19] Although a plethora of ORM frameworks exist for Java, the most popular and widespread ORM layers today include Sun's JDO and the somewhat entrenched open source O/R mapping framework, Hibernate.

We begin with Java Data Objects (JDO), which by itself is not a framework, but a specification. The API is a standard interface-based Java model abstraction of persistence, developed under the auspices of the Java Community Process. Frameworks like Apache JDO then implement this specification. JDO aims to provide implementations for not only relational databases, but also object databases, and file systems.

Hibernate is another ORM implementation, and though it is open source, it is often considered "proprietary" because it does not directly implement the JDO specification or Java Community Process specifications. Still, Hibernate's momentum has resulted in it becoming a de facto standard in the Java industry, furthered by frameworks such as Spring that use it as a building block.

ORM in Ruby on Rails

Ruby on Rails is an interesting example of the use of ORM: its ActiveRecord library implements the Active Record design pattern. In addition to mapping objects to table rows and providing persistence in the traditional ORM sense, Ruby on Rails has framework-level support for ORM through ActiveRecord. Rails allows database-oriented web services to be configured and deployed with a minimum of development, featuring the classic Model, View, Controller (M-V-C) architecture. In this context, Rails ActiveRecord implements the Model layer of the M-V-C architecture.
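
The Active Record pattern itself can be sketched outside of Rails. In the Java example below, which is an illustration and not the Rails API, the domain object carries its own save and find methods. An in-memory map stands in for the database table, and all names are assumptions made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** The Active Record pattern with an in-memory table (illustrative only). */
public class ActiveRecordSketch {

    /** Stand-in for a database table: primary key -> row. Not a real database. */
    private static final Map<Integer, Map<String, Object>> EVENTS_TABLE = new HashMap<>();
    private static int nextId = 1;

    /** A domain object that carries its own persistence methods. */
    public static class Event {
        private Integer id;      // null until the object is first saved
        private String title;

        public Event(String title) { this.title = title; }

        public Integer getId() { return id; }
        public String getTitle() { return title; }
        public void setTitle(String title) { this.title = title; }

        /** Inserts a new row, or overwrites the existing row for this id. */
        public void save() {
            if (id == null) {
                id = nextId++;
            }
            Map<String, Object> row = new HashMap<>();
            row.put("title", title);
            EVENTS_TABLE.put(id, row);
        }

        /** Loads the object stored under the given key, or returns null if absent. */
        public static Event find(int id) {
            Map<String, Object> row = EVENTS_TABLE.get(id);
            if (row == null) {
                return null;
            }
            Event e = new Event((String) row.get("title"));
            e.id = id;
            return e;
        }
    }

    public static void main(String[] args) {
        Event e = new Event("Kickoff");
        e.save();
        System.out.println(Event.find(e.getId()).getTitle());
    }
}
```

Each instance corresponds to one row and each class to one table, which is the correspondence that ActiveRecord in Rails derives from naming conventions rather than from explicit mapping files.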

ORM in Microsoft .NET

ORM first achieved significant penetration in the Java world, but there are now a variety of .NET implementations as well.

An obvious choice for those already familiar with Hibernate for Java is its .NET port, NHibernate. NHibernate supports a number of database platforms [20] and permits developers to work in native .NET [21], integrating the data of Plain Old CLR Objects (POCOs) with the underlying database. For a practical getting-started developer guide, read Using NHibernate as an ORM Solution for .NET.

LINQ in Microsoft .NET

As described previously, Microsoft's LINQ provides language-integrated query[22] to the .NET environment. Supporting C# and Visual Basic, LINQ provides separate, optimized providers for access from the programming language level to relational data objects [23], XML documents[24], and SQL databases [25].

Dynamic SQL in PHP, ASP Classic, and JDBC

Despite the availability of object-relational libraries, dynamic SQL continues to be a popular development mechanism for interfacing with databases. Dynamic SQL is not an object-oriented methodology, but rather a "bare metal" programming approach where SQL strings are directly constructed through concatenation or other low-level mechanisms and then directly passed to the database. The results of such queries are themselves lower level objects like record sets or hash tables.

Summary

This article has introduced ORM from the user's perspective, motivating the discussion with the design patterns relevant to integrating a relational database with an object-oriented application. We have covered the basic advantages of ORM and the techniques by which it is employed, seen examples of ORM usage in several implementations, and explored approaches to ORM in languages and environments as diverse as Java with Hibernate and Microsoft LINQ. We have also compared the ease of programming, robustness, and efficiency of the different implementations. ORM has become ubiquitous as object technology development marches forward in concert with advances in networks and distributed architectures.

External Links

This section summarizes important links from the article for those wishing to quickly access the most practical ORM resources.