CSC/ECE 517 Summer 2008/wiki3 1 th


RDB/OO Interactions

It would be good if OO programs could interact with OO databases, but alas, relational databases have a 99% market share. This has led to many attempts to access them from OO languages. Design patterns and implementations for doing this have been developed, starting with Crossing Chasms and extending to Rails' ActiveRecord [1][2]. Here, we investigate the various approaches for marrying OO programs to relational databases, comparing them in terms of ease of programming, robustness, and efficiency.

Introduction

This article explores object-relational mapping (ORM), a programming technique that bridges object-oriented languages and relational databases, and compares it with more traditional approaches to database programming such as stored procedures and dynamic SQL. Along the way, we examine basic design patterns for bridging this gap and evaluate several ORM frameworks available for popular programming languages.

One of the primary problems that object-relational mapping (ORM) attempts to solve is that of transparent object persistence, which allows an object to outlive the process that created it. The state of an object can be stored to disk, and an object with the same state can be re-created in the future. This object data is typically internally stored in a relational database using SQL.

Relational databases lie at the core of most modern enterprise applications, but the tabular representation of SQL data is fundamentally different from the network of objects used in object-oriented applications. ORM allows us to interact with business objects directly in an object-oriented domain model, instead of having to work with rows and columns at the programming level. For further introduction to ORM and the surrounding issues, refer to [3][4][5][6].

Design Patterns

Design patterns provide the theoretical underpinnings for the object-relational tools that we use in practice today. One of the first design pattern languages exploring the bridge between object-oriented domains and relational domains is Crossing Chasms.

  1. The Chasm in Crossing Chasms refers to the semantic and structural gap between data in an object-oriented application and the representation and access of that data in a database. As a Pattern Language, Crossing Chasms captures multiple proven methods for effectively integrating an object-oriented application and a relational database. These solutions are grouped into two broad categories: Static Patterns, useful in data modeling and database table design, and Architectural Patterns, useful primarily for designing overall systems to be as effective as possible in the dynamic interaction between data and objects. Originally developed to integrate Smalltalk applications with relational databases, and rooted in the theory of Object Orientation from Jacobson to Rumbaugh, the patterns have been abstracted and refined to capture the best practices of the experts in OO database integration, whatever the platform. The architectural patterns include Four Layer Architecture, Trim And Fit Client, and Phase In Tiers, which we summarize here.
  2. The Four Layer Architecture improves upon the classical 'Model-View-Controller' application design by first recognizing that the 'M-V-C' architecture, while an evolutionary gain, is inadequate to support every possible application environment, especially the challenges associated with database integration and communication. Three of the layers of the 'Four Layer Architecture' closely follow the 'Model', 'View', and 'Controller', with the addition of the 'Infrastructure' layer to handle the database tables and communication links.
  3. Closely related to the Four Layer Architecture is the Trim And Fit Client, the pattern describing how to partition the processing workload between the client and server. The factors driving the decision of how much work the client should do are very dynamic, given Moore's Law and increases in efficiency for communication links as well as memory, yet the 'Trim And Fit Client' pattern provides a timeless solution. Leveraging the architectural segments provided by the 'Four Layer Architecture', 'Trim And Fit Client' guides the designer to partition responsibility between the client and server at an arbitrary point either in the Application Models, or between the Application and Domain Model layers. This scheme gives the designer the flexibility to allocate the workload in an optimal way for the particular constraints in place, without being forced into a design situation with an imbalanced workload.
  4. Phase In Tiers presents the best practices to follow when designing a constantly expanding data center. Although it may be difficult for the architect to estimate the capacity required for the system, and nearly impossible for the developers to continuously migrate new applications and data to the growing network of databases, client applications, and servers, the 'Phase In Tiers' pattern helps the enterprise architect stay ahead of the challenge. Incorporating another pattern, the Three-Tier Distribution Architecture, which provides justification for the now commonplace three-tiered enterprise application-platform split of client-server-database, 'Phase In Tiers' shows how to implement new designs in stages, over time, without breaking existing applications or slowing the roll-out of new features. The key idea of 'Phase In Tiers' is to build new functionality into clients, and then execute a staged migration of application layers from the client to servers.
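The layering described in the Four Layer Architecture can be sketched roughly as follows. This is a minimal illustration under our own assumptions, not code from the pattern language: the class names (Event, EventStore, InMemoryEventStore, EventApplication) are hypothetical, the split into View, Application Model, Domain Model, and Infrastructure is our reading of the layers named above, and an in-memory list stands in for the relational database.

```java
import java.util.ArrayList;
import java.util.List;

public class FourLayerSketch {

    // Domain Model layer: a business object that knows nothing about storage.
    static final class Event {
        final String title;

        Event(String title) {
            this.title = title;
        }
    }

    // Infrastructure layer: hides tables and communication links behind an interface.
    interface EventStore {
        void save(Event event);

        List<Event> findAll();
    }

    // Hypothetical stand-in for a relational database; a real implementation would issue SQL.
    static final class InMemoryEventStore implements EventStore {
        private final List<Event> rows = new ArrayList<>();

        public void save(Event event) {
            rows.add(event);
        }

        public List<Event> findAll() {
            return new ArrayList<>(rows);
        }
    }

    // Application Model layer: coordinates use cases between the view and the domain.
    static final class EventApplication {
        private final EventStore store;

        EventApplication(EventStore store) {
            this.store = store;
        }

        void schedule(String title) {
            store.save(new Event(title));
        }

        List<String> titles() {
            List<String> titles = new ArrayList<>();
            for (Event event : store.findAll()) {
                titles.add(event.title);
            }
            return titles;
        }
    }

    // View layer: presentation only.
    static String render(List<String> titles) {
        return String.join(", ", titles);
    }

    public static void main(String[] args) {
        EventApplication app = new EventApplication(new InMemoryEventStore());
        app.schedule("Launch");
        app.schedule("Review");
        System.out.println(render(app.titles()));
    }
}
```

Because only the infrastructure layer touches storage, the in-memory store could in principle be swapped for a database-backed one without changing the other three layers.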

Concrete Examples

To motivate our discussion, it is helpful to provide practical, concrete reference examples of object-relational frameworks alongside traditional dynamic SQL approaches. One can think of these examples as the "Hello World" of DB interfacing.

Embedded (Ad-Hoc) SQL

The traditional approach uses simple strings and API calls to connect to databases and return their results. In this PHP example, the title and date of every event with an ID greater than 5 are echoed:

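 <?php
 // Open a connection to the MySQL server.
 // (Database selection via mysql_select_db is omitted here.)
 $link = mysql_connect('localhost', 'mysql_user',
     'mysql_password');
 if (!$link) {
     die('Could not connect: ' . mysql_error());
 }
 echo 'Connected successfully';
 
 // The query is an ordinary string embedded in the program.
 $sql = "SELECT title, date
       FROM   events
       WHERE  id > 5";
 
 $result = mysql_query($sql, $link);
 if (!$result) {
     die('Query failed: ' . mysql_error());
 }
 
 // Each row comes back as an associative array keyed by column name.
 while ($row = mysql_fetch_assoc($result)) {
   echo $row["title"];
   echo $row["date"];
 }
 
 // Close the connection only after the results have been read.
 mysql_close($link);
 ?>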

Stored Procedures

Stored procedures are mechanisms that encapsulate SQL queries and other SQL business logic within the database server itself. They have the advantage of decoupling the SQL syntax from the client application, but calls to these stored procedures are still not transparent from within the client application. An example of a SQL stored procedure in Microsoft SQL Server is as follows:

 CREATE PROCEDURE spCaliforniaAuthors
 AS
   SELECT * FROM authors
   WHERE state = 'CA'
   ORDER BY zip

This stored procedure is kept on the database. The application can then call the stored procedure spCaliforniaAuthors directly, without concerning itself with the lower-level implementation details.

The ORM Approach

Using object-relational mapping, the internal connection details are hidden, usually in external configuration files. The application developer needs little or no knowledge of SQL programming, and can manipulate and access database objects in an object-oriented domain. In this example using Java and Hibernate, the programmer creates a new event and then saves it to a database:

 Session session = HibernateUtil.getSessionFactory().
     getCurrentSession();
 session.beginTransaction();
 
 Event theEvent = new Event();
 theEvent.setTitle(title);
 theEvent.setDate(date);
 session.save(theEvent);
 
 session.getTransaction().commit();

Microsoft LINQ

Microsoft provides a hybrid approach to object-relational mapping, called LINQ, which interleaves database metadata with object-oriented programming to allow for language-integrated querying of relational databases.

 [Table(Name="Customers")]
 public class Customer
 {
    [Column(IsPrimaryKey=true)]
    public string CustomerID;
    [Column]
    public string City;
 }

Unlike other ORM mechanisms, which try to hide non-OO querying as much as possible or use external library calls for direct SQL queries, LINQ to SQL provides a run-time infrastructure for managing relational data as objects without losing the ability to query through SQL-like constructs:

 var q =
    from c in Customers
    where c.City == "London"
    select c;
 
 foreach (var cust in q)
    Console.WriteLine("id = {0}, City = {1}", 
          cust.CustomerID, cust.City);

These SQL-like constructs are integrated into the language itself, without the need for external library calls, and are available to non-database data structures as well, if they support the appropriate interfaces.

Comparison

With our examples in hand, we can now discuss the advantages and disadvantages of the various approaches to marrying object-oriented programming and relational databases.

Ease of Programming

Those who come from a database background find embedded SQL easy to use because queries are written in the SQL that they are already familiar with. The other advantage of embedded SQL is that it allows application developers to exploit the properties and functions of their particular database, at the expense of reducing the portability of the application. From a debugging perspective, the database logic is directly embedded within the source code, which avoids the yo-yo effect between the application source and database source. [7]

Stored procedures fare slightly better and have some benefits over embedded SQL. [8] They decouple the database logic from the application logic, where database programmers can work on database logic and application programmers can independently work on application logic. In addition, stored procedures can be changed without having to recompile the application, leading to more modular implementations.

Still, the use of stored procedures is not without criticism. Stored procedures are written in big iron database languages like PL/SQL or T-SQL, which tend to be archaic in functionality. They are hard to debug because they do not run in the same process as the application. Finally, stored procedures cannot pass objects and must instead pass primitive data types back and forth, which often means passing an overly large number of parameters to accomplish a task. [9]

ORM frameworks have the advantage that developers work with objects, while the mapping tools enable data persistence transparently. For most applications, this provides a natural object-oriented way to access relational databases. When queries are required that cannot easily be expressed in an object-oriented domain, these ORM tools typically provide their own SQL-like query syntax, such as Hibernate's HQL, thereby abstracting the details of the database itself. [10]
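To make the idea of a mapping layer concrete, here is a minimal sketch of the two conversions such a tool performs: object to row, and row back to object. It is an illustration under our own assumptions rather than how Hibernate or any particular framework is implemented. The names RowMapperSketch, toRow, and fromRow are hypothetical, and a Map of column names to values stands in for a database row.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RowMapperSketch {

    // Plain domain object; it has no knowledge of tables or SQL.
    static final class Event {
        final int id;
        final String title;
        final String date;

        Event(int id, String title, String date) {
            this.id = id;
            this.title = title;
            this.date = date;
        }
    }

    // Object -> row: each field becomes a named column value.
    static Map<String, Object> toRow(Event event) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", event.id);
        row.put("title", event.title);
        row.put("date", event.date);
        return row;
    }

    // Row -> object: column values are read back into a new instance.
    static Event fromRow(Map<String, Object> row) {
        return new Event(
                (Integer) row.get("id"),
                (String) row.get("title"),
                (String) row.get("date"));
    }

    public static void main(String[] args) {
        Event original = new Event(6, "Launch", "2008-07-01");
        Map<String, Object> row = toRow(original);
        Event restored = fromRow(row);
        System.out.println(row);
        System.out.println(restored.title);
    }
}
```

A real framework derives these conversions from configuration or annotations instead of hand-written methods, which is where much of the configuration burden discussed next comes from.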

While most ORM frameworks are easy to program from an application developer perspective, the initial configuration of such frameworks can be daunting due to the framework's generality and large number of configuration parameters. Configuration of ORM is difficult enough that there exist meta code generation tools such as XDoclet and Middlegen, which can be thought of as compilers in their own right.

Finally, let us look at LINQ, a variation of ORM that adds querying capabilities to .NET 3.5 and provides operations similar to those of SQL. LINQ's major advantage is that it provides consistent domain modeling while hiding the mundane code (LINQ to SQL) that often gets exposed either in the configuration of ORM or in embedded SQL. [11] LINQ also provides one query language for multiple data stores, such as relational data, XML data, and other .NET objects [12], and it is integrated within the syntax of the language. With respect to ease of programming, LINQ is a clear winner, but its model of embedding database metadata and querying directly within application logic may be met with resistance by individuals who prefer separating database programmers from application developers. [13] LINQ is also tied to the .NET languages, such as C# and Visual Basic, and is not available to languages outside that platform.

Robustness

While embedded SQL is one of the easiest ways to connect to a database, it is also one of the least robust. Embedded SQL is commonly subject to SQL injection attacks, though these attacks can be mitigated with parameterized queries, escaping libraries, and careful programming. Embedded SQL also reduces the elegance of code by increasing coupling between the database and the application itself. In the PHP sample code provided, for instance, the code is tied to a MySQL database through the mysql_connect function.
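The injection risk mentioned above comes from building the statement text out of user input. The following sketch, which runs without a database, contrasts a concatenated query with a parameterized one. The class and method names are hypothetical, and the BoundQuery holder is only a stand-in for what a real driver does; in JDBC the corresponding mechanism is Connection.prepareStatement with PreparedStatement.setString.

```java
import java.util.Collections;
import java.util.List;

public class InjectionSketch {

    // Unsafe: the user's input becomes part of the SQL text itself.
    static String concatenated(String title) {
        return "SELECT title, date FROM events WHERE title = '" + title + "'";
    }

    // Hypothetical holder for a statement whose values travel separately from its text.
    static final class BoundQuery {
        final String sql;
        final List<String> params;

        BoundQuery(String sql, List<String> params) {
            this.sql = sql;
            this.params = params;
        }
    }

    // Safer: the SQL text is fixed; the input is carried only as a parameter value.
    static BoundQuery parameterized(String title) {
        return new BoundQuery(
                "SELECT title, date FROM events WHERE title = ?",
                Collections.singletonList(title));
    }

    public static void main(String[] args) {
        String hostile = "x' OR '1'='1";

        // The quote in the input closes the string literal and adds a condition
        // that is always true, so the WHERE clause no longer filters anything.
        System.out.println(concatenated(hostile));

        // The statement text is unchanged no matter what the input contains.
        BoundQuery query = parameterized(hostile);
        System.out.println(query.sql);
        System.out.println(query.params);
    }
}
```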

Since the SQL code is interspersed with non-SQL code, it can be difficult to maintain and modify. Indeed, there is significant impact on the understandability, testability, adaptability, and other quality aspects of the overall system. [14] Despite these issues, embedded SQL remains a popular choice because it works relatively well for many smaller business applications that mainly perform basic CRUD operations, even in languages like Java that support object-oriented design.

Stored procedures are certainly more robust than embedded SQL. Since SQL code stays in the database, and application code stays in the application, it is easier to maintain. The .NET Data Access Architecture Guide specifies other reasons that stored procedures are more robust:

  1. Stored procedures can be individually secured within the database. A client can be granted permissions to execute a stored procedure without having any permissions on the underlying tables.
  2. Stored procedures result in easier maintenance because it is generally easier to modify a stored procedure than it is to change a hard-coded SQL statement within a deployed component.
  3. Stored procedures add an extra level of abstraction from the underlying database schema. The client of the stored procedure is isolated from the implementation details of the stored procedure and from the underlying schema.

Finally, we turn to ORM and LINQ. Both of these mechanisms can be considered robust because they abstract the database connection details and handling entirely from the application developer. One can change databases from, say, Microsoft SQL Server to PostgreSQL with little or no change in application logic other than editing the ORM database configuration files. However, this increased robustness is a trade-off: it requires sacrifices in ease of programming for smaller projects and potential sacrifices in performance and efficiency, as discussed in the following section.

Efficiency

The efficiency of the different mechanisms for database access depends on the application, the implementation, and a variety of other factors. As such, we describe the differences in efficiency qualitatively rather than quantitatively.

At first glance, it may appear that embedded SQL is highly efficient. After all, embedded SQL queries are low-level strings passed directly to the SQL database, and SQL queries can be custom written to take advantage of special capabilities of the particular database. Similarly, others believe that the overhead of processing stored procedures results in a performance penalty. The real answer is not as cut and dried, especially when comparing against stored procedures. Here's why: [15]

  1. In many cases, you can get better performance by looping and filtering data within the database than at the Data Access Layer. This is because databases are intrinsically designed to do this, while application developers have to write their own code.
  2. Stored procedures can be used to batch common work together or retrieve multiple sets of data. This batching reduces network traffic and consolidates work.
  3. Many databases can also cache the execution plans of stored procedures, whereas the plan for an embedded SQL statement may have to be recalculated on each request.
  4. Databases tend to optimize better the closer they are to the bare metal. The easiest way to get performance out of your database is to do everything you can to take advantage of the platform you are running on, which means running queries as close to the database as possible.

ORM approaches face the opposite default criticism: many assume that their efficiency is lower than that of stored procedures or embedded SQL because of the overhead of performing O/R mapping. Again, the answer is not as clear-cut. Let us examine one of the more popular ORM frameworks, Hibernate, for evidence. According to [16], Hibernate implements an extremely high-concurrency architecture with no resource-contention issues (apart from the obvious contention for access to the database), and this architecture scales extremely well as concurrency increases in a cluster or on a single machine.

LINQ users can achieve optimum performance by leveraging the capabilities of their platform [17]. Knowing how to utilize the LINQ data context in conjunction with appropriately crafted queries permits the implementation to avoid unnecessary database activity, such as eliminating write-backs when no data has changed.

For those interested in competitive performance challenges, the PolePosition benchmark suite provides rigorous comparison of ORM application-implementation pairs, with some test data archived at their site for specific implementations running well-defined test cases.

Implementations

Languages and environments as diverse as .NET and PHP support ORM [18].

We will focus on specific implementations in Java, Ruby on Rails, Microsoft .NET, PHP, ASP Classic, and JDBC.

ORM in Java

In recent years, Java has experienced a paradigm shift from complex heavy-weight frameworks such as Enterprise Java Beans to more light-weight agile frameworks that rely instead on simple Plain Old Java Objects (POJOs). This, in turn, has increased the popularity of ORM among Java developers.

Indeed, object-relational mapping is especially popular in the Java community compared with, for example, the .NET community. [19] Although a plethora of ORM frameworks exist for Java, the most popular and widespread ORM layers today include Sun's JDO and the somewhat entrenched open source O/R mapping framework, Hibernate.

We begin with Java Data Objects (JDO), which by itself is not a framework, but a specification. The API is a standard interface-based Java model abstraction of persistence, developed under the auspices of the Java Community Process. Frameworks like Apache JDO then implement this specification. JDO aims to provide implementations for not only relational databases, but also object databases, and file systems.

Hibernate is another ORM implementation, and though it is open source, it is often considered "proprietary" because it does not directly implement the JDO specification or Java Community Process specifications. Still, Hibernate's momentum has resulted in it becoming a de facto standard in the Java industry, furthered by frameworks such as Spring that use it as a building block.

ORM in Ruby on Rails

Ruby on Rails is an interesting example of the use of ORM: it implements the Active Record design pattern in its ActiveRecord library. In addition to mapping objects to table rows and providing persistence in the traditional ORM sense, Rails builds ORM support into the framework itself through ActiveRecord. Rails allows database-oriented web services to be configured and deployed with a minimum of development, featuring the classic Model, View, Controller (M-V-C) architecture. In this context, Rails ActiveRecord implements the Model layer of the M-V-C architecture.
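The essence of the Active Record pattern is that an object carries its own persistence operations, so each instance corresponds to one table row. The sketch below illustrates the pattern in Java, to match the earlier Hibernate example, rather than in Ruby. It is not Rails code: the class names are hypothetical and an in-memory map stands in for the events table.

```java
import java.util.HashMap;
import java.util.Map;

public class ActiveRecordSketch {

    // Hypothetical stand-in for an "events" table: primary key -> title column.
    private static final Map<Integer, String> EVENTS_TABLE = new HashMap<>();
    private static int nextId = 1;

    static final class Event {
        Integer id;     // null until the record has been saved
        String title;

        Event(String title) {
            this.title = title;
        }

        // The object persists itself: insert on first save, update afterwards.
        void save() {
            if (id == null) {
                id = nextId++;
            }
            EVENTS_TABLE.put(id, title);
        }

        // Class-level finder that rebuilds an object from its stored row.
        static Event find(int id) {
            String title = EVENTS_TABLE.get(id);
            if (title == null) {
                return null;
            }
            Event event = new Event(title);
            event.id = id;
            return event;
        }
    }

    public static void main(String[] args) {
        Event event = new Event("Launch");
        event.save();                    // first save inserts a row
        event.title = "Launch party";
        event.save();                    // second save updates the same row
        System.out.println(Event.find(event.id).title);
    }
}
```

Contrast this with a separate mapper or session object, as in the Hibernate example earlier, where the domain object itself has no save method.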

ORM in Microsoft .NET

ORM first achieved significant penetration in the Java world, but there are now a variety of .NET implementations enabling ORM.

An obvious choice for those already familiar with Hibernate for Java is the .NET port of Hibernate, known as NHibernate. NHibernate supports a number of database platforms [20] and permits developers to work in native .NET [21], integrating Plain Old CLR Objects (POCOs) with the underlying database. For a practical getting-started developer guide, read Using NHibernate as an ORM Solution for .NET.

LINQ in Microsoft .NET

As described previously, Microsoft's LINQ provides language-integrated query[22] to the .NET environment. Supporting C# and Visual Basic, LINQ provides separate, optimized providers for access from the programming language level to relational data objects [23], XML documents[24], and SQL databases [25].

Dynamic SQL in PHP, ASP Classic, and JDBC

Despite the availability of object-relational libraries, dynamic SQL continues to be a popular development mechanism for interfacing with databases. Dynamic SQL is not an object-oriented methodology, but rather a "bare metal" programming approach where SQL strings are directly constructed through concatenation or other low-level mechanisms and then directly passed to the database. The results of such queries are themselves lower level objects like record sets or hash tables.

Summary

This article has introduced ORM from the user's perspective, motivating the discussion with the design patterns relevant to integrating a relational database with an object-oriented application. We have covered the basic advantages of ORM and the techniques by which it is employed. We have seen examples of ORM usage in several implementations, and we have explored the various approaches to ORM in languages and environments as diverse as Java with Hibernate and Microsoft LINQ. We have also compared the ease of programming, robustness, and efficiency of the different approaches. ORM has become ubiquitous as object technology development marches forward in concert with advances in networks and distributed architectures.
