CSC/ECE 517 Summer 2008/wiki3 1 th: Difference between revisions

From Expertiza_Wiki
Jump to navigation Jump to search
Line 143: Line 143:
The efficiency of the different mechanisms for database access are largely depending on the application, implementation, and a variety of other factors. As such, we describe the differences in efficiency only qualitatively rather than quantitatively.
The efficiency of the different mechanisms for database access are largely depending on the application, implementation, and a variety of other factors. As such, we describe the differences in efficiency only qualitatively rather than quantitatively.


At first glance, it may appear that embedded SQL is highly efficient. After all, embedded SQL queries are low-level strings passed directly to the SQL database, and SQL queries can be custom written to take advantage of special capabilities of the particular database. While this may be true for rare queries, in practice, embedded SQL is not particularly efficient for several reasons.
At first glance, it may appear that embedded SQL is highly efficient. After all, embedded SQL queries are low-level strings passed directly to the SQL database, and SQL queries can be custom written to take advantage of special capabilities of the particular database. The real answer is not as cut and dry, especially when comparing against stored procedures.


= Impedence Mismatches =
= Impedence Mismatches =

Revision as of 22:22, 24 July 2008

RBP/OO Interactions

It would be good if OO programs could interact with OO databases, but alas, relational databases have a 99% market share. This has led to many attempts to access them from OO languages. Design patterns for doing this have been developed, starting with Crossing Chasms and extending to Rails' ActiveRecord [1][2]. Here, we investigate the various approaches for marrying OO programs to relational databases, comparing them in terms of ease of programming, robustness, and efficiency.

Introduction

This article explores object-relation mapping (ORM), a programming technique that bridges object-oriented languages against relational databases, and compares them against more traditional approaches to database programming such as stored procedures and dynamic SQL. In the process, we examine basic design patterns for bridging this gap, and evaluate several popular ORM frameworks found in popular program languages in the process.

One of the primary problems that object-relational mapping (ORM) attempts to solve is that of transparent object persistence, which allows an object to outlive the process that created it. The state of an object can be stored to disk, and an object with the same state can be re-created in the future. This object data is typically internally stored in a relational database using SQL.

Unfortunately, relational databases lie at the core of any modern Enterprise application, and such tabular representation of SQL data is fundamentally different than the network of objects used in object-oriented applications. ORM allows us to interact with business objects directly in an object-oriented domain model, instead of having to work with rows and columns at the programming level. For an introduction to ORM and the surrounding issues, refer to [3].

Design Patterns

Design patterns provide the theoretical underpinnings for the object-relational tools that we use in practice today. One of the first design patterns exploring the bridge between object-oriented domains and relational domains is Crossing Chasms.

TODO

  1. Crossing Chasms
  2. Four Layer Architecture
  3. Trim And Fit Client
  4. Phase In Tiers

Concentrate Examples

To motivate our discussions, it is helpful to provide a practical, concentrate reference example of object-relational frameworks against traditional dynamic SQL approaches. One can think of these examples as the "Hello World" of DB interfacing.

Embedded (Dynamic) SQL

The traditional approach uses simple strings and API calls to connect to databases and return their results. In this example in PHP, the title and date of the event with an ID greater than 5 is echoed:

 <?php
 $link = mysql_connect('localhost', 'mysql_user',
     'mysql_password');
 if (!$link) {
     die('Could not connect: ' . mysql_error());
 }
 echo 'Connected successfully';
 mysql_close($link);
 
 $sql = "SELECT title, date
       FROM   events
       WHERE  id > 5";
 
 $result = mysql_query($sql);
 
 while ($row = mysql_fetch_assoc($result)) {
   echo $row["title"];
   echo $row["date"];
 }
 ?>

Stored Procedures

Stored procedures are mechanisms that encapsulate SQL queries and other SQL business logic within the database server itself. It has the advantage of decoupling the SQL syntax from the client application, but calls to these stored procedures themselves are still not transparent from within the client application. An example of a SQL stored procedure in Microsoft SQL Server is as follows:

 CREATE PROCEDURE spCaliforniaAuthors
 AS
   SELECT * FROM authors
   WHERE state = 'CA'
   ORDER BY zip

This stored procedure is kept on the database. The application can then call the stored procedure spCaliforniaAuthors directly, without concerning themselves with the lower level implementation details.

The ORM Approach

Using object-relational mapping, the internal connection details are hidden, usually in external configuration files. The application developer need not have any knowledge of SQL programming, and can manipulate and access database objects in an object-oriented domain. In this example using Java and Hibernate, the programmer created a new event and then saves it to a database:

 Session session = HibernateUtil.getSessionFactory().
     getCurrentSession();
 session.beginTransaction();
 
 Event theEvent = new Event();
 theEvent.setTitle(title);
 theEvent.setDate(date);
 session.save(theEvent);
 
 session. getTransaction().commit();]

Microsoft LINQ

Microsoft provides a hybrid approach to object-relational mapping, called LINQ, which interleaves database metadata with object-oriented programming to allow for language-integrated querying of relational databases.

 [Table(Name="Customers")]
 public class Customer
 {
    [Column(IsPrimaryKey=true)]
    public string CustomerID;
    [Column]
    public string City;
 }

Unlike other ORM mechanics, which try to hide non-OO querying as much as possible, or use external library calls for direct SQL queries, LINQ to SQL provides a runtime infrastructure for managing relational data as objects without losing the ability to query through SQL-like constructs:

 var q =
    from c in Customers
    where c.City == "London"
    select c;
 
 foreach (var cust in q)
    Console.WriteLine("id = {0}, City = {1}", 
          cust.CustomerID, cust.City);

These SQL-like constructs are integrated into the language itself, without the need for external library calls, and are available to non-database data structures as well, if they support the appropriate interfaces.

Comparison

With our examples in hand, we can now discuss the advantages and disadvantages of the various approaches to marrying object-oriented programming and relational databases.

Ease of Programming

Those who come from a database background find embedded SQL easy to use because queries are written in a SQL language that they are already familiar with. The other advantage of embedded SQL is that it allows the application developers to exploit the properties and function of their particular database, at the expense of reducing portability of the application. From a debugging perspective, the database logic is directly embedded within the source code, which avoids the yo-yo effect between the application source and database source. [4].

Stored procedures fare slightly better and have some benefits over embedded SQL. [5] They decouple the database logic from the application logic, where database programmers can work on database logic and application programmers can independently work on application logic. In addition, stored procedures can be changed without having to recompile the application.

Still, the use of stored procedures is not without criticism. Stored Procedures are written in big iron database languages like PL/SQL or T-SQL, which tend to be archaic in functionality. Stored procedures cannot be debugged easily because they cannot be debugged in the same process as the application. Finally, stored procedures cannot pass objects, and instead must pass primitive data types back and forth, often resulting in passing back and forth and overly large number of parameters to these procedures to accomplish a task. [6]

ORM frameworks have the advantage in that developers to work with objects and the mapping tools that enable data persistency transparently. For most applications, this provides a natural object-oriented way to access relational databases. When queries are required that cannot easily be expressed in an object-oriented domain, these ORM tools typically provide standardized, proprietary SQL-like syntax, such as HSQL, thereby abstracting the details of the database itself.[7]

While most ORM frameworks are easy to program from an application developer perspective, the initial configuration of such frameworks can be daunting due to the framework's generality and large number of configuration parameters. Configuration of ORM is difficult enough that there exist meta code generation tools such as XDoclet and Middlegen, which can be thought of as compilers in their own right.

Finally, let us look at LINQ, a variation of ORM which adds querying capabilities to .NET 2.0 and provides operations similar to that of SQL. LINQ's major advantage is that it provides consistent domain modeling, while hiding the mundane code (LINQ-to-SQL) that often gets exposed either in configuration of ORM or in embedded SQL. [8] LINQ also provides one source and query language for multiple data stores, such as relational data, XML data, and other .NET objects. [9], and it is integrated within the syntax of the language. With respect to ease of programmer, LINQ is a clear winner with respect to ease of programing, but its model of embedding database metadata and querying directly within application logic may be met with resistance for individuals who prefer separating database programmers from application developers. [10] LINQ is also specifically developed for use in Microsoft Visual Studio, and does not work in other languages or environments.

Robustness

While embedded SQL is one of the easiest way to connect to a database, it is also one of the least robust. Embedded SQL is commonly subject to SQL injection attacks, though these attacks can be mitigated by libraries with careful programming. Embedded SQL also reduces the elegance of code by increasing coupling between the database and the application itself. In the PHP sample code provided, for instance, the code is tried to a MySQL database using the mysql_connect function.

Since the SQL code is interspersed between lines of non-SQL code. it can be difficult to maintain and modify if the SQL code needs to be modified. Indeed, there are significant impact on the understandability, testability, adaptability, and other quality aspects of the overall system.[11] Despite these issues, embedded SQL remains a popular choice because it works relatively well for many smaller business applications that mainly do basic CRUD operations, even in languages like Java that support object-oriented design.

Stored procedures are certainly more robust than embedded SQL. Since SQL code stays in the database, and application code stays in the application, it is easier to maintain. The .NET Data Access Architecture Guide specifies other reasons that stored procedures are more robust:

  1. Stored procedures can be individually secured within the database. A client can be granted permissions to execute a stored procedure without having any permissions on the underlying tables.
  2. Stored procedures result in easier maintenance because it is generally easier to modify a stored procedure than it is to change a hard-coded SQL statement within a deployed component.
  3. Stored procedures add an extra level of abstraction from the underlying database schema. The client of the stored procedure is isolated from the implementation details of the stored procedure and from the underlying schema.

Finally, we turn to ORM and LINQ. Both of these mechanisms can be considered robust because they abstract entirely the database connection details and handling from the application developer. One can change databases from say, Microsoft SQL Server to PostgreSQL, with no change in application logic other than the editing of the ORM database configuration files. However, this increased robustness is a trade off and requires sacrifices in ease of programming with respect to smaller projects and potential sacrifices in performance and efficiency, as will be discussed in the following section.

Efficiency

The efficiency of the different mechanisms for database access are largely depending on the application, implementation, and a variety of other factors. As such, we describe the differences in efficiency only qualitatively rather than quantitatively.

At first glance, it may appear that embedded SQL is highly efficient. After all, embedded SQL queries are low-level strings passed directly to the SQL database, and SQL queries can be custom written to take advantage of special capabilities of the particular database. The real answer is not as cut and dry, especially when comparing against stored procedures.

Impedence Mismatches

WE SHOULD PROBABLY JUST TAKE THIS SECTION OUT

ORM has experienced criticism, including the notion ORM is The Vietnam of Computer Science. There is so much here I don't know what to incorporate - please suggest.

Other problems that ORM attempts to solve are that of paradigm mismatches. These are outlined as follows: [12]

  • Problems relating to subtypes. Object-oriented languages implement inheritance through superclasses and subclasses. SQL tables, in contrast, do not generally implement any sort of table inheritance, and they additionally lack an obvious way to implement polymorphism. Mapping class inheritance from the object domain to the relational domain comprises one of the many goals of ORM.
  • Problems relating to associations. In domain models, associations represent the relationship between entities. Object-oriented languages represent associations using object references, but in relational databases, an association is represented through foreign keys. Object-relational mapping bridges these two concepts.
  • Problems relating to data navigation. There is also a key difference in the way data is accessed in object-oriented languages and in relational databases. In OOP, one walks the object network, navigating from one object to another. This is not an efficient way to retrieve data from a SQL database, where the goal is to minimize the number of SQL queries. Efficient access in SQL relies on set operations, like joining multiple tables of interest. This mismatch between the way objects are accessed in OOP versus a relational database is the single most common source of performance problems.

Implementations

Languages and environments as diverse as .Net and PHP support ORM [13]. We will focus our comparison on specific implementations of ORM in Java, Ruby on Rails, Microsoft .NET, PHP, ASP Classic, and JDBC.

ORM in Java

In recent years, Java has experienced a paradigm shift from complex heavy-weight frameworks such as Enterprise Java Beans to more light-weight agile frameworks that rely instead of simple Plain Old Java Objects (POJOs). This in turn, has increased the popularity of ORM for Java developers.

Indeed, Object-relational mapping is especially popular in the Java community, compared. for example to .NET developers. [14] Although a plethora of ORM frameworks exist for Java, among the most popular and widespread ORM layers today include Sun's JDO and the somewhat entrenched open source O/R mapping framework, Hibernate.

We begin with Java Data Objects (JDO), which by itself is not a framework, but a specification. The API is a standard interface-based Java model abstraction of persistence, developed under the auspices of the Java Community Process. Frameworks like Apache JDO then implement this specification. JDO aims to provide implementations for not only relational databases, but also object databases, and file systems.

Hibernate is another ORM implementation, and though it is open source, it is often considered "proprietary" because it does not directly implement the JDO specification or Java Community Process specifications. Still, Hibernate's momentum has resulted in it becoming a de facto standard in the Java industry, furthered by frameworks such as Spring that use it as a building block.


http://www.kuro5hin.org/story/2006/3/11/1001/81803

ORM in Ruby on Rails

Ruby on Rails is an interesting example of the use of ORM.

http://wiki.rubyonrails.org/rails/pages/ActiveRecord

ORM in Microsoft .NET

LINQ in Microsoft .NET

http://msdn.microsoft.com/en-us/netframework/aa904594.aspx

Dynamic SQL in PHP, ASP Classic, and JDBC

Despite the availability of object-relational libraries, dynamic SQL continues to be a popular development mechanism for interfacing with databases. Dynamic SQL is not an object-oriented methodology, but rather a "bare metal" programming approach where SQL strings are directly constructed through concatenation or other low-level mechanisms and then directly passed to the database. The results of such queries are themselves lower level objects like record sets or hash tables.

Summary

Links

http://www.google.com/search?hl=en&q=ORM&btnG=Google+Search

http://en.wikipedia.org/wiki/Object-relational_mapping

http://en.wikipedia.org/wiki/Object-relational_database

http://www.aspfree.com/c/a/Database/Introduction-to-RDBMS-OODBMS-and-ORDBMS/

http://developers.slashdot.org/article.pl?sid=03/09/23/2016224&threshold=4&mode=nested

http://en.wikipedia.org/wiki/Object-relational_impedance_mismatch

http://www.google.com/search?hl=en&q=RDB+OO+patterns+faq&btnG=Search

http://ootips.org/persistent-objects.html

http://dtemplatelib.sourceforge.net/

http://soci.sourceforge.net/

http://trac.butterfat.net/public/StactiveRecord

http://www.metro-design-dev.com/modeler_portal.htm

http://www.ksc.com/articles/patternlanguage.htm

http://en.wikipedia.org/wiki/ActiveRecord_%28Rails%29

http://www.agiledata.org/essays/mappingObjects.html


http://www.service-architecture.com/object-relational-mapping/articles/transparent_persistence.html

http://www.service-architecture.com/object-oriented-databases/articles/odbms_faq.html

http://www.google.com/search?hl=en&q=object+oriented+database+design+pattern+%28faq+OR+tutorial%29&btnG=Google+Search

http://portal.acm.org/citation.cfm?id=253810

http://www.pearsonhighered.com/educator/academic/course/0,3119,604655,00.html

http://www.cmcrossroads.com/bradapp/links/oo-links.html

http://www.edcomp.com/results/Relational+and+Object+oriented+Database+Management+System+.html

http://cbbrowne.com/info/rdbms.html

http://en.wikipedia.org/wiki/Object-oriented_programming

http://www.sei.cmu.edu/str/descriptions/oodatabase_body.html

http://www.service-architecture.com/object-oriented-databases//articles/object-relational_mapping.html

http://www.service-architecture.com/object-oriented-databases/

http://www.arrakis.es/~devis/oo.html

http://www.google.com/search?hl=en&q=object+oriented+database+site%3Anist.gov&btnG=Google+Search

http://csrc.nist.gov/nissc/1996/papers/NISSC96/paper072_073_074/SCO_.PDF

http://madgeek.com/Articles/ORMapping/EN/mapping.htm

http://en.wikipedia.org/wiki/Object_database

http://www.polepos.org/