CSC/ECE 517 Summer 2008/wiki3 1 th
Revision as of 04:46, 25 July 2008
RBP/OO Interactions
- It would be good if OO programs could interact with OO databases, but alas, relational databases have a 99% market share. This has led to many attempts to access them from OO languages. Design patterns and implementations for doing this have been developed, starting with Crossing Chasms and extending to Rails' ActiveRecord [1][2]. Here, we investigate the various approaches for marrying OO programs to relational databases, comparing them in terms of ease of programming, robustness, and efficiency.
Introduction
This article explores object-relational mapping (ORM), a programming technique that bridges object-oriented languages and relational databases, and compares it with more traditional approaches to database programming such as stored procedures and dynamic SQL. Along the way, we examine basic design patterns for bridging this gap and evaluate several ORM frameworks available in popular programming languages.
One of the primary problems that object-relational mapping (ORM) attempts to solve is transparent object persistence, which allows an object to outlive the process that created it. The state of an object can be stored to disk, and an object with the same state can be re-created in the future. Behind the scenes, this object data is typically stored in a relational database and accessed through SQL.
Relational databases lie at the core of most modern enterprise applications, yet the tabular representation of SQL data is fundamentally different from the network of objects used in object-oriented applications. ORM allows us to interact with business objects directly in an object-oriented domain model, instead of having to work with rows and columns at the programming level. For an introduction to ORM and the surrounding issues, refer to [3][4][5].
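To make the row-versus-object distinction concrete, the following Java sketch flattens an object's fields into a column-to-value map (a stand-in for a table row) and rebuilds an object from such a map. The Event class and NaiveMapper are hypothetical, and the rule that each column is named after a field is an assumption of the sketch; real ORM tools also handle type conversion, identity, relationships, and SQL generation.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical domain class: one instance corresponds to one row of an events table.
class Event {
    private long id;
    private String title;

    Event() {}  // no-argument constructor so the mapper can re-create instances
    Event(long id, String title) { this.id = id; this.title = title; }

    long getId() { return id; }
    String getTitle() { return title; }
}

// Hypothetical mapper; assumes column name == field name.
final class NaiveMapper {
    private NaiveMapper() {}

    // Object -> "row": copies each instance field into a column-to-value map.
    static Map<String, Object> toRow(Object obj) {
        Map<String, Object> row = new LinkedHashMap<>();
        try {
            for (Field f : obj.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
                f.setAccessible(true);
                row.put(f.getName(), f.get(obj));  // primitives are boxed, e.g. long -> Long
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return row;
    }

    // "Row" -> object: creates an instance and copies matching columns back into fields.
    static <T> T fromRow(Class<T> type, Map<String, Object> row) {
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T obj = ctor.newInstance();
            for (Field f : type.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()
                        || !row.containsKey(f.getName())) continue;
                f.setAccessible(true);
                f.set(obj, row.get(f.getName()));  // a Long value is unboxed for a long field
            }
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The round trip object → row → object is, in miniature, what an ORM does on every save and load; the mismatch appears as soon as fields hold references to other objects, which a flat row cannot store directly.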
Design Patterns
Design patterns provide the theoretical underpinnings for the object-relational tools that we use in practice today. One of the first design pattern languages exploring the bridge between object-oriented domains and relational domains is Crossing Chasms.
- The Chasm in Crossing Chasms refers to the semantic and structural gap between data in an object-oriented application and the representation and access of that data in a database. As a Pattern Language, Crossing Chasms captures multiple proven methods for effectively integrating an object-oriented application with a relational database. These solutions fall into two broad categories: Static Patterns, useful in data modeling and database table design, and Architectural Patterns, useful primarily for designing systems that handle the dynamic interaction between data and objects effectively. Originally developed to integrate Smalltalk applications with relational databases, and rooted in object-oriented theory from Jacobson to Rumbaugh, the patterns have since been abstracted and refined to capture the best practices of experts in OO database integration, whatever the platform. The architectural patterns include Four Layer Architecture, Trim And Fit Client, and Phase In Tiers, which we summarize here.
- The Four Layer Architecture improves upon the classical Model-View-Controller (MVC) application design by recognizing that MVC, while an evolutionary gain, cannot adequately support every application environment, especially where database integration and communication are involved. Three of its four layers closely follow the Model, View, and Controller; the fourth, the Infrastructure layer, handles database tables and communication links.
- Trim And Fit Client
- Phase In Tiers
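The layering described for the Four Layer Architecture can be sketched in Java as four small groups of classes, one per layer. All names here (Account, AccountStore, DepositController, AccountView) are hypothetical, and an in-memory map stands in for the database tables that a real Infrastructure layer would manage; the point is only that storage details stay in the Infrastructure layer while the domain model knows nothing about storage or display.

```java
import java.util.HashMap;
import java.util.Map;

// Infrastructure layer: owns storage and communication details.
// An in-memory map stands in for database tables in this sketch.
interface AccountStore {
    Account load(String id);
    void save(Account account);
}

final class InMemoryAccountStore implements AccountStore {
    private final Map<String, Account> table = new HashMap<>();
    public Account load(String id) { return table.get(id); }
    public void save(Account account) { table.put(account.id(), account); }
}

// Domain model layer: business state and rules, with no knowledge of storage or display.
final class Account {
    private final String id;
    private long balanceCents;

    Account(String id, long balanceCents) { this.id = id; this.balanceCents = balanceCents; }
    String id() { return id; }
    long balanceCents() { return balanceCents; }

    void deposit(long cents) {
        if (cents <= 0) throw new IllegalArgumentException("deposit must be positive");
        balanceCents += cents;
    }
}

// Application (controller) layer: coordinates one use case across the other layers.
final class DepositController {
    private final AccountStore store;
    DepositController(AccountStore store) { this.store = store; }

    Account deposit(String id, long cents) {
        Account acct = store.load(id);
        acct.deposit(cents);   // business rule lives in the domain model
        store.save(acct);      // persistence lives in the infrastructure layer
        return acct;
    }
}

// View layer: formats domain state for display only.
final class AccountView {
    static String render(Account acct) {
        return String.format("%s: %d.%02d", acct.id(),
                acct.balanceCents() / 100, acct.balanceCents() % 100);
    }
}
```

Replacing InMemoryAccountStore with a database-backed implementation would change only the Infrastructure layer; the controller, domain model, and view would stay as they are.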
Concentrated Examples
To motivate our discussion, it helps to provide short, practical reference examples that contrast object-relational frameworks with traditional dynamic SQL approaches. One can think of these examples as the "Hello World" of database interfacing.
Embedded (Ad-Hoc) SQL
The traditional approach uses plain strings and API calls to connect to a database and retrieve results. In this PHP example, the title and date of each event with an ID greater than 5 are echoed:
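<?php
// Connect to the MySQL server (legacy mysql_* API, removed in PHP 7)
$link = mysql_connect('localhost', 'mysql_user', 'mysql_password');
if (!$link) {
    die('Could not connect: ' . mysql_error());
}
echo 'Connected successfully';
// Select the database holding the events table ('mydb' is a placeholder name)
mysql_select_db('mydb', $link) or die('Could not select database: ' . mysql_error());
// The SQL query is an ordinary string embedded in the application code
$sql = "SELECT title, date
        FROM events
        WHERE id > 5";
$result = mysql_query($sql, $link);
if (!$result) {
    die('Query failed: ' . mysql_error());
}
// Each row is returned as an associative array keyed by column name
while ($row = mysql_fetch_assoc($result)) {
    echo $row["title"] . ' ' . $row["date"] . "\n";
}
// Close the connection only after the results have been consumed
mysql_close($link);
?>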
Stored Procedures
Stored procedures encapsulate SQL queries and other business logic within the database server itself. They have the advantage of decoupling SQL syntax from the client application, but calls to the stored procedures themselves are still not transparent to the client application. An example of a stored procedure in Microsoft SQL Server follows:
CREATE PROCEDURE spCaliforniaAuthors
AS
    SELECT *
    FROM authors
    WHERE state = 'CA'
    ORDER BY zip
This stored procedure is kept in the database. Applications can then call spCaliforniaAuthors directly without concerning themselves with the lower-level implementation details.
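As a sketch of the client side, a Java application might invoke the procedure through JDBC's CallableStatement using the standard {call ...} escape syntax, which the driver translates into the database's own call syntax. The AuthorQueries class is hypothetical, obtaining the Connection is omitted, and the au_lname and au_fname column names assume the authors table of Microsoft's pubs sample database.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class AuthorQueries {
    private AuthorQueries() {}

    // JDBC escape syntax for a stored procedure call; the driver rewrites it
    // into vendor-specific syntax (for example EXEC on SQL Server).
    static String callSyntax(String procedureName) {
        return "{call " + procedureName + "}";
    }

    // Runs spCaliforniaAuthors and returns "last, first" for each row.
    // Column names assume the pubs sample database's authors table.
    static List<String> californiaAuthors(Connection conn) throws SQLException {
        List<String> names = new ArrayList<>();
        try (CallableStatement call = conn.prepareCall(callSyntax("spCaliforniaAuthors"));
             ResultSet rs = call.executeQuery()) {
            while (rs.next()) {
                names.add(rs.getString("au_lname") + ", " + rs.getString("au_fname"));
            }
        }
        return names;
    }
}
```

Note that the client names only the procedure; if the query inside spCaliforniaAuthors is later changed in the database, this code does not need to be recompiled.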
The ORM Approach
With object-relational mapping, connection details are hidden from application code, usually in external configuration files. The application developer needs little or no knowledge of SQL for routine operations and can create, access, and manipulate persistent data as objects in an object-oriented domain. In this example using Java and Hibernate, the programmer creates a new event and then saves it to the database:
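// HibernateUtil is the SessionFactory helper class from the Hibernate tutorial
Session session = HibernateUtil.getSessionFactory().getCurrentSession();
session.beginTransaction();
// An ordinary Java object; no SQL is written here
Event theEvent = new Event();
theEvent.setTitle(title);
theEvent.setDate(date);
// Hibernate generates the INSERT and executes it no later than commit
session.save(theEvent);
session.getTransaction().commit();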
Microsoft LINQ
Microsoft provides a hybrid approach to object-relational mapping, called LINQ, which interleaves database metadata with object-oriented programming to allow for language-integrated querying of relational databases.
using System.Data.Linq.Mapping;

// Maps the Customer class to the Customers table
[Table(Name="Customers")]
public class Customer
{
    [Column(IsPrimaryKey=true)]
    public string CustomerID;
    [Column]
    public string City;
}
Unlike other ORM mechanisms, which try to hide non-OO querying as much as possible or rely on external library calls for direct SQL queries, LINQ to SQL provides a runtime infrastructure for managing relational data as objects without losing the ability to query through SQL-like constructs:
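// db is assumed to be a System.Data.Linq.DataContext connected to the database
Table<Customer> Customers = db.GetTable<Customer>();
var q = from c in Customers
        where c.City == "London"
        select c;
foreach (var cust in q)
    Console.WriteLine("id = {0}, City = {1}", cust.CustomerID, cust.City);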
These SQL-like constructs are part of the language itself rather than external library calls, and they also work on non-database data structures, such as in-memory collections that implement IEnumerable<T>.
Comparison
With our examples in hand, we can now discuss the advantages and disadvantages of the various approaches to marrying object-oriented programming and relational databases.
Ease of Programming
Developers with a database background find embedded SQL easy to use because queries are written in SQL, a language they already know. Embedded SQL also lets application developers exploit the specific properties and functions of their particular database, at the expense of the application's portability. From a debugging perspective, the database logic is embedded directly in the source code, which avoids the yo-yo effect of switching back and forth between application source and database source [6].
Stored procedures fare slightly better and have some benefits over embedded SQL [7]. They decouple database logic from application logic, so that database programmers can work on the database logic while application programmers independently work on the application logic. In addition, stored procedures can be changed without recompiling the application, which leads to more modular implementations.
Still, the use of stored procedures is not without criticism. Stored procedures are written in big-iron database languages such as PL/SQL or T-SQL, which tend to be archaic in functionality. They are hard to debug because they run inside the database server rather than in the same process as the application. Finally, stored procedures cannot pass objects and must instead pass primitive data types back and forth, often requiring an unwieldy number of parameters to accomplish a single task [8].
ORM frameworks have the advantage that developers work with objects while the mapping tool handles persistence transparently. For most applications, this provides a natural, object-oriented way to access relational databases. When a query cannot easily be expressed in object-oriented terms, ORM tools typically provide their own framework-specific, SQL-like query language, such as Hibernate's HQL, which still abstracts away the details of the particular database [9].
While most ORM frameworks are easy to program against from an application developer's perspective, their initial configuration can be daunting because of the frameworks' generality and large number of configuration parameters. Configuring ORM is difficult enough that code generation tools such as XDoclet and Middlegen exist to produce mapping and configuration files, and these tools can be thought of as compilers in their own right.
Finally, let us look at LINQ, a variation of ORM that adds SQL-like querying capabilities to .NET and was introduced with the .NET Framework 3.5. LINQ's major advantage is that it provides consistent domain modeling while hiding the mundane data-access code that is often exposed either in ORM configuration or in embedded SQL; with LINQ to SQL, that code is generated for the developer [10]. LINQ also provides a single query language for multiple data stores, such as relational data, XML data, and other .NET objects [11], and that language is integrated into the syntax of the host language. With respect to ease of programming, LINQ is a clear winner, but its model of embedding database metadata and queries directly within application logic may meet resistance from those who prefer to separate database programmers from application developers [12]. LINQ is also tied to the Microsoft .NET platform and its languages, such as C# and Visual Basic, and is not available in other languages or environments.
Robustness
While embedded SQL is one of the easiest ways to connect to a database, it is also one of the least robust. Embedded SQL is commonly vulnerable to SQL injection attacks, though careful programming, such as using parameterized queries, can mitigate them. Embedded SQL also reduces the elegance of code by increasing coupling between the database and the application. The PHP sample code above, for instance, is tied to a MySQL database through the mysql_connect function.
Because SQL code is interspersed with non-SQL code, it can be difficult to maintain when the SQL itself needs to change. Indeed, embedding SQL significantly affects the understandability, testability, adaptability, and other quality attributes of the overall system [13]. Despite these issues, embedded SQL remains a popular choice because it works reasonably well for many smaller business applications that mainly perform basic CRUD operations, even in languages like Java that support object-oriented design.
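The injection risk mentioned above, and the usual mitigation, can be sketched in Java with JDBC. Splicing input into the SQL text lets a value such as 5 OR 1=1 change the meaning of the query, whereas a PreparedStatement keeps the SQL text fixed and sends the value separately as a bound parameter. The EventQueries class is hypothetical and reuses the events table from the PHP example; the Connection is assumed to be supplied by the caller.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class EventQueries {
    private EventQueries() {}

    // Vulnerable: user input becomes part of the SQL text itself.
    static String concatenatedQuery(String minId) {
        return "SELECT title, date FROM events WHERE id > " + minId;
    }

    // Safer: the SQL text is fixed; the value is bound to the ? placeholder
    // and is never parsed as SQL.
    static final String PARAMETERIZED = "SELECT title, date FROM events WHERE id > ?";

    static void printEventsAfter(Connection conn, int minId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(PARAMETERIZED)) {
            ps.setInt(1, minId);  // parameter indexes start at 1
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("title") + " " + rs.getDate("date"));
                }
            }
        }
    }
}
```

With the concatenated version, the input 5 OR 1=1 produces a WHERE clause that matches every row; with the parameterized version, the same input could not even be passed, because setInt accepts only an integer.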
Stored procedures are certainly more robust than embedded SQL. Because SQL code stays in the database and application code stays in the application, the system is easier to maintain. The .NET Data Access Architecture Guide gives other reasons that stored procedures are more robust:
- Stored procedures can be individually secured within the database. A client can be granted permissions to execute a stored procedure without having any permissions on the underlying tables.
- Stored procedures result in easier maintenance because it is generally easier to modify a stored procedure than it is to change a hard-coded SQL statement within a deployed component.
- Stored procedures add an extra level of abstraction from the underlying database schema. The client of the stored procedure is isolated from the implementation details of the stored procedure and from the underlying schema.
Finally, we turn to ORM and LINQ. Both mechanisms can be considered robust because they abstract the database connection details and handling entirely away from the application developer. In principle, with an ORM framework such as Hibernate, one can switch databases, say from Microsoft SQL Server to PostgreSQL, with no change in application logic beyond editing the ORM's database configuration files. However, this robustness is a trade-off: it sacrifices some ease of programming for smaller projects and potentially some performance and efficiency, as discussed in the following section.
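The configuration-driven switch described above can be illustrated with a small Java sketch that reads connection settings from properties text, as an external configuration file would supply them. The keys follow the style of Hibernate's hibernate.properties, but the DbConfig class and the use of java.util.Properties here are assumptions for illustration, not Hibernate's own loading mechanism; the connection URLs are placeholders.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical holder for the settings an ORM would read from its configuration file.
final class DbConfig {
    final String url;
    final String dialect;

    private DbConfig(String url, String dialect) {
        this.url = url;
        this.dialect = dialect;
    }

    // Parses configuration text, e.g. the contents of an external .properties file.
    static DbConfig parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {  // load() declares IOException even for in-memory readers
            throw new IllegalStateException(e);
        }
        return new DbConfig(p.getProperty("hibernate.connection.url"),
                            p.getProperty("hibernate.dialect"));
    }
}
```

Moving from SQL Server to PostgreSQL then means supplying different text, for example a jdbc:postgresql URL and the org.hibernate.dialect.PostgreSQLDialect dialect, while the code that calls DbConfig.parse stays unchanged.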
Efficiency
The efficiency of the different database access mechanisms depends on the application, the implementation, and a variety of other factors. As such, we describe differences in efficiency qualitatively rather than quantitatively.
At first glance, embedded SQL may appear highly efficient. After all, embedded SQL queries are low-level strings passed directly to the database, and they can be custom-written to take advantage of special capabilities of a particular database. Conversely, some believe that the overhead of processing stored procedures results in a performance penalty. The real answer is not as cut and dried, especially when comparing against stored procedures. Here's why [14]:
- In many cases, you can get better performance by looping and filtering data within the database than at the Data Access Layer. This is because databases are intrinsically designed to do this, while application developers have to write their own code.
- Stored procedures can be used to batch common work together or retrieve multiple sets of data. Batching reduces network traffic and consolidates work.
- Many databases can also cache execution plans for stored procedures, whereas the plan for ad hoc embedded SQL may have to be recalculated for each request.
- Databases tend to optimize better the closer they are to the bare metal. The easiest way to get performance out of your database is to do everything you can to take advantage of the platform you are running on, which means running queries as close to the database as possible.
ORM approaches face the opposite default criticism: many assume their efficiency is lower than that of stored procedures or embedded SQL because of the overhead of performing object-relational mapping. Again, the answer is not as clear cut. Let us examine one of the more popular ORM frameworks, Hibernate, for evidence: [15]
The PolePosition benchmark suite provides a rigorous comparison of ORM application-implementation pairs, with some test data archived on its site.
http://www.singingeels.com/Articles/Improving_Performance_With_LINQ.aspx
Implementations
Languages and environments as diverse as .NET and PHP support ORM [16]. We will focus our comparison on specific implementations in Java, Ruby on Rails, Microsoft .NET, PHP, ASP Classic, and JDBC.
ORM in Java
In recent years, Java has experienced a paradigm shift from complex, heavyweight frameworks such as Enterprise JavaBeans to lighter-weight, agile frameworks that rely instead on simple Plain Old Java Objects (POJOs). This, in turn, has increased the popularity of ORM among Java developers.
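For illustration, a POJO version of the Event class used in the Hibernate example above might look like the following sketch. The title and date fields are inferred from the setTitle and setDate calls, and the id field is an assumption; the class extends no framework type, and an ORM would map it through external metadata such as a mapping file or annotations.

```java
import java.util.Date;

// A Plain Old Java Object: no framework base class, interface, or annotations.
class Event {
    private Long id;       // identifier, typically assigned by the ORM or the database
    private String title;
    private Date date;

    Event() {}             // no-argument constructor, which ORM tools commonly require

    public Long getId() { return id; }
    private void setId(Long id) { this.id = id; }  // private: intended for the persistence layer only

    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }

    public Date getDate() { return date; }
    public void setDate(Date date) { this.date = date; }
}
```

Because the class depends on nothing but the JDK, it can be created and unit-tested without a database, which is much of the appeal of the POJO style.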
Indeed, object-relational mapping is especially popular in the Java community compared, for example, with .NET developers [17]. Although a plethora of ORM frameworks exist for Java, the most popular and widespread ORM layers today include Sun's JDO and the somewhat entrenched open source O/R mapping framework Hibernate.
We begin with Java Data Objects (JDO), which is not itself a framework but a specification: a standard, interface-based Java abstraction of persistence developed under the auspices of the Java Community Process. Frameworks such as Apache JDO then implement this specification. JDO aims to support not only relational databases but also object databases and file systems.
Hibernate is another ORM implementation. Although it is open source, it is often considered "proprietary" because its native API does not implement the JDO specification. Still, Hibernate's momentum has made it a de facto standard in the Java industry, a position reinforced by frameworks such as Spring that use it as a building block.
ORM in Ruby on Rails
Ruby on Rails is an interesting example of the use of ORM.
http://wiki.rubyonrails.org/rails/pages/ActiveRecord
ORM in Microsoft .NET
LINQ in Microsoft .NET
http://msdn.microsoft.com/en-us/netframework/aa904594.aspx
Dynamic SQL in PHP, ASP Classic, and JDBC
Despite the availability of object-relational libraries, dynamic SQL continues to be a popular development mechanism for interfacing with databases. Dynamic SQL is not an object-oriented methodology, but rather a "bare metal" programming approach where SQL strings are directly constructed through concatenation or other low-level mechanisms and then directly passed to the database. The results of such queries are themselves lower level objects like record sets or hash tables.
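A minimal JDBC sketch of this style follows: the query text is assembled by string concatenation, executed as-is, and each result row is copied into a column-name-to-value map. The DynamicSql class and its methods are hypothetical; because concatenation offers no protection against injection, callers must pass only trusted table names, column names, and clauses.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DynamicSql {
    private DynamicSql() {}

    // Builds the SQL text by plain concatenation; inputs must be trusted.
    static String select(String table, List<String> columns, String whereClause) {
        StringBuilder sql = new StringBuilder("SELECT ")
                .append(String.join(", ", columns))
                .append(" FROM ").append(table);
        if (whereClause != null && !whereClause.isEmpty()) {
            sql.append(" WHERE ").append(whereClause);
        }
        return sql.toString();
    }

    // Executes the string unchanged and returns each row as a column-label -> value map.
    static List<Map<String, Object>> run(Connection conn, String sql) throws SQLException {
        List<Map<String, Object>> rows = new ArrayList<>();
        try (Statement st = conn.createStatement(); ResultSet rs = st.executeQuery(sql)) {
            ResultSetMetaData md = rs.getMetaData();
            int columnCount = md.getColumnCount();
            while (rs.next()) {
                Map<String, Object> row = new LinkedHashMap<>();
                for (int i = 1; i <= columnCount; i++) {  // JDBC columns are 1-based
                    row.put(md.getColumnLabel(i), rs.getObject(i));
                }
                rows.add(row);
            }
        }
        return rows;
    }
}
```

The caller receives untyped maps rather than domain objects, so any conversion to objects is left to hand-written application code.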
Summary