"A source code system is a giant UNDO key-a project-wide time machine"
-Andy Hunt and Dave Thomas
Introduction
Dozens of version control systems (VCSs) are available, and most of them are free and open source. The first version control system was released in the early 1970s, and new tools are still being released today. Broadly, VCSs fall into four categories [1]:
- File versioning tools, examples: SCCS, RCS
- Tree versioning tools, central style, example: CVS
- Tree versioning tools, central style (second generation), example: Subversion
- Tree versioning tools, distributed style, example: Bazaar
Source Code Control System
The very first version control system[5] is the Source Code Control System (SCCS), originally written by Marc J. Rochkind in 1972 at Bell Labs, NJ. It is designed to help programming projects control changes to source code. SCCS provides facilities for storing, updating and retrieving all versions of modules by version number, and it records who made each change, when and where it was made, and the reason for the change. The first two implementations of SCCS were one for the IBM 370 under OS and another for the PDP-11 under UNIX. [2][3]
SCCS is a file versioning tool: it versions only individual files, so one source file results in one SCCS history file. It was an effective method for small projects. The format of the history files is easy to understand, which allows manual intervention. In addition, checksums and forward deltas protect file integrity and allow corruption to be detected immediately. SCCS was the major form of source code control, especially on UNIX platforms, until the release of the Revision Control System.
Revision Control System
Revision Control System (RCS)[7] is the successor to SCCS. It was written by Walter F. Tichy (now a faculty member at the University of Karlsruhe, Germany) in 1982 at Purdue University. Just like SCCS, RCS is a file versioning tool. The fundamental storage unit is a revision group. It also supports branches within a file and, unlike SCCS, it supports merges. RCS storage on the mainline uses reverse deltas: the latest revision is stored intact, but earlier revisions are stored as deltas from the latest. On branches, revisions are stored as forward deltas, so checking out branches is slow. For example, suppose the mainline has 100 revisions and there is a branch at revision 10 that has 80 revisions. To check out the branch tip, RCS must do the following:
- Retrieve mainline revision 100
- Retrieve and apply reverse deltas from 99 down to 10
- Retrieve and apply 80 forward deltas on the branch
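The cost of the steps above can be tallied with a short sketch. This is only a counting model of the example in the text, not RCS's actual implementation; the function name and its parameters are hypothetical.

```python
def deltas_for_branch_tip(mainline_head, branch_point, branch_length):
    """Count the deltas applied to check out a branch tip (toy model).

    mainline_head: number of the latest mainline revision (stored intact)
    branch_point:  mainline revision at which the branch starts
    branch_length: number of revisions on the branch
    """
    if not 1 <= branch_point <= mainline_head:
        raise ValueError("branch_point must lie on the mainline")
    # One reverse delta per step back from the head to the branch point.
    reverse = mainline_head - branch_point
    # One forward delta per revision on the branch.
    forward = branch_length
    return reverse, forward


reverse, forward = deltas_for_branch_tip(100, 10, 80)
print(f"reverse: {reverse}, forward: {forward}, total: {reverse + forward}")
# prints "reverse: 90, forward: 80, total: 170"
```

By contrast, checking out the mainline head in this model needs no deltas at all, which is why the reverse-delta layout favors the most common case.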
Additionally, in the early 1980s, when RCS was released, it had competitors such as IBM CLEAR/CASTER, AT&T SCCS, and the CMU Software Development Control System.
Concurrent Version System
CVS, while using RCS underneath, is a much more powerful tool and can control a complete source code tree. It can be extensively customized with scripting languages such as Perl and the Korn and Bash shells [9].
CVS offers the following significant advantages over RCS:
- It can run scripts that log CVS operations or enforce site-specific policies.
- It enables developers in different geographical locations to function as a single team. Information is stored on a single central server, and the client machines have a copy of all the files. The client-server connection must be up to perform CVS operations but need not be up to edit or manipulate the current versions of the files.
- It can merge changes from non-CVS vendor branches.
- It allows more than one developer to work on the same file at the same time.
- CVS servers run on most operating systems, including UNIX variants, Windows and OS/2.
SVN
Subversion is the next version control system in this line. CVS has a number of problems, caused primarily by its dependency on the RCS file format for versioning files. These and other issues addressed by Subversion include the following ([10] [11] [12] [13]):
- CVS does not guarantee atomic commits. Subversion does: a commit is applied in full or not at all.
- CVS has no way to rename a file and keep its version history. In Subversion, when file1 is renamed to file2, the common history of the two is preserved. Additionally, Subversion versions more than file contents: directories and file metadata, as well as renamed or copied files, all have their own versioning.
- In CVS, branching and tagging are expensive operations for big repositories and directory trees, with a cost proportional to the number of files being branched or tagged. Subversion has made both branching and tagging constant-time operations. They are implemented simply by copying the directory being branched or tagged.
- CVS is not binary-file friendly. Any change to a binary file results in the replacement of the old file. Subversion uses a different approach that provides efficient binary diffing, which means it can store PDFs and other binary files efficiently.
- If we change a file locally under CVS and want to see the difference, the entire file has to be sent to the server. With a Subversion repository, a copy of the latest repository revision is kept locally, so only the differences are sent in both directions, which means much less bandwidth is used.
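The difference in branching and tagging cost described above can be illustrated with a toy model. This is a sketch under simplifying assumptions, not how either tool stores data internally: the dictionaries, function names, and the "writes" counter are all hypothetical stand-ins.

```python
def cvs_style_tag(file_histories, tag):
    """Toy CVS-like tagging: the tag is recorded in every file's history."""
    writes = 0
    for history in file_histories.values():
        history["tags"][tag] = history["head"]
        writes += 1
    return writes  # grows with the number of files


def svn_style_copy(directory_entries, source, destination):
    """Toy Subversion-like tagging: one new entry pointing at the same tree."""
    directory_entries[destination] = directory_entries[source]
    return 1  # independent of the number of files


# A hypothetical tree of 1000 files, each at revision 1.1.
files = {f"src/file{i}.c": {"head": "1.1", "tags": {}} for i in range(1000)}
entries = {"trunk": files}

cvs_writes = cvs_style_tag(files, "release-1")
svn_writes = svn_style_copy(entries, "trunk", "tags/release-1")
print(cvs_writes, svn_writes)  # prints "1000 1"
```

In the copy-based model the tag shares the existing tree rather than duplicating it, so the work done does not depend on how many files the directory contains.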
Bazaar
Bazaar is one of the best-known distributed-style tree versioning tools. Its developers argue that a distributed structure is superior to central-style version control systems in many respects (http://wiki.bazaar.canonical.com/BzrWhy).
Summary
Using a VCS has many benefits. For example, when a team works on a project, the team members can work in harmony and no one will overwrite other people's code. Moreover, every change (version) is stored in the VCS repository. This enables team members to see the differences between versions of the same file, and to know when each change was made and which team member was responsible. Furthermore, development can be split into different branches, with each branch, say, keeping track of the fixes associated with a software release. File versions can then be retrieved from a branch, and the same fix can be applied to multiple branches. Another benefit is that the VCS repository can help in understanding how the project went: How many lines changed from the previous version? Which are the most and least productive days of the week? Which team member made the largest contribution? [8]
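The project questions above can be answered from commit metadata. The following is a minimal sketch, assuming the commit log has already been exported as (author, date, lines changed) records; the sample data, names, and helper functions are hypothetical and not tied to any particular VCS.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical commit records: (author, commit date, lines changed).
commits = [
    ("alice", date(2010, 3, 1), 120),
    ("bob", date(2010, 3, 1), 30),
    ("alice", date(2010, 3, 3), 45),
    ("carol", date(2010, 3, 5), 10),
    ("bob", date(2010, 3, 8), 60),
]


def lines_by_author(records):
    """Total lines changed per team member."""
    totals = Counter()
    for author, _, lines in records:
        totals[author] += lines
    return totals


def commits_by_weekday(records):
    """Number of commits made on each day of the week."""
    return Counter(WEEKDAYS[day.weekday()] for _, day, _ in records)


top_author, top_lines = lines_by_author(commits).most_common(1)[0]
busiest_day, day_count = commits_by_weekday(commits).most_common(1)[0]
print(f"{top_author} changed the most lines ({top_lines})")
print(f"{busiest_day} had the most commits ({day_count})")
```

Lines changed is only one possible measure of contribution; counting commits per author with the same `Counter` approach would give a different ranking for this sample.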
References
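[1] Bazaar v2.1 Documentation, http://doc.bazaar.canonical.com/bzr.2.1/en/user-guide/introducing_bazaar.html
[2] A Brief History of VCS, http://code.google.com/p/pysync/wiki/VCSHistory
[3] SCCS - The POSIX Source Code Control System, http://sccs.berlios.de/
[4] The Source Code Control System by Marc J. Rochkind, http://basepath.com/aup/talks/SCCS-Slideshow.pdf
[5] Source Code Control System, http://en.wikipedia.org/wiki/Source_Code_Control_System
[6] Tichy, Walter. RCS: A System for Version Control, http://www.cs.purdue.edu/homes/trinkle/RCS/rcs.ps
[7] Revision Control System, http://en.wikipedia.org/wiki/Revision_Control_System
[8] "Version Control Systems," by Diomidis Spinellis (IEEE Software, vol. 22, no. 5, 2005, pp. 108–109), http://www.computer.org/portal/web/csdl/doi/10.1109/MS.2006.32
[9] CVS or RCS?, http://www.faqs.org/docs/Linux-HOWTO/CVS-RCS-HOWTO.html#s2
[10] SVN vs CVS, http://www.pushok.com/soft_svn_vscvs.php
[11] Subversion - a better CVS, http://www.linux.ie/articles/subversion/
[12] Difference between CVS and Subversion, http://www.differencebetween.net/technology/difference-between-cvs-and-subversion/
[13] Apache Subversion, http://en.wikipedia.org/wiki/Apache_Subversion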