CSC/ECE 517 Spring 2013/ch1b 1m yk: Difference between revisions

From Expertiza_Wiki
<big>'''Continuous Integration'''</big>


__TOC__
==Definition==
'''"We keep our code ready to ship"'''
[http://en.wikipedia.org/wiki/Continuous_integration Continuous Integration] is a software development practice where members of a team integrate their work frequently. This approach leads to significantly reduced integration problems and allows a team to develop [http://en.wikipedia.org/wiki/Cohesion_(computer_science) cohesive] software more rapidly.


==Introduction==
Continuous Integration [http://en.wikipedia.org/wiki/Continuous_integration (CI)] is a software development practice that is commonly used in the [http://en.wikipedia.org/wiki/Agile_software_development Agile software development] methodology. <ref name="Agile">"Manifesto for Agile Software Development" http://www.agilemanifesto.org/</ref> Agile is a software development method based on iterative and incremental development. In principle, it works with small teams, requires developers to deliver working code in short iterations, and uses Test Driven Development [http://en.wikipedia.org/wiki/Test-driven_development (TDD)], which requires developers to write tests before writing code. It also requires all code to be reviewed and tested continuously as development proceeds, which is where CI fits in.
The software development landscape has changed in recent years. Previously, one third of the development time was spent on writing exhaustive specifications, which then went through intensive reviews, and months passed before anything was released. Nowadays the specs are sketches at most, and releases happen at least twice a month in order to get rapid feedback from the customer, allowing for a more focused product.
 
The big question is though: How can we developers adapt to the new world?
 
Continuous Integration aims to improve the quality of software, and to reduce the time taken to deliver it, by replacing the traditional practice of applying quality control after completing all development. It is a software development practice where members of a team integrate their work frequently, usually each person integrates at least daily - leading to multiple integrations per day.
 
Continuous Integration is about integrating changes to the development project continuously and automatically. These changes usually come in the form of modifications and additions to the source code. When we keep them as small as possible and automatically run all available checks (from compilation up to the full test routines), we can detect most breaking issues as soon as they are introduced. Since the changes are small, it is much easier to isolate and fix the root cause.
 
==Process Involved==
Continuous integration is achieved by reducing development friction involved in project integration. This is done with the help of:
* Version Control Systems that allow large and distributed teams to work together on the same project while sending their code changes (commits) to the central server. Each commit is stored on the server and can be reviewed or reverted.
* Build Automation, meaning build tools and scripts that make it possible to start the complete integration build on a local machine with a single mouse-click.
* Unit Tests that ensure the code stays stable through all the changes (this normally comes with Test-Driven Development practices).
* Code Quality Tests that consistently enforce the development guidelines accepted by the team.
* Regular Commits by developers, which reduce the amount of change involved in every integration.
* Integrating Every Commit (as soon as it arrives) on a central server, which makes it possible to detect compilation failures or broken tests immediately. This is also the best way to deal with the "but it works on my machine" problem.
* Publishing Integration Results that are created along with the build, which can simplify getting feedback from people outside of the development team. Examples of these results are unit test results, performance metrics, install packages, and code quality reports.
 
==Basic Practices to follow==
===Maintain a single source [http://en.wikipedia.org/wiki/Software_repository repository]===
 
Software projects involve lots of files that need to be orchestrated together to build a product. Keeping track of all of these is a major effort, particularly when multiple people are involved. Tools that manage this - called [http://www.sourcecodemanagementtools.com/ Source Code Management tools], configuration management, version control systems, repositories, or various other names - are an integral part of most development projects. The base of a Continuous Integration system is a good source control management system that tracks and controls all of the files needed to build a product.
 
You must put everything required for a build in the source control system. One of the features of version control systems is that they allow you to create multiple branches, to handle different streams of development. This is a useful feature - but it's frequently overused and gets people into trouble. Keep your use of branches to a minimum. In particular have a mainline: a single branch of the project currently under development. Pretty much everyone should work off this mainline most of the time.
 
[http://en.wikipedia.org/wiki/Comparison_of_revision_control_software A comparison of the common version control systems]
 
===Automate the build===
 
Getting the sources turned into a running system can often be a complicated process involving compilation, moving files around, loading schemas into the databases, and so on. However like most tasks in this part of software development it can be automated - and as a result should be automated. Asking people to type in strange commands or clicking through dialog boxes is a waste of time and a breeding ground for mistakes.
 
To get an efficient Continuous Integration system you need to implement an automatic build process. The details of this step depend upon the implementation language and [http://en.wikipedia.org/wiki/Development_environment development environment] and may involve compiling and linking and/or other processes that produce the build artifacts, such as the program [http://en.wikipedia.org/wiki/Executable executable].
 
[http://en.wikipedia.org/wiki/List_of_build_automation_software#Comparison_of_build_automation_software A comparison of the common build automation systems]
 
===Make the build self-testing===
 
Traditionally a build means [http://en.wikipedia.org/wiki/Compile compiling], [http://en.wikipedia.org/wiki/Links_(programming_language) linking], and all the additional stuff required to get a program to execute. A program may run, but that doesn't mean it does the right thing. Modern statically typed languages can catch many bugs, but far more slip through that net. A good practice that will help you to detect errors quickly is to include some automated tests in your build process. If possible, execute a sub-set of the system tests as part of the build to ensure that the build is suitable prior to committing resources to the full scope of system testing. While the level of testing may vary, focus on gaining confidence that the increment is of sufficient quality to establish a baseline for system testing. If some testing fails, the build should also fail.
 
[http://en.wikipedia.org/wiki/Test_automation#Notable_test_automation_tools List of some common test automation tools]
 
===Everyone commits to the baseline every day===


==What is CI?==
Integration is primarily about communication. Integration allows developers to tell other developers about the changes they have made. Frequent communication allows people to know quickly as changes develop. A good practice is to force developers to commit their changes to the main development stream at least once every day; before delivering the files, each developer must verify that his/her working copy is consistent with the main development stream, resolve conflicts if they exist, build a local test, and if this passes, commit changes to the main development stream.  
CI is used to ensure that all changes to a software project’s code are successfully built, tested, reported on, and rapidly made available to all parties after they are introduced. <ref name="CIRef1">Enterprise Solution Providers, Inc. "Why and How to build a Continuous Integration Environment for the .NET platform" http://www.espusa.com/whitepapers/continuous_integration_v1.0.pdf</ref> It aims to deliver high-quality software in the least amount of time by applying quality control throughout the whole software development process. Without CI, there is no chance for Agile to succeed in delivering refined products in short release cycles.


==How does CI work?==
===Every Commit Should Build the Mainline on an Integration Machine===
[[File:ContinuousIntegration.jpg|center]]


The above diagram shows an overview of a simple CI environment, which gives a good idea of how CI works. <ref name="CIRef2">Thomas Jaspers.  "Continuous Integration – Overview" http://blog.codecentric.de/en/2009/11/</ref> In order to apply quality control throughout the project, CI requires members of the team to integrate their work into the main code repository at least daily, which leads to multiple integrations per day. Each developer first builds and runs all the tests locally to ensure the changes are good, then commits the changes to the repository using a version control tool such as Subversion. Once the CI server detects the new commits, it kicks off an automated build that runs all the automated tests, so that compile and integration errors are detected as quickly as possible. If the automated build fails, the team is notified and required to resolve the error immediately. In addition, at least once every 24 hours, a nightly build is executed to ensure the quality of the code in the main repository. <ref name="CIRef5">Laurie Williams.  "Scrum + Engineering Practices: Experiences of Three Microsoft Teams" http://collaboration.csc.ncsu.edu/laurie/Papers/ESEM11_SCRUM_Experience_CameraReady.pdf</ref>
Although developers must run local builds and tests on their own machines before delivering to the main development stream, there may be differences between developers' machines, or code integration errors. This is why it is very important to ensure that an integration build is run on an integration machine each time a developer commits changes. The developer who delivers the changes should be responsible for monitoring this integration build and fixing it if there is any problem. Working this way ensures that errors can be found and solved easily and quickly. Developers are expected to commit to fixing problems with the build quickly, resulting in a successful build every day. There are three approaches to running the integration build:
* To let the developer manually request a build execution after she has committed some new code to the stream.  
* To ensure that an automatic build will be executed after a code commit to the stream (uses a continuous integration server).  
* To use a [http://en.wikipedia.org/wiki/Scheduling_(computing) scheduler] so that automatic builds are executed on a daily basis, a weekly basis, etc. (uses a continuous integration server).
[http://en.wikipedia.org/wiki/Comparison_of_continuous_integration_software List of some popular continuous integration servers]
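
The second approach in the list above (an automatic build after each commit) is often implemented by polling the repository for a new head revision. The sketch below illustrates only that idea, under the assumption that the polled revisions are available as a sequence of identifiers; the name <code>poll_and_build</code> and the <code>build</code> callback are hypothetical and do not correspond to any particular CI server's API.

```python
def poll_and_build(head_revisions, build):
    """Trigger one build per new head revision.

    head_revisions: successive poll results (the mainline head seen at each poll).
    build: callable taking a revision id and returning True on success.
    Returns a dict mapping each built revision to its build result.
    """
    last_built = None
    results = {}
    for revision in head_revisions:
        if revision == last_built:
            # Nothing was committed since the previous poll; skip the build.
            continue
        results[revision] = build(revision)
        last_built = revision
    return results
```

For example, polling results of <code>["r1", "r1", "r2"]</code> would trigger two builds, one for <code>r1</code> and one for <code>r2</code>. A scheduler-based setup (the third approach) would instead call <code>build</code> at fixed times regardless of whether the head has changed.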


==Why Implement CI?==
===Keep the build fast===
Most of the major companies producing software for sale are in business to make money. Companies have discovered that the best way to make money is to decrease the number of defects reaching customers. A great deal of research has been done on how to make quality software with fewer defects. The usual cost of fixing a defect increases tenfold each time it slips from one phase to the next. For example, a defect that costs $60 to fix during the coding phase will cost $600 if found during validation testing. Moreover, if a defect is shipped to customers along with the product, it can sometimes cost millions of dollars. Additionally, these small defects can have a big impact on a company’s reputation. Below are some examples <ref name="Bugs">"Top Ten Most Infamous Software Bugs Of All Time" http://able2know.org/topic/129489-1</ref> that show how small defects that slip through undetected can cost millions of dollars.


# In 2007 a single faulty piece of embedded software on a network card sent out faulty data on the United States Customs and Border Protection network, bringing the entire system to a halt. Nobody was able to leave or enter the U.S. from the LA Airport for over eight hours. Over 17,000 passengers were stranded for the duration of the outage, resulting in millions of dollars in damage.
It is important to minimize the time spent running the integration build each time a developer commits changes to the main development stream. This is not always possible, and complete integration builds can take a lot of time and resources. If this is your problem, a good practice is to split your integration build into different stages: create a commit build that compiles and verifies the absence of critical errors when each developer commits changes to the main development stream, and create secondary build(s) to run slower and less important tests. You need to ensure that commit builds are always successful, but a failing secondary build need not be so critical as to stop everything. This allows your developers to continue working while these secondary errors are fixed.
# A forgotten error handling statement caused the famous Ping of Death in 1995. A lack of error handling in the IP fragmentation reassembly code made it possible to crash many Windows, Macintosh, and Unix operating systems by sending a malformed “ping” packet from anywhere on the Internet.
# In 2004, software giant EDS introduced a large, complex IT system to the U.K.’s Child Support Agency (CSA). At the exact same time, the Department for Work and Pensions (DWP) decided to restructure the entire agency. The restructure and the new software were completely incompatible, and irreversible errors were introduced as a result. With over 500 bugs still reported as open in the new system, the clash of the two events crippled the CSA’s network. As a result the system managed to overpay 1.9 million people, underpay another 700,000, leave $7 billion in child support payments uncollected, build up a backlog of 239,000 cases, leave 36,000 new cases “stuck” in the system, and cost UK taxpayers over $1 billion to date.
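
The tenfold escalation rule quoted before these examples ($60 during coding, $600 during validation testing) can be written as a one-line calculation. The function name <code>defect_cost</code> and the idea of applying the same factor to every later phase are assumptions made for illustration; the text only gives the coding-to-validation figures.

```python
def defect_cost(base_cost, phases_slipped, factor=10):
    """Estimated cost of fixing a defect that slipped through `phases_slipped` phases.

    Assumes the cost is multiplied by `factor` for every phase the defect
    survives, which is a simplification of the rule of thumb in the text.
    """
    return base_cost * factor ** phases_slipped
```

With these assumptions, <code>defect_cost(60, 0)</code> gives 60 and <code>defect_cost(60, 1)</code> gives 600, matching the figures in the text.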


The above examples show why the testing phase is as important as the coding phase. The more you test, the better the software becomes, as fewer and fewer defects remain. Below are some of the reasons why big companies are leaning more towards adopting agile and CI for software development.
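
The staged-build practice described under "Keep the build fast" (a fast commit build that must pass, plus slower secondary builds whose failures do not stop everything) can be sketched roughly as follows. The <code>Stage</code> class and <code>run_staged_build</code> function are hypothetical names for this illustration, not part of any CI product.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    name: str
    run: Callable[[], bool]  # returns True when the stage succeeds
    blocking: bool           # True for the commit build, False for secondary builds


def run_staged_build(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order; return (build_ok, names of failed secondary stages)."""
    warnings = []
    for stage in stages:
        if stage.run():
            continue
        if stage.blocking:
            # A failed commit build stops the pipeline immediately.
            return False, warnings
        # A failed secondary build is recorded, but work can continue.
        warnings.append(stage.name)
    return True, warnings
```

The design choice here is that only blocking stages decide the overall result, which keeps feedback on each commit fast while slower checks still leave a visible record when they fail.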
===Test in a clone of the production environment===
# Many developers work on a piece of software through multiple agile teams. Although each agile team focuses on a specific module of the software, it is important to make sure the overall integrity of the system is always maintained.
# One of the main benefits of agile project development is having a shippable product at the end of each sprint<ref name="Ready Product">"Why Continuous Integration?" http://www.cavdar.net/2009/03/07/why-continuous-integration/</ref>. If testing is left until the end, then you can’t have a shippable product at the end of each sprint. You need automated testing that can test this functionality on a daily basis.
# Another problem with traditional software development was that management did not know the exact status of the product, including the number of bugs in the system, until the validation phase. This can cause problems in releasing software on time. CI offers the option of generating various reports to solve these kinds of problems and to give management an early indication of whether the product release can be done on time.
# The main reason for using CI is to cut down on validation cost by finding bugs early, during the coding phase.
# CI infrastructure provides a way to decrease Time to Response [http://en.wikipedia.org/wiki/Response_time_(technology) (TTR)] for fixing defects.
# Additionally, some CI infrastructure provides valuable tools like build management and build acceleration. With build acceleration, developers and QA do not have to wait long periods to get a completed build. Build acceleration decreases the build time by performing non-dependent tasks in parallel, resulting in increased productivity.


==CI Setup and Reporting==
The point of testing is to flush out, under controlled conditions, any problem that the system will have in production. A significant part of this is the [http://en.wikipedia.org/wiki/Integrated_development_environment environment] within which the production system will run. If you test in a different environment, every difference results in a risk that what happens under test won't happen in production.
There are many different ways continuous integration can be incorporated into software product development. The level of automation varies from project to project. Some projects might be able to achieve complete automation and continuous integration, whereas others can be restricted by the testing infrastructure. For example, when making software for, say, airplane controls, only a limited amount of the testing can be automated, and some manual steps are required. For a computer application like Word, by contrast, you can automate the testing and achieve 100% continuous integration.
 
Many different kinds of software are available for implementing continuous integration within a software project. They include Electric Commander<ref name="EC">"Electric Commander" http://www.electriccloud.com/</ref> by [http://en.wikipedia.org/wiki/Electric_Cloud Electric Cloud], the Hudson CI server, and Team Foundation Server, [http://www.microsoft.com/visualstudio/en-us/scenarios/virtual-lab-management Lab Management], and Test Manager by Microsoft. Any of the software listed above can be used as the backbone for setting up the CI infrastructure. Additionally, different reporting services can be added to achieve additional goals. Below are the usual steps that are performed once the CI infrastructure is in place.


As a result you want to set up your test environment to be as exact a mimic of your production environment as possible. Use the same database software, with the same versions, use the same version of operating system. Put all the appropriate libraries that are in the production environment into the test environment, even if the system doesn't actually use them.


===Make it easy for anyone to get the latest executable===


[[File:EC.jpg|center]]
Most builds produce useful output, such as an executable program, a packaged [http://en.wikipedia.org/wiki/ZIP_(file_format) zip] file, or other artifacts. Many people may need to get access to the latest executable to be able to run it or just to see what changed last week. Many times developers are not able to find it because there is not a well known place where these files are stored and they spend many hours just looking for this information. It is essential to make sure there's a well known place where people can find the latest executable. It may be useful to put several executables in such a store. For the very latest you should put the latest executable to pass the commit tests - such an executable should be pretty stable providing the commit suite is reasonably strong.


===Everyone can see the results of the latest build===


Communication is critical to implement a good Continuous Integration system. Each team member needs to have easy and transparent access to the state of the system, last changes made and state of the mainline integration build. Visibility and transparency of the flow of information between team members are essential.


# Whenever new code or functionality is checked into the repository, an automated build process is started.
===Automate deployment===
# Once the product is built, unit tests are executed to make sure they all pass and that the new functionality has not broken the old functionality. Reports are generated for the results.
# Once the unit tests have completed and passed, automated smoke tests are performed. The application is usually deployed on a remote machine as a fresh copy and system testing is performed. Reports are generated for the results.
# Then additional reporting is performed on the code to see if it fulfills the required criteria (for example, code coverage > 90%, code complexity, dead code, duplicate code, coding standard enforcement).
# Lastly, when all four steps above have passed, additional metrics are generated for management to see, for example TTR to fix a bug, Time Between Failures (TBF), error counts, and warning reports.


If any of the above steps fails, an automated email is generated and sent to the targeted user to notify them that the new code submitted to the repository needs to be fixed. In this way teams get a quick response instead of waiting to hear from the validation team, which can take days compared to minutes through CI.
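
The numbered steps and the failure notification can be sketched as a small pipeline runner. Everything named here (<code>run_pipeline</code>, <code>meets_coverage</code>, the <code>notify</code> callback standing in for the automated email) is a hypothetical illustration; the 90% coverage threshold is the example criterion given in the steps above.

```python
def meets_coverage(coverage, threshold=0.90):
    """Example quality criterion: code coverage must exceed the threshold."""
    return coverage > threshold


def run_pipeline(steps, notify):
    """Run (name, check) pairs in order and stop at the first failure.

    steps: list of (step name, callable returning True on success).
    notify: callable taking a message; stands in for the automated email.
    Returns (names of completed steps, name of the failed step or None).
    """
    completed = []
    for name, check in steps:
        if not check():
            notify(f"CI step '{name}' failed: the submitted code needs to be fixed")
            return completed, name
        completed.append(name)
    return completed, None
```

A caller might pass steps such as <code>("build", ...)</code>, <code>("unit tests", ...)</code>, <code>("smoke tests", ...)</code> and <code>("coverage > 90%", lambda: meets_coverage(0.85))</code>; with 85% coverage the last step fails and exactly one notification is sent.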
To do Continuous Integration you need multiple environments, one to run commit tests, one or more to run secondary tests. Since you are moving executables between these environments multiple times a day, you'll want to do this automatically. So it's important to have scripts that will allow you to deploy the application into any environment easily.


===CI Reporting Metrics===
A natural consequence of this is that you should also have scripts that allow you to deploy into production with similar ease. You may not be deploying into production every day (although I've run into projects that do), but automatic deployment helps both speed up the process and reduce errors. It's also a cheap option since it just uses the same capabilities that you use to deploy into test environments.
The team can also generate reports that indicate the overall health of the application on the CI environment. This application health can be measured by the CI reporting metrics. Some examples of these metrics are shown below:<ref name="CIRef6">"Designing data warehouse for equipment management system" http://www.tandfonline.com/doi/abs/10.1080/00207540701222776?journalCode=tprs20#preview</ref>


#'''Time To Response (TTR):''' the elapsed time to fix the most recent broken build
Prefer synchronous integration, in which you wait for the integration to succeed, to asynchronous integration, in which a tool tests the integration for you. Synchronous integration requires fast builds, but ensures that they never break.
#'''Time Between Failures (TBF):''' the elapsed time between consecutive breakages (non-zero errors) of the same build
#'''Unit Test Count:''' the total number of unit tests on each automated build
#'''Unit Test Passed Count:''' the total number of unit tests that are passed on each automated build
#'''Unit Test Covered Lines:''' the count of the non-commented source lines scanned by the coverage tool that are exercised by statement-level automated unit testing on each automated build
#'''Validation Test Count:''' the total number of validation tests on each automated build
#'''Validation Test Passed Count:''' the total number of validation tests that are passed on each automated build
#'''Error Count:''' the total number of errors that occur on each automated build
#'''Warning Count:''' the total number of warnings that occur on each automated build
#'''Code Base Size:''' the total lines of code of the overall organization <ref name="CIRef7">"Code base size, complexity and language choice" http://ayende.com/blog/3070/code-base-size-complexity-and-language-choice</ref>
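
As a rough illustration of how some of these metrics could be derived from a build history, the sketch below computes TTR, TBF, and a unit test pass rate. The <code>Build</code> record and the function names are hypothetical, and the exact definitions (for instance, measuring a breakage from the first failing build of a streak) are assumptions of this sketch rather than a standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Build:
    finished_at: datetime
    passed: bool
    unit_tests: int
    unit_tests_passed: int


def time_to_response(builds):
    """TTR: time from the start of the most recent fixed breakage to its fix."""
    broken_since = None
    latest = None
    for build in builds:  # builds are assumed to be in chronological order
        if not build.passed and broken_since is None:
            broken_since = build.finished_at
        elif build.passed and broken_since is not None:
            latest = build.finished_at - broken_since
            broken_since = None
    return latest


def times_between_failures(builds):
    """TBF: elapsed time between the starts of consecutive breakages."""
    breakages = []
    previous_passed = True
    for build in builds:
        if not build.passed and previous_passed:
            breakages.append(build.finished_at)
        previous_passed = build.passed
    return [later - earlier for earlier, later in zip(breakages, breakages[1:])]


def unit_test_pass_rate(build):
    """Unit Test Passed Count divided by Unit Test Count (0.0 if no tests)."""
    return build.unit_tests_passed / build.unit_tests if build.unit_tests else 0.0
```

These functions only need timestamps and pass/fail flags, so they could be fed from whatever build log the CI server already keeps.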


These reporting metrics improve the transparency of the project. They help the team understand the build, design, and code quality of the overall application; they help the team understand trends in the development process and eliminate defects before they are introduced; and they help developers understand their individual impact on the project and improve their personal performance.
==Agile Project Development: An Example Workflow==
Let's assume Developer A has to change a piece of software. It doesn't really matter what the task is; for the moment let us assume it is small and can be done in a few hours.
Developer A begins by taking a copy of the current integrated source onto her local development machine, using a source code management system to check out a working copy from the mainline.


==Advantages and Disadvantages==
Now Developer A takes her working copy and does whatever she needs to do to complete her task. This will consist of both altering the production code and adding or changing automated tests. Once Developer A is done, she carries out an automated build on her development machine. This takes the source code in her working copy, compiles and links it into an executable, and runs the automated tests. Only if it all builds and tests without errors is the overall build considered to be good.
===Advantages===
Continuous Integration has the following advantages:<ref name="CIRef1"></ref><ref name="wiki">Wikipedia "Continuous integration" http://en.wikipedia.org/wiki/Continuous_integration</ref>
*Early warning of broken/incompatible code and conflicting changes
*Guarantees successfully compiled software
*Visible program reporting and problem tracking
*Easy to revert the code base to bug-free state
*Reduce development integration effort
*Immediate unit testing for all changes
*Constant availability of functioning code for demo or release purposes
*High impact environment upgrade with low maintenance
*Help developers to understand their individual impact to project, and improve their personal performance
===Disadvantages===
Continuous Integration has the following disadvantages:<ref name="CIRef1"></ref><ref name="wiki"></ref>
*Migration of internal development projects into a CI environment requires a lot of initial setup time and tight planning
*Well-developed test-suite is required for the automated build
*Costs for CI building machines
*Requires a good understanding of CI and discretion when setting up projects


Assuming that the CI environment is configured the way it is supposed to be, it doesn’t matter how experienced the developers are: once they understand how to build and commit code, they will always benefit from it. For example, they can find out immediately if the build is broken after they commit the code, and learn how to write high-quality code.
With a good build, Developer A can then think about committing her changes into the repository. The twist is that other people may, and usually have, made changes to the mainline before she gets a chance to commit. So first Developer A updates her working copy with their changes and rebuilds. If their changes clash with her changes, it will manifest as a failure either in the compilation or in the tests. In this case it's Developer A’s responsibility to fix this and repeat until she can build a working copy that is properly synchronized with the mainline. Once she has made her own build of a properly synchronized working copy, she can then finally commit her changes into the mainline, which then updates the repository.
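
The update, rebuild, and commit cycle that Developer A follows can be sketched with a toy in-memory repository. The <code>Mainline</code> class and the <code>integrate</code> function are hypothetical stand-ins written for this illustration; a real version control system has its own commands for updating and committing.

```python
class Mainline:
    """In-memory stand-in for the shared repository (hypothetical)."""

    def __init__(self):
        self.revision = 0

    def commit(self, based_on):
        # Refuse commits from a working copy that is behind the mainline.
        if based_on != self.revision:
            raise RuntimeError("working copy is out of date; update first")
        self.revision += 1
        return self.revision


def integrate(mainline, local_build):
    """Update, build locally, and commit once the working copy is in sync.

    local_build: callable taking the revision the working copy is based on
    and returning True when the local build and tests succeed.
    Returns the new mainline revision created by the commit.
    """
    while True:
        working_revision = mainline.revision  # update the working copy
        if not local_build(working_revision):
            raise RuntimeError("local build failed; fix it before committing")
        if working_revision == mainline.revision:
            # Nobody committed while we were building, so it is safe to commit.
            return mainline.commit(working_revision)
        # Someone else committed in the meantime: loop to update and rebuild.
```

The loop mirrors the text: if the mainline moved while the local build was running, the developer updates and rebuilds before committing.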


==Conclusion==
However, Developer A’s commit doesn't finish her work. At this point she builds again, but this time on an integration machine based on the mainline code. Only when this build succeeds can she say that her changes are done. There is always a chance that Developer A missed something on her machine and the repository wasn't properly updated. Only when her committed changes build successfully on the integration machine is the job done.
In the end, continuous integration infrastructure provides a platform for big companies to produce software that is cost effective and profitable, with fewer defects. Combining it with agile software development methodologies makes sense, as it gives big companies the option of having a well-tested, shippable product at the end of each sprint. Moreover, CI also solves the overall integration problem of big software projects, where small teams work individually on small components (including open source third-party software), by testing overall system functionality daily. This reasoning shows why big companies that adopt agile as their main software development strategy more often than not also adopt continuous integration as their system and component integration and testing infrastructure.
 
If a clash occurs between two developers, it is usually caught when the second developer to commit builds their updated working copy. If not the integration build should fail. Either way the error is detected rapidly. At this point the most important task is to fix it, and get the build working properly again. In a Continuous Integration environment you should never have a failed integration build stay failed for long. A good team should have many correct builds a day. Bad builds do occur from time to time, but should be quickly fixed.
 
The result of doing this is that there is a stable piece of software that works properly and contains few bugs. Everybody develops off that shared stable base and never gets so far away from that base that it takes very long to integrate back with it. Less time is spent trying to find bugs because they show up quickly.
 
 
[[File:CIFlowchart.jpeg | center]]
The above figure depicts a [http://falafel.com/ Workflow Diagram]
 
==Advantages==
'''Reduced risks.'''
By integrating many times a day, you can reduce risks on your project. Doing so facilitates the detection of defects, the measurement of software health and a reduction of assumptions.
* Defects are detected and fixed sooner.
* Health of software is measurable.
* Reduce assumptions.
 
Continuous Integration provides a safety net to reduce the risk that defects will be introduced into the code base. The following are some of the risks that Continuous Integration helps to mitigate:
* Lack of cohesive, deployable software
* Late defect discovery
* Low-quality software
* Lack of project visibility
 
'''Reduced repetitive manual processes.'''
Reducing repetitive processes saves time, costs and effort. These repetitive processes can occur across all project activities, including code compilation, database integration, testing, inspection, deployment and feedback. By automating Continuous Integration, you have a greater ability to ensure all of the following.
* The process runs the same way every time.
* An ordered process is followed. For example, you may run inspections (static analysis) before you run tests in your build scripts.
* The processes will run every time a commit occurs in the version control repository.
 
'''Generate deployable software at any time and at any place.'''
Continuous Integration can enable you to release deployable software at any point in time. From an outside perspective, this is the most obvious benefit of Continuous Integration. With Continuous Integration, you make small changes to the source code and integrate these changes with the rest of the code base on a regular basis. If there are problems, the project members are informed and the fixes are applied to the software immediately.
 
'''Enable better project visibility.'''
Continuous Integration provides the ability to notice trends and make effective decisions, and it helps provide the courage to innovate new improvements. Projects suffer when there is no real or recent data to support decisions, so everyone offers their best guesses. Typically, project members collect this information manually, making the effort burdensome and untimely. The result is that often the information is never gathered. A Continuous Integration system can however provide just-in-time information on the recent build status and quality metrics. Some Continuous Integration systems can also show defect rates and feature completion statuses. Because integrations occur frequently with a Continuous Integration system, the ability to notice trends in build success or failure, overall quality and other pertinent project information becomes possible.
 
'''Establish greater confidence in the software product from the development team.'''
Overall, effective application of Continuous Integration practices can provide greater confidence in producing a software product. With every build, your team knows that tests are run against the software to verify behavior, that project coding and design standards are met, and that the result is a functionally testable product. Since a Continuous Integration system can inform you when something goes wrong, developers and other team members have more confidence in making changes. Because Continuous Integration encourages a single-source point from which all software assets are built, there is greater confidence in its accuracy.
 
==Disadvantages==
'''Increased overhead in maintaining the Continuous Integration system.'''
This is usually a misguided perception, because the need to integrate, test, inspect and deploy exists regardless of whether you are using Continuous Integration.
 
'''Too much change.'''
Some may feel there are too many processes that need to change to achieve Continuous Integration for their legacy project. An incremental approach to Continuous Integration is most effective; first add builds and tests with a lower occurrence (for example, a daily build), then increase the frequency as everyone gets comfortable with the results.
 
'''Too many failed builds.'''
Typically, this occurs when developers are not performing a private build before committing their code to the version-control repository. It could be that a developer forgot to check in a file or had some failed tests. Rapid response is imperative when using Continuous Integration because of the frequency of changes.
 
'''Additional hardware/software costs.'''
To effectively use Continuous Integration, a separate integration machine should be acquired.
 
'''Developers should be performing these activities.'''
Sometimes management feels that Continuous Integration is just duplicating the activities that developers should be performing anyway. Yes, developers should be performing some of these activities, but they need to perform them more effectively and reliably in a separate environment. Leveraging automated tools can improve the efficiency and frequency of these activities.


==References==
<references/>
* [http://jamesshore.com/Agile-Book/continuous_integration.html Art of Agile: Continuous Integration]
 
* [http://www.atlassian.com/agile/practices/continuous-integration.jsp How do we Agile?]
==Expand your knowledge==
* [http://martinfowler.com/articles/continuousIntegration.html Martin Fowler: Continuous Integration]
* [http://en.wikipedia.org/wiki/Continuous_integration Wikipedia: Continuous Integration]
* [http://basementcoders.com/2009/02/surviving-integration-hell-or-how-not-to-handle-projects-outsourced-to-vendors/ Surviving Integration Hell]
* [http://www.espusa.com/whitepapers/continuous_integration_v1.0.pdf Why Continuous Integration]
* [http://www.javaworld.com/javaworld/jw-06-2007/jw-06-awci.html JavaWorld]
* [https://jazz.net/library/article/474/#1 Jazz Library]
* [http://falafel.com/ Workflow Diagram Courtesy]

Revision as of 19:17, 20 February 2013

Continuous Integration

Definition

"We keep our code ready to ship"

Continuous Integration is a software development practice where members of a team integrate their work frequently. This approach leads to significantly reduced integration problems and allows a team to develop cohesive software more rapidly.

Introduction

The software development landscape has changed in recent years. Previously, about one third of development time was spent writing exhaustive specifications, which then went through intensive reviews, and months passed before anything was released. The focus has since shifted: specifications are now sketches at most, and releases happen at least twice a month to get rapid feedback from the customer, allowing for a more focused product.

The big question, though, is: how can we developers adapt to this new world?

Continuous Integration aims to improve the quality of software, and to reduce the time taken to deliver it, by replacing the traditional practice of applying quality control only after all development is complete. Team members integrate their work frequently; usually each person integrates at least daily, leading to multiple integrations per day.

Continuous Integration is about integrating changes to the development project continuously and automatically. These changes usually come in the form of modifications and additions to the source code. When we keep them as small as possible and automatically run all available checks (from compilation up to the full test routines), we can detect most breaking issues as soon as they are introduced. Since the changes are small, it is much easier to isolate and fix the root cause.

Process Involved

Continuous integration is achieved by reducing the development friction involved in project integration. This is done with the help of:

  • Version Control Systems, which allow large and distributed teams to work together on the same project while sending their code changes (commits) to a central server. Each commit is stored on the server and can be reviewed or reverted.
  • Build Automation, that is, build tools and scripts that start the complete integration build on a local machine with a single mouse-click.
  • Unit Tests, which ensure that the code stays stable through all the changes (this normally comes with Test-Driven Development practices).
  • Code Quality Tests, which consistently enforce the development guidelines accepted by the team.
  • Regular Commits by developers, which reduce the amount of change involved in every integration.
  • Integrating Every Commit (as soon as it arrives) on a central server, which detects compilation failures or broken tests immediately. This is also the best way to deal with the "but it works on my machine" problem.
  • Publishing Integration Results created along with the build, which can simplify getting feedback from people outside the development team. Examples of these results are unit test results, performance metrics, install packages and code quality reports.

Basic Practices to follow

Maintain a single source repository

Software projects involve lots of files that need to be orchestrated together to build a product. Keeping track of all of these is a major effort, particularly when multiple people are involved. Tools that manage this - called Source Code Management tools, configuration management, version control systems, repositories, or various other names - are an integral part of most development projects. The foundation of a Continuous Integration system is a good source control management system that tracks and controls all of the files needed to build a product.

You must put everything required for a build in the source control system. One of the features of version control systems is that they allow you to create multiple branches, to handle different streams of development. This is a useful feature - but it's frequently overused and gets people into trouble. Keep your use of branches to a minimum. In particular have a mainline: a single branch of the project currently under development. Pretty much everyone should work off this mainline most of the time.

A comparison of the common version control systems

Automate the build

Getting the sources turned into a running system can often be a complicated process involving compilation, moving files around, loading schemas into the databases, and so on. However, like most tasks in this part of software development, it can be automated, and as a result it should be automated. Asking people to type in strange commands or click through dialog boxes is a waste of time and a breeding ground for mistakes.

To get an efficient Continuous Integration system you need to implement an automatic build process. The details of this step depend upon the implementation language and development environment and may involve compiling and linking and/or other processes that produce the build artifacts, such as the program executable.

A comparison of the common build automation systems

Make the build self-testing

Traditionally a build means compiling, linking, and all the additional stuff required to get a program to execute. A program may run, but that doesn't mean it does the right thing. Modern statically typed languages can catch many bugs, but far more slip through that net. A good practice that will help you to detect errors quickly is to include some automated tests in your build process. If possible, execute a sub-set of the system tests as part of the build to ensure that the build is suitable prior to committing resources to the full scope of system testing. While the level of testing may vary, focus on gaining confidence that the increment is of sufficient quality to establish a baseline for system testing. If some testing fails, the build should also fail.
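
The rule that a failing test must fail the build can be sketched as a small build script. This is only an illustration: the step names and commands below are hypothetical placeholders (each one just invokes the Python interpreter so the sketch runs anywhere), and a real project would call its own compiler and test runner instead.

```python
import subprocess
import sys


def run_build(steps):
    """Run each (name, command) step in order and stop at the first failure.

    Returns the name of the failing step, or None if every step passed.
    """
    for name, command in steps:
        result = subprocess.run(command)
        if result.returncode != 0:
            return name
    return None


# Hypothetical steps standing in for a real compile step and test suite.
STEPS = [
    ("compile", [sys.executable, "-c", "print('compiling')"]),
    ("unit tests", [sys.executable, "-c", "import sys; sys.exit(0)"]),
]

if __name__ == "__main__":
    failed = run_build(STEPS)
    if failed is not None:
        print(f"BUILD FAILED at step: {failed}")
        sys.exit(1)  # a non-zero exit code marks the whole build as failed
    print("BUILD OK")
```

Because the script exits with a non-zero status when any step fails, whatever launches it (a developer or an integration server) can treat a test failure exactly like a compilation failure.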

List of some common test automation tools

Everyone commits to the baseline every day

Integration is primarily about communication. Integration allows developers to tell other developers about the changes they have made. Frequent communication lets people learn about changes quickly as they develop. A good practice is to require developers to commit their changes to the main development stream at least once every day. Before delivering the files, each developer must verify that his/her working copy is consistent with the main development stream, resolve conflicts if they exist, run a local build and test, and, if this passes, commit the changes to the main development stream.

Every Commit Should Build the Mainline on an Integration Machine

Although developers must run local builds and tests on their own machines before delivering to the main development stream, there may be differences between developers' machines, or code integration errors. This is why it is very important to ensure that an integration build is run on an integration machine each time a developer commits changes. The developer who delivers the changes should be responsible for monitoring this integration build and fixing it if there is any problem. Working this way ensures that errors can be found and solved easily and quickly. Developers are expected to commit to fixing problems with the build quickly, resulting in a successful build every day. There are three approaches to triggering the integration build:

  • To let the developer manually request a build execution after she has committed some new code to the stream.
  • To ensure that an automatic build will be executed after a code commit to the stream (uses a continuous integration server).
  • To use a scheduler so that automatic builds are executed at fixed intervals, for example daily or weekly (uses a continuous integration server).

List of some popular continuous integration servers

Keep the build fast

It is important to minimize the time spent running the integration build each time a developer commits changes to the main development stream. This is not always possible, and complete integration builds can take a lot of time and resources. If this is your problem, a good practice is to split your integration build into stages: create a commit build that compiles and verifies the absence of critical errors when each developer commits changes to the main development stream, and create one or more secondary builds to run slower and less important tests. You need to ensure that commit builds are always successful, but a failed secondary build does not have to be so critical as to stop everything. This allows your developers to continue working while these secondary errors are fixed.
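
The split between a blocking commit build and non-blocking secondary builds can be sketched as follows. The `Stage` class, the stage names and the lambda checks are all assumptions made for illustration; real stages would run actual compilers and test suites.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str
    check: Callable[[], bool]  # returns True when the stage passes
    blocking: bool             # True for the commit build, False for secondary builds


def run_pipeline(stages: List[Stage]) -> dict:
    """Run stages in order.

    A failing blocking stage stops the pipeline; a failing secondary
    stage is only recorded so that work can continue.
    """
    report = {"passed": [], "failed": [], "stopped": False}
    for stage in stages:
        if stage.check():
            report["passed"].append(stage.name)
        else:
            report["failed"].append(stage.name)
            if stage.blocking:
                report["stopped"] = True
                break
    return report


# Hypothetical stages: the checks are hard-coded to show both outcomes.
stages = [
    Stage("commit build", lambda: True, blocking=True),
    Stage("slow integration tests", lambda: False, blocking=False),
    Stage("performance tests", lambda: True, blocking=False),
]
report = run_pipeline(stages)
print(report)
```

In this run the secondary failure is reported but does not stop the pipeline, which mirrors the idea that only the commit build has to stay green at all times.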

Test in a clone of the production environment

The point of testing is to flush out, under controlled conditions, any problem that the system will have in production. A significant part of this is the environment within which the production system will run. If you test in a different environment, every difference results in a risk that what happens under test won't happen in production.

As a result, you want to set up your test environment to be as exact a mimic of your production environment as possible. Use the same database software, with the same versions, and the same version of the operating system. Put all the appropriate libraries that are in the production environment into the test environment, even if the system doesn't actually use them.

Make it easy for anyone to get the latest executable

Most builds produce useful output, such as an executable program, a packaged zip file, or other artifacts. Many people may need access to the latest executable, whether to run it or just to see what changed last week. Developers often cannot find it because there is no well-known place where these files are stored, and they can spend many hours just looking for it. It is essential to make sure there is a well-known place where people can find the latest executable. It may be useful to put several executables in such a store. For the very latest, you should store the latest executable to pass the commit tests; such an executable should be pretty stable, provided the commit suite is reasonably strong.

Everyone can see the results of the latest build

Communication is critical to implementing a good Continuous Integration system. Each team member needs easy and transparent access to the state of the system, the last changes made and the state of the mainline integration build. Visibility and transparency of the flow of information between team members are essential.

Automate deployment

To do Continuous Integration you need multiple environments, one to run commit tests, one or more to run secondary tests. Since you are moving executables between these environments multiple times a day, you'll want to do this automatically. So it's important to have scripts that will allow you to deploy the application into any environment easily.

A natural consequence of this is that you should also have scripts that allow you to deploy into production with similar ease. You may not be deploying into production every day (although I've run into projects that do), but automatic deployment helps both speed up the process and reduce errors. It's also a cheap option since it just uses the same capabilities that you use to deploy into test environments.
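
A deployment script that works the same way for every environment might look like the sketch below. It is deliberately simplified: "deploying" here just means copying the build artifact into a directory, temporary directories stand in for real servers, and the environment names and the artifact name `app-1.0.zip` are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def deploy(artifact: Path, environments: dict, target: str) -> Path:
    """Copy the build artifact into the directory configured for `target`."""
    if target not in environments:
        raise ValueError(f"unknown environment: {target}")
    destination = Path(environments[target])
    destination.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(artifact, destination / artifact.name))


# Demo: temporary directories stand in for real test and production servers.
workspace = Path(tempfile.mkdtemp())
artifact = workspace / "app-1.0.zip"
artifact.write_text("pretend this is a packaged build")

environments = {
    "test": workspace / "test-server",
    "production": workspace / "production-server",
}

for name in environments:
    print(f"deployed to {name}: {deploy(artifact, environments, name)}")
```

The point of the design is that production is just another entry in the environment table, so the same code path that is exercised many times a day against test environments is the one used for a production release.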

Prefer synchronous integration, in which you wait for the integration to succeed, to asynchronous integration, in which a tool tests the integration for you. Synchronous integration requires fast builds, but ensures that they never break.

Agile Project Development: An Example Workflow

Let's assume Developer A has to do something to a piece of software. It doesn't really matter what the task is; for the moment let us assume it's small and can be done in a few hours. Developer A begins by taking a copy of the current integrated source onto her local development machine, using a source code management system to check out a working copy from the mainline.

Now Developer A takes her working copy and does whatever she needs to do to complete her task. This will consist of both altering the production code and adding or changing automated tests. Once Developer A is done, she carries out an automated build on her development machine. This takes the source code in her working copy, compiles and links it into an executable, and runs the automated tests. Only if it all builds and tests without errors is the overall build considered to be good.

With a good build, Developer A can then think about committing her changes into the repository. The twist is that other people may have, and usually have, made changes to the mainline before she gets a chance to commit. So first Developer A updates her working copy with their changes and rebuilds. If their changes clash with her changes, it will manifest as a failure either in the compilation or in the tests. In this case it's Developer A's responsibility to fix this and repeat until she can build a working copy that is properly synchronized with the mainline. Once she has made her own build of a properly synchronized working copy, she can finally commit her changes into the mainline, which then updates the repository.

However, Developer A's commit doesn't finish her work. At this point she builds again, but this time on an integration machine based on the mainline code. Only when this build succeeds can she say that her changes are done. There is always a chance that Developer A missed something on her machine and the repository wasn't properly updated. Only when her committed changes build successfully on the integration machine is the job done.

If a clash occurs between two developers, it is usually caught when the second developer to commit builds their updated working copy. If not, the integration build should fail. Either way the error is detected rapidly. At this point the most important task is to fix it and get the build working properly again. In a Continuous Integration environment you should never have a failed integration build stay failed for long. A good team should have many correct builds a day. Bad builds do occur from time to time, but should be quickly fixed.

The result of doing this is that there is a stable piece of software that works properly and contains few bugs. Everybody develops off that shared stable base and never gets so far away from that base that it takes very long to integrate back with it. Less time is spent trying to find bugs because they show up quickly.


The above figure depicts a Workflow Diagram
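
The update, build, commit and integration-build steps of this workflow can also be sketched in code. Everything here is a toy model made up for illustration: `Mainline` is an in-memory stand-in for a real repository, and `build` is a hypothetical check that fails whenever a file contains the text "BUG".

```python
class Mainline:
    """Toy stand-in for a shared repository: file contents plus a revision counter."""

    def __init__(self, files):
        self.files = dict(files)
        self.revision = 0

    def commit(self, files, base_revision):
        # Reject commits made from a stale working copy, forcing an update first.
        if base_revision != self.revision:
            return False
        self.files = dict(files)
        self.revision += 1
        return True


def integrate(mainline, my_changes, build, max_attempts=5):
    """Update, build locally, commit, then build again from the mainline."""
    for _ in range(max_attempts):
        base = mainline.revision
        working_copy = {**mainline.files, **my_changes}  # update plus local edits
        if not build(working_copy):
            return "local build failed: fix before committing"
        # With real concurrent developers, someone else may commit between
        # the update and this commit; in that case the loop updates and retries.
        if mainline.commit(working_copy, base):
            # The job is only done once the integration build also passes.
            return "done" if build(mainline.files) else "integration build failed"
    return "gave up: mainline kept changing"


def build(files):
    """Hypothetical build: passes when no file contains the marker text 'BUG'."""
    return all("BUG" not in text for text in files.values())


mainline = Mainline({"a.py": "x = 1"})
first = integrate(mainline, {"b.py": "y = 2"}, build)
second = integrate(mainline, {"c.py": "BUG"}, build)
print(first, "|", second)
```

The second change never reaches the mainline because its private build fails, which is the behaviour the workflow relies on to keep the shared base stable.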

Advantages

Reduced risks. By integrating many times a day, you can reduce risks on your project. Doing so facilitates the detection of defects, the measurement of software health and a reduction of assumptions.

  • Defects are detected and fixed sooner.
  • Health of software is measurable.
  • Reduce assumptions.

Continuous Integration provides a safety net to reduce the risk that defects will be introduced into the code base. The following are some of the risks that Continuous Integration helps to mitigate:

  • Lack of cohesive, deployable software
  • Late defect discovery
  • Low-quality software
  • Lack of project visibility

Reduced repetitive manual processes. Reducing repetitive processes saves time, costs and effort. These repetitive processes can occur across all project activities, including code compilation, database integration, testing, inspection, deployment and feedback. By automating Continuous Integration, you have a greater ability to ensure all of the following.

  • The process runs the same way every time.
  • An ordered process is followed. For example, you may run inspections (static analysis) before you run tests in your build scripts.
  • The processes will run every time a commit occurs in the version control repository.

Generate deployable software at any time and at any place. Continuous Integration can enable you to release deployable software at any point in time. From an outside perspective, this is the most obvious benefit of Continuous Integration. With Continuous Integration, you make small changes to the source code and integrate these changes with the rest of the code base on a regular basis. If there are problems, the project members are informed and the fixes are applied to the software immediately.

Enable better project visibility. Continuous Integration provides the ability to notice trends and make effective decisions, and it helps provide the courage to innovate new improvements. Projects suffer when there is no real or recent data to support decisions, so everyone offers their best guesses. Typically, project members collect this information manually, making the effort burdensome and untimely. The result is that often the information is never gathered. A Continuous Integration system can however provide just-in-time information on the recent build status and quality metrics. Some Continuous Integration systems can also show defect rates and feature completion statuses. Because integrations occur frequently with a Continuous Integration system, the ability to notice trends in build success or failure, overall quality and other pertinent project information becomes possible.
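
As a rough illustration of noticing a trend in build success, the sketch below compares the success rate of the most recent builds with that of earlier ones. The build history is invented sample data and the window size of five is an arbitrary assumption, not a recommendation.

```python
def success_rate(results):
    """Fraction of successful builds in a list of booleans (0.0 for an empty list)."""
    return sum(results) / len(results) if results else 0.0


def trend(results, window=5):
    """Success rate of the last `window` builds minus that of all earlier builds.

    A negative value suggests that builds have been failing more often lately.
    """
    recent, earlier = results[-window:], results[:-window]
    return success_rate(recent) - success_rate(earlier)


# Invented history, oldest build first: True is a passing build.
history = [True, True, False, True, True, True, False, False, True, False]

print(f"overall success rate: {success_rate(history):.0%}")
print(f"trend (recent minus earlier): {trend(history):+.2f}")
```

A Continuous Integration server collects this kind of data automatically on every build, which is what makes such comparisons cheap enough to look at regularly.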

Establish greater confidence in the software product from the development team. Overall, effective application of Continuous Integration practices can provide greater confidence in producing a software product. With every build, your team knows that tests are run against the software to verify behavior, that project coding and design standards are met, and that the result is a functionally testable product. Since a Continuous Integration system can inform you when something goes wrong, developers and other team members have more confidence in making changes. Because Continuous Integration encourages a single-source point from which all software assets are built, there is greater confidence in its accuracy.

Disadvantages

Increased overhead in maintaining the Continuous Integration system. This is usually a misguided perception, because the need to integrate, test, inspect and deploy exists regardless of whether you are using Continuous Integration.

Too much change. Some may feel there are too many processes that need to change to achieve Continuous Integration for their legacy project. An incremental approach to Continuous Integration is most effective; first add builds and tests with a lower occurrence (for example, a daily build), then increase the frequency as everyone gets comfortable with the results.

Too many failed builds. Typically, this occurs when developers are not performing a private build before committing their code to the version-control repository. It could be that a developer forgot to check in a file or had some failed tests. Rapid response is imperative when using Continuous Integration because of the frequency of changes.

Additional hardware/software costs. To effectively use Continuous Integration, a separate integration machine should be acquired.

Developers should be performing these activities. Sometimes management feels that Continuous Integration is just duplicating the activities that developers should be performing anyway. Yes, developers should be performing some of these activities, but they need to perform them more effectively and reliably in a separate environment. Leveraging automated tools can improve the efficiency and frequency of these activities.

References