CSC/ECE 517 Fall 2010/ch4 4h sk
What is Static Code Analysis?
- If one thing has proven true over time, it is that humans are fallible. We, by nature, make mistakes. As a result, our fallibility must be factored into the software we engineer and develop. Development teams discovered early on that code review was one of the best ways to find mistakes and bugs in their software. It proved to be a daunting task, however. Assembling teams to identify problems in the code base, and training them, consumed large amounts of time and valuable resources. Code review and bug discovery were clearly critical parts of software development, but teams needed a more efficient way to carry them out.
- In the 1970s, Stephen Johnson at Bell Laboratories wrote a program called Lint. Lint examined the source code of C programs that had compiled without errors in order to locate bugs the compiler had not detected. With this, static code analysis was born.
- Static code analysis refers to the use of tools that examine source code, without running it, to verify the quality and reliability of software systems quickly and efficiently. Many static code analysis tools exist for many different languages. Some are open source and some are commercial. Their scope and accuracy can vary considerably from one tool to the next. Some traits of these tools, however, are nearly universal.
How do Static Code Analysis Tools Identify Problems?
- While humans are fallible, at least we tend to be consistent about it. No matter the company or region of the world, software developers tend to fall into the same traps over and over. This creates patterns and known high-risk situations that can be identified programmatically. The key aspects of a program that nearly all static code analysis tools focus on are:
- Input validation and representation - Based on different key combinations and encodings of alphanumeric characters, which user flows put the code at risk of attack?
- API abuse - Does the caller violate the terms of the API?
- Security features - Are the security measures taken appropriate? Is the encryption used valid?
- Time and state - Are race conditions introduced as components seek to share state?
- Errors - Are there holes left in the code base, such as missing or incomplete error handling, that can be exploited?
- Code quality - Poor code quality introduces high-risk situations and unpredictable behavior. The system may be stressed in unexpected ways.
- Encapsulation - Are the boundaries between encapsulated code structures sound?
- Environment - Is there anything outside the code base that might still be a security risk?
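To make the idea of pattern-based detection concrete, here is a minimal sketch of a checker written with Python's standard `ast` module. It is not how any particular tool works. The rule set `RISKY_CALLS`, the function `find_issues`, and the `SAMPLE` program are all hypothetical names invented for this illustration. The sketch flags two recurring mistakes from the categories above: calls to `eval` or `exec` (input validation) and bare `except:` clauses (errors). The code under inspection is only parsed and never run, which is what makes the analysis static.

```python
import ast

# Hypothetical rule set for illustration: built-in calls treated as risky.
RISKY_CALLS = {"eval", "exec"}


def find_issues(source):
    """Return a sorted list of (line_number, message) pairs for the source text."""
    issues = []
    tree = ast.parse(source)  # parse only; the analyzed code is never executed
    for node in ast.walk(tree):
        # Input validation: eval/exec on untrusted data is a classic injection risk.
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in RISKY_CALLS):
            issues.append((node.lineno, f"call to {node.func.id}()"))
        # Errors: a bare "except:" silently swallows every failure.
        elif isinstance(node, ast.ExceptHandler) and node.type is None:
            issues.append((node.lineno, "bare except clause"))
    return sorted(issues)


# A small sample program containing both patterns (assumed for this example).
SAMPLE = '''
def load(expr):
    try:
        return eval(expr)
    except:
        return None
'''

if __name__ == "__main__":
    for line, message in find_issues(SAMPLE):
        print(f"line {line}: {message}")
```

Run as a script, the sketch reports the `eval` call on line 4 of the sample and the bare `except` on line 5. Real tools apply far larger rule sets and often track how data flows through a program, but the principle is the same: known high-risk patterns are encoded as rules and matched against the structure of the source.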