DMS Capabilities

From Expertiza_Wiki
Revision as of 05:03, 5 October 2014 by Yzhang44 (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

DMS provides a large set of robust, integrated facilities for building analysis and modification tools:

Full UNICODE-based parser and lexer generation with automatic error recovery. Accepts/generates files encoded in UTF-8 and UTF-16, 7 bit ASCII (ISO-646-US), 8 bit ASCII (ISO-8859-1 thru -16), EBCDIC (CP-37, CP-500), a number of Microsoft code pages (CP-1250 thru -1258), and Japanese Shift-JIS (CP-932 and JIS-0208). Standard support is included for reading multiple source files to enable INCLUDE file management and construct suitable preprocessors. The parser technology is based on GLR, and can handle any context-free language, even with ambiguities (much stronger than YACC/LALR). Proven on dozens of real languages.

Automatic construction of abstract (not concrete) syntax trees (non-value-carrying terminals and unit productions are suppressed; syntax-lists are converted into AST list nodes). Literal values (numbers, escaped strings) are converted to native, normalized binary values for fast internal manipulation. Source comments are captured and attached to AST nodes.

Pretty-printer generation converts ASTs back to nicely formatted legal source file form, according to a specified layout information, including source comments. In fidelity-printing mode, comments, spacing and lexical formatting information of unchanged code is preserved. Customizing allows generation of source code HTML form, or even as obfuscated source text. Trees may be output directly in XML format.

Multi-pass attribute-evaluator generation from grammar, to allow arbitrary analysis (including name/type analysis procedures) to be specified in terms of the concrete grammar provided.

Sophisticated symbol-table construction facilities for global, local, inherited, overloaded and other language-dependent name lookup and namespace management rules.

Control-flow graph construction including traditional entry/exit/action/condition nodes, but also fork/join nodes to model parallelism and/or indeterminate order (e.g., C sequence points). There predefined analyzers for constructing (post) dominators, and inducing structured control-flow regions. Additional machinery can compute compilation-unit local and system/global call graphs.

Data flow analysis framework, to allow data-flow analysis problems to be posed and answered, including predefined analyzers for constructing use-def and def-use chains. (See sample control and data flow graphs)

Points-to analysis for computing local or global points-to data, tested on systems of 13+ million lines of code.

Symbolic Range Analysis computing range constraints on program variables in terms of other variables. This is useful to detecting array-access errors, determining which switch case is selected, ... in conjunction with other analyses.

Multiple domains (notations/languages) can be represented at the same time. This enables processing or generating systems composed of parts from more than one domain (COBOL and JCL, C and Makefiles, etc.), and/or translation from one domain language to another.

Transforms and patterns can be written directly in surface-to-surface domain syntax form. Patterns can be matched against syntax trees and return bindings for parameter subtrees. Alternatively, procedural code can implement transforms, or refer to existing transforms and patterns to enable construction of very sophisticated transforms.

A full Associative/Commutative rewrite engine that operates on trees and DAGs, which can be used to apply sets of transforms.

A metaprogramming language, XCL, provides the ability to control the sequencing of the application of transforms and sets of transforms. (Future Release)

An algebraic specification subsystem can be used to specify arbitrary algebras (this is just a DMS domain!). The axioms can be treated as a set of rewrite rules. (This allows one to code arbitrary simplification procedures. (We have done simplification on boolean equations that are essentially 1 million terms in size; we have also modeled optimization of transistor [not gates!] circuits this way).