ASLs
• An Application-Specific Language (ASL) is a programming language that supports:
– a particular class of applications
– in a highly idiosyncratic domain
• In order to understand a particular ASL, it is usually necessary to understand the domain
• An ASL can be extremely effective when used for its intended purpose
Bard
• Bard is designed to perform pattern matching and manipulation of Abstract Syntax Trees (ASTs) in the reengineering domain.
• In order to understand Bard, it is necessary to understand a little about ASTs and reengineering
Outline of the talk
• This talk will cover:
– What reengineering is all about
– What an Abstract Syntax Tree (AST) is
– What an "idiom" is
– What the Idiom Tool does
– How Bard and the Idiom Tool work together
– The design decisions behind Bard
Legacy code
• The legacy code problem:
– About 80% of programmer time goes to maintenance
– New software methodologies are the preferred solution (that is, do it over from scratch!)
• BUT
– 40 years of working systems can't be abandoned.
Software reengineering
• Software reengineering is any activity that
– improves one's understanding of software
– improves the software documentation
– improves the software itself
– brings legacy code up to more current standards
– aids in a new implementation of the software
Why reengineer?
• We reengineer code for many reasons:
– add functionality
– ease maintenance
– move to new platforms and/or new languages
– facilitate reuse
– improve performance
– decrease cost of keeping obsolete systems
– combat software rot
The reengineering process
• Reverse engineering moves from "software assets" (code, documentation, etc.) to a higher level of abstraction
• Transformations modify the system at some level of abstraction
• Forward engineering moves from a higher level of abstraction back to code
Idiom tool
• Idiom Tool is the part of the reengineering suite that performs transformations on the AST
Creating the Abstract Syntax Tree
• Parsers convert programs written in various languages into their AST representations
• This AST can be manipulated by Idiom Tool, or used as input for various other tools (to create flowcharts, data flow diagrams, concordances, etc.)
• Idiom Tool is a programmable subsystem
• The programming language used is Bard
Gray
• The ASTs can be expressed in a language called Gray
• Gray is itself an ASL (Application-Specific Language)
• Since ASTs are trees, Gray borrows the syntax (but not the operators) of LISP
Example Gray code
• (If_Op (Less Identifier:n Number_Literal:0) (Return_Value (Negate Identifier:n)) Epsilon)
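Because Gray borrows Lisp's parenthesized syntax, reading it back into a tree needs little more than an s-expression reader. The Python sketch below is an illustration only, not part of ReEngineer: it assumes every atom is either a bare node type (like `Epsilon`) or a `Type:value` leaf, and it represents each node as a hypothetical `(type, value, children)` tuple. The name `read_gray` is made up; unbalanced parentheses raise `ValueError`.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical node representation: (node_type, leaf_value_or_None, children)
Node = Tuple[str, Optional[str], list]

# A token is a parenthesis or any run of characters that is not whitespace or a paren
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def read_gray(text: str) -> Node:
    """Parse one Gray expression into nested (type, value, children) tuples."""
    tokens: List[str] = TOKEN.findall(text)
    pos = 0

    def parse() -> Node:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            if pos >= len(tokens) or tokens[pos] in ("(", ")"):
                raise ValueError("expected a node type after '('")
            node_type = tokens[pos]
            pos += 1
            children = []
            while pos < len(tokens) and tokens[pos] != ")":
                children.append(parse())
            if pos >= len(tokens):
                raise ValueError("missing ')'")
            pos += 1  # consume ')'
            return (node_type, None, children)
        if tok == ")":
            raise ValueError("unbalanced ')'")
        # Leaf atom: "Identifier:n" -> ("Identifier", "n", []); "Epsilon" -> ("Epsilon", None, [])
        node_type, sep, value = tok.partition(":")
        return (node_type, value if sep else None, [])

    tree = parse()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return tree


tree = read_gray("(If_Op (Less Identifier:n Number_Literal:0) "
                 "(Return_Value (Negate Identifier:n)) Epsilon)")
print(tree[2][2])   # ('Epsilon', None, [])
```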
Idioms
• An idiom is a conventional way of expressing an idea
• Programming languages, like natural languages, have idioms
• Examples:
– for (i = 0; i < n; i++) A[i] = 0;
– temp = x; x = y; y = temp;
Idioms change
• Idioms vary over time and across languages
– Arrays may be zero-based or one-based
– FORTRAN used parallel arrays to simulate structs
– Languages may use array indices to simulate pointers
– C lacks true multidimensional arrays (it uses arrays of arrays)
– No early languages were object-oriented
– COBOL typically represented years as two digits
Recap
• To summarize,
– ReEngineer is a tool that uses
– parsers to translate programs in various languages into
– Abstract Syntax Trees, which can be represented in a linear fashion by
– Gray code, and which can be manipulated by
– Idiom Tool, which uses programs written in
– the Bard language.
Regular expressions
• Recognizing idioms in a program is a matter of recognizing certain kinds of patterns
• Regular expressions (regexps) are a standard, well-developed mechanism for doing pattern recognition on strings
• Regular expressions are used by grep, Perl, Tcl, awk, Python, vi, emacs, and a number of other languages and tools
Common regexp operators
Expression    Meaning
abc           The literal characters abc
.             Any single character
[^x]          Negation: any character but x
x?            Optional x
x*            Zero or more x's
More regexp operators
Expression    Meaning
x+            One or more x's
(x|y)         Either x or y
xy            x followed by y
^x            x at beginning of string
\1            Variable (backreference to whatever the first group matched)
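Each operator in the two tables can be tried out with Python's `re` module, which accepts them in the same notation. The pattern and sample strings below are made up for illustration; every row pairs a string the pattern finds with one it does not.

```python
import re

# (description, pattern, a string it matches, a string it does not match)
EXAMPLES = [
    ("literal abc",     r"abc",          "abc",    "abd"),
    ("any character .", r"a.c",          "axc",    "ac"),
    ("negation [^x]",   r"[^x]",         "y",      "x"),
    ("optional x?",     r"colou?r",      "color",  "colouur"),
    ("zero or more x*", r"ab*",          "a",      "b"),
    ("one or more x+",  r"ab+",          "abbb",   "a"),
    ("either (x|y)",    r"(cat|dog)",    "dog",    "cow"),
    ("x followed by y", r"xy",           "xy",     "yx"),
    ("anchor ^x",       r"^x",           "xyz",    "axyz"),
    # \1 behaves like a variable: it must repeat exactly what group 1 matched.
    ("variable \\1",    r"(\w+):=\1\+1", "x:=x+1", "x:=y+1"),
]


def check(pattern: str, text: str) -> bool:
    """True if the pattern matches somewhere in text (re.search semantics)."""
    return re.search(pattern, text) is not None


for name, pattern, good, bad in EXAMPLES:
    print(f"{name:16} {good!r:10} -> {check(pattern, good)}   {bad!r:10} -> {check(pattern, bad)}")
```

The last row previews a point made later in the talk: recognizing x:=x+1 requires a variable that must match the same thing twice.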
Regular expressions in Bard
• Regular expressions in Bard are modelled after regular expressions in grep
• Because we are doing pattern matching on ASTs rather than strings,
– Literals are represented by Gray expressions,
– Regular expression syntax must be modified to accommodate arbitrary Gray expressions, and
– Regular expressions may occur within Gray expressions, and vice versa
Some regexps in Bard
Expression                                    Meaning
(assignment identifier:x number_literal:0)    The literal assignment statement x=0
?                                             Any node or subtree
{not x}                                       Negation: any node or subtree but x
{opt x}                                       Optional x
{* x}                                         Zero or more x's
More regexps in Bard
Expression    Meaning
{+ x}         One or more x's
{x | y}       Either x or y
x y           x followed by y
?x            Variable x
$x            Variable sequence x
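To make the tree-based operators concrete, here is a small Python sketch of how they might be matched with backtracking. It is not Bard's implementation: nodes are hypothetical `(type, value, children)` tuples, operators are encoded as Python tuples such as `("not", p)` or `("*", p)`, `ANY` stands for `?`, and the repetition operators `{opt x}`, `{* x}` and `{+ x}` are matched only against sequences of sibling subtrees. Variables (`?x`, `$x`) are omitted here.

```python
from typing import List, Optional, Tuple

Node = Tuple[str, Optional[str], list]     # (type, value, children)
ANY = "?"                                   # matches any node or subtree


def match_node(pat, node: Node) -> bool:
    """Match a single-node pattern against one subtree."""
    if pat == ANY:
        return True
    op = pat[0]
    if op == "not":                          # {not x}
        return not match_node(pat[1], node)
    if op == "or":                           # {x | y}
        return match_node(pat[1], node) or match_node(pat[2], node)
    if op == "node":                         # literal: type, value, child patterns
        _, ntype, value, child_pats = pat
        return node[0] == ntype and node[1] == value and match_seq(child_pats, node[2])
    raise ValueError(f"not a single-node pattern: {pat!r}")


def match_seq(pats: list, nodes: List[Node]) -> bool:
    """Match a pattern sequence against siblings; "x y" means x followed by y."""
    if not pats:
        return not nodes
    first, rest = pats[0], pats[1:]
    if isinstance(first, tuple) and first[0] in ("opt", "*", "+"):
        op, inner = first
        lo = 1 if op == "+" else 0               # {+ x} needs at least one
        hi = 1 if op == "opt" else len(nodes)    # {opt x} allows at most one
        # Try each allowed repetition count, longest first, backtracking on failure.
        for k in range(min(hi, len(nodes)), lo - 1, -1):
            if all(match_node(inner, n) for n in nodes[:k]) and match_seq(rest, nodes[k:]):
                return True
        return False
    return bool(nodes) and match_node(first, nodes[0]) and match_seq(rest, nodes[1:])


# (assignment identifier:x number_literal:0): the literal statement x=0
X_EQ_0 = ("node", "assignment", None,
          [("node", "identifier", "x", []), ("node", "number_literal", "0", [])])
stmt = ("assignment", None, [("identifier", "x", []), ("number_literal", "0", [])])
print(match_node(X_EQ_0, stmt))                                        # True
print(match_node(("node", "assignment", None, [ANY, ("not", ANY)]), stmt))  # False
```

Trying the longest repetition first mirrors the greedy behavior of string regexps; backtracking then gives up repetitions if the rest of the sequence cannot match.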
Inner and outer languages
• Regular expressions are not enough; we also need some way to control pattern matching
• We also need to manipulate the parts of the AST found by pattern matching
• Solution: wrap a more-or-less conventional "outer language" around the "inner language" of regular expressions
• This is also the approach taken in Perl
Conventional statements in Bard
• Conventional statements in Bard include:
– assignment statements
– if statements
– while loops
– print statements
– function and procedure calls
– calls to native (Ada) code
– tracing and debugging facilities
Special-purpose statements in Bard
• Examining and manipulating the AST:
– match pattern [at position]
– insert subtree as relative-position
  • for example, insert identifier:x as first child;
– delete position
– replace position with subtree
– find pattern [up to position]
– go to position
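These statements imply a current position (a cursor) in a mutable tree. The Python sketch below shows one hypothetical way to model `go to`, `insert ... as first child`, `replace` and `delete`; the `Node` and `Cursor` classes and their method names are illustrative assumptions, not Bard's runtime.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    kind: str
    value: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def add(self, child: "Node", index: Optional[int] = None) -> "Node":
        """Attach child at index (default: as last child) and set its parent link."""
        child.parent = self
        if index is None:
            self.children.append(child)
        else:
            self.children.insert(index, child)
        return child


class Cursor:
    """The current position in the AST, loosely like the target of 'go to'."""

    def __init__(self, node: Node) -> None:
        self.node = node

    def go_to_parent(self) -> None:
        if self.node.parent is None:
            raise ValueError("already at the root")
        self.node = self.node.parent

    def go_to_child(self, i: int) -> None:
        self.node = self.node.children[i]

    def insert_first_child(self, subtree: Node) -> None:
        """insert subtree as first child"""
        self.node.add(subtree, 0)

    def _index_in_parent(self) -> int:
        parent = self.node.parent
        if parent is None:
            raise ValueError("the root has no parent")
        # Compare by identity so that structurally equal siblings are not confused.
        return next(i for i, c in enumerate(parent.children) if c is self.node)

    def replace_with(self, subtree: Node) -> None:
        """replace position with subtree; the cursor moves to the new subtree"""
        i = self._index_in_parent()
        parent = self.node.parent
        parent.children[i] = subtree
        subtree.parent = parent
        self.node.parent = None
        self.node = subtree

    def delete(self) -> None:
        """delete position; the cursor moves to the former parent"""
        i = self._index_in_parent()
        parent = self.node.parent
        del parent.children[i]
        self.node.parent = None
        self.node = parent


root = Node("Block")
root.add(Node("Null_Statement"))
assignment = root.add(Node("Assignment"))
cur = Cursor(assignment)
cur.insert_first_child(Node("Identifier", "x"))
cur.go_to_parent()
cur.go_to_child(0)
cur.delete()
print([c.kind for c in root.children])   # ['Assignment']
```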
Efficiency
• ReEngineer has been used to process hundreds of thousands of lines of code
• Pattern-matching languages are, by themselves, not efficient enough for this purpose
• Bard procedures, in particular, are comparatively slow to execute
• Bard procedures are called only when there is a high likelihood that pattern matching will succeed
Fast pre-screening of nodes
• Each Bard routine specifies the types of nodes at which it might be applicable
• Idiom Tool walks the AST and calls Bard when it finds such a node
• Since Idiom Tool is written in Ada, it can screen nodes very, very fast
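The pre-screening idea amounts to an index from node type to the procedures that might apply. A hypothetical Python sketch (Idiom Tool itself is written in Ada): procedures register the node types they care about, and a preorder walk consults a dictionary before calling anything, so most nodes cost only one lookup. Nodes are assumed to be `(type, value, children)` tuples, and `Dispatcher` is a made-up name.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

Node = Tuple[str, Optional[str], list]        # (type, value, children)
Procedure = Callable[[Node], None]


class Dispatcher:
    """Index procedures by the node types they declare, then walk the tree."""

    def __init__(self) -> None:
        self.by_type: Dict[str, List[Procedure]] = defaultdict(list)

    def register(self, proc: Procedure, *node_types: str) -> None:
        for t in node_types:
            self.by_type[t].append(proc)

    def walk(self, node: Node) -> None:
        """Preorder traversal; a node reaches a procedure only if its type is indexed."""
        for proc in self.by_type.get(node[0], []):   # one dictionary lookup per node
            proc(node)
        for child in node[2]:
            self.walk(child)


seen: List[str] = []
d = Dispatcher()
d.register(lambda n: seen.append(n[1]), "Identifier")
tree = ("Assignment", None, [
    ("Identifier", "x", []),
    ("Integer_Addition", None, [("Identifier", "y", []), ("Numeric_Literal", "1", [])]),
])
d.walk(tree)
print(seen)                                   # ['x', 'y']
```

The recursive walk keeps the sketch short; a very deep AST would call for an explicit stack instead.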
The structure of Bard procedures
• A "procedure" in Bard, like a "rule" in an expert system, consists of two parts:
– The test part determines whether this particular procedure is applicable at this point in the tree
  • Tests must not have side effects
– The action part consists of arbitrary Bard code
  • Actions may modify the AST and/or may collect and store information about the AST
Skeleton of a Bard procedure
   procedure name (pass, node_types)
      tests;     -- may not have side effects
   commit;       -- separates tests from actions
      actions;   -- manipulate the AST
   end procedure;
Bard example
idiom remove_null_statements;
   -- Remove null statements, except empty loop bodies.
   procedure remove_null_statements (1, Null_Statement);
      go to first child of parent;
      not match While_Op;
      not match Do_While;
      not match C_For_Loop;
   commit;
      go to @trigger;
      delete;
   end procedure;
end idiom;
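A rough Python analogue of this idiom shows the header / test / commit / action structure. It is a sketch under simplifying assumptions, not a translation of Bard: the `Node` and `Procedure` classes and `run_pass` are hypothetical, and the test is reduced to "the null statement's parent is not a `While_Op`, `Do_While` or `C_For_Loop` node", which may not match exactly how Bard's `go to first child of parent` navigates.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    kind: str
    children: List["Node"]


@dataclass
class Procedure:
    name: str
    pass_no: int                                       # the pass in which it is active
    node_types: Tuple[str, ...]                        # cheap pre-screen on node type
    test: Callable[[Node, Optional[Node]], bool]       # must not modify anything
    action: Callable[[Node, Optional[Node]], None]     # runs only after the test commits


def run_pass(root: Node, procs: List[Procedure], pass_no: int) -> None:
    """Preorder walk; a procedure fires if pass, node type, and test all agree."""

    def detached(node: Node, parent: Optional[Node]) -> bool:
        return parent is not None and all(c is not node for c in parent.children)

    def visit(node: Node, parent: Optional[Node]) -> None:
        for p in procs:
            if p.pass_no == pass_no and node.kind in p.node_types and p.test(node, parent):
                p.action(node, parent)                 # "commit": now the AST may change
                if detached(node, parent):             # the action deleted this node
                    return
        for child in list(node.children):              # copy: actions may edit the list
            visit(child, node)

    visit(root, None)


LOOPS = {"While_Op", "Do_While", "C_For_Loop"}


def not_a_loop_body(node: Node, parent: Optional[Node]) -> bool:
    # Simplifying assumption: the parent's type tells us whether this is a loop body.
    return parent is not None and parent.kind not in LOOPS


def delete_node(node: Node, parent: Optional[Node]) -> None:
    assert parent is not None                          # guaranteed by the test above
    parent.children[:] = [c for c in parent.children if c is not node]


remove_null_statements = Procedure(
    "remove_null_statements", 1, ("Null_Statement",), not_a_loop_body, delete_node)

prog = Node("Block", [
    Node("Null_Statement", []),
    Node("While_Op", [Node("Condition", []), Node("Null_Statement", [])]),
    Node("Assignment", []),
])
run_pass(prog, [remove_null_statements], 1)
print([c.kind for c in prog.children])                 # ['While_Op', 'Assignment']
```

Keeping the test free of side effects means a failed test leaves nothing to undo; only after the test succeeds does the action touch the tree.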
Diffuse patterns
• A pattern is compact if it is a single recognizable unit
• A pattern is diffuse if it consists of multiple pieces scattered throughout the AST
• Diffuse patterns may overlap in complex ways
• The problem is to collect the various pieces
• This is not easily solved with only simple data structures such as arrays
Dealing with diffuse patterns
• Idiom Tool makes multiple passes over the database
– Each Bard procedure indicates the pass for which it is active
– This allows some procedures to collect information and other procedures to process the information
• Idiom Tool uses a fact base to store information between passes
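A small Python sketch shows why separate passes help with diffuse patterns: a procedure definition can appear before its call sites, so a single preorder walk cannot yet tell whether it is ever called. Here pass 1 only records facts and pass 2 only reads them. The plain dictionary standing in for the fact base, the `(type, value, children)` node tuples, and all function names are simplifications for illustration.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Node = Tuple[str, Optional[str], list]                 # (type, value, children)
Proc = Callable[[Node, dict], None]


def preorder(node: Node) -> Iterator[Node]:
    yield node
    for child in node[2]:
        yield from preorder(child)


def run_passes(tree: Node, passes: Dict[int, List[Proc]]) -> dict:
    """Walk the whole tree once per pass; the fact base survives between passes."""
    facts: dict = {}
    for pass_no in sorted(passes):
        for node in preorder(tree):
            for proc in passes[pass_no]:
                proc(node, facts)
    return facts


def note_calls(node: Node, facts: dict) -> None:      # pass 1: collect information
    if node[0] == "Call":
        facts.setdefault("called", set()).add(node[1])


def flag_uncalled(node: Node, facts: dict) -> None:   # pass 2: use what pass 1 found
    if node[0] == "Procedure" and node[1] not in facts.get("called", set()):
        facts.setdefault("uncalled", []).append(node[1])


program = ("Program", None, [
    ("Procedure", "helper", []),                       # defined before its only call
    ("Procedure", "main", [("Call", "helper", [])]),
    ("Procedure", "old_code", []),
])
facts = run_passes(program, {1: [note_calls], 2: [flag_uncalled]})
print(facts["uncalled"])                               # ['main', 'old_code']
```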
Modifying the fact base
• Bard's fact base is modeled after that of Prolog
• Bard allows negative assertions as well as positive assertions
• assert fact;
• deny fact;
• retract fact;
Interrogating the fact base
• Because Bard does not use the Closed World Assumption, facts may be "true", "false", or (unlike Prolog) "unknown"
• true fact;
• false fact;
• known fact;
• unknown fact;
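A minimal Python model of the three-valued fact base, assuming that `assert` records a fact as true, `deny` records it as false, `retract` removes whatever was recorded (so the fact becomes unknown), and a later assert or deny simply overwrites an earlier one. The class, its method names (`assert_fact`, since `assert` is a Python keyword) and the tuple form of facts are all hypothetical.

```python
from typing import Dict, Hashable


class FactBase:
    """Open-world fact base: a fact is true, false, or (if never recorded) unknown."""

    def __init__(self) -> None:
        self._facts: Dict[Hashable, bool] = {}

    def assert_fact(self, fact: Hashable) -> None:   # assert fact;
        self._facts[fact] = True

    def deny(self, fact: Hashable) -> None:          # deny fact;
        self._facts[fact] = False

    def retract(self, fact: Hashable) -> None:       # retract fact;
        self._facts.pop(fact, None)

    def is_true(self, fact: Hashable) -> bool:       # true fact
        return self._facts.get(fact) is True

    def is_false(self, fact: Hashable) -> bool:      # false fact
        return self._facts.get(fact) is False

    def is_known(self, fact: Hashable) -> bool:      # known fact
        return fact in self._facts

    def is_unknown(self, fact: Hashable) -> bool:    # unknown fact
        return fact not in self._facts


fb = FactBase()
fact = ("modifies", "swap", "x")        # hypothetical fact: procedure swap modifies x
print(fb.is_unknown(fact))              # True (a closed-world system would call it false)
fb.deny(fact)
print(fb.is_false(fact), fb.is_known(fact))   # True True
fb.retract(fact)
print(fb.is_unknown(fact))              # True
```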
Multiple occurrences
• It is sometimes necessary to recognize multiple occurrences of the same subtree.
– For example, to recognize Pascal statements of the form x:=x+1, you have to realize that the identifier x is repeated
• (Assignment Identifier:?id (Integer_Addition Identifier:?id Numeric_Literal:1))
Unification in Bard
• When pattern matching variable ?id against value v,
– If ?id has no prior value, it is unified with v
– If ?id has a prior value, then the match succeeds if and only if ?id == v
– If a test succeeds, all unifications performed during the test are retained
– If a test fails, all unifications performed during the test are discarded
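The retain-or-discard rule can be implemented with a trail, as in Prolog implementations: record each new binding, and when a test fails, undo bindings back to the mark taken when the test started. A minimal Python sketch, assuming patterns and nodes are `(type, value, children)` tuples, that a value beginning with `?` is a variable, and (a simplification) that variables stand only for leaf values such as identifier names rather than whole subtrees. `Unifier` and its methods are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

Node = Tuple[str, Optional[str], list]              # (type, value, children)


class Unifier:
    def __init__(self) -> None:
        self.bindings: Dict[str, str] = {}
        self.trail: List[str] = []                   # variables bound, in order

    def bind_or_check(self, var: str, value: str) -> bool:
        if var in self.bindings:                     # prior value: succeed iff equal
            return self.bindings[var] == value
        self.bindings[var] = value                   # no prior value: unify with value
        self.trail.append(var)
        return True

    def match(self, pat: Node, node: Node) -> bool:
        ptype, pval, pkids = pat
        ntype, nval, nkids = node
        if ptype != ntype or len(pkids) != len(nkids):
            return False
        if pval is not None and pval.startswith("?"):
            if nval is None or not self.bind_or_check(pval, nval):
                return False
        elif pval != nval:
            return False
        return all(self.match(p, n) for p, n in zip(pkids, nkids))

    def test(self, pat: Node, node: Node) -> bool:
        """Run one test; keep its bindings on success, undo them on failure."""
        mark = len(self.trail)
        if self.match(pat, node):
            return True
        while len(self.trail) > mark:                # unwind bindings made by this test
            del self.bindings[self.trail.pop()]
        return False


# (Assignment Identifier:?id (Integer_Addition Identifier:?id Numeric_Literal:1))
X_PLUS_1 = ("Assignment", None, [
    ("Identifier", "?id", []),
    ("Integer_Addition", None, [("Identifier", "?id", []), ("Numeric_Literal", "1", [])]),
])


def stmt(lhs: str, rhs: str) -> Node:
    """Build the AST for  lhs := rhs + 1  (hypothetical helper)."""
    return ("Assignment", None, [
        ("Identifier", lhs, []),
        ("Integer_Addition", None, [("Identifier", rhs, []), ("Numeric_Literal", "1", [])]),
    ])


u = Unifier()
print(u.test(X_PLUS_1, stmt("x", "y")), u.bindings)   # False {}
print(u.test(X_PLUS_1, stmt("x", "x")), u.bindings)   # True {'?id': 'x'}
```

In the failing case, `?id` is first bound to `x` and then contradicted by `y`; the trail lets the test discard that partial binding, so the next test starts clean.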
Summary I
• Idiom Tool does multiple preorder traversals of the AST; at each node it may trigger one or more Bard procedures
• A Bard procedure consists of
– a header that specifies the pass and the appropriate node types
– a test part that performs any additional tests
– the keyword commit, and
– an action part
Summary II
• Special features of Bard include
– an "inner language" for pattern matching
– procedures modeled after rules in expert systems
– powerful tests that can comprise arbitrary actions
– unification and unification unwinding
– an integrated fact base
• These features make Bard far better suited to its purpose than a general-purpose language such as Ada
Summary III
• Bard has been used
– in a Pascal-to-C translation system
– to assist in converting CMS-2 to C
– in a prototype Y2K system
Current status of Bard
• Currently owned by either Lockheed or Unisys
• No longer under active development
• Lost and presumed dead as a result of corporate mergers and reorganizations