EPFL - Swiss Federal Institute of Technology
Computer Science Department, Software Engineering Laboratory
Ada Text Formatter using Gramact
Alexandros Pappas
23 March 1995
A text formatter for Ada83 and Ada95 programs was written using the Gramact
programs of the LGL Components Library.
Introduction
Gramact provides a lexical and syntax analyser with which one can
easily write a parser for a given grammar. The grammar is specified
in an EBNF-like notation called GRAMOL. In addition to the grammar,
one can also specify, some actions to execute, during the parsing
process.
Here we wrote the grammar of Ada83 and Ada95, and "decorated" them
with actions for text formatting. We also wrote a program which
executes these actions given a decorated grammar and an input
program.
The advantage of using Gramact, is, that one can easily modify the
text formatter to his own needs and style, simply by editing the
decoration of the Ada83 or Ada95 grammar, and by changing the
implementation of the actions, if needed.
A very brief introduction to GRAMOL
GRAMOL is like EBNF. The main addition is a construct
to easily describe lists, with or without separators:
{+ b +} is equivalent to b { b } in EBNF and
{+ b $ "," +} is equivalent to b { "," b } in EBNF.
Actions can be inserted anywhere in a rule and have the
form:
<+action+>
Writing the Grammar
Left recursion and ambiguities
The grammar must not be left recursive in order to be accepted by
Gramact. That means, that rules like:
A = A b A | c.
are not allowed. The above rule can be rewritten as:
A = c {+ b A +}.
Also, the grammar, must have as few ambiguities as possible.
That means, constructs like:
A = B C D E G
H = B C D F G
should be grouped to one:
AorH = B C D (E | F) G.
Differences between Ada83 and Ada95
Although there are not many significant differences in the syntax,
there are 278 rules describing the grammar of Ada95 and only 180 for
Ada83. A sample of rules added in Ada95 is listed below.
basic_declaration:
abstract_subprogram_declaration.
type_declaratrion:
private_extension_declaration.
object_declaration:
single_task_declaration | single_protected_declaration.
derived_type_definition:
record_extension_part.
access_type_definition:
access_to_subprogram_definition.
proper_body:
protected_body.
name:
direct_name | type_conversion
| explicit_dereference | function_call.
aggregate:
record_aggregate | extension_aggregate | array_aggregate.
simple_statement:
requeue_statement.
renaming_declaration:
generic_renaming_declaration.
select_statement:
asynchronous_select.
generic_formal_parameter_declaration:
formal_object_declaration
| formal_type_declaration
| formal_subprogram_declaration
| formal_package_declaration.
new keywords:
abstract aliased protected requeue tagged until
Given these differences, we found that it is easier, to write the
syntax Ada95 from scratch, than to try to modify the syntax of Ada83.
In the following chapter, we present the difficulties in writing the
syntax of Ada95.
Difficulties in writing the syntax of Ada95
Fortunately, most rules are not left recursive. The one which is left
recursive is:
4.1:
name = direct_name | explicit_dereference
| indexed_component | slice
| selected_component | attribute_reference
| type_conversion | function_call
| CHARACTER_LITERAL.
explicit_dereference = name "." "ALL".
indexed_component = prefix "(" {+ expression $ "," +} ")".
= name indexed_component_body.
slice = prefix "(" discrete_range ")".
= name slice_body.
selected_component = prefix "." selector_name.
= name selected_component_body.
attribute_reference = prefix "'" attribute_designator.
= name attribute_designator_body
type_conversion = subtype_mark "(" expression ")".
= name type_conversion_body.
function_call = name | prefix actual_parameter_part.
= name [ actual_parameter_part ]
= name [ function_call_body ].
prefix = name.
subtype_mark = name.
Because all elements except direct_name and character_literal
start with name, we can rewrite name as following:
name = ( direct_name | CHARACTER_LITERAL )
{+ explicit_dereference_body
| indexed_component_body | slice_body
| selected_component_body | attribute_reference_body
| type_conversion_body | [ function_call_body ]
+}.
= ( direct_name | CHARACTER_LITERAL )
{ explicit_dereference_body
| indexed_component_body | slice_body
| selected_component_body | attribute_reference_body
| type_conversion_body | function_call_body
}.
indexed_component_body and type_converstion_body are a subset of
function_call_body, so we can eliminate these two. The problem is,
that slice_body and function_call_body are very similar, and the parser
needs a very long time to distinguish them. Here we write the rules
for slice_body and function_call_body:
slice_body = "(" discrete_range ")".
discrete_range = subtype_indication | range.
subtype_indication = subtype_mark [ constraint ].
= name [ subtype_indication_body ].
subtype_mark = name.
range = range_attribute_reference
| simple_expression ".." simple_expression.
range_attribute_reference = prefix "'" range_attribute_designator.
= name range_attribute_reference_body
range_attribute_designator = "RANGE" [ "(" expression ")" ].
function_call_body = actual_parameter_part.
actual_parameter_part = "(" {+ parameter_association $ "," +} ")".
parameter_association = [ selector_name "=>" ]
explicit_actual_parameter.
explicit_actual_parameter = expression.
slice_body = "(" ( name [ subtype_indication_body ]
| name range_attribute_reference_body
| simple_expression ".."
simple_expression
) ")".
function_call_body = "(" {+ [ selector_name "=>" ]
expression $ "," +} ")"
An example where slice_body and function_call_body are the same is:
(A(A(A))).
An example where they are similar: (A(A(A))range(A) is slice_body but
(A(A(A)), A) is function_call_body.
In order to unite slice_body and function_call_body we extend slice_body to
a superset of slice_body by replacing name with expression, because
every expression can be a name. Doing this for a compiler would be
unacceptable, but for a text formatter, it is not so important, if
we accept a small superset of Ada95. This way, the similarities
are clearly visible now:
slice_body = "(" expression
( [ subtype_indication_body ]
| range_attribute_reference_body
| ".." simple_expression
) ")".
function_call_body = "(" ( expression
{ "," [ selector_name "=>" ] expression }
| selector_name "=>" expression
{ "," [ selector_name "=>" ] expression }
) ")"
Note, that slice_body without subtype_indication_body is included
in function_call_body.
The union of these two gives:
slice_body_or_function_call_body =
"(" ( expression ( subtype_indication_body
| range_attribute_reference_body
| ".." simple_expression
| { "," [ selector_name "=>" ] expression }
)
| selector_name "=>" expression
{ "," [ selector_name "=>" ] expression }
) ")".
There were some other easier ambiguities to resolve like:
parameter_specification =
defining_identifier_list ":"
mode subtype_mark [ ":=" default_expression ].
parameter_specification =
defining_identifier_list ":"
access_definition [ ":=" default_expression ].
Which results to:
parameter_specification =
defining_identifier_list ":"
(mode subtype_mark | access_definition )
[ ":=" default_expression ].
Testing the grammar
We tested this grammar with the GNAT source files, and all of
them could be parsed. They are perhaps not the ideal test files,
but they represent nevertheless a big test set.
Decorating the grammar
In this text formatter we are concerned only about blanks, line breaks
and indentation. We take an ascii file and produce an ascii file. This
can subsequently be piped into other formatters, which recognize
keywords, and create postscript files.
So for our purpose we need 5 actions:
- line:
writes a new line and sets the starting column to 0.
- incr:
increments the starting column by a constant, used for indenting.
- decr:
decrements the starting column by a constant, used to end indenting.
- set:
sets the current cursor position to the starting column of the
next line.
- space:
puts a space to the current cursor position.
Here is an example of how we decorated the if-then-else statement:
sequence_of_statements = <+INCR+>
{+ <+SET+> statement +} <+DECR+>.
if_statement =
"IF" <+SPACE+> condition <+SPACE+> "THEN"
sequence_of_statements
{ <+SET+> "ELSIF" <+SPACE+> condition <+SPACE+> "THEN"
sequence_of_statements }
[ <+SET+> "ELSE" sequence_of_statements ]
<+SET+> "END" <+SPACE+> "IF" ";".
Doing it with Gramact
There are three files necessary for the text formatter:
- ada95.gra:
The grammar of Ada95 written in GRAMOL.
- ada95_actions.gwa:
The grammar of Ada95 decorated with the actions.
- structure.a:
Ada program which calls the syntax analyser of
Gramact, and treats the actions given by ada95_actions.gwa.
The grammar file ada95.gra must be compiled with grana
which produces ada95.gcn. The actions file ada95_actions.gwa must be
compiled with extract which produces ada95_actions.gca. The Ada
program structure.a must be compiled with an Ada compiler and linked
with the Gramact library. When executing, structure reads the compiled
actions file ada95_actions.gca and an Ada program and outputs an
indented program.
Directly available are:
Conclusion and possible ameliorations
Given the syntax of Ada by the LRM, we found that it is not trivial to
find and eliminate the ambiguities. However, we succeeded to write a
grammar with very few ambiguities, which are resolved quickly by the
parser.
The formatting is for the moment very trivial, and might not always
look good. For example, every parameter of a procedure definition or
procedure call is written on a new line. But when the parameters are
not very long, one might want to write several parameters on one line.
Aesthetic features like this may easily be added to the existent program.
Credits
Mikel Larrea in his semester project (June 94) wrote the decoration of
the grammar for Ada83, ada83_actions.gwa, and the program
structure.a. Alexandros Pappas (March 95) wrote the grammar for Ada95,
and its decoration with actions.
Mikel Larrea also wrote:
- apparence
which puts keywords and identifiers in small or capital letters, and
-
ada_to_postscript, which generates postscript code from Ada
programs.
These source files are available in the tar file of the text
formatter.
Bibliography
- [LRM83]
- Ada Programming Language, ANSI/MIL-STD-1815A,
EPFL/Mathematics Department, Lausanne 1983.
- [LRM95]
- Ada Reference Manual, ISO/IEC 8652:1995(E),
Intermetrics Inc., 1995
- [Larrea94]
- "Metteur
en page de sources Ada" (in french), EPFL/LGL/Mikel Larrea, June
1994
Ada Text Formatter / Alexandros Pappas /
pappas@danaos.dsclab.ece.ntua.gr