EPFL - Swiss Federal Institute of Technology
Computer Science Department, Software Engineering Laboratory

Ada Text Formatter using Gramact

Alexandros Pappas

23 March 1995

A text formatter for Ada83 and Ada95 programs was written using the Gramact programs of the LGL Components Library.

Introduction

Gramact provides a lexical and syntax analyser with which one can easily write a parser for a given grammar. The grammar is specified in an EBNF-like notation called GRAMOL. In addition to the grammar, one can also specify, some actions to execute, during the parsing process. Here we wrote the grammar of Ada83 and Ada95, and "decorated" them with actions for text formatting. We also wrote a program which executes these actions given a decorated grammar and an input program. The advantage of using Gramact, is, that one can easily modify the text formatter to his own needs and style, simply by editing the decoration of the Ada83 or Ada95 grammar, and by changing the implementation of the actions, if needed.

A very brief introduction to GRAMOL

GRAMOL is like EBNF. The main addition is a construct to easily describe lists, with or without separators:
{+ b +} is equivalent to b { b } in EBNF and
{+ b $ "," +} is equivalent to b { "," b } in EBNF.

Actions can be inserted anywhere in a rule and have the form:
<+action+>

Writing the Grammar

Left recursion and ambiguities

The grammar must not be left recursive in order to be accepted by Gramact. That means, that rules like:
A = A b A | c.
are not allowed. The above rule can be rewritten as:
A = c {+ b A +}.
Also, the grammar, must have as few ambiguities as possible. That means, constructs like:
A = B C D E G
H = B C D F G
should be grouped to one: AorH = B C D (E | F) G.

Differences between Ada83 and Ada95

Although there are not many significant differences in the syntax, there are 278 rules describing the grammar of Ada95 and only 180 for Ada83. A sample of rules added in Ada95 is listed below.

basic_declaration: 
  abstract_subprogram_declaration. 

type_declaratrion:
  private_extension_declaration. 

object_declaration: 
  single_task_declaration | single_protected_declaration. 

derived_type_definition: 
  record_extension_part. 

access_type_definition: 
  access_to_subprogram_definition. 

proper_body: 
  protected_body. 

name: 
    direct_name | type_conversion  
  | explicit_dereference | function_call. 

aggregate: 
  record_aggregate | extension_aggregate | array_aggregate. 

simple_statement: 
  requeue_statement. 

renaming_declaration: 
  generic_renaming_declaration. 

select_statement: 
  asynchronous_select. 

generic_formal_parameter_declaration: 
    formal_object_declaration 
  | formal_type_declaration 
  | formal_subprogram_declaration 
  | formal_package_declaration. 

new keywords:
  abstract aliased protected requeue tagged until

Given these differences, we found that it is easier, to write the syntax Ada95 from scratch, than to try to modify the syntax of Ada83. In the following chapter, we present the difficulties in writing the syntax of Ada95.

Difficulties in writing the syntax of Ada95

Fortunately, most rules are not left recursive. The one which is left recursive is:

4.1: 
name =	direct_name		| explicit_dereference 
      | indexed_component	| slice
      | selected_component	| attribute_reference
      | type_conversion		| function_call
      | CHARACTER_LITERAL.

explicit_dereference 	= name "." "ALL".
indexed_component 	= prefix "(" {+ expression $ "," +}  ")".
			= name indexed_component_body.
slice			= prefix "(" discrete_range ")".
			= name slice_body.
selected_component	= prefix "." selector_name.
			= name selected_component_body.
attribute_reference	= prefix "'" attribute_designator.
			= name attribute_designator_body	
type_conversion		= subtype_mark "(" expression ")".
			= name type_conversion_body.
function_call		= name | prefix actual_parameter_part.
			= name [ actual_parameter_part ]
			= name [ function_call_body ].
prefix			= name.
subtype_mark		= name.

Because all elements except direct_name and character_literal start with name, we can rewrite name as following:

name =  ( direct_name	| CHARACTER_LITERAL )
	{+ 			  	  explicit_dereference_body
	| indexed_component_body	| slice_body		
	| selected_component_body 	| attribute_reference_body	
	| type_conversion_body		| [ function_call_body ]
	+}.
     =  ( direct_name	| CHARACTER_LITERAL )
	{ 			  	  explicit_dereference_body
	| indexed_component_body	| slice_body		
	| selected_component_body 	| attribute_reference_body	
	| type_conversion_body		| function_call_body
	}.

indexed_component_body and type_converstion_body are a subset of function_call_body, so we can eliminate these two. The problem is, that slice_body and function_call_body are very similar, and the parser needs a very long time to distinguish them. Here we write the rules for slice_body and function_call_body:

slice_body				= "(" discrete_range ")".
discrete_range			= subtype_indication | range.
subtype_indication		= subtype_mark [ constraint ].
				= name [ subtype_indication_body ].
subtype_mark			= name.
range				=   range_attribute_reference 
			  	  | simple_expression ".." simple_expression.
range_attribute_reference	= prefix "'" range_attribute_designator.
				= name range_attribute_reference_body
range_attribute_designator	= "RANGE" [ "(" expression ")" ].

function_call_body		= actual_parameter_part.
actual_parameter_part		= "(" {+ parameter_association $ "," +} ")".
parameter_association		= [ selector_name "=>" ] 
				  explicit_actual_parameter.
explicit_actual_parameter	= expression.

slice_body			= "(" ( name [ subtype_indication_body ]
				      | name range_attribute_reference_body
			      	      | simple_expression ".." 
					simple_expression
				      ) ")".
function_call_body		= "(" {+ [ selector_name "=>" ] 
					 expression $ "," +} ")"

An example where slice_body and function_call_body are the same is: (A(A(A))).
An example where they are similar: (A(A(A))range(A) is slice_body but (A(A(A)), A) is function_call_body. In order to unite slice_body and function_call_body we extend slice_body to a superset of slice_body by replacing name with expression, because every expression can be a name. Doing this for a compiler would be unacceptable, but for a text formatter, it is not so important, if we accept a small superset of Ada95. This way, the similarities are clearly visible now:

slice_body			= "(" expression 
				( [ subtype_indication_body ]
				| range_attribute_reference_body
				| ".." simple_expression
				) ")".
function_call_body		= "(" ( expression 
				{ "," [ selector_name "=>" ] expression } 
			      | selector_name "=>" expression   
				{ "," [ selector_name "=>" ] expression }
			      ) ")"

Note, that slice_body without subtype_indication_body is included in function_call_body. The union of these two gives:

slice_body_or_function_call_body = 
	  "("	( expression	( subtype_indication_body
				| range_attribute_reference_body
				| ".." simple_expression
				| { "," [ selector_name "=>" ]  expression }
				)
		| selector_name "=>" expression 
		  { "," [ selector_name "=>" ] expression }
		) ")".

There were some other easier ambiguities to resolve like:

parameter_specification =
	defining_identifier_list ":" 
	mode subtype_mark [ ":=" default_expression ].
parameter_specification =
	defining_identifier_list ":" 
	access_definition [ ":=" default_expression ].

Which results to:

parameter_specification =
	defining_identifier_list ":" 
	(mode subtype_mark | access_definition )
	[ ":=" default_expression ].

Testing the grammar

We tested this grammar with the GNAT source files, and all of them could be parsed. They are perhaps not the ideal test files, but they represent nevertheless a big test set.

Decorating the grammar

In this text formatter we are concerned only about blanks, line breaks and indentation. We take an ascii file and produce an ascii file. This can subsequently be piped into other formatters, which recognize keywords, and create postscript files. So for our purpose we need 5 actions:

line: writes a new line and sets the starting column to 0.
incr: increments the starting column by a constant, used for indenting.
decr: decrements the starting column by a constant, used to end indenting.
set: sets the current cursor position to the starting column of the next line.
space: puts a space to the current cursor position.

Here is an example of how we decorated the if-then-else statement:

sequence_of_statements = <+INCR+>
	{+ <+SET+> statement +} <+DECR+>.
if_statement = 
	"IF" <+SPACE+> condition <+SPACE+> "THEN"
	 sequence_of_statements 
	{ <+SET+> "ELSIF" <+SPACE+> condition <+SPACE+> "THEN"
	   sequence_of_statements }
	[ <+SET+> "ELSE" sequence_of_statements ]
	<+SET+> "END" <+SPACE+> "IF" ";".

Doing it with Gramact

There are three files necessary for the text formatter:

ada95.gra: The grammar of Ada95 written in GRAMOL.
ada95_actions.gwa: The grammar of Ada95 decorated with the actions.
structure.a: Ada program which calls the syntax analyser of Gramact, and treats the actions given by ada95_actions.gwa.

The grammar file ada95.gra must be compiled with grana which produces ada95.gcn. The actions file ada95_actions.gwa must be compiled with extract which produces ada95_actions.gca. The Ada program structure.a must be compiled with an Ada compiler and linked with the Gramact library. When executing, structure reads the compiled actions file ada95_actions.gca and an Ada program and outputs an indented program. Directly available are:

A man page for structure.
All necessary files in tar format to build the text formatter.

Conclusion and possible ameliorations

Given the syntax of Ada by the LRM, we found that it is not trivial to find and eliminate the ambiguities. However, we succeeded to write a grammar with very few ambiguities, which are resolved quickly by the parser. The formatting is for the moment very trivial, and might not always look good. For example, every parameter of a procedure definition or procedure call is written on a new line. But when the parameters are not very long, one might want to write several parameters on one line. Aesthetic features like this may easily be added to the existent program.

Credits

Mikel Larrea in his semester project (June 94) wrote the decoration of the grammar for Ada83, ada83_actions.gwa, and the program structure.a. Alexandros Pappas (March 95) wrote the grammar for Ada95, and its decoration with actions.
Mikel Larrea also wrote:

apparence which puts keywords and identifiers in small or capital letters, and
ada_to_postscript, which generates postscript code from Ada programs.

These source files are available in the tar file of the text formatter.

Bibliography

[LRM83]: Ada Programming Language, ANSI/MIL-STD-1815A, EPFL/Mathematics Department, Lausanne 1983.
[LRM95]: Ada Reference Manual, ISO/IEC 8652:1995(E), Intermetrics Inc., 1995
[Larrea94]: "Metteur en page de sources Ada" (in french), EPFL/LGL/Mikel Larrea, June 1994

Ada Text Formatter / Alexandros Pappas / pappas@danaos.dsclab.ece.ntua.gr