Maxima as a compiler code generator component ?



I am a maxima newbie.  I used it about 3 years ago to solve a multi
variable algebraic problem and haven't looked at it since.  Haven't
needed to either, for that matter.

This is a somewhat long winded email.  At first you might wonder what it
has to do with Maxima, but it gets mathematical in a bit. 

I am working on developing a system for generating and optimizing
assembly language for the back end of the open source C compiler, sdcc.
( http://sdcc.sourceforge.net/ )  At present the compiler uses a hand
coded back end which we would like to replace with something more
universal, easier to port, better optimization, etc. 

Presently sdcc is comprised of 2 components: The front end and various
target specific back ends.  The front end of the compiler takes C source
code as input and converts it into what is known as iCode.  iCode is a
normalized representation of the C code in the form of 'iResult = iLeft
op iRight' statements, where i variables are temporary variables
assigned by the front end and op is various things like addition,
subtraction, multiplication, etc.  It is the function of the compiler
back end to transform the iCode into suitably representative assembly
language for the target processor.  In the process of transforming the
iCode into assembly language, various functions such as optimization,
register allocation and pattern matching/replacement are performed. 

One method of generating assembler code from iCode is to express it in
what has been termed Register Transfer Language.  It is essentially
expressing the iCode in an low level algebraic form.  I like to call it
nano code, but that isn't an industry term. 

For example, lets say we have an iCode of iResult = itemp1 + a.

I could rewrite this in an algebraic format as follows:

t1 = fetch(a)
t2 = fetch(itemp1)
t3 = t1+t2
store(t3,iResult)

We could write function Fetch as follows:

t1 = a;
t2 = SP + t1; 
t3 = M[t2];   Where M is the memory access function.  t1,t2,t3 are local
to fetch and should not be confused with t1,2,3 above.

How does this turn into assembly language ?  By combining expressions
and matching them to their assembly language counterparts.

For example, we can define the following rtl:

t1 = t2 + t3 as being equivalent to the assembly language statement ADD
r2,r3,r1
and
t3 = M(t2)  as being equivalent to the assembly language statement LDW
r3, r2 
or better yet:
t3 = M(SP,t1) as being equivalent to the assembly language statement LDW
SP,r1 

Likewise optimization of the rtl is just an algebraic manipulation as
well.  Register allocation isn't a trivial algebraic operation, but we
can do that outside of the algebraic kernel. 

So... 

We need an algebraic engine (kernel) that given the algebraic
representation of the iCode for a program can simplify and match it to
various algebraic expressions representative of the assembly language
statements.   We will need to control the simplification, ie give it
rules by which to do the simplification.

A typical program's iCode will expand to 10,000 or more rtl statements.
A typical processor's instruction set will have about 150 algebraic
statements to represent it.  RTL as boolean algebraic statements as
well. 

We envision the process working something like this:

Feed Maxima an rtl statement.
Let Maxima reduce or attempt to match it.
If no match is found, feed in another rtl statement. 
Until we determine that no match is available at which point we have
faulty rtl or we find a match, at which point we emit the equivalent
assembly language.

At any one time, Maxima would be working with 30-50 rtl statements,
meaning that if it didn't find matches in them, it would be an error.
But it would run through 10,000 or more statements in the course of
generating a program.  It would be attempting to match its resulting
code to the 150 algebraic statements representative of the assembly
language every time we fed it a new statement.   

The preservation of the order of the statements is essential.  Or is
it ?  If we represent the iCode as a true algebraic formula, then
algebraic manipulation should result in the same result when converted
to assembly language ?  There is a big debate about this in certain
compiler circles right now.  Whether instruction reordering is a
legitimate form of optimization.  

Is Maxima or parts of it suitable to perform this task ? 

I am open to hearing any thoughts or opinions on this topic. 

FWIW, these exists a proprietary system that works on this basis named
VPO.

An example of how VPO can be used is here:
http://compiler.snu.ac.kr/papers/cases01-6.pdf

Thanks.  

-- 
Kim Lux,  Diesel Research Inc.