
   next_inactive up previous

                  Implementation Notes (aka hacker guide)

   $Id: hacker.tex,v 1.1 2001/11/24 19:41:12 malekith Exp $

Contents

     * [1]Intro
     * [2]Code generation
     * [3]Argument passing conventions
     * [4]Datatypes representation
          + [5]Basic datatypes
          + [6]Strings
          + [7]Tuples
          + [8]Unions
          + [9]Structures
          + [10]By value or by reference
     * [11]About this document ...

                                     Intro

   The compiler is written in OCaml. This ensures high portability,
   reasonable speed, along with modularization and type safety.

                                Code generation

   It is done in module Gontcodegen.

   Compiler outputs intermediate language, called Ksi, that is later on
   compiled by gcc front end. Ksi itself is rather strike through
   interface to trees used by gcc internally. It uses lispy syntax.

   Ksi output is done through Ksi module. It exports datatype for Ksi
   expressions and declaration, and function to convert this data into
   output stream (that is indented, to allow easier debugging of compiler
   output).

                         Argument passing conventions

   Values passed as polymorphic parameters has to be all of the same
   size. Otherwise we wouldn't be able to generate efficient code.
   Specially for this purpose `int' Gont type is chosen with the same as
   size as machine pointer. (This is not the choice of some C compilers
   on some 64 bit architectures, for example on alpha-*-linux with GCC
   sizeof(int) = 4, sizeof(long) = sizeof(void*) = 8).

   They are few basic types of passing arguments in Gont.

    1. by value. int and bool are passed this way.
    2. by pointer to some location on the heap. structures, unions and
       tuples are passed this way.
    3. by pointer to location, that is not guaranteed to exists after
       function terminates (read: this can be pointer to the stack of
       function that called us). If such value needs to be saved after
       function return, is has to be copied to the heap. It also has to
       be copied if calling raise with it. Special functions are provided
       by the runtime to copy such values. Floats and functional values
       are passed this way.

   [[Functional values are currently always passed as pointers to the
   heap. I guess some semantic analysis will be done to pass it as
   pointers to stack when it is safe. Similarly for function closures.]]

   Functional values cannot be passed simply as pointers to functions,
   since they might need closure in some (most?) cases. Here is layout of
   data structure describing functional value:

        void *fnc;
        void *closure;

   closure has to be passed as the first argument to fnc. Closure must
   not point into the stack.

   Pseudocode example:

        foo()
        {
                int x;
                int y;
                g()
                {
                        int p;

                        use(p);
                        use(x);
                }

                use(x);
                use(y);
                use(g);
        }

   is translated to:

        struct foo_closure {
                int x;
        }

        foo_lambda1(foo_closure *ptr)
        {
                int p;
                use(p);
                use(ptr->x);
        }

        foo()
        {
                int y;
                foo_closure *cl = make_closure(...);
                void *g = make_lambda(foo_lambda1, cl);

                use(cl->x);
                use(y);
                use(g);
        }

   (Examples are given using pseudo-Gont as input, and pseudo-C as
   output, just to not confuse reader with Ksi syntax).

   One might note that it might be impossible to pass regular function
   this way. We need auxiliary function to be generated.

                           Datatypes representation

   ptroff is signed integer type of size of pointer.

        struct lambda_exp { void *fnc, *closure; }

Basic datatypes

   int and bool is represented on stack and passed as ptroff. float is
   passed around as double*, on stack it's simply double. void -
   (void*)whatever, if it needs to be passed, `whatever' cannot be relied
   upon to contain anything specific :). [*]->* - struct lambda\_exp*.

Strings

        struct string {
                int len;
                char *data;
        }

   len is length of string, in characters. data is string content, it
   does not have to be NUL terminated. Strings are passed as struct
   string*

Tuples

   *[t1, ..., tn] is represented as pointer to array containing tuple
   content. All ti are treated, as if they where passed to function, i.e.
   sizeof(ti) == sizeof(void*).

        f(*['a,int,'b] x) {
                let (_, y, _) = x {use(y);}
        }

   is translated to:

        f(void **x) {
                use((ptroff)x[1])
        }

Unions

        struct union_exp {
                ptroff sel;
                void *data;
        };

   Values of sel are assigned sequentially from 0 [[maybe 1 would be
   better, use 0 for error marks of some kind, but I don't know what they
   could be needed for...]], in order they are given in union definition.
   Union values are passed between functions are struct union_exp*

Structures

   They are treated much like C structures. They are always passed as
   pointers to structures. Similarly, if one structure contains another -
   it contains pointer to it, not it directly, thus:

        struct foo {
                int x;
                string s;
        }

        struct bar {
                foo f;
                int z;
        }

   is translated to:

        struct foo {
                int x;
                struct string *s;
        };

        struct bar {
                struct foo *f;
                int z;
        };

By value or by reference

   Ints, floats and bools are passed by value. For floats it will require
   sth like this:

        float x,y,r;
        ...
        r = tan2(x,y*2);

   to be translated to:

        {
                double _1, _2;
                _1 = x;
                _2 = y*2;
                r = *(double*)tan2(&_1,&_2);
        }

   Value returned from tan2() has to be stored in GC_malloc'ed() area,
   otherwise it wouldn't typeof(tan2) would be not a subtype of
   *('a,'a)->'a.

   Other datatypes (unions, structures, tuples and objects) are passed by
   reference.

   There can be confusion with:

        f(*[int,int] a)
        {
                let [x,y] = a in {
                        x = 10;
                        y = 20
                }
        }

        g()
        {
                int x, y;

                x = 5;
                f([x,y]);
                // whatever here x is still 5, or 10?
        }

   It is 5. However this more due to implementation, then to anything
   else, because it is inconsitent with:

        [x, y] = [10, 20];

   after which x is 10 and y is 20. But this is general problem with
   treating variables as addresses on the left side of `='.

                            About this document ...

   Implementation Notes (aka hacker guide)

   This document was generated using the [12]LaTeX2HTML translator
   Version 99.2beta8 (1.46)

   Copyright  1993, 1994, 1995, 1996, [13]Nikos Drakos, Computer Based
   Learning Unit, University of Leeds.
   Copyright  1997, 1998, 1999, [14]Ross Moore, Mathematics Department,
   Macquarie University, Sydney.

   The command line arguments were:
   latex2html -split 0 hacker.tex

   The translation was initiated by Michal/ Moskal on 2001-11-27
     _________________________________________________________________

   next_inactive up previous


    Michal/ Moskal 2001-11-27

References

   1. file://localhost/home/users/malekith/gont/cvs/gont/doc/hacker/hacker.html
   2. file://localhost/home/users/malekith/gont/cvs/gont/doc/hacker/index.html#SECTION00030000000000000000
   3. file://localhost/home/users/malekith/gont/cvs/gont/doc/hacker/index.html#SECTION00040000000000000000
   4. file://localhost/home/users/malekith/gont/cvs/gont/doc/hacker/index.html#SECTION00050000000000000000
   5. file://localhost/home/users/malekith/gont/cvs/gont/doc/hacker/index.html#SECTION00051000000000000000
   6. file://localhost/home/users/malekith/gont/cvs/gont/doc/hacker/index.html#SECTION00052000000000000000
   7. file://localhost/home/users/malekith/gont/cvs/gont/doc/hacker/index.html#SECTION00053000000000000000
   8. file://localhost/home/users/malekith/gont/cvs/gont/doc/hacker/index.html#SECTION00054000000000000000
   9. file://localhost/home/users/malekith/gont/cvs/gont/doc/hacker/index.html#SECTION00055000000000000000
  10. file://localhost/home/users/malekith/gont/cvs/gont/doc/hacker/index.html#SECTION00056000000000000000
  11. file://localhost/home/users/malekith/gont/cvs/gont/doc/hacker/index.html#SECTION00060000000000000000
  12. http://www-dsed.llnl.gov/files/programs/unix/latex2html/manual/
  13. http://cbl.leeds.ac.uk/nikos/personal.html
  14. http://www.maths.mq.edu.au/~ross/
