February 19, 2003
http://manju.cs.berkeley.edu/cil/index.html

I was amused that CIL is written in OCaml.  OCaml just continues to amaze.  The CIL license is loose, so this tool might have uses for D.  I can envision a D front end written in OCaml that is one-quarter its present size and twice as robust.  The CIL tool has processed the ENTIRE Linux kernel successfully, quirks and all.  -M.

---------------------------------------------------------------

CIL (C Intermediate Language) is a high-level representation along with a set of tools that permit easy analysis and source-to-source transformation of C programs.

CIL is lower-level than abstract-syntax trees, because it clarifies ambiguous constructs and removes redundant ones, and higher-level than typical intermediate languages designed for compilation, because it maintains types and a close relationship with the source program. The main advantage of CIL is that it compiles all valid C programs into a few core constructs with very clean semantics. CIL also has a syntax-directed type system that makes it easy to analyze and manipulate C programs. Furthermore, the CIL front-end can process not only ANSI-C programs but also those using Microsoft C or GNU C extensions. If you do not use CIL and instead use just a C parser and analyze programs expressed as abstract-syntax trees, your analysis will have to handle a lot of ugly corners of the language (not to mention that parsing C is itself not a trivial task). See Section 15 for some examples of such extreme programs that CIL simplifies for you.

In essence, CIL is a highly-structured, 'clean' subset of C. CIL features a reduced number of syntactic and conceptual forms. For example, all looping constructs are reduced to a single form, all function bodies are given explicit return statements, syntactic sugar like "->" is eliminated and function arguments with array types become pointers. (For an extensive list of how CIL simplifies C programs, see Section 3.) This reduces the number of cases that must be considered when manipulating a C program. CIL also separates type declarations from code and flattens scopes within function bodies. This structures the program in a manner more amenable to rapid analysis and transformation. CIL computes the types of all program expressions, and makes all type promotions and casts explicit. CIL supports all GCC and MSVC extensions except for nested functions and complex numbers. Finally, CIL organizes C's imperative features into expressions, instructions and statements based on the presence and absence of side-effects and control-flow. Every statement can be annotated with successor and predecessor information. Thus CIL provides an integrated program representation that can be used with routines that require an AST (e.g. type-based analyses and pretty-printers), as well as with routines that require a CFG (e.g., dataflow analyses).
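To make the simplifications above concrete, here is a small hand-written sketch in plain C. It is NOT actual CIL output. The names (struct buf, sum_original, sum_normalized and so on) are made up for illustration, and the exact shape CIL emits will differ. It only shows the kind of rewriting described: a single loop form, "->" removed, an array-typed parameter turned into a pointer, promotions written as explicit casts, and an explicit return in a void function.

```c
#include <assert.h>

struct buf { int len; int data[4]; };

/* Ordinary C: for loop, "->" sugar, array-typed parameter,
   implicit int-to-long promotion, compound assignment. */
long sum_original(struct buf *b, int weights[4])
{
    long total = 0;
    int i;
    for (i = 0; i < b->len; i++)
        total += b->data[i] * weights[i];
    return total;
}

/* Hand-normalized version (an assumption about the general style,
   not what CIL literally prints): one loop form with an explicit
   break, no "->", pointer parameter, explicit casts, and each
   side effect in its own statement. */
long sum_normalized(struct buf *b, int *weights)
{
    long total;
    int i;
    int tmp;

    total = (long)0;
    i = 0;
    while (1) {
        if (!(i < (*b).len)) {
            break;
        }
        tmp = (*b).data[i] * *(weights + i);
        total = total + (long)tmp;
        i = i + 1;
    }
    return total;
}

/* A void function with no return statement ... */
void reset_original(struct buf *b)
{
    b->len = 0;
}

/* ... and the same function with the return made explicit. */
void reset_normalized(struct buf *b)
{
    (*b).len = 0;
    return;
}

/* Both forms must compute the same thing. */
void check_forms_agree(void)
{
    struct buf b = { 3, { 1, 2, 3, 0 } };
    int w[4] = { 10, 20, 30, 40 };
    assert(sum_original(&b, w) == sum_normalized(&b, w));
}
```

The point is that an analysis written against the second style has far fewer cases to handle: one loop construct, one way to reach a field, and no hidden conversions.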

CIL comes with a number of Perl scripts that perform generally useful operations on code:

- A driver that behaves as either the gcc or Microsoft VC compiler and can invoke the preprocessor followed by the CIL application. The advantage of this script is that you can easily use CIL, and the analyses written for CIL, with existing makefiles.

- A whole-program merger that you can use as a replacement for your compiler. It learns all the files you compile when you make a project and merges all of the preprocessed source files into a single one. This makes it easy to do whole-program analysis.

- A patcher that makes it easy to create modified copies of the system include files. The CIL driver can then be told to use these patched copies instead of the standard ones.

CIL has been tested very extensively. It is able to process the SPECINT95 benchmarks, the Linux kernel, GIMP and other open-source projects. All of these programs are compiled to the simple CIL and then passed to gcc, and they still run! We consider the compilation of Linux a major feat, especially since Linux contains many of the ugly GCC extensions (see Section 15.2). This adds up to about 1,000,000 lines of code that we tested it on. It is also able to process the few Microsoft NT device drivers that we have had access to. CIL was tested against GCC's c-torture test suite and (except for the tests involving complex numbers and inner functions, which CIL does not currently implement) CIL passes most of the tests. Specifically, CIL fails 23 tests out of the 904 c-torture tests that it should pass. GCC itself fails 19 tests. A total of 1400 regression test cases are run automatically on each change to the CIL sources.

CIL is relatively independent of the underlying machine and compiler. When you build it, CIL will configure itself according to the underlying compiler. However, CIL has only been tested on Intel x86 using the gcc compiler on Linux and cygwin and using the MS Visual C compiler. (See below for specific versions of these compilers that we have used CIL with.)

The largest application we have used CIL for is CCured, a compiler that compiles C code into type-safe code by analyzing your pointer usage and inserting runtime checks in the places that cannot be guaranteed statically to be type safe. [Note: the Cyclone folks think they did CCured one better; see their PDF intro which mentions CCured.]
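For readers wondering what "inserting runtime checks" can mean in practice, here is a minimal sketch in C. It is an assumption-laden illustration, not CCured's actual representation or API: struct checked_ptr, in_bounds and checked_read are hypothetical names. The idea shown is only that a pointer travels together with its bounds, and an access that cannot be proven safe statically is guarded by a check at run time.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical "fat" pointer: the raw pointer plus the number of
   elements it may legally address. */
struct checked_ptr {
    int *base;
    size_t count;
};

/* Returns 1 if index is a legal element index for p, else 0. */
int in_bounds(struct checked_ptr p, size_t index)
{
    return p.base != NULL && index < p.count;
}

/* The kind of guard a checking compiler might insert in front of
   a raw access p.base[index]: abort instead of reading out of bounds. */
int checked_read(struct checked_ptr p, size_t index)
{
    if (!in_bounds(p, index)) {
        fprintf(stderr, "bounds check failed: index %lu, count %lu\n",
                (unsigned long)index, (unsigned long)p.count);
        abort();
    }
    return p.base[index];
}
```

With an array int a[4] = {10, 20, 30, 40} wrapped as struct checked_ptr p = { a, 4 }, checked_read(p, 2) yields 30, while checked_read(p, 4) aborts instead of reading past the end. The static analysis part is about deciding where such a guard can be left out.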


February 19, 2003
Pretty neat!  Seems like a much easier route than implementing all the back-end pieces of a compiler yet again.

   Dan


"Mark Evans" <Mark_member@pathlink.com> wrote in message news:b2vekk$2940$1@digitaldaemon.com...
> http://manju.cs.berkeley.edu/cil/index.html
>
> I was amused that CIL is written in OCaml.  OCaml just continues to amaze.
> The CIL license is loose, so this tool might have uses for D.  I can envision
> a D front end written in OCaml that is one-quarter its present size and twice
> as robust.  The CIL tool has processed the ENTIRE Linux kernel successfully,
> quirks and all.  -M.