July 17, 2013
On Wednesday, 17 July 2013 at 09:27:53 UTC, Jacob Carlborg wrote:
> On 2013-07-17 10:14, Walter Bright wrote:
>
>> Yeah, you do need the full front end. It's pretty amazing how the
>> simplest of .h files seem determined to exercise every last, dark corner
>> of the language.
>>
>> If the converter doesn't accept the full language, you're just going to
>> get a dump truck unloading on it.
>
> When you do have a complete front end you can choose how to handle the language constructs the tool cannot (yet) translate. I.e. just skip it, insert a comment or similar.
>
> If you don't have a full front end and encounter something that you cannot translate, you will most likely get weird behavior.

Thus we are back to the compiler as library discussion.

--
Paulo

July 17, 2013
On 2013-07-17 13:24, Paulo Pinto wrote:

> Thus we are back to the compiler as library discussion.

Yes, but for the C family of languages we already have a compiler as library, that is Clang.

-- 
/Jacob Carlborg
July 17, 2013
On Wednesday, 17 July 2013 at 07:17:07 UTC, Timothee Cour wrote:
> you'd still need to parse C files recursively (textual inclusion...),
> handle different C function calling conventions, different C standards,
> you'd need a way to forward to dmd different C compiler options (include
> paths to standard / custom libraries), and eventually people will want to
> parse C++ as well anyways. That can be a lot of work. Whereas using
> existing tools takes much less effort and is less error prone.

I'm talking about semantic analysis and you answer with parsing; I'm not sure this is going to lead anywhere. Do you understand that a parser is actually quite a small part of a frontend?
July 17, 2013
On Wednesday, 17 July 2013 at 09:27:53 UTC, Jacob Carlborg wrote:
> On 2013-07-17 10:14, Walter Bright wrote:
>
>> Yeah, you do need the full front end. It's pretty amazing how the
>> simplest of .h files seem determined to exercise every last, dark corner
>> of the language.
>>
>> If the converter doesn't accept the full language, you're just going to
>> get a dump truck unloading on it.
>
> When you do have a complete front end you can choose how to handle the language constructs the tool cannot (yet) translate. I.e. just skip it, insert a comment or similar.
>
> If you don't have a full front end and encounter something that you cannot translate, you will most likely get weird behavior.

My understanding is that we only want to convert declarations to D. Can you give me an example of such a corner case that would require the full frontend?
July 17, 2013
On 7/17/2013 2:27 AM, Jacob Carlborg wrote:
> On 2013-07-17 10:14, Walter Bright wrote:
>
>> Yeah, you do need the full front end. It's pretty amazing how the
>> simplest of .h files seem determined to exercise every last, dark corner
>> of the language.
>>
>> If the converter doesn't accept the full language, you're just going to
>> get a dump truck unloading on it.
>
> When you do have a complete front end you can choose how to handle the language
> constructs the tool cannot (yet) translate. I.e. just skip it, insert a comment
> or similar.

Yes, but the front end itself must be complete. Otherwise, it's not really practical when you're dealing with things like the preprocessor, because a non-compliant front end won't even know it has gone off the rails.

There are other issues when dealing with C .h files:

1. there may be various #define's necessary to compile it that would normally be supplied on the command line to the C compiler

2. there are various behavior switches (see the PR for DMD that wants to set the signedness of char types)

3. rather few .h files seem to be standard-compliant C. They always rely on various compiler extensions

These problems are not insurmountable; they are just non-trivial and need to be handled for a successful .h file importer.
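
To make the three points above concrete, here is a small made-up header fragment in C. Everything named in it (`USE_WIDE_HANDLES`, `handle_type`, `wire_hdr`, `plain_char_is_signed`) is invented for illustration and comes from no real library. It is only a sketch of why an importer has to know which defines, switches and extensions are in effect before it can emit a single D declaration.

```c
#include <assert.h>
#include <limits.h>

/* 1. The type behind handle_type depends on a macro that would normally
      arrive on the command line, e.g. -DUSE_WIDE_HANDLES. */
#ifdef USE_WIDE_HANDLES
typedef long long handle_type;
#else
typedef int handle_type;
#endif

/* 2. Plain char is signed or unsigned depending on the target and on
      behavior switches such as GCC's -funsigned-char. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* 3. Layout often relies on a compiler extension; without it the struct
      would usually carry padding after tag. */
#ifdef __GNUC__
struct __attribute__((packed)) wire_hdr { char tag; int len; };
#else
struct wire_hdr { char tag; int len; };
#endif

/* Holds with or without the extension; only the exact size differs. */
static_assert(sizeof(struct wire_hdr) >= 1 + sizeof(int),
              "wire_hdr must hold a tag and a length");
```

The same three lines of declarations translate to different D code depending on how the C compiler would have been invoked, which is why those options have to be forwarded to the importer.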
July 17, 2013
On 7/17/2013 9:48 AM, deadalnix wrote:
> My understanding is that we only want to convert declarations to D. Can you give
> me an example of such a corner case that would require the full frontend?

One example:

--------------------------------
//**************************Header**********************\\

int x;
--------------------------------

Yes, this POS is real C code I got a bug report on. Note the trailing \\. Is that one line splice or two? You have to get the hairy details right. I've seen similar nonsense with trigraphs. I've seen metaprogramming tricks with token pasting. You can't dismiss this stuff.
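
For what it's worth, here is a minimal sketch of the splicing step (translation phase 2) in C, assuming trigraphs have already been dealt with and line endings are plain `\n`. `splice_lines` is a hypothetical helper, not taken from any compiler. As I read the standard, only the last backslash on a physical line can take part in a splice, so a single left-to-right pass that never rescans its own output performs one splice on the header above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of translation phase 2: delete every backslash that is
   immediately followed by a new-line, joining the two physical lines.
   The input is scanned once and the output is never rescanned, so a
   backslash that ends up in front of a new-line after a splice is left
   alone. Returns a malloc'ed string (caller frees) and stores the number
   of splices in *splices. Not handled: trigraphs, CR LF line endings,
   and extensions that tolerate blanks after the backslash. */
char *splice_lines(const char *in, int *splices)
{
    assert(in != NULL && splices != NULL);

    char *out = malloc(strlen(in) + 1);   /* output never grows */
    if (out == NULL)
        return NULL;

    char *w = out;
    *splices = 0;
    while (*in) {
        if (in[0] == '\\' && in[1] == '\n') {
            in += 2;                      /* drop the pair */
            ++*splices;
        } else {
            *w++ = *in++;
        }
    }
    *w = '\0';
    return out;
}
```

Fed the three lines above (comment, blank line, `int x;`), this reports one splice: the blank line is folded into the comment and `int x;` survives. A converter that keeps splicing until no backslash-newline pair is left would fold `int x;` into the comment and silently lose the declaration.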
July 17, 2013
On Wed, Jul 17, 2013 at 12:46:54PM -0700, Walter Bright wrote:
> On 7/17/2013 9:48 AM, deadalnix wrote:
> >My understanding is that we only want to convert declarations to D. Can you give me an example of such a corner case that would require the full frontend?
> 
> One example:
> 
> --------------------------------
> //**************************Header**********************\\
> 
> int x;
> --------------------------------
> 
> Yes, this POS is real C code I got a bug report on. Note the trailing \\. Is that one line splice or two? You have to get the hairy details right. I've seen similar nonsense with trigraphs. I've seen metaprogramming tricks with token pasting. You can't dismiss this stuff.

I've seen C code where the "header" file has function bodies in it.

As for trigraphs... I have to admit I've never actually seen *real* C code that uses trigraphs, but yeah, needing to account for them can significantly complicate your code.
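
A rough sketch in C of what accounting for them means, assuming the nine trigraph sequences of the C standard and nothing else. `replace_trigraphs` is a made-up name, and the test strings below split the question marks across adjacent literals so that the example itself is not rewritten by a compiler running with trigraphs enabled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of translation phase 1: replace each of the nine trigraphs
   (two question marks plus one of the characters in `third`) with the
   single character at the same index in `repl`. Returns a malloc'ed
   string that the caller must free. */
char *replace_trigraphs(const char *in)
{
    static const char third[] = "=/'()!<>-";
    static const char repl[]  = "#\\^[]|{}~";

    assert(in != NULL);
    char *out = malloc(strlen(in) + 1);   /* output never grows */
    if (out == NULL)
        return NULL;

    char *w = out;
    while (*in) {
        const char *hit = NULL;
        /* The in[2] check keeps strchr from matching the terminator. */
        if (in[0] == '?' && in[1] == '?' && in[2] != '\0')
            hit = strchr(third, in[2]);
        if (hit != NULL) {
            *w++ = repl[hit - third];
            in += 3;
        } else {
            *w++ = *in++;
        }
    }
    *w = '\0';
    return out;
}
```

The complication is ordering: this has to run before line splicing, because the trigraph for a backslash at the end of a line produces a splice that a later pass would otherwise never see.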

But as for preprocessor-specific stuff, couldn't we just pipe it through a standalone C preprocessor and be done with it? It can't be *that* hard, right?


T

-- 
Bare foot: (n.) A device for locating thumb tacks on the floor.
July 17, 2013
On 7/17/2013 3:20 PM, H. S. Teoh wrote:
> As for trigraphs... I have to admit I've never actually seen *real*
> C code that uses trigraphs, but yeah, needing to account for them can
> significantly complicate your code.

Building a correct C front end is a known technology; doing a half-baked job isn't going to impress people.

> But as for preprocessor-specific stuff, couldn't we just pipe it through
> a standalone C preprocessor and be done with it? It can't be *that*
> hard, right?

You could, but then you are left with failing to recognize:

    #define FOO 3

and converting it to:

    enum FOO = 3;
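
A rough sketch of that conversion in C, for the simplest case only: an object-like macro whose body is a plain decimal integer. `define_to_enum` and `skip_blanks` are hypothetical helpers invented here. Real headers also need hex, suffixes, expressions, strings and macros that refer to other macros, none of which this attempts.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Skip spaces and tabs. */
static const char *skip_blanks(const char *p)
{
    while (*p == ' ' || *p == '\t')
        p++;
    return p;
}

/* If `line` looks like "#define NAME <decimal integer>", write
   "enum NAME = <value>;" to `out` and return 1. Anything else returns 0
   and leaves `out` empty. */
int define_to_enum(const char *line, char *out, size_t outsz)
{
    assert(line != NULL && out != NULL && outsz > 0);
    out[0] = '\0';

    const char *p = skip_blanks(line);
    if (*p != '#')
        return 0;
    p = skip_blanks(p + 1);
    if (strncmp(p, "define", 6) != 0)
        return 0;
    p += 6;
    if (*p != ' ' && *p != '\t')
        return 0;
    p = skip_blanks(p);

    const char *name = p;
    if (!isalpha((unsigned char)*p) && *p != '_')
        return 0;
    while (isalnum((unsigned char)*p) || *p == '_')
        p++;
    size_t name_len = (size_t)(p - name);

    /* A '(' directly after the name means a function-like macro. */
    if (*p != ' ' && *p != '\t')
        return 0;
    p = skip_blanks(p);

    const char *val = p;
    while (isdigit((unsigned char)*p))
        p++;
    size_t val_len = (size_t)(p - val);
    if (val_len == 0)
        return 0;
    /* A leading zero is octal in C, and D has no such literal. */
    if (val_len > 1 && val[0] == '0')
        return 0;

    p = skip_blanks(p);
    if (*p != '\0' && *p != '\n')
        return 0;                         /* trailing tokens: give up */

    int n = snprintf(out, outsz, "enum %.*s = %.*s;",
                     (int)name_len, name, (int)val_len, val);
    if (n < 0 || (size_t)n >= outsz) {
        out[0] = '\0';
        return 0;
    }
    return 1;
}
```

Called on `#define FOO 3` it fills the buffer with `enum FOO = 3;`. It refuses `#define MAX(a,b) ...` and `#define MODE 010`, which is where a standalone preprocessor pass or a full front end has to take over.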

July 17, 2013
On Wed, Jul 17, 2013 at 03:36:15PM -0700, Walter Bright wrote:
> On 7/17/2013 3:20 PM, H. S. Teoh wrote:
> >As for trigraphs... I have to admit I've never actually seen *real* C code that uses trigraphs, but yeah, needing to account for them can significantly complicate your code.
> 
> > Building a correct C front end is a known technology; doing a half-baked job isn't going to impress people.

IOW either you don't do it at all, or you have to go all the way and implement a fully-functional C frontend?

If so, libclang is starting to sound rather attractive...


> >But as for preprocessor-specific stuff, couldn't we just pipe it through a standalone C preprocessor and be done with it? It can't be *that* hard, right?
> 
> You could, but then you are left with failing to recognize:
> 
>     #define FOO 3
> 
> and converting it to:
> 
>     enum FOO = 3;

Hmm. We *could* pre-preprocess the code first: pick out the #define's we understand and convert them ourselves, then suppress those #define's from the input to the C preprocessor. Something like this:

	bool isSimpleValue(string s) {
		// basically, return true if s is something compilable
		// when put on the right side of "enum x = ...".
	}

	auto pipe = spawnCPreprocessor();
	string[string] manifestConstants;
	foreach (line; inputFile.byLine()) {
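		// Anchor at end of line: byLine() strips the newline, so a
		// trailing \s+ would never match "#define FOO 3".
		if (auto m=match(line, `^\s*#define\s+(\w+)\s+(.*?)\s*$`))
		{
			// byLine() reuses its buffer, so copy what we keep.
			auto value = m.captures[2].idup;
			if (isSimpleValue(value)) {
				manifestConstants[m.captures[1].idup] = value;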

				// Suppress enums that we picked out
				continue;
			}
			// whatever we don't understand, hand over to
			// the C preprocessor
		}
		pipe.writeln(line);
	}

Basically, whatever #define's we can understand, we handle, and anything else we let the C preprocessor deal with. By suppressing the #define's we've picked out, we force the C preprocessor to leave any reference to them as unexpanded identifiers, so that later on we can just generate the enums and the resulting code will match up correctly.


T

-- 
Prosperity breeds contempt, and poverty breeds consent. -- Suck.com
July 18, 2013
On Wednesday, 17 July 2013 at 19:46:54 UTC, Walter Bright wrote:
> On 7/17/2013 9:48 AM, deadalnix wrote:
>> My understanding is that we only want to convert declarations to D. Can you give
>> me an example of such a corner case that would require the full frontend?
>
> One example:
>
> --------------------------------
> //**************************Header**********************\\
>
> int x;
> --------------------------------
>
> Yes, this POS is real C code I got a bug report on. Note the trailing \\. Is that one line splice or two? You have to get the hairy details right. I've seen similar nonsense with trigraphs. I've seen metaprogramming tricks with token pasting. You can't dismiss this stuff.

This does not require semantic analysis.