February 05, 2016
On Friday, 5 February 2016 at 06:05:49 UTC, Chris Wright wrote:
> It doesn't know what targets I'm ultimately creating, and it doesn't know what files have been modified that I'm about to compile (but haven't compiled yet).
>
> Example 1:
>
> I compile one .c file referencing a function:
> void foo(int);
>
> That's going to end up in libfoo.so.
>
> I compile another .c file in the same directory defining a function:
> void foo(float);
>
> That's going to end up in libbar.so.
>
> No bug here. (The linker should tell us if someone depends on foo from libbar and foo from libfoo in the same executable.)
>
> How does your putative compiler plugin handle it? Either I have to define a build rule for every source file to specify where to put this symbol cache (and you need to add parameters for the plugin to look for multiple caches, because libfoo and libbar share a lot of source files), or the plugin gives me false positives.
>
> Example 2:
>
> I compile a.c:
> int foo(int i) { return i + 1; }
>
> In the course of refactoring, I delete that function from a.c and add it
> to b.c with modifications:
> int foo(int i, int increment) { return i + increment; }
>
> My build script recompiles b.c before it recompiles a.c. Your compiler plugin produces a build error, halting my build. I have to make clean && make in order to proceed -- and that's assuming I know your tool doesn't work well with incremental compilation.
>
> The first problem might be uncommon, but the second would crop up constantly. They have the same fix: collect the information when you compile, evaluate it when you link.

No spurious error is generated by my proposal in your example 2, because I specifically stated that the extra pass must be done once, after *all* modules have been compiled.

I see, however, that this would require one of:

1) Modifying build scripts to pass the complete list of .c files to the compiler in a single command, or
2) Modifying build scripts to run the compiler one extra time after processing all the .c files, or
3) Running the final check at link time.

For a C tool chain with a clean-sheet design, any of those would handle example 2 fine. (1) or (3) could also handle example 1 without issue.

However, as you say, only (3) is backwards compatible with existing makefiles and the like. (This is not a limitation of the C language or ABI, though.)
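
To make option (3) concrete, here is a rough sketch in C of a check that only looks at the object files actually passed to one link step. Everything in it is hypothetical: `sym_note`, `object_file`, and `link_conflicts` are names made up for illustration, signatures are compared as plain strings rather than with real C type-compatibility rules, and no existing linker is claimed to work this way.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical note the compiler would store for each function it
   declares or defines, alongside the usual symbol table entry. */
struct sym_note {
    const char *name;      /* e.g. "foo" */
    const char *signature; /* canonical type text, e.g. "void(int)" */
};

/* Hypothetical view of one object file as the linker would see it. */
struct object_file {
    const char *path;
    const struct sym_note *notes;
    size_t count;
};

/* Count pairs of notes, taken from two different objects in this one
   link, that share a name but disagree on signature. Objects that are
   not part of this link are never consulted. Quadratic for brevity. */
size_t link_conflicts(const struct object_file *objs, size_t nobjs)
{
    size_t conflicts = 0;
    for (size_t a = 0; a < nobjs; a++)
        for (size_t b = a + 1; b < nobjs; b++)
            for (size_t i = 0; i < objs[a].count; i++)
                for (size_t j = 0; j < objs[b].count; j++) {
                    const struct sym_note *x = &objs[a].notes[i];
                    const struct sym_note *y = &objs[b].notes[j];
                    if (strcmp(x->name, y->name) == 0 &&
                        strcmp(x->signature, y->signature) != 0)
                        conflicts++;
                }
    return conflicts;
}
```

With this shape, example 1 produces nothing when libfoo and libbar are linked separately, and the mismatch only shows up if both versions of foo are handed to the same link. Example 2 is also fine, since a stale a.o would be rebuilt before the link step ever runs.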
February 04, 2016
On Fri, Feb 05, 2016 at 04:02:41AM +0000, tsbockman via Digitalmars-d wrote:
> On Friday, 5 February 2016 at 03:46:37 UTC, Chris Wright wrote:
> >On Fri, 05 Feb 2016 01:10:53 +0000, tsbockman wrote:
> >The compiler doesn't have all the information you need. You could add it
> >to the build system or the linker as well as the compiler. Adding it to
> >the linker is almost identical to my previous suggestion of adding
> >optional name mangling to C.
> 
> What information, specifically, is the compiler missing?
> 
> The compiler already computes the name and type signature of each function. As far as I can see, all that is necessary is to:
> 
> 1) Insert that information (together with what file and line number it
> came from) into a big list in a temporary file.
> 2) After all modules have been compiled, go back and sort the list by
> function name.

This would make compilation of large projects excruciatingly slow.


> 3) Finally, scan the list for entries that share the same name, but have incompatible type signatures. Emit warning messages as needed. (The compiler should be used for this step, because it already has a lot of information about C's type system built into it that can help define "incompatible" sensibly.)

This fails for multi-executable projects, which may legally have different functions under the same name. (Even though that's arguably a very bad idea.)


> As far as I can see, this requires an extra pass, but no additional information. What am I missing?

The fact that the C compiler only sees one file at a time, and has no idea which one, if any, of them will even end up in the final executable. Many projects produce multiple executables with some shared sources between them, and only the build system knows which file(s) go with which executables.

So as others have said, this can only work for compilers that are aware of a larger picture than just the single source file they are currently compiling. Even in D, for a sufficiently large project the compiler can't see everything at once either, because it won't fit into your RAM. Thankfully, D doesn't suffer from this particular problem because of name mangling.

Which is why I said, adding name mangling to the C compiler will solve this problem. Except that it breaks existing inter-language code, so it won't work for *all* C programs. And it will also break linkage with existing shared libraries, which are *not* name-mangled. (Recompiling said libraries may not be an option if they are OEM, binary-only blobs.) So it can only work for self-contained, independent projects with no inter-language linkage, which would be a very restricted subset of C codebases.


T

-- 
Nobody is perfect.  I am Nobody. -- pepoluan, GKC forum
February 04, 2016
On Fri, Feb 05, 2016 at 04:39:13AM +0000, tsbockman via Digitalmars-d wrote: [...]
> Thanks for the explanation. That does sound basically the same as the C issue.
> 
> Since .di files are normally generated automatically, this seems like an easily solvable problem:
> 
> 1) When compiling a library and its attendant .di file(s), generate a
> unique version identifier (such as a UUID or a hash of the completed
> binary) and append it to both the library and each .di file.
> 
> 2) Whenever someone tries to link against the library, verify that the version ID matches. If it does not, issue a prominent warning.
[...]

This would break shared library upgrades that do not change the ABI.

Plus, it doesn't fix wrong linkage at runtime, because the dynamic linker is part of the OS and the D compiler has no control over what it does beyond the standard symbol matching and relocation mechanisms. If you compile against libfoo, but at runtime the user happens to have a stale, ABI-incompatible version of libfoo hanging around that gets picked up by the dynamic linker, you'll have the same problem.


T

-- 
VI = Visual Irritation
February 05, 2016
On Friday, 5 February 2016 at 07:05:06 UTC, H. S. Teoh wrote:
> On Fri, Feb 05, 2016 at 04:02:41AM +0000, tsbockman via Digitalmars-d wrote:
>> On Friday, 5 February 2016 at 03:46:37 UTC, Chris Wright wrote:
>> >On Fri, 05 Feb 2016 01:10:53 +0000, tsbockman wrote:
>> What information, specifically, is the compiler missing?
>> 
>> The compiler already computes the name and type signature of each function. As far as I can see, all that is necessary is to:
>> 
>> 1) Insert that information (together with what file and line number it
>> came from) into a big list in a temporary file.
>> 2) After all modules have been compiled, go back and sort the list by
>> function name.
>
> This would make compilation of large projects excruciatingly slow.

It's a small fraction of the total data being handled by the compiler (smaller than the source code), and the list could probably be directly generated in a partially sorted state. Little-to-no random access to the list is required at any point in the process. It does not ever need to all be in RAM at the same time.

I can see it may cost more than it's actually worth, but where does the "excruciatingly slow" part come from?

>> 3) Finally, scan the list for entries that share the same name, but have incompatible type signatures. Emit warning messages as needed. (The compiler should be used for this step, because it already has a lot of information about C's type system built into it that can help define "incompatible" sensibly.)
>
> This fails for multi-executable projects, which may legally have different functions under the same name. (Even though that's arguably a very bad idea.)

Chris Wright pointed this out, as well. This just means the final pass should be done at link-time, though. It's not a fundamental problem with generating the warning.

>> As far as I can see, this requires an extra pass, but no additional information. What am I missing?
>
> The fact that the C compiler only sees one file at a time, and has no idea which one, if any, of them will even end up in the final executable. Many projects produce multiple executables with some shared sources between them, and only the build system knows which file(s) go with which executables.

This could be worked around with a little cooperation between the compiler and the linker. It's not even a feature of C the language - it's just the way current tool chains happen to work.
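
For what it's worth, the sort-then-scan pass from steps 2 and 3 can be sketched in a few lines of C. This is only an illustration under loud assumptions: `sig_record` and `report_conflicts` are hypothetical names, the whole list is held in memory (a real tool could use an external merge sort instead), and "incompatible" is reduced to "the signature strings differ", where a real compiler would apply its own type-compatibility rules.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record appended to the list for each function seen. */
struct sig_record {
    const char *name;      /* function name, e.g. "foo" */
    const char *signature; /* canonical type text, e.g. "int(int)" */
    const char *file;      /* where the declaration came from */
    int line;
};

/* Order by name, then by signature, so mismatches end up adjacent. */
static int cmp_record(const void *a, const void *b)
{
    const struct sig_record *ra = a;
    const struct sig_record *rb = b;
    int c = strcmp(ra->name, rb->name);
    return c != 0 ? c : strcmp(ra->signature, rb->signature);
}

/* Sort the list in place, then make a single linear pass over it.
   Returns the number of adjacent pairs that share a name but differ
   in signature, printing a warning to stderr for each one. */
size_t report_conflicts(struct sig_record *recs, size_t n)
{
    size_t conflicts = 0;
    qsort(recs, n, sizeof recs[0], cmp_record);
    for (size_t i = 1; i < n; i++) {
        const struct sig_record *p = &recs[i - 1];
        const struct sig_record *q = &recs[i];
        if (strcmp(p->name, q->name) == 0 &&
            strcmp(p->signature, q->signature) != 0) {
            fprintf(stderr,
                    "warning: '%s' is %s at %s:%d but %s at %s:%d\n",
                    p->name, p->signature, p->file, p->line,
                    q->signature, q->file, q->line);
            conflicts++;
        }
    }
    return conflicts;
}
```

The scan itself is sequential, which is the point made above about not needing random access. Which records go into the list (one compilation, or only the objects of one link) is a separate question from how the list is processed.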
February 05, 2016
On Friday, 5 February 2016 at 07:15:56 UTC, H. S. Teoh wrote:
> This would break shared library upgrades that do not change the ABI.
>
> Plus, it doesn't fix wrong linkage at runtime, because the dynamic linker is part of the OS and the D compiler has no control over what it does beyond the standard symbol matching and relocation mechanisms. If you compile against libfoo, but at runtime the user happens to have a stale, ABI-incompatible version of libfoo hanging around that gets picked up by the dynamic linker, you'll have the same problem.

I should have clarified that I was considering static libraries only. (I thought D's dynamic library support was kind of broken at the moment, anyway?)

Dynamic libraries are definitely a harder problem. I think useful automated protection against bad .di files could be developed for dynamic libraries as well, but the scheme wouldn't be anywhere near as simple and it might require the maintainer to actually follow SemVer to be useful.
February 05, 2016
On Friday, 5 February 2016 at 01:10:53 UTC, tsbockman wrote:
> All along I have been saying this is something that *compilers* should warn about. As far as I can recall, I never suggested using linters, sanitizers, changing the C standard - or even compiler plugins.

Well, compilers "should" only implement the standard, then they "may" add extra static analysis.

The direction C and C++ have taken is that increasing compilation times by doing extra static analysis on every build isn't desirable. Therefore compilers should focus on what is necessary for code gen and optimization, and sanitizers should focus on correctness.

This is different from Rust, which does sanitization as part of compilation, but that makes the compiler more complicated and/or much _slower_.

> (I did suggest the linker as an alternative, but you all have already explained why that can't work for C.)

It can work if you compile all source files with the same compiler. That has historically not been the case, as commercial libraries would be compiled with other compilers or be handwritten assembly.

C compilers that do whole-program analysis have dedicated linkers that should be able to do extended type checking if the IR used in the object file provides typing info. I don't know whether Clang or GCC emits typing info, but they _could_.

February 05, 2016
Let me add to this that the superior approach is to compile to an intermediate high-level format that retains type information. I guess this is where Rust is heading.

It just isn't possible with C semantics to make a reasonable version of that, since the language itself is 90% unsafe and just a small step up from assembly (for good and bad).

February 05, 2016
On Friday, 5 February 2016 at 07:05:06 UTC, H. S. Teoh wrote:
> On Fri, Feb 05, 2016 at 04:02:41AM +0000, tsbockman via Digitalmars-d wrote:
>> 1) Insert that information (together with what file and line number it
>> came from) into a big list in a temporary file.
>> 2) After all modules have been compiled, go back and sort the list by
>> function name.
>
> This would make compilation of large projects excruciatingly slow.

I did some quick tests on my system, and even with 100,000,000 names (more names than there are lines of code in the Linux kernel...) this can be done in less than three minutes. Smaller projects take seconds or less.

I suspect there is a major disconnect between what I meant, and what you think I meant.
February 05, 2016
On 5/02/2016 11:03 AM, tsbockman wrote:
>
> The compiler cannot (in the general case) verify that `extern(C)`
> declarations are *correct*. What it could do, though, is verify that
> they are *consistent*.
>
> If the same `extern(C)` symbol is declared multiple places in the D
> source code for a program, the compiler should issue at least a warning
> if the D signatures don't agree with each other.

Currently D allows overloading extern(C) declarations, see
https://issues.dlang.org/show_bug.cgi?id=15217

Checking for invalid overloads with non-D linkage is covered here:
https://issues.dlang.org/show_bug.cgi?id=2789

But neither of these covers overloads that aren't simultaneously visible.
15217 shows us that this lack of checking, when combined with D's abundant binary-compatible-but-distinct types, is somewhat useful.

Apart from some scary ABI hacks, there is nothing really stopping us from enforcing that all functions with non-D linkage in all modules included in a single compilation have distinct symbol names or (at least binary-compatible) matching D parameters.
February 05, 2016
On Friday, 5 February 2016 at 10:49:50 UTC, Daniel Murphy wrote:
> Currently D allows overloading extern(C) declarations, see
> https://issues.dlang.org/show_bug.cgi?id=15217
>
> Checking for invalid overloads with non-D linkage is covered here:
> https://issues.dlang.org/show_bug.cgi?id=2789
>
> But neither of these covers overloads that aren't simultaneously visible.
> 15217 shows us that this lack of checking, when combined with D's abundant binary-compatible-but-distinct types, is somewhat useful.
>
> Apart from some scary ABI hacks, there is nothing really stopping us from enforcing that all functions with non-D linkage in all modules included in a single compilation have distinct symbol names or (at least binary-compatible) matching D parameters.

I think it makes sense (when actually linking to C) to allow stuff like druntime's creative use of overloads. The signatures of the two bsd_signal() overloads are compatible (from C's perspective), so why not?

However, multiple `extern(C)` overloads that differ in the number or size of arguments should trigger a warning. Signed versus unsigned or even int versus floating point is more of a gray area.

Overloads with conflicting pointer types should definitely be allowed, but ideally the compiler would force them to be marked @system or @trusted, since there is an implied unsafe cast in there somewhere.
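
As a rough illustration of the "number or size of arguments" rule, here is a small sketch written in C for neutrality. It is not how dmd or any other compiler does it: `param_layout` and `layouts_agree` are hypothetical names, and each declaration is reduced to nothing more than the byte size of each parameter, which is an assumption made to keep the example short.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one extern(C) declaration: only the size in
   bytes of each parameter, in order. Signedness, int-versus-float, and
   pointee types are deliberately not recorded. */
struct param_layout {
    const size_t *sizes;
    size_t count;
};

/* Two declarations of the same symbol agree under this coarse rule if
   they take the same number of parameters and each pair of parameters
   has the same size. A mismatch is what would trigger the warning. */
bool layouts_agree(struct param_layout a, struct param_layout b)
{
    if (a.count != b.count)
        return false;
    for (size_t i = 0; i < a.count; i++)
        if (a.sizes[i] != b.sizes[i])
            return false;
    return true;
}
```

Under this rule, two overloads that differ only in pointer type pass, because the pointers have the same size, while an overload with an extra or wider argument does not. The signed/unsigned and int/float cases also pass here, which is exactly the gray area mentioned above; a stricter check would need to record more than sizes.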