February 04, 2016
On Thursday, 4 February 2016 at 23:25:58 UTC, Ola Fosheim Grøstad wrote:
> In C, compilation units are completely independent, and can in fact come from different compilers and different languages. C is very much a system level programming language.

I should also point out that D can link to (more or less) anything that C can, and yet does not have the weakness exploited by the winning entry.

The only real reason that D is one whit less of a "system level programming language" than C is the heavyweight runtime library, but that is irrelevant to the problem of type-checking cross-module references within the same code base.
February 05, 2016
On 04.02.2016 23:57, tsbockman wrote:
> http://www.underhanded-c.org/#winner
>
> Actually, I'm surprised that this works even in C - I would have
> expected at least a compiler (or linker?) warning; this seems like it
> should be easy to detect automatically.

You can do the same thing in D, using extern(C) to get no mangling:

main.d:
----
alias float_t = double;
extern(C) float_t deref(float_t* a);
void main()
{
    import std.stdio: writeln;
    float_t d = 1.23;
    writeln(deref(&d)); /* prints "1.01856e-314" */
}
----

deref.d:
----
alias float_t = float;
extern(C) float_t deref(float_t* a) {return *a;}
----

Command to build and run:
----
dmd main.d deref.d && ./main
----

February 04, 2016
On Thursday, 4 February 2016 at 23:40:13 UTC, anonymous wrote:
> You can do the same thing in D, using extern(C) to get no mangling:
>
> main.d:
> ----
> alias float_t = double;
> extern(C) float_t deref(float_t* a);
> void main()
> {
>     import std.stdio: writeln;
>     float_t d = 1.23;
>     writeln(deref(&d)); /* prints "1.01856e-314" */
> }
> ----
>
> deref.d:
> ----
> alias float_t = float;
> extern(C) float_t deref(float_t* a) {return *a;}
> ----
>
> Command to build and run:
> ----
> dmd main.d deref.d && ./main
> ----

You can do the same thing in D if you try, but it's not natural at all to use `extern(C)` for *internal* linkage of an all-D program like that.

Any competent reviewer would certainly question why you were using `extern(C)`; this scores much lower in "underhanded-ness" than the original C program.

Even so, I think that qualifies as a compiler bug or a hole in the D spec.
February 04, 2016
On Thu, Feb 04, 2016 at 11:21:54PM +0000, tsbockman via Digitalmars-d wrote: [...]
> Definitely. What puzzles me about the winning entry, though, is that the compiler and/or linker should be able to trivially detect the type mismatch *after* the preprocessor pass(es) are already done.

It cannot, because C symbols are not mangled. The function name uniquely identifies the function, and the signature is not encoded anywhere.

The linker knows nothing about types or parameters; all it knows is that within offset X of binary blob B, there's a binary number (usually a 32- or 64-bit address) associated with a symbol that it needs to replace with the value (i.e., address) of that symbol, which it obtains from the object file that defines that symbol.

So as far as the linker is concerned, the function names match up, and that's all there is to it.
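
To make the point above concrete, here is a minimal sketch of name-only symbol resolution. It is a toy model, not how any real linker is implemented: `struct symbol` and `resolve` are hypothetical names invented for illustration. The thing to notice is that the table entry has no field for parameter or return types, so there is nothing to compare a call site against.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified symbol table entry (illustration only).
   There is deliberately no field describing the function's signature. */
struct symbol {
    const char *name;    /* unmangled C symbol name */
    size_t      address; /* where the definition lives */
};

/* Resolve a reference by name alone.
   Returns the address of the first match, or 0 if the symbol is undefined. */
size_t resolve(const struct symbol *table, size_t count, const char *name)
{
    assert(table != NULL || count == 0);
    assert(name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0)
            return table[i].address;
    }
    return 0;
}
```

With a table containing `{"func", 0x1000}`, a lookup of `"func"` succeeds whether the caller declared it as `func(int, int)` or `func(double)`, because the declaration never reaches the lookup.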

C provides zero protection against calling functions with mismatched parameters if the caller is not in the same file, and does not have the right declaration. E.g.:

	/* module1.c */
	void func(int a, int b) { /* ... */ }

	/* module2.c */
	extern int func(double x); /* I'm too lazy to #include a header */
	int main(void) {
		int x = func(1.0); /* kaboom: undefined behavior, yet it links */
	}

In theory, this problem is solved by #include'ing the appropriate header file, but even that isn't free from accidents like forgetting to update the header after you change the function signature.  Of course, most sane C projects will also #include the header in the file that defines the function, in which case, finally, the compiler will catch the mistake. But you can see just how fragile this is, and how many points of failure it has, and, believe it or not, there *are* still C projects out there that don't follow the convention of one header per .c file, and of those that do, a frightening number do not #include the header in the .c file.

This isn't the whole story, either. Even if you follow said conventions to prevent function signature mismatches, problems can still occur. For instance, I once had to debug a mysterious crash in an enterprise project whose cause seemingly could not be found in the code.  It turned out that two shared libraries defined two different functions under the same name. Since the conflicting functions were in separately-compiled libraries, the compiler was oblivious to the conflict. The linker didn't detect it either: because these were shared libraries, once it had found symbol X in library1, it didn't bother looking for symbol X again in library2, which was processed afterward. An unrelated code change then altered the order in which the libraries were linked, and suddenly the linker found symbol X in library2 first, so the function call was bound to the wrong implementation.  So at runtime, kaboom.

Name mangling singlehandedly solves all of the above problems.
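
As a rough sketch of why mangling catches the signature mismatch case, consider a toy scheme that appends one letter per parameter type to the symbol name. This is an invented encoding for illustration, not the mangling used by D, C++, or any real compiler, and `toy_mangle` and `toy_links` are hypothetical helpers. It only models parameter types; real schemes encode more.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy scheme, NOT any real ABI: "<name>__<codes>", with one letter per
   parameter type ('i' = int, 'd' = double, 'f' = float). */
void toy_mangle(char *out, size_t out_size,
                const char *name, const char *param_codes)
{
    assert(out != NULL && out_size > 0);
    snprintf(out, out_size, "%s__%s", name, param_codes);
}

/* Returns 1 if the symbol requested by the caller's declaration is the same
   symbol the definition provides, 0 otherwise (an "undefined symbol" error). */
int toy_links(const char *name,
              const char *caller_params, const char *definition_params)
{
    char wanted[128];
    char provided[128];
    toy_mangle(wanted, sizeof wanted, name, caller_params);
    toy_mangle(provided, sizeof provided, name, definition_params);
    return strcmp(wanted, provided) == 0;
}
```

Under this scheme the earlier example would ask for `func__d` while module1 provides `func__ii`, so `toy_links("func", "d", "ii")` reports a failure at link time instead of a crash at run time.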


T

-- 
Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald Knuth
February 05, 2016
On 05.02.2016 00:47, tsbockman wrote:
> You can do the same thing in D if you try, but it's not natural at all
> to use `extern(C)` for *internal* linkage of an all-D program like that.
>
> Any competent reviewer would certainly question why you were using
> `extern(C)`; this scores much lower in "underhanded-ness" than the
> original C program.

We do have a lot of bindings to C libraries, though. When there's a wrong alias in one of them, you have the same scenario.

> Even so, I think that qualifies as a compiler bug or a hole in the D spec.

Can anything be done about it? The compiler simply has no way to verify declarations, has it?
February 04, 2016
On Thursday, 4 February 2016 at 23:35:46 UTC, tsbockman wrote:
> Just because *sometimes* the source code of the other module must be compiled independently, is a poor excuse to skip obvious, useful safety checks *all* the time.

The context is a compilation system for building big software on very slow CPUs with kilobytes of RAM.

C was designed for always compiling independently and compiling source files that are bigger than what can be held in RAM, and also for building executables that can fill most of system RAM. So the compilation system was designed for using external memory (disk) and that affects C a lot. BCPL, a forerunner of C, was a bootstrap language for writing compilers. So C is minimal by design.

BTW, C++ programmers sometimes use similar unsafe hacks of "pruned header files" to break dependencies and speed up compilation. So this is not unique to C, but C++ introduced the mangling of types into names to support overloading of functions on parameter types, which is why C++ detects (some) type issues at link time.

February 04, 2016
On Thu, Feb 04, 2016 at 11:47:53PM +0000, tsbockman via Digitalmars-d wrote: [...]
> You can do the same thing in D if you try, but it's not natural at all to use `extern(C)` for *internal* linkage of an all-D program like that.
> 
> Any competent reviewer would certainly question why you were using `extern(C)`; this scores much lower in "underhanded-ness" than the original C program.
> 
> Even so, I think that qualifies as a compiler bug or a hole in the D spec.

Nah... while D, by default, tries to be type-safe and prevent gaffes like the above, it *is* also a systems programming language (or at least, that's one of the stated goals), so it does allow you to go under the hood to do things that you normally aren't allowed to do.

Linking to foreign languages is a use case for allowing extern(C) function names: if you know the mangling scheme of the target language, you can declare the mangled name under extern(C) and that will allow D code to call functions written in the target language directly. Otherwise you'd have to change the compiler (and wait for the next release, etc.) before you could do that.


T

-- 
Do not reason with the unreasonable; you lose by definition.
February 05, 2016
On Thursday, 4 February 2016 at 23:51:57 UTC, anonymous wrote:
> We do have a lot of bindings to C libraries, though. When there's a wrong alias in one of them, you have the same scenario.
>
> On 05.02.2016 00:47, tsbockman wrote:
>> Even so, I think that qualifies as a compiler bug or a hole in the D spec.
>
> Can anything be done about it? The compiler simply has no way to verify declarations, has it?

The compiler cannot (in the general case) verify that `extern(C)` declarations are *correct*. What it could do, though, is verify that they are *consistent*.

If the same `extern(C)` symbol is declared multiple places in the D source code for a program, the compiler should issue at least a warning if the D signatures don't agree with each other.
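
A minimal sketch of what such a consistency check could look like, assuming the compiler has already collected every `extern(C)` declaration in the program together with a canonical signature string (aliases resolved). The `struct decl` record, the signature string format, and `first_inconsistent` are all hypothetical; this is not how dmd works internally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of one extern(C) declaration seen during compilation. */
struct decl {
    const char *symbol;    /* unmangled name, e.g. "deref" */
    const char *signature; /* canonical type string after alias resolution */
    const char *module;    /* source file the declaration appeared in */
};

/* Returns the index of the first declaration whose signature disagrees with
   an earlier declaration of the same symbol, or -1 if all are consistent.
   It says nothing about whether any of the declarations are *correct*. */
int first_inconsistent(const struct decl *decls, size_t count)
{
    assert(decls != NULL || count == 0);
    for (size_t i = 1; i < count; i++) {
        for (size_t j = 0; j < i; j++) {
            if (strcmp(decls[i].symbol, decls[j].symbol) == 0 &&
                strcmp(decls[i].signature, decls[j].signature) != 0)
                return (int)i;
        }
    }
    return -1;
}
```

Fed the two `deref` declarations from the main.d/deref.d example, recorded as `double(double*)` and `float(float*)`, the check flags the second one, which is where a warning could be issued.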
February 05, 2016
On Thu, 04 Feb 2016 23:29:10 +0000, tsbockman wrote:

> That explains why the linker doesn't catch it. I still don't see much excuse for the compiler allowing it though, beyond a desire to allow each module to be compiled independently.

Doing this sort of validation requires build system integration (track the command line arguments that went into producing this object file; find which object files are combined into which targets; run the analysis on that) and costs as much time as compiling the whole project from scratch. Developing such a system is nontrivial, so it's not a matter of conjuring excuses; rather, someone would have to put in considerable effort to make it work.

I'm betting some of the commercial static analyzers for C do this, but they're not the sort of thing you install on every dev machine and run on every build. Generally they're the sort of thing that you send off to the security company and they send you a report some weeks later.
February 05, 2016
On Thu, 04 Feb 2016 15:59:06 -0800, H. S. Teoh via Digitalmars-d wrote:

> Nah... while D, by default, tries to be type-safe and prevent gaffes like the above, it *is* also a systems programming language (or at least, that's one of the stated goals), so it does allow you to go under the hood to do things that you normally aren't allowed to do.

Which suggests a check of this sort should be a warning rather than an error, or perhaps that a pragma or attribute could be offered to ignore it.

Systems languages let you go into "Here Be Dragons" territory, but it would be nice if they still pointed out the signs to you.