February 05, 2016
On Friday, 5 February 2016 at 00:03:20 UTC, tsbockman wrote:
> If the same `extern(C)` symbol is declared multiple places in the D source code for a program, the compiler should issue at least a warning if the D signatures don't agree with each other.

I guess D could do it, although this is a rather unlikely source of bugs.

C cannot do it. It would also be annoying, since declarations are file-local.

C doesn't really build programs, it builds object files that are linked into a program.

It makes perfect sense for one compilation unit to type a parameter as a pointer to float and another unit to type the same parameter as a SIMD array of floats. The underlying code could be machine language, and in machine language there are no types (on current CPUs), only bit patterns. So you can have multiple reasonable interpretations of the same machine-language entry point.

A type is a constraint, but it isn't a property of the actual bits; it is a language-specific interpretation.

February 05, 2016
On Thursday, 4 February 2016 at 23:53:58 UTC, Ola Fosheim Grøstad wrote:
> On Thursday, 4 February 2016 at 23:35:46 UTC, tsbockman wrote:
>> Just because *sometimes* the source code of the other module must be compiled independently, is a poor excuse to skip obvious, useful safety checks *all* the time.
>
> The context is a compilation system for building big software on very slow CPUs with kilobytes of RAM.
>
> C was designed for always compiling independently and for compiling source files bigger than what can be held in RAM, and also for building executables that can fill most of system RAM. So the compilation system was designed for using external memory (disk), and that affects C a lot. The forerunner of C, BCPL, was a bootstrap language for writing compilers. So C is minimal by design.

OK. That's a good reason for C's original design.

But it's 2016 and my PC has 32GiB of RAM. Why should a C compiler running on such a system skip safety checks just because they would be too expensive to run on some *other* computer?

This isn't even a particularly expensive (in compile-time costs) check to perform anyway; all that is necessary is to store a temporary table of symbol signatures somewhere (it doesn't need to be in RAM), and check that any duplicate entries are consistent with each other before linking.

This is already a solved problem in most other programming languages; there is no fundamental reason that the solutions used in D, C++, or Java could not be applied to C - without even changing any of the language semantics.
February 05, 2016
On Friday, 5 February 2016 at 00:12:07 UTC, Ola Fosheim Grøstad wrote:
> It makes perfect sense for one compilation unit to type a parameter as a pointer to float and another unit to type the same parameter as a SIMD array of floats. The underlying code could be machine language, and in machine language there are no types (on current CPUs), only bit patterns. So you can have multiple reasonable interpretations of the same machine-language entry point.
>
> A type is a constraint, but it isn't a property of the actual bits; it is a language-specific interpretation.

Aliasing types like that can be useful sometimes, but only within certain limits. In particular, the sizes (with alignment padding) of the types in question must match; otherwise you will corrupt the stack.

It is often useful to cast from one pointer type to another, and that is why C has void* and explicit casts - so that one may document that the reinterpretation is intentional.
February 05, 2016
On Thursday, 4 February 2016 at 23:59:06 UTC, H. S. Teoh wrote:
> On Thu, Feb 04, 2016 at 11:47:53PM +0000, tsbockman via Digitalmars-d wrote: [...]
>> Even so, I think that qualifies as a compiler bug or a hole in the D spec.
>
> Nah... while D, by default, tries to be type-safe and prevent guffaws like the above, it *is* also a systems programming language (or at least, that's one of the stated goals), so it does allow you to go under the hood to do things that you normally aren't allowed to do.
>
> Linking to foreign languages is a use case for allowing extern(C) function names: if you know the mangling scheme of the target language, you can declare the mangled name under extern(C) and that will allow D code to call functions written in the target language directly. Otherwise you'd have to change the compiler (and wait for the next release, etc.) before you could do that.
>
>
> T

I'm not saying that `extern(C)` is bad in general; I understand why it's necessary.

I'm saying that anonymous' example (http://forum.dlang.org/post/n90ngu$1r6v$1@digitalmars.com) showcases a hole in the spec, because in it the D compiler has access to the full source code of the function being linked to, and doesn't bother to verify that its signature in main.d is compatible with the definition in deref.d.

If the D compiler does *not* have access to the function's definition, then obviously it cannot perform this verification.
February 05, 2016
On Friday, 5 February 2016 at 00:03:56 UTC, Chris Wright wrote:
> Doing this sort of validation requires build system integration (track the command line arguments that went into producing this object file; find which object files are combined into which targets; run the analysis on that) and costs as much time as compiling the whole project from scratch.

There is no need to take "as much time as compiling the whole project from scratch".

The necessary information is already gathered during the normal course of compilation; all that is required is to actually save it somewhere until link-time, instead of throwing it away.

The time required for the check should be at most O(N log(N)), where N is the number of function and global variable declarations in the project. The space required for the table is O(N). In both cases the constant factors should be quite small.

> Developing such a system is nontrivial, so it's not a matter of
> conjuring excuses; rather, someone would have to put in
> considerable effort to make it work.

Adding any interesting feature to a build system is usually nontrivial, but I still think you're overestimating the cost of this one.

Again, the hard part (finding all the signatures and processing them into a semantically meaningful form) is already being done by the compiler. The results just need to be saved, sorted, and scanned for conflicts.
February 05, 2016
On Friday, 5 February 2016 at 00:07:45 UTC, Chris Wright wrote:
> Which suggests a check of this sort should be a warning rather than an error, or perhaps that a pragma or attribute could be offered to ignore it.
>
> Systems languages let you go into "Here Be Dragons" territory, but it would be nice if they still pointed out the signs to you.

Yes.
February 05, 2016
On Friday, 5 February 2016 at 00:14:11 UTC, tsbockman wrote:
> But it's 2016 and my PC has 32GiB of RAM. Why should a C compiler running on such a system skip safety checks just because they would be too expensive to run on some *other* computer?

C has to be backwards compatible, but I don't know why people do larger projects in C in 2016.

Libraries are done in C for portability and because it provides an FFI defined as the ABI by hardware and OS vendors. BeOS tried to define a specific C++ compiler as its ABI, but it was problematic.

C++ does not have a standard ABI; you cannot reliably link object files from different C++ compilers. Java and C# are not system-level languages. So, basically, there is no suitable industry standard other than C.

> This is already a solved problem in most other programming languages; there is no fundamental reason that the solutions used in D, C++, or Java could not be applied to C - without even changing any of the language semantics.

D and C++ change.  C uses the ABI defined by the hardware/OS vendor.  It is locked in stone, frozen, beyond discussion.

As mentioned, BeOS adopted C++. Apple has adopted Objective-C and Swift. But how can you make _all_ the other vendors (Microsoft, Google, IBM, etc.) standardize on something that isn't C?

> Aliasing types like that can be useful sometimes, but only within certain limits. In particular, the size (with alignment padding) of the types in question must match, otherwise you will corrupt the stack.

I see where you are coming from, but I meant what I said literally. Machine language only deals with bit patterns. When we interface with machine language, we just add lots of constraints on what we hand over to it. Adding _more_ constraints than the creator of the machine-language code intended is never wrong. Not adding enough constraints is not ideal, but often difficult to avoid if we care about performance.

So if I write a piece of machine language code and give you the object file, you only have my word for what the input is supposed to be. You then have to formulate the constraints in a way that fits your use case and is expressible in your language. Different languages have different levels of expressiveness for describing and enforcing type constraints.

February 05, 2016
On Friday, 5 February 2016 at 00:41:52 UTC, Ola Fosheim Grøstad wrote:
> On Friday, 5 February 2016 at 00:14:11 UTC, tsbockman wrote:
>> But it's 2016 and my PC has 32GiB of RAM. Why should a C compiler running on such a system skip safety checks just because they would be too expensive to run on some *other* computer?
>
> C has to be backwards compatible, but I don't know why people do larger projects in C in 2016.
> [...]

Why would simply adding a warning change any of that?

No ABI changes are required. Backwards compatibility is not broken.
February 05, 2016
On Fri, 05 Feb 2016 00:38:16 +0000, tsbockman wrote:

> On Friday, 5 February 2016 at 00:03:56 UTC, Chris Wright wrote:
>> Doing this sort of validation requires build system integration (track the command line arguments that went into producing this object file; find which object files are combined into which targets; run the analysis on that) and costs as much time as compiling the whole project from scratch.
> 
> There is no need to take "as much time as compiling the whole project from scratch".
> 
> The necessary information is already gathered during the normal course of compilation; all that is required is to actually save it somewhere until link-time, instead of throwing it away.

True. That works if this is baked into your compiler, or if your compiler has plugin support. And you'd have to compile with this plugin or the relevant options turned on by default in order to avoid duplicating work.

That's partly an engineering issue (build this thing in this particular way) and partly a social issue (get people to run it by default; have them add the extra flag to the makefile to specify to create the relevant output; possibly get your compiler vendor to build it in, depending on what compiler your devs are using).

I imagine Google, to take a random example where I have experience, would add this as a presubmit step rather than requiring it on every build.
February 05, 2016
On Friday, 5 February 2016 at 00:50:32 UTC, tsbockman wrote:
> On Friday, 5 February 2016 at 00:41:52 UTC, Ola Fosheim Grøstad wrote:
>> On Friday, 5 February 2016 at 00:14:11 UTC, tsbockman wrote:
>>> But it's 2016 and my PC has 32GiB of RAM. Why should a C compiler running on such a system skip safety checks just because they would be too expensive to run on some *other* computer?
>>
>> C has to be backwards compatible, but I don't know why people do larger projects in C in 2016.
>> [...]
>
> Why would simply adding a warning change any of that?
>
> No ABI changes are required. Backwards compatibility is not broken.

Not sure what you mean by adding a warning. You can probably find sanitizers that do it, but the standard does not require warnings for anything (AFAIK). That is up to compiler vendors.

As for why C isn't displaced by something better, maybe the right question is: why don't new languages stick to the C ABI and provide sensible C code gen?

Well, they want more features... and features... and features...

There is probably a market for it, but nobody can be bothered to create and maintain a simple modern system level language.