February 05, 2016
On 5/02/2016 10:07 PM, tsbockman wrote:
>
> I think it makes sense (when actually linking to C) to allow stuff like
> druntime's creative use of overloads. The signatures of the two
> bsd_signal() overloads are compatible (from C's perspective), so why not?
>
> However, multiple `extern(C)` overloads that differ in the number or
> size of arguments should trigger a warning. Signed versus unsigned or
> even int versus floating point is more of a gray area.
>

That's what I meant by binary compatible.

> Overloads with conflicting pointer types should definitely be allowed,
> but ideally the compiler would force them to be marked @system or
> @trusted, since there is an implied unsafe cast in there somewhere.

Safety on C functions is always going to need to be hand-verified; the presence of overloads doesn't change that.  Conflicting pointer types are pretty much the same as a function taking void* - all the unsafe stuff is on the other side and invisible to the D compiler.
February 05, 2016
On Fri, Feb 05, 2016 at 07:31:34AM +0000, tsbockman via Digitalmars-d wrote:
> On Friday, 5 February 2016 at 07:05:06 UTC, H. S. Teoh wrote:
> >On Fri, Feb 05, 2016 at 04:02:41AM +0000, tsbockman via Digitalmars-d wrote:
> >>On Friday, 5 February 2016 at 03:46:37 UTC, Chris Wright wrote:
> >>>On Fri, 05 Feb 2016 01:10:53 +0000, tsbockman wrote:
> >>What information, specifically, is the compiler missing?
> >>
> >>The compiler already computes the name and type signature of each function. As far as I can see, all that is necessary is to:
> >>
> >>1) Insert that information (together with what file and line number
> >>it came from) into a big list in a temporary file.
> >>2) After all modules have been compiled, go back and sort the list
> >>by function name.
> >
> >This would make compilation of large projects excruciatingly slow.
> 
> It's a small fraction of the total data being handled by the compiler (smaller than the source code), and the list could probably be directly generated in a partially sorted state. Little-to-no random access to the list is required at any point in the process. It does not ever need to all be in RAM at the same time.
> 
> I can see it may cost more than it's actually worth, but where does the "excruciatingly slow" part come from?

OK, probably I'm misunderstanding something here. :-P


> >>3) Finally, scan the list for entries that share the same name, but have incompatible type signatures. Emit warning messages as needed. (The compiler should be used for this step, because it already has a lot of information about C's type system built into it that can help define "incompatible" sensibly.)
> >
> >This fails for multi-executable projects, which may legally have different functions under the same name. (Even though that's arguably a very bad idea.)
> 
> Chris Wright pointed this out, as well. This just means the final pass should be done at link-time, though. It's not a fundamental problem with generating the warning.

The problem is, the linker knows nothing about the language. Arguably it should, but as things stand, it doesn't, and can't, because linkers are usually shipped with the OS and are expected to link object files of *any* pedigree without needing language-specific checks.

Perhaps this is slowly starting to change, as LTO and other recent innovations are pushing the envelope of what the linker can do.  Maybe one day there will emerge a language-agnostic way for the linker to check for such errors... but I really don't see it happening, because languages *other* than C have already solved the problem with name mangling. There isn't much motivation for linkers to change just because C has some language design issues.
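
(To illustrate the mangling point: in C++ the parameter types are encoded into the symbol itself, so mismatched prototypes become *different* symbols and an ordinary linker catches them as "undefined symbol" errors. Here's a toy model of a small subset of the Itanium C++ ABI scheme -- a sketch for illustration, not a real mangler:)

```python
# Toy subset of Itanium C++ ABI mangling (assumed simplification):
# "_Z" + <name length> + <name> + one type code per parameter.
TYPE_CODES = {"int": "i", "double": "d", "void*": "Pv"}

def mangle(name, params):
    """Encode a function's parameter types into its linker-visible symbol."""
    return "_Z" + str(len(name)) + name + "".join(TYPE_CODES[p] for p in params)

# Two declarations of foo with different prototypes get different symbols,
# so the linker cannot silently bind a call to the wrong definition:
#   mangle("foo", ["int"])    -> "_Z3fooi"
#   mangle("foo", ["double"]) -> "_Z3food"
```

C, by contrast, emits the bare name `foo` for both, which is exactly why the mismatch goes undetected.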

(And note that I'm not trying to disagree with you -- I'm totally in agreement that what C allows is oftentimes extremely dangerous and rather unwise. But the way things are is just so entrenched that it's unlikely to change in the near (or even distant) future.)


> >>As far as I can see, this requires an extra pass, but no additional information. What am I missing?
> >
> >The fact that the C compiler only sees one file at a time, and has no idea which one, if any, of them will even end up in the final executable. Many projects produce multiple executables with some shared sources between them, and only the build system knows which file(s) go with which executables.
> 
> This could be worked around with a little cooperation between the compiler and the linker. It's not even a feature of C the language - it's just the way current tool chains happen to work.

And that's where the sticky part lies. Current toolchains work in this, arguably suboptimal, way partly because of historical baggage, but mostly because doing otherwise would make the toolchain incompatible with other existing toolchains and systems. The current divide between compiler and linker is actually IMO not in the best place it could be, as it hampers a lot of what, arguably, should be the compiler's job, not the linker's.

Nevertheless, changing this means you become incompatible with much of the ecosystem and become a walled garden -- like Java (JNI was an afterthought, and requires a very specific setup to even work -- there's definitely no way to link Java objects with OS-level object files without jumping through lots of hoops with lots of caveats). I just don't see this ever happening, especially not for something that, in the big picture, really isn't *that* big of a deal. After all, C coders have gotten used to working with far more dangerous things in C than merely mismatched prototypes; it would take a LOT more than that for people to accept changing the way things work.


T

-- 
Skill without imagination is craftsmanship and gives us many useful objects such as wickerwork picnic baskets.  Imagination without skill gives us modern art. -- Tom Stoppard
February 05, 2016
On Fri, 05 Feb 2016 10:04:01 -0800, H. S. Teoh via Digitalmars-d wrote:

> On Fri, Feb 05, 2016 at 07:31:34AM +0000, tsbockman via Digitalmars-d wrote:
>> On Friday, 5 February 2016 at 07:05:06 UTC, H. S. Teoh wrote:
>> >On Fri, Feb 05, 2016 at 04:02:41AM +0000, tsbockman via Digitalmars-d wrote:
>> >>On Friday, 5 February 2016 at 03:46:37 UTC, Chris Wright wrote:
>> >>>On Fri, 05 Feb 2016 01:10:53 +0000, tsbockman wrote:
>> >>What information, specifically, is the compiler missing?
>> >>
>> >>The compiler already computes the name and type signature of each function. As far as I can see, all that is necessary is to:
>> >>
>> >>1) Insert that information (together with what file and line number
>> >>it came from) into a big list in a temporary file.
>> >>2) After all modules have been compiled, go back and sort the list by
>> >>function name.
>> >
>> >This would make compilation of large projects excruciatingly slow.
>> 
>> It's a small fraction of the total data being handled by the compiler (smaller than the source code), and the list could probably be directly generated in a partially sorted state. Little-to-no random access to the list is required at any point in the process. It does not ever need to all be in RAM at the same time.
>> 
>> I can see it may cost more than it's actually worth, but where does the "excruciatingly slow" part come from?
> 
> OK, probably I'm misunderstanding something here. :-P

I think you're talking about maintaining an in-memory, modifiable data structure, doing one insert per operation and one point query per use. That's useful for incremental compilation, but it's going to be pretty slow.

tsbockman is thinking of a single pass at link time that checks everything at once. You append an entry to a list for each prototype and definition, then later sort all those lists together by name. Error on duplicate names with mismatched signatures.
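
(A minimal sketch of that pass, in Python with a made-up entry format -- the real tool would read signature lists emitted alongside each object file:)

```python
from itertools import groupby

def find_conflicts(entries):
    """entries: (name, signature, file, line) tuples collected from every
    compiled module. Sort by name, then flag any name that appears with
    more than one signature."""
    conflicts = []
    for name, group in groupby(sorted(entries), key=lambda e: e[0]):
        group = list(group)
        if len({sig for _, sig, _, _ in group}) > 1:
            conflicts.append((name, group))
    return conflicts

# Hypothetical input: bsd_signal declared two different ways, puts consistent.
entries = [
    ("bsd_signal", "sigfn_t (int, sigfn_t)",  "a.c", 10),
    ("bsd_signal", "sigfn_t2 (int, sigfn_t2)", "b.c", 22),
    ("puts",       "int (const char *)",       "a.c", 3),
    ("puts",       "int (const char *)",       "b.c", 7),
]
# find_conflicts(entries) flags bsd_signal only.
```
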

This is faster for fresh builds than it is for incremental compilation -- tsbockman mentioned a brief benchmark, and that cost would crop up on every build, even if you'd only changed one line of code. (Granted, that example was pretty huge.) But this might typically be faster than a bunch of point queries even with incremental compilation.

Anyway, that's why I'm thinking most people who used such a feature would turn it on in their continuous integration server or as a presubmit step rather than on every build.

> The problem is, the linker knows nothing about the language.

We're only talking about a linker because we need to run this tool after compiling all your files, and it has to know what input files you're putting into the linker.

So this "linker" is really just a shell script that invokes our checker and then calls the system linker.
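
That wrapper could be as thin as this sketch (Python rather than shell so the pieces are named; `run_prototype_check` is a placeholder for the checker, and the `cc` default is just an assumption):

```python
import subprocess
import sys

def build_link_command(objects, output, linker="cc"):
    """Assemble the argv handed to the real system linker."""
    return [linker, "-o", output, *objects]

def run_prototype_check(objects):
    """Placeholder: the real check would read the per-object signature
    lists and return (name, conflicting signatures) pairs."""
    return []

def checked_link(objects, output):
    # Run the cross-module prototype check first; only invoke the
    # system linker if everything is consistent.
    conflicts = run_prototype_check(objects)
    for name, sigs in conflicts:
        print("mismatched prototypes for %s: %s" % (name, sigs), file=sys.stderr)
    if conflicts:
        sys.exit(1)
    subprocess.run(build_link_command(objects, output), check=True)
```
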
February 05, 2016
On Friday, 5 February 2016 at 20:35:16 UTC, Chris Wright wrote:
> On Fri, 05 Feb 2016 10:04:01 -0800, H. S. Teoh via Digitalmars-d wrote:
>
>> On Fri, Feb 05, 2016 at 07:31:34AM +0000, tsbockman via Digitalmars-d wrote:
>>> On Friday, 5 February 2016 at 07:05:06 UTC, H. S. Teoh wrote:
>> OK, probably I'm misunderstanding something here. :-P
>
> I think you're talking about maintaining an in-memory, modifiable data structure, doing one insert per operation and one point query per use. That's useful for incremental compilation, but it's going to be pretty slow.
>
> tsbockman is thinking of a single pass at link time that checks everything at once. You append an entry to a list for each prototype and definition, then later sort all those lists together by name. Error on duplicate names with mismatched signatures.

Yes.

> This is faster for fresh builds than it is for incremental compilation -- tsbockman mentioned a brief benchmark, and that cost would crop up on every build, even if you'd only changed one line of code. (Granted, that example was pretty huge.) But this might typically be faster than a bunch of point queries even with incremental compilation.
>
> Anyway, that's why I'm thinking most people who used such a feature would turn it on in their continuous integration server or as a presubmit step rather than every build.

It doesn't necessarily have to be slow when you only changed one line:

* The list from the previous compilation could be re-used to speed things up considerably, although retaining it would cost some disk space.

* If that's still too expensive, just don't cross-check files that aren't being recompiled. The check will be less useful on incremental builds, but not *useless*. The CI server can still do the full check (using the compiler), as you suggest.
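
(The first idea amounts to merging the previous build's list with freshly generated entries -- a sketch with a hypothetical per-file layout:)

```python
def merge_signature_lists(cached, fresh, recompiled):
    """cached, fresh: {source_file: [(name, signature), ...]};
    recompiled: set of files rebuilt this time. Untouched files keep
    last build's entries; recompiled files get fresh ones."""
    merged = {f: e for f, e in cached.items() if f not in recompiled}
    merged.update(fresh)
    return merged

cached = {"a.c": [("f", "int (int)")], "b.c": [("g", "void (void)")]}
fresh  = {"b.c": [("g", "void (int)")]}   # only b.c was recompiled
# merge_signature_lists(cached, fresh, {"b.c"}) reuses a.c's old entries,
# so the full cross-check still sees every file's signatures.
```
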

>> The problem is, the linker knows nothing about the language.
>
> We're only talking about a linker because we need to run this tool after compiling all your files, and it has to know what input files you're putting into the linker.
>
> So this "linker" is really just a shell script that invokes our checker and then calls the system linker.

Yes. (Or, it's the compiler with a special option set, which then calls the linker after it finishes its global pre-link tasks.)