September 18, 2016
On 9/18/2016 8:20 AM, Andrei Alexandrescu wrote:
> On 09/18/2016 11:17 AM, Andrei Alexandrescu wrote:
>> Simplest case is - source file is being changed, therefore a new object
>> file is being produced, therefore a new executable is being produced.
>
> Forgot to mention a situation here: if you change the source code of a module
> without influencing the object file (e.g. documentation, certain style changes,
> unittests in non-unittest builds etc) there'd be no linking upon rebuilding. --


The compiler currently creates the complete object file in a buffer, then writes the buffer to a file in one command. The reason is mostly that the object file format isn't incremental: the beginning is written last, and the body gets backpatched as compilation progresses.

Because of the file format, I can't really see a compilation producing an object file whose first half matches the previous object file while the second half differs.
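
To make that concrete, here is a minimal sketch of the pattern in D (the names and the 4-byte header field are invented for illustration, not dmd's actual emitter):

    import std.array : appender;
    import std.bitmanip : nativeToLittleEndian;
    import std.file : write;

    // Emit the body first, backpatch the header field once the final
    // offsets are known, then hit the disk with a single write.
    void emitObjectFile(string path, const(ubyte)[] body_)
    {
        auto buf = appender!(ubyte[])();
        buf ~= new ubyte[4];        // placeholder for a header offset
        buf ~= body_;               // emit the body
        // Backpatch: this offset is only known after the body is done.
        ubyte[4] patch = nativeToLittleEndian(cast(uint) buf.data.length);
        buf.data[0 .. 4] = patch[];
        write(path, buf.data);      // one write, nothing incremental
    }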

Interestingly, the win32 .lib format is designed for incredibly slow floppy disks, in that updating the library need not read/write every disk sector.

I'd love to design our own high speed formats, but then they'd be incompatible with everybody else's.
September 18, 2016
On 9/18/2016 7:05 PM, Brad Roberts via Digitalmars-d wrote:
> This is nice in the case of no changes, but problematic in the case of some
> changes.  The standard write new, rename technique never has either file in a
> half-right state.  The file is atomically either old or new and nothing in
> between.  This can be critical.

As for compilation, I bet considerable speed increases could be had by never writing object files at all. (Not only does it save the file read/write time, it also saves encoding into the object file format and decoding it again.) Have the compiler do the linking directly.

dmd already does this for generating library files directly, and it's been very successful (although sometimes I suspect nobody has noticed(!), which is actually a good thing). It took surprisingly little code to make that work, though doing a link step would be far more work.
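
(For the record, that's the -lib switch: "dmd -lib a.d b.d" builds the library in one step, with no separate librarian invocation.)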
September 19, 2016
On 2016-09-19 07:16, Walter Bright wrote:

> I'd love to design our own high speed formats, but then they'd be
> incompatible with everybody else's.

You already mentioned in another post [1] that the compiler could do the linking as well. In that case you would need to write some form of linker, so I suggest developing it as a library, supporting all the formats DMD currently supports. The library could be used both directly from DMD and to build an external linker. Once we have our own linker, we could create our own format too, without having to worry about compatibility.
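
Roughly, a hypothetical shape such a library could take (every name below is invented for illustration; no such API exists yet):

    // One entry point that DMD could call in-process and a thin
    // standalone driver could wrap.
    enum ObjectFormat { elf, coff, machO, omf }

    struct LinkRequest
    {
        string[] objectFiles;   // .o/.obj inputs
        string[] libraries;     // static libraries to search
        string output;          // resulting executable or shared library
        ObjectFormat format;    // which flavor to read and emit
    }

    bool link(const LinkRequest req)
    {
        // Resolve symbols, lay out sections, emit the output file...
        // all the actual work is elided in this sketch.
        return false;
    }

    // The standalone linker is then just a small wrapper:
    int main(string[] args)
    {
        LinkRequest req;
        req.objectFiles = args[1 .. $];
        req.output = "a.out";
        req.format = ObjectFormat.elf;
        return link(req) ? 0 : 1;
    }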

I guess we need to create other tools for the new format as well, like object dumpers. But I assume that's a natural thing to do anyway.

Bundle that with something like musl libc and we will have our own complete tool chain. It would also be easier to add support for cross-compiling.

[1] http://forum.dlang.org/post/nrnsn7$1h3k$1@digitalmars.com

-- 
/Jacob Carlborg
September 19, 2016
On Monday, 19 September 2016 at 05:16:37 UTC, Walter Bright wrote:
>
> I'd love to design our own high speed formats, but then they'd be incompatible with everybody else's.

I'd like that as well.

I recently had a look at the ELF and COFF file formats; both are definitely in need of a rework and a dust-off :-)

There are some nice things we could do if we had certain features on every platform, wrt. linking and symbol-tables.

However, the maintenance burden is a bit heavy; we don't have enough manpower as it is.
September 18, 2016
On 9/18/2016 11:33 PM, Stefan Koch wrote:
> However, the maintenance burden is a bit heavy; we don't have enough manpower
> as it is.

A major part of the problem (that working with Optlink has made painfully clear) is that although linking is conceptually a rather trivial task, the people who've designed the file formats have an unending love of making trivial things exceedingly complicated. Furthermore, the weird things about the format are 98% undocumented lore.

DMD still has problems generating "correct" DWARF debug info, because its correctness is defined not by the spec but by lore and the idiosyncratic way gcc emits it.

Doing a linker inside DMD means that object files imported from other C/C++ compilers have to be correctly interpreted. I could do it, but I couldn't do that and continue to work on D.
September 19, 2016
On Monday, 19 September 2016 at 06:53:47 UTC, Walter Bright wrote:
> Doing a linker inside DMD means that object files imported from other C/C++ compilers have to be correctly interpreted. I could do it, but I couldn't do that and continue to work on D.

Yeah, there's a reason for the absence of 100500 hobbyist FOSS linkers. ;-) Contrary to what it may look like, correct linking is a really hard task, and mostly not fun to write, too. People usually try, and then just silently return to binutils. ;-)
September 19, 2016
On 09/19/2016 01:16 AM, Walter Bright wrote:
> On 9/18/2016 8:20 AM, Andrei Alexandrescu wrote:
>> On 09/18/2016 11:17 AM, Andrei Alexandrescu wrote:
>>> Simplest case is - source file is being changed, therefore a new object
>>> file is being produced, therefore a new executable is being produced.
>>
>> Forgot to mention a situation here: if you change the source code of a
>> module
>> without influencing the object file (e.g. documentation, certain style
>> changes,
>> unittests in non-unittest builds etc) there'd be no linking upon
>> rebuilding. --
>
>
> The compiler currently creates the complete object file in a buffer,
> then writes the buffer to a file in one command. The reason is mostly
> because the object file format isn't incremental, the beginning is
> written last and the body gets backpatched as the compilation progresses.

Great. In that case, if the target .o file already exists, it should be compared against the buffer. If identical, there should be no write and the timestamp of the .o file should stay the same.

I need to re-emphasize this kind of stuff is important for tooling. Many files get recompiled to identical object files - e.g. the many innocent bystanders in a dense dependency structure when one module changes. We also embed documentation in source files. Being disciplined about reflecting actual changes in the actual file operations is very helpful for tools that track file writes and/or timestamps.
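
In other words, a minimal sketch of the semantics (the name and signature are illustrative, not a final API, and the whole contents are assumed to fit in memory, as they do for dmd's buffer):

    import std.file : exists, getSize, read, write;

    // Skip the write, and thus the timestamp bump, when nothing changed.
    void update(string path, const(ubyte)[] contents)
    {
        if (exists(path) && getSize(path) == contents.length
            && cast(const(ubyte)[]) read(path) == contents)
            return;          // identical: leave file and timestamp alone
        write(path, contents);
    }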

> I can't really see a compilation producing an object file where the
> first half of it matches the previous object file and the second half is
> different, because of the file format.

Interesting. What happens e.g. if one makes a change to a function whose generated code is somewhere in the middle of the object file? If it doesn't alter the call graph, doesn't the new .o file share a common prefix with the old one?

> Interestingly, the win32 .lib format is designed for incredibly slow
> floppy disks, in that updating the library need not read/write every
> disk sector.
>
> I'd love to design our own high speed formats, but then they'd be
> incompatible with everybody else's.

This (and the subsequent considerations) is drifting off-topic. The thread is about getting a useful function off the ground, and sadly it is degenerating into yet another debate leading to no progress.


Andrei
September 19, 2016
On 09/18/2016 10:05 PM, Brad Roberts via Digitalmars-d wrote:
> This is nice in the case of no changes, but problematic in the case of
> some changes.  The standard write new, rename technique never has either
> file in a half-right state.  The file is atomically either old or new
> and nothing in between.  This can be critical.

Good point; that should also be part of the documentation, or a flag to update (e.g. Yes.atomic). Alternative: the caller may wish to rename the file prior to the operation and then rename it back afterwards.
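
For concreteness, a sketch of the write-new-then-rename idiom Brad describes (POSIX rename() replaces the target atomically; the simplistic temp naming here assumes both paths live on the same filesystem):

    import std.file : rename, write;

    void atomicWrite(string path, const(ubyte)[] contents)
    {
        auto tmp = path ~ ".tmp";   // simplistic temp naming, illustration only
        write(tmp, contents);       // readers still see the old file
        rename(tmp, path);          // atomic swap: old or new, never in between
    }

-- Andrei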
September 19, 2016
On Monday, 19 September 2016 at 14:04:03 UTC, Andrei Alexandrescu wrote:
>
> Interesting. What happens e.g. if one makes a change to a function whose generated code is somewhere in the middle of the object file? If it doesn't alter the call graph, doesn't the new .o file share a common prefix with the old one?

Only if the TOC is unchanged.
There are a lot of common sections in the same order but with different offsets;
we would need some binary patching method.

But I am unaware of file systems supporting this.
Microsoft's incremental linking mechanism makes use of thunks so it can avoid changing the header, IIRC.

But all of this needs the codegen to adapt.
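
As a toy illustration of the patching idea (not tied to any real object format), finding the span an in-place patch would have to rewrite:

    size_t[2] changedSpan(const(ubyte)[] oldImg, const(ubyte)[] newImg)
    {
        import std.algorithm.searching : commonPrefix;
        size_t lo = commonPrefix(oldImg, newImg).length;
        size_t hi = 0;
        while (hi < oldImg.length - lo && hi < newImg.length - lo
               && oldImg[$ - 1 - hi] == newImg[$ - 1 - hi])
            ++hi;
        return [lo, newImg.length - hi];   // new bytes [lo .. $ - hi) differ
    }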
September 19, 2016
On 9/19/2016 7:04 AM, Andrei Alexandrescu wrote:
> On 09/19/2016 01:16 AM, Walter Bright wrote:
>> The compiler currently creates the complete object file in a buffer,
>> then writes the buffer to a file in one command. The reason is mostly
>> because the object file format isn't incremental, the beginning is
>> written last and the body gets backpatched as the compilation progresses.
> Great. In that case, if the target .o file already exists, it should be compared
> against the buffer. If identical, there should be no write and the timestamp of
> the .o file should stay the same.

That's right. I was just referring to the idea of incrementally writing and comparing, which is a great idea for sequential file writing but likely won't work for the object file case. I think it is distinct enough to merit a separate library function. Note that we already have:

    http://dlang.org/phobos/std_file.html#.write

Adding another "writeIfDifferent()" function would be a good thing. The range based incremental one should go into std.stdio.
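
Something like the following, say (the name is hypothetical, not an existing Phobos function; it assumes a seekable file, and truncating a leftover tail when the new content is shorter is elided):

    import std.file : exists;
    import std.stdio : File;
    import core.stdc.stdio : SEEK_CUR;

    // Compare chunk by chunk against what is already on disk, and only
    // start writing at the first divergence.
    void updateByChunks(Range)(string path, Range chunks)
    {
        bool diverged = !exists(path);
        auto f = File(path, diverged ? "wb" : "r+b");
        ubyte[] buf;
        foreach (const(ubyte)[] chunk; chunks)
        {
            if (!chunk.length)
                continue;
            if (!diverged)
            {
                buf.length = chunk.length;
                auto got = f.rawRead(buf);
                if (got == chunk)
                    continue;               // identical so far: no write
                // Step back over what was just read; C stdio also requires
                // a seek when switching from reading to writing in update mode.
                f.seek(-cast(long) got.length, SEEK_CUR);
                diverged = true;
            }
            f.rawWrite(chunk);
        }
    }

A caller would pass the emitted buffer sliced up, e.g. std.range.chunks(buf, 4096).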

Any case where writing is much more costly than reading (such as the SSD drives you mentioned, and the new Seagate "archival" drives) would make your technique a good one. It works even for memory; I've used it in code to reduce swapping, as in:

    if (*p != newvalue) *p = newvalue;

> I need to re-emphasize this kind of stuff is important for tooling. Many files
> get recompiled to identical object files - e.g. the many innocent bystanders in
> a dense dependency structure when one module changes. We also embed
> documentation in source files. Being disciplined about reflecting actual changes
> in the actual file operations is very helpful for tools that track file writes
> and/or timestamps.

That's right.


>> I can't really see a compilation producing an object file where the
>> first half of it matches the previous object file and the second half is
>> different, because of the file format.
>
> Interesting. What happens e.g. if one makes a change to a function whose
> generated code is somewhere in the middle of the object file? If it doesn't
> alter the call graph, doesn't the new .o file share a common prefix with the old
> one?

Two things:

1. The object file starts out with a header that contains file offsets to the various tables and sections. Changing the size of any of the pieces in the file changes the header, and will likely require moving pieces around to make room.

2. Writing an object file can mean "backpatching" what was written earlier, when a declaration one assumed was external turns out to be internal.
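
A toy illustration of both points (an invented layout, not ELF/COFF/OMF):

    // Every table's position is recorded up front, so growing any piece
    // shifts the offsets of everything behind it, and the header, at the
    // front of the file, can only be finalized last.
    struct ObjHeader
    {
        uint codeOffset;     // where the code bytes start
        uint symtabOffset;   // where the symbol table starts
        uint relocOffset;    // where the relocations start
    }
    // Add one byte to the code section and symtabOffset and relocOffset
    // both move. Point 2 is the same effect at the record level: a fixup
    // emitted for an "external" symbol that later turns out to be internal
    // must be rewritten in place.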