[WORK] std.file.update function
September 18, 2016
There are quite a few situations in rdmd and dmd generally when we compute a dependency structure over sets of files. Based on that, we write new files that overwrite old, obsoleted files. Those changes in turn trigger other dependencies to go stale so more building is done etc.

Simplest case is - source file is being changed, therefore a new object file is being produced, therefore a new executable is being produced. And it only gets more involved.

We've discussed before using a simple method to avoid unnecessary stale dependencies when it's possible that a certain file won't, in fact, change contents:

1. Do all work on the side in a separate file e.g. file.ext.tmp

2. Compare the new file with the old file file.ext

3. If they're identical, delete file.ext.tmp; otherwise, rename file.ext.tmp into file.ext
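
A minimal sketch of the three steps above, in D. The helper name `replaceIfChanged` is hypothetical (it is not part of Phobos), and the sketch assumes the new contents are small enough to hold in memory:

```d
import std.file : exists, read, remove, rename, write;

/// Hypothetical helper, not a Phobos function.
/// Returns true if `path` was replaced, false if it was left untouched.
bool replaceIfChanged(string path, const(void)[] contents)
{
    auto tmp = path ~ ".tmp";

    // 1. Do all the work on the side, in file.ext.tmp.
    write(tmp, contents);

    // 2. Compare the new file with the old one, byte for byte.
    if (path.exists &&
        cast(const(ubyte)[]) read(path) == cast(const(ubyte)[]) read(tmp))
    {
        // 3a. Identical: drop the temporary, so file.ext keeps its timestamp.
        remove(tmp);
        return false;
    }

    // 3b. Different (or no old file): move the temporary into place.
    rename(tmp, path);
    return true;
}
```

Calling it twice with the same contents should replace the file the first time only; the second call leaves file.ext and its modification time alone.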

There is actually an even better way at the application level. Consider a function in std.file:

update(S, Range)(S name, Range data);

update does something interesting: it opens the file "name" for reading AND writing, then reads data from the Range _and_ the file. For as long as the data and the contents in the file agree, it just moves reading along. At the first difference between the data and the file contents, it starts writing the data into the file through the end of the range.

So this makes zero writes (and leaves the "last modified time" intact) if the file has the same content as the data. Better yet, if it so happens that the file and the data have the same prefix, there's less writing going on, which IIRC is faster for most filesystems. Saving on writes happens to be particularly nice on new solid-state drives.
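
To make the idea concrete, here is a rough sketch in D, not a proposed implementation. The name `updateSketch` is hypothetical, and the sketch simplifies in two ways that are my assumptions: it takes a byte array instead of a generic Range and reads the old file fully into memory, and when the file has to shrink it falls back to a plain rewrite instead of truncating in place.

```d
import std.file : exists, read, write;
import std.stdio : File;

/// Hypothetical helper, not a Phobos function.
/// Returns the number of bytes actually written to `name`.
size_t updateSketch(string name, const(ubyte)[] data)
{
    if (!name.exists)
    {
        write(name, data);
        return data.length;
    }

    auto old = cast(const(ubyte)[]) read(name);

    // Move along for as long as the file and the data agree.
    size_t same = 0;
    while (same < old.length && same < data.length && old[same] == data[same])
        ++same;

    // Identical contents: zero writes, "last modified" time stays intact.
    if (same == old.length && same == data.length)
        return 0;

    // Simplification: shrinking would need a truncate, so just rewrite.
    if (old.length > data.length)
    {
        write(name, data);
        return data.length;
    }

    // Otherwise write only from the first difference to the end of the data.
    auto f = File(name, "r+b"); // read/update mode, does not truncate
    f.seek(same);
    f.rawWrite(data[same .. $]);
    return data.length - same;
}
```

For a file holding "hello world", updating it with the same bytes writes nothing, while updating it with "hello there!" writes only the six bytes after the shared prefix "hello ".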

Who wants to take this with testing, measurements etc? It's a cool mini project.


Andrei
September 18, 2016
On 09/18/2016 11:17 AM, Andrei Alexandrescu wrote:
> Simplest case is - source file is being changed, therefore a new object
> file is being produced, therefore a new executable is being produced.

Forgot to mention a situation here: if you change the source code of a module without influencing the object file (e.g. documentation, certain style changes, unittests in non-unittest builds etc) there'd be no linking upon rebuilding. -- Andrei

September 19, 2016
On 19/09/2016 3:20 AM, Andrei Alexandrescu wrote:
> On 09/18/2016 11:17 AM, Andrei Alexandrescu wrote:
>> Simplest case is - source file is being changed, therefore a new object
>> file is being produced, therefore a new executable is being produced.
>
> Forgot to mention a situation here: if you change the source code of a
> module without influencing the object file (e.g. documentation, certain
> style changes, unittests in non-unittest builds etc) there'd be no
> linking upon rebuilding. -- Andrei

How does this compare against doing a checksum comparison on the file?

September 18, 2016
On Sunday, 18 September 2016 at 15:17:31 UTC, Andrei Alexandrescu wrote:
> There are quite a few situations in rdmd and dmd generally when we compute a dependency structure over sets of files. Based on that, we write new files that overwrite old, obsoleted files. Those changes in turn trigger other dependencies to go stale so more building is done etc.

If so we need it in druntime.

Introducing Phobos into ddmd is still considered a no-no.

Personally I am pretty torn: without range-specific optimizations in dmd, ranges incur more overhead than they should.
September 18, 2016
On 9/18/16 11:24 AM, rikki cattermole wrote:
> On 19/09/2016 3:20 AM, Andrei Alexandrescu wrote:
>> On 09/18/2016 11:17 AM, Andrei Alexandrescu wrote:
>>> Simplest case is - source file is being changed, therefore a new object
>>> file is being produced, therefore a new executable is being produced.
>>
>> Forgot to mention a situation here: if you change the source code of a
>> module without influencing the object file (e.g. documentation, certain
>> style changes, unittests in non-unittest builds etc) there'd be no
>> linking upon rebuilding. -- Andrei
>
> How does this compare against doing a checksum comparison on the file?

Favorably :o). -- Andrei

September 18, 2016
This will produce different behavior with hard links. With hard links, the temporary file mechanism you mention will result in the old file being accessible via the other path. With your recommended strategy, the data accessible from both paths is updated.

That's probably acceptable, and hard links aren't used that much anyway.

Obviously, if you have to overwrite large portions of the file, it's going to be faster to just write it. This is just for cases when you can get speedups down the line by not updating write timestamps, or when you know a large portion of the file is unchanged and the file is cached, or you're using a disk that sucks at writing data.
September 18, 2016
On 9/18/16 12:15 PM, Chris Wright wrote:
> This will produce different behavior with hard links. With hard links,
> the temporary file mechanism you mention will result in the old file
> being accessible via the other path. With your recommended strategy, the
> data accessible from both paths is updated.
>
> That's probably acceptable, and hard links aren't used that much anyway.

Awesome, this should be part of the docs.

> Obviously, if you have to overwrite large portions of the file, it's
> going to be faster to just write it. This is just for cases when you can
> get speedups down the line by not updating write timestamps, or when you
> know a large portion of the file is unchanged and the file is cached, or
> you're using a disk that sucks at writing data.

That's exactly right, and such considerations should also go in the function documentation. Wanna go for it?


Andrei

September 19, 2016
On 19/09/2016 3:41 AM, Andrei Alexandrescu wrote:
> On 9/18/16 11:24 AM, rikki cattermole wrote:
>> On 19/09/2016 3:20 AM, Andrei Alexandrescu wrote:
>>> On 09/18/2016 11:17 AM, Andrei Alexandrescu wrote:
>>>> Simplest case is - source file is being changed, therefore a new object
>>>> file is being produced, therefore a new executable is being produced.
>>>
>>> Forgot to mention a situation here: if you change the source code of a
>>> module without influencing the object file (e.g. documentation, certain
>>> style changes, unittests in non-unittest builds etc) there'd be no
>>> linking upon rebuilding. -- Andrei
>>
>> How does this compare against doing a checksum comparison on the file?
>
> Favorably :o). -- Andrei

Confirmed in doing the checksum myself.
However I have not compared against OS provided checksum.

September 18, 2016
On 9/18/2016 8:17 AM, Andrei Alexandrescu via Digitalmars-d wrote:
> There is actually an even better way at the application level. Consider
> a function in std.file:
>
> update(S, Range)(S name, Range data);
>
> update does something interesting: it opens the file "name" for
> reading AND writing, then reads data from the Range _and_ the file. For
> as long as the data and the contents in the file agree, it just moves
> reading along. At the first difference between the data and the file
> contents, starts writing the data into the file through the end of the
> range.
>
> So this makes zero writes (and leaves the "last modified time" intact)
> if the file has the same content as the data. Better yet, if it so
> happens that the file and the data have the same prefix, there's less
> writing going on, which IIRC is faster for most filesystems. Saving on
> writes happens to be particularly nice on new solid-state drives.
>
> Who wants to take this with testing, measurements etc? It's a cool mini
> project.
>
>
> Andrei

This is nice in the case of no changes, but problematic in the case of some changes. The standard write-new-then-rename technique never leaves either file in a half-right state. The file is atomically either old or new, and nothing in between. This can be critical.
September 19, 2016
On Mon, 19 Sep 2016 04:24:41 +1200, rikki cattermole wrote:

> On 19/09/2016 3:41 AM, Andrei Alexandrescu wrote:
>> On 9/18/16 11:24 AM, rikki cattermole wrote:
>>> On 19/09/2016 3:20 AM, Andrei Alexandrescu wrote:
>>>> On 09/18/2016 11:17 AM, Andrei Alexandrescu wrote:
>>>>> Simplest case is - source file is being changed, therefore a new object file is being produced, therefore a new executable is being produced.
>>>>
>>>> Forgot to mention a situation here: if you change the source code of a module without influencing the object file (e.g. documentation, certain style changes, unittests in non-unittest builds etc) there'd be no linking upon rebuilding. -- Andrei
>>>
>>> How does this compare against doing a checksum comparison on the file?
>>
>> Favorably :o). -- Andrei
> 
> Confirmed in doing the checksum myself.
> However I have not compared against OS provided checksum.

You have an operating system that automatically checksums every file?