September 18, 2016 [WORK] std.file.update function
There are quite a few situations in rdmd and dmd generally where we compute a dependency structure over sets of files. Based on that, we write new files that overwrite old, obsolete files. Those changes in turn cause other dependencies to go stale, so more building is done, etc. The simplest case: a source file changes, therefore a new object file is produced, therefore a new executable is produced. And it only gets more involved.

We've discussed before using a simple method to avoid unnecessary stale dependencies when it's possible that a certain file won't, in fact, change contents:

1. Do all work on the side in a separate file, e.g. file.ext.tmp
2. Compare the new file with the old file file.ext
3. If they're identical, delete file.ext.tmp; otherwise, rename file.ext.tmp into file.ext

There is actually an even better way at the application level. Consider a function in std.file:

update(S, Range)(S name, Range data);

update does something interesting: it opens the file "name" for reading AND writing, then reads data from the Range _and_ the file. For as long as the data and the contents of the file agree, it just moves reading along. At the first difference between the data and the file contents, it starts writing the data into the file through the end of the range.

So this makes zero writes (and leaves the "last modified time" intact) if the file has the same content as the data. Better yet, if it so happens that the file and the data have the same prefix, there's less writing going on, which IIRC is faster for most filesystems. Saving on writes happens to be particularly nice on new solid-state drives.

Who wants to take this with testing, measurements etc? It's a cool mini project.

Andrei
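
The behaviour described above can be sketched roughly as follows. This is Python rather than D, used only to illustrate the algorithm; it is not the proposed Phobos implementation. Several things here are assumptions that do not appear in the post: the data is passed as a bytes object instead of a range, a missing file is simply created, stale bytes past the end of the new data are truncated, and the return value (the offset where rewriting began, or None if the file was left untouched) is invented for this sketch.

```python
import os


def common_prefix_len(a, b):
    """Length of the longest common prefix of two bytes objects."""
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] != b[i]:
            return i
    return n


def update(name, data, chunk_size=64 * 1024):
    """Make file `name` contain exactly `data`, writing as little as possible.

    Hypothetical helper: returns None if the file already matched (no write
    is issued), otherwise the offset at which rewriting started.
    """
    if not os.path.exists(name):
        with open(name, "wb") as f:  # nothing to compare against
            f.write(data)
        return 0

    with open(name, "r+b") as f:  # read AND write, no truncation on open
        pos = 0
        while pos < len(data):
            old = f.read(chunk_size)
            new = data[pos:pos + chunk_size]
            k = common_prefix_len(old, new)
            pos += k
            if k < len(old) or k < len(new):
                break  # mismatch found, or one side ended early

        size = os.fstat(f.fileno()).st_size
        if pos == len(data) and size == len(data):
            return None  # identical contents: zero writes

        f.seek(pos)
        f.write(data[pos:])
        f.truncate(len(data))  # assumption: drop stale bytes past the new end
        return pos
```

For example, updating a file that holds `hello world` with `hello there` would rewrite only from offset 6 onward, and calling it again with the same data would not write at all.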

September 18, 2016 Re: [WORK] std.file.update function
Posted in reply to Andrei Alexandrescu

On 09/18/2016 11:17 AM, Andrei Alexandrescu wrote:
> Simplest case is - source file is being changed, therefore a new object
> file is being produced, therefore a new executable is being produced.
Forgot to mention a situation here: if you change the source code of a module without influencing the object file (e.g. documentation, certain style changes, unittests in non-unittest builds etc) there'd be no linking upon rebuilding. -- Andrei

September 19, 2016 Re: [WORK] std.file.update function
Posted in reply to Andrei Alexandrescu

On 19/09/2016 3:20 AM, Andrei Alexandrescu wrote:
> On 09/18/2016 11:17 AM, Andrei Alexandrescu wrote:
>> Simplest case is - source file is being changed, therefore a new object
>> file is being produced, therefore a new executable is being produced.
>
> Forgot to mention a situation here: if you change the source code of a
> module without influencing the object file (e.g. documentation, certain
> style changes, unittests in non-unittest builds etc) there'd be no
> linking upon rebuilding. -- Andrei
How does this compare against doing a checksum comparison on the file?
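
To make the question concrete, a checksum-based variant might look like the sketch below (Python, for illustration only). The choice of SHA-256 and the function names are assumptions, not something proposed in the thread. Note that this approach always reads and hashes the whole existing file, with no early exit at the first difference, and rewrites the whole file whenever anything changed.

```python
import hashlib


def same_by_checksum(name, data, chunk_size=64 * 1024):
    """True if file `name` exists and its SHA-256 matches that of `data`."""
    h = hashlib.sha256()
    try:
        with open(name, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)  # the entire file is always read
    except FileNotFoundError:
        return False
    return h.digest() == hashlib.sha256(data).digest()


def write_if_changed(name, data):
    """Rewrite the whole file only when the checksums differ."""
    if same_by_checksum(name, data):
        return False  # unchanged: file and timestamp left alone
    with open(name, "wb") as f:
        f.write(data)
    return True
```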

September 18, 2016 Re: [WORK] std.file.update function
Posted in reply to Andrei Alexandrescu

On Sunday, 18 September 2016 at 15:17:31 UTC, Andrei Alexandrescu wrote:
> There are quite a few situations in rdmd and dmd generally when we compute a dependency structure over sets of files. Based on that, we write new files that overwrite old, obsoleted files. Those changes in turn trigger other dependencies to go stale so more building is done etc.
If so we need it in druntime.
Introducing Phobos into ddmd is still considered a no-no.
Personally I am pretty torn: without range-specific optimizations in dmd, ranges do incur more overhead than they should.

September 18, 2016 Re: [WORK] std.file.update function
Posted in reply to rikki cattermole

On 9/18/16 11:24 AM, rikki cattermole wrote:
> On 19/09/2016 3:20 AM, Andrei Alexandrescu wrote:
>> On 09/18/2016 11:17 AM, Andrei Alexandrescu wrote:
>>> Simplest case is - source file is being changed, therefore a new object
>>> file is being produced, therefore a new executable is being produced.
>>
>> Forgot to mention a situation here: if you change the source code of a
>> module without influencing the object file (e.g. documentation, certain
>> style changes, unittests in non-unittest builds etc) there'd be no
>> linking upon rebuilding. -- Andrei
>
> How does this compare against doing a checksum comparison on the file?
Favorably :o). -- Andrei

September 18, 2016 Re: [WORK] std.file.update function
Posted in reply to Andrei Alexandrescu

This will produce different behavior with hard links. With hard links, the temporary file mechanism you mention will result in the old file being accessible via the other path. With your recommended strategy, the data accessible from both paths is updated.

That's probably acceptable, and hard links aren't used that much anyway.

Obviously, if you have to overwrite large portions of the file, it's going to be faster to just write it. This is just for cases when you can get speedups down the line by not updating write timestamps, or when you know a large portion of the file is unchanged and the file is cached, or you're using a disk that sucks at writing data.
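
The hard-link difference can be reproduced with a small experiment along these lines (a Python sketch for illustration; the file names and the helper `demo_hard_links` are made up, and it assumes a filesystem that supports hard links):

```python
import os
import tempfile


def read_bytes(path):
    with open(path, "rb") as f:
        return f.read()


def demo_hard_links():
    """Compare in-place update with temp-file-plus-rename under a hard link."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.txt")
        b = os.path.join(d, "b.txt")
        with open(a, "wb") as f:
            f.write(b"old")
        os.link(a, b)  # b is now a second name for the same file

        # In-place update through path a: both names see the new data.
        with open(a, "r+b") as f:
            f.write(b"new")
            f.truncate()
        via_b_after_in_place = read_bytes(b)

        # Temp file + rename: a points at a new file, b keeps the old one.
        tmp = a + ".tmp"
        with open(tmp, "wb") as f:
            f.write(b"newer")
        os.replace(tmp, a)
        return via_b_after_in_place, read_bytes(a), read_bytes(b)
```

After the in-place write the second path shows the new contents, while after the rename the second path still shows whatever the file held before the rename.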

September 18, 2016 Re: [WORK] std.file.update function
Posted in reply to Chris Wright

On 9/18/16 12:15 PM, Chris Wright wrote:
> This will produce different behavior with hard links. With hard links,
> the temporary file mechanism you mention will result in the old file
> being accessible via the other path. With your recommended strategy, the
> data accessible from both paths is updated.
>
> That's probably acceptable, and hard links aren't used that much anyway.

Awesome, this should be part of the docs.

> Obviously, if you have to overwrite large portions of the file, it's
> going to be faster to just write it. This is just for cases when you can
> get speedups down the line by not updating write timestamps, or when you
> know a large portion of the file is unchanged and the file is cached, or
> you're using a disk that sucks at writing data.

That's exactly right, and such considerations should also go in the function documentation. Wanna go for it?

Andrei

September 19, 2016 Re: [WORK] std.file.update function
Posted in reply to Andrei Alexandrescu

On 19/09/2016 3:41 AM, Andrei Alexandrescu wrote:
> On 9/18/16 11:24 AM, rikki cattermole wrote:
>> On 19/09/2016 3:20 AM, Andrei Alexandrescu wrote:
>>> On 09/18/2016 11:17 AM, Andrei Alexandrescu wrote:
>>>> Simplest case is - source file is being changed, therefore a new object
>>>> file is being produced, therefore a new executable is being produced.
>>>
>>> Forgot to mention a situation here: if you change the source code of a
>>> module without influencing the object file (e.g. documentation, certain
>>> style changes, unittests in non-unittest builds etc) there'd be no
>>> linking upon rebuilding. -- Andrei
>>
>> How does this compare against doing a checksum comparison on the file?
>
> Favorably :o). -- Andrei
Confirmed in doing the checksum myself.
However, I have not compared against an OS-provided checksum.

September 18, 2016 Re: [WORK] std.file.update function
Posted in reply to Andrei Alexandrescu

On 9/18/2016 8:17 AM, Andrei Alexandrescu via Digitalmars-d wrote:
> There is actually an even better way at the application level. Consider
> a function in std.file:
>
> update(S, Range)(S name, Range data);
>
> update does something interesting: it opens the file "name" for
> reading AND writing, then reads data from the Range _and_ the file. For
> as long as the data and the contents in the file agree, it just moves
> reading along. At the first difference between the data and the file
> contents, starts writing the data into the file through the end of the
> range.
>
> So this makes zero writes (and leaves the "last modified time" intact)
> if the file has the same content as the data. Better yet, if it so
> happens that the file and the data have the same prefix, there's less
> writing going on, which IIRC is faster for most filesystems. Saving on
> writes happens to be particularly nice on new solid-state drives.
>
> Who wants to take this with testing, measurements etc? It's a cool mini
> project.
>
>
> Andrei
This is nice in the case of no changes, but problematic in the case of some changes: an in-place update that is interrupted partway leaves the file half old and half new. The standard write-new-then-rename technique never has either file in a half-right state. The file is atomically either old or new and nothing in between. This can be critical.
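
For comparison, the write-new-then-rename technique, combined with the "skip if identical" check from the original post, could be sketched like this (Python, for illustration only; `replace_if_changed` and `same_contents` are hypothetical names, and the `.tmp` suffix follows the file.ext.tmp convention mentioned earlier). The rename is a single step, so readers of the target path see either the complete old contents or the complete new contents.

```python
import os


def same_contents(path_a, path_b, chunk_size=64 * 1024):
    """Byte-for-byte comparison of two files."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            ca = fa.read(chunk_size)
            cb = fb.read(chunk_size)
            if ca != cb:
                return False
            if not ca:
                return True  # both files ended with no difference


def replace_if_changed(name, data):
    """Write `data` to name + '.tmp', then rename it over `name` if it differs.

    Returns True if `name` was replaced, False if it was left untouched.
    """
    tmp = name + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the new data to disk before renaming
    if os.path.exists(name) and same_contents(tmp, name):
        os.remove(tmp)  # identical: keep the old file and its timestamp
        return False
    os.replace(tmp, name)  # target is swapped in one step
    return True
```

The trade-off against the in-place approach is that this always writes the full new contents once, even when nothing changed.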

September 19, 2016 Re: [WORK] std.file.update function
Posted in reply to rikki cattermole

On Mon, 19 Sep 2016 04:24:41 +1200, rikki cattermole wrote:
> On 19/09/2016 3:41 AM, Andrei Alexandrescu wrote:
>> On 9/18/16 11:24 AM, rikki cattermole wrote:
>>> On 19/09/2016 3:20 AM, Andrei Alexandrescu wrote:
>>>> On 09/18/2016 11:17 AM, Andrei Alexandrescu wrote:
>>>>> Simplest case is - source file is being changed, therefore a new object file is being produced, therefore a new executable is being produced.
>>>>
>>>> Forgot to mention a situation here: if you change the source code of a module without influencing the object file (e.g. documentation, certain style changes, unittests in non-unittest builds etc) there'd be no linking upon rebuilding. -- Andrei
>>>
>>> How does this compare against doing a checksum comparison on the file?
>>
>> Favorably :o). -- Andrei
>
> Confirmed in doing the checksum myself.
> However I have not compared against OS provided checksum.
You have an operating system that automatically checksums every file?