May 10, 2023
On Wednesday, 10 May 2023 at 16:09:53 UTC, Commander Zot wrote:
> how often do you need the _exact_ column? or to rephrase it, wouldn't column divided by 2 or 3 be good enough to figure out where an error happened?

The column is needed in every error message when the option `-vcolumns` is activated. It really has to be the right number.


May 10, 2023
On Wednesday, 10 May 2023 at 17:16:23 UTC, max haughton wrote:
> I also don't see why it's a perf issue?

I can imagine something like compiling the entire Linux kernel at once being a big ask, but it doesn't require any physical memory: the array can be offsets into discardable pages of memory-mapped files (which would probably perform quite well in general).

May 10, 2023
On 5/10/2023 10:16 AM, max haughton wrote:
> I also don't see why it's a perf issue?

Every time the line number is needed, the source file has to be scanned from its start. Line numbers are needed not just for error messages, but for symbolic debug info. Generally debug compiles should be fast.

Scanning the source files also faults them into memory.

If one builds a table of file-offset/line-number pairs, that will speed things up, but at a cost of 8 bytes per line of code. Looking up an entry also needs a binary search, which will cause the array to be faulted into memory.
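A minimal sketch of such a lookup table, written in Python for brevity and not taken from dmd. It differs from the description above in one respect: it stores only the start offset of each line, so the line number is implied by the index rather than stored as a pair. The names `build_line_starts` and `offset_to_line_col` are hypothetical.

```python
from bisect import bisect_right


def build_line_starts(source: bytes) -> list:
    """Return the byte offset at which each line begins (line 1 starts at 0)."""
    starts = [0]
    pos = source.find(b"\n")
    while pos != -1:
        starts.append(pos + 1)  # the next line begins right after the newline
        pos = source.find(b"\n", pos + 1)
    return starts


def offset_to_line_col(starts: list, offset: int) -> tuple:
    """Map a byte offset to a 1-based (line, column) pair via binary search."""
    index = bisect_right(starts, offset) - 1  # last line start <= offset
    return index + 1, offset - starts[index] + 1
```

The table is built in one pass over the file, and each lookup touches only O(log n) entries, which is the memory-faulting cost mentioned above.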
May 11, 2023
On Wednesday, 10 May 2023 at 20:21:18 UTC, Walter Bright wrote:
> On 5/10/2023 10:16 AM, max haughton wrote:
>> I also don't see why it's a perf issue?
>
> Every time the line number is needed, the source file has to be scanned from its start. Line numbers are needed not just for error messages, but for symbolic debug info. Generally debug compiles should be fast.

That is approximately once or never per symbol, in a pattern that is probably mostly linear (i.e. why do it from scratch every time?).

We live in a world where we can parse gigabytes of JSON per second (https://github.com/simdjson/simdjson); it can be made fast. We can fit most full western names in a SIMD register now, even.
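One way to exploit a mostly linear access pattern is to remember the previous lookup and scan only the gap since then. This is a rough Python sketch under that assumption, not how dmd or any other compiler does it; the class `LineCursor` and its `locate` method are hypothetical names.

```python
class LineCursor:
    """Remembers the last lookup so that increasing offsets only scan the gap."""

    def __init__(self, source: bytes):
        self.source = source
        self.offset = 0      # offset of the previous query
        self.line = 1        # 1-based line number of the previous query
        self.line_start = 0  # offset at which that line begins

    def locate(self, offset: int) -> tuple:
        """Return the 1-based (line, column) of a byte offset."""
        if offset < self.offset:
            # A backwards query falls back to a scan from the start.
            self.offset, self.line, self.line_start = 0, 1, 0
        newlines = self.source.count(b"\n", self.offset, offset)
        if newlines:
            self.line += newlines
            self.line_start = self.source.rfind(b"\n", self.offset, offset) + 1
        self.offset = offset
        return self.line, offset - self.line_start + 1
```

If queries arrive in source order, the total work over a whole file is a single pass, no matter how many locations are resolved.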

> Scanning the source files also faults them into memory.

As does parsing them. If you aren't going to use it except for a burst at the beginning and the end, you can tell the kernel as much very easily, and it can use the physical side of the memory map for something else.
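A small sketch of what "telling the kernel" might look like, using Python's `mmap` module as a stand-in for the underlying `madvise` call. The function name `count_lines_then_release` is hypothetical, and the advice is skipped on platforms where `madvise` or `MADV_DONTNEED` is unavailable.

```python
import mmap
import os
import tempfile


def count_lines_then_release(path: str) -> int:
    """Scan a memory-mapped file once, then tell the kernel the pages can go."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        lines = 0
        pos = m.find(b"\n")
        while pos != -1:  # this scan faults the pages in
            lines += 1
            pos = m.find(b"\n", pos + 1)
        # Assumption: the mapping will not be needed again for a while.
        if hasattr(m, "madvise") and hasattr(mmap, "MADV_DONTNEED"):
            m.madvise(mmap.MADV_DONTNEED)
        return lines


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False, suffix=".d") as tmp:
        tmp.write(b"void main()\n{\n}\n")
    try:
        print(count_lines_then_release(tmp.name))
    finally:
        os.remove(tmp.name)
```

The mapping stays valid after the advice; touching it again simply faults the pages back in from the file.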

Building some of the code we have at Symmetry literally takes up all the memory on my machine; it's not the files.

The overarching point here is that it needs to be measured properly.

I would also like to point out that making Loc smaller is basically tittle-tattle at the scale of dmd's memory allocation at the moment. It only feels good in the numbers Dennis posted because (in relative terms) druntime doesn't stress the compiler that much, and (in absolute terms) dmd is still allocating way too many objects overall.

If you want another free saving, use fewer Arrays as pointer to struct (with a pointer in it) and make it smaller. Last time I measured it, the modal length was either 0 or 1 depending on how I measured it, so a lot of them are just wasted memory even if they never actually allocate anything themselves. Anecdotally, most memory allocated with the bump-the-pointer scheme is never written to (or, more precisely, is always 0).

Longer term:

Make dmd more GC friendly: Currently, activating -lowmem often makes no difference because there's usually a reference to everything somewhere. I'm not sure how you'd find easy things to change, though.

I suppose the academic solution would be to find a way to export the GC's graph and stare at it; the engineering solution might be to print a report of which objects were still alive when the program exits and then stare at that instead.

May 11, 2023
On Wednesday, 10 May 2023 at 17:57:17 UTC, Basile B. wrote:
> On Wednesday, 10 May 2023 at 16:09:53 UTC, Commander Zot wrote:
>> how often do you need the _exact_ column? or to rephrase it, wouldn't column divided by 2 or 3 be good enough to figure out where an error happened?
>
> The column is needed in every error message when the option `-vcolumns` is activated. It really has to be the right number.
I know that it does that, but is there a reason it has to be the exact column other than "we specified it to be that way"? What use case is there for it? Because if it's just to print the line where the error happened, it really doesn't matter if you start/end one or two chars earlier.

but in case it is needed, then limiting it to 64 is also not an option.
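To make the 64-column limit concrete, here is a sketch of the 6/15/11-bit layout proposed earlier in the thread, packed into one 32-bit value. The function names are hypothetical, and saturating columns beyond 64 is an assumption made for illustration, not something the proposal specifies.

```python
COL_BITS, LINE_BITS, FILE_BITS = 6, 15, 11  # 6 + 15 + 11 = 32 bits


def pack_loc(file_index: int, line: int, column: int) -> int:
    """Pack a location into 32 bits; line and column are 1-based."""
    if not 0 <= file_index < (1 << FILE_BITS):
        raise ValueError("file index does not fit in 11 bits")
    if not 1 <= line <= (1 << LINE_BITS):
        raise ValueError("line does not fit in 15 bits")
    if column < 1:
        raise ValueError("column is 1-based")
    # Assumption: columns past 64 saturate instead of raising an error.
    col = min(column, 1 << COL_BITS) - 1
    return (file_index << (LINE_BITS + COL_BITS)) | ((line - 1) << COL_BITS) | col


def unpack_loc(packed: int) -> tuple:
    """Return (file_index, line, column) from a packed 32-bit location."""
    column = (packed & ((1 << COL_BITS) - 1)) + 1
    line = ((packed >> COL_BITS) & ((1 << LINE_BITS) - 1)) + 1
    file_index = packed >> (LINE_BITS + COL_BITS)
    return file_index, line, column
```

With this layout, an error at column 200 comes back as column 64, which is the loss of precision being objected to.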

May 11, 2023
On Thursday, 11 May 2023 at 09:31:16 UTC, Commander Zot wrote:
> On Wednesday, 10 May 2023 at 17:57:17 UTC, Basile B. wrote:
>> On Wednesday, 10 May 2023 at 16:09:53 UTC, Commander Zot wrote:
>>> how often do you need the _exact_ column? or to rephrase it, wouldn't column divided by 2 or 3 be good enough to figure out where an error happened?
>>
>> The column is needed in every error message when the option `-vcolumns` is activated. It really has to be the right number.
> I know that it does that, but is there a reason it has to be the exact column other than "we specified it to be that way".

The column is useful for IDEs: for example, after a failed compilation it lets the IDE put a wavy red underline under exactly the identifier that contains the error in a chain, instead of under the whole chain.

Presumably, formatters can use the information too.
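A tiny illustration of why an approximate column is not enough for this use case. The source line and the helper `marked_span` are made up for the example; the "divided by 3" rounding is the scheme suggested earlier in the thread.

```python
def marked_span(line_text: str, column: int, length: int) -> str:
    """Return the text an IDE would underline for a 1-based column and length."""
    return line_text[column - 1 : column - 1 + length]


source_line = "auto x = foo.bar.baz.qux;"
exact_column = 14                      # 1-based column where `bar` starts
coarse_column = (exact_column // 3) * 3  # column stored divided by 3, then restored

exact = marked_span(source_line, exact_column, 3)
coarse = marked_span(source_line, coarse_column, 3)
```

Here `exact` covers the identifier `bar`, while `coarse` starts two characters early and straddles the preceding dot.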

> What use case is there for it? Because if it's just to print the line where the error happened, it really doesn't matter if you start/end one or two chars earlier.
>
> but in case it is needed, then limiting it to 64 is also not an option.


May 11, 2023
On Tuesday, 9 May 2023 at 00:32:33 UTC, Adam D Ruppe wrote:
> On Tuesday, 9 May 2023 at 00:24:44 UTC, Walter Bright wrote:
>>  6 bits for column - 1..64
>> 15 bits for line - 1..32768
>> 11 bits for file - 2047
>>
>> So, for great glory, can anyone come up with a clever scheme that uses only 32 bits?
>
> I wouldn't separate out column/line/file at all. Concatenate all the files together in memory and store only an offset into that gigantic array. If an error happens, then and only then go back to extract the details by slicing the filename out of a listing and rescanning it to determine line and column. (You'd probably have an index that does a bit of both, like every new file or every 5000 lines, add an entry to the lookup table. Then when you do hit an error, you just need to scan from the closest point forward instead of the whole thing.)
>
> If there's no errors, it uses little memory and is fast. If there is an error, the rescanning time is not significant anyway relative to the time to fix the error.

What if someone already implemented this? Wait...

https://github.com/snazzy-d/sdc/blob/master/src/source/manager.d
https://github.com/snazzy-d/sdc/blob/master/src/source/location.d
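For readers who don't want to dig through the links, the quoted idea can be sketched roughly as follows. This is an illustration in Python, not a transcription of the sdc code: the class `SourceManager`, its methods, and the `checkpoint_every` parameter are all hypothetical.

```python
from bisect import bisect_right


class SourceManager:
    """All files share one buffer; a location is a single global offset."""

    def __init__(self, checkpoint_every: int = 5000):
        self.buffer = bytearray()
        self.file_starts = []         # global offset where each file begins
        self.file_names = []
        self.checkpoint_every = checkpoint_every
        self.checkpoint_offsets = []  # global offsets of selected line starts
        self.checkpoint_lines = []    # line number at each checkpoint

    def add_file(self, name: str, content: bytes) -> int:
        """Append a file and return its base offset."""
        base = len(self.buffer)
        self.file_starts.append(base)
        self.file_names.append(name)
        self.buffer += content
        self._checkpoint(base, 1)  # every file start is a checkpoint
        line = 1
        pos = content.find(b"\n")
        while pos != -1:
            line += 1
            if (line - 1) % self.checkpoint_every == 0:
                self._checkpoint(base + pos + 1, line)
            pos = content.find(b"\n", pos + 1)
        return base

    def _checkpoint(self, offset: int, line: int) -> None:
        self.checkpoint_offsets.append(offset)
        self.checkpoint_lines.append(line)

    def resolve(self, offset: int) -> tuple:
        """Turn a global offset into (file name, line, column), both 1-based."""
        f = bisect_right(self.file_starts, offset) - 1
        c = bisect_right(self.checkpoint_offsets, offset) - 1
        start = self.checkpoint_offsets[c]
        # Rescan only from the nearest checkpoint, not from the file start.
        line = self.checkpoint_lines[c] + self.buffer.count(b"\n", start, offset)
        last_newline = self.buffer.rfind(b"\n", start, offset)
        line_start = last_newline + 1 if last_newline != -1 else start
        return self.file_names[f], line, offset - line_start + 1
```

Each location costs one integer, and the rescan in `resolve` is bounded by `checkpoint_every` lines.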

May 11, 2023
On Wednesday, 10 May 2023 at 03:01:34 UTC, Walter Bright wrote:
> File offset is a performance problem and also requires keeping the source files in memory.

Just do it once, and you can even SWAR it.

https://github.com/snazzy-d/sdc/blob/master/src/source/lexwhitespace.d#L115-L137
May 11, 2023
On Thursday, 11 May 2023 at 23:17:32 UTC, deadalnix wrote:
> On Wednesday, 10 May 2023 at 03:01:34 UTC, Walter Bright wrote:
>> File offset is a performance problem and also requires keeping the source files in memory.
>
> Just do it once, and you can even SWAR it.
>
> https://github.com/snazzy-d/sdc/blob/master/src/source/lexwhitespace.d#L115-L137

The SWAR: https://github.com/snazzy-d/sdc/blob/master/src/source/swar/newline.d
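For anyone unfamiliar with the term, SWAR ("SIMD within a register") processes several bytes at once using ordinary integer arithmetic. The sketch below counts newlines eight bytes at a time; it is a generic illustration in Python, not a transcription of the linked sdc code, and the function name `count_newlines_swar` is hypothetical.

```python
LOW7 = 0x7F7F7F7F7F7F7F7F      # low seven bits of every byte
NEWLINES = 0x0A0A0A0A0A0A0A0A  # '\n' repeated in every byte
MASK64 = 0xFFFFFFFFFFFFFFFF    # Python ints are unbounded, so mask to 64 bits


def count_newlines_swar(data: bytes) -> int:
    """Count '\\n' bytes, handling eight bytes per step."""
    count = 0
    full = len(data) - len(data) % 8
    for i in range(0, full, 8):
        word = int.from_bytes(data[i:i + 8], "little")
        x = word ^ NEWLINES  # bytes equal to '\n' become 0x00
        # After this, each byte of t is 0xFF if the byte of x was nonzero, else 0x7F.
        t = ((x & LOW7) + LOW7) | x | LOW7
        zero_flags = ~t & MASK64  # 0x80 in every byte that was a newline
        count += bin(zero_flags).count("1")
    # Leftover tail of fewer than eight bytes: plain byte-wise count.
    count += data[full:].count(b"\n")
    return count
```

In a compiled language the loop body is a handful of register operations per eight bytes; Python is used here only to show the bit trick.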
June 05, 2023
On Tuesday, 9 May 2023 at 00:24:44 UTC, Walter Bright wrote:
> This PR https://github.com/dlang/dmd/pull/15199 reduces its size by 8 bytes, resulting in about 20MB of memory savings compiling druntime, according to @dkorpel.
>
> 6 bits for column - 1..64

I'm just sitting here blinking my eyes over and over at the thought of considering (essentially) losing the ability to track column numbers for errors in order to potentially save 20MB of memory usage in a process that is currently using... *presses compile and glances at second monitor* 2GB.

In other news, the City Council recently decided that doing away with crosswalks would be the best solution to dealing with the robot-cars-barreling-over-people epidemic. After careful consideration, it was determined that the benefit of meeting human needs was trivial compared to the 1% improvement in machine efficiency.
