May 09, 2023
On Tuesday, 9 May 2023 at 00:24:44 UTC, Walter Bright wrote:
> This PR https://github.com/dlang/dmd/pull/15199 reduces its size by 8 bytes, resulting in about 20 MB of memory savings compiling druntime, according to @dkorpel.
>
> It is currently:
>
> struct Loc
> {
>   uint linnum; // line number, starting from 1
>   ushort charnum; // utf8 code unit index relative to start of line, starting from 1
>   ushort fileIndex; // index into filenames[], starting from 1 (0 means no filename)
> }
>
> which is 8 bytes.
>
> If it was 4 bytes, it could still store 4 billion unique locations, which ought to be enough for everyone. I toyed around with various encodings:
>
>  6 bits for column - 1..64
> 15 bits for line - 1..32768
> 11 bits for file - 2047
>
> I also thought of schemes like having the sign bit set an alternate encoding scheme.
>
> So, for great glory, can anyone come up with a clever scheme that uses only 32 bits?
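
For reference, the 6/15/11-bit split quoted above could be packed roughly as follows. This is only a sketch: `PackedLoc`, `make` and the choice to saturate out-of-range values are assumptions for illustration, not what the PR does. Raw values are stored so that 0 can still mean "unknown", which makes the largest representable column 63 and the largest line 32767.

```d
// Hypothetical 32-bit packing; field widths are from the post, names are made up.
enum uint colBits = 6;
enum uint lineBits = 15;
enum uint fileBits = 11;
enum uint colMax = (1u << colBits) - 1;   // 63
enum uint lineMax = (1u << lineBits) - 1; // 32767
enum uint fileMax = (1u << fileBits) - 1; // 2047

struct PackedLoc
{
    uint bits;

    static PackedLoc make(uint lineNo, uint colNo, uint fileIdx)
    {
        assert(fileIdx <= fileMax, "file index does not fit in 11 bits");
        // Saturate rather than wrap, so an overflowed value "sticks" at the maximum.
        const l = lineNo > lineMax ? lineMax : lineNo;
        const c = colNo > colMax ? colMax : colNo;
        return PackedLoc((fileIdx << (colBits + lineBits)) | (l << colBits) | c);
    }

    uint col() const { return bits & colMax; }
    uint line() const { return (bits >> colBits) & lineMax; }
    uint file() const { return bits >> (colBits + lineBits); }
}

static assert(PackedLoc.sizeof == 4);
```

The file index cannot be saturated meaningfully, which is why the sketch asserts on it instead.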

Does it concern also C files handled by ImportC? If yes, be aware that there's no limit in C in line length and that the pre-processor can generate humongous lines.
In that case splitting line/column is a bad idea and file offset is just enough.
May 09, 2023

On Tuesday, 9 May 2023 at 09:35:41 UTC, Dennis wrote:
> On Tuesday, 9 May 2023 at 00:24:44 UTC, Walter Bright wrote:
>> So, for great glory, can anyone come up with a clever scheme that uses only 32 bits?
>
> I put mine in the PR already: https://github.com/dlang/dmd/pull/15199#issuecomment-1538120181
>
> It's the same idea as Adam's. You really only need a file offset, which currently already exists, but only for DMD as a lib:
>
> version (LocOffset)
>     uint fileOffset; /// utf8 code unit index relative to start of file, starting from 0
>
> Line number and column number can be computed when needed, and can be accelerated with a binary search in a list of line numbers sorted by file offset (because of #line directives lines won't be monotonically increasing and need to be stored in the list).
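
To make the lookup concrete, here is a minimal sketch of turning a file offset back into line and column with a binary search. It assumes a plain table of line-start offsets and ignores #line directives, which would need the line numbers stored alongside as described above. `buildLineStarts`, `LineCol` and `toLineCol` are hypothetical names, not DMD functions.

```d
import std.range : assumeSorted;

/// Offsets at which each line starts, in increasing order (the first entry is 0).
uint[] buildLineStarts(const(char)[] src)
{
    uint[] starts = [0];
    foreach (i, c; src)
        if (c == '\n')
            starts ~= cast(uint)(i + 1);
    return starts;
}

struct LineCol { uint line; uint col; } // both 1-based

LineCol toLineCol(const(uint)[] lineStarts, uint offset)
{
    // Count the line starts that are <= offset; the last of them
    // begins the line that contains the offset.
    const n = lineStarts.assumeSorted.lowerBound(offset + 1).length;
    return LineCol(cast(uint) n, offset - lineStarts[n - 1] + 1);
}
```

The table costs one `uint` per source line, and each lookup is O(log n) in the number of lines.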

FYI, this is indeed what Clang does. 32-bit offset, with a "SourceManager" to help convert the ID back to file:line.
https://github.com/llvm/llvm-project/blob/ec77d1f3d9fcf7105b6bda25fb4d0e5ed5afd0c5/clang/include/clang/Basic/SourceLocation.h#L71-L86

-Johan

May 09, 2023
On Tuesday, 9 May 2023 at 00:24:44 UTC, Walter Bright wrote:
> This PR https://github.com/dlang/dmd/pull/15199 reduces its size by 8 bytes, resulting in about 20 MB of memory savings compiling druntime, according to @dkorpel.
>
> [...]

https://issues.dlang.org/show_bug.cgi?id=23902 while you're there, reserve some bits for fixing this.
May 09, 2023
On 5/9/2023 4:57 AM, Patrick Schluter wrote:
> Does it concern also C files handled by ImportC?

Yes

> If yes, be aware that there's no limit in C in line length and that the pre-processor can generate humongous lines.

D has no line limit, either. The saving grace is the column is not used for anything other than error messages, so it's not a catastrophe if it sticks at 64.

> In that case splitting line/column is a bad idea and file offset is just enough.

File offset is a performance problem and also requires keeping the source files in memory.
May 09, 2023
On 5/9/2023 2:35 AM, Dennis wrote:
> Line number and column number can be computed when needed, and can be accelerated with a binary search in a list of line numbers sorted by file offset (because of #line directives lines won't be monotonically increasing and need to be stored in the list).
> 
> To handle multiple files, add the sum of all previous file sizes to the file offset and have a mapping from the resulting 'global offset' to local file offset.

That should work, and avoids the need to keep the source file text around.
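
A rough sketch of the 'global offset' idea quoted above, assuming each file is registered with its size when it is read. `SourceTable`, `addFile` and `lookup` are hypothetical names used only for illustration.

```d
import std.range : assumeSorted;

/// Maps a 32-bit global offset back to (file index, local offset).
struct SourceTable
{
    string[] names;
    uint[] bases; // bases[i] = sum of the sizes of all files registered before file i
    uint total;   // sum of all registered file sizes

    /// Registers a file and returns the global offset of its first code unit.
    uint addFile(string name, uint size)
    {
        assert(size <= uint.max - total, "sources exceed the 32-bit offset space");
        names ~= name;
        bases ~= total;
        total += size;
        return bases[$ - 1];
    }

    /// Returns the index of the file containing `global`; writes the local offset to `local`.
    size_t lookup(uint global, out uint local) const
    {
        assert(bases.length > 0, "no files registered");
        // The last base that is <= global belongs to the containing file.
        const idx = bases[].assumeSorted.lowerBound(global + 1).length - 1;
        local = global - bases[idx];
        return idx;
    }
}
```

Only the file sizes are needed here, so the source text itself does not have to stay in memory.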

May 09, 2023
On 5/9/2023 1:35 AM, Basile B. wrote:
> Yes, call me a madlad but in styx the filename information is in the DMD equivalent of `Scope` (and the Loc is 2*32 bits, {line,col}). If you do the same in DMD you can reduce the size to 32 bits, i.e. 2*16 bits, {line,col}.
>
> I'm not sure this would work for DMD but for styx the rationale was
>
> 1. in DMD the size of Loc is a known problem, which is why it's usually recommended to pass its instances by ref; let's not have the same problem.
> 2. there are far fewer Scopes than Locs.


Indeed, it would eliminate the pass-by-ref of Loc.
May 10, 2023
On Wednesday, 10 May 2023 at 03:01:34 UTC, Walter Bright wrote:
> On 5/9/2023 4:57 AM, Patrick Schluter wrote:
>> Does it concern also C files handled by ImportC?
>
> Yes
>
>> If yes, be aware that there's no limit in C in line length and that the pre-processor can generate humongous lines.
>
> D has no line limit, either. The saving grace is the column is not used for anything other than error messages, so it's not a catastrophe if it sticks at 64.

Then why save the column number at all? At limit 64, it will be wrong very often.

>> In that case splitting line/column is a bad idea and file offset is just enough.
>
> File offset is a performance problem and also requires keeping the source files in memory.

It is not a problem for Clang.

-Johan

May 10, 2023
On Wednesday, 10 May 2023 at 03:16:19 UTC, Walter Bright wrote:
> On 5/9/2023 1:35 AM, Basile B. wrote:
>> Yes, call me a madlad but in styx the filename information is in the DMD equivalent of `Scope` (and the Loc is 2*32 bits, {line,col}). If you do the same in DMD you can reduce the size to 32 bits, i.e. 2*16 bits, {line,col}.
>>
>> I'm not sure this would work for DMD but for styx the rationale was
>>
>> 1. in DMD the size of Loc is a known problem, which is why it's usually recommended to pass its instances by ref; let's not have the same problem.
>> 2. there are far fewer Scopes than Locs.
>
>
> Indeed, it would eliminate the pass-by-ref of Loc.

How often do you need the _exact_ column? Or, to rephrase it, wouldn't the column divided by 2 or 3 be good enough to figure out where an error happened?
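
As a sketch of what that could look like with a 6-bit column field (the field width is taken from the encoding at the start of the thread; `quantizeCol` and `approxCol` are hypothetical names): a step of 2 covers columns 1..128 and a step of 3 covers 1..192 before the value saturates.

```d
// Hypothetical reduced-resolution column stored in 6 bits.
enum uint colBits = 6;
enum uint colMax = (1u << colBits) - 1; // 63

/// Maps a 1-based column to its bucket index, saturating at colMax.
uint quantizeCol(uint col, uint step)
{
    assert(col >= 1 && step >= 1);
    const q = (col - 1) / step;
    return q > colMax ? colMax : q;
}

/// Returns the first column of the bucket, as an approximation of the original.
uint approxCol(uint stored, uint step)
{
    return stored * step + 1;
}
```

The reported column is then off by at most `step - 1` for any column inside the covered range.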
May 11, 2023
And that's how you break already poorly performing IDE support even further.

We don't need to save double-digit megabytes when we are using gigabytes of memory, not if it takes extreme measures that result in poorer usability.

Instead it would be better to figure out how to minimize the number of locations we store; if we can't, it's just the cost of compiling D.
May 10, 2023
On Wednesday, 10 May 2023 at 06:09:25 UTC, Johan wrote:

>>> In that case splitting line/column is a bad idea and file offset is just enough.
>>
>> File offset is a performance problem and also requires keeping the source files in memory.
>
> It is not a problem for Clang.
>
> -Johan

I also don't see why it's a perf issue.