struct Location size

May 08, 2023
Walter Bright
This PR https://github.com/dlang/dmd/pull/15199 reduces its size by 8 bytes, resulting in about 20 MB of memory savings when compiling druntime, according to @dkorpel.

It is currently:

struct Loc
{
  uint linnum; // line number, starting from 1
  ushort charnum; // utf8 code unit index relative to start of line, starting from 1
  ushort fileIndex; // index into filenames[], starting from 1 (0 means no filename)
}

which is 8 bytes.

If it were 4 bytes, it could still store 4 billion unique locations, which ought to be enough for everyone. I toyed around with various encodings:

 6 bits for column - 1..64
15 bits for line - 1..32768
11 bits for file - 2047

I also thought of schemes like having the sign bit select an alternate encoding scheme.

So, for great glory, can anyone come up with a clever scheme that uses only 32 bits?
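For concreteness, the 6/15/11 split above could be packed like this (a Python sketch with hypothetical helper names, not DMD code; columns past 64 simply clamp):

```python
# Pack (line, column, file index) into one 32-bit value using the
# 6/15/11-bit split proposed above. Columns saturate at 64.

COL_BITS, LINE_BITS, FILE_BITS = 6, 15, 11

def pack_loc(line: int, col: int, file_index: int) -> int:
    """Pack a source location into 32 bits; the column "tops out" at 64."""
    col = min(col, 1 << COL_BITS)             # clamp columns past 64
    assert 1 <= line < (1 << LINE_BITS)
    assert 0 <= file_index < (1 << FILE_BITS)
    return ((file_index << (LINE_BITS + COL_BITS))
            | (line << COL_BITS)
            | (col - 1))                      # store column 1..64 as 0..63

def unpack_loc(loc: int) -> tuple:
    """Recover (line, column, file index) from a packed 32-bit location."""
    col = (loc & ((1 << COL_BITS) - 1)) + 1
    line = (loc >> COL_BITS) & ((1 << LINE_BITS) - 1)
    file_index = loc >> (LINE_BITS + COL_BITS)
    return line, col, file_index
```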
May 09, 2023
On Tuesday, 9 May 2023 at 00:24:44 UTC, Walter Bright wrote:
>  6 bits for column - 1..64
> 15 bits for line - 1..32768
> 11 bits for file - 2047
>
> So, for great glory, can anyone come up with a clever scheme that uses only 32 bits?

I wouldn't separate out column/line/file at all. Concatenate all the files together in memory and store only an offset into that gigantic array. If an error happens, then and only then go back to extract the details by slicing the filename out of a listing and rescanning it to determine line and column. (You'd probably have an index that does a bit of both, like every new file or every 5000 lines, add an entry to the lookup table. Then when you do hit an error, you just need to scan from the closest point forward instead of the whole thing.)

If there's no errors, it uses little memory and is fast. If there is an error, the rescanning time is not significant anyway relative to the time to fix the error.
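A minimal sketch of this scheme (illustrative layout and names, not DMD code): the location is just an offset into the concatenated source, and error reporting rescans forward from the nearest checkpoint. Checkpoints are assumed to sit at the start of a line, e.g. at each file boundary or every few thousand lines:

```python
import bisect

# offsets[i] is a checkpoint position in the concatenated source text;
# files[i] and lines[i] give the filename and line number at that position.
# Checkpoints are assumed to be recorded at line starts.

def resolve(source, offsets, files, lines, offset):
    """Recover (filename, line, column) for a stored offset by rescanning
    forward from the nearest preceding checkpoint."""
    i = bisect.bisect_right(offsets, offset) - 1
    line, col = lines[i], 1
    for pos in range(offsets[i], offset):
        if source[pos] == '\n':
            line += 1
            col = 1
        else:
            col += 1
    return files[i], line, col
```

On the happy path nothing but the 32-bit offset is ever stored; the linear rescan only runs when a diagnostic is actually printed.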
May 09, 2023

On Tuesday, 9 May 2023 at 00:24:44 UTC, Walter Bright wrote:
>  6 bits for column - 1..64
> 15 bits for line - 1..32768
> 11 bits for file - 2047

 8 bits for column - 1..256
15 bits for line - 1..32768
 9 bits for file - 512

64 columns are generally not enough; with the above change, typical projects would fit.
But for large projects, there should be a switch that expands Loc back to 8 bytes.

May 09, 2023

On Tuesday, 9 May 2023 at 01:41:31 UTC, zjh wrote:
> But for large projects, there should be a switch that expands Loc back to 8 bytes.

Of course, if large projects can be required to be compiled as sub-projects, then projects that fit within the above limits can generally serve as the unit of compilation.

A large project would be composed of multiple such unit projects, each compiled separately. But then I don't know what the total compilation would look like.

May 08, 2023
On 5/8/2023 5:32 PM, Adam D Ruppe wrote:
> I wouldn't separate out column/line/file at all. Concatenate all the files together in memory and store only an offset into that gigantic array. If an error happens, then and only then go back to extract the details by slicing the filename out of a listing and rescanning it to determine line and column. (You'd probably have an index that does a bit of both, like every new file or every 5000 lines, add an entry to the lookup table. Then when you do hit an error, you just need to scan from the closest point forward instead of the whole thing.)
> 
> If there's no errors, it uses little memory and is fast. If there is an error, the rescanning time is not significant anyway relative to the time to fix the error.

That is indeed a clever idea, thank you.
May 08, 2023
On 5/8/2023 6:41 PM, zjh wrote:
> 8 bits for columns - 1..256
> 15 bits for lines - 1..32768
> 9 bits for files - 512
> 
> 64 for column are generally not enough. After the above changes, the general projects are sufficient.
> But for large projects, there should be a switch, which needs to be expanded to '8' bytes.


I don't think 512 files is enough. As for columns, those aren't very important; they can just "top out" at 64. Not perfect.
May 08, 2023
On 5/8/2023 5:32 PM, Adam D Ruppe wrote:
> I wouldn't separate out column/line/file at all. Concatenate all the files together in memory and store only an offset into that gigantic array. If an error happens, then and only then go back to extract the details by slicing the filename out of a listing and rescanning it to determine line and column. (You'd probably have an index that does a bit of both, like every new file or every 5000 lines, add an entry to the lookup table. Then when you do hit an error, you just need to scan from the closest point forward instead of the whole thing.)
> 
> If there's no errors, it uses little memory and is fast. If there is an error, the rescanning time is not significant anyway relative to the time to fix the error.

Thinking about it a bit more, it isn't necessary to concatenate the files at all. Just, for each file, record the starting line number, which would be the sum of all the lines in the files previously parsed.

Then, the line number can be converted to file/line by looking up the range of line numbers it falls in, which gives the file, and subtracting the corresponding starting line number, which gives the line.

So if we leave 8 bits for column, 1..256, that gives 24 for the line number, which gives room for 16,777,216 lines which ought to be enough, and no rescanning necessary.
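The lookup described above can be sketched like this (illustrative names, Python for brevity): record, for each file, the sum of the line counts of all previously parsed files, then binary-search that table to turn a global line number back into a (file, local line) pair:

```python
import bisect

# start_lines[i] = total number of lines in all files parsed before file i,
# so file i covers global lines start_lines[i]+1 .. start_lines[i+1].

def to_file_line(start_lines, filenames, global_line):
    """Map a global line number back to (filename, local line number)."""
    i = bisect.bisect_right(start_lines, global_line - 1) - 1
    return filenames[i], global_line - start_lines[i]
```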
May 09, 2023

On Tuesday, 9 May 2023 at 03:51:21 UTC, Walter Bright wrote:
> So if we leave 8 bits for column, 1..256, that gives 24 for the line number, which gives room for 16,777,216 lines which ought to be enough, and no rescanning necessary.

That can represent projects of up to 16,777,216 lines; a project with more source than that (on the order of 1000 MB) would exceed it, but most projects are well within the limit.

May 09, 2023
On Tuesday, 9 May 2023 at 00:24:44 UTC, Walter Bright wrote:
> This PR https://github.com/dlang/dmd/pull/15199 reduces its size by 8 bytes, resulting in about 20Mb of memory savings compiling druntime, according to @dkorpel.
>
> [...]
>
> So, for great glory, can anyone come up with a clever scheme that uses only 32 bits?

Yes, call me a madlad, but in styx the filename information lives in the DMD equivalent of `Scope` (and the Loc is 2*32 bits, {line, col}). If you did the same in DMD, you could reduce the size to 32 bits, i.e. 2*16 bits, {line, col}.

I'm not sure this would work for DMD, but for styx the rationale was:

1. In DMD the size of Loc is a known problem (that's why it's usually recommended to pass its instances by ref); let's not have the same problem.
2. There are far fewer Scopes than Locs.
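A sketch of that layout (illustrative Python, not styx or DMD code): Loc keeps only {line, col}, and the filename is reached through the enclosing Scope, of which there are far fewer instances:

```python
from dataclasses import dataclass

@dataclass
class Loc:
    line: int   # would be 16 bits in the packed 32-bit version
    col: int    # likewise 16 bits

@dataclass
class Scope:
    file_index: int        # index into the compiler's filename table
    parent: object = None  # enclosing scope, if any

def full_location(sc: Scope, loc: Loc, filenames: list) -> str:
    """Format a diagnostic location, fetching the file from the Scope."""
    return f"{filenames[sc.file_index]}({loc.line},{loc.col})"
```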

May 09, 2023

On Tuesday, 9 May 2023 at 00:24:44 UTC, Walter Bright wrote:
> So, for great glory, can anyone come up with a clever scheme that uses only 32 bits?

I put mine in the PR already: https://github.com/dlang/dmd/pull/15199#issuecomment-1538120181

It's the same idea as Adam's. You really only need a file offset, which currently already exists, but only for DMD as a lib:

version (LocOffset)
    uint fileOffset; /// utf8 code unit index relative to start of file, starting from 0

Line number and column number can be computed when needed, and the computation can be accelerated with a binary search in a list of line numbers sorted by file offset (because of #line directives, line numbers won't be monotonically increasing, so they have to be stored in the list rather than derived).

To handle multiple files, add the sum of all previous file sizes to the file offset and have a mapping from the resulting 'global offset' to local file offset.

The cool thing is that DMDLIB can still access fileOffset, while keeping the small 4 byte Loc!
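The decoding step can be sketched as follows (illustrative names, Python for brevity; the #line complication is ignored here): cumulative file sizes map a global offset to a file and local offset, then a binary search over that file's line-start offsets yields line and column:

```python
import bisect

# file_starts[i] is the sum of the sizes of all files before file i;
# line_starts[i] holds the local offsets at which each line of file i
# begins, sorted for binary search.

def decode(global_offset, file_starts, filenames, line_starts):
    """Turn a 32-bit global offset into (filename, line, column)."""
    f = bisect.bisect_right(file_starts, global_offset) - 1
    local = global_offset - file_starts[f]
    ln = bisect.bisect_right(line_starts[f], local) - 1
    return filenames[f], ln + 1, local - line_starts[f][ln] + 1
```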
