compile time compression for associative array literal
Aug 23, 2021
Brian Tiffin
August 23, 2021

From a little reading, it seems associative array literal initialization is still pending for global scope, but allowed in a module constructor? If I understood the skimming surface reading so far.

immutable string[string] things;
static (this) {
   things = ["key1": "value 1", "key2": "value 2"];
}

Is there a magic incantation that could convert the values to a std.zlib.compressed ubyte array, at compile time? So the object code gets keys:compvals instead of the full string value? Can't use auto for the AA, I don't think, as the initial data is separated from the declaration. I'm not sure about

a) if code in a module constructor is even a candidate for CTFE?
b) what a cast might look like to get a q"DELIM ... DELIM" delimited string for use as input to std.zlib.compress?
and I guess c), is the value field of an associative array a candidate for CTFE?

The plan is embedding a bunch of source code fragments, and a couple of them will be complete programs of a few pages. Wondering if it could be compressed at compile time, during an associative array init?

Cheers

August 23, 2021
On 23.08.21 08:14, Brian Tiffin wrote:
>  From ~~a~~ little reading, it seems associative array literal initialization is still pending for global scope, but allowed in a module constructor?  *If I understood the skimming surface reading so far*.
> 
> ```d
> immutable string[string] things;
> static (this) {
>     things = ["key1": "value 1", "key2": "value 2"];
> }
> ```

(Typo: It's `static this()`.)

> Is there a magic incantation that could convert the values to a `std.zlib.compress`ed ubyte array, at compile time?  So the object code gets keys:compvals instead of the full string value?

There's a big roadblock: std.zlib.compress cannot go through CTFE, because the source code of zlib isn't available to the compiler; it's not even D code.

Maybe there's a CTFE-able compression library on dub. If not, you can write your own function and run that through CTFE. Example with simple run-length encoding:

----
uint[] my_compress(string s)
{
    import std.algorithm: group;
    import std.string: representation;
    uint[] compressed;
    foreach (c_n; group(s.representation))
    {
        compressed ~= [c_n[0], c_n[1]];
    }
    return compressed;
}

string my_uncompress(const(uint)[] compressed)
{
    import std.conv: to;
    string uncompressed = "";
    for (; compressed.length >= 2; compressed = compressed[2 .. $])
    {
        foreach (i; 0 .. compressed[1])
        {
            uncompressed ~= compressed[0].to!char;
        }
    }
    return uncompressed;
}

import std.array: replicate;

/* CTFE compression: */
enum compressed = my_compress("f" ~ "o".replicate(100_000) ~ "bar");

immutable string[string] things;
shared static this()
{
    /* Runtime decompression: */
    things = ["key1": my_uncompress(compressed)];
}
----

If you compile that, the object file should be far smaller than 100,000 bytes, thanks to the compression.

[...]
> I'm not sure about
> 
> a) if code in a module constructor is even a candidate for CTFE?

The word "candidate" might indicate a common misunderstanding of CTFE. CTFE doesn't look for candidates. It's not an optimization. The language dictates which values go through CTFE.

In a way, static constructors are the opposite of CTFE. Initializers in module scope do go through CTFE. When you have code that you cannot (or don't want to) put through CTFE, you put it in a static constructor.

You can still trigger CTFE within a static constructor by other means (e.g., `enum`), but the static constructor itself is just another function as far as CTFE is concerned.
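A minimal sketch of that distinction, assuming a hypothetical helper `where()` that uses the built-in `__ctfe` variable to report how it was evaluated. The names are illustrative only:

```d
import std.stdio : writeln;

// Hypothetical helper: reports whether it was evaluated by CTFE or at run time.
string where()
{
    // __ctfe is true only while the compiler is interpreting the function.
    return __ctfe ? "compile time" : "run time";
}

// Module-scope initializer: the language requires a compile-time value here,
// so this call goes through CTFE.
immutable string fromInitializer = where();

immutable string fromEnum;
immutable string fromPlainCall;

shared static this()
{
    // An enum needs a compile-time value, so this call is CTFE even though
    // it is written inside a static constructor.
    enum forced = where();
    fromEnum = forced;

    // An ordinary call in the constructor body runs at program startup.
    fromPlainCall = where();
}

void main()
{
    writeln("module-scope initializer: ", fromInitializer);
    writeln("enum in static ctor:      ", fromEnum);
    writeln("plain call in static ctor: ", fromPlainCall);
}
```

The same function is used in all three places; only the context decides whether the compiler has to evaluate it.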

> b) what a cast might look like to get a `q"DELIM ... DELIM"` delimited string for use as input to std.zlib.compress?

A cast to get a string literal? That doesn't make sense.

You might be looking for `import("some_file")`. That gives you the contents of a file as a string. You can then run that string through your compression function in CTFE, put the resulting compressed data into the object file, and decompress it at runtime (like the example above does).
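A rough sketch of that pipeline, under a few assumptions: `rlePack` and `rleUnpack` are hypothetical byte-oriented run-length helpers standing in for a real CTFE-able compressor, and a heredoc literal stands in for `import("fragment.cob")` (a made-up file name, which would also need `dmd -J<dir>`). Note that run-length encoding usually makes ordinary source text larger, so this only shows the mechanics:

```d
import std.stdio : writeln;

// Hypothetical encoder: emits (count, byte) pairs, runs capped at 255
// so that the count fits in a ubyte.
ubyte[] rlePack(string s)
{
    ubyte[] packed;
    size_t i = 0;
    while (i < s.length)
    {
        size_t run = 1;
        while (i + run < s.length && s[i + run] == s[i] && run < 255)
            ++run;
        packed ~= cast(ubyte) run;
        packed ~= cast(ubyte) s[i];
        i += run;
    }
    return packed;
}

// Hypothetical decoder: expands each (count, byte) pair back into text.
string rleUnpack(const(ubyte)[] packed)
{
    char[] text;
    for (size_t i = 0; i + 1 < packed.length; i += 2)
        foreach (_; 0 .. packed[i])
            text ~= cast(char) packed[i + 1];
    return text.idup;
}

// Stand-in for a fragment kept in the D source; assumed alternative:
// enum fragment = import("fragment.cob");
enum fragment = q"COBOL
       IDENTIFICATION DIVISION.
       PROGRAM-ID. HELLO.
COBOL";

// Module-scope initializer, so rlePack runs in CTFE.
immutable ubyte[] packedFragment = rlePack(fragment);

// Round trip checked by the compiler itself.
static assert(rleUnpack(rlePack(fragment)) == fragment);

immutable string[string] things;
shared static this()
{
    // Runtime decompression into the associative array.
    things = ["hello": rleUnpack(packedFragment)];
}

void main()
{
    writeln(things["hello"]);
}
```

Swapping in a better compressor only requires that the packing function be plain D code the compiler can interpret; the unpacking side can be anything that works at run time.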
August 23, 2021
On Monday, 23 August 2021 at 11:53:46 UTC, ag0aep6g wrote:
> On 23.08.21 08:14, Brian Tiffin wrote:
>>  From ~~a~~ little reading, it seems associative array literal initialization is still pending for global scope, but allowed in a module constructor?  *If I understood the skimming surface reading so far*.
>> 
>> ```d
>> immutable string[string] things;
>> static (this) {
>>     things = ["key1": "value 1", "key2": "value 2"];
>> }
>> ```
>
> (Typo: It's `static this()`.)
>
Yep, that's a typo.

>> Is there a magic incantation that could convert the values to a `std.zlib.compress`ed ubyte array, at compile time?  So the object code gets keys:compvals instead of the full string value?
>
> There's a big roadblock: std.zlib.compress cannot go through CTFE, because the source code of zlib isn't available to the compiler; it's not even D code.
>
> Maybe there's a CTFE-able compression library on dub. If not, you can write your own function and run that through CTFE. Example with simple run-length encoding:
>
> ----
> uint[] my_compress(string s)
> {
>     import std.algorithm: group;
>     import std.string: representation;
>     uint[] compressed;
>     foreach (c_n; group(s.representation))
>     {
>         compressed ~= [c_n[0], c_n[1]];
>     }
>     return compressed;
> }
>
> string my_uncompress(const(uint)[] compressed)
> {
>     import std.conv: to;
>     string uncompressed = "";
>     for (; compressed.length >= 2; compressed = compressed[2 .. $])
>     {
>         foreach (i; 0 .. compressed[1])
>         {
>             uncompressed ~= compressed[0].to!char;
>         }
>     }
>     return uncompressed;
> }
>
> import std.array: replicate;
>
> /* CTFE compression: */
> enum compressed = my_compress("f" ~ "o".replicate(100_000) ~ "bar");
>
> immutable string[string] things;
> shared static this()
> {
>     /* Runtime decompression: */
>     things = ["key1": my_uncompress(compressed)];
> }
> ----
>
> If you compile that, the object file should be far smaller than 100,000 bytes, thanks to the compression.

Cool.  So, it might not be obvious, but there is a path to this little nicety.

>
> [...]
>> I'm not sure about
>> 
>> a) if code in a module constructor is even a candidate for CTFE?
>
> The word "candidate" might indicate a common misunderstanding of CTFE. CTFE doesn't look for candidates. It's not an optimization. The language dictates which values go through CTFE.
>
> In a way, static constructors are the opposite of CTFE. Initializers in module scope do go through CTFE. When you have code that you cannot (or don't want to) put through CTFE, you put it in a static constructor.
>
> You can still trigger CTFE within a static constructor by other means (e.g., `enum`), but the static constructor itself is just another function as far as CTFE is concerned.

Ok.  I'm hoping this gets easier to reason with once I get further up the D curve.

>
>> b) what a cast might look like to get a `q"DELIM ... DELIM"` delimited string for use as input to std.zlib.compress?
>
> A cast to get a string literal? That doesn't make sense.

No, no it doesn't.  And it didn't help that I had the order of AA key and value syntax backwards in my head when I was typing in the question.  I was thinking it was `key[value]`, not the proper `value[key]`.

So in this case, `(ubyte[])[string]` was what I *think* I'd be aiming for as the AA type spec.  The inputs to compress are `const(void)[]`, so I figured I needed to cast the type inferred literal delimited string for use in compress.  More things to learn.  ;-)
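For the types involved, here is a small runtime-only sketch (std.zlib cannot run in CTFE, as noted above). It assumes a hypothetical helper `roundTrip` and shows that a `string` converts implicitly to `const(void)[]`, so no cast is needed on the way in, only on the way out:

```d
import std.stdio : writeln;
import std.zlib : compress, uncompress;

// Hypothetical helper: compresses a string into an AA of compressed values,
// then restores it. Runs at run time only.
string roundTrip(string text)
{
    // Value type ubyte[], key type string.
    ubyte[][string] packed;

    // string converts implicitly to const(void)[], the parameter type of compress.
    packed["fragment"] = compress(text);

    // uncompress returns void[]; cast it to view the bytes as text.
    return cast(string) uncompress(packed["fragment"]);
}

void main()
{
    string text = q"EOS
       DISPLAY "HELLO" END-DISPLAY
       DISPLAY "HELLO" END-DISPLAY
EOS";
    writeln(roundTrip(text) == text);
}
```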

I cannot claim to be on solid ground of understanding when it comes to some areas of D syntax yet.

>
> You might be looking for `import("some_file")`. That gives you the contents of a file as a string. You can then run that string through your compression function in CTFE, put the resulting compressed data into the object file, and decompress it at runtime (like the example above does).

That's the goal.  It's an optional goal at this point.  I'm not *really* worried about size of object code, yet, but figured this would be a neat way to shrink the compiled code generated from some large COBOL source fragments embedded in D source.

COBOL programmer me might have planned to run the fragments through a compressor, then copy those outputs to the D source by hand, but that would be a maintenance headache and make for far less grokkable code.

Thanks for the hints, ag0aep6g.  You've given me some more paths to explore.

Have good.
August 23, 2021

On Monday, 23 August 2021 at 14:04:05 UTC, Brian Tiffin wrote:
> That's the goal. It's an optional goal at this point. I'm not really worried about size of object code, yet, but figured this would be a neat way to shrink the compiled code generated from some large COBOL source fragments embedded in D source.

The decompression needs to happen at runtime, where these libraries are still useful. The compression could happen through CTFE once some suitable compression code is written in D, but that's not actually required to get the results of

  1. your object file contains compressed strings

  2. your program decompresses them at runtime

You can still achieve this end by having your build system compress external files that D then includes.

Manually setting this up:

$ dd if=/dev/zero bs=$((1024*1024)) count=1024 of=gigabyte.data
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.3655 s, 786 MB/s

$ time zip giga.zip gigabyte.data
  adding: gigabyte.data (deflated 100%)

real    0m5.645s
user    0m5.470s
sys     0m0.160s

$ du -sh giga.zip
1020K   giga.zip

$ dmd -J. -O zeroes.d

$ du -sh zeroes
3.3M    zeroes

$ time ./zeroes > out.data

real    0m3.310s
user    0m1.486s
sys     0m1.167s

$ diff -s gigabyte.data out.data
Files gigabyte.data and out.data are identical

From this zeroes.d:

import std.stdio : write;
import std.zip;

enum zeroes = import("giga.zip");

void main() {
    auto zip = new ZipArchive(cast(char[]) zeroes);
    ArchiveMember am = zip.directory.values[0];
    zip.expand(am);
    write(cast(char[]) am.expandedData);
}
August 23, 2021

On Monday, 23 August 2021 at 14:49:17 UTC, jfondren wrote:
> On Monday, 23 August 2021 at 14:04:05 UTC, Brian Tiffin wrote:
>> That's the goal. It's an optional goal at this point. I'm not really worried about size of object code, yet, but figured this would be a neat way to shrink the compiled code generated from some large COBOL source fragments embedded in D source.
>
> The decompression needs to happen at runtime, where these libraries are still useful. The compression could happen through CTFE once some suitable compression code is written in D, but that's not actually required to get the results of
>
>   1. your object file contains compressed strings
>
>   2. your program decompresses them at runtime
>
> You can still achieve this end by having your build system compress external files that D then includes.
>
> Manually setting this up:
>
> $ dd if=/dev/zero bs=$((1024*1024)) count=1024 of=gigabyte.data
> 1024+0 records in
> 1024+0 records out
> 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.3655 s, 786 MB/s
>
> $ time zip giga.zip gigabyte.data
>   adding: gigabyte.data (deflated 100%)
>
> real    0m5.645s
> user    0m5.470s
> sys     0m0.160s
>
> $ du -sh giga.zip
> 1020K   giga.zip
>
> $ dmd -J. -O zeroes.d
>
> $ du -sh zeroes
> 3.3M    zeroes
>
> $ time ./zeroes > out.data
>
> real    0m3.310s
> user    0m1.486s
> sys     0m1.167s
>
> $ diff -s gigabyte.data out.data
> Files gigabyte.data and out.data are identical
>
> From this zeroes.d:
>
> import std.stdio : write;
> import std.zip;
>
> enum zeroes = import("giga.zip");
>
> void main() {
>     auto zip = new ZipArchive(cast(char[]) zeroes);
>     ArchiveMember am = zip.directory.values[0];
>     zip.expand(am);
>     write(cast(char[]) am.expandedData);
> }

Yep, pondered external tooling, but that's not a goal either really. Want people, well me actually, looking at the source file to be able to quickly scan over the fragments from the single D source. I'm still ok with high-school level D at this point, and will just compile in the heredoc strings, as-is.

And thanks, jfondren. Making another bookmark for later visiting once further up the D curve.

Cheers