August 05, 2021

On Thursday, 5 August 2021 at 10:28:00 UTC, Steven Schveighoffer wrote:

> H.S. Teoh, I know you know better than this ;) None of this is necessary, you just need `rtValue` for both runtime and CTFE (and compile time parameters)!
>
> Now, the original question is about *associative arrays*, which are a different animal. Those, you actually have to initialize using a static constructor, and does indeed need both an enum and a static immutable, as CTFE currently does not understand runtime AAs. This is a huge issue since you do need silly things like the `if(__ctfe)` statement you wrote, and keep an enum handy for those cases which is identical to the static immutable. We really need to fix this.

When you say "We really need to fix this" you mean that *eventually* associative-arrays will be available at compile-time ?

> -Steve

August 05, 2021
On Thu, Aug 05, 2021 at 03:09:13PM +0000, someone via Digitalmars-d-learn wrote:
> On Thursday, 5 August 2021 at 10:28:00 UTC, Steven Schveighoffer wrote:
> 
> > H.S. Teoh, I know you know better than this ;) None of this is necessary, you just need `rtValue` for both runtime and CTFE (and compile time parameters)!

Haha, I haven't used this particular feature of D recently, so probably my memory is failing me. ;-)


> > Now, the original question is about *associative arrays*, which are a different animal. Those, you actually have to initialize using a static constructor, and does indeed need both an enum and a static immutable, as CTFE currently does not understand runtime AAs. This is a huge issue since you do need silly things like the `if(__ctfe)` statement you wrote, and keep an enum handy for those cases which is identical to the static immutable. We really need to fix this.
> 
> When you say "We really need to fix this" you mean that *eventually* associative-arrays will be available at compile-time ?
[...]

AA's are already available at compile-time.  You can define them in CTFE and pass them around as template arguments.
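
A minimal sketch of that point, for illustration only. The names `makeTable`, `table` and `lookup` are made up here and do not come from the thread; the sketch assumes an AA built by an ordinary function under CTFE and then handed to a template as a value argument.

```d
import std.stdio : writeln;

// Hypothetical helper: an ordinary function that builds an AA.
// It can be called at run time or evaluated by CTFE.
int[string] makeTable()
{
    int[string] t;
    t["abc"] = 123;
    t["def"] = 456;
    return t;
}

// Built by CTFE; the result is stored as an AA literal.
enum table = makeTable();

// The AA is passed as a template value argument and indexed at compile time.
template lookup(int[string] aa, string key)
{
    enum lookup = aa[key];
}

static assert(lookup!(table, "abc") == 123);

void main()
{
    writeln(lookup!(table, "def"));
}
```

Everything here is resolved by the compiler; the only run-time work left in `main` is printing a constant.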

What doesn't work is initializing global static immutable AA's with literals. Currently, you need this workaround:

	struct Data { /* whatever you want to store here */ }
	static immutable Data[string] aa;
	shared static this() {
		aa = [
			"abc": Data(...),
			"def": Data(...),
			// ... etc.
		];
	}

Unfortunately, this also means you can't access the value of `aa` at compile-time. So you need a separate enum in order to access AA values at compile-time.

Full runnable example:
---------------
enum ctValue = [
	"abc": 123,
	"def": 456,
];

static immutable int[string] rtValue;
shared static this() {
	rtValue = ctValue;
}

// Compile-time operations
enum x = ctValue["abc"];
enum y = ctValue["def"];
static assert(x == 123 && y == 456);

// Runtime operations
void main() {
	assert(rtValue["abc"] == 123);
	assert(rtValue["def"] == 456);
}
---------------


T

-- 
My father told me I wasn't at all afraid of hard work. I could lie down right next to it and go to sleep. -- Walter Bright
August 05, 2021
On Thursday, 5 August 2021 at 15:26:33 UTC, H. S. Teoh wrote:
> [...]
> AA's are already available at compile-time.  You can define them in CTFE and pass them around as template arguments.
> [...]

So if we are talking AA-arrays at compile-time only there should be nothing wrong with the following code ... right ?

private struct structureLocation {

   dstring countryID = null;
   dstring countryName = null;
   dstring city = null;
   dstring TZ = null;

}

private enum pudtLocations = [
   r"BUE"d : structureLocation(r"arg"d, r"Buenos Aires"d, r"ART"d),
   r"GRU"d : structureLocation(r"bra"d, r"São Paulo"d, r"BRT"d),
   r"HHN"d : structureLocation(r"deu"d, r"Frankfurt am Main"d, r"CET"d),
   r"LHR"d : structureLocation(r"gbr"d, r"London"d, r"UTC"d),
   r"NYC"d : structureLocation(r"usa"d, r"New York"d, r"EST"d)
   ];

private struct structureExchange {

   structureLocation location;

   dstring ID = null;
   dstring name = null;
   dstring currencyID = null;

}

private enum dstring pstrExchangeIDB3 = r"B3"d;
private enum dstring pstrExchangeIDBCBA = r"BCBA"d;
private enum dstring pstrExchangeIDLSE = r"LSE"d;
private enum dstring pstrExchangeIDNASDAQ = r"NASDAQ"d;
private enum dstring pstrExchangeIDNYSE = r"NYSE"d;
private enum dstring pstrExchangeIDXETRA = r"XETRA"d;

public enum gudtExchanges = [
   pstrExchangeIDB3     : structureExchange(pudtLocations[r"GRU"d], pstrExchangeIDB3    , r"B3 formerly Bolsa de Valores de São Paulo (aka BOVESPA)"d, r"BRL"d),
   pstrExchangeIDBCBA   : structureExchange(pudtLocations[r"BUE"d], pstrExchangeIDBCBA  , r"Bolsa de Comercio de Buenos Aires"d, r"ARS"d),
   pstrExchangeIDLSE    : structureExchange(pudtLocations[r"LHR"d], pstrExchangeIDLSE   , r"London Stock Exchange"d, r"GBP"d),
   pstrExchangeIDNASDAQ : structureExchange(pudtLocations[r"NYC"d], pstrExchangeIDNASDAQ, r"National Association of Securities Dealers Automated Quotations"d, r"USD"d),
   pstrExchangeIDNYSE   : structureExchange(pudtLocations[r"NYC"d], pstrExchangeIDNYSE  , r"New York Stock Exchange"d, r"USD"d),
   pstrExchangeIDXETRA  : structureExchange(pudtLocations[r"HHN"d], pstrExchangeIDXETRA , r"Deutsche Börse"d, r"EUR"d)
   ]; /// byKeyValue is not available at compile‐time; hence the redundancy of IDs

/*public enum gudtExchanges = [
   pstrExchangeIDB3     : structureExchange(pudtLocations[r"GRU"d], r"B3 formerly Bolsa de Valores de São Paulo (aka BOVESPA)"d, r"BRL"d),
   pstrExchangeIDBCBA   : structureExchange(pudtLocations[r"BUE"d], r"Bolsa de Comercio de Buenos Aires"d, r"ARS"d),
   pstrExchangeIDLSE    : structureExchange(pudtLocations[r"LHR"d], r"London Stock Exchange"d, r"GBP"d),
   pstrExchangeIDNASDAQ : structureExchange(pudtLocations[r"NYC"d], r"National Association of Securities Dealers Automated Quotations"d, r"USD"d),
   pstrExchangeIDNYSE   : structureExchange(pudtLocations[r"NYC"d], r"New York Stock Exchange"d, r"USD"d),
   pstrExchangeIDXETRA  : structureExchange(pudtLocations[r"HHN"d], r"Deutsche Börse"d, r"EUR"d)
   ];*/ /// byKeyValue eventually becomes available

...

static foreach (
   structureExchange sudtExchange;
   gudtExchanges
   ) {

   mixin(

   ... sudtExchange.ID ...
   ... sudtExchange.name ...
   ... sudtExchange.currencyID ...

   ... sudtExchange.location.countryID ...
   ... sudtExchange.location.countryName ...
   ... sudtExchange.location.city ...
   ... sudtExchange.location.TZ ...

   );

}
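
For readers who want something compilable, here is a cut-down sketch of the same pattern: a `static foreach` over an enum AA that mixes in one declaration per entry. The struct, the two entries and the generated class names (`ExchangeLSE`, `ExchangeNYSE`) are hypothetical stand-ins, since the real mixin body is elided above.

```d
// Hypothetical, simplified record; the real one carries more fields.
struct Exchange { string id; string currency; }

enum exchanges = [
    "LSE":  Exchange("LSE", "GBP"),
    "NYSE": Exchange("NYSE", "USD"),
];

// Iterating with a single variable yields the AA's values.
// Each iteration mixes in one class declaration, e.g. `class ExchangeLSE`.
static foreach (ex; exchanges)
{
    mixin("class Exchange" ~ ex.id ~ " { enum currency = \"" ~ ex.currency ~ "\"; }");
}

// The generated declarations are visible to the rest of the module.
static assert(ExchangeLSE.currency == "GBP");
static assert(ExchangeNYSE.currency == "USD");

void main() {}
```

AA iteration order is unspecified, which does not matter here because each iteration only declares an independent symbol.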
August 05, 2021

On Thursday, 5 August 2021 at 16:06:58 UTC, someone wrote:

> So if we are talking AA-arrays at compile-time only there should be nothing wrong with the following code ... right ?
> ...
> private enum pudtLocations = [
>    r"BUE"d : structureLocation(r"arg"d, r"Buenos Aires"d, r"ART"d),
>    r"GRU"d : structureLocation(r"bra"d, r"São Paulo"d, r"BRT"d),
>    r"HHN"d : structureLocation(r"deu"d, r"Frankfurt am Main"d, r"CET"d),
>    r"LHR"d : structureLocation(r"gbr"d, r"London"d, r"UTC"d),
>    r"NYC"d : structureLocation(r"usa"d, r"New York"d, r"EST"d)
>    ];

Every time you use this variable in your program, it's as if you retyped this definition, which can mean runtime work building the hash table where you use it, which can mean very surprising performance hits. Consider:

enum aa = [1: 0];

// this is fine
unittest {
    foreach (key; aa.byKey)
        assert(aa[key] == 0);
}

// Error: associative array literal in `@nogc` function `enumaa.__unittest_L9_C7` may cause a GC allocation
@nogc unittest {
    foreach (key; aa.byKey)
        assert(aa[key] == 0);
}

It's not iteration that causes allocation, but the runtime building of the AA that has to happen on the spot for it to be iterated.
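
One way to see the "retyped at every use" behaviour directly is to compare object identity. This is only a sketch: `enumAA` and `staticAA` are made-up names, and the module constructor follows the workaround shown earlier in the thread.

```d
enum enumAA = [1: 0];                 // manifest constant: pasted in at each use

static immutable int[int] staticAA;   // one object, built once at program start
shared static this()
{
    staticAA = [1: 0];
}

void main()
{
    auto a = enumAA;   // builds a fresh hash table here
    auto b = enumAA;   // and another one here
    assert(a !is b);   // two distinct tables

    auto c = staticAA;
    auto d = staticAA;
    assert(c is d);    // both refer to the same table
}
```

With the enum, every mention pays for a new allocation; with the `static immutable` the cost is paid once in the module constructor.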

Maybe this isn't getting discussed directly enough. As rules:

  1. don't use enum for AA literals

  2. module-scoped AAs can be initialized in module constructors, like shared static this

  3. (depending very precisely on what you want to do, you can break #1 with no runtime cost. For example if you never need to iterate over the AA or look up a compiletime-unknown key, if all your uses are aa["literal key"], then the AA won't need to be constructed at runtime and this is just like putting the value of that key at the site)

enum aa = [1: 0];

// this is fine
unittest {
    assert(aa[1] == 0);
}

// this is still fine
@nogc unittest {
    assert(aa[1] == 0);
}

(but that's not likely to apply to your case, especially since you mentioned wanting to get this data from a file at program start and not need literals in your program.)

August 05, 2021

On Thursday, 5 August 2021 at 16:24:21 UTC, jfondren wrote:

> [...]

Yes. Of course I remember the post you mention in which I stated the possibility of loading data from a file. And somehow this post in which I am asking for advice to finally nail-down const usage in D went back to the problem I encountered with AA at compile-time (byKeyValue) when I did not clearly understand the differences of using arrays/AA at compile-time vs run-time.

My conclusion from said post was that I have to find an alternative to AA at compile-time, and then, in this post, when I saw the comment:

"AA's are already available at compile-time" [H. S. Teoh]

and then the related:

"We really need to fix this" [Steven Schveighoffer]

I thought that probably there was nothing wrong with my code to begin with and thus I somehow re-asked the original question about AA at compile-time.

As you pointed out, using them the way I am using them leads to usable code at the cost of poorly performing compilation, because of the copy/paste/build-hash-table-once-again work the compiler has to do each time I reference them.

I already assumed that loading the data from file is a goner.

So this leaves me with two choices:

  • keep the code as it is, incurring higher-than-expected compilation time: this is solely used in a module to build classes, I guess pretty minimal usage, but I don't know where this app will eventually go, so better to get it right to begin with

  • find a better alternative

Thanks for your advice jfondren :) !

August 05, 2021
On Thu, Aug 05, 2021 at 04:53:38PM +0000, someone via Digitalmars-d-learn wrote: [...]
> I already assumed that loading the data from file is a goner.
> 
> So this leaves me with two choices:
> 
> - keep the code as it is incurring higher-than expected compilation-time: this is solely used in a module to build classes, I guess pretty minimal usage, but I don't know where this app will eventually go so better get it right to begin with
> 
> - find a better alternative
[...]

I'd say if the performance hit isn't noticeably bad right now, don't worry too much about it. You can always replace it later. One thing I really like about D is how easily refactorable D code tends to be. If you structure your code in a way that keeps separate concerns apart (and D features like metaprogramming really help with this), replacing your implementation choice with something else often turns out to be surprisingly simple.

But if you expect to process very large data files at compile-time, I'd say consider generating the classes via a utility D program that emits D code as a separate step.  In one of my projects I have a whole bunch of 3D model files that I need to convert into a format more suitable for use at runtime.  I eventually opted to do this as a separate step from the main compilation: a helper utility to parse the files, massage the data, then spit out a .d file with the right definitions (including embedded array literals for binary data), then compile that into the main program.  Since the data rarely changes, it's a waste of time to keep regenerating it every time I recompile; keeping it as a separate step means I only need to rerun the helper utility when the data actually changes as opposed to every single time I change 1 line of code.
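
A rough sketch of what such a helper could look like, under stated assumptions: the `Exchange` record, the `emitModule` function and the output file name `generated_exchanges.d` are all hypothetical, and the input is hard-coded where a real helper would parse data files.

```d
import std.array : appender;
import std.file : write;
import std.format : format;

// Hypothetical record type; stands in for whatever the helper parses.
struct Exchange { string id; string name; string currency; }

// Renders the text of a D module that declares the data as an array literal.
string emitModule(Exchange[] exchanges)
{
    auto src = appender!string();
    src.put("// Generated file; do not edit by hand.\n");
    src.put("module generated_exchanges;\n\n");
    src.put("struct Exchange { string id; string name; string currency; }\n\n");
    src.put("static immutable Exchange[] exchanges = [\n");
    foreach (e; exchanges)
    {
        // Inside %( %), string elements are written quoted and escaped.
        src.put(format("    Exchange(%(%s, %)),\n", [e.id, e.name, e.currency]));
    }
    src.put("];\n");
    return src.data;
}

void main()
{
    // In a real helper this array would come from parsing the data files.
    auto exchanges = [Exchange("LSE", "London Stock Exchange", "GBP")];
    write("generated_exchanges.d", emitModule(exchanges));
}
```

The build then compiles `generated_exchanges.d` together with the main program, and the helper only needs to be rerun when the input data changes.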


T

-- 
Those who don't understand Unix are condemned to reinvent it, poorly.
August 05, 2021

On 8/5/21 11:09 AM, someone wrote:

> On Thursday, 5 August 2021 at 10:28:00 UTC, Steven Schveighoffer wrote:
>
>> H.S. Teoh, I know you know better than this ;) None of this is necessary, you just need `rtValue` for both runtime and CTFE (and compile time parameters)!
>>
>> Now, the original question is about *associative arrays*, which are a different animal. Those, you actually have to initialize using a static constructor, and does indeed need both an enum and a static immutable, as CTFE currently does not understand runtime AAs. This is a huge issue since you do need silly things like the `if(__ctfe)` statement you wrote, and keep an enum handy for those cases which is identical to the static immutable. We really need to fix this.
>
> When you say "We really need to fix this" you mean that eventually associative-arrays will be available at compile-time ?

I mean eventually AAs that are reasonably available at compile time, even though the structure is determined by the runtime, should be available at compile time. This allows them to be as usable with static immutable as regular arrays (and just about any other struct) are.

Right now, AA's implementation is completely opaque via extern(C) function prototypes implemented in the runtime, so the compiler doesn't know how to make one.
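
To illustrate the contrast with regular arrays, here is a sketch of a plain `static immutable` array of key/value pairs that is readable both in CTFE and at run time, with no module constructor. This is a substitute technique, not something proposed in the thread: `Entry`, `table` and `lookup` are hypothetical names, and the linear search is only reasonable for small tables.

```d
// Hypothetical key/value record.
struct Entry { string key; int value; }

// A regular array literal can initialize a static immutable directly.
static immutable Entry[] table = [
    Entry("abc", 123),
    Entry("def", 456),
];

// Linear search; usable in CTFE and at run time.
int lookup(string key)
{
    foreach (e; table)
        if (e.key == key)
            return e.value;
    assert(0, "unknown key: " ~ key);
}

// Compile-time use
static assert(lookup("abc") == 123);

// Run-time use
void main()
{
    assert(lookup("def") == 456);
}
```

For a table of ten or fifteen entries the linear scan is unlikely to matter, and the data is defined only once instead of as an enum plus a `static immutable` copy.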

-Steve

August 06, 2021
On Thursday, 5 August 2021 at 17:12:13 UTC, H. S. Teoh wrote:
> [...]
>
> I'd say if the performance hit isn't noticeably bad right now, don't worry too much about it. You can always replace it later. One thing I really like about D is how easily refactorable D code tends to be. If you structure your code in a way that keeps separate concerns apart (and D features like metaprogramming really help with this), replacing your implementation choice with something else

Granted, I'll be taking your advice :)

> often turns out to be surprisingly simple.

Although I have very little experience with D, I second this: refactoring, even huge refactors, proved to be far more straightforward than I expected.

> But if you expect to process very large data files at compile-time

Not at all: very tiny files, like what you saw in my examples, or 10 times that at the very far end; more probably no more than 10 to 15 exchanges, meaning an AA of length 15, period.

> I'd say consider generating the classes via a utility D program that emits D code as a separate step.  In one of my projects I have a whole bunch of 3D model files that I need to convert into a format more suitable for use at runtime.  I eventually opted to do this as a separate step from the main compilation: a helper utility to parse the files, massage the data, then spit out a .d file with the right definitions (including embedded array literals for binary data), then compile that into the main program.  Since the data rarely changes, it's a waste of time to keep regenerating it every time I recompile; keeping it as a separate step means I only need to rerun the helper utility when the data actually changes as opposed to every single time I change 1 line of code.

I am very used to this, mainly on SQL Server with Transact-SQL (a horrible "language", to say the least): lots of functionality automatically generated from XML/XSD file definitions into Transact-SQL via XSLT transformations, all within SQL Server itself (the .NET Framework has superb handling of everything XML/XSD/XSLT related) ... think of it as code-on-demand or code-on-the-fly. In the end this approach gives me lots of flexibility and allows me to do things almost unthinkable in Transact-SQL; the key here is the ability of SQL Server to easily invoke external code, and this is what makes me wonder what a super-combo of D as a first-tier language on PostgreSQL could achieve ... drooling here :)
August 06, 2021

On Thursday, 5 August 2021 at 20:50:38 UTC, Steven Schveighoffer wrote:

> I mean eventually AAs that are reasonably available at compile time, even though the structure is determined by the runtime, should be available at compile time. This allows them to be as usable with static immutable as regular arrays (and just about any other struct) are.
>
> Right now, AA's implementation is completely opaque via extern(C) function prototypes implemented in the runtime, so the compiler doesn't know how to make one.

Thanks for the clarification Steve !

August 05, 2021
On 8/5/21 5:11 PM, someone wrote:

> Although I have very little experience with D, I second this:
> refactoring, even huge refactors, proved to be far more straightforward
> than I expected.

May I humbly suggest names like Location instead of structureLocation to make refactoring even more straightforward. ;)

Ali