June 24, 2023
On Thursday, 22 June 2023 at 05:21:52 UTC, Cecil Ward wrote:
> On Thursday, 22 June 2023 at 01:44:22 UTC, Jonathan M Davis wrote:
>> On Wednesday, June 21, 2023 7:05:28 PM MDT Paul Backus via Digitalmars-d-learn wrote:
>>> [...]
>>
>> To add to that, it _has_ to know the element type, because aside from anything related to a type's size, it bit-blits the type's init value onto the new elements when it increases the length of the dynamic array.
>>
>> You'd probably be dealing with bytes if you were explicitly asking for memory and the like (e.g. with malloc), but a dynamic array is properly typed, and everything you do with it in @safe code is going to deal with it as properly typed. For it to be otherwise would require @system casts.
>>
>> - Jonathan M Davis
>
> Thank you, Jonathan!

I just had a fight with LDC over the following code when I tried out reserve. I have an associative array that maps strings to ‘ordinals’, i.e. uints that are unique, and the compiler rejects the call to reserve.

==

struct decls_t
{
    uint          n_entries = 0;
    uint[dstring] ordinals;  // Associative array maps variable names to ordinals
}

static decls_t Decls;

enum NPreAllocEntries = 32;
Decls.ordinals.reserve( NPreAllocEntries );

source>(82): Error: none of the overloads of template `object.reserve` are callable using argument types `!()(uint[dstring], ulong)`
/opt/compiler-explorer/ldc1.32.1/ldc2-1.32.1-linux-x86_64/bin/../import/object.d(3983):        Candidate is: `reserve(T)(ref T[] arr, size_t newcapacity)`
Compiler returned: 1

June 23, 2023
On Friday, June 23, 2023 7:02:12 PM MDT Cecil Ward via Digitalmars-d-learn wrote:
> I just had a fight with LDC over the following code when I tried out reserve. I have an associative array that maps strings to ‘ordinals’ ie uints that are unique, and the compiler hates the call to reserve.
>
> ==
>
>
>
> struct decls_t
>   {
>   uint                        n_entries = 0;
>   uint[ dstring ]     ordinals;   // Associative array maps variable
> names to ordinals
>   }
>
> static decls_t Decls;
>
> enum NPreAllocEntries = 32;
> Decls.ordinals.reserve( NPreAllocEntries );
>
> source>(82): Error: none of the overloads of template
> `object.reserve` are callable using argument types
> `!()(uint[dstring], ulong)`
> /opt/compiler-explorer/ldc1.32.1/ldc2-1.32.1-linux-x86_64/bin/../import/obje
> ct.d(3983):        Candidate is: `reserve(T)(ref T[] arr, size_t
> newcapacity)` Compiler returned: 1

Associative arrays and dynamic arrays are completely different things. Associative arrays are hash tables, and reserve really doesn't make sense for them. reserve tells the GC to make sure that a dynamic array has at least a specific amount of room to grow into before it has to reallocate the dynamic array into a different memory block with enough space for the data. With associative arrays, on the other hand, if and when they reallocate any of their internals is largely implementation-defined.
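
To make the contrast concrete, here is a minimal sketch (variable names are just illustrative) of what reserve buys you on a plain dynamic array:

```d
void main()
{
    // reserve asks the GC for capacity up front, so appends up to
    // that size need no reallocation.
    int[] arr;
    arr.reserve(32);
    assert(arr.capacity >= 32);

    auto p = arr.ptr;
    foreach (i; 0 .. 32)
        arr ~= i;

    // Still the same memory block: no reallocation took place.
    assert(arr.ptr is p);
    assert(arr.length == 32);
}
```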

Any time that you add or remove elements from an AA, it might reallocate some of its internals depending on its current state and what the key of the element is - and that could be different between different compiler releases (though it's unlikely to change very often, since I don't think that the AA implementation gets messed with much).

You can use the rehash function on AAs to tell the runtime to restructure its buckets so that lookups are more efficient with the data that's currently in there, and you can call clear to remove all of its elements, but in general, you don't do much to manage an AA's memory. It's a much more complicated data structure than a dynamic array.
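
A small sketch of the two operations that do exist for managing an AA after it's populated (the table here is hypothetical):

```d
void main()
{
    // Build a small string-to-uint table, then optimize and clear it.
    uint[string] ordinals;
    ordinals["alpha"] = 0;
    ordinals["beta"]  = 1;
    ordinals["gamma"] = 2;

    // rehash reorganizes the buckets for the data currently in the table.
    ordinals.rehash;
    assert(ordinals["beta"] == 1);

    // clear removes every element but leaves the AA usable.
    ordinals.clear();
    assert(ordinals.length == 0);
}
```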

https://dlang.org/spec/hash-map.html

- Jonathan M Davis




June 24, 2023
On 6/20/23 19:09, Cecil Ward wrote:

> 2.) I have a dynamic array and I wish to preinitialise its alloc cell to
> be a certain large size so that I don’t need to reallocate often

To be complete, 'assumeSafeAppend' must be mentioned here as well. Without it, there will be cases where the GC cannot guarantee that no other slice refers to this array's memory block, so it has to reallocate:

import std;

void main() {
    // An array with room for 100 elements
    int[] arr;
    arr.reserve(100);

    // Take note of current address of the elements
    auto ptr = arr.ptr;

    foreach (i; 0 .. 80) {
        // Add elements
        arr ~= i;

        // Was there a reallocation?
        if (arr.ptr != ptr) {
            writeln("relocated to ", arr.ptr, " at ", i);
            ptr = arr.ptr;
        }

        // Let's say our algorithm shrinks the array
        if (i == 50) {
            arr.length = 0;
            // assumeSafeAppend(arr);
        }
    }
}

Although the array has room for 100 elements, the program will print something similar to the following:

relocated to 7F058B02B000 at 51
relocated to 7F058B02C000 at 54
relocated to 7F058B02D000 at 58
relocated to 7F058B02E000 at 62
relocated to 7F058B02F000 at 66
relocated to 7F058B030000 at 74

When it's known that there is no other slice to the old elements, the programmer calls assumeSafeAppend() by uncommenting that line :o). Now there are no relocations. Sweet!

Ali

June 24, 2023
On Saturday, 24 June 2023 at 01:28:03 UTC, Jonathan M Davis wrote:
> On Friday, June 23, 2023 7:02:12 PM MDT Cecil Ward via Digitalmars-d-learn wrote:
>> I just had a fight with LDC over the following code when I tried out reserve. I have an associative array that maps strings to ‘ordinals’ ie uints that are unique, and the compiler hates the call to reserve.
>>
>> ==
>>
>>
>>
>> struct decls_t
>>   {
>>   uint                        n_entries = 0;
>>   uint[ dstring ]     ordinals;   // Associative array maps variable
>> names to ordinals
>>   }
>>
>> static decls_t Decls;
>>
>> enum NPreAllocEntries = 32;
>> Decls.ordinals.reserve( NPreAllocEntries );
>>
>> source>(82): Error: none of the overloads of template
>> `object.reserve` are callable using argument types
>> `!()(uint[dstring], ulong)`
>> /opt/compiler-explorer/ldc1.32.1/ldc2-1.32.1-linux-x86_64/bin/../import/obje
>> ct.d(3983):        Candidate is: `reserve(T)(ref T[] arr, size_t
>> newcapacity)` Compiler returned: 1
>
> Associative arrays and dynamic arrays are completely different things. Associative arrays are hash tables, and reserve really doesn't make sense for them. reserve is for telling the GC to make sure that a dynamic array has at least a specific amount of room to grow into before the GC needs to do a reallocation so that the dynamic array refers to a different memory block with enough memory to hold the data, whereas if and when associative arrays have to reallocate any of their internals is largely implementation-defined.
>
> Any time that you add or remove elements from an AA, it might reallocate some of its internals depending on its current state and what the key of the element is - and that could be different between different compiler releases (though it's unlikely to change very often, since I don't think that the AA implementation gets messed with much).
>
> You can use the rehash function on AAs to tell the GC to try to reorder how it's structured all of its buckets so that lookups are more efficient with the data that's currently in there, and you can call clear to remove all its elements, but in general, you don't do much to manage an AA's memory. It's a much more complicated data structure than an array.
>
> https://dlang.org/spec/hash-map.html
>
> - Jonathan M Davis

Jonathan, is it possible that I wanted one thing and got another? My description in the earlier post was of the _aim_ of the program. What I ended up with might be something else? I wanted an array of uints whose values are the results/outputs of the mapping function. Since it is keyed by strings I assumed that the runtime generates some kind of hash for fast lookup when I ask it to retrieve an entry by the string (key) associated with it. I assumed that in some sense the hashing was sort of separate with some degree of independence from the underlying array, if that makes sense. The lookup is just assumed to be fast but how it is done we don’t really care. I just wanted to expand the array as I did successfully elsewhere with reserve, as I built this structure by successive additions of data. I have a number of strings and the map is meant to output the ordinal number in which I first saw them, zero-based. Then I want to come back and randomly look up one ordinal given a string preferably with a very fast lookup. The number of entries can not practically be more than 30, and even that would be highly unusual, maybe ten is the practical limit in my particular case, so it’s hardly MySQL.
June 24, 2023
On Saturday, 24 June 2023 at 07:36:26 UTC, Cecil Ward wrote:
> Jonathan, is it possible that I wanted one thing and got another? My description in the earlier post was of the _aim_ of the program. What I ended up with might be something else? I wanted an array of uints whose values are the results/outputs of the mapping function. Since it is keyed by strings I assumed that the runtime generates some kind of hash for fast lookup when I ask it to retrieve an entry by the string (key) associated with it. I assumed that in some sense the hashing was sort of separate with some degree of independence from the underlying array, if that makes sense. The lookup is just assumed to be fast but how it is done we don’t really care. I just wanted to expand the array as I did successfully elsewhere with reserve, as I built this structure by successive additions of data. I have a number of strings and the map is meant to output the ordinal number in which I first saw them, zero-based. Then I want to come back and randomly look up one ordinal given a string preferably with a very fast lookup. The number of entries can not practically be more than 30, and even that would be highly unusual, maybe ten is the practical limit in my particular case, so it’s hardly MySQL.

I just realised something: your point about altering the table and having to rehash is well taken. I hadn’t considered that. The reason for my foolishness in failing to realise that I’m asking for the impractical is my pattern of usage. I add all the entries into the mapping table and have no interest in any lookups until it is fully built. Then a second function starts to do lookups while the data remains unchanging, and that usage pattern can be guaranteed. I could even idup it if that would help, as copying < 32 uints wouldn’t take forever. A typical value would be a mere 5 or less. I only picked 32 to be completely safely OTT.
June 24, 2023
On Saturday, June 24, 2023 1:43:53 AM MDT Cecil Ward via Digitalmars-d-learn wrote:
> On Saturday, 24 June 2023 at 07:36:26 UTC, Cecil Ward wrote:
> > Jonathan, is it possible that I wanted one thing and got another? My description in the earlier post was of the _aim_ of the program. What I ended up with might be something else? I wanted an array of uints whose values are the results/outputs of the mapping function. Since it is keyed by strings I assumed that the runtime generates some kind of hash for fast lookup when I ask it to retrieve an entry by the string (key) associated with it. I assumed that in some sense the hashing was sort of separate with some degree of independence from the underlying array, if that makes sense. The lookup is just assumed to be fast but how it is done we don’t really care. I just wanted to expand the array as I did successfully elsewhere with reserve, as I built this structure by successive additions of data. I have a number of strings and the map is meant to output the ordinal number in which I first saw them, zero-based. Then I want to come back and randomly look up one ordinal given a string preferably with a very fast lookup. The number of entries can not practically be more than 30, and even that would be highly unusual, maybe ten is the practical limit in my particular case, so it’s hardly MySQL.
>
> I just realised something, your point about altering the table and having to rehash, is well taken. I hadn’t considered that. The reason for my foolishness in failing to realise that I’m asking the impractical is my pattern of usage. I add all the entries into the mapping table and have no interest in any lookups until it is fully built. Then a second function starts to do lookups while the data remains unchanging and that usage pattern can be guaranteed. I could even idup it if that would help, as copying < 32 uints wouldn’t take forever. A typical value would be a mere 5 or less. I only picked 32 to be completely safely ott.

Well, if the key were a struct or a class, the hashing function would be toHash. For built-in types, the runtime has hashing functions that it uses. Either way, with AAs, you really don't worry about managing the memory, because it's completely outside of your control. You just put the elements in there using their associated keys, and if you want to try to speed it up after you've populated it, you use rehash so that the runtime can try to move the elements around within the container so that lookup speeds will be closer to optimal.
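
For a struct key, that looks roughly like this (a hypothetical Name key; the signatures follow the spec's recommended form):

```d
// A user-defined AA key: the runtime calls toHash and opEquals on it.
struct Name
{
    string text;

    size_t toHash() const @safe pure nothrow
    {
        // Delegate to the runtime's built-in string hasher.
        return hashOf(text);
    }

    bool opEquals(const Name other) const @safe pure nothrow
    {
        return text == other.text;
    }
}

void main()
{
    uint[Name] ordinals;
    ordinals[Name("x")] = 0;
    ordinals[Name("y")] = 1;
    assert(ordinals[Name("y")] == 1);
}
```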

As such, for the most part, when dealing with AAs and worrying about efficiency, the question really becomes whether AAs are the correct solution rather than much of anything having to do with how you manage their memory.

With so few elements, it's also possible that using std.algorithm.searching.find would be faster - e.g. having a dynamic array of strings where the matching int is at the same index in a dynamic array of ints - or you could use std.typecons.Tuple!(string, int)[] with something like arr.find!(a => a[0] == key)() to find the tuple with the int you want.
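
A sketch of that linear-search alternative (names are illustrative):

```d
import std.algorithm.searching : find;
import std.typecons : Tuple, tuple;

void main()
{
    // A handful of (key, ordinal) pairs in a plain dynamic array.
    Tuple!(string, uint)[] table;
    table ~= tuple("first", 0u);
    table ~= tuple("second", 1u);
    table ~= tuple("third", 2u);

    // Linear scan for the matching key; O(n), but n is tiny here.
    auto hit = table.find!(a => a[0] == "second")();
    assert(!hit.empty);
    assert(hit.front[1] == 1);
}
```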

Simply comparing a small number of strings like that might be faster than what goes on with hashing the string and then finding the corresponding element within the AA - or it might not be. You'd have to test that to know. The AA would definitely be faster with a large number of elements, but with a small number of elements, the algorithmic complexity doesn't really matter, and the extra overhead with the AA lookups could actually mean that the search through the dynamic array is faster even though it's O(n). But you can only know which is faster by testing it out with the actual data that you're dealing with.

Regardless, you need to remember that associative arrays are not arrays in the C sense. Rather, they're hash tables, so they function very differently from dynamic arrays, and the rehash function is the closest that you're going to get to affecting how the elements are laid out internally or how much memory the AA is using.

- Jonathan M Davis




June 24, 2023
On Saturday, 24 June 2023 at 12:05:26 UTC, Jonathan M Davis wrote:
> On Saturday, June 24, 2023 1:43:53 AM MDT Cecil Ward via Digitalmars-d-learn wrote:
>> On Saturday, 24 June 2023 at 07:36:26 UTC, Cecil Ward wrote:
>> > [...]
>>
>> I just realised something, your point about altering the table and having to rehash, is well taken. I hadn’t considered that. The reason for my foolishness in failing to realise that I’m asking the impractical is my pattern of usage. I add all the entries into the mapping table and have no interest in any lookups until it is fully built. Then a second function starts to do lookups while the data remains unchanging and that usage pattern can be guaranteed. I could even idup it if that would help, as copying < 32 uints wouldn’t take forever. A typical value would be a mere 5 or less. I only picked 32 to be completely safely ott.
>
> Well, if the key were a struct or a class, the hashing function would be toHash. For built-in types, the runtime has hashing functions that it uses. Either way, with AAs, you really don't worry about managing the memory, because it's completely outside of your control. You just put the elements in there using their associated keys, and if you want to try to speed it up after you've populated it, you use rehash so that the runtime can try to move the elements around within the container so that lookup speeds will be closer to optimal.
>
> As such, for the most part, when dealing with AAs and worrying about efficiency, the question really becomes whether AAs are the correct solution rather than much of anything having to do with how you manage their memory.
>
> With so few elements, it's also possible that using std.algorithm.searching.find would be faster - e.g. having a dynamic array of strings where the matching int is at the same index in a dynamic array of ints - or you could use std.typecons.Tuple!(string, int)[] with something like arr.find!(a => a[0] == key)() to find the tuple with the int you want.
>
> Simply comparing a small number of strings like that might be faster than what goes on with hashing the string and then finding the corresponding element within the AA - or it might not be. You'd have to test that to know. The AA would definitely be faster with a large number of elements, but with a small number of elements, the algorithmic complexity doesn't really matter, and the extra overhead with the AA lookups could actually mean that the search through the dynamic array is faster even though it's O(n). But you can only know which is faster by testing it out with the actual data that you're dealing with.
>
> Regardless, you need to remember that associative arrays are not arrays in the C sense. Rather, they're hash tables, so they function very differently from dynamic arrays, and the rehash function is the closest that you're going to get to affecting how the elements are laid out internally or how much memory the AA is using.
>
> - Jonathan M Davis

I started out looking into a number of runtime library routines, but in the end it seemed quicker to roll my own code for a crude recursive descent parser/lexer that parses part of D’s grammar for expressions, and (again partial grammar) parser for string literal expressions and so on. I find certain special elements and execute actions which involve doing the AA lookup and replacing variable names with ordinal numbers in decimal in the output stream. Admission: The parsing is the thing that has to be fast, even though again the size of the D language text is not likely to be huge at all. But 40 years ago, I came from a world with 2k RAM and 0.9 MHz clock rates so I have developed a habit of always thinking about speed before I do anything, needful or not, to be honest. I once wrote a program that took 35 mins to evaluate 2+2 and print out the answer, so I’m now ashamed of writing slow code. Those were bad days, to be honest. 4 GHz+ and ILP is nicer.
June 24, 2023
On Saturday, June 24, 2023 8:43:00 AM MDT Cecil Ward via Digitalmars-d-learn wrote:
> I started out looking into a number of runtime library routines, but in the end it seemed quicker to roll my own code for a crude recursive descent parser/lexer that parses part of D’s grammar for expressions, and (again partial grammar) parser for string literal expressions and so on. I find certain special elements and execute actions which involve doing the AA lookup and replacing variable names with ordinal numbers in decimal in the output stream. Admission: The parsing is the thing that has to be fast, even though again the size of the D language text is not likely to be huge at all. But 40 years ago, I came from a world with 2k RAM and 0.9 MHz clock rates so I have developed a habit of always thinking about speed before I do anything, needful or not, to be honest. I once wrote a program that took 35 mins to evaluate 2+2 and print out the answer, so I’m now ashamed of writing slow code. Those were bad days, to be honest. 4 GHz+ and ILP is nicer.

Well, dmd is open source (and Boost-licensed, so it doesn't really have any restrictions), so depending on what you're doing, it might make sense to just take code from that (and it's very fast). IIRC, it pulls some fun tricks like replacing identical strings with pointers to the same string so that it can just compare pointers.
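
A toy sketch of that interning trick (the names here are made up, not dmd's actual code): each distinct spelling maps to one canonical copy, so later equality checks can be pointer comparisons.

```d
// Return the canonical copy of s, adding it to the pool on first sight.
string intern(string s, ref string[string] pool)
{
    if (auto canon = s in pool)
        return *canon;
    pool[s] = s;
    return s;
}

void main()
{
    string[string] pool;
    auto a = intern("ident", pool);
    auto b = intern("ident".idup, pool);  // a fresh copy at a new address

    // Both now refer to the same canonical string:
    assert(a.ptr is b.ptr);
    assert(a == b);
}
```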

- Jonathan M Davis




June 24, 2023
On Saturday, 24 June 2023 at 15:12:14 UTC, Jonathan M Davis wrote:
> On Saturday, June 24, 2023 8:43:00 AM MDT Cecil Ward via Digitalmars-d-learn wrote:
>> I started out looking into a number of runtime library routines, but in the end it seemed quicker to roll my own code for a crude recursive descent parser/lexer that parses part of D’s grammar for expressions, and (again partial grammar) parser for string literal expressions and so on. I find certain special elements and execute actions which involve doing the AA lookup and replacing variable names with ordinal numbers in decimal in the output stream. Admission: The parsing is the thing that has to be fast, even though again the size of the D language text is not likely to be huge at all. But 40 years ago, I came from a world with 2k RAM and 0.9 MHz clock rates so I have developed a habit of always thinking about speed before I do anything, needful or not, to be honest. I once wrote a program that took 35 mins to evaluate 2+2 and print out the answer, so I’m now ashamed of writing slow code. Those were bad days, to be honest. 4 GHz+ and ILP is nicer.
>
> Well, dmd is open source (and Boost-licensed, so it doesn't really have any restrictions), so depending on what you're doing, it might make sense to just take code from that (and it's very fast). IIRC, it pulls some fun tricks like replacing identical strings with pointers to the same string so that it can just compare pointers.
>
> - Jonathan M Davis

Yeah, it would take me forever to get my head around that, and I only want a crude toy partial parser for certain portions of the grammar, and the parsing code is done now. It's a hand-written recursive-descent type thing, mainly dealing with things like comments and string literals, which have to be taken into account because they present hazards to naive string searching: comments and e.g. double-quoted strings can contain red herrings, or contain the very things you want to find, depending on circumstances.

I’m trying to get my head round the differences between the OS X tools and those for Linux relating to LDC and GDC, the latter of which seems slightly inferior in some situations. I’m a serious professional asm programmer of old, from before compilers were of usable output quality for git-hard applications. (‘A git’: a disreputable person, colloquial British English. ‘Git-hard’: brain-meltingly hard, like quantum gravity.)
June 24, 2023
On Saturday, 24 June 2023 at 16:42:45 UTC, Cecil Ward wrote:
> On Saturday, 24 June 2023 at 15:12:14 UTC, Jonathan M Davis wrote:
>> [...]
>
> Yeah, it would take me forever to get my head around that, and I only want a crude toy partial parser for certain portions of the grammar, and the parsing code is done now. A hand-written recursive descent type thing mainly dealing with things like comments and literal string that have to be taken account of as they prevent hazards to naive straight string searching for what you want to find, as comments and eg double-quoted strings could have things in them that are red-herrings or the things that you want to find items in, depending on circumstances.
>
> [...]

I read an article about just that strings trick many, many years back, and the author called it ‘a string universe’, which I really liked.