June 25, 2023 Counting an initialised array, and segments | ||||
---|---|---|---|---|
| ||||
I recently had some problems

dchar[] arr = [ ' ', TAB, CR, LF … ];

and I got errors from the compiler which led to me having to count the elements in the initialiser and declare the array with an explicit size. I don’t want the array to be mutable so I later added immutable to it, but that didn’t help matters. At one point, because the array was quite long, I got the arr[ n_elements ] number wrong, it was too small and the remainder of the array was full of 0xffs (or something), which was good, helped me spot the bug.

Is there any way to get the compiler to count the number of elements in the initialiser and set the array to that size ? And it’s immutable. The only reason that I’m giving it a name is that I want the object to be used in several places and I don’t want multiple copies of it in the code/readonly initialised data segment.

Another couple of unrelated questions: is there such a thing as a no-execute initialised readonly data segment? I’m seeing immutables going into the code segment, I think, with x86 LDC at least, can’t remember about GDC. Anyway on x86-64 immutables are addressed as [rip + displ] which is very pleasing as it’s vastly more efficient than accessing statics in TLS which seems to be a nightmare in Linux at least. In MS Windows, isn’t TLS dealt with using FS: ( or GS: ?) prefixes? Shame this doesn’t seem to be exploited in Linux, or am I wrong?

I’d like to deal with the overhead of retrieving the static base address all the time in the Linux situation (if I have got the right end of the stick) but having an ‘application object’ which contains all the statics in a struct in an alloc cell or something, and passing a pointer to this static base app object everywhere seems a nightmare too as it eats a register and worse eats one of the limited number of precious function argument registers which are in short supply in eg x86-64, where there are less than half a dozen argument registers allowed. I realise that one can deal with that limited number by rolling some passed arguments up into a passed struct, but that’s introducing a level of indirection and other overhead, that or just live with the fact that the extra args are going into the stack, which isn’t the worst thing in the world. I wonder what others do about statics in TLS?
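For readers following the TLS part of the question, here is a minimal sketch of which module-level declarations are per-thread in D and which are not. It relies on the language rules that mutable module-level variables are thread-local by default, while `__gshared` and `immutable` data exist once per process. All names (`perThread`, `processWide`, `table`, `Seen`, `observeFromNewThread`) are made up for illustration and do not come from the post.

```d
import core.thread : Thread;

int perThread = 1;              // mutable module-level variable: thread-local (TLS) by default
__gshared int processWide = 1;  // a single copy for the whole process, not in TLS
immutable dchar[4] table = [' ', '\t', '\r', '\n']; // immutable data is implicitly shared, not in TLS

// What a freshly started thread observes.
struct Seen
{
    int tlsValue;
    int sharedValue;
    const(dchar)* tablePtr;
}

Seen observeFromNewThread()
{
    Seen seen;
    auto t = new Thread({
        seen.tlsValue = perThread;      // the new thread's own, freshly initialised copy
        seen.sharedValue = processWide; // the same storage the main thread wrote to
        seen.tablePtr = table.ptr;      // the same address in every thread
    });
    t.start();
    t.join();
    return seen;
}

void main()
{
    perThread = 42;   // changes only the main thread's copy
    processWide = 42; // visible to every thread

    auto seen = observeFromNewThread();
    assert(seen.tlsValue == 1);
    assert(seen.sharedValue == 42);
    assert(seen.tablePtr is table.ptr);
}
```

So an immutable lookup table such as the one in the question should not incur any TLS access cost; only mutable statics without `shared` or `__gshared` do.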
June 26, 2023 Re: Counting an initialised array, and segments | ||||
---|---|---|---|---|
| ||||
Posted in reply to Cecil Ward |

```
import std;

auto arr = [dchar(' '), '\t', 0x0a, 0x10];

void main()
{
    writeln("Hello D: ", typeid(arr));
}
```
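A variant of the same idea with an explicit element type, as a sketch: when the declared type is a slice (`dchar[]`, no number in the brackets), the compiler takes the element count from the initialiser, and for a `static immutable` the length is still available at compile time. The enum names below are hypothetical stand-ins for the original poster's symbolic names, typed as `dchar` rather than `int`. `static immutable` is assumed here instead of `enum` because the question asks for a single named copy of the data.

```d
// Hypothetical stand-ins for the symbolic names in the question, typed as dchar.
enum dchar TAB = '\t';
enum dchar CR  = '\r';
enum dchar LF  = '\n';

// No explicit size: the element count comes from the initialiser.
static immutable dchar[] arr = [' ', TAB, CR, LF];

// The length of an immutable, statically initialised array is known at compile time.
static assert(arr.length == 4);

void main()
{
    import std.stdio : writeln;

    writeln(typeid(arr), " with ", arr.length, " elements");
}
```

Adding or removing an element in the initialiser changes `arr.length` automatically, so there is nothing to count by hand.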
June 26, 2023 Re: Counting an initialised array, and segments | ||||
---|---|---|---|---|
| ||||
Posted in reply to Cecil Ward | On Sunday, June 25, 2023 4:08:19 PM MDT Cecil Ward via Digitalmars-d-learn wrote:
> I recently had some problems
>
> dchar[] arr = [ ' ', TAB, CR, LF … ];
>
> and I got errors from the compiler which led to me having to
> count the elements in the initialiser and declare the array with
> an explicit size. I don’t want the array to be mutable so I later
> added immutable to it, but that didn’t help matters. At one
> point, because the array was quite long, I got the arr[
> n_elements ] number wrong, it was too small and the remainder of
> the array was full of 0xffs (or something), which was good,
> helped me spot the bug.
>
> Is there any way to get the compiler to count the number of elements in the initialiser and set the array to that size ? And it’s immutable.
Without seeing the errors, I can't really say what the problem was, but most character literals are going to be char, not dchar, so you may have had issues related to the type that the compiler was inferring for the array literal. I don't recall at the moment how exactly the compiler decides the type of an array literal when it's given values of differing types for the elements.
Either way, if you want a static array, and you don't want to have to count the number of elements, then https://dlang.org/phobos/std_array.html#staticArray should take care of that problem.
- Jonathan M Davis
|
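A short sketch of the staticArray suggestion, assuming every element is already a dchar so that the literal's element type is unambiguous. The enum names and the `whitespaceTable` helper are hypothetical, introduced only for this example.

```d
import std.array : staticArray;

// Hypothetical symbolic names, typed as dchar.
enum dchar TAB = '\t';
enum dchar CR  = '\r';
enum dchar LF  = '\n';

// staticArray infers the length from the literal, so the result is dchar[4]
// without the 4 being written anywhere.
auto whitespaceTable()
{
    return [dchar(' '), TAB, CR, LF].staticArray;
}

void main()
{
    immutable table = whitespaceTable();

    // The size is part of the type and was never counted by hand.
    static assert(is(typeof(table) == immutable(dchar[4])));
    static assert(table.length == 4);
    assert(table[1] == TAB);
}
```

Unlike a slice, the result here is a true fixed-size array, which is what a declaration with an explicit size would have produced.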
June 26, 2023 Re: Counting an initialised array, and segments | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jonathan M Davis | On Monday, 26 June 2023 at 08:26:31 UTC, Jonathan M Davis wrote:
> On Sunday, June 25, 2023 4:08:19 PM MDT Cecil Ward via Digitalmars-d-learn wrote:
>> I recently had some problems
>>
>> dchar[] arr = [ ‘ ‘, TAB, CR, LF … ];
>>
>> and I got errors from the compiler which led to me having to
>> count the elements in the initialiser and declare the array with
>> an explicit size. I don’t want the array to be mutable so I later
>> added immutable to it, but that didn’t help matters. At one
>> point, because the array was quite long, I got the arr[
>> n_elements ] number wrong, it was too small and the remainder of
>> the array was full of 0xffs (or something), which was good,
>> helped me spot the bug.
>>
>> Is there any way to get the compiler to count the number of elements in the initialiser and set the array to that size ? And it’s immutable.
>
> Without seeing the errors, I can't really say what the problem was, but most character literals are going to be char, not dchar, so you may have had issues related to the type that the compiler was inferring for the array literal. I don't recall at the moment how exactly the compiler decides the type of an array literal when it's given values of differing types for the elements.
>
> Either way, if you want a static array, and you don't want to have to count the number of elements, then https://dlang.org/phobos/std_array.html#staticArray should take care of that problem.
>
> - Jonathan M Davis
Where I used symbolic names, such as TAB, that was defined as an int (or uint)
enum TAB = 9;
or
enum uint TAB = 9;
I forget which. So I had at least one item that was typed something wider than a char.
I tried the usual sizeof( arr )/ sizeof dchar, compiler wouldn’t have that for some reason, and yes I know it should be D syntax, god how I long for C sizeof()!
|
June 26, 2023 Re: Counting an initialised array, and segments | ||||
---|---|---|---|---|
| ||||
Posted in reply to Cecil Ward | On Monday, June 26, 2023 5:08:06 AM MDT Cecil Ward via Digitalmars-d-learn wrote:
> On Monday, 26 June 2023 at 08:26:31 UTC, Jonathan M Davis wrote:
> > On Sunday, June 25, 2023 4:08:19 PM MDT Cecil Ward via
> >
> > Digitalmars-d-learn wrote:
> >> I recently had some problems
> >>
> >> dchar[] arr = [ ‘ ‘, TAB, CR, LF … ];
> >>
> >> and I got errors from the compiler which led to me having to
> >> count the elements in the initialiser and declare the array
> >> with
> >> an explicit size. I don’t want the array to be mutable so I
> >> later
> >> added immutable to it, but that didn’t help matters. At one
> >> point, because the array was quite long, I got the arr[
> >> n_elements ] number wrong, it was too small and the remainder
> >> of
> >> the array was full of 0xffs (or something), which was good,
> >> helped me spot the bug.
> >>
> >> Is there any way to get the compiler to count the number of elements in the initialiser and set the array to that size ? And it’s immutable.
> >
> > Without seeing the errors, I can't really say what the problem was, but most character literals are going to be char, not dchar, so you may have had issues related to the type that the compiler was inferring for the array literal. I don't recall at the moment how exactly the compiler decides the type of an array literal when it's given values of differing types for the elements.
> >
> > Either way, if you want a static array, and you don't want to have to count the number of elements, then https://dlang.org/phobos/std_array.html#staticArray should take care of that problem.
> >
> > - Jonathan M Davis
>
> Where I used symbolic names, such as TAB, that was defined as an
> int (or uint)
> enum TAB = 9;
> or
> enum uint TAB = 9;
> I forget which. So I had at least one item that was typed
> something wider than a char.
>
> I tried the usual sizeof( arr )/ sizeof dchar, compiler wouldn’t
> have that for some reason, and yes I know it should be D syntax,
> god how I long for C sizeof()!
sizeof is a property in D. So, you can do char.sizeof or varName.sizeof. But regardless, there really is no reason to use sizeof with D arrays under normal circumstances. And in the case of dynamic arrays, sizeof will give you the size of the dynamic array itself, not the slice of memory that it refers to. You're essentially using sizeof on
struct DynamicArray(T)
{
    size_t length;
    T* ptr;
}
which is not going to tell you anything about the memory it points to. The length property of an array already tells you the length of the array (be it static or dynamic), so using sizeof like you're talking about really does not apply to D.
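To make the difference concrete, a small sketch comparing sizeof and length on a dynamic and a static array. The variable names are made up for illustration.

```d
import std.stdio : writeln;

void main()
{
    // Illustrative variables, not from the original post.
    dchar[] dynamic = [' ', '\t', '\r', '\n'];
    dchar[4] fixed  = [' ', '\t', '\r', '\n'];

    // sizeof on a dynamic array measures the (length, ptr) pair,
    // not the elements it refers to.
    static assert(dynamic.sizeof == size_t.sizeof + (dchar*).sizeof);

    // sizeof on a static array does cover the elements themselves.
    static assert(fixed.sizeof == 4 * dchar.sizeof);

    // length is the element count in both cases.
    assert(dynamic.length == 4);
    static assert(fixed.length == 4);

    writeln(dynamic.sizeof, " ", fixed.sizeof, " ", dynamic.length);
}
```

In other words, the C idiom of dividing one sizeof by another is replaced by `arr.length`.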
And I wouldn't advise using uint for a character in D. That's what char, wchar, and dchar are for. Depending on the circumstances, you get implicit conversions between character and integer types, but they are distinct types, and mixing and matching them willy-nilly could result in compilation errors depending on what your code is doing.
- Jonathan M Davis
|
June 26, 2023 Re: Counting an initialised array, and segments | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jonathan M Davis | On Monday, 26 June 2023 at 12:28:15 UTC, Jonathan M Davis wrote:
> On Monday, June 26, 2023 5:08:06 AM MDT Cecil Ward via Digitalmars-d-learn wrote:
>> [...]
>
> [...]
No, point taken, a sloppy example. I don’t in fact do that in the real code. I use dchar everywhere appropriate instead of uint. In fact I have aliases for dstring and dchar, and I successfully did an alternative build with the aliases changed to use 16-bit wchar / wstring instead of 32-bit, and all was well, just to test that the code is code-word-size-independent.
I would need to do something different though if I ever decided to change to 16-bit code words in memory, because I would still want to manipulate 32-bit values for char code points when they are being handled in registers, for efficiency as well as code correctness, as 16-bit ‘partial words’ are bad news for performance on x86-64. I perhaps ought to introduce a new alias called codepoint, which is always 32 bits, to distinguish dchar in registers from words in memory.
It turns out that I can get away with not caring about UTF-16, as I’m merely _scanning_ a string. I couldn’t ever get away with changing the in-memory code word type to 8-bit chars and then using UTF-8 though, as I do occasionally deal with non-ASCII characters, and I would have to either preconvert the UTF-8 to do the decoding, or parse 8-bit code words and handle the decoding myself on the fly, which would be madness. If I have to handle UTF-8 data I will just preconvert it.
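Roughly what that alias setup might look like, as a sketch only: `CodeUnit`, `Text` and `countWhitespace` are hypothetical names, not the ones in my real code. The scan looks only for ASCII characters, which is why it does not care whether the in-memory code unit is wchar or dchar, and UTF-8 input is preconverted with `std.conv.to`.

```d
import std.conv : to;

// Hypothetical aliases: change dchar to wchar here for a UTF-16 build.
alias CodeUnit = dchar;
alias Text = immutable(CodeUnit)[]; // the same type as dstring while CodeUnit is dchar

// Counts ASCII whitespace by scanning code units directly, with no decoding.
size_t countWhitespace(Text text)
{
    size_t n = 0;
    foreach (CodeUnit c; text)
    {
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n')
            ++n;
    }
    return n;
}

void main()
{
    // UTF-8 input containing one non-ASCII character (U+00E9).
    string utf8 = "a\tb \u00E9\n";

    // Preconvert once, then scan the wider code units.
    Text text = utf8.to!Text;
    assert(countWhitespace(text) == 3);
}
```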
|
June 26, 2023 Re: Counting an initialised array, and segments | ||||
---|---|---|---|---|
| ||||
Posted in reply to Cecil Ward | On Monday, June 26, 2023 1:09:24 PM MDT Cecil Ward via Digitalmars-d-learn wrote:
> No, point taken, a sloppy example. I don’t in fact do that in the
> real code. I use dchar everywhere appropriate instead of uint. In
> fact I have aliases for dstring and dchar and successfully did an
> alternative build with the aliases renamed to use 16-bits wchar /
> w string instead of 32-bits and rebuilt and all was well, just to
> test that it is code word size-independent. I would need to do
> something different though if I ever decided to change to use
> 16-bit code words in memory because I would still be wanting to
> manipulate 32-bit values for char code points when they are being
> handled in registers, for efficiency too as well as code
> correctness, as 16-bit ‘partial words’ are bad news for
> performance on x86-64. I perhaps ought to introduce a new alias
> called codepoint, which is always 32-bits, to distinguish dchar
> in registers from words in memory. It turns out that I can get
> away with not caring about utf16, as I’m merely _scanning_ a
> string. I couldn’t ever get away with changing the in-memory code
> word type to be 8-bit chars, and then using utf8 though, as I do
> occasionally deal with non-ASCII characters, and I would have to
> either preconvert the Utf8 to do the decoding, or parse 8-bit
> code words and handle the decoding myself on the fly which would
> be madness. If I have to handle utf8 data I will just preconvert
> it.
Well, I can't really comment on the details of what you're doing, since I don't know them, but I would point out that a dchar is a code point by definition. That is its purpose. char is a UTF-8 code unit, wchar is a UTF-16 code unit, and dchar is both a UTF-32 code unit and a code point, since UTF-32 code units are code points by definition. It is possible for a dchar to be an invalid code point if you give it bad data, but code points are 32-bit, and dchar is intended to represent that. Actual characters, of course, can be multiple code points, annoyingly enough, so all of that Unicode stuff is an annoyingly complicated mess, but D and Phobos do have a pretty good set of primitives for handling code units and code points without programmers needing to come up with their own types for those.
The primary mistake in what D has is that strings are all ranges of dchar with the code units automatically being decoded to dchar by front, popFront, etc. (at the time, Andrei thought that that would ensure correctness, since he didn't understand that you could have characters that were multiple code points). We'd like to get rid of that, but it's difficult to do so without breaking code. std.utf.byCodeUnit helps work around that, and of course, you can do so by simply operating on the strings as arrays without using the range primitives, but the range primitives do decode to dchar, unfortunately. However, in spite of that quirk, the tools are there to operate on Unicode correctly in a way that doesn't exist out of the box with many languages. So, in general, you shouldn't need to be creating new types for Unicode primitives. The language already has that.
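A small sketch of the difference between code units and code points, and of what the range primitives do to a string. The sample text is arbitrary; U+00E9 is written as an escape so the example does not depend on the source file's encoding.

```d
import std.range : ElementType, walkLength;
import std.utf : byCodeUnit;

void main()
{
    // One non-ASCII code point (U+00E9) among four ASCII ones.
    string  s = "h\u00E9llo";
    wstring w = "h\u00E9llo"w;
    dstring d = "h\u00E9llo"d;

    // As arrays, length counts code units.
    assert(s.length == 6); // U+00E9 takes two UTF-8 code units
    assert(w.length == 5);
    assert(d.length == 5);

    // As a range, a string is decoded to dchar by front and popFront.
    static assert(is(ElementType!string == dchar));
    assert(s.walkLength == 5); // counts code points

    // byCodeUnit turns the decoding off and iterates the char code units.
    assert(s.byCodeUnit.walkLength == 6);
}
```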
- Jonathan M Davis
|
June 26, 2023 Re: Counting an initialised array, and segments | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jonathan M Davis | On Monday, 26 June 2023 at 22:19:25 UTC, Jonathan M Davis wrote:
> On Monday, June 26, 2023 1:09:24 PM MDT Cecil Ward via Digitalmars-d-learn wrote:
>> [...]
>
> [...]
I completely agree with everything you said. I merely used aliases to give me the freedom to switch between having text in either UTF-16 or UTF-32 in memory, and to see how the performance changes. That’s the only reason for me doing that. I also want to keep a clear distinction between words in memory and code points in registers.
|