February 08, 2018 Re: missing HexString documentation | ||||
---|---|---|---|---|
| ||||
Posted in reply to H. S. Teoh | On Thursday, 8 February 2018 at 17:06:55 UTC, H. S. Teoh wrote: > On Thu, Feb 08, 2018 at 08:26:03AM -0500, Steven Schveighoffer via Digitalmars-d wrote: [...] >> The extra data in the object file comes from the inclusion of the hexStringImpl function, and from the template parameter (the symbol _D3std4conv__T9hexStringVAyaa8_6465616462656566ZQBiyAa is in there as well, which will always be larger than the actual string passed to hexString). > [...] > > This is one area that really should be improved. Is there some easy way in the compiler to mark a template function as "only used in CTFE", and not emit it into the object file if there are no other runtime references to it? I'm thinking of some kind of boolean attribute that defaults to false, and gets set if the function is referenced by runtime code. During codegen, any function that doesn't have this attribute set will be skipped over. > > My speculation is that this would lead to a good amount of reduction in template bloat, given how pervasively CTFE is used in Phobos (and idiomatic D in general). Or maybe you can get away with just using a good compiler/linker that supports LTO. It's quite mature in GCC now, so it's probably worth trying with GDC. http://hubicka.blogspot.ca/2014/04/linktime-optimization-in-gcc-1-brief.html |
February 08, 2018 Re: missing HexString documentation | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On 2/8/2018 7:07 AM, Steven Schveighoffer wrote:
> My concern in the hexString case is the sheer requirement of CTFE for something that is so easy to do in the compiler, already *done* in the compiler, and has another form specifically for hex strings (the "\xde\xad\xbe\xef" form) that isn't going away. It makes me laugh actually that Walter is now replacing the implementation with a mixin of that other form, incurring all the cost of CTFE so you can transform the string, while breaking existing code in the process: https://github.com/dlang/phobos/pull/6138
The breakage was due to the original implementation of hexString not producing a string literal like "abc", but producing an array literal like ['a', 'b', 'c'], which was not what the documentation said it did. And naturally, some uses wound up relying on the array behavior.
What the PR does is fix hexString so that hexString!"deadbeef" rewrites it to the string literal "\xde\xad\xbe\xef". It's classic "lowering". Isn't it amazing that D can even do this?
Simplifying the compiler and pushing things off into the library makes the compiler and spec smaller and less buggy. It also has the nice feature of providing a simple path for anyone who wants to write their own custom string syntax, such as EBCDIC string literals (!).
|
February 08, 2018 Re: missing HexString documentation | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On 2/8/2018 5:26 AM, Steven Schveighoffer wrote: > The extra data in the object file comes from the inclusion of the hexStringImpl function, and from the template parameter (the symbol _D3std4conv__T9hexStringVAyaa8_6465616462656566ZQBiyAa is in there as well, which will always be larger than the actual string passed to hexString). > > I also see the data in there twice for some reason. This is no longer the case with the PR. import std.conv; void test() { __gshared immutable char[4] s = hexString!"deadbeef"; } produces the following, with no sign of the template and the data is there only once: _TEXT segment dword use32 public 'CODE' ;size is 0 _TEXT ends _DATA segment para use32 public 'DATA' ;size is 4 _DATA ends CONST segment para use32 public 'CONST' ;size is 14 CONST ends _BSS segment para use32 public 'BSS' ;size is 0 _BSS ends FLAT group extrn _D5test24testFZv public _D5test24testFZ1syG4a FMB segment dword use32 public 'DATA' ;size is 0 FMB ends FM segment dword use32 public 'DATA' ;size is 4 FM ends FME segment dword use32 public 'DATA' ;size is 0 FME ends public _D5test212__ModuleInfoZ _D5test24testFZv COMDAT flags=x0 attr=x0 align=x0 _TEXT segment assume CS:_TEXT _TEXT ends _DATA segment _D5test24testFZ1syG4a: db 0ffffffdeh,0ffffffadh,0ffffffbeh,0ffffffefh ;.... _DATA ends CONST segment _D5test212__ModuleInfoZ: db 004h,010h,000h,000h,000h,000h,000h,000h ;........ db 074h,065h,073h,074h,032h,000h ;test2. CONST ends _BSS segment _BSS ends FMB segment FMB ends FM segment dd offset FLAT:_D5test212__ModuleInfoZ FM ends FME segment FME ends _D5test24testFZv comdat assume CS:_D5test24testFZv ret _D5test24testFZv ends end |
February 08, 2018 Re: missing HexString documentation | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On Thursday, 8 February 2018 at 18:31:06 UTC, Walter Bright wrote:
> On 2/8/2018 5:26 AM, Steven Schveighoffer wrote:
>> The extra data in the object file comes from the inclusion of the hexStringImpl function, and from the template parameter (the symbol _D3std4conv__T9hexStringVAyaa8_6465616462656566ZQBiyAa is in there as well, which will always be larger than the actual string passed to hexString).
>>
>> I also see the data in there twice for some reason.
>
> This is no longer the case with the PR.
>
> import std.conv;
>
> void test() {
> __gshared immutable char[4] s = hexString!"deadbeef";
> }
>
> produces the following, with no sign of the template and the data is there only once:
>
> _DATA segment
> _D5test24testFZ1syG4a:
> db 0ffffffdeh,0ffffffadh,0ffffffbeh,0ffffffefh ;....
> _DATA ends
But it looks like they are all dchar, so 4x the space vs x"deadbeef"?
|
February 08, 2018 Re: missing HexString documentation | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On 2/8/18 1:25 PM, Walter Bright wrote: > On 2/8/2018 7:07 AM, Steven Schveighoffer wrote: >> My concern in the hexString case is the sheer requirement of CTFE for something that is so easy to do in the compiler, already *done* in the compiler, and has another form specifically for hex strings (the "\xde\xad\xbe\xef" form) that isn't going away. It makes me laugh actually that Walter is now replacing the implementation with a mixin of that other form, incurring all the cost of CTFE so you can transform the string, while breaking existing code in the process: https://github.com/dlang/phobos/pull/6138 > > The breakage was due to the original implementation of hexString not producing a string literal like "abc", but producing an array literal like ['a', 'b', 'c'], which was not what the documentation said it did. And naturally, some uses wound up relying on the array behavior. "abc" is an array (it's an immutable(char)[]). There's no reason why ['a','b','c'] should be different than "abc" (other than the hidden null character, which is irrelevant here). Perhaps the fact that using a string rather than an array causes code to fail should be addressed? > > What the PR does is fix hexString so that hexString!"deadbeef" rewrites it to the string literal "\xde\xad\xbe\xef". It's classic "lowering". Isn't it amazing that D can even do this? It's great that D has this power, and would be really useful if D's language didn't already have a way to do this in a builtin way. > Simplifying the compiler and pushing things off into the library makes the compiler and spec smaller and less buggy. It also has the nice feature of providing a simple path for anyone who wants to write their own custom string syntax, such as EBCDIC string literals (!). How can this be a huge simplification? I mean you already have code that parses hex characters in a string array, all you need is one flag that assumes all character pairs have been preceded by \x. I think this will save probably 4 or 5 lines of code? It also doesn't preclude at all someone writing library code to make their own custom string syntax. -Steve |
February 08, 2018 Re: missing HexString documentation | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ralph Doncaster | On 2/8/18 1:42 PM, Ralph Doncaster wrote:
> On Thursday, 8 February 2018 at 18:31:06 UTC, Walter Bright wrote:
>> On 2/8/2018 5:26 AM, Steven Schveighoffer wrote:
>>> The extra data in the object file comes from the inclusion of the hexStringImpl function, and from the template parameter (the symbol _D3std4conv__T9hexStringVAyaa8_6465616462656566ZQBiyAa is in there as well, which will always be larger than the actual string passed to hexString).
>>>
>>> I also see the data in there twice for some reason.
>>
>> This is no longer the case with the PR.
>>
>> import std.conv;
>>
>> void test() {
>> __gshared immutable char[4] s = hexString!"deadbeef";
>> }
>>
>> produces the following, with no sign of the template and the data is there only once:
>>
>> _DATA segment
>> _D5test24testFZ1syG4a:
>> db 0ffffffdeh,0ffffffadh,0ffffffbeh,0ffffffefh ;....
>> _DATA ends
>
> But it looks like they are all dchar, so 4x the space vs x"deadbeef"?
I was looking at that too when I was testing the differences, but actually, it's the same when you use x"deadbeef".
I wonder if it's an issue with how obj2asm prints it out? Surely, that data array must be contiguous, and they must be bytes. Otherwise the resulting code would be wrong.
-Steve
|
February 08, 2018 Re: missing HexString documentation | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On Thursday, 8 February 2018 at 18:49:51 UTC, Steven Schveighoffer wrote:
> I wonder if it's an issue with how obj2asm prints it out? Surely, that data array must be contiguous, and they must be bytes. Otherwise the resulting code would be wrong.
OK. I didn't even know about obj2asm until you mentioned it. objdump seems to work perfectly fine on the .o's that dmd generates, and I can tell that x"deadbeef" generates 4 contiguous bytes (objdump -D):
Disassembly of section .rodata.str1.1:
0000000000000000 <_TMP0>:
0: de .byte 0xde
1: ad lods %ds:(%rsi),%eax
2: be .byte 0xbe
3: ef out %eax,(%dx)
...
|
February 08, 2018 Re: missing HexString documentation | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On 2/8/2018 10:49 AM, Steven Schveighoffer wrote: > On 2/8/18 1:42 PM, Ralph Doncaster wrote: >> On Thursday, 8 February 2018 at 18:31:06 UTC, Walter Bright wrote: >>> db 0ffffffdeh,0ffffffadh,0ffffffbeh,0ffffffefh ;.... >> But it looks like they are all dchar, so 4x the space vs x"deadbeef"? The 'db' means 'define byte'. dw for words, dd for 32 bit words. > I was looking at that too when I was testing the differences, but actually, it's the same when you use x"deadbeef". Yes. > I wonder if it's an issue with how obj2asm prints it out? Surely, that data array must be contiguous, and they must be bytes. Otherwise the resulting code would be wrong. Yes. I just never bothered to fix it. |
February 08, 2018 Re: missing HexString documentation | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On 2/8/2018 10:42 AM, Steven Schveighoffer wrote: > On 2/8/18 1:25 PM, Walter Bright wrote: > "abc" is an array (it's an immutable(char)[]). There's no reason why ['a','b','c'] should be different than "abc" (other than the hidden null character, which is irrelevant here). ['a','b','c'] is mutable, a string literal is immutable. > Perhaps the fact that using a string rather than an array causes code to fail should be addressed? That would be a language change proposal or bug report. By all means, please do so. > How can this be a huge simplification? I mean you already have code that parses hex characters in a string array, all you need is one flag that assumes all character pairs have been preceded by \x. I think this will save probably 4 or 5 lines of code? hexStringConstant() was 79 lines of code, not including comments and blank lines. I also showed how: x"deadbeef" can be replaced with: hexString!"deadbeef" with no overhead. If you hate typing hexString, you can always write: alias x = hexstring; and then you've got: x"deadbeef" x!"deadbeef" which seems an inconsequential difference. (The generated code is the same.) > It also doesn't preclude at all someone writing library code to make their own custom string syntax. You're right it doesn't. But people don't do it, because it is neither obvious that D can do such a thing (it relies on a combination of features) nor is it obvious how to do it correctly (as the earlier hexString implementation shows and nobody seemed able to fix it but me). What Phobos provides is working, professional quality code that should serve as a user resource for "how to do things and how to do them right". I.e. having hexString as a library function is a good advertisement for what D can do. After all, how many languages can do this sort of thing? |
February 11, 2018 Re: missing HexString documentation | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On 2/8/18 3:49 PM, Walter Bright wrote: > On 2/8/2018 10:42 AM, Steven Schveighoffer wrote: >> On 2/8/18 1:25 PM, Walter Bright wrote: >> "abc" is an array (it's an immutable(char)[]). There's no reason why ['a','b','c'] should be different than "abc" (other than the hidden null character, which is irrelevant here). > > ['a','b','c'] is mutable, a string literal is immutable. OK. alias IC = immutable char; ubyte[3] x = [IC('a'), IC('b'), IC('c')]; works just fine. > > >> Perhaps the fact that using a string rather than an array causes code to fail should be addressed? > > That would be a language change proposal or bug report. By all means, please do so. https://issues.dlang.org/show_bug.cgi?id=18420 >> How can this be a huge simplification? I mean you already have code that parses hex characters in a string array, all you need is one flag that assumes all character pairs have been preceded by \x. I think this will save probably 4 or 5 lines of code? > > hexStringConstant() was 79 lines of code, not including comments and blank lines. My mistake, I assumed the code to parse hex digits would be reused between both string parsing with \x escapes and the hex string parser. I also notice that hex strings are not simply equivalent to strings with \x in them -- the latter is more constrained, as it must be a pair of hex digits per \x. hex strings allow spaces between them. > I also showed how: > > x"deadbeef" > > can be replaced with: > > hexString!"deadbeef" > > with no overhead. I wouldn't call invoking CTFE "no overhead" I tested it out, and generating a hex string of about 600 bytes took 3x as long as using builtin hex strings. > If you hate typing hexString, you can always write: > > alias x = hexstring; > > and then you've got: > > x"deadbeef" > x!"deadbeef" > > which seems an inconsequential difference. (The generated code is the same.) Again, this is about the compile time penalty. >> It also doesn't preclude at all someone writing library code to make their own custom string syntax. > > You're right it doesn't. But people don't do it, because it is neither obvious that D can do such a thing (it relies on a combination of features) This isn't really about having hexString in phobos, I think it's fine to have it, even if it's redundant, since it can be more customized than a builtin language feature. All I was saying is that the language feature and the library function are not mutually exclusive. > nor is it obvious how to do it correctly (as the earlier hexString implementation shows and nobody seemed able to fix it but me). Well, nobody asked :) Besides, it's still not "fixed", as it has the same poor performance as the previous version. And the new version has broken existing code. What the update shows is that you have to jump through incredible hoops to get the compiler not to include your compile-time only generation code in the resulting binary. > What Phobos provides is working, professional quality code that should serve as a user resource for "how to do things and how to do them right". It worked before, pretty much at the same performance as it does now, the mitigating features (using string literals instead of array literals, splitting the implementation into hand written functions to avoid the D template penalty) are a good demonstration at how much work we have to do still on the CTFE front. > I.e. having hexString as a library function is a good advertisement for what D can do. After all, how many languages can do this sort of thing? And nothing has changed here, it's still a library function, as it was before. I agree, it's great to have the ability to do library functions that can do compiler features. But if you already have the compiler feature, I don't see why we should remove it because a slower library version exists. If we did not have the feature in the language, and we were talking about adding it, I'd totally be on the other side. In fact, it's a motivating factor to make CTFE code compile faster as it takes away arguments of adding more things to the compiler. -Steve |
Copyright © 1999-2021 by the D Language Foundation