4-character literal (page 2)

Rick Mann wrote:
> torhu Wrote:
>
>> Try this.
>>
>> template MAKE_ID(char[] s)
>> {
>>     static assert(s.length == 4);
>>     const uint ID = (s[0] << 24) | (s[1] << 16) | (s[2] << 8) | s[3];
>> }
>>
>> enum : uint
>> {
>>     kSomeConstantValue = MAKE_ID!("abcd")
>> }
>
> Of the solutions proposed so far, this is probably the cleanest. Thanks!
>
> Sadly, nothing's really as nice as just saying 'abcd'. What would it take to get multi-character literals added to the language?

If there is a benevolent force watching over us, there will never be four-character literals in D X_X

- Gregor Richards
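For readers trying the quoted idea with a current D compiler, here is a minimal sketch that uses an ordinary function evaluated at compile time instead of a template. The name `makeId` is hypothetical and not from the thread; the byte order (first character in the most significant byte) follows the shifts in the quoted template.

```d
// Sketch only: `makeId` is an illustrative name, not part of the thread.
// Packs a 4-character string into a uint, with the first character in the
// most significant byte, as in the quoted MAKE_ID template.
uint makeId(string s) pure nothrow @nogc @safe
{
    assert(s.length == 4, "four-character code must be exactly 4 bytes");
    return (cast(uint) s[0] << 24) | (cast(uint) s[1] << 16)
         | (cast(uint) s[2] << 8)  |  cast(uint) s[3];
}

enum : uint
{
    // An enum initializer forces compile-time evaluation of makeId.
    kSomeConstantValue = makeId("abcd")
}

// Checked at compile time: 'a' 'b' 'c' 'd' are 0x61 0x62 0x63 0x64.
static assert(kSomeConstantValue == 0x61626364);

unittest
{
    assert(makeId("ABCD") == 0x41424344);
}
```

Because the enum initializer must be known at compile time, a string of the wrong length is rejected during compilation rather than at run time.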

Rick Mann wrote:
> Any suggestions that don't involve significant re-writing of the 4-character literals? Thanks!

Nope, I've rewritten mine as hex. At least it doesn't involve template voodoo.

i.e. the filter that does the initial C to D translation also converts the 4-char constants to 32-bit constants. Since D is not source code compatible anyway, one might as well convert it once and for all IMHO.

Ditto goes for those pesky assert macros strewn out all over the place*. Yes I am talking to you, /usr/include/AssertMacros.h. "2002's finest" (see "Living In an Exceptional World by Sean Parent" for the details: http://developer.apple.com/dev/techsupport/develop/issue11toc.shtml)

Reinventing the preprocessor using templates is not my own favorite...

--anders

* If you see "require_noerr", that's the ONERRORGOTO I'm talking about. I just expanded it to the macro definition rather than leaving it in.

Gregor Richards wrote:
>> Sadly, nothing's really as nice as just saying 'abcd'. What would it take to get multi-character literals added to the language?
>
> If there is a benevolent force watching over us, there will never be four-character literals in D X_X

Which is why it is much easier to write the GUI interface code in C++ with an extern "C" interface, than porting it over to D.

--anders

Rick Mann wrote:
> Sadly, nothing's really as nice as just saying 'abcd'. What would it take to get multi-character literals added to the language?

In the process of learning to scan C (for a compiler theory class), I first heard about those. Seems not-too-useful. If you want a number, input the number; if you want a Unicode character, enter L'那' (or whatever the D equivalent is). Entering numbers in base 256 is asking for trouble, especially with UTF-8 source.

--Joel

Joel C. Salomon Wrote:
> Rick Mann wrote:
> > Sadly, nothing's really as nice as just saying 'abcd'. What would it take to get multi-character literals added to the language?
>
> In the process of learning to scan C (for a compiler theory class), I first heard about those. Seems not-too-useful. If you want a number, input the number; if you want a Unicode character, enter L'那' (or whatever the D equivalent is). Entering numbers in base 256 is asking for trouble, especially with UTF-8 source.

I gotta say, I think they're very useful. Multibyte-character issues aside, it can be a lot handier to see 'abcd' in a debugger than '61626364'. And handling multibyte characters isn't that big a deal...just include all the bytes. If the integer interpretation is 4 bytes, treat it as an uint. If it's more, treat it as a ulong, and issue appropriate warnings/errors when assigning. Pad the values out with zeros. I'm not sure I understand the resistance to the notion.

January 26, 2007

Re: 4-character literal

Posted by Bill Baxter
in reply to Rick Mann

Rick Mann wrote:
> Joel C. Salomon Wrote:
> 
>> Rick Mann wrote:
>>> Sadly, nothing's really as nice as just saying 'abcd'. What would it take to get multi-character literals added to the language?
>> In the process of learning to scan C (for a compiler theory class), I first heard about those.  Seems not-too-useful.  If you want a number, input the number; if you want a Unicode character, enter L'那' (or whatever the D equivalent is).  Entering numbers in base 256 is asking for trouble, especially with UTF-8 source.
> 
> I gotta say, I think they're very useful. Multibyte-character issues aside, it can be a lot handier to see 'abcd' in a debugger than '61626364'. And handling multibyte characters isn't that big a deal...just include all the bytes. If the integer interpretation is 4 bytes, treat it as an uint. If it's more, treat it as a ulong, and issue appropriate warnings/errors when assigning. Pad the values out with zeros. I'm not sure I understand the resistance to the notion.
> 

Probably it's just that most folks rarely ever have a need for such a thing.  And in the rare case that we do, the template solution doesn't seem so bad.

Besides isn't the value of a multi-character literal going to be dependent on the endianness of the machine you're running on?
So you're probably going to want to use it inside a version(LittleEndian) {} else {} construct anyway. Might as well tuck that away inside the MAKE_ID template.

--bb

Rick Mann wrote:
> Joel C. Salomon Wrote:
>> Entering numbers in base 256 is asking for trouble, especially with UTF-8 source.
>
> I gotta say, I think they're very useful. Multibyte-character issues aside, it can be a lot handier to see 'abcd' in a debugger than '61626364'. And handling multibyte characters isn't that big a deal...just include all the bytes. If the integer interpretation is 4 bytes, treat it as an uint. If it's more, treat it as a ulong, and issue appropriate warnings/errors when assigning. Pad the values out with zeros. I'm not sure I understand the resistance to the notion.

Generally, though, arbitrary four bytes with the high bit set will constitute an invalid UTF-8 sequence. Assuming you have the number 61626364 in an int32 somewhere, will 'abcd' really tell you something you wanted to know about the number? (Unless you’re dereferencing a char* cast to int*, in which case you deserve all the hassle the debugger can throw at you. ☺)

--Joel

Bill Baxter wrote:
> Besides isn't the value of a multi-character literal going to be dependent on the endianness of the machine you're running on?

Not really, it just gets flipped when _stored_ on a LittleEndian...
i.e. the char const 'ABCD' is the same as the hex const 0x41424344

In arch i386 this would read 44434241 but in arch ppc it's 41424344.
That is, if you were to store it somewhere or look at the objectfile.

--anders
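The distinction drawn above, same integer value but different byte layout in memory, can be sketched with Phobos' `std.bitmanip`. The constant name `code` is illustrative, and 0x41424344 is simply the value quoted in the post.

```d
import std.bitmanip : nativeToBigEndian, nativeToLittleEndian;

unittest
{
    // The integer value is the same on every host.
    enum uint code = 0x41424344;

    // What differs is the byte sequence produced when the value is stored.
    ubyte[4] asBig    = nativeToBigEndian(code);    // layout on e.g. ppc
    ubyte[4] asLittle = nativeToLittleEndian(code); // layout on e.g. i386

    immutable ubyte[4] abcd = [0x41, 0x42, 0x43, 0x44]; // "ABCD"
    immutable ubyte[4] dcba = [0x44, 0x43, 0x42, 0x41]; // "DCBA"

    assert(asBig == abcd);
    assert(asLittle == dcba);
}
```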

Anders F Björklund wrote:
> Bill Baxter wrote:
>
>> Besides isn't the value of a multi-character literal going to be dependent on the endianness of the machine you're running on?
>
> Not really, it just gets flipped when _stored_ on a LittleEndian...
> i.e. the char const 'ABCD' is the same as the hex const 0x41424344
>
> In arch i386 this would read 44434241 but in arch ppc it's 41424344.
> That is, if you were to store it somewhere or look at the objectfile.
>
> --anders

Yeh, ok. I'm thinking of the case where you read in a 4-byte uint signature from a file. If you load it in as a uint, you have to watch out for the endianness of the file vs that of the platform you're running on. Or just compare as a sequence of chars rather than uint.

--bb
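Both options mentioned for file signatures might look roughly like this in D. The helper names `hasSignature` and `readBigEndianUint` are hypothetical, and the sketch assumes the file stores the signature in big-endian order.

```d
import std.bitmanip : bigEndianToNative;

// Hypothetical helper: compares the first four bytes as a byte sequence,
// so host endianness never enters the picture.
bool hasSignature(const(ubyte)[] data, string tag)
{
    return tag.length == 4 && data.length >= 4
        && data[0 .. 4] == cast(const(ubyte)[]) tag;
}

// Hypothetical helper: decodes the first four bytes as a big-endian uint,
// whatever the host byte order is. The caller must pass at least 4 bytes.
uint readBigEndianUint(const(ubyte)[] data)
{
    ubyte[4] raw = data[0 .. 4];
    return bigEndianToNative!uint(raw);
}

unittest
{
    // Stand-in for the start of a file read into memory.
    immutable ubyte[] header = [0x41, 0x42, 0x43, 0x44, 0x00];

    assert(hasSignature(header, "ABCD"));
    assert(!hasSignature(header, "DCBA"));
    assert(readBigEndianUint(header) == 0x41424344);
}
```

Comparing bytes avoids the conversion entirely; the decoding helper is useful when the rest of the code wants the signature as a `uint` constant.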
