Thread overview
Regarding hex strings
bearophile (Oct 18, 2012)
H. S. Teoh (Oct 18, 2012)
foobar (Oct 18, 2012)
monarch_dodra (Oct 18, 2012)
foobar (Oct 18, 2012)
bearophile (Oct 18, 2012)
foobar (Oct 18, 2012)
foobar (Oct 18, 2012)
monarch_dodra (Oct 18, 2012)
monarch_dodra (Oct 18, 2012)
monarch_dodra (Oct 18, 2012)
monarch_dodra (Oct 18, 2012)
bearophile (Oct 18, 2012)
monarch_dodra (Oct 18, 2012)
Marco Leise (Oct 19, 2012)
Jonathan M Davis (Oct 19, 2012)
Marco Leise (Oct 19, 2012)
Jonathan M Davis (Oct 19, 2012)
monarch_dodra (Oct 20, 2012)
Nick Sabalausky (Oct 19, 2012)
foobar (Oct 19, 2012)
Nick Sabalausky (Oct 20, 2012)
Kagamin (Oct 18, 2012)
Jonathan M Davis (Oct 18, 2012)
Kagamin (Oct 18, 2012)
Jonathan M Davis (Oct 18, 2012)
Don Clugston (Oct 18, 2012)
foobar (Oct 18, 2012)
Don Clugston (Oct 19, 2012)
foobar (Oct 19, 2012)
Don Clugston (Oct 19, 2012)
foobar (Oct 19, 2012)
foobar (Oct 19, 2012)
Nick Sabalausky (Oct 20, 2012)
Denis Shelomovskij (Oct 20, 2012)
foobar (Oct 20, 2012)
Nick Sabalausky (Oct 20, 2012)
H. S. Teoh (Oct 20, 2012)
foobar (Oct 20, 2012)
foobar (Oct 20, 2012)
Nick Sabalausky (Oct 22, 2012)
Dejan Lekic (Oct 22, 2012)
H. S. Teoh (Oct 22, 2012)
Nick Sabalausky (Oct 18, 2012)
bearophile (Oct 19, 2012)
monarch_dodra (Oct 18, 2012)
Dejan Lekic (Oct 22, 2012)
Simen Kjaeraas (Oct 22, 2012)
October 18, 2012
(Repost)

hex strings are useful, but I think they were invented in D1 when strings were convertible to char[]. But today they are an array of immutable UTF-8, so I think this default type is not so useful:

void main() {
    string data1 = x"A1 B2 C3 D4"; // OK
    immutable(ubyte)[] data2 = x"A1 B2 C3 D4"; // error
}


test.d(3): Error: cannot implicitly convert expression ("\xa1\xb2\xc3\xd4") of type string to ubyte[]


Generally I want to use hex strings to put binary data in a program, so usually it's a ubyte[] or uint[].

So I have to use something like:

auto data3 = cast(ubyte[])(x"A1 B2 C3 D4".dup);


So maybe the following literals are more useful in D2:

ubyte[] data4 = x[A1 B2 C3 D4];
uint[]  data5 = x[A1 B2 C3 D4];
ulong[] data6 = x[A1 B2 C3 D4 A1 B2 C3 D4];

Bye,
bearophile
October 18, 2012
On Thu, Oct 18, 2012 at 02:45:10AM +0200, bearophile wrote: [...]
> hex strings are useful, but I think they were invented in D1 when strings were convertible to char[]. But today they are an array of immutable UTF-8, so I think this default type is not so useful:
> 
> void main() {
>     string data1 = x"A1 B2 C3 D4"; // OK
>     immutable(ubyte)[] data2 = x"A1 B2 C3 D4"; // error
> }
> 
> 
> test.d(3): Error: cannot implicitly convert expression
> ("\xa1\xb2\xc3\xd4") of type string to ubyte[]
[...]

Yeah I think hex strings would be better as ubyte[] by default.

More generally, though, I think *both* of the above lines should be equally accepted.  If you write x"A1 B2 C3" in the context of initializing a string, then the compiler should infer the type of the literal as string, and if the same literal occurs in the context of, say, passing a ubyte[], then its type should be inferred as ubyte[], NOT string.


T

-- 
Who told you to swim in Crocodile Lake without life insurance??
October 18, 2012
On Thursday, 18 October 2012 at 02:47:42 UTC, H. S. Teoh wrote:
> On Thu, Oct 18, 2012 at 02:45:10AM +0200, bearophile wrote:
> [...]
>> hex strings are useful, but I think they were invented in D1 when
>> strings were convertible to char[]. But today they are an array of
>> immutable UTF-8, so I think this default type is not so useful:
>> 
>> void main() {
>>     string data1 = x"A1 B2 C3 D4"; // OK
>>     immutable(ubyte)[] data2 = x"A1 B2 C3 D4"; // error
>> }
>> 
>> 
>> test.d(3): Error: cannot implicitly convert expression
>> ("\xa1\xb2\xc3\xd4") of type string to ubyte[]
> [...]
>
> Yeah I think hex strings would be better as ubyte[] by default.
>
> More generally, though, I think *both* of the above lines should be
> equally accepted.  If you write x"A1 B2 C3" in the context of
> initializing a string, then the compiler should infer the type of the
> literal as string, and if the same literal occurs in the context of,
> say, passing a ubyte[], then its type should be inferred as ubyte[], NOT
> string.
>
>
> T

IMO, this is a redundant feature that complicates the language for no benefit and should be deprecated.
strings already have an escape sequence for specifying code-points "\u" and for ubyte arrays you can simply use:
immutable(ubyte)[] data2 = [0xA1 0xB2 0xC3 0xD4];

So basically this feature gains us nothing.

October 18, 2012
On Thursday, 18 October 2012 at 08:58:57 UTC, foobar wrote:
>
> IMO, this is a redundant feature that complicates the language for no benefit and should be deprecated.
> strings already have an escape sequence for specifying code-points "\u" and for ubyte arrays you can simply use:
> immutable(ubyte)[] data2 = [0xA1 0xB2 0xC3 0xD4];
>
> So basically this feature gains us nothing.

Have you actually ever written code that requires using code points? This feature is a *huge* convenience for when you do. Just compare:

string nihongo1 = x"e697a5 e69cac e8aa9e";
string nihongo2 = "\ue697a5\ue69cac\ue8aa9e";
ubyte[] nihongo3 = [0xe6, 0x97, 0xa5, 0xe6, 0x9c, 0xac, 0xe8, 0xaa, 0x9e];

BTW, your data2 doesn't compile.
October 18, 2012
On Thursday, 18 October 2012 at 00:45:12 UTC, bearophile wrote:
> (Repost)
>
> hex strings are useful, but I think they were invented in D1 when strings were convertible to char[]. But today they are an array of immutable UTF-8, so I think this default type is not so useful:
>
> void main() {
>     string data1 = x"A1 B2 C3 D4"; // OK
>     immutable(ubyte)[] data2 = x"A1 B2 C3 D4"; // error
> }
>
>
> test.d(3): Error: cannot implicitly convert expression ("\xa1\xb2\xc3\xd4") of type string to ubyte[]
>
> [SNIP]
>
> Bye,
> bearophile

The conversion can't be done *implicitly*, but you can still get your code to compile:

//----
void main() {
    immutable(ubyte)[] data2 =
        cast(immutable(ubyte)[]) x"A1 B2 C3 D4"; // OK!
}
//----

It's a bit ugly, and I agree it should work natively, but it is a workaround.
October 18, 2012
On Thursday, 18 October 2012 at 09:42:43 UTC, monarch_dodra wrote:
> On Thursday, 18 October 2012 at 08:58:57 UTC, foobar wrote:
>>
>> IMO, this is a redundant feature that complicates the language for no benefit and should be deprecated.
>> strings already have an escape sequence for specifying code-points "\u" and for ubyte arrays you can simply use:
>> immutable(ubyte)[] data2 = [0xA1 0xB2 0xC3 0xD4];
>>
>> So basically this feature gains us nothing.
>
> Have you actually ever written code that requires using code points? This feature is a *huge* convenience for when you do. Just compare:
>
> string nihongo1 = x"e697a5 e69cac e8aa9e";
> string nihongo2 = "\ue697a5\ue69cac\ue8aa9e";
> ubyte[] nihongo3 = [0xe6, 0x97, 0xa5, 0xe6, 0x9c, 0xac, 0xe8, 0xaa, 0x9e];
>
> BTW, your data2 doesn't compile.

I didn't try to compile it :) I just rewrote bearophile's example with 0x prefixes.

How often do you actually need to write code-point _literals_ in your code?
I'm not arguing that it isn't convenient. My question would rather be Andrei's "does it pull its own weight?", meaning: is the added complexity in the language, and having more than one way of doing something, worth that convenience?

Seems to me this is in the same ballpark as the built-in complex numbers. Sure it's nice to be able to write "4+5i" instead of "complex(4,5)" but how frequently do you actually ever need the _literals_ even in complex computational heavy code?
October 18, 2012
The docs say:
http://dlang.org/lex.html

>Hex strings allow string literals to be created using hex data. The hex data need not form valid UTF characters.<

But this code:


void main() {
    immutable ubyte[4] data = x"F9 04 C1 E2";
}



Gives me:

temp.d(2): Error: Outside Unicode code space

Are the docs correct?

--------------------------

foobar:

> Seems to me this is in the same ballpark as the built-in complex numbers. Sure it's nice to be able to write "4+5i" instead of "complex(4,5)" but how frequently do you actually ever need the _literals_ even in complex computational heavy code?

Compared to "oct!5151151511", one problem with code like this is that binary blobs are sometimes large, so supporting an x"" syntax is better:

immutable ubyte[4] data = hex!"F9 04 C1 E2";

Bye,
bearophile
October 18, 2012
On Thursday, 18 October 2012 at 10:05:06 UTC, bearophile wrote:
> The docs say:
> http://dlang.org/lex.html
>
>>Hex strings allow string literals to be created using hex data. The hex data need not form valid UTF characters.<
>
> But this code:
>
>
> void main() {
>     immutable ubyte[4] data = x"F9 04 C1 E2";
> }
>
>
>
> Gives me:
>
> temp.d(2): Error: Outside Unicode code space
>
> Are the docs correct?
>
> --------------------------
>
> foobar:
>
>> Seems to me this is in the same ballpark as the built-in complex numbers. Sure it's nice to be able to write "4+5i" instead of "complex(4,5)" but how frequently do you actually ever need the _literals_ even in complex computational heavy code?
>
> Compared to "oct!5151151511", one problem with code like this is that binary blobs are sometimes large, so supporting a x"" syntax is better:
>
> immutable ubyte[4] data = hex!"F9 04 C1 E2";
>
> Bye,
> bearophile

How often are large binary blobs literally spelled out in the source code (as opposed to just being read from a file)?
In any case, I'm not opposed to such a utility library; in fact, I think it's a rather good idea, and we already have a precedent with "oct!".
I just don't think this belongs as a built-in feature in the language.
October 18, 2012
On Thursday, 18 October 2012 at 10:11:14 UTC, foobar wrote:
> On Thursday, 18 October 2012 at 10:05:06 UTC, bearophile wrote:
>> The docs say:
>> http://dlang.org/lex.html
>>
>>>Hex strings allow string literals to be created using hex data. The hex data need not form valid UTF characters.<
>>

This is an especially good reason to remove this feature, as it breaks the principle of least surprise; I consider it a major bug, not a feature.

I expect D's strings, which are by definition Unicode, to _only_ ever allow _valid_ Unicode. It makes no sense whatsoever to allow this nasty back-door. Other text encodings should either be stored and treated as binary data (ubyte[]) or, better yet, stored in their own types that will ensure those encodings' invariants.
October 18, 2012
On Thursday, 18 October 2012 at 10:17:06 UTC, foobar wrote:
> On Thursday, 18 October 2012 at 10:11:14 UTC, foobar wrote:
>> On Thursday, 18 October 2012 at 10:05:06 UTC, bearophile wrote:
>>> The docs say:
>>> http://dlang.org/lex.html
>>>
>>>>Hex strings allow string literals to be created using hex data. The hex data need not form valid UTF characters.<
>>>
>
> This is an especially good reason to remove this feature, as it breaks the principle of least surprise; I consider it a major bug, not a feature.
>
> I expect D's strings, which are by definition Unicode, to _only_ ever allow _valid_ Unicode. It makes no sense whatsoever to allow this nasty back-door. Other text encodings should either be stored and treated as binary data (ubyte[]) or, better yet, stored in their own types that will ensure those encodings' invariants.

Yeah, that makes sense too. I'll try to toy around on my end and see if I can write a "hex" template.
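
For illustration, here is a minimal sketch of what such a CTFE-based "hex" template could look like, in the spirit of the "oct!" precedent mentioned above. The names hexToBytes and hex are made up for this example; this is not code from Phobos, just one way it might be written:

//----
// Convert a hex string such as "F9 04 C1 E2" into a ubyte array.
// Works both at run time and in CTFE, so the result can initialize
// immutable data, much like the built-in x"" literal.
ubyte[] hexToBytes(string s)
{
    ubyte[] result;
    bool haveHigh = false;
    ubyte high = 0;
    foreach (char c; s)
    {
        ubyte value;
        if (c >= '0' && c <= '9')      value = cast(ubyte)(c - '0');
        else if (c >= 'a' && c <= 'f') value = cast(ubyte)(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') value = cast(ubyte)(c - 'A' + 10);
        else if (c == ' ' || c == '\t' || c == '\r' || c == '\n')
            continue;                  // whitespace only separates digit groups
        else
            assert(false, "invalid character in hex string");

        if (!haveHigh) { high = value; haveHigh = true; }
        else { result ~= cast(ubyte)((high << 4) | value); haveHigh = false; }
    }
    assert(!haveHigh, "odd number of hex digits");
    return result;
}

// Force compile-time evaluation, so the bytes are computed when the
// program is built rather than at run time.
template hex(string s)
{
    enum ubyte[] hex = hexToBytes(s);
}

void main()
{
    // The enum expands to an array literal, so it converts to immutable.
    immutable(ubyte)[] data = hex!"F9 04 C1 E2";
    assert(data == cast(immutable(ubyte)[]) x"F9 04 C1 E2");
}
//----

One known cost of the enum-array approach is that every use of hex!"..." expands to a fresh array literal (and hence a fresh allocation at run time); a more polished version might store the result in a static immutable array instead.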