FormattedRead hex string
September 24, 2012
I imagine there's a slick way to do this, but I'm not seeing it.

I have a string of hex digits which I'd like to convert to an array of 8 ubytes:

0123456789abcdef --> [0x01, 0x23, 0x45, 0x67, 0x89, 0xAB, 0xCD, 0xEF]

I'm looking at std.format.formattedRead, but the documentation is...lightish.  First of all, it seems there's no format specifier except %s on reads and type information is gleaned from the args' types.  I was able to experiment and show that %x works, but no documentation on exactly how.

Second, array syntax seems to work only if there's some delimiter.  With:

import std.format;

void main(string[] args)
{
   ubyte[8] b;

   formattedRead(args[1], "%(%s%)", &b);
}

I get

std.conv.ConvOverflowException@C:\Tools\D\dmd2\windows\bin\..\..\src\phobos\std\
conv.d(2006): Overflow in integral conversion

at least once. :)  But that makes sense--hard to tell how many input chars to assign to one byte versus another (although it seems to me a hungry algorithm would work--saturate one type's max and move to the next.)

There doesn't seem to be any support for field sizes or counts in formatted read, similar to old C "%16x".  This barks at me right away--"%1 not supported."

I know I could read (in this case) as two longs or a uint16, but I don't want to deal with endianness--just data.

Is there some trick that uses the fact that b is a fixed-size array of 8 bytes, knows that this requires 16 hex digits, and converts automatically?  Is there some other suggestion for how to do this elegantly?  I can play around with split and join, but it seemed like there is probably some way to do this directly that I'm not seeing.

Thanks!
Jason
September 24, 2012
On Monday, 24 September 2012 at 15:05:54 UTC, Jason Spencer wrote:
> <snip>

I think that you are not supposed to use a static array: If there are not EXACTLY as many array elements as there are parse-able elements, then the formatted read will consider the parse to have failed.

Try this, it's what you want, right?

--------
import std.format, std.stdio;

void main()
{
    string s = "ffff fff ff f";
    ushort[] vals;
    // Elements are hex numbers separated by single spaces.
    formattedRead(s, "%(%x %)", &vals);
    writefln("%(%s - %)", vals);
}
--------
65535 - 4095 - 255 - 15
--------

Regarding the %1x, well, I guess it just isn't supported (yet?)
September 24, 2012
On Monday, 24 September 2012 at 16:32:45 UTC, monarch_dodra wrote:
> On Monday, 24 September 2012 at 15:05:54 UTC, Jason Spencer wrote:
>> I imagine there's a slick way to do this, but I'm not seeing it.
>>
>> I have a string of hex digits which I'd like to convert to an array of 8 ubytes:
>>
>> 0123456789abcdef --> [0x01, 0x23, 0x45, 0x67, 0x89, 0xAB, 0xCD, 0xEF]
<snip>
>> void main(string[] args)
>> {
>>   ubyte[8] b;
>>
>>   formattedRead(args[1], "%(%s%)", &b);
>> }


> I think that you are not supposed to use a static array: If there are not EXACTLY as many array elements as there are parse-able elements, then the formatted read will consider the parse to have failed.

The sample code was just for testing convenience.  In practice the string will be conditioned and known to have 16 characters in {0-9, a-f}.

>
> Try this, it's what you want, right?
>
> --------
> void main()
> {
>     string s = "ffff fff ff f";
>     ushort[] vals;
>     formattedRead(s, "%(%x %)", &vals);
>     writefln("%(%s - %)", vals);
> }

Not quite.  You've taken the liberty of using a delimiter--spaces.  I have to take 16 contiguous, NON-delimited hex digits and produce 8 bytes.  So I could read it as a uint64 (not uint16, as I mistakenly posted before), but then I'd have to byte-reverse it.  I could use slicing and do a byte at a time.  I just wondered if there were a slick way to get in-place data from a contiguous hex string.

Thanks,
Jason
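
For the uint64 route mentioned above, the byte reversal doesn't have to be done by hand. A minimal sketch, assuming the input is exactly 16 hex digits: parse the whole string with std.conv's `to!ulong(16)` and let `std.bitmanip.nativeToBigEndian` lay the value out most significant byte first, regardless of the host's endianness. The helper name `hexToBytesViaUlong` is made up for illustration; it is not part of Phobos.

```d
import std.bitmanip : nativeToBigEndian;
import std.conv : to;
import std.exception : enforce;
import std.stdio : writefln;

// Hypothetical helper (not a Phobos function): turns 16 contiguous hex
// digits into 8 bytes, in the order the digits appear in the string.
ubyte[8] hexToBytesViaUlong(string s)
{
    // Assumption: the caller supplies exactly 16 hex digits.
    enforce(s.length == 16, "expected exactly 16 hex digits");

    // Parse the whole string as a single base-16 number.
    ulong value = s.to!ulong(16);

    // Big-endian layout puts the most significant byte first, which
    // matches the left-to-right order of the digits.
    return nativeToBigEndian(value);
}

void main()
{
    auto b = hexToBytesViaUlong("0123456789abcdef");
    writefln("%(%02x %)", b[]); // prints: 01 23 45 67 89 ab cd ef
}
```

This only works while the digits fit in a ulong, so it does not generalize beyond 8 bytes.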
September 25, 2012
On Monday, 24 September 2012 at 22:38:59 UTC, Jason Spencer wrote:
> <snip>
>
> Not quite.  You've taken the liberty of using a delimiter--spaces.  I have to take 16 contiguous, NON-delimited hex digits and produce 8 bytes.  So I could read it as a uint64 (not uint16, as I mistakenly posted before), but then I'd have to byte-reverse it.  I could use slicing and do a byte at a time.  I just wondered if there were a slick way to get in-place data from a contiguous hex string.
>
> Thanks,
> Jason

I am unsure if the non-support of %2x is by design, or just "not yet supported".

Keep in mind that slicing a string *is* in place: it is equivalent to pointer arithmetic and copies nothing. I'd just do a loop:

--------
import std.format, std.stdio;

void main()
{
    string s = "0123456789abcdef";
    ushort[8] vals;
    foreach(size_t i; 0..8)
    {
        // Two hex digits per value; the slice is a view into s, not a copy.
        string slice = s[2*i .. 2*(i+1)];
        slice.formattedRead("%x", &vals[i]);
    }
    writeln(vals);
}
--------
[1, 35, 69, 103, 137, 171, 205, 239]
--------
This still gets the job done pretty cleanly and efficiently.

Chances are this is even faster than a hypothetical "%(%2x%)" scheme, since you reduce the work from parsing a list of reads to simply extracting data.

Alternatively, you could just use std.conv's "to" or "parse". I've had others argue that "formattedRead" is meant as an implementation detail: other functions should build on it, but "consumers" shouldn't call it directly.

I found this strange at first, but I've grown fond of the power of "to":

--------
import std.conv, std.stdio;

void main()
{
    string s = "0123456789abcdef";
    ushort[8] vals;
    foreach(size_t i; 0..8)
        vals[i] = s[2*i .. 2*(i+1)].to!ushort(16);
    writeln(vals);
}
--------

Pretty nice, no?
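
Since the original question asked for ubytes rather than ushorts, the same loop can be wrapped in a small reusable helper. This is only a sketch: `hexToBytes` and its template parameter `N` are hypothetical names, and it assumes ASCII input with exactly two hex digits per output byte.

```d
import std.conv : to;
import std.exception : enforce;
import std.stdio : writeln;

// Hypothetical helper (not a Phobos function): converts 2*N contiguous
// hex digits into a fixed-size array of N bytes.
ubyte[N] hexToBytes(size_t N)(string s)
{
    // Assumption: the input has already been conditioned to 2*N digits.
    enforce(s.length == 2 * N, "expected exactly two hex digits per byte");

    ubyte[N] result;
    foreach (i; 0 .. N)
    {
        // Each two-character slice is parsed as one base-16 byte.
        result[i] = s[2 * i .. 2 * i + 2].to!ubyte(16);
    }
    return result;
}

void main()
{
    ubyte[8] b = hexToBytes!8("0123456789abcdef");
    writeln(b); // [1, 35, 69, 103, 137, 171, 205, 239]
}
```

Anything that is not a hex digit makes `to` throw a ConvException, so malformed input fails loudly instead of producing garbage bytes.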