Regarding hex strings (page 3) - D Programming Language Discussion Forum


October 18, 2012

Re: Regarding hex strings

Posted by Jonathan M Davis
in reply to Kagamin

On Thursday, October 18, 2012 21:09:14 Kagamin wrote:
> Your keyboard doesn't have ready unicode values for all characters either.

So? That doesn't mean it isn't valuable to be able to input the values in hexadecimal instead of as actual Unicode characters. Heck, if you want a specific character, I wouldn't trust copying the characters anyway, because it's far too easy to have two characters which look really similar but are different (e.g. there are multiple types of angle brackets in Unicode), whereas with the numbers you can be sure. And with some characters (e.g. Unicode whitespace characters), it generally doesn't make sense to enter the characters directly.

Regardless, my point is that both approaches can be useful, so it's good to be able to do both. If you prefer to put the Unicode characters in directly, then do that, but others may prefer the other way. Personally, I've done both.
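
To make the look-alike point concrete, here is a minimal sketch. The function names are made up for illustration and come from no library; the only thing assumed is D's standard `\u` escape, which takes exactly four hex digits and names a code point that is then stored as UTF-8.

```d
// Hypothetical helpers, for illustration only.

// Two angle brackets that render almost identically but are distinct code points.
string mathLeftAngle() { return "\u27E8"; } // MATHEMATICAL LEFT ANGLE BRACKET
string cjkLeftAngle()  { return "\u3008"; } // LEFT ANGLE BRACKET (CJK punctuation)

// Whitespace that is effectively invisible when typed directly into source.
string noBreakSpace() { return "\u00A0"; } // NO-BREAK SPACE
string thinSpace()    { return "\u2009"; } // THIN SPACE

// Checked at compile time: the strings differ even if a font draws them alike.
static assert(mathLeftAngle() != cjkLeftAngle());
static assert(noBreakSpace() != " ");
```

Written as escapes, the intent survives copy-and-paste and font changes; written as literal characters, the source is shorter but the reader has to trust the glyphs.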

- Jonathan M Davis

October 18, 2012

Re: Regarding hex strings

Posted by Nick Sabalausky
in reply to H. S. Teoh

On Wed, 17 Oct 2012 19:49:43 -0700
"H. S. Teoh" <hsteoh@quickfur.ath.cx> wrote:

> On Thu, Oct 18, 2012 at 02:45:10AM +0200, bearophile wrote: [...]
> > hex strings are useful, but I think they were invented in D1 when strings were convertible to char[]. But today they are an array of immutable UTF-8, so I think this default type is not so useful:
> > 
> > void main() {
> >     string data1 = x"A1 B2 C3 D4"; // OK
> >     immutable(ubyte)[] data2 = x"A1 B2 C3 D4"; // error
> > }
> > 
> > 
> > test.d(3): Error: cannot implicitly convert expression
> > ("\xa1\xb2\xc3\xd4") of type string to ubyte[]
> [...]
> 
> Yeah I think hex strings would be better as ubyte[] by default.
> 
> More generally, though, I think *both* of the above lines should be equally accepted.  If you write x"A1 B2 C3" in the context of initializing a string, then the compiler should infer the type of the literal as string, and if the same literal occurs in the context of, say, passing a ubyte[], then its type should be inferred as ubyte[], NOT string.
> 

Big +1

Having the language expect x"..." to always be a string (let alone a *valid UTF* string) is just insane. It's just too damn useful for arbitrary binary data.
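
For readers who want the bytes today without relying on how the compiler types the literal, one workaround can be sketched as follows. It assumes a Phobos that ships `std.conv.hexString` and `std.string.representation`; the function name `blob` is hypothetical. Whether a bare `x"..."` literal is accepted, and with which type, depends on the compiler version, so this sketch sidesteps it.

```d
import std.conv : hexString;        // library counterpart to x"..." literals
import std.string : representation; // views a string as its raw code units

// Hypothetical name, for illustration only.
immutable(ubyte)[] blob()
{
    // hexString yields a string holding exactly these four bytes;
    // the content is not required to be valid UTF-8.
    string raw = hexString!"A1 B2 C3 D4";

    // representation reinterprets the same memory as immutable(ubyte)[]
    // without copying.
    return raw.representation;
}
```

This keeps the data in the string literal's storage and only changes the static type, which is roughly what the thread asks the compiler to infer on its own.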

October 19, 2012

Re: Regarding hex strings

Posted by bearophile
in reply to Nick Sabalausky

Nick Sabalausky:

> Big +1
>
> Having the language expect x"..." to always be a string (let alone a *valid UTF* string) is just insane. It's just too damn useful for arbitrary binary data.

I'd like an opinion on such topics from one of the D bosses :-)

Bye,
bearophile

October 19, 2012

Re: Regarding hex strings

Posted by Nick Sabalausky
in reply to foobar

On Thu, 18 Oct 2012 12:11:13 +0200
"foobar" <foo@bar.com> wrote:
> 
> How often are large binary blobs literally spelled out in the source code (as opposed to just being read from a file)?

Frequency isn't the issue. The issues are "*Is* it ever needed?" and "When it is needed, is it useful enough?" The answer to both is most certainly "yes". (Remember, D is supposed to be usable as a systems language, it's not merely a high-level-app-only language.)

Keep in mind, the question "Does it pull its own weight?" is for adding new features, not for going around gutting the language just because we can.

> In any case, I'm not opposed to such a utility library, in fact I
> think it's a rather good idea and we already have a precedent
> with "oct!"
> I just don't think this belongs as a built-in feature in the
> language.

I think monarch_dodra's test proves that it definitely needs to be built-in.

October 19, 2012

Re: Regarding hex strings

Posted by Marco Leise
in reply to monarch_dodra

On Thu, 18 Oct 2012 16:31:57 +0200, "monarch_dodra" <monarchdodra@gmail.com> wrote:

> On Thursday, 18 October 2012 at 13:15:55 UTC, bearophile wrote:
> > monarch_dodra:
> >
> >> hex! was a very good idea actually, imo.
> >
> > It must scale up to "real world" usages. Try it with a program composed of 3 modules, each one containing a 100 KB long string. Then try it with a program with two hundred medium-sized literals, and let's see compilation times and binary sizes.
> >
> > Bye,
> > bearophile
> 
> Hum... The compilation is pretty fast actually, about 1 second, provided it doesn't choke.
> 
> It works for strings up to a length of 400 lines @ 80 chars per line, which results in approximately 16K of data. After that, I get a DMD out of memory error.
> 
> DMD memory usage spikes quite quickly. To compile those 400 lines (16K), I use 800MB of memory (!). If I reach about 1GB, then it crashes.
> 
> I tried using a refAppender instead of ret~, but that changed nothing.
> 
> Kind of weird it would use that much memory though...
> 
> Also, the memory doesn't get released. I can parse a 1x400 Line string, but if I try to parse 3 of them, DMD will choke on the second one. :(

Hehe, I assume most of the regulars know this: DMD used to use a garbage collector that is disabled. Memory just isn't freed! Also it has copy on write semantics during CTFE:

int bug6498(int x)
{
    int n = 0;
    while (n < x)
        ++n;
    return n;
}
static assert(bug6498(10_000_000)==10_000_000);

--> Fails with an 'out of memory' error.

http://d.puremagic.com/issues/show_bug.cgi?id=6498

So, as strange as it sounds, for now try not to write often or into large blocks. Using this knowledge I was sometimes able to bring down the memory consumption considerably by caching recurring concatenations of two strings or to!string calls.

That said, appending single elements to an array may actually be better than using a fixed-sized one and have DMD duplicate it on every write. :p

Please remember to give Don a cookie when he manages to change the compiler to modify in-place where appropriate.
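
As a rough illustration of the "append single elements" advice, here is a minimal sketch of a CTFE-friendly hex decoder. It is not monarch_dodra's `hex!` implementation (that code is not shown in this thread); `decodeHex` is a hypothetical name, and whether appending really uses less memory than in-place writes depends on the compiler version.

```d
// Hypothetical helper, not the hex! template discussed in the thread.
ubyte[] decodeHex(string s) pure
{
    ubyte[] result;
    int high = -1; // pending high nibble, or -1 if none is pending

    foreach (char c; s)
    {
        int v;
        if (c >= '0' && c <= '9') v = c - '0';
        else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
        else if (c == ' ' || c == '\t' || c == '\n' || c == '\r') continue;
        else assert(0, "invalid hex digit");

        if (high < 0)
            high = v;
        else
        {
            // Append one element instead of writing into a preallocated array.
            result ~= cast(ubyte)((high << 4) | v);
            high = -1;
        }
    }
    assert(high < 0, "odd number of hex digits");
    return result;
}

// Using the result in a static assert forces compile-time evaluation.
enum ubyte[] expected = [0xA1, 0xB2, 0xC3, 0xD4];
static assert(decodeHex("A1 B2 C3 D4") == expected);
```

The same function also works at run time, so it is easy to compare compile-time and run-time memory behaviour on a large input.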

-- 
Marco

October 19, 2012

Re: Regarding hex strings

Posted by Jonathan M Davis
in reply to Marco Leise

On Friday, October 19, 2012 05:14:44 Marco Leise wrote:
> Hehe, I assume most of the regulars know this: DMD used to use a garbage collector that is disabled.

Yes, but it didn't use it for long, because it made performance worse, and Walter didn't have the time to spend fixing it, so it was disabled. Presumably, someone will take the time to improve it at some point and then it will be re-enabled.

> Memory just isn't freed!

That was my understanding, but the last time that I said that, Brad Roberts said that it wasn't true, and that we should stop spreading that FUD, so I don't know what the exact situation is, but it sounds like if that was true in the past, it's not true now. Regardless, it's clear that dmd still uses too much memory in many cases, especially when code uses a lot of templates or CTFE.

- Jonathan M Davis

October 19, 2012

Re: Regarding hex strings

Posted by Marco Leise
in reply to Jonathan M Davis

On Thu, 18 Oct 2012 21:03:01 -0700, Jonathan M Davis <jmdavisProg@gmx.com> wrote:

> On Friday, October 19, 2012 05:14:44 Marco Leise wrote:
> > Memory just isn't freed!
> 
> That was my understanding, but the last time that I said that, Brad Roberts said that it wasn't true, and that we should stop spreading that FUD, so I don't know what the exact situation is, but it sounds like if that was true in the past, it's not true now. Regardless, it's clear that dmd still uses too much memory in many cases, especially when code uses a lot of templates or CTFE.
> 
> - Jonathan M Davis

He called it a FUD? Without trying to sound too patronizing, most D programmers would really only notice DMD's memory footprint when they use CTFE features. It is always Pegged, ctRegex, etc. that make the issue come up, never basic code. And preloading the Boehm collector showed that gigabytes of CTFE memory usage can still be brought down to a few hundred MB [citation needed]. I guess we can meet somewhere in the middle. Btw., did I mix up Don and Brad in the last post? Who is working on the memory management?

-- 
Marco

October 19, 2012

Re: Regarding hex strings

Posted by Jonathan M Davis
in reply to Marco Leise

On Friday, October 19, 2012 07:29:46 Marco Leise wrote:
> On Thu, 18 Oct 2012 21:03:01 -0700, Jonathan M Davis <jmdavisProg@gmx.com> wrote:
> > On Friday, October 19, 2012 05:14:44 Marco Leise wrote:
> > > Memory just isn't freed!
> > 
> > That was my understanding, but the last time that I said that, Brad Roberts said that it wasn't true, and that we should stop spreading that FUD, so I don't know what the exact situation is, but it sounds like if that was true in the past, it's not true now. Regardless, it's clear that dmd still uses too much memory in many cases, especially when code uses a lot of templates or CTFE.
> > 
> > - Jonathan M Davis
> 
> He called it a FUD?

I don't think that he used quite that term, but his point was that I shouldn't be saying that, because it wasn't true, and so I was spreading incorrect information (that and the fact that he was tired of people spreading that incorrect information, IIRC). I can't find the exact post at the moment though.

> I guess we can meet somewhere in the middle. Btw., did I mix up Don and Brad in the last post? Who is working on the memory management?

I don't think that you mixed anyone up. Don works primarily on CTFE. Brad works primarily on the auto tester and other infrastructure required for the dmd/Phobos folks to do what they do.

- Jonathan M Davis

October 19, 2012

Re: Regarding hex strings

Posted by foobar
in reply to Nick Sabalausky

On Friday, 19 October 2012 at 00:14:18 UTC, Nick Sabalausky wrote:
> On Thu, 18 Oct 2012 12:11:13 +0200
> "foobar" <foo@bar.com> wrote:
>> 
>> How often are large binary blobs literally spelled out in the source code (as opposed to just being read from a file)?
>
>
> Frequency isn't the issue. The issues are "*Is* it ever needed?" and
> "When it is needed, is it useful enough?" The answer to both is most
> certainly "yes". (Remember, D is supposed to be usable as a systems
> language, it's not merely a high-level-app-only language.)

Any real-world use cases to support this claim? Does C++ have such a feature?
My limited experience with kernels is that this feature is not needed. The solution we used for this was to define an extern symbol and load it with a linker script (the binary data was of course stored in separate files).

>
> Keep in mind, the question "Does it pull its own weight?" is for
> adding new features, not for going around gutting the language
> just because we can.

OK, I grant you that, but remember that the whole thread started because the feature _doesn't_ work, so let's rephrase: is it worth the effort to fix this feature?

>
>> In any case, I'm not opposed to such a utility library, in fact I think it's a rather good idea and we already have a precedent with "oct!"
>> I just don't think this belongs as a built-in feature in the language.
>
> I think monarch_dodra's test proves that it definitely needs to be
> built-in.

It proves that DMD has bugs that should be fixed, nothing more.

October 19, 2012

Re: Regarding hex strings

Posted by Don Clugston
in reply to foobar

On 18/10/12 17:43, foobar wrote:
> On Thursday, 18 October 2012 at 14:29:57 UTC, Don Clugston wrote:
>> On 18/10/12 10:58, foobar wrote:
>>> On Thursday, 18 October 2012 at 02:47:42 UTC, H. S. Teoh wrote:
>>>> On Thu, Oct 18, 2012 at 02:45:10AM +0200, bearophile wrote:
>>>> [...]
>>>>> hex strings are useful, but I think they were invented in D1 when
>>>>> strings were convertible to char[]. But today they are an array of
>>>>> immutable UTF-8, so I think this default type is not so useful:
>>>>>
>>>>> void main() {
>>>>>    string data1 = x"A1 B2 C3 D4"; // OK
>>>>>    immutable(ubyte)[] data2 = x"A1 B2 C3 D4"; // error
>>>>> }
>>>>>
>>>>>
>>>>> test.d(3): Error: cannot implicitly convert expression
>>>>> ("\xa1\xb2\xc3\xd4") of type string to ubyte[]
>>>> [...]
>>>>
>>>> Yeah I think hex strings would be better as ubyte[] by default.
>>>>
>>>> More generally, though, I think *both* of the above lines should be
>>>> equally accepted.  If you write x"A1 B2 C3" in the context of
>>>> initializing a string, then the compiler should infer the type of the
>>>> literal as string, and if the same literal occurs in the context of,
>>>> say, passing a ubyte[], then its type should be inferred as ubyte[],
>>>> NOT string.
>>>>
>>>>
>>>> T
>>>
>>> IMO, this is a redundant feature that complicates the language for no
>>> benefit and should be deprecated.
>>> strings already have an escape sequence for specifying code-points "\u"
>>> and for ubyte arrays you can simply use:
>>> immutable(ubyte)[] data2 = [0xA1, 0xB2, 0xC3, 0xD4];
>>>
>>> So basically this feature gains us nothing.
>>
>> That is not the same. Array literals are not the same as string
>> literals, they have an implicit .dup.
>> See my recent thread on this issue (which unfortunately seems to have
>> died without a resolution, people got hung up about trailing null
>> characters without apparently noticing the more important issue of the
>> dup).
>
> I don't see how that detail is relevant to this discussion as I was not
> arguing against string literals or array literals in general.
>
> We can still have both (assuming the code points are valid...):
> string foo = "\ua1\ub2\uc3"; // no .dup

That doesn't compile.
Error: escape hex sequence has 2 hex digits instead of 4
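
The difference between the two escape forms can be sketched as follows; the constant names are made up for illustration. `\u` needs exactly four hex digits and names a code point, which is then encoded as UTF-8, while `\x` takes two hex digits and inserts one raw byte.

```d
// Hypothetical names, for illustration only.

// \u takes exactly four hex digits and names a code point, stored as UTF-8.
enum string codePoint = "\u00A1"; // U+00A1 is encoded as the bytes C2 A1

// \x takes two hex digits and inserts a single raw byte
// (0xA1 alone is not a valid UTF-8 sequence).
enum string rawByte = "\xA1";

static assert(codePoint.length == 2);   // two UTF-8 code units
static assert(rawByte.length == 1);     // one byte
static assert(codePoint == "\xC2\xA1"); // same bytes, spelled out
```

So even a corrected `"\u00a1\u00b2\u00c3"` would not give the three bytes A1 B2 C3; it would give six bytes of UTF-8, which is a different thing from what the hex string expresses.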
