"bstring"

Apr 05, 2010

Michel Fortin

Apr 06, 2010

Justin Spahr-Summers

Apr 06, 2010

Apr 06, 2010

Apr 06, 2010

Apr 07, 2010

Lately I've been using the type "immutable(ubyte)[]" a lot to pass around binary data of various kinds. In a couple of places now, to save some typing, I'm using this alias:

alias immutable(ubyte)[] bstring;

Would that make a worthy addition to the other standard string formats defined in object.o? Or am I the only one who is using this type a lot?

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/
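For readers following along, here is a minimal sketch of how such an alias might be used. It uses the current `alias name = type;` spelling rather than the form in the post (both declare the same type), and the `checksum` helper is hypothetical, introduced only for illustration.

```d
// Current spelling of the alias from the post; the post's form,
// `alias immutable(ubyte)[] bstring;`, declares the same type.
alias bstring = immutable(ubyte)[];

// Hypothetical helper (not from the thread): sums the bytes of a blob.
uint checksum(bstring data)
{
    uint sum = 0;
    foreach (b; data)
        sum += b;
    return sum;
}

void main()
{
    bstring payload = [0xDE, 0xAD, 0xBE, 0xEF];
    assert(payload.length == 4);
    assert(checksum(payload) == 0xDE + 0xAD + 0xBE + 0xEF);

    // Elements are immutable, just like the characters of a `string`.
    static assert(!__traits(compiles, payload[0] = 0));

    // The slice itself can still be rebound to other data.
    payload = payload[1 .. $];
    assert(payload[0] == 0xAD);
}
```

As with `string`, only the elements are immutable; the variable can be reassigned or sliced freely.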

On Mon, 5 Apr 2010 18:48:58 -0400, Michel Fortin <michel.fortin@michelf.com> wrote:

> Lately I've been using the type "immutable(ubyte)[]" a lot to pass around binary data of various kinds. In a couple of places now, to save some typing, I'm using this alias:
>
> alias immutable(ubyte)[] bstring;
>
> Would that make a worthy addition to the other standard string formats defined in object.o? Or am I the only one who is using this type a lot?

I use it quite a lot too, but I'm not sure if making it (effectively) a language keyword is the right approach. I mean, I use mutable byte strings probably just as often.

The fact that ubytes are really just arbitrary data I think somewhat diminishes the usefulness of a keyword; to compare, 'string' to me represents a contiguous run of valid *characters* (i.e., the data has meaning and representation in and of itself)... not strictly enforced by D, of course, but that's how the type is used.

Apologies if this came out rather disjointed.

Justin Spahr-Summers wrote:

> 'string' to me represents a contiguous run of valid
> *characters* (i.e., the data has meaning and representation in and of
> itself)... not strictly enforced by D, of course, but that's how the
> type is used.

If by "character" you mean "code unit", yes.

string characters are UTF-8 code units in D and have meanings by themselves only if they are one-byte UTF-8 sequences.

Ali
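The code unit versus code point distinction can be illustrated with a short sketch. It assumes current Phobos, where `walkLength` on a `string` decodes code points; the sample character is an arbitrary choice.

```d
import std.range : walkLength;

void main()
{
    // U+00E9 (e with acute accent) is encoded in UTF-8 as 0xC3 0xA9.
    string s = "\u00E9";

    assert(s.length == 2);     // .length counts UTF-8 code units
    assert(s.walkLength == 1); // range traversal decodes one code point
    assert(s[0] == 0xC3);      // a lone code unit, not a character by itself

    // A one-byte sequence is the case where a code unit has meaning alone.
    string ascii = "a";
    assert(ascii.length == 1 && ascii[0] == 'a');
}
```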

Hello Ali,

> Justin Spahr-Summers wrote:
>
>> 'string' to me represents a contiguous run of valid
>> *characters* (i.e., the data has meaning and representation in and of
>> itself)... not strictly enforced by D, of course, but that's how the
>> type is used.
> If by "character" you mean "code unit", yes.
>
> string characters are UTF-8 code units in D and have meanings by
> themselves only if they are one-byte UTF-8 sequences.

I think that's what the "not strictly enforced by D" part was about. True or not, people often assume that a string is valid UTF-8 of some kind.

-- 
... <IXOYE><

April 06, 2010

Re: "bstring"

Posted by Michel Fortin
in reply to BCS


On 2010-04-06 17:10:25 -0400, BCS <none@anon.com> said:

> Hello Ali,
> 
>> Justin Spahr-Summers wrote:
>> 
>>> 'string' to me represents a contiguous run of valid
>>> *characters* (i.e., the data has meaning and representation in and of
>>> itself)... not strictly enforced by D, of course, but that's how the
>>> type is used.
>> If by "character" you mean "code unit", yes.
>> 
>> string characters are UTF-8 code units in D and have meanings by
>> themselves only if they are one-byte UTF-8 sequences.
> 
> I think that's what the "not strictly enforced by D" part was about. True or not, people often assume that a string is valid UTF-8 of some kind.

It may not be strictly enforced, but std.range now iterates on code points instead of code units, making 'string' not very practical to use as a range when you need to iterate over UTF-8 code units (bytes), or with other text encodings. "bstring" is more appropriate for those cases.

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/
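A small sketch of the point about range iteration, assuming current Phobos: treating a `string` as a range yields decoded `dchar` code points, while an `immutable(ubyte)[]` view yields the raw bytes. `std.string.representation` is used here to obtain that view without a cast; it may not have existed when this thread was written.

```d
import std.range.primitives : ElementType, front;
import std.string : representation;

void main()
{
    string s = "h\u00E9"; // 'h' followed by U+00E9: three UTF-8 code units

    // As a range, a string yields decoded code points (dchar).
    static assert(is(ElementType!string == dchar));
    assert(s.front == 'h');

    // The same data viewed as bytes iterates code unit by code unit.
    immutable(ubyte)[] bytes = s.representation;
    immutable(ubyte)[] expected = [0x68, 0xC3, 0xA9];
    assert(bytes.length == 3);
    assert(bytes.front == 0x68);
    assert(bytes == expected);
}
```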

On Tue, 06 Apr 2010 11:50:36 -0700, Ali Çehreli <acehreli@yahoo.com> wrote:

> Justin Spahr-Summers wrote:
>
> > 'string' to me represents a contiguous run of valid
> > *characters* (i.e., the data has meaning and representation in and of
> > itself)... not strictly enforced by D, of course, but that's how the
> > type is used.
>
> If by "character" you mean "code unit", yes.
>
> string characters are UTF-8 code units in D and have meanings by themselves only if they are one-byte UTF-8 sequences.
>
> Ali

Sorry, yes. I'm not very familiar with Unicode terminology, but I do know that strings don't always contain valid Unicode sequences, and that's what I meant.
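To make the "not strictly enforced" point concrete, here is a hedged sketch using `std.utf.validate` from current Phobos. The byte values are arbitrary; 0xFF is chosen because it never occurs in well-formed UTF-8.

```d
import std.exception : assertNotThrown, assertThrown;
import std.utf : UTFException, validate;

void main()
{
    // A cast lets arbitrary bytes masquerade as a string; the type system
    // does not check that the contents are valid UTF-8.
    immutable(ubyte)[] raw = [0x61, 0xFF, 0x62];
    string bogus = cast(string) raw;
    assert(bogus.length == 3);

    // Validation has to be requested explicitly.
    assertThrown!UTFException(validate(bogus));
    assertNotThrown!UTFException(validate("plain ASCII"));
}
```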
