"bstring"
April 05, 2010
Lately I've been using the type "immutable(ubyte)[]" a lot to pass around binary data of various kinds. In a couple of places now, to save some typing, I'm using this alias:

	alias immutable(ubyte)[] bstring;

Would that make a worthy addition to the other standard string aliases defined in object.d? Or am I the only one who is using this type a lot?
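
For readers who want to try the idea, here is a minimal sketch of how such an alias might be used in ordinary code. The `checksum` function and the sample bytes are hypothetical, made up only for illustration; the alias itself is the one proposed above, declared locally rather than in the runtime.

```d
import std.stdio : writeln;

// The alias proposed in the post, declared in user code.
// The built-in string, wstring and dstring aliases are declared the same way.
alias immutable(ubyte)[] bstring;

// Hypothetical helper: adds up the bytes of a binary blob.
uint checksum(bstring data)
{
    uint sum = 0;
    foreach (b; data)
        sum += b;
    return sum;
}

void main()
{
    // Sample data (assumed for the example): the first four bytes of a PNG file.
    bstring header = [0x89, 0x50, 0x4E, 0x47];
    writeln(checksum(header));
}
```

Since the alias is only a new name for an existing type, any function that takes `immutable(ubyte)[]` accepts a `bstring` and vice versa.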

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

April 06, 2010
On Mon, 5 Apr 2010 18:48:58 -0400, Michel Fortin <michel.fortin@michelf.com> wrote:
> 
> Lately I've been using the type "immutable(ubyte)[]" a lot to pass around binary data of various kinds. In a couple of places now, to save some typing, I'm using this alias:
> 
> 	alias immutable(ubyte)[] bstring;
> 
> Would that make a worthy addition to the other standard string aliases defined in object.d? Or am I the only one who is using this type a lot?

I use it quite a lot too, but I'm not sure if making it (effectively) a language keyword is the right approach. I mean, I use mutable byte strings probably just as often. The fact that ubytes are really just arbitrary data I think somewhat diminishes the usefulness of a keyword; to compare, 'string' to me represents a contiguous run of valid *characters* (i.e., the data has meaning and representation in and of itself)... not strictly enforced by D, of course, but that's how the type is used.
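
To illustrate the mutable versus immutable distinction mentioned here, a small sketch, assuming nothing beyond the built-in array properties `.idup` and `.dup`. The variable names are only for the example.

```d
void main()
{
    // A mutable byte buffer, e.g. one being filled while reading input.
    ubyte[] buf = [1, 2, 3];

    // .idup makes an immutable copy that can be shared safely.
    immutable(ubyte)[] frozen = buf.idup;

    // Changing the buffer afterwards does not affect the immutable copy.
    buf[0] = 9;
    assert(frozen[0] == 1);

    // .dup goes the other way and yields a fresh mutable copy.
    ubyte[] editable = frozen.dup;
    editable[0] = 7;
    assert(frozen[0] == 1);

    // frozen[0] = 7;   // would be rejected at compile time
}
```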

Apologies if this came out rather disjointed.
April 06, 2010
Justin Spahr-Summers wrote:

> 'string' to me represents a contiguous run of valid
> *characters* (i.e., the data has meaning and representation in and of
> itself)... not strictly enforced by D, of course, but that's how the
> type is used.

If by "character" you mean "code unit", yes.

string characters are UTF-8 code units in D and have meanings by themselves only if they are one-byte UTF-8 sequences.
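
A short sketch of what that means in practice, assuming a character outside ASCII (U+00E9, written as an escape so the example does not depend on the source file encoding):

```d
void main()
{
    string ascii = "e";
    string accented = "\u00E9";   // one code point, U+00E9

    // .length counts UTF-8 code units, not characters.
    assert(ascii.length == 1);
    assert(accented.length == 2);

    // Indexing yields individual code units. For U+00E9 these are 0xC3 and 0xA9,
    // neither of which is a character on its own.
    assert(accented[0] == 0xC3);
    assert(accented[1] == 0xA9);
}
```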

Ali
April 06, 2010
Hello Ali,

> Justin Spahr-Summers wrote:
> 
>> 'string' to me represents a contiguous run of valid
>> *characters* (i.e., the data has meaning and representation in and of
>> itself)... not strictly enforced by D, of course, but that's how the
>> type is used.
> If by "character" you mean "code unit", yes.
> 
> string characters are UTF-8 code units in D and have meanings by
> themselves only if they are one-byte UTF-8 sequences.

I think that's what the "not strictly enforced by D" part was about. True or not, people often assume that a string is valid UTF-8 of some kind.

-- 
... <IXOYE><



April 06, 2010
On 2010-04-06 17:10:25 -0400, BCS <none@anon.com> said:

> Hello Ali,
> 
>> Justin Spahr-Summers wrote:
>> 
>>> 'string' to me represents a contiguous run of valid
>>> *characters* (i.e., the data has meaning and representation in and of
>>> itself)... not strictly enforced by D, of course, but that's how the
>>> type is used.
>> If by "character" you mean "code unit", yes.
>> 
>> string characters are UTF-8 code units in D and have meanings by
>> themselves only if they are one-byte UTF-8 sequences.
> 
> I think that's what the "not strictly enforced by D" part was about. True or not, people often assume that a string is valid UTF-8 of some kind.

It may not be strictly enforced, but std.range now iterates on code points instead of code units, making 'string' not very practical to use as a range when you need to iterate over UTF-8 code units (bytes), or with other text encodings. "bstring" is more appropriate for those cases.
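
A rough sketch of the difference, assuming a Phobos version that provides `std.string.representation` and in which the range primitives decode narrow strings to code points. The sample word is only for illustration.

```d
import std.range : walkLength;
import std.string : representation;

alias immutable(ubyte)[] bstring;

void main()
{
    string s = "na\u00EFve";   // 5 code points, 6 UTF-8 code units

    // The array property counts code units.
    assert(s.length == 6);

    // Treated as a range, the string is walked code point by code point.
    assert(walkLength(s) == 5);

    // Viewing the same memory as bytes gives a range of code units,
    // with no copy and no decoding.
    bstring raw = s.representation;
    assert(walkLength(raw) == 6);
    assert(raw[2] == 0xC3 && raw[3] == 0xAF);
}
```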

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

April 07, 2010
On Tue, 06 Apr 2010 11:50:36 -0700, Ali Çehreli <acehreli@yahoo.com> wrote:
> 
> Justin Spahr-Summers wrote:
> 
>> 'string' to me represents a contiguous run of valid
>> *characters* (i.e., the data has meaning and representation in and of
>> itself)... not strictly enforced by D, of course, but that's how the
>> type is used.
> 
> If by "character" you mean "code unit", yes.
> 
> string characters are UTF-8 code units in D and have meanings by themselves only if they are one-byte UTF-8 sequences.
> 
> Ali

Sorry, yes. I'm not very familiar with Unicode terminology, but I do know that strings don't always contain valid Unicode sequences, and that's what I meant.
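
As a sketch of that last point: a `string` can be made to hold bytes that are not valid UTF-8, and `std.utf.validate` is one way to check. The `isValidUtf8` wrapper is a hypothetical helper written for this example, not something Phobos provides under that name.

```d
import std.utf : validate, UTFException;

// Hypothetical helper: true if s is well-formed UTF-8.
bool isValidUtf8(string s)
{
    try
    {
        validate(s);   // throws UTFException on malformed input
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // 0xC3 announces a two-byte sequence, but 0x28 is not a continuation byte.
    immutable(ubyte)[] raw = [0xC3, 0x28];

    // The cast compiles and performs no check, so the type alone
    // guarantees nothing about the contents.
    string bad = cast(string) raw;

    assert(!isValidUtf8(bad));
    assert(isValidUtf8("hello"));
}
```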