April 05, 2010
"bstring"
Lately I've been using the type "immutable(ubyte)[]" a lot to pass 
around binary data of various kinds. In a couple of places now, to save 
some typing, I'm using this alias:

	alias immutable(ubyte)[] bstring;

Would that make a worthy addition to the other standard string aliases 
defined in object.d? Or am I the only one who is using this type a lot?
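
For illustration, here is a minimal sketch of how such an alias might be used, assuming a current D compiler and Phobos. The `checksum` helper is hypothetical and exists only for this example; `representation` is the std.string function that views a string as its raw bytes.

```d
import std.string : representation;

// Modern alias syntax; equivalent to `alias immutable(ubyte)[] bstring;`.
alias bstring = immutable(ubyte)[];

// Hypothetical helper (not part of Phobos): adds up the bytes of a blob.
uint checksum(bstring data)
{
    uint sum = 0;
    foreach (b; data)
        sum += b;
    return sum;
}

void main()
{
    // An array literal of small integers converts to immutable(ubyte)[].
    bstring header = [0x89, 0x50, 0x4E, 0x47];
    assert(header.length == 4);
    assert(checksum(header) == 0x89 + 0x50 + 0x4E + 0x47);

    // Slicing shares the immutable data without copying it.
    bstring tail = header[1 .. $];
    assert(tail == "PNG".representation);
}
```

Because the elements are immutable, a bstring can be passed around and sliced freely without defensive copies, just like string.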

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/
April 06, 2010
Re: "bstring"
On Mon, 5 Apr 2010 18:48:58 -0400, Michel Fortin 
<michel.fortin@michelf.com> wrote:
> 
> Lately I've been using the type "immutable(ubyte)[]" a lot to pass 
> around binary data of various kinds. In a couple of places now, to save 
> some typing, I'm using this alias:
> 
> 	alias immutable(ubyte)[] bstring;
> 
> Would that make a worthy addition to the other standard string aliases 
> defined in object.d? Or am I the only one who is using this type a lot?

I use it quite a lot too, but I'm not sure if making it (effectively) a 
language keyword is the right approach. I mean, I use mutable byte 
strings probably just as often. The fact that ubytes are really just 
arbitrary data I think somewhat diminishes the usefulness of a keyword; 
to compare, 'string' to me represents a contiguous run of valid 
*characters* (i.e., the data has meaning and representation in and of 
itself)... not strictly enforced by D, of course, but that's how the 
type is used.

Apologies if this came out rather disjointed.
April 06, 2010
Re: "bstring"
Justin Spahr-Summers wrote:

> 'string' to me represents a contiguous run of valid
> *characters* (i.e., the data has meaning and representation in and of
> itself)... not strictly enforced by D, of course, but that's how the
> type is used.

If by "character" you mean "code unit", yes.

string characters are UTF-8 code units in D and have meanings by 
themselves only if they are one-byte UTF-8 sequences.
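
A small sketch of that distinction, assuming a current Phobos (`decode` comes from std.utf). The sample strings are arbitrary and chosen only for illustration:

```d
import std.utf : decode;

void main()
{
    // U+00E9 (e with acute accent) takes two UTF-8 code units.
    string s = "\u00E9";
    assert(s.length == 2);                 // .length counts code units
    assert(s[0] == 0xC3 && s[1] == 0xA9);  // neither unit is a character alone

    // Decoding consumes both code units and yields one code point.
    size_t index = 0;
    dchar c = decode(s, index);
    assert(c == '\u00E9');
    assert(index == 2);

    // An ASCII character is a one-byte sequence, so its single code unit
    // is meaningful by itself.
    string a = "e";
    assert(a.length == 1 && a[0] == 'e');
}
```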

Ali
April 06, 2010
Re: "bstring"
Hello Ali,

> Justin Spahr-Summers wrote:
> 
>> 'string' to me represents a contiguous run of valid
>> *characters* (i.e., the data has meaning and representation in and of
>> itself)... not strictly enforced by D, of course, but that's how the
>> type is used.
> If by "character" you mean "code unit", yes.
> 
> string characters are UTF-8 code units in D and have meanings by
> themselves only if they are one-byte UTF-8 sequences.

I think that's what the "not strictly enforced by D" part was about. True 
or not, people often assume that a string is valid UTF-8 of some kind.

-- 
... <IXOYE><
April 06, 2010
Re: "bstring"
On 2010-04-06 17:10:25 -0400, BCS <none@anon.com> said:

> Hello Ali,
> 
>> Justin Spahr-Summers wrote:
>> 
>>> 'string' to me represents a contiguous run of valid
>>> *characters* (i.e., the data has meaning and representation in and of
>>> itself)... not strictly enforced by D, of course, but that's how the
>>> type is used.
>> If by "character" you mean "code unit", yes.
>> 
>> string characters are UTF-8 code units in D and have meanings by
>> themselves only if they are one-byte UTF-8 sequences.
> 
> I think that's what the "not strictly enforced by D" part was about. 
> True or not, people often assume that a string is valid UTF-8 of some 
> kind.

It may not be strictly enforced, but std.range now iterates on code 
points instead of code units, making 'string' not very practical to use 
as a range when you need to iterate over UTF-8 code units (bytes), or 
with other text encodings. "bstring" is more appropriate for those 
cases.
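
As a rough sketch of the difference, assuming a current Phobos where the range primitives live in std.range.primitives and `representation` is available in std.string:

```d
import std.range.primitives : front, walkLength;
import std.string : representation;

void main()
{
    string s = "h\u00E9llo";   // 5 code points stored as 6 UTF-8 code units

    // Viewed as a range, a string yields decoded code points (dchar).
    static assert(is(typeof(s.front) == dchar));
    assert(s.length == 6);
    assert(s.walkLength == 5);

    // The same data viewed as immutable(ubyte)[] yields raw bytes.
    immutable(ubyte)[] bytes = s.representation;
    static assert(is(typeof(bytes.front) == immutable(ubyte)));
    assert(bytes.walkLength == 6);
    assert(bytes[1] == 0xC3 && bytes[2] == 0xA9);
}
```

The byte view is also the natural one for text in encodings other than UTF-8, where decoding as code points would be wrong.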

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/
April 07, 2010
Re: "bstring"
On Tue, 06 Apr 2010 11:50:36 -0700, Ali Çehreli <acehreli@yahoo.com> 
wrote:
> 
> Justin Spahr-Summers wrote:
> 
>  > 'string' to me represents a contiguous run of valid
>  > *characters* (i.e., the data has meaning and representation in and of
>  > itself)... not strictly enforced by D, of course, but that's how the
>  > type is used.
> 
> If by "character" you mean "code unit", yes.
> 
> string characters are UTF-8 code units in D and have meanings by 
> themselves only if they are one-byte UTF-8 sequences.
> 
> Ali

Sorry, yes. I'm not very familiar with Unicode terminology, but I do 
know that strings don't always contain valid Unicode sequences, and 
that's what I meant.