August 23, 2012
On 23/08/12 05:05, bearophile wrote:
> Sean Kelly:
>
>> I'm clearly missing something.  ASCII and UTF-8 are compatible.
>>  What's stopping you from just processing these as if they were UTF-8
>> strings?
>
> std.algorithm is not closed
> (http://en.wikipedia.org/wiki/Closure_%28mathematics%29 ) on UTF-8, its
> operations lead to UTF-32.

Which operations in std.algorithm ever map 0-0x7F into higher characters?
August 23, 2012
Don Clugston:

> Which operations in std.algorithm ever map 0-0x7F into higher characters?

The first example I've shown:

string s = "test string";
dchar[] s2 = map!(x => x)(s).array(); // Uses the Id function

Bye,
bearophile
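
The behaviour in the example above can be checked at compile time. The following is a minimal sketch, assuming a Phobos version that provides std.string.representation; the helper names viaRange and viaBytes are hypothetical and exist only for illustration. It shows that a string iterated as a range yields decoded dchar elements, while the same data viewed as raw code units keeps one element per byte.

```d
import std.algorithm : map;
import std.array : array;
import std.range : ElementType;
import std.string : representation;

// Hypothetical helper: identity map over the string as a range.
// Phobos decodes UTF-8 on the fly, so the elements are dchar.
auto viaRange(string s)
{
    return s.map!(x => x).array();
}

// Hypothetical helper: identity map over the raw UTF-8 code units.
// representation() exposes the same memory as an array of ubyte.
auto viaBytes(string s)
{
    return s.representation.map!(x => x).array();
}

void main()
{
    string s = "test string";

    // The range element type of a string is dchar, not char.
    static assert(is(ElementType!string == dchar));

    // So even the identity map produces UTF-32 data.
    static assert(is(typeof(viaRange(s)) == dchar[]));

    // The byte view keeps one element per code unit.
    assert(viaBytes(s).length == s.length);
}
```

For pure ASCII input both results have the same length; the difference lies in the element type, which is what the "not closed on UTF-8" remark refers to.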
August 23, 2012
On Aug 23, 2012, at 4:25 AM, bearophile <bearophileHUGS@lycos.com> wrote:

> Sean Kelly:
> 
>> Gotcha.  Despite it being something I'd use regularly, I wouldn't want this in Phobos because it seems like it could cause maintenance problems.  I'd rather explicitly cast to ubyte as a way to flag that I was doing something potentially unsafe.
> 
> What's unsafe in what I have presented? The constructor verifies every char to be in 7 bits, and then you use the new type safely. No casts, and no need to flag something as unsafe.
> 
> This use of types to denote capabilities is quite common in functional languages; see the articles I've recently linked here, such as:
> http://tomasp.net/blog/type-first-development.aspx

So it throws an exception if there are non-ASCII characters in the range?  Is this really better than just casting the input array to ubyte?
August 23, 2012
Sean Kelly:

> So it throws an exception if there are non-ASCII characters in the range?  Is this really better than just casting the input array to ubyte?

The cast to ubyte[] doesn't perform a run-time test of the
validity of the input, so yeah, the exception is better. Your
code is also able to catch and manage the exception (for example
by asking the user for another, valid input file).

If you carry around some type such as "Astring", later you don't
have to cast it back to char[] to print the data as a string (this
discussion is about data that is naturally text, not about
generic numerical octets).

An appropriate type statically encodes in your program that you
are using an ASCII string. This makes your code more readable.
A variable of the generic type ubyte[], by contrast, doesn't
tell you a lot about its contents.

Bye,
bearophile
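
The idea of a validated ASCII string type could look roughly like the sketch below. This is not the implementation proposed in the thread: the name AsciiString, its members, and the error message are all assumptions made for illustration. The constructor checks every code unit once and throws on failure, after which the value can be indexed and printed without casts in user code.

```d
import std.algorithm : all;
import std.exception : assertThrown, enforce;
import std.string : representation;

// Hypothetical type: name and API are assumptions, not part of Phobos.
struct AsciiString
{
    private string data;

    this(string s)
    {
        // Any UTF-8 code unit below 0x80 is a complete ASCII character,
        // so one pass over the bytes is enough to validate the input.
        enforce(s.representation.all!(c => c < 0x80),
                "input contains non-ASCII characters");
        data = s;
    }

    // Indexing needs no decoding: one code unit is one character.
    char opIndex(size_t i) const { return data[i]; }

    @property size_t length() const { return data.length; }

    // The data is already valid text, so no cast is needed to print it.
    string toString() const { return data; }
}

void main()
{
    auto a = AsciiString("test string");
    assert(a.length == 11);
    assert(a[0] == 't');
    assert(a.toString() == "test string");

    // Non-ASCII input is rejected with a catchable exception.
    assertThrown(AsciiString("na\u00EFve"));
}
```

Because the check lives in the constructor, calling code can catch the exception and recover, which a bare cast to ubyte[] does not allow.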