Thread overview
latin-1 encoding
Jan 11, 2007
Simen Haugen
Jan 12, 2007
Johan Granberg
Jan 12, 2007
Simen Haugen
Jan 12, 2007
Johan Granberg
Jan 12, 2007
Frits van Bommel
January 11, 2007
I'm just starting to look at D, but I can't seem to find any encodings for latin-1 in the standard library...


January 12, 2007
Simen Haugen wrote:

> I'm just starting to look at D, but I can't seem to find any encodings for latin-1 in the standard library...

What are you trying to do? It would be helpful to know if you want to read files in latin-1 or if you want your whole program to use it internally.
January 12, 2007
"Johan Granberg" wrote:
> What are you trying to do? It would be helpful to know if you want to read
> files in latin-1 or if you want your whole program to use it internally.

Reading and writing files.


January 12, 2007
Simen Haugen wrote:

> "Johan Granberg" wrote:
>> What are you trying to do? It would be helpful to know if you want to read
>> files in latin-1 or if you want your whole program to use it internally.
> 
> Reading and writing files.

There are no string manipulation functions in the standard library that will help you there, but you could read the files as usual and store them in ubyte[] instead of char[]. If you want to use string manipulation functions, the easiest approach would be to convert to UTF-8; there was some discussion of how to do that a couple of weeks ago.
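
For the ubyte[] route, a minimal sketch might look like the following. It assumes a recent D2 compiler with Phobos, which is newer than what this thread was written against. The helper names readLatin1Bytes and writeLatin1Bytes are made up for illustration; std.file.read and std.file.write are the only library calls involved.

```d
import std.file : read, write;

// Hypothetical helper: load a file as raw bytes, with no UTF validation.
ubyte[] readLatin1Bytes(string path)
{
    // std.file.read returns void[]; the cast only reinterprets the bytes.
    return cast(ubyte[]) read(path);
}

// Hypothetical helper: write Latin-1 bytes back out unchanged.
void writeLatin1Bytes(string path, const(ubyte)[] data)
{
    write(path, data);
}
```

Keeping the data in ubyte[] means nothing ever treats the bytes as UTF-8, so characters above 0x7F pass through untouched.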
January 12, 2007
Simen Haugen wrote:
> I'm just starting to look at D, but I can't seem to find any encodings for latin-1 in the standard library...
> 
> 

You can try the Mango project. It has a package called ICU that does conversions between various encodings and Unicode.
January 12, 2007
Simen Haugen wrote:
> "Johan Granberg" wrote:
>> What are you trying to do? It would be helpful to know if you want to read
>> files in latin-1 or if you want your whole program to use it internally.
> 
> Reading and writing files. 

Now I'm no expert in character encodings, but isn't Latin-1 just the first 256 codepoints (or whatever they're called) of Unicode, packed into a single byte per character?

If so, it should be pretty trivial to convert latin-1 characters to Unicode, either to wchar[]/dchar[] by direct one-to-one assignment (no multibyte sequences possible) or to char[] by using std.utf.encode, like this:

-----
// warning: incomplete, untested code
import std.utf;

ubyte[] data_lat1;

// ... fill data_lat1 array

char[] data_utf8;    // perhaps preallocate this to a reasonable length

foreach(c; data_lat1) {
    // each Latin-1 byte value is also its Unicode code point
    std.utf.encode(data_utf8, cast(dchar) c);
}
-----
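
The direct one-to-one assignment to dchar[] mentioned above could be sketched roughly like this, assuming a recent D2 compiler. The function name latin1ToUtf32 is hypothetical, not something from the standard library.

```d
// Hypothetical helper: widen each Latin-1 byte to the code point of the same value.
dchar[] latin1ToUtf32(const(ubyte)[] latin1)
{
    auto result = new dchar[latin1.length];   // exactly one dchar per input byte
    foreach (i, b; latin1)
        result[i] = cast(dchar) b;            // 0x00..0xFF map to U+0000..U+00FF
    return result;
}
```

Since no multibyte sequences are involved, the output length is known up front and the array can be sized exactly once.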


And UTF to Latin-1 should be pretty easy too:
-----
// again: incomplete, untested code
import std.utf;

char[] data_utf;    // wchar[] and dchar[] should work as well

ubyte[] data_lat1;  // again, preallocate a reasonable array if you want

size_t i = 0;
while(i < data_utf.length) {
    dchar c = std.utf.decode(data_utf, i);    // advances i
    assert(c < 0x100);      // make sure it fits
    data_lat1 ~= cast(ubyte) c;    // explicit narrowing, safe after the check above
}
-----
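
One caveat with the assert: it is compiled out when building with -release, so out-of-range characters would then be silently truncated. A variant that always checks might look like this, assuming a recent D2 compiler. The name utf8ToLatin1 is hypothetical; std.utf.decode and std.exception.enforce are the library calls used.

```d
import std.exception : enforce;
import std.utf : decode;

// Hypothetical helper: convert UTF-8 text to Latin-1 bytes,
// throwing if a character has no Latin-1 representation.
ubyte[] utf8ToLatin1(const(char)[] text)
{
    ubyte[] result;
    size_t i = 0;
    while (i < text.length)
    {
        dchar c = decode(text, i);   // advances i past the whole sequence
        enforce(c < 0x100, "character not representable in Latin-1");
        result ~= cast(ubyte) c;
    }
    return result;
}
```

Whether to throw, skip, or substitute something like '?' for unrepresentable characters depends on the application; throwing is just the simplest choice for a sketch.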

I should note that by 'preallocate' I mean '"new" an array and set the length to 0'.
Setting the length to 0 is important since otherwise your output will get appended to the end of a default-initialized array, which isn't what you want ;)
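
On the preallocation point: with a recent D2 compiler (an assumption, since the thread predates it), appending to an array whose length was shrunk to 0 may reallocate instead of reusing the old buffer. The built-in reserve is one way to get the same effect there. A rough sketch, with the hypothetical name latin1ToUtf8:

```d
import std.utf : encode;

// Hypothetical helper: convert Latin-1 bytes to UTF-8 with capacity reserved up front.
char[] latin1ToUtf8(const(ubyte)[] latin1)
{
    char[] result;                       // length stays 0, so appends start at the beginning
    result.reserve(latin1.length * 2);   // a Latin-1 character needs at most 2 UTF-8 bytes
    foreach (b; latin1)
        encode(result, cast(dchar) b);   // appends the UTF-8 encoding of b
    return result;
}
```

Reserving twice the input length is the worst case; pure ASCII input only uses half of that.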