latin-1 encoding

Jan 11, 2007

Simen Haugen

Jan 12, 2007

Johan Granberg

Jan 12, 2007

Simen Haugen

Jan 12, 2007

Johan Granberg

Jan 12, 2007

Frits van Bommel

Jan 12, 2007

Frank Benoit (keinfarbton)

Simen Haugen wrote:
> I'm just starting to look at D, but I can't seem to find any encodings for latin-1 in the standard library...

What are you trying to do? It would be helpful to know if you want to read files in latin-1 or if you want your whole program to use it internally.

"Johan Granberg" wrote:
> What are you trying to do? It would be helpful to know if you want to read
> files in latin-1 or if you want your whole program to use it internally.

Reading and writing files.

Simen Haugen wrote:
> "Johan Granberg" wrote:
>> What are you trying to do? It would be helpful to know if you want to read
>> files in latin-1 or if you want your whole program to use it internally.
>
> Reading and writing files.

There are no string manipulation functions in the standard library that will help you there, but you could read the files as usual and store them in ubyte[] instead of char[]. If you want to use string manipulation functions, the easiest approach would be to convert to UTF-8; there was some discussion of how to do that a couple of weeks ago.
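For illustration, reading and writing a file as raw bytes might look roughly like the sketch below. It assumes a current D2 compiler and Phobos' std.file; the helper names readLatin1Bytes and writeLatin1Bytes are made up for this example.

```d
import std.file; // provides read and write

/// Hypothetical helper: load a Latin-1 file as raw bytes, with no decoding.
ubyte[] readLatin1Bytes(string path)
{
    // std.file.read returns void[]; the cast only reinterprets the bytes.
    return cast(ubyte[]) read(path);
}

/// Hypothetical helper: write Latin-1 bytes back to disk unchanged.
void writeLatin1Bytes(string path, const(ubyte)[] data)
{
    write(path, data);
}
```

Keeping the data in ubyte[] means nothing will try to interpret it as UTF-8 until you convert it explicitly.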

Simen Haugen wrote:
> I'm just starting to look at D, but I can't seem to find any encodings for latin-1 in the standard library...

You can try the Mango project. It has a package called ICU that does conversions between various encodings and Unicode.

January 12, 2007

Re: latin-1 encoding

Posted by Frits van Bommel
in reply to Simen Haugen

Simen Haugen wrote:
> "Johan Granberg" wrote:
>> What are you trying to do? It would be helpful to know if you want to read
>> files in latin-1 or if you want your whole program to use it internally.
> 
> Reading and writing files. 

Now I'm no expert in character encodings, but isn't Latin-1 just the first 256 codepoints (or whatever they're called) of Unicode, packed into a single byte per character?

If so, it should be pretty trivial to convert latin-1 characters to Unicode, either to wchar[]/dchar[] by direct one-to-one assignment (no multibyte sequences possible) or to char[] by using std.utf.encode, like this:

-----
// warning: incomplete, untested code
import std.utf;

ubyte[] data_lat1;

// ... fill data_lat1 array

char[] data_utf8;    // perhaps preallocate this to a reasonable length

foreach(c; data_lat1) {
    // each Latin-1 byte is its own code point; encode appends it as UTF-8
    std.utf.encode(data_utf8, cast(dchar) c);
}
-----
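The direct one-to-one assignment to dchar[] mentioned above could be sketched like this. It assumes a current D2 compiler, and the function name latin1ToUtf32 is invented for the example.

```d
/// Hypothetical helper: widen Latin-1 bytes to UTF-32, one element per byte.
dchar[] latin1ToUtf32(const(ubyte)[] latin1)
{
    // The output length is known up front: one code point per input byte.
    auto result = new dchar[latin1.length];
    foreach (i, b; latin1)
    {
        // A Latin-1 byte value is numerically equal to its Unicode code point.
        result[i] = cast(dchar) b;
    }
    return result;
}
```

For the bytes 0x63 0x61 0x66 0xE9 this should yield the four code points of "café". The same loop with wchar[] as the target type should work too, since every Latin-1 value fits in a single UTF-16 unit.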

And UTF to Latin-1 should be pretty easy too:
-----
// again: incomplete, untested code
import std.utf;

char[] data_utf;    // wchar[] and dchar[] should work as well

ubyte[] data_lat1;  // again, preallocate a reasonable array if you want

size_t i = 0;
while(i < data_utf.length) {
    dchar c = std.utf.decode(data_utf, i);    // advances i
    assert(c < 0x100);      // make sure it fits
    data_lat1 ~= cast(ubyte) c;    // explicit narrowing, safe after the check above
}
-----
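Since an assert disappears in release builds, a variant that throws on characters outside Latin-1 may be preferable for real input. The following is only a sketch under the assumption of a current D2 compiler; utf8ToLatin1 is a made-up name, and it relies on foreach with a dchar loop variable decoding the UTF-8 for you.

```d
import std.format : format;

/// Hypothetical helper: convert UTF-8 text to Latin-1 bytes,
/// throwing if a character has no Latin-1 equivalent.
ubyte[] utf8ToLatin1(const(char)[] text)
{
    ubyte[] result;
    // Latin-1 output is never longer than the UTF-8 input.
    result.reserve(text.length);
    foreach (dchar c; text) // decodes UTF-8 into whole code points
    {
        if (c > 0xFF)
            throw new Exception(
                format("U+%04X has no Latin-1 equivalent", cast(uint) c));
        result ~= cast(ubyte) c;
    }
    return result;
}
```

Whether to throw, skip, or substitute a placeholder such as '?' for unrepresentable characters is a design choice that depends on what the files are used for.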

I should note that by 'preallocate' I mean '"new" an array and set the length to 0'.
Setting the length to 0 is important since otherwise your output will get appended to the end of a default-initialized array, which isn't what you want ;)
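One caveat, assuming a current D2 compiler rather than the one this post was written against: there, shrinking an array's length to 0 no longer lets later appends reuse the old block unless assumeSafeAppend is called, so reserve is the more direct way to preallocate. A minimal sketch, with preallocated being a hypothetical helper name:

```d
/// Hypothetical helper: an empty ubyte[] with room for expectedLength appends.
ubyte[] preallocated(size_t expectedLength)
{
    ubyte[] buffer;
    // reserve sets aside capacity but leaves the length at 0,
    // so appended data starts at index 0.
    buffer.reserve(expectedLength);
    return buffer;
}
```

Appending to the returned array with ~= should then not need to reallocate until the reserved capacity is used up.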
