November 23, 2004
8-bit character encodings
I've written some test code for encodings...


The functions take a mapping (wchar[256]) indexed by ubyte,
which defines the 8-bit charset / encoding.

With that, they can convert to and from Unicode
(such as the default char[] strings in D).


The unoptimized D code looks like this:

> import std.utf;
>
> /// converts an 8-bit charset encoding string into unicode
> char[] decode_string(ubyte[] string, wchar[256] mapping)
> {
> 	wchar[] result;
> 	foreach (ubyte c; string)
> 	{
> 		if (mapping[c] != 0xFFFF)
> 		  result ~= mapping[c];
> 	}
> 	return std.utf.toUTF8(result);
> }

> /// converts a unicode string into 8-bit charset encoding
> ubyte[] encode_string(char[] string, wchar[256] mapping)
> {
> 	ubyte[] result;
> 	foreach (wchar c; string)
> 	{
> 		foreach (int i, wchar m; mapping)
> 		{
> 		    if (c == m)
> 		    {
> 		        result ~= cast(ubyte) i;
> 		        break; // first match wins, so a duplicated table entry cannot emit two bytes
> 		    }
> 		}
> 	}
> 	return result;
> }

I added four mappings, just to have something to
test with: iso88591, cp1252, cp437, macroman
(each lookup table is 512 bytes, so that's 2K)
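
As a rough illustration of how such a table might be built and used, here is a minimal sketch in present-day D. The names makeIso88591, decodeString and encodeString are made up for this example and are not the names used in the real code. Only ISO-8859-1 is shown, since its table is simply the identity mapping; the other three tables would be filled in from the unicode.org mapping files.

```d
import std.utf : toUTF8;

/// Builds the ISO-8859-1 table: every byte maps to the code point of the same value.
/// (Hypothetical helper, not part of the original code.)
wchar[256] makeIso88591()
{
    wchar[256] table;
    foreach (i, ref m; table)
        m = cast(wchar) i;
    return table;
}

/// Decodes 8-bit text via the table, skipping entries marked 0xFFFF (unmapped).
string decodeString(const(ubyte)[] input, ref const wchar[256] mapping)
{
    wchar[] result;
    foreach (b; input)
        if (mapping[b] != 0xFFFF)
            result ~= mapping[b];
    return toUTF8(result);
}

/// Encodes UTF-8 text by a linear search of the table; unmapped characters are dropped.
ubyte[] encodeString(const(char)[] input, ref const wchar[256] mapping)
{
    ubyte[] result;
    foreach (wchar c; input)   // foreach decodes the UTF-8 input into UTF-16 code units
    {
        foreach (i, m; mapping)
        {
            if (c == m)
            {
                result ~= cast(ubyte) i;
                break;
            }
        }
    }
    return result;
}

void main()
{
    auto latin1 = makeIso88591();
    ubyte[] bytes = [0x42, 0x6A, 0xF6, 0x72, 0x6B]; // "Bj\u00F6rk" encoded as Latin-1
    auto text = decodeString(bytes, latin1);
    assert(text == "Bj\u00F6rk");
    assert(encodeString(text, latin1) == bytes);     // round trip gives the same bytes
}
```

The table is passed by ref here only to avoid copying 512 bytes on every call.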

The ubyte[] can then be used as a C string (char *) by
nul-terminating it as usual, e.g. for printf("%s").
It works just fine for both input and output, e.g. as Latin-1.
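
A small sketch of the nul-termination step, written in present-day D. The helper name toCString is an assumption for this example only, not something from the real code.

```d
import core.stdc.stdio : printf;
import core.stdc.string : strlen;

/// Copies the bytes and leaves a terminating 0 at the end so C functions can read them.
/// (Hypothetical helper.)
const(char)* toCString(const(ubyte)[] bytes)
{
    auto z = new ubyte[bytes.length + 1];   // new ubyte arrays are zero-filled
    z[0 .. bytes.length] = bytes[];
    return cast(const(char)*) z.ptr;
}

void main()
{
    ubyte[] latin1 = [0x42, 0x6A, 0xF6, 0x72, 0x6B]; // "Bj\u00F6rk" encoded as Latin-1
    auto cstr = toCString(latin1);
    assert(strlen(cstr) == latin1.length);
    printf("%s\n", cstr);   // the terminal has to expect Latin-1 for byte 0xF6 to display
}
```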

It should probably throw an exception or something
like that when it encounters unmapped characters ?
(For instance, Windows CP-1252 has 5 byte values with no Unicode mapping.)
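
One way the exception idea could look, sketched in present-day D. EncodingException and decodeStrict are hypothetical names, and the table in main only maps ASCII to keep the demo short; this is an illustration of the option, not what the real code does.

```d
import std.conv : to;
import std.utf : toUTF8;

/// Hypothetical exception type for input that has no mapping.
class EncodingException : Exception
{
    this(string msg, string file = __FILE__, size_t line = __LINE__)
    {
        super(msg, file, line);
    }
}

/// Strict decode: throws on an unmapped byte instead of silently skipping it.
string decodeStrict(const(ubyte)[] input, ref const wchar[256] mapping)
{
    wchar[] result;
    foreach (pos, b; input)
    {
        if (mapping[b] == 0xFFFF)
            throw new EncodingException(
                "unmapped byte " ~ to!string(b) ~ " at offset " ~ to!string(pos));
        result ~= mapping[b];
    }
    return toUTF8(result);
}

void main()
{
    wchar[256] table;                 // wchar.init is 0xFFFF, so every entry starts unmapped
    foreach (i; 0 .. 0x80)
        table[i] = cast(wchar) i;     // map only the ASCII range for this demo

    ubyte[] good = [0x68, 0x69];
    assert(decodeStrict(good, table) == "hi");

    ubyte[] bad = [0x68, 0x81];       // 0x81 is left unmapped in this table
    bool thrown = false;
    try
    {
        decodeStrict(bad, table);
    }
    catch (EncodingException e)
    {
        thrown = true;
    }
    assert(thrown);
}
```

An alternative would be to substitute a replacement character such as '?' and carry on, which may suit terminal output better than an exception.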


Surely someone must have written this before ?
Just that I couldn't find it in the libraries...

--anders


PS. The real code builds reverse lookup tables too.
    (one with chars < 0x0100, and one with the rest)
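
A guess at how the two reverse tables could be laid out, as a sketch in present-day D: a flat array for code points below 0x0100 and an associative array for everything else. The struct ReverseMapping and its members are assumptions for illustration, not the layout of the real code.

```d
/// Reverse tables built from a forward table (layout is an assumption).
struct ReverseMapping
{
    ubyte[256] low;        // byte for each code point below 0x0100
    bool[256] lowMapped;   // whether that code point is mapped at all
    ubyte[wchar] high;     // all remaining code points

    this(ref const wchar[256] forward)
    {
        foreach (i, m; forward)
        {
            if (m == 0xFFFF)
                continue;              // unmapped byte, nothing to reverse
            if (m < 0x0100)
            {
                low[m] = cast(ubyte) i;
                lowMapped[m] = true;
            }
            else
            {
                high[m] = cast(ubyte) i;
            }
        }
    }

    /// Returns true and sets `b` if `c` has a byte in this charset.
    bool lookup(wchar c, out ubyte b) const
    {
        if (c < 0x0100)
        {
            b = low[c];
            return lowMapped[c];
        }
        if (auto p = c in high)
        {
            b = *p;
            return true;
        }
        return false;
    }
}

void main()
{
    wchar[256] table;                  // entries default to 0xFFFF (unmapped)
    foreach (i; 0 .. 0x80)
        table[i] = cast(wchar) i;      // ASCII
    table[0x80] = '\u20AC';            // CP-1252 puts the euro sign at 0x80

    auto rev = ReverseMapping(table);
    ubyte b;
    assert(rev.lookup('A', b) && b == 0x41);
    assert(rev.lookup('\u20AC', b) && b == 0x80);
    assert(!rev.lookup('\u00E9', b));  // not mapped in this partial table
}
```

This replaces the 256-entry linear search per character with one array index or one hash lookup.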

PPS. wchar[] versions left as exercise for the reader.
     They would avoid all the UTF-8 conversions above.
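
For what it's worth, the wchar[] variants might look roughly like this in present-day D. The names decodeToWide and encodeFromWide are invented for this sketch; it uses the same linear search as the code above and simply leaves out the UTF-8 step.

```d
/// Decodes straight to UTF-16 code units, with no UTF-8 conversion.
wstring decodeToWide(const(ubyte)[] input, ref const wchar[256] mapping)
{
    wchar[] result;
    result.reserve(input.length);      // at most one code unit per input byte
    foreach (b; input)
        if (mapping[b] != 0xFFFF)
            result ~= mapping[b];
    return result.idup;
}

/// Encodes UTF-16 code units directly; characters without a mapping are dropped.
ubyte[] encodeFromWide(const(wchar)[] input, ref const wchar[256] mapping)
{
    ubyte[] result;
    foreach (c; input)
    {
        foreach (i, m; mapping)
        {
            if (c == m)
            {
                result ~= cast(ubyte) i;
                break;
            }
        }
    }
    return result;
}

void main()
{
    wchar[256] latin1;
    foreach (i, ref m; latin1)
        m = cast(wchar) i;             // ISO-8859-1 is the identity mapping

    ubyte[] bytes = [0xE5, 0x6E];      // "\u00E5n" encoded as Latin-1
    wstring wide = decodeToWide(bytes, latin1);
    assert(wide == "\u00E5n"w);
    assert(encodeFromWide(wide, latin1) == bytes);
}
```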
November 23, 2004
Re: 8-bit character encodings
There's a Microsoft API function to do this, I think it's
WideCharToMultiByte() and MultiByteToWideChar().
November 23, 2004
Re: 8-bit character encodings
Walter wrote:
> There's a Microsoft API function to do this, I think it's
> WideCharToMultiByte() and MultiByteToWideChar().

But that's only on Windows, right ?

(got the lookups from unicode.org)

--anders
November 24, 2004
Re: 8-bit character encodings
"Anders F Björklund" <afb@algonet.se> wrote in message
news:co0f4b$28ip$1@digitaldaemon.com...
> Walter wrote:
> > There's a Microsoft API function to do this, I think it's
> > WideCharToMultiByte() and MultiByteToWideChar().
>
> But that's only on Windows, right ?
>
> (got the lookups from unicode.org)

Right. I don't know what the corresponding linux API is.
November 24, 2004
Mango news (was Re: 8-bit character encodings)
| "Anders F Björklund" <afb@algonet.se> wrote in message
| > Walter wrote:
| > > There's a Microsoft API function to do this, I think it's
| > > WideCharToMultiByte() and MultiByteToWideChar().
| >
| > But that's only on Windows, right ?
| >
| > (got the lookups from unicode.org)
|
| Right. I don't know what the corresponding linux API is.


Mango.io has optional bindings to any/all of the extensive ICU converters.
Stdio is covered there also, so it should probably handle the above case
without issue. I'd like to encourage folks to consider Mango.io and
Mango.icu as part of any Unicode oriented project. Naturally, I'm somewhat
biased :-)

For those not familiar with Mango, it comprises a set of related packages
(the Mango Tree) including:

- Cohesive, type-safe, and highly extensible IO package. Now with ICU hooks.
Supports all the D types along with all their array variants, and makes it
trivial to bind your own classes directly to the IO layer. Provides both the
put/get & <</>> syntactical flavors.

- Configurable runtime logging, a la Log4J, with a bonus HTML-based manager
to dynamically adjust the settings of a running executable. Also hooks into
Chainsaw for remote monitoring.

- Servlet engine. Supports the best parts of what the Java servlet spec
provides, and has better IO.

- A customizable and extensible HTTP server (used by the servlet engine).
Perhaps the fastest HTTP server available, since it can happily process
requests without making a single memory allocation. Just goes to show what
thread-locals and D array-slicing can do for performance! Also has a
separate HttpClient.

- High performance clustering. Based loosely around a Linda design, with
aspects of pub/sub and queuing mixed in. Uses D class-serialization to send
objects around a cluster, and is easy to use.

- Wrappers around the extensive ICU (unicode) project. This currently covers
around 85% of the ICU functionality, and includes a very usable
unicode-enabled UString class.


These packages are available as separate libraries. That is, Mango.icu and
Mango.log can be used in complete isolation. Mango.io can also be used
standalone. Mango.cluster, Mango.http, Mango.servlet, and Mango.cache
leverage the IO package to one degree or another.

Beta 9.6 will be released before the week is out, and v1.0 of some packages
will occur shortly thereafter. You can find out more about Mango over here:
http://www.dsource.org/forums/
November 24, 2004
Re: Mango news (was Re: 8-bit character encodings)
Kris wrote:

> Mango.io has optional bindings to any/all of the extensive ICU converters.
> Stdio is covered there also, so it should probably handle the above case
> without issue. I'd like to encourage folks to consider Mango.io and
> Mango.icu as part of any Unicode oriented project. Naturally, I'm somewhat
> biased :-)

OK, will check it out. Only difference being: 12 MB versus 32 KB :-)

But there's probably other neat stuff in there, and I had ICU already.

> These packages are available as separate libraries. That is, Mango.icu and
> Mango.log can be used in complete isolation. Mango.io can also be used
> standalone. Mango.cluster, Mango.http, Mango.servlet, and Mango.cache
> leverage the IO package to one degree or another.

Looks extensive! Wonder if it compiles on Darwin ? Hmm, no makefile...

--anders
November 24, 2004
Re: Mango news (was Re: 8-bit character encodings)
"Anders F Björklund" <afb@algonet.se> wrote in message news:co1du1
|
| Looks extensive! Wonder if it compiles on Darwin ? Hmm, no makefile...
|
| --anders

I'm not sure that anyone has tried it on Darwin as yet. Perhaps the linux
makefile will work? This one is compatible with the Beta 9.5 download
(accessible via the dsource download section), and I'll update it tomorrow
with the Beta 9.6 equivalent (to match the current checkins)

http://svn.dsource.org/svn/projects/mango/trunk/

Given that the ICU stuff is so recent, it has not been linked to the *nix
libs. The effort to get there is a known (and limited) quantity, but hasn't
happened yet. Everything else compiles and links just fine on linux, and the
vast majority of it runs without issue (there is one known problem regarding
Mango.cluster on that platform).

If you'd perhaps be willing to lend a hand regarding Darwin (or with the ICU
bindings, or whatever else), that would be great! :-)
November 24, 2004
Re: Mango news (was Re: 8-bit character encodings)
That's good work!

"Kris" <fu@bar.com> wrote in message news:co0pv7$2oh2$1@digitaldaemon.com...
> Mango.io has optional bindings to any/all of the extensive ICU converters.
> Stdio is covered there also, so it should probably handle the above case
> without issue. I'd like to encourage folks to consider Mango.io and
> Mango.icu as part of any Unicode oriented project. Naturally, I'm somewhat
> biased :-)
>
> For those not familiar with Mango, it comprises a set of related packages
> (the Mango Tree) including:
>
> - Cohesive, type-safe, and highly extensible IO package. Now with ICU hooks.
> Supports all the D types along with all their array variants, and makes it
> trivial to bind your own classes directly to the IO layer. Provides both the
> put/get & <</>> syntactical flavors.
>
> - Configurable runtime logging, a la Log4J, with a bonus HTML-based manager
> to dynamically adjust the settings of a running executable. Also hooks into
> Chainsaw for remote monitoring.
>
> - Servlet engine. Supports the best parts of what the Java servlet spec
> provides, and has better IO.
>
> - A customizable and extensible HTTP server (used by the servlet engine).
> Perhaps the fastest HTTP server available, since it can happily process
> requests without making a single memory allocation. Just goes to show what
> thread-locals and D array-slicing can do for performance! Also has a
> separate HttpClient.
>
> - High performance clustering. Based loosely around a Linda design, with
> aspects of pub/sub and queuing mixed in. Uses D class-serialization to send
> objects around a cluster, and is easy to use.
>
> - Wrappers around the extensive ICU (unicode) project. This currently covers
> around 85% of the ICU functionality, and includes a very usable
> unicode-enabled UString class.
>
>
> These packages are available as separate libraries. That is, Mango.icu and
> Mango.log can be used in complete isolation. Mango.io can also be used
> standalone. Mango.cluster, Mango.http, Mango.servlet, and Mango.cache
> leverage the IO package to one degree or another.
>
> Beta 9.6 will be released before the week is out, and v1.0 of some packages
> will occur shortly thereafter. You can find out more about Mango over here:
> http://www.dsource.org/forums/
November 24, 2004
Re: Mango news (was Re: 8-bit character encodings)
Kris wrote:

> I'm not sure that anyone has tried it on Darwin as yet. Perhaps the linux
> makefile will work? This one is compatible with the Beta 9.5 download
> (accessible via the dsource download section), and I'll update it tomorrow
> with the Beta 9.6 equivalent (to match the current checkins)

I copied the linux makefile to darwin.make, and tried it.
It threw some errors, and then gdc hung on FileConduit.d
(I think it was). I will post the actual errors on the Mango forum.

> Given that the ICU stuff is so recent, it has not been linked to the *nix
> libs. The effort to get there is a known (and limited) quantity, but hasn't
> happened yet. Everything else compiles and links just fine on linux, and the
> vast majority of it runs without issue (there is one known problem regarding
> Mango.cluster on that platform).

Looks like most of it is POSIX-ish, should be compilable ?

--anders
November 24, 2004
Re: Mango news (was Re: 8-bit character encodings)
Kris wrote:

> I'm not sure that anyone has tried it on Darwin as yet. Perhaps the linux
> makefile will work? This one is compatible with the Beta 9.5 download
> (accessible via the dsource download section), and I'll update it tomorrow
> with the Beta 9.6 equivalent (to match the current checkins)

Also, the Makefile seems a little broken since it recompiles everything?
It should reference the object files, and not the source code directly.

Something like:

> %.o : %.d
> 	$(DMD) -c $(DFLAGS) -of$@ $<
> 
> libmango.a : $(OBJECTS)
> 	$(AR) -r $@ $(OBJECTS)

Perhaps adapted to use the $(OBJ) dir?

"all", "clean" and "install" targets seem to be missing, by the way.
They are phony targets that just reference the other targets or run shell commands.

One could also add a "check" target, that would run the unit-tests...

--anders