Thread overview
Internationalization vs. Unicode
Apr 26, 2013
Tyro[17]
Apr 26, 2013
H. S. Teoh
Apr 27, 2013
Jacob Carlborg
Apr 29, 2013
Tyro[17]
Apr 29, 2013
Jesse Phillips
April 26, 2013
There are myriad encoding schemes. D natively supports Unicode and provides functionality via Phobos. A byproduct of this is that since ASCII is a subset of Unicode, it also natively supports ASCII. This is a plus for the language but what of the other encoding schemes? What library functionality is provided to manipulate or convert between those encoding schemes and Unicode?

I have a need to convert from CJK encodings (presently EUC-JP and Shift-JIS) to Unicode. How do I accomplish this using D/Phobos? Is there a standalone library that does this? If so, can someone point me to it? If not, is there planned functionality for inclusion in Phobos, or am I doomed to resort to Java or some other language to accomplish this task (at least until I'm educated enough to do it myself)?

Thanks,
Andrew
April 26, 2013
On Fri, Apr 26, 2013 at 06:09:48PM -0400, Tyro[17] wrote:
> There are myriad encoding schemes. D natively supports Unicode and provides functionality via Phobos. A byproduct of this is that since ASCII is a subset of Unicode, it also natively supports ASCII. This is a plus for the language but what of the other encoding schemes? What library functionality is provided to manipulate or convert between those encoding schemes and Unicode?
> 
> I have a need to convert from CJK encodings (presently EUC-JP and Shift-JIS) to Unicode. How do I accomplish this using D/Phobos? Is there a standalone library that does this? If so, can someone point me to it? If not, is there planned functionality for inclusion in Phobos, or am I doomed to resort to Java or some other language to accomplish this task (at least until I'm educated enough to do it myself)?
[...]

If you're using a POSIX system, you could look into the 'recode' utility to convert from those legacy encodings to Unicode before running your program on them. You may be able to figure out how to do the conversion yourself by looking at recode's source code. But AFAIK there is no way to do it in D currently.

Maybe someone should invent std.recode and submit it for inclusion into Phobos. ;-)
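
One possible workaround in the meantime: D can call C functions directly, so on a POSIX system a program could bind to the C library's iconv and do the conversion in-process. What follows is only a rough sketch under assumptions, not an existing Phobos facility: the extern(C) declarations are hand-written bindings, toUtf8 is a hypothetical helper name, the charset names accepted depend on the local iconv implementation, and some platforms need an extra link flag such as -L-liconv.

```d
import std.exception : enforce;
import std.string : toStringz;

// Hand-written bindings to POSIX iconv (assumption: the C library
// provides it; on some systems link with -L-liconv).
extern (C) nothrow @nogc
{
    alias iconv_t = void*;
    iconv_t iconv_open(const(char)* tocode, const(char)* fromcode);
    size_t iconv(iconv_t cd, char** inbuf, size_t* inbytesleft,
                 char** outbuf, size_t* outbytesleft);
    int iconv_close(iconv_t cd);
}

/// Hypothetical helper: decode `input` from `fromCharset` into UTF-8.
string toUtf8(const(ubyte)[] input, string fromCharset)
{
    auto cd = iconv_open("UTF-8", fromCharset.toStringz);
    enforce(cd != cast(iconv_t) -1, "unsupported charset: " ~ fromCharset);
    scope (exit) iconv_close(cd);

    // Generous output buffer: 4 bytes of UTF-8 per input byte is more
    // than EUC-JP or Shift-JIS text can expand to.
    auto output = new char[](input.length * 4 + 4);

    // iconv advances these pointers and decrements the counters.
    char* inPtr = cast(char*) input.ptr;
    size_t inLeft = input.length;
    char* outPtr = output.ptr;
    size_t outLeft = output.length;

    auto rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
    enforce(rc != cast(size_t) -1, "invalid or incomplete input sequence");

    // Keep only the bytes iconv actually wrote.
    return output[0 .. $ - outLeft].idup;
}

void main()
{
    import std.stdio : writeln;

    // The bytes of "日本語" in EUC-JP.
    immutable ubyte[] eucJp = [0xC6, 0xFC, 0xCB, 0xDC, 0xB8, 0xEC];
    writeln(toUtf8(eucJp, "EUC-JP"));
}
```

The helper converts a whole buffer in one call, which keeps the sketch short; a real tool would probably feed iconv in chunks and handle partial multibyte sequences at chunk boundaries.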


T

-- 
People tell me that I'm paranoid, but they're just out to get me.
April 27, 2013
On 2013-04-27 00:09, Tyro[17] wrote:
> There are myriad encoding schemes. D natively supports Unicode and
> provides functionality via Phobos. A byproduct of this is that since
> ASCII is a subset of Unicode, it also natively supports ASCII. This is a
> plus for the language but what of the other encoding schemes? What
> library functionality is provided to manipulate or convert between those
> encoding schemes and Unicode?
>
> I have a need to convert from CJK encodings (presently EUC-JP and
> Shift-JIS) to Unicode. How do I accomplish this using D/Phobos? Is there
> a standalone library that does this? If so, can someone point me to it?
> If not, is there planned functionality for inclusion in Phobos, or am I
> doomed to resort to Java or some other language to accomplish this
> task (at least until I'm educated enough to do it myself)?

Would ICU do the work? If that's the case you can take a look at this:

https://github.com/d-widget-toolkit/com.ibm.icu

It will most likely not compile with the latest version of DMD. Also, I don't know how complete it is.

-- 
/Jacob Carlborg
April 29, 2013
On 4/27/13 6:37 AM, Jacob Carlborg wrote:
> On 2013-04-27 00:09, Tyro[17] wrote:
>> There are myriad encoding schemes. D natively supports Unicode and
>> provides functionality via Phobos. A byproduct of this is that since
>> ASCII is a subset of Unicode, it also natively supports ASCII. This is a
>> plus for the language but what of the other encoding schemes? What
>> library functionality is provided to manipulate or convert between those
>> encoding schemes and Unicode?
>>
>> I have a need to convert from CJK encodings (presently EUC-JP and
>> Shift-JIS) to Unicode. How do I accomplish this using D/Phobos? Is there
>> a standalone library that does this? If so, can someone point me to it?
>> If not, is there planned functionality for inclusion in Phobos, or am I
>> doomed to resort to Java or some other language to accomplish this
>> task (at least until I'm educated enough to do it myself)?
>
> Would ICU do the work? If that's the case you can take a look at this:
>
> https://github.com/d-widget-toolkit/com.ibm.icu
>
> It will most likely not compile with the latest version of DMD. Also, I
> don't know how complete it is.
>

This might work. Not sure yet. The first thing that caught my eye is

	import java.lang.all;
	import java.math.BigInteger;
	import java.text.CharacterIterator;
	import java.text.ParsePosition;
	import java.util.Comparator;
	import java.util.Date;

and I was immediately confused. What? We can directly import and use Java in D? Let me try this... Oh! No! Not really! We can't. Well, since D uses the file system to organize its modules, I should be able to find a java folder with these class signatures, or their D equivalents, somewhere in the project folder. No... I don't see one anywhere. Looks like I will have to put ICU on my list of things to get educated about. For now I will continue to use the Java implementation I've got. Thanks.
April 29, 2013
On Monday, 29 April 2013 at 18:36:32 UTC, Tyro[17] wrote:

> This might work. Not sure yet. The first thing that caught my eye is

You'll find the ported Java source:
https://github.com/d-widget-toolkit/base/tree/master/src