Thread overview
ANSI to UTF8 problem
Aug 17, 2010
jicman
Aug 17, 2010
Nick Sabalausky
Aug 17, 2010
jicman
August 17, 2010
Greetings.

I have this program,

import std.stdio;
import juno.base.text;
import std.file;
import std.windows.charset;
import std.utf;

int main(char[][] args)
{
  char[] ansi = r"c:\ansi.txt";
  char[] utf8 = r"c:\utf8.txt";
  try
  {
    char[] t = cast(char[]) read(ansi);
    write(utf8, std.windows.charset.fromMBSz(t.ptr,0));
    writefln(" converted to UTF8.");
  }
  catch (UtfException e)
  {
    writefln(" is not ANSI");
    return 1;
  }
  return(0);
}

the ansi.txt file contains,

josé
áéíóúñÑ

the utf8.txt file when opened with Wordpad looks like this:

josé
áéíóúñÑ

The file did change from ANSI to UTF8; however, it displays incorrectly in Wordpad.  The problem is that one application that I am trying to load these UTF8 files into shows the same display problem as Wordpad.

Any help would be greatly appreciated.

thanks,

josé
August 17, 2010
"jicman" <cabrera_@_wrc.xerox.com> wrote in message news:i4cn8h$2vtn$1@digitalmars.com...
> [...]
> the utf8.txt file when opened with Wordpad looks like this:
>
> josé
> áéíóúñÑ
> [...]

The utf8.txt file is probably missing the UTF-8 BOM (I'm not familiar with fromMBSz: I *assume* it doesn't add the BOM, but maybe I'm wrong?). Without that BOM, Wordpad is probably assuming it's "ASCII with some codepage" instead of UTF8.
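
To make the symptom concrete, here is a minimal sketch. It assumes a current D2 compiler rather than the D1-era setup of the program above, and hexBytes is a name made up for this illustration. It prints the UTF-8 bytes of "é", which are C3 A9. A viewer that assumes the Windows-1252 codepage shows those two bytes as "Ã" and "©", which matches the "josé" seen in Wordpad.

```d
import std.format : format;
import std.stdio : writeln;
import std.string : representation;

/// Hypothetical helper: the UTF-8 code units of s as space-separated hex.
string hexBytes(string s)
{
    // representation exposes the string as raw bytes (immutable(ubyte)[]).
    return format("%(%02X %)", s.representation);
}

void main()
{
    // "\u00E9" is é; the escape avoids depending on the source file's encoding.
    writeln(hexBytes("\u00E9")); // prints: C3 A9
}
```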

Open utf8.txt in a hex editor (I like XVI32). If it doesn't start with EF BB BF then that's probably the problem, and you'll need to change:

write(utf8, std.windows.charset.fromMBSz(t.ptr,0));

to:

write(utf8, x"EF BB BF" ~ std.windows.charset.fromMBSz(t.ptr,0));
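
If the converter might also be run over files that already start with a BOM, a slightly more defensive version could check for it first. The following is only a sketch, assuming D2 and Phobos' std.file; utf8Bom, hasUtf8Bom, and withUtf8Bom are hypothetical helper names, and the file path comes from the command line as a placeholder.

```d
import std.file : read, write;

// The three bytes of the UTF-8 byte order mark.
immutable ubyte[3] utf8Bom = [0xEF, 0xBB, 0xBF];

/// True if data already starts with EF BB BF.
bool hasUtf8Bom(const(ubyte)[] data)
{
    return data.length >= 3 && data[0 .. 3] == utf8Bom[];
}

/// Returns a copy of data with the BOM in front, unless it is already there.
ubyte[] withUtf8Bom(const(ubyte)[] data)
{
    if (hasUtf8Bom(data))
        return data.dup;

    auto result = new ubyte[](utf8Bom.length + data.length);
    result[0 .. utf8Bom.length] = utf8Bom[];
    result[utf8Bom.length .. $] = data[];
    return result;
}

void main(string[] args)
{
    // Expects the path of an already UTF-8 encoded file as the first argument.
    if (args.length < 2)
        return;

    auto bytes = cast(const(ubyte)[]) read(args[1]);
    write(args[1], withUtf8Bom(bytes)); // rewrites the file in place
}
```

Running it twice on the same file leaves the file unchanged the second time, since the BOM is only added when it is missing.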


August 17, 2010
Nick Sabalausky Wrote:

> [...]
> write(utf8, x"EF BB BF" ~ std.windows.charset.fromMBSz(t.ptr,0));
>
DOH!  Yep!  Thanks, Nick.

josé