Using BOM to auto-detect file encoding
April 09, 2013
I would like to know if there exists a 'stream' or 'file' class that is able to take a text file with a correct BOM, and an 'output' UTF encoding. I want it to be capable of detecting the 'input' stream UTF encoding by using the BOM, and do the encoding for me on the way out in the specified 'output' UTF encoding.

Right now I am using std.stream.File (which I know is going the way of all the earth soon) and manually parsing the BOM myself to then choose whether I call 'readLine' or 'readLineW', and then subsequently calling 'toUTF8' after that.
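
For illustration, the manual approach described above might look roughly like the following. This is only a minimal sketch operating on raw bytes rather than on std.stream: `decodeWithBom` is a hypothetical helper name, not a Phobos function, input without a BOM is assumed to be UTF-8, and UTF-32 BOMs are deliberately not handled.

```d
import std.utf : toUTF8, validate;

/// Hypothetical helper (not a Phobos API): strips a UTF-8 or UTF-16 BOM
/// and returns the text as UTF-8. Input without a BOM is assumed to be UTF-8.
string decodeWithBom(const(ubyte)[] data)
{
    // UTF-16: FE FF is big-endian, FF FE is little-endian.
    // Caveat: FF FE 00 00 (UTF-32 LE) is not distinguished here.
    if (data.length >= 2 &&
        ((data[0] == 0xFE && data[1] == 0xFF) ||
         (data[0] == 0xFF && data[1] == 0xFE)))
    {
        immutable bigEndian = data[0] == 0xFE;
        auto payload = data[2 .. $];
        // A trailing odd byte, if any, is ignored.
        auto units = new wchar[payload.length / 2];
        foreach (i, ref u; units)
        {
            immutable a = payload[2 * i], b = payload[2 * i + 1];
            u = cast(wchar)(bigEndian ? (a << 8) | b : (b << 8) | a);
        }
        return toUTF8(units); // throws UTFException on invalid UTF-16
    }

    // UTF-8 BOM: EF BB BF.
    if (data.length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        data = data[3 .. $];

    auto text = cast(string) data.idup;
    validate(text); // throws UTFException on invalid UTF-8
    return text;
}
```

The bytes could come from something like `cast(const(ubyte)[]) std.file.read("input.txt")` (the file name is just a placeholder), and the result could be passed through `std.utf.toUTF16` or `std.utf.toUTF32` if a different output encoding is wanted.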

It just seems like something like this would be nice to have in phobos if it's not already there.
April 09, 2013
On 2013-04-09 18:25, Kai Meyer wrote:
> I would like to know if there exists a 'stream' or 'file' class that is
> able to take a text file with a correct BOM, and an 'output' UTF
> encoding. I want it to be capable of detecting the 'input' stream UTF
> encoding by using the BOM, and do the encoding for me on the way out in
> the specified 'output' utf encoding.
>
> Right now I am using std.stream.File (which I know is going the way of
> all the earth soon) and manually parsing the BOM myself to then choose
> whether I call 'readLine' or 'readLineW', and then subsequently calling
> 'toUTF8' after that.
>
> It just seems like something like this would be nice to have in phobos
> if it's not already there.

There is a module in Tango for this: tango.io.UnicodeFile.

http://dsource.org/projects/tango/docs/current/
https://github.com/SiegeLord/Tango-D2

-- 
/Jacob Carlborg
April 09, 2013
On Tue, 09 Apr 2013 12:25:17 -0400, Kai Meyer <kai@unixlords.com> wrote:

> I would like to know if there exists a 'stream' or 'file' class that is able to take a text file with a correct BOM, and an 'output' UTF encoding. I want it to be capable of detecting the 'input' stream UTF encoding by using the BOM, and do the encoding for me on the way out in the specified 'output' UTF encoding.
>
> Right now I am using std.stream.File (which I know is going the way of all the earth soon) and manually parsing the BOM myself to then choose whether I call 'readLine' or 'readLineW', and then subsequently calling 'toUTF8' after that.
>
> It just seems like something like this would be nice to have in phobos if it's not already there.

The new stream replacement code is capable of doing this without much effort. It auto-detects the byte order, and allows you to specify it if you wish.

I really need to complete this code.  It's long overdue.

-Steve