Thread overview
Reading UTF32 files
Aug 04, 2006
Tim Locke
Aug 04, 2006
Hasan Aljudy
Aug 04, 2006
kris
Aug 04, 2006
Hasan Aljudy
Aug 04, 2006
kris
Aug 10, 2006
Markus Dangl
Aug 10, 2006
Oskar Linde
Aug 04, 2006
Derek Parnell
Aug 04, 2006
Tim Locke
August 04, 2006
How do I read a UTF32 file? Stream only seems to support UTF8 with readLine and UTF16 with readLineW.

Thanks
August 04, 2006

Tim Locke wrote:
> How do I read a UTF32 file? Stream only seems to support UTF8 with
> readLine and UTF16 with readLineW.
> 
> Thanks

I use mango to convert any file into UTF32

I haven't actually tested it very much .. but I think it should work:


---------
static import std.file;
import mango = mango.convert.UnicodeBom;
version( build ) //TEMP until build learns renamed import syntax
{
    pragma( include, mango.convert.UnicodeBom )
}

dchar[] readFile( char[] fileName )
{
    if( std.file.exists( fileName ) )
        return toUtf32( std.file.read( fileName ) );
    else
        throw new Exception("File: " ~ fileName ~ " doesn't exist");
}

private
{
    alias mango.UnicodeBomTemplate!(dchar) Utf32Decoder;

    ///read BOM and decode/convert to utf-32
    dchar[] toUtf32(void[] content)
    {
        auto decoder = new Utf32Decoder(mango.Unicode.Unknown);
        return decoder.decode(content);
    }
}
---------
August 04, 2006
On Fri, 04 Aug 2006 00:38:18 -0300, Tim Locke wrote:

> How do I read a UTF32 file? Stream only seems to support UTF8 with readLine and UTF16 with readLineW.
> 
> Thanks

Read them in 4-byte chunks and, depending on endianness, convert each chunk to a uint (32 bits, the same size as a dchar), then cast it to a dchar and append it to a dchar[] ... simple!
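
A minimal sketch of that approach, written for a present-day D compiler (D2 with Phobos) rather than the 2006 libraries. The function name decodeUtf32 is made up for illustration, and treating data without a BOM as little-endian is an assumption, not something the format guarantees. Each 4-byte chunk is assembled into a 32-bit value and then cast to a dchar:

```d
import std.exception : enforce;
import std.utf : isValidDchar;

/// Hypothetical helper: decode raw UTF-32 bytes into a dchar[].
/// Assumption: data without a BOM is treated as little-endian.
dchar[] decodeUtf32(const(ubyte)[] bytes)
{
    static immutable ubyte[4] bomBE = [0x00, 0x00, 0xFE, 0xFF];
    static immutable ubyte[4] bomLE = [0xFF, 0xFE, 0x00, 0x00];

    enforce(bytes.length % 4 == 0, "UTF-32 data must be a multiple of 4 bytes");

    // Detect and skip the byte order mark, if there is one.
    bool bigEndian = false;
    if (bytes.length >= 4)
    {
        if (bytes[0 .. 4] == bomBE[])
        {
            bigEndian = true;
            bytes = bytes[4 .. $];
        }
        else if (bytes[0 .. 4] == bomLE[])
        {
            bytes = bytes[4 .. $];
        }
    }

    dchar[] result;
    result.reserve(bytes.length / 4);
    for (size_t i = 0; i < bytes.length; i += 4)
    {
        // Assemble one 32-bit code unit in the file's byte order.
        immutable uint value = bigEndian
            ? (cast(uint) bytes[i] << 24) | (cast(uint) bytes[i + 1] << 16)
              | (cast(uint) bytes[i + 2] << 8) | bytes[i + 3]
            : (cast(uint) bytes[i + 3] << 24) | (cast(uint) bytes[i + 2] << 16)
              | (cast(uint) bytes[i + 1] << 8) | bytes[i];
        immutable dchar c = cast(dchar) value;
        enforce(isValidDchar(c), "invalid code point in UTF-32 data");
        result ~= c;
    }
    return result;
}
```

To decode a whole file, the bytes could come from std.file.read, for example decodeUtf32(cast(const(ubyte)[]) std.file.read("myfile")).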

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocrity!"
4/08/2006 2:15:37 PM
August 04, 2006
It's perhaps easier to use UnicodeFile instead:

# import mango.io.UnicodeFile;
#
# auto file = new UnicodeFileT!(dchar)("myfile", Unicode.Unknown);
# auto content = file.read;


Please note that Mango leverages a different IO model than Phobos, so you'll have to compile this along with a few other mango.io modules.

Mango typically requires the use of Build to pull in relevant modules, because the combination of D, libraries, and templates just doesn't work reliably at this time. If the compiler front-end were to handle recursive imports natively (like a very simple Build), it would be great! The changes to do so (for DMD) are minimal ;)



Hasan Aljudy wrote:
> 
> 
> Tim Locke wrote:
> 
>> How do I read a UTF32 file? Stream only seems to support UTF8 with
>> readLine and UTF16 with readLineW.
>>
>> Thanks
> 
> 
> I use mango to convert any file into UTF32
> 
> I haven't actually tested it very much .. but I think it should work:
> 
> 
> ---------
> static import std.file;
> import mango = mango.convert.UnicodeBom;
> version( build ) //TEMP until build learns renamed import syntax
> {
>     pragma( include, mango.convert.UnicodeBom )
> }
> 
> dchar[] readFile( char[] fileName )
> {
>     if( std.file.exists( fileName ) )
>         return toUtf32( std.file.read( fileName ) );
>     else
>         throw new Exception("File: " ~ fileName ~ " doesn't exist");
> }
> 
> private
> {
>     alias mango.UnicodeBomTemplate!(dchar) Utf32Decoder;
> 
>     ///read BOM and decode/convert to utf-32
>     dchar[] toUtf32(void[] content)
>     {
>         auto decoder = new Utf32Decoder(mango.Unicode.Unknown);
>         return decoder.decode(content);
>     }
> }
> ---------


August 04, 2006

kris wrote:
> It's perhaps easier to use UnicodeFile instead:
> 
> # import mango.io.UnicodeFile;
> #
> # auto file = new UnicodeFileT!(dchar)("myfile", Unicode.Unknown);
> # auto content = file.read;
> 

Ah nice! I didn't know about that.
I wish someone had told me about it earlier. Are there any tutorials for mango that explain where everything is?
I don't mean the documentation. I mean something that tells you: "if you want to read/decode files, see the documentation for mango.io.UnicodeFile" for example...


> 
> Please note that Mango leverages a different IO model than Phobos, so you'll have to compile this along with a few other mango.io modules.

I use build, so I don't really care.

> 
> Mango typically requires the use of Build to pull in relevant modules, because the combination of D, libraries, and templates just doesn't work reliably at this time. If the compiler front-end were to handle recursive imports natively (like a very simple Build), it would be great! The changes to do so (for DMD) are minimal ;)
> 

Yes, that would be great.
Just let dmd recursively compile all imported modules; because dmd is so fast, it doesn't matter even if it recompiles modules that have already been compiled.

I always use the -full -clean switches on build anyways.
August 04, 2006
Hasan Aljudy wrote:
> 
> 
> kris wrote:
> 
>> It's perhaps easier to use UnicodeFile instead:
>>
>> # import mango.io.UnicodeFile;
>> #
>> # auto file = new UnicodeFileT!(dchar)("myfile", Unicode.Unknown);
>> # auto content = file.read;
>>
> 
> Ah nice! I didn't know about that.
> I wish someone had told me about it earlier. Are there any tutorials for mango that explain where everything is?
> I don't mean the documentation. I mean something that tells you: "if you want to read/decode files, see the documentation for mango.io.UnicodeFile" for example...

No, but there should be :)

BTW: that should probably read "auto content = file.read();" with parens, since otherwise the 'auto' will try to take the function reference


>> Mango typically requires the use of Build to pull in relevant modules, because the combination of D, libraries, and templates just doesn't work reliably at this time. If the compiler front-end were to handle recursive imports natively (like a very simple Build), it would be great! The changes to do so (for DMD) are minimal ;)
>>
> 
> Yes, that would be great.
> Just let dmd recursively compile all imported modules; because dmd is so fast, it doesn't matter even if it recompiles modules that have already been compiled.
> 
> I always use the -full -clean switches on build anyways.

Me too.

Note that DMD *already* pulls in all imported modules during a compilation, and runs one or two stages on each of them ... it just doesn't propagate those modules through the later stages of compilation and linking, choosing to discard them instead. A flag to include them in the compilation and linking stages would be just awesome.
August 04, 2006
On Fri, 4 Aug 2006 14:16:49 +1000, Derek Parnell <derek@nomail.afraid.org> wrote:

>On Fri, 04 Aug 2006 00:38:18 -0300, Tim Locke wrote:
>
>> How do I read a UTF32 file? Stream only seems to support UTF8 with readLine and UTF16 with readLineW.
>> 
>> Thanks
>
>Read them in 4-byte chunks and, depending on endianness, convert each chunk to a uint (32 bits, the same size as a dchar), then cast it to a dchar and append it to a dchar[] ... simple!

Thanks. I will try that.
August 10, 2006
> BTW: that should probably read "auto content = file.read();" with parens, since otherwise the 'auto' will try to take the function reference

Just a note:
I think all methods that don't take parameters can be called without parens, just like you normally use properties, but it's a bit clearer to use parens here (because "read" should actually be used as a method).
To take the reference you'd have to use something like "auto pointer = &file.read" ...
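
A small sketch of the three forms, as they behave with a present-day D compiler (D2). That is an assumption on my part: it postdates this thread, and the compiler of the time may have handled the paren-less form differently. Reader is a hypothetical stand-in for a file class with a parameterless read method; it is not the Mango class.

```d
// Hypothetical stand-in for a file class with a parameterless read method.
class Reader
{
    string read() { return "data"; }
}

string optionalParensDemo()
{
    auto file = new Reader;

    auto a = file.read();  // explicit call
    auto b = file.read;    // parens omitted: still a call in current D
    auto dg = &file.read;  // address-of yields a delegate bound to file

    static assert(is(typeof(b) == string));
    static assert(is(typeof(dg) == delegate));

    // Invoking the delegate needs explicit parens.
    return a ~ "," ~ b ~ "," ~ dg();
}
```

Writing the parens explicitly keeps the intent obvious either way.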
August 10, 2006
Markus Dangl wrote:

>> BTW: that should probably read "auto content = file.read();" with parens, since otherwise the 'auto' will try to take the function reference
> 
> Just a note:
> I think all methods that don't take parameters can be called without
> parens, just like you normally use properties, but it's a bit clearer to
> use parens here (because "read" should actually be used as a method).
> To take the reference you'd have to use something like "auto pointer =
> &file.read" ...

In this case, due to a bug or an unfortunate side effect,

auto content = file.read;

will neither call file.read() nor make content a reference to the function.
It will try to make content a function type (as opposed to a reference to a
function) which will fail to compile.

There is also still at least one case where an empty pair of parentheses is needed at a function call, namely array extension methods:

void func(int[] t) {}

cannot be called as:

arr.func;

Though I'm not sure there is any fundamental reason it has to be that way.
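
For comparison, a sketch of the same pattern with a present-day D compiler (D2), where, as far as I know, this restriction no longer applies. The names total and ufcsDemo are hypothetical and only serve the illustration.

```d
// Hypothetical free function used as an "extension method" on int[].
int total(int[] t)
{
    int sum = 0;
    foreach (x; t)
        sum += x;
    return sum;
}

int ufcsDemo()
{
    int[] arr = [1, 2, 3];
    int withParens = arr.total();   // the form with the empty parentheses
    int withoutParens = arr.total;  // accepted by current D2 compilers
    return withParens + withoutParens;
}
```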

All function (reference) and delegate types will also require the parentheses, which is more or less necessary to avoid ambiguities:

int delegate() func() { return { return 1; }; }

...

func;
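
A sketch of why the parentheses matter here, again assuming a present-day D compiler (D2). The names makeOne and delegateDemo are hypothetical. Naming the function without parens calls it and yields the delegate, so invoking the delegate itself always takes an explicit pair of parentheses:

```d
// Same shape as the example above: a function returning a delegate.
int delegate() makeOne()
{
    return delegate int() { return 1; };
}

int delegateDemo()
{
    auto dg = makeOne;  // calls makeOne and yields the delegate, not an int
    static assert(is(typeof(dg) == delegate));

    int viaDg = dg();          // parens required to invoke the delegate
    int direct = makeOne()();  // first () calls makeOne, second () calls the result
    return viaDg + direct;
}
```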

/Oskar