Always std.utf.validate, or rely on exceptions? - D Programming Language Discussion Forum


March 02, 2017

Always std.utf.validate, or rely on exceptions?

Posted by SimonN

Many functions in std.utf throw UTFException when we pass them malformed UTF, and many functions in std.string throw StringException. From this, I developed a habit of reading user files like so, hoping that it traps all malformed UTF:

    try {
        // call D standard lib on string from file
    }
    catch (Exception e) {
        // treat file as bogus
        // log e.msg
    }

But std.string.stripRight!string calls std.utf.codeLength, which doesn't ever throw on malformed UTF, but asserts false on errors:

    ubyte codeLength(C)(dchar c) @safe pure nothrow @nogc
        if (isSomeChar!C)
    {
        static if (C.sizeof == 1)
        {
            if (c <= 0x7F) return 1;
            if (c <= 0x7FF) return 2;
            if (c <= 0xFFFF) return 3;
            if (c <= 0x10FFFF) return 4;
            assert(false);
        }
        // ...
    }

Apparently, once my code calls stripRight, I should be sure that this string contains only well-formed UTF. Right now, my code doesn't guarantee that.

Should I always validate text from files manually with std.utf.validate?

Or should I memorize which functions throw, then validate manually whenever I call the non-throwing UTF functions? What is the pattern behind what throws and what asserts false?

-- Simon
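
The `codeLength` excerpt above takes an already decoded `dchar`, so its `assert(false)` is only reached for a value above 0x10FFFF. A minimal sketch of checking that precondition first, assuming `std.utf.isValidDchar` is an acceptable guard (it also rejects surrogates, which `codeLength` itself would not trip on); `safeCodeLength` is a hypothetical name, not part of Phobos:

```d
import std.utf : codeLength, isValidDchar;

/// Hypothetical wrapper: number of UTF-8 code units needed for `c`,
/// or 0 if `c` is not a valid Unicode code point.
ubyte safeCodeLength(dchar c)
{
    // Guard first, so codeLength never sees a value it would assert on.
    if (!isValidDchar(c))
        return 0;
    return codeLength!char(c);
}

void main()
{
    assert(safeCodeLength('a') == 1);           // ASCII
    assert(safeCodeLength('\u00E4') == 2);      // two-byte sequence
    assert(safeCodeLength('\u20AC') == 3);      // three-byte sequence
    assert(safeCodeLength('\U0001F600') == 4);  // four-byte sequence
    assert(safeCodeLength(cast(dchar) 0x110000) == 0); // out of range
}
```

This only illustrates the contract of `codeLength`; it does not replace validating the whole string before calling functions such as `stripRight`.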

March 02, 2017

Re: Always std.utf.validate, or rely on exceptions?

Posted by ketmar
in reply to SimonN

SimonN wrote:

> Should I always validate text from files manually with std.utf.validate?
>
> Or should I memorize which functions throw, then validate manually whenever I call the non-throwing UTF functions? What is the pattern behind what throws and what asserts false?

i'd say: "ALWAYS validate before ANY further processing".

March 02, 2017

Re: Always std.utf.validate, or rely on exceptions?

Posted by Kagamin
in reply to SimonN

On Thursday, 2 March 2017 at 16:20:30 UTC, SimonN wrote:
> Should I always validate text from files manually with std.utf.validate?
>
> Or should I memorize which functions throw, then validate manually whenever I call the non-throwing UTF functions? What is the pattern behind what throws and what asserts false?

If you expect files with malformed UTF that could cause trouble and you want to handle them gracefully, pass their content through the validator and catch the validator's exception. Functions working with strings usually assume valid UTF and can behave incorrectly on malformed UTF.
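
A minimal sketch of that advice, assuming the file content is already in memory as raw bytes. `isWellFormedUtf8` is a hypothetical helper name; `std.utf.validate` and `UTFException` are the Phobos names mentioned in the thread:

```d
import std.stdio : writeln;
import std.utf : UTFException, validate;

/// Hypothetical helper: true if `raw` is well-formed UTF-8.
bool isWellFormedUtf8(const(ubyte)[] raw)
{
    // Reinterpret the bytes as chars without copying; nothing is decoded yet.
    auto text = cast(const(char)[]) raw;
    try
    {
        validate(text); // throws UTFException on malformed input
        return true;
    }
    catch (UTFException e)
    {
        // A real program would log e.msg and treat the file as bogus.
        return false;
    }
}

void main()
{
    immutable(ubyte)[] good = [0x68, 0x69];       // "hi"
    immutable(ubyte)[] bad = [0x68, 0xFF, 0x69];  // 0xFF never occurs in UTF-8
    writeln(isWellFormedUtf8(good));
    writeln(isWellFormedUtf8(bad));
}
```

Catching only `UTFException` around the validator keeps the error handling in one place, so later string functions can be called on data already known to be well formed.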

March 02, 2017

Re: Always std.utf.validate, or rely on exceptions?

Posted by Kagamin
in reply to Kagamin

On Thursday, 2 March 2017 at 17:03:01 UTC, Kagamin wrote:
> Functions working with strings usually assume valid UTF and can behave incorrectly on malformed UTF.

Or rather, they report an unrecoverable error that terminates the process.

March 02, 2017

Re: Always std.utf.validate, or rely on exceptions?

Posted by SimonN
in reply to Kagamin

ketmar wrote:
> i'd say: "ALWAYS validate before ANY further processing".

On Thursday, 2 March 2017 at 17:03:01 UTC, Kagamin wrote:
> If you expect files with malformed UTF that could cause trouble and you want to handle them gracefully, pass their content through the validator and catch the validator's exception.

Thanks. I still call std.stdio.byLine or std.stdio.lines on the raw data, which seems robust against random binary blobs. Then I validate each line before calling anything else.

-- Simon
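
The per-line approach described above might look roughly like the following sketch. It assumes, as the post reports, that `File.byLine` passes malformed bytes through without throwing; `readValidLines` and the temporary file name are hypothetical:

```d
import std.stdio : File, writeln;
import std.utf : UTFException, validate;

/// Hypothetical helper: reads `path` line by line and keeps only the
/// lines that pass std.utf.validate.
string[] readValidLines(string path)
{
    string[] good;
    foreach (line; File(path).byLine)
    {
        try
        {
            validate(line);     // check before any other string function
            good ~= line.idup;  // byLine reuses its buffer, so copy the line
        }
        catch (UTFException e)
        {
            // Malformed line: skip it (a real program might log e.msg).
        }
    }
    return good;
}

void main()
{
    import std.file : remove, tempDir, write;
    import std.path : buildPath;

    // "ok\n", then a lone 0xFF byte and "\n", then "fine\n".
    immutable(ubyte)[] data = [0x6F, 0x6B, 0x0A, 0xFF, 0x0A,
                               0x66, 0x69, 0x6E, 0x65, 0x0A];
    auto path = buildPath(tempDir, "utf_demo.txt");
    write(path, data);
    scope (exit) remove(path);

    writeln(readValidLines(path));
}
```

Whether to skip a bad line or reject the whole file is a design choice; the sketch skips, but returning early from the `catch` block would implement the stricter policy.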


Copyright © 1999-2021 by the D Language Foundation