September 18, 2014
On Thursday, 18 September 2014 at 16:05:15 UTC, ketmar via Digitalmars-d-learn wrote:
> On Thu, 18 Sep 2014 15:53:02 +0000
> Ilya Yaroshenko via Digitalmars-d-learn
> <digitalmars-d-learn@puremagic.com> wrote:
>
>> Seriously, console application (in Russian lang. Windows) is not unicode-ready.
> that's 'cause authors tend to ignore W-functions. but GNU/Linux is not
> better, 'cause authors tend to ignore any encodings except latin1 and
> utf-8. koi? what is koi? it's broken utf-8, we don't know about koi!
> and we don't care what your locale says, it's utf-8! bwah, D compiler
> does just that.

"one ring to rule them all"
UTF-8 = Lord of the encodings.
September 18, 2014
On Thu, 18 Sep 2014 16:24:17 +0000
Ilya Yaroshenko via Digitalmars-d-learn
<digitalmars-d-learn@puremagic.com> wrote:

> You can choose the encoding for the console in Linux
yes. and i chose koi8. yet many utilities tend to ignore my locale
when reading files (hey, D compiler, i'm talking about you!). i don't
care about localized messages (i'm using English messages anyway), but
trying to tell me that my text file is invalid utf-8, or my filename is
invalid utf-8, or spitting utf-8 encoded messages to my terminal drives
me mad. what is so wrong with locale detection that virtually nobody
does that? we have iconv, it's readily available on any decent
GNU/Linux platform, yet it's still so hard to detect that stinky locale
and convert that stinky utf-8 to it? BS. (hey, phobos, i'm talking about
your stdout.write() here too!)

the whole "utf-8 or die" attitude has something very wrong in it.
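
A rough sketch of the locale-aware conversion described above, written in Python purely for illustration. `encode_for_terminal` and `recode_utf8` are hypothetical helper names, not part of the D compiler or Phobos; the sketch assumes the standard `locale.getpreferredencoding` call reflects the user's locale, and it replaces characters the target encoding cannot represent with `?`.

```python
import locale


def encode_for_terminal(text, encoding=None):
    """Encode text for output in the given or locale-preferred encoding."""
    if encoding is None:
        # Ask the platform which encoding the user's locale prefers.
        encoding = locale.getpreferredencoding(False)
    # Characters missing from the target encoding become '?'.
    return text.encode(encoding, errors="replace")


def recode_utf8(data, encoding):
    """Convert UTF-8 bytes into the target encoding (what iconv would do)."""
    return data.decode("utf-8").encode(encoding, errors="replace")
```

For example, `encode_for_terminal("привет", "koi8-r")` yields six bytes, one per letter, instead of the twelve bytes of the UTF-8 form.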


September 18, 2014
On Thu, 18 Sep 2014 16:31:08 +0000
Ilya Yaroshenko via Digitalmars-d-learn
<digitalmars-d-learn@puremagic.com> wrote:

> "one ring to rule them all"
> UTF-8 = Lord of the encodings.
i want the 42nd symbol from the string. what? what do you mean, i must scan the whole string from the beginning to get it? oh, High Lord, this one Lord is fake!
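
To illustrate the complaint, here is a minimal Python sketch of what "scan from the beginning" means for UTF-8. `nth_code_point` is a hypothetical helper, it assumes the input is valid UTF-8, and it counts code points rather than user-perceived characters.

```python
def nth_code_point(data: bytes, n: int) -> str:
    """Return the n-th (0-based) code point of valid UTF-8 by linear scan."""
    seen = -1      # index of the last code point whose first byte we passed
    start = None   # byte offset where the wanted code point begins
    for i, b in enumerate(data):
        if (b & 0xC0) != 0x80:  # not a continuation byte: a new code point
            seen += 1
            if seen == n:
                start = i
            elif seen == n + 1:
                return data[start:i].decode("utf-8")
    if start is not None:
        # The wanted code point is the last one in the buffer.
        return data[start:].decode("utf-8")
    raise IndexError("code point index out of range")
```

The loop has to look at every byte before the target, so the cost grows with the index; there is no way to jump straight to byte offset `n`.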


September 18, 2014
On Thursday, 18 September 2014 at 16:51:06 UTC, ketmar via Digitalmars-d-learn wrote:
> On Thu, 18 Sep 2014 16:31:08 +0000
> Ilya Yaroshenko via Digitalmars-d-learn
> <digitalmars-d-learn@puremagic.com> wrote:
>
>> "one ring to rule them all"
>> UTF-8 = Lord of the encodings.
> i want the 42nd symbol from the string. what? what do you mean, i must
> scan the whole string from the beginning to get it? oh, High Lord,
> this one Lord is fake!

That's why, a while ago, I was considering converting strings from UTF-8 to UTF-32. UTF-32 is nice; I don't understand people who say there is no advantage to using it. Indexing by code point simply works, and memory size isn't much of an issue.

I needed to add UTF-8 support to a program that had some routines where I could move forward and backward very easily just by indexing. With UTF-8 that isn't possible, so I had to write my own iterator and save a pointer instead of an index wherever a position was needed. Memory usage isn't bad, since an index is the same size as a pointer, but the structure of the program became a bit "ugly", a kind of "hack", IMHO.
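
A small Python sketch of the kind of iterator described here, not the actual code from that program. `next_offset`, `prev_offset` and `utf32_index` are hypothetical names; the first two assume valid UTF-8 and a position that sits on the first byte of a code point.

```python
def next_offset(data: bytes, pos: int) -> int:
    """Byte offset of the code point after the one starting at pos."""
    pos += 1
    # Skip continuation bytes (10xxxxxx) of the current code point.
    while pos < len(data) and (data[pos] & 0xC0) == 0x80:
        pos += 1
    return pos


def prev_offset(data: bytes, pos: int) -> int:
    """Byte offset of the code point before the one starting at pos."""
    if pos <= 0:
        raise IndexError("already at the start")
    pos -= 1
    # Walk back over continuation bytes until a lead byte is found.
    while pos > 0 and (data[pos] & 0xC0) == 0x80:
        pos -= 1
    return pos


def utf32_index(data32: bytes, n: int) -> str:
    """With UTF-32 (little-endian, no BOM) the n-th code point is a slice."""
    return data32[4 * n:4 * n + 4].decode("utf-32-le")
```

Stepping in either direction touches at most four bytes, so a saved byte offset works as a position; only jumping to an arbitrary code point index needs the fixed-width form.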
September 18, 2014
On Thursday, 18 September 2014 at 16:49:14 UTC, ketmar via Digitalmars-d-learn wrote:
> On Thu, 18 Sep 2014 16:24:17 +0000
> Ilya Yaroshenko via Digitalmars-d-learn
> <digitalmars-d-learn@puremagic.com> wrote:
>
>> You can choose the encoding for the console in Linux
> yes. and i chose koi8. yet many utilities tend to ignore my locale
> when reading files (hey, D compiler, i'm talking about you!). i don't
> care about localized messages (i'm using English messages anyway), but
> trying to tell me that my text file is invalid utf-8, or my filename is
> invalid utf-8, or spitting utf-8 encoded messages to my terminal drives
> me mad. what is so wrong with locale detection that virtually nobody
> does that? we have iconv, it's readily available on any decent
> GNU/Linux platform, yet it's still so hard to detect that stinky locale
> and convert that stinky utf-8 to it? BS. (hey, phobos, i'm talking about
> your stdout.write() here too!)
>
> the whole "utf-8 or die" attitude has something very wrong in it.

I didn't know about this encoding. Why should you use KOI8-R instead of UTF-8? What does it cover that UTF-8 doesn't? I used to think UTF-8 covers all the alphabets around; Japanese people use it, don't they?
September 18, 2014
On Thu, 18 Sep 2014 18:14:36 +0000
AsmMan via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com>
wrote:

> I didn't know about this encoding. Why should you use KOI8-R instead of UTF-8? What does it cover that UTF-8 doesn't? I used to think UTF-8 covers all the alphabets around; Japanese people use it, don't they?
koi8: one symbol == one byte.
utf8: one symbol == ... ah, who knows? only Shadow knows...

koi8-u is enough for me. i can use three languages with it and still have my strings easily indexable. it's ok to use utf-8 when i need to interchange some data with "outer world" -- i.e. send or receive some text over network. but i can't see why i must use utf-8 for my local data. i know what i'm doing yet... yet i can't have koi8 string in my D code without ugly "\x" escapes. i can't have koi8 text in my comments. Great Lord, it's just comments, it's not even DDoc, why can't i write anything i want there?!
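
The "one symbol == one byte" point can be checked with a short Python sketch. `byte_lengths` and `koi8_char_at` are hypothetical helper names, and the sample word is only an illustration.

```python
def byte_lengths(text: str):
    """Return (KOI8-U byte count, UTF-8 byte count) for the same text."""
    return len(text.encode("koi8-u")), len(text.encode("utf-8"))


def koi8_char_at(data: bytes, n: int) -> str:
    """In a single-byte encoding the n-th symbol is simply the n-th byte."""
    return data[n:n + 1].decode("koi8-u")


word = "привіт"                 # six letters, including Ukrainian 'і'
koi_len, utf_len = byte_lengths(word)
fifth = koi8_char_at(word.encode("koi8-u"), 4)
```

Here `koi_len` equals the number of letters while `utf_len` is twice that, and `fifth` is found by a plain slice with no scanning.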


September 18, 2014
On Thu, 18 Sep 2014 21:26:27 +0300
ketmar via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com>
wrote:

btw, D lexer tries to validate even shebangs. WUT?! why can't i put non-utf8 text in shebang? ah, it's "utf-8 or die" again, i see...


September 19, 2014
On Thursday, 18 September 2014 at 15:53:03 UTC, Ilya Yaroshenko wrote:
> Windows 9 will be based on Windows 98 =)
> Seriously, console application (in Russian lang. Windows) is not unicode-ready.

The console API is Unicode too. What may not be Unicode is the console font, but that can happen in a GUI too.
September 19, 2014
On Thursday, 18 September 2014 at 18:26:37 UTC, ketmar via Digitalmars-d-learn wrote:
> i can't have koi8 string in my D
> code without ugly "\x" escapes. i can't have koi8 text in my comments.
> Great Lord, it's just comments, it's not even DDoc, why can't i write
> anything i want there?!

Editors can usually handle various encodings independently of your system settings, so you should be able to keep D code in UTF-8. For example, Notepad on Windows can handle UTF-8 even though it's not a native encoding for Windows.