Why UTF-8/16 character encodings? (page 8) - D Programming Language Discussion Forum

May 26, 2013

Re: Why UTF-8/16 character encodings?

Posted by Vladimir Panteleev
in reply to Joakim

On Sunday, 26 May 2013 at 15:23:33 UTC, Joakim wrote:
> On Sunday, 26 May 2013 at 14:37:27 UTC, H. S. Teoh wrote:
>> IHBT.

> I've made my position clear: I don't write toy code.

1. Make extraordinary claims
2. Refuse to back up said claims with small examples because "I don't write toy code"
3. Refuse to back up said claims with elaborate examples because "It will take too long"
4. Use arrogant tone throughout thread, imply that you're smarter than the creators of UTF, and creators and long-time contributors of D (never contribute code to D yourself)

Result: 70-post thread

Conclusion: Successful troll is successful :)

May 26, 2013

Re: Why UTF-8/16 character encodings?

Posted by Dmitry Olshansky
in reply to Vladimir Panteleev

26-May-2013 20:54, Vladimir Panteleev wrote:
> On Sunday, 26 May 2013 at 15:23:33 UTC, Joakim wrote:
>> On Sunday, 26 May 2013 at 14:37:27 UTC, H. S. Teoh wrote:
>>> IHBT.
>
>> I've made my position clear: I don't write toy code.
>
> 1. Make extraordinary claims
> 2. Refuse to back up said claims with small examples because "I don't
> write toy code"
> 3. Refuse to back up said claims with elaborate examples because "It will
> take too long"
> 4. Use arrogant tone throughout thread, imply that you're smarter than
> the creators of UTF, and creators and long-time contributors of D (never
> contribute code to D yourself)
>
> Result: 70-post thread
>
> Conclusion: Successful troll is successful :)

+1

Result: 71-post thread ;)

-- 
Dmitry Olshansky

May 26, 2013

Re: Why UTF-8/16 character encodings?

Posted by Joakim
in reply to Vladimir Panteleev

On Sunday, 26 May 2013 at 16:54:53 UTC, Vladimir Panteleev wrote:
> 1. Make extraordinary claims
What is extraordinary about "UTF-8 is shit?"  It is obviously so.

> 2. Refuse to back up said claims with small examples because "I don't write toy code"
I never refused small examples.  I have provided several analyses of how a single-byte encoding would compare to UTF-8, along with listing optimizations that make it much faster.  I finally refused to analyze Teoh's examples because he accused me of trolling and demanded code as the only possible explanation.

> 3. Refuse to back up said claims with elaborate examples because "It will
> take too long"
You are confused.  What I said is "I don't write toy code, non-toy code would take too long, and you wouldn't understand it anyway."

The whole demand for code is idiotic anyway.

If I outlined TCP/IP as a packet-switched network and briefly sketched what the header might look like and the queuing algorithms that I might use, I can just imagine you saying, "But there's no code... how can I possibly understand what you're saying without any code?"  If you can't understand networking without seeing working code, you're not equipped to understand it anyway, same here.

> 4. Use arrogant tone throughout thread, imply that you're smarter than the creators of UTF, and creators and long-time contributors of D (never contribute code to D yourself)
Hey, if the shoe fits. :)

I actually had a lot of respect for Walter till I read this thread.  I can only assume that his past experience with code pages was so maddening that he cannot be rational on the subject of going to any single-byte encoding that would be similar, same with others griping about code pages above.  I also don't think he and others are paying much attention to the various points I'm raising, hence his recent claim that I wouldn't handle Chinese, when I addressed that from the beginning.

Or it could just be that I'm much smarter than everybody else in this thread, ;) I can't rule it out given the often silly responses I've been getting.

> Result: 70-post thread
>
> Conclusion: Successful troll is successful :)
Conclusion: Vladimir trolls me because he doesn't understand what I'm talking about, which is why he doesn't raise a single technical point in this post.

May 26, 2013

Re: Why UTF-8/16 character encodings?

Posted by Andrei Alexandrescu
in reply to Joakim

On 5/26/13 1:45 PM, Joakim wrote:
> What is extraordinary about "UTF-8 is shit?" It is obviously so.

Congratulations, you are literally the only person on the Internet who said so: http://goo.gl/TFhUO

On 5/26/13 1:45 PM, Joakim wrote:
> Or it could just be that I'm much smarter than everybody else in this
> thread, ;) I can't rule it out given the often silly responses I've been
> getting.

One odd thing about this thread is that it's extremely rare for most everybody in this forum to rally as one behind the same opinion. Usually, whatever the topic, a debate will ensue between two ad-hoc groups.

It has become clear that people involved in this have gotten too frustrated to have a constructive exchange. I suggest we collectively drop it. What you may want to do is to use D's modeling abilities to define a great string type pursuant to your ideas. If it is as good as you believe it could be, then it will enjoy use and adoption and everybody will be better off.

Andrei

May 26, 2013

Re: Why UTF-8/16 character encodings?

Posted by Joakim
in reply to Andrei Alexandrescu

On Sunday, 26 May 2013 at 18:29:38 UTC, Andrei Alexandrescu wrote:
> On 5/26/13 1:45 PM, Joakim wrote:
>> What is extraordinary about "UTF-8 is shit?" It is obviously so.
>
> Congratulations, you are literally the only person on the Internet who said so: http://goo.gl/TFhUO
Haha, that is funny, :D though "unicode is shit" returns at least 8 results.  How many people even know how UTF-8 works?  Given how few people use it, I'm not surprised most don't know enough about how it works to criticize it.

> On 5/26/13 1:45 PM, Joakim wrote:
>> Or it could just be that I'm much smarter than everybody else in this
>> thread, ;) I can't rule it out given the often silly responses I've been
>> getting.
>
> One odd thing about this thread is that it's extremely rare for most everybody in this forum to rally as one behind the same opinion. Usually, whatever the topic, a debate will ensue between two ad-hoc groups.
I suspect it's because I'm presenting an original idea about a not well-understood technology, Unicode, not the usual "emacs vs vim" or "D should not have null references" argument.  For example, how many here know what UCS is?  Most people never dig into Unicode, it's just a black box that is annoying to deal with.

> It has become clear that people involved in this have gotten too frustrated to have a constructive exchange. I suggest we collectively drop it. What you may want to do is to use D's modeling abilities to define a great string type pursuant to your ideas. If it is as good as you believe it could be, then it will enjoy use and adoption and everybody will be better off.
I agree.  I am enjoying your book, btw.

May 26, 2013

Re: Why UTF-8/16 character encodings?

Posted by Mr. Anonymous
in reply to Joakim

On Sunday, 26 May 2013 at 19:05:32 UTC, Joakim wrote:
> On Sunday, 26 May 2013 at 18:29:38 UTC, Andrei Alexandrescu wrote:
>> On 5/26/13 1:45 PM, Joakim wrote:
>>> What is extraordinary about "UTF-8 is shit?" It is obviously so.
>>
>> Congratulations, you are literally the only person on the Internet who said so: http://goo.gl/TFhUO
> Haha, that is funny, :D though "unicode is shit" returns at least 8 results.  How many people even know how UTF-8 works?  Given how few people use it, I'm not surprised most don't know enough about how it works to criticize it.

On the other hand:
https://www.google.com/search?q=%22utf-8+is+awesome%22

:D

May 26, 2013

Re: Why UTF-8/16 character encodings?

Posted by Marcin Mstowski
in reply to Joakim

Character Data Representation Architecture <http://www-01.ibm.com/software/globalization/cdra/> by IBM. It is what you want to do, with additions, and it has been available since 1995.
When you come up with an inventive idea, I suggest you first check what has already been done in that area, and then rethink it to see whether you can do better or improve the existing solution. Other approaches are usually a waste of time and effort, unless you are doing this for fun or you can't use existing solutions due to problems with licenses, copyrights, price, etc.


On Sun, May 26, 2013 at 9:05 PM, Joakim <joakim@airpost.net> wrote:

> On Sunday, 26 May 2013 at 18:29:38 UTC, Andrei Alexandrescu wrote:
>> On 5/26/13 1:45 PM, Joakim wrote:
>>> What is extraordinary about "UTF-8 is shit?" It is obviously so.
>>
>> Congratulations, you are literally the only person on the Internet who said so: http://goo.gl/TFhUO
> Haha, that is funny, :D though "unicode is shit" returns at least 8 results.  How many people even know how UTF-8 works?  Given how few people use it, I'm not surprised most don't know enough about how it works to criticize it.
>
>> On 5/26/13 1:45 PM, Joakim wrote:
>>> Or it could just be that I'm much smarter than everybody else in this thread, ;) I can't rule it out given the often silly responses I've been getting.
>>
>> One odd thing about this thread is that it's extremely rare for most everybody in this forum to rally as one behind the same opinion. Usually, whatever the topic, a debate will ensue between two ad-hoc groups.
> I suspect it's because I'm presenting an original idea about a not well-understood technology, Unicode, not the usual "emacs vs vim" or "D should not have null references" argument.  For example, how many here know what UCS is?  Most people never dig into Unicode, it's just a black box that is annoying to deal with.
>
>> It has become clear that people involved in this have gotten too frustrated to have a constructive exchange. I suggest we collectively drop it. What you may want to do is to use D's modeling abilities to define a great string type pursuant to your ideas. If it is as good as you believe it could be, then it will enjoy use and adoption and everybody will be better off.
> I agree.  I am enjoying your book, btw.

May 26, 2013

Re: Why UTF-8/16 character encodings?

Posted by Joakim
in reply to Mr. Anonymous

On Sunday, 26 May 2013 at 19:11:42 UTC, Mr. Anonymous wrote:
> On Sunday, 26 May 2013 at 19:05:32 UTC, Joakim wrote:
>> On Sunday, 26 May 2013 at 18:29:38 UTC, Andrei Alexandrescu wrote:
>>> On 5/26/13 1:45 PM, Joakim wrote:
>>>> What is extraordinary about "UTF-8 is shit?" It is obviously so.
>>>
>>> Congratulations, you are literally the only person on the Internet who said so: http://goo.gl/TFhUO
>> Haha, that is funny, :D though "unicode is shit" returns at least 8 results.  How many people even know how UTF-8 works?  Given how few people use it, I'm not surprised most don't know enough about how it works to criticize it.
>
> On the other hand:
> https://www.google.com/search?q=%22utf-8+is+awesome%22
I'm not sure if you were trying to make my point, but you just did.  There are only 19 results for that search string.  If UTF-8 were such a rousing success and most developers found it easy to understand, you wouldn't expect only 19 results for it and 8 against it.  The paucity of results suggests most don't know how it works, or are perhaps simply annoyed by it, liking the internationalization but disliking the complexity.

May 26, 2013

Re: Why UTF-8/16 character encodings?

Posted by Mr. Anonymous
in reply to Joakim

On Sunday, 26 May 2013 at 19:25:37 UTC, Joakim wrote:
> On Sunday, 26 May 2013 at 19:11:42 UTC, Mr. Anonymous wrote:
>> On Sunday, 26 May 2013 at 19:05:32 UTC, Joakim wrote:
>>> On Sunday, 26 May 2013 at 18:29:38 UTC, Andrei Alexandrescu wrote:
>>>> On 5/26/13 1:45 PM, Joakim wrote:
>>>>> What is extraordinary about "UTF-8 is shit?" It is obviously so.
>>>>
>>>> Congratulations, you are literally the only person on the Internet who said so: http://goo.gl/TFhUO
>>> Haha, that is funny, :D though "unicode is shit" returns at least 8 results.  How many people even know how UTF-8 works?  Given how few people use it, I'm not surprised most don't know enough about how it works to criticize it.
>>
>> On the other hand:
>> https://www.google.com/search?q=%22utf-8+is+awesome%22
> I'm not sure if you were trying to make my point, but you just did.  There are only 19 results for that search string.  If UTF-8 were such a rousing success and most developers found it easy to understand, you wouldn't expect only 19 results for it and 8 against it.  The paucity of results suggests most don't know how it works, or are perhaps simply annoyed by it, liking the internationalization but disliking the complexity.

Man, you're a bullshit machine!

May 26, 2013

Re: Why UTF-8/16 character encodings?

Posted by Joakim
in reply to Marcin Mstowski

On Sunday, 26 May 2013 at 19:20:15 UTC, Marcin Mstowski wrote:
> Character Data Representation Architecture <http://www-01.ibm.com/software/globalization/cdra/> by IBM. It is what you want to do, with additions, and it has been available since 1995.
> When you come up with an inventive idea, I suggest you first check what has already been done in that area, and then rethink it to see whether you can do better or improve the existing solution. Other approaches are usually a waste of time and effort, unless you are doing this for fun or you can't use existing solutions due to problems with licenses, copyrights, price, etc.
You might be right, but I gave it a quick look and can't make out what the encoding actually is.  There is an appendix that lists several possible encodings, including UTF-8!

Also, one of the first pages talks about representations of floating point and integer numbers, which are outside the purview of the text encodings we're talking about.  I cannot possibly be expected to know about every dead format out there.  If you can show that it is materially similar to my single-byte encoding idea, it might be worth looking into.


Copyright © 1999-2021 by the D Language Foundation