May 12, 2016 Re: The Case Against Autodecode
Posted in reply to Jack Stouffer

On 5/12/2016 5:47 PM, Jack Stouffer wrote:
> D is much less popular now than Python was at the time, and Python 2's problems
> were more straightforward than the auto-decoding problem. You'll need a very
> clear migration path, years-long deprecations, and automatic tools in order to
> make the transition work, or else D's usage will be permanently damaged.
I agree, if it is possible at all.

May 13, 2016 Re: The Case Against Autodecode
Posted in reply to Walter Bright

On Thursday, 12 May 2016 at 20:15:45 UTC, Walter Bright wrote:
> 2. Every time one wants an algorithm to work with both strings and ranges, you wind up special casing the strings to defeat the autodecoding, or to decode the ranges. Having to constantly special case it makes for more special cases when plugging together components. These issues often escape detection when unittesting because it is convenient to unittest only with arrays.

This is a great example of special casing in Phobos that someone showed me, with a sketch of the pattern below: https://github.com/dlang/phobos/blob/master/std/algorithm/searching.d#L1714
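For readers who want the shape of that special casing without digging through Phobos, here is a minimal sketch (a hypothetical helper written for illustration, not the actual Phobos code):

    import std.range.primitives : empty, front, popFront, isInputRange;
    import std.string : representation;
    import std.traits : isNarrowString;

    // Hypothetical helper: count occurrences of an element, with a separate
    // branch for narrow strings that bypasses autodecoding entirely.
    size_t naiveCount(R, E)(R haystack, E needle)
    if (isInputRange!R)
    {
        size_t n;
        static if (isNarrowString!R)
        {
            // Special case: compare raw code units; no decoding, no throwing.
            foreach (c; haystack.representation)
                if (c == needle) ++n;
        }
        else
        {
            for (; !haystack.empty; haystack.popFront())
                if (haystack.front == needle) ++n;
        }
        return n;
    }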

May 13, 2016 Re: The Case Against Autodecode
Posted in reply to Walter Bright

On Thursday, 12 May 2016 at 20:15:45 UTC, Walter Bright wrote:
>
> Here are some that are not matters of opinion.
>
> 1. Ranges of characters do not autodecode, but arrays of characters do. This is a glaring inconsistency.
>
> 2. Every time one wants an algorithm to work with both strings and ranges, you wind up special casing the strings to defeat the autodecoding, or to decode the ranges. Having to constantly special case it makes for more special cases when plugging together components. These issues often escape detection when unittesting because it is convenient to unittest only with arrays.
>
> 3. Wrapping an array in a struct with an alias this to an array turns off autodecoding, another special case.
>
> 4. Autodecoding is slow and has no place in high speed string processing.
>
> 5. Very few algorithms require decoding.
>
> 6. Autodecoding has two choices when encountering invalid code units - throw or produce an error dchar. Currently, it throws, meaning no algorithms using autodecode can be made nothrow.
>
> 7. Autodecode cannot be used with Unicode path/filenames, because it is legal (at least on Linux) to have invalid UTF-8 as filenames. It turns out in the wild that pure Unicode is not universal - there's lots of dirty Unicode that should remain unmolested, and autodecode does not play well with that.
>
> 8. In my work with UTF-8 streams, dealing with autodecode has caused me considerable extra work every time. A convenient timesaver it ain't.
>
> 9. Autodecode cannot be turned off, i.e. it isn't practical to avoid importing std.array one way or another, and then autodecode is there.
>
> 10. Autodecoded arrays cannot be RandomAccessRanges, losing a key benefit of being arrays in the first place.
>
> 11. Indexing an array produces different results than autodecoding, another glaring special case.
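As a concrete illustration of point 6, a minimal sketch (assuming Phobos's current behavior of throwing on invalid code units):

    import std.exception : assertThrown;
    import std.range : front;
    import std.utf : UTFException;

    void main()
    {
        immutable ubyte[] raw = [0xFF];        // 0xFF never starts a valid UTF-8 sequence
        auto bad = cast(string) raw;
        assertThrown!UTFException(bad.front);  // autodecoding throws, so nothrow is out
    }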
Wow, that's eleven things wrong with just one tiny element of D, with the potential to cause problems whether fixed or not. And I get called a troll and other names when I list half a dozen things wrong with D, my posts get removed/censored, etc., all because I try to inform people not to waste time with D because it's a broken and failed language.
*sigh*
Phobos, a piece of useless rock orbiting a dead planet ... the irony.

May 13, 2016 Re: The Case Against Autodecode
Posted in reply to Bill Hicks

On Friday, 13 May 2016 at 06:50:49 UTC, Bill Hicks wrote:
> *rant*
Actually, chap, it's the attitude that's the turn-off in your post there. Listing problems in order to improve them and listing problems to convince people something is a waste of time are incompatible mindsets around here.

May 13, 2016 Re: The Case Against Autodecode
Posted in reply to Bill Hicks

On Friday, 13 May 2016 at 06:50:49 UTC, Bill Hicks wrote:
> On Thursday, 12 May 2016 at 20:15:45 UTC, Walter Bright wrote:
>> (...)
> Wow, that's eleven things wrong with just one tiny element of D, with the potential to cause problems, whether fixed or not. And I get called a troll and other names when I list half a dozen things wrong with D, my posts get removed/censored, etc, all because I try to inform people not to waste time with D because it's a broken and failed language.
>
> *sigh*
>
> Phobos, a piece of useless rock orbiting a dead planet ... the irony.
You get banned because there is a difference between torpedoing a project and offering constructive criticism.
Also, you are missing the point by claiming that a technical problem is sure to kill D. Note that very successful languages like C++, Python, and so on have also undergone heated discussions about various features, and often live with design mistakes for many years. The real reason languages are successful is what they enable, not how many quirks they have.
Quirks are why they get replaced by others 20 years later. :)

May 13, 2016 Re: The Case Against Autodecode
Posted in reply to Jack Stouffer

On Friday, 13 May 2016 at 00:47:04 UTC, Jack Stouffer wrote:
> D is much less popular now than Python was at the time, and Python 2's problems were more straightforward than the auto-decoding problem. You'll need a very clear migration path, years-long deprecations, and automatic tools in order to make the transition work, or else D's usage will be permanently damaged.
Python 2 is/was deployed at a much larger scale and with far more library dependencies, so I don't think it is comparable. It is easier for D to get away with breaking changes.
I am still using Python 2.7 exclusively, but now I use:

    from __future__ import division, absolute_import, with_statement, unicode_literals

D can do something similar. C++ uses a comparable approach: switches that turn on different compatibility levels.
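Purely as a sketch of what such an opt-in could look like in D - the NoAutodecode version identifier is hypothetical, nothing like it exists today, and byCodeUnit is just one existing way to get code-unit iteration:

    import std.utf : byCodeUnit;

    // Hypothetical opt-in, in the spirit of Python's __future__ imports:
    // compile with -version=NoAutodecode to view strings as code-unit ranges.
    version (NoAutodecode)
    {
        auto textRange(string s) { return s.byCodeUnit; }  // no decoding
    }
    else
    {
        auto textRange(string s) { return s; }  // today's autodecoding default
    }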

May 13, 2016 Re: The Case Against Autodecode
Posted in reply to Walter Bright

On Friday, 13 May 2016 at 01:00:54 UTC, Walter Bright wrote:
> On 5/12/2016 5:47 PM, Jack Stouffer wrote:
>> D is much less popular now than Python was at the time, and Python 2's problems
>> were more straightforward than the auto-decoding problem. You'll need a very
>> clear migration path, years-long deprecations, and automatic tools in order to
>> make the transition work, or else D's usage will be permanently damaged.
>
> I agree, if it is possible at all.
I don't know to what extent my problems with string handling are related to autodecode. However, I had to write some utility functions to get around issues with code points, graphemes, and the like. While it is not a huge issue in terms of programming time, it does slow down my program, because even simple operations may have to go through a utility function to make sure the result is correct (.length, for example). But that might be an issue related to Unicode in general (or D's handling of it).
If autodecode is killed, could we have a test version ASAP? I'd be willing to test my programs with autodecode turned off and see what happens. Others should do likewise, and we could come up with a transition strategy based on what happened.
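For instance, a minimal sketch of the kind of discrepancy involved (the example string is illustrative):

    import std.range : walkLength;
    import std.uni : byGrapheme;

    void main()
    {
        string s = "noe\u0308l";               // "noël" with a combining diaeresis
        assert(s.length == 6);                 // UTF-8 code units
        assert(s.walkLength == 5);             // code points (the autodecoded view)
        assert(s.byGrapheme.walkLength == 4);  // user-perceived characters
    }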

May 13, 2016 Re: The Case Against Autodecode
Posted in reply to Bill Hicks

On Friday, 13 May 2016 at 06:50:49 UTC, Bill Hicks wrote:
>
> Wow, that's eleven things wrong with just one tiny element of D, with the potential to cause problems, whether fixed or not. And I get called a troll and other names when I list half a dozen things wrong with D, my posts get removed/censored, etc, all because I try to inform people not to waste time with D because it's a broken and failed language.
>
> *sigh*
>
> Phobos, a piece of useless rock orbiting a dead planet ... the irony.
Is there any PL that doesn't have multiple issues? Look at Swift: they keep changing it, although it started out as _the_ big thing, because, you know, it's Apple. C#, Java, Go, and of course the chronically ill C++. There is no such thing as the perfect PL, and as hardware changes, PLs become outdated anyway and have to catch up. The question is not whether a language sucks or not; the question is which language sucks the least for the task at hand.
PS: I wonder whether Bill Hicks knows you're using his name? But I guess he's lost interest in this planet and happily lives on Mars now.

May 13, 2016 Re: The Case Against Autodecode
Posted in reply to Bill Hicks

On Friday, 13 May 2016 at 06:50:49 UTC, Bill Hicks wrote:
> not to waste time with D because it's a broken and failed language.
D is a less broken thing among all the broken things in this broken world, so it's only to be expected that people prefer to spend their time on it.

May 13, 2016 Re: The Case Against Autodecode
Posted in reply to Walter Bright

On Thursday, May 12, 2016 13:15:45 Walter Bright via Digitalmars-d wrote:
> On 5/12/2016 9:29 AM, Andrei Alexandrescu wrote:
> > I am as unclear about the problems of autodecoding as I am about the
> > necessity to remove curl. Whenever I ask I hear some arguments that work
> > well emotionally but are scant on reason and engineering. Maybe it's
> > time to rehash them? I just did so about curl, no solid argument seemed
> > to come together. I'd be curious of a crisp list of grievances about
> > autodecoding. -- Andrei
>
> Here are some that are not matters of opinion.
>
> 1. Ranges of characters do not autodecode, but arrays of characters do. This is a glaring inconsistency.
>
> 2. Every time one wants an algorithm to work with both strings and ranges, you wind up special casing the strings to defeat the autodecoding, or to decode the ranges. Having to constantly special case it makes for more special cases when plugging together components. These issues often escape detection when unittesting because it is convenient to unittest only with arrays.
>
> 3. Wrapping an array in a struct with an alias this to an array turns off autodecoding, another special case.
>
> 4. Autodecoding is slow and has no place in high speed string processing.
>
> 5. Very few algorithms require decoding.
>
> 6. Autodecoding has two choices when encountering invalid code units - throw or produce an error dchar. Currently, it throws, meaning no algorithms using autodecode can be made nothrow.
>
> 7. Autodecode cannot be used with Unicode path/filenames, because it is legal (at least on Linux) to have invalid UTF-8 as filenames. It turns out in the wild that pure Unicode is not universal - there's lots of dirty Unicode that should remain unmolested, and autodecode does not play well with that.
>
> 8. In my work with UTF-8 streams, dealing with autodecode has caused me considerable extra work every time. A convenient timesaver it ain't.
>
> 9. Autodecode cannot be turned off, i.e. it isn't practical to avoid importing std.array one way or another, and then autodecode is there.
>
> 10. Autodecoded arrays cannot be RandomAccessRanges, losing a key benefit of being arrays in the first place.
>
> 11. Indexing an array produces different results than autodecoding, another glaring special case.
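To make points 1 and 11 concrete, a minimal sketch (plain D; the literal is illustrative):

    import std.range : front;

    void main()
    {
        string s = "é";          // two UTF-8 code units: 0xC3, 0xA9
        static assert(is(typeof(s[0]) == immutable(char)));  // indexing: code unit
        static assert(is(typeof(s.front) == dchar));         // autodecoding: code point
        assert(s[0] == 0xC3);    // the first code unit, not 'é'
        assert(s.front == 'é');  // the decoded code point
    }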
Auto-decoding also results in constantly special-casing algorithms for narrow strings in order to avoid it. Phobos does this all over the place. We have a ridiculous amount of code in Phobos just to avoid auto-decoding, and anyone who wants high performance will have to do the same.
And it's not like auto-decoding is even correct. It would be one thing if auto-decoding were fully correct but slow, but to be fully correct, it would need to operate at the grapheme level, not the code point level. So, by default, we get slower code without actually getting fully correct code.
So, we're neither fast nor correct. We _are_ correct in more cases than we'd be if we simply acted like ASCII was all there was, but what we end up with is the illusion that we're correct when we're not. IIRC, Andrei talked in TDPL about how Java's choice to go with UTF-16 was worse than the choice to go with UTF-8, because it was correct in many more cases to operate on the code unit level as if a code unit were a character, and it was therefore harder to realize that what you were doing was wrong, whereas with UTF-8, it's obvious very quickly. We currently have that same problem with auto-decoding except that it's treating UTF-32 code units as if they were full characters rather than treating UTF-16 code units as if they were full characters.
Ideally, algorithms would be Unicode-aware as appropriate, but the default would be to operate on code units, with wrappers to handle decoding by code point or grapheme. Then it's easy to write fast code while still allowing for full correctness. Granted, it's not necessarily easy to get correct code that way, but anyone who wants full correctness without caring about efficiency can just use ranges of graphemes. Ranges of code points are rare regardless.
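Something along those lines is already expressible with today's wrappers - a sketch using the existing byCodeUnit and byGrapheme:

    import std.range : walkLength;
    import std.uni : byGrapheme;
    import std.utf : byCodeUnit;

    void main()
    {
        string s = "re\u0301sume\u0301";        // "résumé" with combining accents
        assert(s.byCodeUnit.walkLength == 10);  // code units: no decoding, nothrow-friendly
        assert(s.walkLength == 8);              // code points: the autodecoded default
        assert(s.byGrapheme.walkLength == 6);   // graphemes: fully correct "characters"
    }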
Based on what I've seen in previous conversations on auto-decoding over the past few years (be it in the newsgroup, on GitHub, or at DConf), most of the core devs think that auto-decoding was a major blunder that we continue to pay for. But unfortunately, even if we all agree that it was a huge mistake and want to fix it, the question remains of how to do that without breaking tons of code - and since, AFAIK, Andrei is still in favor of auto-decoding, we'd have a hard time going forward with plans to get rid of it even if we had come up with a good way of doing so. But I would love it if we could get rid of auto-decoding and clean up string handling in D.
- Jonathan M Davis