June 02, 2016
On 6/2/2016 2:05 PM, tsbockman wrote:
> On Thursday, 2 June 2016 at 20:56:26 UTC, Walter Bright wrote:
>> What is supposed to be done with "do not merge" PRs other than close them?
>
> Occasionally people need to try something on the auto tester (not sure if that's
> relevant to that particular PR, though).

I've done that, but that doesn't apply here.


> Presumably if someone marks their own
> PR as "do not merge", it means they're planning to either close it themselves
> after it has served its purpose, or they plan to fix/finish it and then remove
> the "do not merge" label.

That doesn't seem to apply here, either.


> Either way, they shouldn't be closed just because they say "do not merge"
> (unless they're abandoned or something, obviously).

Something like that could not be merged until 132 other PRs are done to fix Phobos. It doesn't belong as a PR.
June 02, 2016
On 06/02/2016 05:58 PM, Walter Bright wrote:
> On 6/2/2016 1:27 PM, Andrei Alexandrescu wrote:
>> The lambda returns bool. -- Andrei
>
> Yes, I was wrong about that. But the point still stands with:
>
>  > * s.balancedParens('〈', '〉') works only with autodecoding.
>  > * s.canFind('ö') works only with autodecoding. It returns always
> false without.
>
> Can be made to work without autodecoding.

By special casing? Perhaps. I seem to recall though that one major issue with autodecoding was that it special-cases certain algorithms. So you'd need to go through all of std.algorithm and make sure you can special-case your way out of situations that work today.


Andrei

June 02, 2016
On 6/2/2016 3:11 PM, Timon Gehr wrote:
> Well, this is a somewhat different case, because 10000 is just not representable
> as a byte. Every value that fits in a byte fits in an int though.
>
> It's different for code units. They are incompatible both ways.

Not exactly. (c == 'ö') is always false for the same reason that (b == 1000) is always false.

I'm not sure what the right answer is here.
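
A minimal sketch of the comparison being discussed, assuming the string holds valid UTF-8 (the variable names are only illustrative). It checks that no code unit of "ö" compares equal to the character literal 'ö', and notes the one raw byte value that would:

```d
void main()
{
    string s = "ö";              // UTF-8 encoding: 0xC3 0xB6
    assert(s.length == 2);       // two code units, one code point

    // Iterating as char visits code units, not code points.
    foreach (char c; s)
        assert(c != 'ö');        // 'ö' is U+00F6; neither 0xC3 nor 0xB6 matches

    // A raw char holding 0xF6 does compare equal numerically,
    // but 0xF6 never occurs in well-formed UTF-8.
    char raw = 0xF6;
    assert(raw == 'ö');
}
```

So the comparison is "always false" only under the assumption that c comes from well-formed UTF-8; the language itself just compares the numeric values.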
June 02, 2016
On Thursday, 2 June 2016 at 20:27:27 UTC, Walter Bright wrote:
> On 6/2/2016 12:34 PM, deadalnix wrote:
>> On Thursday, 2 June 2016 at 19:05:44 UTC, Andrei Alexandrescu wrote:
>>> Pretty much everything. Consider s and s1 string variables with possibly
>>> different encodings (UTF8/UTF16).
>>>
>>> * s.all!(c => c == 'ö') works only with autodecoding. It returns always false
>>> without.
>>>
>>
>> False. Many characters can be represented by different sequences of codepoints.
>> For instance, ê can be ê as one codepoint or e followed by ^ as a combining modifier. ö is
>> one such character.
>
> There are 3 levels of Unicode support. What Andrei is talking about is Level 1.
>
> http://unicode.org/reports/tr18/tr18-5.1.html
>
> I wonder what rationale there is for Unicode to have two different sequences of codepoints be treated as the same. It's madness.

There are languages that make heavy use of diacritics, often several on a single "character". Hebrew is a good example. Should there be only one valid ordering of any given set of diacritics on any given character? It's an interesting idea, but it's not how things are.
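
A small sketch of the two spellings of ê mentioned above, using std.uni.normalize from Phobos. The variable names are illustrative; the example only shows that the two sequences differ as code points and become equal after normalization:

```d
import std.range : walkLength;
import std.uni;

void main()
{
    string precomposed = "\u00EA";    // ê as a single code point
    string combining   = "e\u0302";   // e followed by COMBINING CIRCUMFLEX ACCENT

    // Different code point sequences, so a plain comparison fails.
    assert(precomposed != combining);

    // Normalizing to a common form makes them comparable.
    assert(normalize!NFC(combining) == precomposed);
    assert(normalize!NFD(precomposed) == combining);

    // Both are a single grapheme (user-perceived character).
    assert(precomposed.byGrapheme.walkLength == 1);
    assert(combining.byGrapheme.walkLength == 1);
}
```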
June 02, 2016
On 6/2/2016 3:10 PM, Marco Leise wrote:
> we haven't looked into borrowing/scoped enough

That's my fault.

As for scoped, the idea is to make scope work analogously to DIP25's 'return ref'. I don't believe we need borrowing; we've worked out another solution that will work for ref counting.

Please do not reply to this in this thread - start a new one if you wish to continue with this topic.

June 02, 2016
On Thursday, 2 June 2016 at 22:20:49 UTC, Walter Bright wrote:
> On 6/2/2016 2:05 PM, tsbockman wrote:
>> Presumably if someone marks their own
>> PR as "do not merge", it means they're planning to either close it themselves
>> after it has served its purpose, or they plan to fix/finish it and then remove
>> the "do not merge" label.
>
> That doesn't seem to apply here, either.
>
>
>> Either way, they shouldn't be closed just because they say "do not merge"
>> (unless they're abandoned or something, obviously).
>
> Something like that could not be merged until 132 other PRs are done to fix Phobos. It doesn't belong as a PR.

I was just responding to the general question you posed about "do not merge" PRs, not really arguing for that one, in particular, to be re-opened. I'm sure @wilzbach is willing to explain if anyone cares to ask him why he did it as a PR, though.
June 03, 2016
On 03.06.2016 00:26, Walter Bright wrote:
> On 6/2/2016 3:11 PM, Timon Gehr wrote:
>> Well, this is a somewhat different case, because 10000 is just not
>> representable
>> as a byte. Every value that fits in a byte fits in an int though.
>>
>> It's different for code units. They are incompatible both ways.
>
> Not exactly. (c == 'ö') is always false for the same reason that (b ==
> 1000) is always false.
> ...

Yes. And _additionally_, some other concerns apply that are not there for byte vs. int. I.e. if b == 10000 is disallowed, then c == d should be disallowed too, but b == 10000 can be allowed even if c == d is disallowed.

> I'm not sure what the right answer is here.

char to dchar is a lossy conversion, so it shouldn't happen.
byte to int is a lossless conversion, so there is no problem a priori.
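
To illustrate the distinction, here is a short sketch (variable names are illustrative). The byte value survives widening to int unchanged, while a UTF-8 code unit widened to dchar turns into an unrelated code point:

```d
void main()
{
    // byte -> int: the value is preserved.
    byte b = -42;
    int i = b;
    assert(i == -42);

    // char -> dchar: the implicit conversion compiles, but it reinterprets
    // a UTF-8 code unit as if it were a code point.
    string s = "ö";        // UTF-8: 0xC3 0xB6
    char c = s[0];         // first code unit, 0xC3
    dchar d = c;           // implicit widening
    assert(d == 0xC3);
    assert(d == 'Ã');      // U+00C3, not the character that was stored
    assert(d != 'ö');
}
```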
June 03, 2016
On 03.06.2016 00:23, Andrei Alexandrescu wrote:
> On 06/02/2016 05:58 PM, Walter Bright wrote:
>> On 6/2/2016 1:27 PM, Andrei Alexandrescu wrote:
>>> The lambda returns bool. -- Andrei
>>
>> Yes, I was wrong about that. But the point still stands with:
>>
>>  > * s.balancedParens('〈', '〉') works only with autodecoding.
>>  > * s.canFind('ö') works only with autodecoding. It returns always
>> false without.
>>
>> Can be made to work without autodecoding.
>
> By special casing? Perhaps. I seem to recall though that one major issue
> with autodecoding was that it special-cases certain algorithms.

The major issue is that it special cases when there are different, more natural semantics available.
June 02, 2016
On 6/2/2016 3:23 PM, Andrei Alexandrescu wrote:
> On 06/02/2016 05:58 PM, Walter Bright wrote:
>>  > * s.balancedParens('〈', '〉') works only with autodecoding.
>>  > * s.canFind('ö') works only with autodecoding. It returns always
>> false without.
>>
>> Can be made to work without autodecoding.
>
> By special casing? Perhaps.

The argument to canFind() can be detected as not being a char, then encoded into a sequence of chars, then forwarded to a substring search.

> I seem to recall though that one major issue with
> autodecoding was that it special-cases certain algorithms. So you'd need to go
> through all of std.algorithm and make sure you can special-case your way out of
> situations that work today.

That's right. A side effect of that is that the algorithms will go even faster! So it's good.

(A substring of codeunits is faster to search than decoding the input stream.)
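
A rough sketch of that idea, assuming a UTF-8 haystack. The helper canFindCodePoint is hypothetical (it is not a Phobos function); it uses std.utf.encode to turn the needle into code units and std.string.representation to search the raw bytes without decoding:

```d
import std.algorithm.searching : canFind;
import std.string : representation;
import std.utf : encode;

// Hypothetical helper, not part of Phobos: search for one code point
// by encoding it once and doing a code unit substring search.
bool canFindCodePoint(string haystack, dchar needle)
{
    char[4] buf;
    immutable len = encode(buf, needle);   // 1 to 4 UTF-8 code units

    // representation() exposes the strings as ubyte arrays, so the
    // search compares code units and never decodes the haystack.
    return haystack.representation
                   .canFind(buf[0 .. len].representation);
}

void main()
{
    assert(canFindCodePoint("schön", 'ö'));
    assert(!canFindCodePoint("schon", 'ö'));
    assert(canFindCodePoint("a〈b〉", '〈'));
}
```

This relies on UTF-8 being self-synchronizing: the encoded needle cannot match starting in the middle of another character's encoding. Note that std.utf.encode throws on an invalid code point such as a lone surrogate.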

June 02, 2016
On 06/02/2016 06:10 PM, Marco Leise wrote:
> Am Thu, 2 Jun 2016 15:05:44 -0400
> schrieb Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org>:
>
>> On 06/02/2016 01:54 PM, Marc Schütz wrote:
>>> Which practical tasks are made possible (and work _correctly_) if you
>>> decode to code points, that don't already work with code units?
>>
>> Pretty much everything.
>>
>> s.all!(c => c == 'ö')
>
> Andrei, your ignorance is really starting to grind on
> everyone's nerves.

Indeed there seem to be serious questions about my competence, basic comprehension, and now knowledge.

I understand it is tempting to assume that a disagreement is caused by the other simply not understanding the matter. Even if that were true it's not worth sacrificing civility over it.

> If after 350 posts you still don't see
> why this is incorrect: s.any!(c => c == 'o'), you must be
> actively skipping the informational content of this thread.

Is it 'o' with an umlaut or without?

At any rate, consider s of type string and x of type dchar. The dchar type is defined as "a Unicode code point", or at least my understanding is that this has been a reasonable definition to operate with in the D language ever since its first release. Also in the D language, the various string types char[], wchar[] etc. with their respective qualified versions are meant to hold Unicode strings with one of the UTF8, UTF16, and UTF32 encodings.

Following these definitions, it stands to reason that the call s.find(c => c == x) means "find the code point x in string s and return the balance of s positioned there". It's a prima facie application of the definitions of the entities involved.

Is this the only possible or recommended meaning? Most likely not, viz. the subtle cases in which a given grapheme is represented via either one or multiple code points by means of combining characters. Is it the best possible meaning? It's even difficult to define what "best" means (fastest, covering most languages, etc).

I'm not claiming that meaning is the only possible, the only recommended, or the best possible. All I'm arguing is that it's not retarded, and within a certain universe confined to operating at code point level (which is reasonable per the definitions of the types involved) it can be considered correct.
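
As a sketch of that reading, assuming Phobos with autodecoding as it behaved at the time of this thread (variable names are illustrative): find with a predicate sees code points when given a string, sees code units when the string is wrapped in std.utf.byCodeUnit, and misses the combining-character spelling in both cases.

```d
import std.algorithm.searching : find;
import std.range.primitives : empty;
import std.utf : byCodeUnit;

void main()
{
    dchar x = 'ö';                       // U+00F6

    // Autodecoding: the predicate receives code points (dchar).
    string s = "schön";                  // precomposed ö
    assert(s.find!(c => c == x) == "ön");

    // Without decoding: the predicate receives UTF-8 code units (char).
    assert(s.byCodeUnit.find!(c => c == x).empty);

    // The subtle case: o followed by COMBINING DIAERESIS is the same
    // grapheme but a different code point sequence, so it is not found.
    string t = "scho\u0308n";
    assert(t.find!(c => c == x).empty);
}
```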

If at any point in the reasoning above some rampant ignorance comes about, please point it out.

> You are in error, no one agrees with you, and you refuse to see
> it and in the end we have to assume you will make a decisive
> vote against any PR with the intent to remove auto-decoding
> from Phobos.

This seems to assume I have some vested interest in the position that makes it independent of facts. That is not the case. I do what I think is right to do, and you do what you think is right to do.

> Your so-called vocal minority is actually D's panel of Unicode
> experts who understand that auto-decoding is a false ally and
> should be on the deprecation track.

They have failed to convince me. But I am more convinced than before that RCStr should not offer a default mode of iteration. I think its impact is lost in this discussion, because once it's understood RCStr will become D's recommended string type, the entire matter becomes moot.

> Remember final-by-default? You promised that your objection
> about breaking code means that D2 will only continue to be
> fixed in a backwards compatible way, be it the implementation
> of shared or whatever else. Yet months later you opened a
> thread with the title "inout must go". So that must have been
> an appeasement back then. People don't forget these things
> easily and RCStr seems to be a similar distraction,
> considering we haven't looked into borrowing/scoped enough and
> you promise wonders from it.

What the hell is this, digging dirt on me? Paying back debts? Please stop that crap.


Andrei