Why the hell doesn't foreach decode strings (page 3)

October 21, 2011

Re: Why the hell doesn't foreach decode strings

Posted by Andrei Alexandrescu
in reply to Walter Bright


On 10/21/11 1:38 PM, Walter Bright wrote:
> On 10/21/2011 2:51 AM, Martin Nowak wrote:
>> You have a good point here. I would have immediately thrown out the
>> loop AFTER
>> profiling.
>> What hits me here is that I had an incorrect program with built-in
>> unicode aware
>> strings.
>> This is counterintuitive to correct unicode handling throughout the
>> std library,
>> and even more to the complementary operation of appending any char
>> type to strings.
>
> I understand the issue, but I don't think it's resolvable.

It is resolvable, just not without breaking compatibility. Latching on the notion that a problem is unsolvable is highly nocive because it sets up the mind for failing to not only look for solutions, but also to see and understand them when they're in the open.

I said this a number of times, and I repeat: if we had the luxury of doing it over again, I'd disable random access and .length for char[], wchar[], and their qualified versions. For those types I would add a property .rep that yields respectively ubyte[], ushort[], and the appropriately-qualified variants.

This would shed the remaining awkwardnesses from a generally very elegant approach to string handling.

The loop issue would be trivial to solve. foreach (x; s) would iterate one dchar at a time, whereas foreach (x; s.rep) would iterate one ubyte or ushort at a time. There would be the ability to iterate foreach (ref x; s.rep) but not foreach (ref x; s).

Andrei
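
For readers who want to see the two iteration modes side by side, here is a minimal D sketch. The `.rep` property in the post is a proposal, not an existing feature; this sketch assumes `std.string.representation` from Phobos as a stand-in for it, and the helper names `countCodePoints` and `countCodeUnits` are made up for illustration. Note that a plain `foreach (c; s)` infers `char` and walks code units, which is the behavior the thread is complaining about; decoding has to be requested by declaring the loop variable as `dchar`.

```d
import std.stdio : writeln;
import std.string : representation;

/// Counts loop steps when foreach is asked to decode UTF-8 into dchar.
size_t countCodePoints(string s)
{
    size_t n = 0;
    foreach (dchar c; s) // explicit dchar: one decoded code point per step
        ++n;
    return n;
}

/// Counts loop steps over the raw UTF-8 code units (no decoding).
size_t countCodeUnits(string s)
{
    size_t n = 0;
    foreach (ubyte b; s.representation) // immutable(ubyte)[] view of the same bytes
        ++n;
    return n;
}

void main()
{
    string s = "f#a# ∞"; // '∞' (U+221E) takes three UTF-8 code units
    writeln(countCodePoints(s)); // 6
    writeln(countCodeUnits(s));  // 8
}
```

Under the proposal, the first loop would be the default and the second would have to be spelled out, rather than the other way round.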

On 10/21/11 1:39 PM, Walter Bright wrote:
> On 10/21/2011 4:14 AM, Steven Schveighoffer wrote:
>>> Making such a string type would be terribly inefficient. It would make D
>>> completely uncompetitive for processing strings.
>>
>> I don't think it would. Do you have any proof to support this?
>
> I've done string processing code, and done a lot of profiling of them.
> Every cycle is critical, and decoding adds a *lot* of cycles.

The key here is to allow people to easily either decode strings or not, without defaulting to an error-prone choice.

Andrei

On Fri, 21 Oct 2011 21:11:14 +0300, Peter Alexander <peter.alexander.au@gmail.com> wrote:
> Of course, people will still need to understand UTF-8. I don't think that's a problem. It's unreasonable to expect the language to do the thinking for you. The problem is that we have people that *do* understand UTF-8 (like the OP), but *don't* understand D's strings.

Indeed, if one knows/understands how unicode works AND knows the structure of D strings, the current scheme is the best of both worlds.

foreach(e; string) // chars
foreach(e; byUTFxxx(string)) // ...

On Fri, 21 Oct 2011 21:38:39 +0300, Jonathan M Davis <jmdavisProg@gmx.com> wrote:
> In another post in this thread, Walter said in reference to post on
> essentially this idea: "Making such a string type would be terribly
> inefficient. It would make D completely uncompetitive for processing strings."
> Now, whether that's true is debatable, but that's his stance on the idea.
>
> - Jonathan M Davis

Well he is right, the reason built-in strings everywhere and rarely (so rare that i have never seen one yet) someone comes up with a better alternative.
IMO people are spoiled by dynamic languages and fail to see the strengths of D strings. Someone coming from C/C++ this is heaven, both for performance and flexibility.

so:

> IMO people are spoiled by dynamic languages

We should aim to something *better* than dynamic languages, where possible. If you have to do certain things (even "slow" ones), there's no point in making them harder than necessary in D.

Python is a good language, but it's not perfect, and in a new language I'd like something even better. D contains several small things that are better than Python (and many things that are worse than Python).

Bye,
bearophile

Andrei Alexandrescu:

> Latching on the notion that a problem is unsolvable is highly nocive because it sets up the mind for failing to not only look for solutions, but also to see and understand them when they're in the open.

Right. Experimental psychology has confirmed this some decades ago :-)

> I said this a number of times, and I repeat:

Maybe I have missed your previous explanations of this idea. Or to me this time it seems more clear and focused.

> if we had the luxury of doing it over again,

Unfortunately D3 language can't break too much backward compatibility :-|

> I'd disable random access and .length for char[], wchar[], and their qualified versions. For those types I would add a property .rep that yields respectively ubyte[], ushort[], and the appropriately-qualified variants.

Good. But the need to know the length of a variable-sized string is common. So I presume in such cases you have to use:

somestring.walkLength()

This is not too bad, but it looks a bit long, and it requires an import. So maybe a shorter named property function in the object module is needed:

somestring.wlength

What about slices? I do need to take slices of variable-length strings too.

Bye,
bearophile
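
The length and slicing questions raised here can be sketched with what Phobos already offers. This is only an illustration under stated assumptions: `walkLength` from `std.range` counts decoded code points of a `string`, `std.utf.toUTFindex` converts a code-point count into a code-unit offset, and `firstCodePoints` is a hypothetical helper name, not a library function. The proposed `.wlength` property does not exist.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.utf : toUTFindex;

/// Returns a slice holding the first n code points of s (no copy).
/// Assumes n does not exceed the number of code points in s.
string firstCodePoints(string s, size_t n)
{
    // toUTFindex maps "n-th code point" to its code-unit offset, so the
    // resulting slice never cuts a multi-byte sequence in half.
    return s[0 .. s.toUTFindex(n)];
}

void main()
{
    string s = "∞ab";
    writeln(s.length);              // 5 code units: '∞' takes three
    writeln(s.walkLength);          // 3 code points, found by an O(n) walk
    writeln(firstCodePoints(s, 2)); // ∞a
}
```

Both operations are linear in the string length, which is the cost the proposal would make explicit instead of hiding behind `.length` and `[]`.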

On Sat, 22 Oct 2011 02:28:28 +0300, bearophile <bearophileHUGS@lycos.com> wrote:
> so:
>
>> IMO people are spoiled by dynamic languages
>
> We should aim to something *better* than dynamic languages, where possible. If you have to do certain things (even "slow" ones), there's no point in making them harder than necessary in D.
>
> Python is a good language, but it's not perfect, and in a new language I'd like something even better. D contains several small things that are better than Python (and many things that are worse than Python).
>
> Bye,
> bearophile

With spoiled i didn't mean those languages do it better in general (expressiveness, maybe), what i meant is that they just work, without any effort from the programmer.
It may be all right for them and you could say it is what every language should do but in reality this is not the case for D-like languages. Not at the expense of efficiency on a low level task.
We could aim all we like but we can't beat dynamic languages that easily, if you don't care about reaching or beating C (on every aspect, system access, performance...) you don't have much of a limit do you? :)

On Sat, 22 Oct 2011 01:28:28 +0200, bearophile <bearophileHUGS@lycos.com> wrote:
> so:
>
>> IMO people are spoiled by dynamic languages
>
> We should aim to something *better* than dynamic languages, where possible. If you have to do certain things (even "slow" ones), there's no point in making them harder than necessary in D.
>
> Python is a good language, but it's not perfect, and in a new language I'd like something even better. D contains several small things that are better than Python (and many things that are worse than Python).
>
> Bye,
> bearophile

Well the first thing I tried out was:

#!/usr/bin/env python
for c in "f#a# ∞":
    print c

Which I still didn't get to run after:

- reading SyntaxError: Non-ASCII character '\xe2' in file ./run.py on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
- adding a BOM to the script
- removing the shebang which collides with the BOM
- writing unicode("f#a# ∞") so I don't get an encoding error

martin

On 21/10/11 11:03 PM, so wrote:
> On Fri, 21 Oct 2011 21:38:39 +0300, Jonathan M Davis
> <jmdavisProg@gmx.com> wrote:
>
>> In another post in this thread, Walter said in reference to post on
>> essentially this idea: "Making such a string type would be terribly
>> inefficient. It would make D completely uncompetitive for processing
>> strings."
>> Now, whether that's true is debatable, but that's his stance on the idea.
>>
>> - Jonathan M Davis
>
> Well he is right, the reason built-in strings everywhere and rarely (so
> rare that i have never seen one yet) someone comes up with a better
> alternative.
> IMO people are spoiled by dynamic languages and fail to see the
> strengths of D strings. Someone coming from C/C++ this is heaven, both
> for performance and flexibility.

Which operations do you believe would be less efficient?

On 10/22/2011 2:21 AM, Peter Alexander wrote:
> Which operations do you believe would be less efficient?

All of the ones that don't require decoding, such as searching, would be less efficient if decoding was done.
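
The searching case can be illustrated with a naive substring search that compares raw code units only. This is a sketch, not how Phobos implements searching; `findCodeUnits` is a hypothetical name. It relies on a property of UTF-8 itself: the encoding is self-synchronizing, so a validly encoded needle cannot match starting in the middle of another character, and no decoding is needed for a correct result.

```d
import std.stdio : writeln;

/// Naive substring search over UTF-8 code units, with no decoding.
/// Returns the code-unit offset of the first match, or -1 if absent.
ptrdiff_t findCodeUnits(const(char)[] haystack, const(char)[] needle)
{
    if (needle.length > haystack.length)
        return -1;
    foreach (i; 0 .. haystack.length - needle.length + 1)
    {
        // Slice comparison is element-wise on code units.
        if (haystack[i .. i + needle.length] == needle)
            return cast(ptrdiff_t) i;
    }
    return -1;
}

void main()
{
    writeln(findCodeUnits("f#a# ∞", "∞")); // 5 (a code-unit offset)
    writeln(findCodeUnits("f#a# ∞", "x")); // -1
}
```

The returned offset counts code units, so it can be used directly for slicing the original string.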
