Range of chars (narrow string ranges) - D Programming Language Discussion Forum


April 24, 2015

Range of chars (narrow string ranges)

Posted by Martin Nowak

Just want to make this a bit more visible. https://github.com/D-Programming-Language/phobos/pull/3206#issuecomment-95681812

We just added entabber to Phobos (std.string), and AFAIK it's the first range algorithm that transforms a narrow string into a range of chars instead of decoding the original string and returning a range of dchars.

Most of Phobos can't handle such ranges the way it handles strings, so you'd have to decode them with byDchar to work with them.
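
For readers less familiar with the distinction, here is a minimal sketch (not taken from the pull request) of what "range of chars" versus "range of dchars" means in practice. It uses std.utf.byCodeUnit as a stand-in for an algorithm that yields code units, and byDchar to decode them again. The sample string is an arbitrary assumption.

```d
// Run with: dmd -unittest -main -run example.d   (file name is arbitrary)
import std.algorithm.comparison : equal;
import std.range.primitives : ElementType, walkLength;
import std.utf : byCodeUnit, byDchar;

unittest
{
    // U+00E9 takes two UTF-8 code units, so this string has
    // 5 code points but 6 code units.
    string s = "h\u00E9llo";

    // The array range primitives autodecode narrow strings:
    // viewed as a range, a string has dchar elements.
    static assert(is(ElementType!string == dchar));
    assert(s.walkLength == 5);

    // byCodeUnit exposes the same data as a range of (immutable) char.
    auto units = s.byCodeUnit;
    static assert(is(typeof(units.front) == immutable char));
    assert(units.length == 6);

    // byDchar decodes such a range back into code points.
    assert(units.byDchar.equal("h\u00E9llo"d));
}
```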

April 24, 2015

Re: Range of chars (narrow string ranges)

Posted by H. S. Teoh
in reply to Martin Nowak

On Fri, Apr 24, 2015 at 08:39:36PM +0200, Martin Nowak via Digitalmars-d wrote:
> Just want to make this a bit more visible. https://github.com/D-Programming-Language/phobos/pull/3206#issuecomment-95681812
> 
> We just added entabber to Phobos (std.string), and AFAIK it's the first range algorithm that transforms a narrow string into a range of chars instead of decoding the original string and returning a range of dchars.
> 
> Most of Phobos can't handle such ranges the way it handles strings, so you'd have to decode them with byDchar to work with them.

I really wish we would just *make the darn decision* already, whether to kill off autodecoding or not, and MAKE IT CONSISTENT ACROSS PHOBOS, instead of introducing this schizophrenic dichotomy where some functions give you a range of dchar while others give you a range of char/wchar, and the two don't work well together. This is totally going to make a laughing stock of D one day.

T

-- 
Guns don't kill people. Bullets do.

April 24, 2015

Re: Range of chars (narrow string ranges)

Posted by Walter Bright
in reply to H. S. Teoh

On 4/24/2015 11:52 AM, H. S. Teoh via Digitalmars-d wrote:
> I really wish we would just *make the darn decision* already, whether to
> kill off autodecoding or not, and MAKE IT CONSISTENT ACROSS PHOBOS,
> instead of introducing this schizophrenic dichotomy where some functions
> give you a range of dchar while others give you a range of char/wchar,
> and the two don't work well together. This is totally going to make a
> laughing stock of D one day.

Some facts:

1. When I started D, there was a lot of speculation about whether the world would settle on UTF8, UTF16, or UTF32. So D natively supports all three. Time has shown, however, that UTF8 has pretty much won. wchar only exists for the Windows API and Java; dchar strings pretty much don't exist in the wild.

2. dchar is very useful as a character type, but not as a string type.

3. Pretty much none of the algorithms in Phobos work when presented with a range of chars or wchars. This is not even documented.

4. Autodecoding is inefficient, especially considering that few algorithms actually need decoding. Re-encoding the result back to UTF8 is another inefficiency.

I'm afraid we are stuck with autodecoding, as taking it out may be far too disruptive.

But all is not lost. The Phobos algorithms can all be fixed to not care about autodecoding. The changes I've made to std.string all reflect that.

https://github.com/D-Programming-Language/phobos/pulls/WalterBright
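
As an illustration of point 4 and of "fixing an algorithm to not care about autodecoding", here is a rough sketch. countAscii is a hypothetical helper, not a Phobos function and not taken from the linked pull requests. It special-cases narrow strings and works on code units, which is safe when searching for an ASCII character because every code unit of a multi-unit UTF-8 or UTF-16 sequence is 0x80 or above.

```d
// Run with: dmd -unittest -main -run example.d   (file name is arbitrary)
import std.range.primitives : isInputRange;
import std.traits : isNarrowString;
import std.utf : byCodeUnit;

/// Hypothetical helper: counts occurrences of an ASCII character
/// without decoding the input.
size_t countAscii(R)(R r, char needle)
if (isInputRange!R)
{
    assert(needle < 0x80, "needle must be ASCII");

    static if (isNarrowString!R)
    {
        // Narrow string: bypass autodecoding and scan the code units.
        return countAscii(r.byCodeUnit, needle);
    }
    else
    {
        // Any other range (including ranges of char) is taken as-is.
        size_t n;
        foreach (c; r)
            if (c == needle)
                ++n;
        return n;
    }
}

unittest
{
    assert(countAscii("a b\u00E9 c", ' ') == 2);  // UTF-8
    assert(countAscii("a b\u00E9 c"w, ' ') == 2); // UTF-16
    assert(countAscii("a b\u00E9 c"d, ' ') == 2); // UTF-32
}
```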

April 24, 2015

Re: Range of chars (narrow string ranges)

Posted by Martin Nowak
in reply to Walter Bright

On 04/24/2015 10:44 PM, Walter Bright wrote:
> 4. Autodecoding is inefficient, especially considering that few algorithms actually need decoding. Re-encoding the result back to UTF8 is another inefficiency.
> 
> I'm afraid we are stuck with autodecoding, as taking it out may be far too disruptive.
> 
> But all is not lost. The Phobos algorithms can all be fixed to not care about autodecoding. The changes I've made to std.string all reflect that.

It probably won't be too disruptive to optimize algorithms such as filter to return a range of chars, but only if we support such ranges as narrow strings everywhere.
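
To make the filter case concrete, here is a small sketch of the behavior as it stands and of what a code-unit result looks like when the caller opts in by hand. It only uses existing Phobos functions (filter, byCodeUnit); the input string and predicate are assumptions chosen for illustration.

```d
// Run with: dmd -unittest -main -run example.d   (file name is arbitrary)
import std.algorithm.comparison : equal;
import std.algorithm.iteration : filter;
import std.range.primitives : ElementType;
import std.traits : Unqual;
import std.utf : byCodeUnit;

unittest
{
    string s = "a\tb\u00E9";

    // filter applied to a string sees autodecoded elements,
    // so the result is a range of dchar.
    auto decoded = s.filter!(c => c != '\t');
    static assert(is(ElementType!(typeof(decoded)) == dchar));
    assert(decoded.equal("ab\u00E9"d));

    // Opting out via byCodeUnit yields a range of char instead.
    auto units = s.byCodeUnit.filter!(c => c != '\t');
    static assert(is(Unqual!(ElementType!(typeof(units))) == char));
    assert(units.equal("ab\u00E9".byCodeUnit));
}
```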

April 24, 2015

Re: Range of chars (narrow string ranges)

Posted by Brad Anderson
in reply to Walter Bright

On Friday, 24 April 2015 at 20:44:34 UTC, Walter Bright wrote:
> [snip]
> I'm afraid we are stuck with autodecoding, as taking it out may be far too disruptive.

No!

> But all is not lost. The Phobos algorithms can all be fixed to not care about autodecoding. The changes I've made to std.string all reflect that.

Yay!

I haven't really followed the autodecoding conversations. The problem is that front on char ranges decodes, right? Is there a quick way to tell which functions are autodecoding so we can have a list of candidates for replacement? It'd be good for hackweek.

I'm reminded of this conversation http://forum.dlang.org/post/xgnurdjcqiyatpvnwznd@forum.dlang.org
which contains a partial list of candidates. Following your lead with implementing these lazy versions (without autodecoding) would be good hackweek projects.

Finally, there is this http://goo.gl/Wmotu4 list from http://forum.dlang.org/post/lvmydbvjivsvmwtimobs@forum.dlang.org that has some good candidates for hackweek I think.

Are we collecting hackweek ideas anywhere?

April 24, 2015

Re: Range of chars (narrow string ranges)

Posted by Walter Bright
in reply to Brad Anderson

On 4/24/2015 3:29 PM, Brad Anderson wrote:
> I haven't really followed the autodecoding conversations. The problem is that
> front on char ranges decodes, right?

Nope. Only front on narrow string arrays. Ranges aren't autodecoded.


> Is there a quick way to tell which functions
> are auto decoding so we can have a list of candidates for replacement? It'd be
> good for hackweek.

If they accept ranges, and don't special case narrow strings, then they autodecode.
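
A minimal sketch of that distinction, assuming a hand-written range type (CodeUnits is hypothetical, made up for this example): the decoding lives in the range primitives for arrays, so a user-defined range of char is passed through generic code untouched, while the bare string is decoded.

```d
// Run with: dmd -unittest -main -run example.d   (file name is arbitrary)
import std.range.primitives : ElementType, isInputRange, walkLength;

/// Hypothetical hand-written input range over the code units of a string.
struct CodeUnits
{
    string data;

    @property bool empty() const { return data.length == 0; }
    @property char front() const { return data[0]; } // plain indexing, no decoding
    void popFront() { data = data[1 .. $]; }
}

unittest
{
    static assert(isInputRange!CodeUnits);

    // The struct is a range of char; the array it wraps is a range of dchar.
    static assert(is(ElementType!CodeUnits == char));
    static assert(is(ElementType!string == dchar));

    string s = "a\u00E9b"; // 3 code points, 4 UTF-8 code units

    // The same generic algorithm gives different answers for the two.
    assert(s.walkLength == 3);
    assert(CodeUnits(s).walkLength == 4);
}
```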


> I'm reminded of this conversation
> http://forum.dlang.org/post/xgnurdjcqiyatpvnwznd@forum.dlang.org
> which contains a partial list of candidates.

PRs exist for most of these now.

> Following your lead with
> implementing these lazy versions (without autodecoding) would be good hackweek
> projects.

Yup.


> Finally, there is this http://goo.gl/Wmotu4 list from
> http://forum.dlang.org/post/lvmydbvjivsvmwtimobs@forum.dlang.org that has some
> good candidates for hackweek I think.

Yes, we should have an answer for each of the Boost string algorithms.


> Are we collecting hackweek ideas anywhere?

Andrei?

April 24, 2015

Re: Range of chars (narrow string ranges)

Posted by Jonathan M Davis
in reply to Walter Bright

On Friday, 24 April 2015 at 20:44:34 UTC, Walter Bright wrote:
> On 4/24/2015 11:52 AM, H. S. Teoh via Digitalmars-d wrote:
>> I really wish we would just *make the darn decision* already, whether to
>> kill off autodecoding or not, and MAKE IT CONSISTENT ACROSS PHOBOS,
>> instead of introducing this schizophrenic dichotomy where some functions
>> give you a range of dchar while others give you a range of char/wchar,
>> and the two don't work well together. This is totally going to make a
>> laughing stock of D one day.
>
> Some facts:
>
> 1. When I started D, there was a lot of speculation about whether the world would settle on UTF8, UTF16, or UTF32. So D natively supports all three. Time has shown, however, that UTF8 has pretty much won. wchar only exists for the Windows API and Java; dchar strings pretty much don't exist in the wild.
>
> 2. dchar is very useful as a character type, but not as a string type.
>
> 3. Pretty much none of the algorithms in Phobos work when presented with a range of chars or wchars. This is not even documented.
>
> 4. Autodecoding is inefficient, especially considering that few algorithms actually need decoding. Re-encoding the result back to UTF8 is another inefficiency.
>
> I'm afraid we are stuck with autodecoding, as taking it out may be far too disruptive.
>
> But all is not lost. The Phobos algorithms can all be fixed to not care about autodecoding. The changes I've made to std.string all reflect that.
>
> https://github.com/D-Programming-Language/phobos/pulls/WalterBright

I really think that leaving things with autodecoding in some cases and not in others is just asking for trouble. Even if we manage to figure out how to fix it so that Phobos doesn't autodecode in any of its algorithms without breaking any user code in the process, that then leaves user code with the problem, and since Phobos _wouldn't_ have the problem, it then would be all the more confusing.

It _is_ possible to get rid of it entirely without breaking code if we move the array range primitives to a new module and later deprecate the old ones, though that would probably mean breaking up std.array into submodules and deprecating _all_ of it in favor of its submodules, since anyone importing std.array would then have the old array range primitives rather than the new ones - or both, causing conflicts. And it's made worse by the fact that std.range publicly imports std.array. So, yes, it _is_ ugly. But it _can_ be done.

If we leave autodecoding in and just work around it everywhere in Phobos, it's just going to forever screw with user code and confuse users. They get confused enough by it as it is, and at least now, they're running into it in Phobos where we can explain it, whereas if they don't see it with Phobos and only with their own code, then they're going to think that they're doing something wrong and potentially get very frustrated.

I definitely share the concern that removing autodecoding outright will be too disruptive, but at the same time, I don't know if we can afford to go halfway with it.
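
To give an idea of what non-decoding array range primitives in a new module could look like, here is a rough sketch. The three functions below are hypothetical and only mirror the names of the existing primitives; nothing like this module exists in Phobos. The example deliberately imports neither std.range nor std.array, so these are the only primitives in scope.

```d
// Run with: dmd -unittest -main -run example.d   (file name is arbitrary)

// Hypothetical non-decoding primitives: every array, including
// string and wstring, is treated as a range of its own element type.
@property bool empty(T)(T[] a) { return a.length == 0; }

@property T front(T)(T[] a)
{
    assert(a.length, "front of an empty array");
    return a[0];
}

void popFront(T)(ref T[] a)
{
    assert(a.length, "popFront on an empty array");
    a = a[1 .. $];
}

unittest
{
    string s = "\u00E9x"; // UTF-8 code units: 0xC3 0xA9 'x'

    // With these primitives, front yields code units, not code points.
    assert(s.front == 0xC3);
    s.popFront();
    assert(s.front == 0xA9);
    s.popFront();
    assert(s.front == 'x');
    s.popFront();
    assert(s.empty);
}
```

Code that imports both this and the existing primitives would get the conflicts described above, which is why the old ones would have to be deprecated.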

April 24, 2015

Re: Range of chars (narrow string ranges)

Posted by Steven Schveighoffer
in reply to Walter Bright

On 4/24/15 4:44 PM, Walter Bright wrote:

> I'm afraid we are stuck with autodecoding, as taking it out may be far
> too disruptive.

This is pretty easy. We just have to create a string type that is backed by, but isn't simply an alias to, an array of char.
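
A very rough sketch of the idea, with loud caveats: Utf8String and its member names are made up for illustration and are not a proposal for the actual API. The point is only that the type wraps an array of char without being a range itself, so the caller has to say which view is wanted.

```d
// Run with: dmd -unittest -main -run example.d   (file name is arbitrary)
import std.range.primitives : isInputRange, walkLength;
import std.utf : byCodeUnit, byDchar;

/// Hypothetical string type backed by, but not an alias to, an array of char.
struct Utf8String
{
    private immutable(char)[] data;

    this(string s) { data = s; }

    /// Length in code units.
    @property size_t length() const { return data.length; }

    /// Explicit view as a range of char (no decoding).
    auto codeUnits() { return data.byCodeUnit; }

    /// Explicit view as a range of dchar (decoding on demand).
    auto codePoints() { return data.byDchar; }
}

// The type itself is not a range, so generic algorithms cannot
// silently pick one interpretation.
static assert(!isInputRange!Utf8String);

unittest
{
    auto s = Utf8String("h\u00E9");
    assert(s.length == 3);
    assert(s.codeUnits.walkLength == 3);
    assert(s.codePoints.walkLength == 2);
}
```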

-Steve

April 25, 2015

Re: Range of chars (narrow string ranges)

Posted by Walter Bright
in reply to Steven Schveighoffer

On 4/24/2015 4:56 PM, Steven Schveighoffer wrote:
> This is pretty easy. We just have to create a string type that is backed by, but
> isn't simply an alias to, an array of char.

Just shoot me now!

April 25, 2015

Re: Range of chars (narrow string ranges)

Posted by Steven Schveighoffer
in reply to Walter Bright

On 4/24/15 9:02 PM, Walter Bright wrote:
> On 4/24/2015 4:56 PM, Steven Schveighoffer wrote:
>> This is pretty easy. We just have to create a string type that is
>> backed by, but
>> isn't simply an alias to, an array of char.
>
> Just shoot me now!
>

Yeah, that's the reaction I figured I'd get ;) But it doesn't hurt to keep trying since we keep coming back to this over, and over, and over, and over...

-Steve
