[Issue 19518] std.range.front() returns a dchar when applied to char[]
December 27, 2018
https://issues.dlang.org/show_bug.cgi?id=19518

Basile B. <b2.temp@gmx.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |b2.temp@gmx.com

--- Comment #1 from Basile B. <b2.temp@gmx.com> ---
This is not a bug. The D standard library auto-decodes input ranges of char and wchar, so their ElementType is dchar (ElementEncodingType still reports the code unit type). The reasoning behind this: imagine an array such as

   ['é','µ','ç'] (which is somewhat equivalent to the string "éµç".dup btw)

You'd expect 3 elements, not 6. So if you want to get rid of decoding, cast your array to ubyte[] (or use std.utf.byCodeUnit).
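
A minimal sketch of the behaviour described above, assuming a Phobos version in which auto-decoding is still the default. The string is written with escapes so that the byte count does not depend on how the source file is saved; everything else uses existing Phobos symbols.

```d
import std.range.primitives : front, walkLength, ElementType, ElementEncodingType;
import std.utf : byCodeUnit;
import std.string : representation;

void main()
{
    // "éµç": three code points, each encoded as two UTF-8 code units.
    char[] s = "\u00E9\u00B5\u00E7".dup;

    // Indexing and .length work on code units.
    assert(s.length == 6);
    static assert(is(typeof(s[0]) == char));

    // The range primitives decode, so front yields a code point.
    static assert(is(typeof(s.front) == dchar));
    static assert(is(ElementType!(char[]) == dchar));
    static assert(is(ElementEncodingType!(char[]) == char));
    assert(s.walkLength == 3);

    // Two ways to opt out of decoding.
    static assert(is(typeof(s.byCodeUnit.front) == char));
    static assert(is(typeof(s.representation.front) == ubyte));
    assert(s.byCodeUnit.length == 6);
}
```

byCodeUnit keeps the element type as char, while representation reinterprets the same memory as ubyte[], which is the safe equivalent of the cast.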

--
December 27, 2018
https://issues.dlang.org/show_bug.cgi?id=19518

--- Comment #2 from Vijay Nayar <madric@gmail.com> ---
That makes sense for character processing. Perhaps my understanding of what .front() and .popFront() do is incorrect then. I had assumed that they were general purpose range methods that could also be used on arrays to treat them like ranges as well.

In this particular case, I was implementing a DenseHashSet algorithm, optimized for low memory overhead, when I discovered that my unittests failed as soon as I made a set of characters. The reason was that my template code was using .front() to manage an internal array.

That may be the dilemma. What does the user have in mind when they use 'char'? Is it strictly for Unicode text processing, or is it a piece of data with a well-defined size? Is it incumbent upon those who use templates to not use 'char' for data in templates (and type-cast bytes), or is it incumbent upon template writers to always consider this special case?

Or is this just the wrong usage of .front(), and should array indexing, like data[0], be preferred?

--
February 14, 2019
https://issues.dlang.org/show_bug.cgi?id=19518

Basile-z <b2.temp@gmx.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
          Component|dmd                         |phobos
           Hardware|x86_64                      |All
         Resolution|---                         |FIXED
                 OS|Linux                       |All

--- Comment #3 from Basile-z <b2.temp@gmx.com> ---
it was for phobos anyway.

--
February 14, 2019
https://issues.dlang.org/show_bug.cgi?id=19518

Basile-z <b2.temp@gmx.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|FIXED                       |INVALID

--
February 14, 2019
https://issues.dlang.org/show_bug.cgi?id=19518

--- Comment #4 from anonymous4 <dfj1esp02@sneakemail.com> ---
One possible solution is to publish a fork of std.range that treats text as an array of code units and use it instead of phobos std.range.

--
February 14, 2019
https://issues.dlang.org/show_bug.cgi?id=19518

Seb <greeenify@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |greeenify@gmail.com

--- Comment #5 from Seb <greeenify@gmail.com> ---
Or use .byCodeUnit, .byChar, .representation, or the upcoming rcstring ;-)

--
February 14, 2019
https://issues.dlang.org/show_bug.cgi?id=19518

--- Comment #6 from Vijay Nayar <madric@gmail.com> ---
I think the tricky case is not so much when one begins and ends thinking of character processing, but when one is writing a generic algorithm using templates that makes use of std.range.front.

A template that takes a range type and an element and works with them will function fine in most cases for most types when they make use of ".front()" in their algorithms.

But as it stands right now, if anyone attempts to use said template with a `char` type, the template will no longer compile, because '.front()' returns a different type (dchar) than the range's element type.

This means that either '.front()' shouldn't be used in generic algorithms that need to pull an element out of the range, in favor of something like '[0]', or it means that algorithm writers need to make `char` a special case in any algorithm they write.

I don't actually have a good answer for what approach is best.
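
To make the mismatch concrete, here is a small sketch. The helpers firstByFront and firstByIndex are hypothetical and are not taken from the DenseHashSet code mentioned earlier; they only contrast the two ways of reading the first slot of an array.

```d
import std.range.primitives : front, ElementType;

// Hypothetical helpers: two ways to read the first slot of an array.
auto firstByFront(T)(T[] data) { return data.front; }
auto firstByIndex(T)(T[] data) { return data[0]; }

void main()
{
    // For most element types both return a T.
    static assert(is(typeof(firstByFront([1, 2, 3])) == int));
    static assert(is(typeof(firstByIndex([1, 2, 3])) == int));

    // For char they diverge: front decodes, indexing does not.
    char[] slots = "abc".dup;
    static assert(is(typeof(firstByFront(slots)) == dchar));
    static assert(is(typeof(firstByIndex(slots)) == char));

    // dchar does not implicitly narrow to char, so code that stores the
    // result of front back into a T[] stops compiling for T == char.
    static assert(!is(dchar : char));

    // ElementType reports what front returns, so generic code can name
    // that type instead of assuming it is T.
    ElementType!(char[]) first = slots.front;
    assert(first == 'a');
}
```

Whether indexing or ElementType is the better fit depends on whether the algorithm treats the array as storage or as text.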

--
February 14, 2019
https://issues.dlang.org/show_bug.cgi?id=19518

Alex <sascha.orlov@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sascha.orlov@gmail.com

--
February 14, 2019
https://issues.dlang.org/show_bug.cgi?id=19518

--- Comment #7 from Seb <greeenify@gmail.com> ---
Well, we all agree that it's not super nice, but it also has advantages. Take e.g. 'ü'. If .front returned only a char, you would get an invalid UTF-8 sequence (a lone lead byte). Try printing "ü"[0].
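
A short sketch of the 'ü' case, assuming auto-decoding Phobos. The character is written as an escape so the byte values below do not depend on the source file's encoding.

```d
import std.range.primitives : front;
import std.utf : validate, UTFException;
import std.exception : assertThrown;

void main()
{
    string s = "\u00FC"; // "ü", encoded in UTF-8 as the bytes 0xC3 0xBC

    assert(s.length == 2);
    assert(s[0] == 0xC3);    // indexing yields only the lead byte
    assert(s.front == 0xFC); // front decodes the full code point U+00FC

    // The lead byte on its own is not valid UTF-8.
    assertThrown!UTFException(validate(s[0 .. 1]));
}
```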

Yes, it has downsides, but with ElementType!R or auto, most generic algorithms don't care about the actual return type of .front, and if they do, they need special casing for strings anyhow.

Auto-decoding by default is considered one of the top two design errors of D, but it's super hard to fix now. The solutions so far are:
- fork std.range
- use byCodeUnit or similar
- use rcstring (or similar)

If you come up with a better idea, please share it in the NG, but we can't change std.range.front because it would break a lot of code. Thanks!

--
March 21, 2020
https://issues.dlang.org/show_bug.cgi?id=19518

Basile-z <b2.temp@gmx.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|b2.temp@gmx.com             |

--