January 09, 2014
On Thursday, 9 January 2014 at 15:14:04 UTC, Manu wrote:
> However, I think to get the expected result from unicode you need
>>
>> string y = "Hello".byGrapheme.retro.find('H').to!string;
>>
>> but I might be wrong.
>>
>
> Bugger that. This is not an example of "D is good at strings!".

Agreed. std.range and std.algorithm should be Unicode-correct with strings and leave byte-by-byte access to ubyte arrays.
January 09, 2014
On Thursday, 9 January 2014 at 14:25:20 UTC, Benjamin Thaut wrote:
> The best example in D is the deprecation of indexOf. Now you have to call countUntil. But if I have to choose between the two names, indexOf actually tells me what it does, while countUntil does not. Count until what?

std.algorithm.indexOf was deprecated, not std.string.indexOf, so you can of course still use the latter, and it still gives you the byte (array-access) index of the supplied parameter. countUntil, by contrast, counts elements until it finds the supplied parameter. I think this is logical, useful, and easy to understand.
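
To make the difference concrete, here is a small sketch (my own illustration, assuming a current Phobos; the non-ASCII test string is made up). std.string.indexOf reports a UTF-8 code-unit offset you can slice with, while countUntil on a string counts decoded code points:

```d
import std.algorithm : countUntil;
import std.string : indexOf;

void main()
{
    // "\u00E4" (a-umlaut) occupies two UTF-8 code units, so the results differ.
    string s = "\u00E4b";

    // std.string.indexOf: offset in code units, usable for slicing.
    assert(s.indexOf('b') == 2);
    assert(s[s.indexOf('b') .. $] == "b");

    // std.algorithm.countUntil: number of elements (code points) before the match.
    assert(s.countUntil('b') == 1);
}
```

For pure ASCII input the two numbers coincide, which is probably why the names are easy to confuse.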
January 09, 2014
Am Fri, 10 Jan 2014 01:20:26 +1000
schrieb Manu <turkeyman@gmail.com>:

> Awesome! Although it looks like you still have a lot of work ahead of you :)

So... when was std.simd going to be in Phobos again? :p

-- 
Marco

January 09, 2014
On Thursday, 9 January 2014 at 15:14:04 UTC, Manu wrote:
> However, I think to get the expected result from unicode you need
>>
>> string y = "Hello".byGrapheme.retro.find('H').to!string;
>>
>> but I might be wrong.
>>
>
> Bugger that. This is not an example of "D is good at strings!".

I have no idea how you are going to get the same functionality in C with strchr. This small line uses quite a lot of features to be reliably Unicode-correct.
January 09, 2014
On 10 January 2014 01:56, Marco Leise <Marco.Leise@gmx.de> wrote:

> Am Fri, 10 Jan 2014 01:20:26 +1000
> schrieb Manu <turkeyman@gmail.com>:
>
> > Awesome! Although it looks like you still have a lot of work ahead of
> you :)
>
> So... when was std.simd going to be in Phobos again? :p
>

When there are a zillion unit tests >_<
And I kinda wanna prove it is efficient on other architectures before it is
committed to the stone tablet that is phobos; that can never be changed
once committed.


January 09, 2014
On 10 January 2014 02:05, Dicebot <public@dicebot.lv> wrote:

> On Thursday, 9 January 2014 at 15:14:04 UTC, Manu wrote:
>
>> However, I think to get the expected result from unicode you need
>>
>>>
>>> string y = "Hello".byGrapheme.retro.find('H').to!string;
>>>
>>> but I might be wrong.
>>>
>>>
>> Bugger that. This is not an example of "D is good at strings!".
>>
>
> I have no idea how you are going to get the same functionality in C with strchr. This small line uses quite a lot of features to be reliably Unicode-correct.
>

It's nice that it's Unicode-correct, but it's not nice that you have to be
familiar with a massive amount of the standard library and search through
4-5 (huge! and often poorly documented) modules to find the functions you
need to perform _basic string operations_, like finding the last instance
of a character...
My standing opinion is that string manipulation in D is not nice; it is
possibly the most difficult and time-consuming I have used in any language
ever. Am I alone?
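
For reference, std.string does provide lastIndexOf, which covers the "last instance of a character" case directly. A minimal sketch, assuming the goal is C's strrchr behaviour (the tail of the string starting at the last match); it does nothing for the discoverability complaint:

```d
import std.string : lastIndexOf;

void main()
{
    string s = "Hello";

    // lastIndexOf returns a code-unit offset, or -1 when there is no match.
    immutable i = s.lastIndexOf('l');
    assert(i == 3);

    // Slicing from that offset gives the strrchr-style tail without allocating.
    assert(s[i .. $] == "lo");
    assert(s.lastIndexOf('z') == -1);
}
```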


January 09, 2014
Am Thu, 09 Jan 2014 15:20:13 +0000
schrieb "John Colvin" <john.loughran.colvin@gmail.com>:

> On Thursday, 9 January 2014 at 14:34:43 UTC, John Colvin wrote:
> > On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
> >> This works fine:
> >>  string x = find("Hello", 'H');
> >>
> >> This doesn't:
> >>  string y = find(retro("Hello"), 'H');
> >>  > Error: cannot implicitly convert expression
> >> (find(retro("Hello"), 'H'))
> >> of type Result!() to string
> >
> > In order to return the result as a string it would require an allocation. You have to request that allocation (and associated eager evaluation) explicitly
> >
> > string y = "Hello".retro.find('H').to!string;
> >
> >
> > However, I think to get the expected result from unicode you need
> >
> > string y = "Hello".byGrapheme.retro.find('H').to!string;
> >
> > but I might be wrong.
> 
> Oh. I see you actually wanted strrchr behaviour. That's different.

The point about graphemes is good. D's functions still stop
mid-way. From UTF-8 you can iterate UTF-32 code points, but
grapheme clusters are the new characters. I.e. the basic need
to iterate Unicode _characters_ is not supported!
I cannot even come up with use cases for working with code
points and think they are a conceptual black hole. Something
carried over from a time when grapheme clusters didn't exist.

When you search for 'A', 'Ä' shows up when it is built from an A and the "two dots" symbol (a combining diaeresis). It also has a walk length of 2. This isn't an issue as long as we use strings from languages that are traditionally well supported with single code-point characters.

Basically the element type when iterating over a string would have to be another string of arbitrary length, since you could attach any number of combining diacritical symbols to a letter. See?: e͜͟͡͞
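
The 'A' versus 'Ä' case can be checked with a few lines. This is only a sketch; it assumes std.uni.byGrapheme from a current Phobos, and the test string is my own:

```d
import std.algorithm : canFind;
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    // 'A' followed by U+0308 COMBINING DIAERESIS renders as a single "Ä".
    string s = "A\u0308";

    assert(s.length == 3);                // UTF-8 code units
    assert(s.walkLength == 2);            // code points
    assert(s.byGrapheme.walkLength == 1); // grapheme clusters

    // A code-point search still matches the bare 'A' inside the cluster.
    assert(s.canFind('A'));
}
```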

-- 
Marco

January 09, 2014
Am Fri, 10 Jan 2014 02:21:35 +1000
schrieb Manu <turkeyman@gmail.com>:

> On 10 January 2014 01:56, Marco Leise <Marco.Leise@gmx.de> wrote:
> 
> > Am Fri, 10 Jan 2014 01:20:26 +1000
> > schrieb Manu <turkeyman@gmail.com>:
> >
> > > Awesome! Although it looks like you still have a lot of work ahead of
> > you :)
> >
> > So... when was std.simd going to be in Phobos again? :p
> >
> 
> When there are a zillion unit tests >_<
> And I kinda wanna prove it is efficient on other architectures before it is
> committed to the stone tablet that is phobos; that can never be changed
> once committed.

I think Phobos should follow OpenGL in this regard and use a
prefix like `etc` for useful but not finalized modules, so
early adopters can try out new modules, compare them with any
existing API in Phobos where applicable (e.g. streams,
json, ...) and report any issues. I have a feeling that right
now most modules are tested by 2 people prior to the merge,
because they spent their life in obscurity.

-- 
Marco

January 09, 2014
On Thursday, 9 January 2014 at 16:22:08 UTC, Manu wrote:
> It's nice that it's unicode correct, but it's not nice that you have to be
> familiar with a massive amount of the standard library and you need to
> search through 4-5 (huge! and often poorly documented) modules to find the
> functions you need to perform _basic string operations_, like finding the
> last instance of a character...

That I do agree with. One idea is that once everything is split into smaller packages, we can start providing meta-packages that do public imports of small sets of commonly used functions.
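
Such a meta-package could be as small as the sketch below. The module name `strutil` and the selection of functions are purely hypothetical; nothing like it exists in Phobos:

```d
// File: strutil.d (hypothetical meta-module, not part of Phobos).
// Build with `dmd -unittest -main strutil.d` to run the check at the bottom.
module strutil;

// Re-export a small, hand-picked set of string helpers under one import.
public import std.algorithm : countUntil, find;
public import std.string : indexOf, lastIndexOf;
public import std.uni : byGrapheme;

unittest
{
    // The selectively imported names are usable directly.
    assert("Hello".lastIndexOf('l') == 3);
    assert("Hello".find('l') == "llo");
}
```

User code would then write a single `import strutil;` instead of hunting through four modules.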

Still, once the needed functions are found, I consider the end result very robust for what it actually does, and I don't know any other language that does it better.

> My standing opinion is that string manipulation in D is not nice, it is
> possibly the most difficult and time consuming I have used in any language
> ever. Am I alone?

Unicode is the doom. If you only keep ASCII in mind, your statement is indeed true, and the D stuff seems ridiculously complicated compared even to plain C. But it has also taught me that _every single_ program I have written before in other languages was broken with regard to Unicode handling. So, yes, it is quite difficult, but that is the cost of doing what no one else does: being correct out of the box. Well, at least in most scenarios :)
January 09, 2014
On Thu, 09 Jan 2014 14:07:36 -0000, Manu <turkeyman@gmail.com> wrote:
> This works fine:
>   string x = find("Hello", 'H');
>
> This doesn't:
>   string y = find(retro("Hello"), 'H');
>   > Error: cannot implicitly convert expression (find(retro("Hello"), 'H'))
> of type Result!() to string
>
> Is that wrong? That seems to be how the docs suggest it should be used.
>
> On a side note, am I the only one that finds std.algorithm/std.range/etc
> for string processing really obtuse?
> I can rarely understand the error messages, so to say it's better than STL is optimistic.
> Using std.algorithm and std.range to do string manipulation feels really
> lame to me.
> I hate looking through the docs of 3-4 modules to understand the complete
> set of useful string operations (std.string, std.uni, std.algorithm,
> std.range... at least).
> I also find the names of the generic algorithms are often unrelated to the name of the string operation.
> My feeling is, everyone is always on about how cool D is at string, but
> other than 'char[]', and the builtin slice operator, I feel really
> unproductive whenever I do any heavy string manipulation in D.
> I also hate that I need to import at least 4-5 modules to do anything
> useful with strings... I feel my program bloating and cringe with every
> gigantic import that sources exactly one symbol.

I feel exactly the same way.  I must admit I haven't done any serious D for a couple of years now, and the main reason is lack of free time, but the other is that each time I come back to try and do something I get weird arse error messages like the one you got above.
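
For reference, the error quoted above comes down to retro and find returning a lazy range type rather than a string. A small sketch of what does compile (my own example, assuming std.algorithm and std.range as they exist in current Phobos):

```d
import std.algorithm : equal, find;
import std.range : retro;

void main()
{
    // find on a string hands back a slice of the same string.
    string x = find("Hello", 'l');
    assert(x == "llo");

    // retro wraps the string in a lazy reversed view, so the result of find
    // is that wrapper type; `auto` accepts it where `string` cannot.
    auto y = find(retro("Hello"), 'l');
    assert(y.equal("lleH"));
}
```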

I realise that it is probably the way it is to avoid bloating the language with several ways to do the same thing.  I agree with that position; however, I don't think it's a bad thing (TM) to have a custom/specific set of operations for a given area which re-use more generic operations behind the scenes.

In other words, why can't we alias or wrap the generic routines in std.string, such that the expected operations are easy to find and do exactly what you'd expect for strings?

If someone is dealing with generic code where the ranges involved might be strings/arrays or might be something else of course they will call std.range functions, but if they are only dealing with strings there should be string specific functions for them to call - which may/may not use std.range or std.algorithm functions etc behind the scenes.
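
A rough sketch of what I mean, with made-up names (fromFirst and codePointCount are hypothetical and do not exist in Phobos). The wrappers take and return plain strings, so a mistake shows up as an ordinary type error rather than a template one, while the generic routines still do the work:

```d
import std.algorithm : find;
import std.range : walkLength;

// Hypothetical string-specific wrappers over the generic routines.

/// Slice of `s` beginning at the first occurrence of `c` (empty if absent).
string fromFirst(string s, dchar c)
{
    return s.find(c);
}

/// Number of code points in `s`.
size_t codePointCount(string s)
{
    return s.walkLength;
}

void main()
{
    assert(fromFirst("Hello", 'l') == "llo");
    assert(fromFirst("Hello", 'z') == "");
    assert(codePointCount("h\u00E9llo") == 5);
}
```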

R
