January 10, 2014
On 10 January 2014 12:48, H. S. Teoh <hsteoh@quickfur.ath.cx> wrote:

> On Fri, Jan 10, 2014 at 11:33:35AM +1000, Manu wrote:
> > On 10 January 2014 06:27, H. S. Teoh <hsteoh@quickfur.ath.cx> wrote:
> >
> > > On Thu, Jan 09, 2014 at 06:25:33PM +0000, Brad Anderson wrote:
> > > > On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
> [...]
> > > > >I also find the names of the generic algorithms are often unrelated to the name of the string operation.  My feeling is, everyone is always on about how cool D is at strings, but other than 'char[]', and the builtin slice operator, I feel really unproductive whenever I do any heavy string manipulation in D.
> > >
> > > Really?? I find myself much more productive, because I only have to learn one set of generic algorithms, and I can use them not just for strings but for all sorts of other stuff that implement the range API.
> > >
> >
> > That sounds good in theory, but if any time you try and actually use D's generic algorithms you end up with many of the kind of errors you refer to in your prior paragraph, then that basically undermines the whole experience.
>
> Really? I only encounter those kinds of errors once in a while. They *are* extremely annoying when they happen, but on the whole, they're relatively rare. You must be doing something wrong if you're seeing them all the time.
>

I think not really knowing quite what you need to do in advance elevates
the probability of doing something wrong ;)
The quality of these range error messages needs to be improved somehow if
people are supposed to use them comfortably for basic string operations.


> > I don't like wasting my time, and I don't like pushing my way through learning something that I feel is obtuse to begin with, so I usually take a side path and work around it (most things can be done easily with a couple of nested foreach-es). So, perhaps embarrassingly, despite my 3+ years spent hanging around here, part of the problem is that I barely know/use phobos. Call me lazy, but I don't think it's an unrealistic experience for any end-user. If it saves me time/headache (and bloat) not using it, why would I?
> >
> > ** Yes, it's the 'standard' library, and I like that concept in essence, and feel like I should make use of it on principle... but it's like, you need to already know phobos intimately to think it's awesome, which creates a weird barrier to entry. And the docs don't help a lot.
>
> I think you're tainted by your experience with C. :-) Using Phobos effectively requires that you take the time to understand and use ranges; or, as somebody else said, stick with std.string. But if that doesn't do what you need, then you need to ... er, understand and use ranges. :-P  Expecting to use things the same way as in C is probably the root cause for your frustrations.
>

I'd argue that something like ranges should still be more or less
intuitive. C doesn't have ranges, so I don't think I'm really transposing C
baggage when considering how to debug my mistakes in range-based code in
this case.
Like most things, once you know your way around it, it's fine, but are there
opportunities (mostly in trivial things like better naming
conventions/standards and improved error messages) to make it a whole lot
more intuitive?


> > > Whereas in languages like C, sure you get familiar with string-specific functions, but then when you need a similar-operating function for an array of ints, you have to name it something else, and then basically the same algorithm reimplemented for linked lists, called by yet another name, etc. Added together, it's many times more mental load than just learning a single set of generic algorithms that work on (almost) everything.
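To make that point concrete, a minimal D sketch, assuming a current Phobos (the std.algorithm.searching and std.container.slist module paths are newer than this thread) and a hypothetical helper name `occurrences`. One generic `count` handles a string, an `int[]` and a linked list, with no per-type names:

```d
// Sketch: the same generic algorithm on three unrelated containers.
// `occurrences` is a hypothetical helper, not part of Phobos.
import std.algorithm.searching : count;
import std.container.slist : SList;

// Counts elements equal to `needle` in any finite input range.
size_t occurrences(R, E)(R haystack, E needle)
{
    return haystack.count(needle);
}

unittest
{
    assert(occurrences("banana", 'a') == 3);            // string
    assert(occurrences([1, 2, 1, 3], 1) == 2);          // int[]
    assert(occurrences(SList!int(1, 2, 1)[], 1) == 2);  // singly linked list
}
```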
> > >
> > > The composability of generic algorithms also allows me to think on a more abstract level -- instead of thinking about manipulating individual chars, I can figure out OK, if I split the string by "," then I can filter for the strings I'm looking for, then join them back again with another delimiter. Since the same set of algorithms work with other ranges too, I can apply exactly the same thought process for working with arrays, linked lists, and other containers, without having to remember 5 different names of essentially the same algorithm but applied to 5 different types.
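A minimal sketch of the split/filter/join idea described above, assuming a current Phobos; `keepWithPrefix` is a hypothetical name. `splitter` and `filter` are lazy, and only the final `join` builds a new string:

```d
// Sketch: split on ',', keep the fields that start with `prefix`, then
// join them back with ';'. `keepWithPrefix` is a hypothetical name.
import std.algorithm.iteration : filter, splitter;
import std.algorithm.searching : startsWith;
import std.array : join;

string keepWithPrefix(string csv, string prefix)
{
    return csv.splitter(',')                        // lazy slices of csv
              .filter!(f => f.startsWith(prefix))   // lazy, no copies
              .join(";");                           // eager: builds result
}

unittest
{
    assert(keepWithPrefix("apple,banana,avocado,cherry", "a") == "apple;avocado");
    assert(keepWithPrefix("x,y", "z") == "");
}
```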
> > >
> >
> > See, I get that idea about composability. Maybe it's just baggage from C, but I just don't think that way. Maybe that's a large part of why I always go wrong with phobos.
>
> Yes, the baggage is slowing you down. Cast it overboard and lighten the boat, man. ;-)
>
>
> > I would never think of doing something fundamental like string processing with a sequence of generic algorithms. I'd freak out about the relatively unknown performance characteristics.
>
> I think your caution is misplaced. Things like std.algorithm.find are actually quite efficient -- don't be misled by the verbose layers of template abstractions surrounding the code; for the common cases, it translates to a simple loop. And recently, certain cases even translate straight to C's strchr / memchr, and so are on par with C.
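For comparison with strchr/memchr, a small sketch assuming current Phobos module paths; `valueOf` is a hypothetical helper. Where strchr returns a pointer to the first match or NULL, `find` returns the remainder of the slice starting at the match, or an empty slice, and the same call works on non-string arrays:

```d
// Sketch: `find` as the counterpart of strchr/memchr. It returns the rest
// of the slice from the first match on, or an empty slice if none.
// `valueOf` is a hypothetical helper.
import std.algorithm.searching : find;

// Returns the part after '=' in "key=value", or null if there is no '='.
string valueOf(string kv)
{
    auto rest = kv.find('=');
    return rest.length ? rest[1 .. $] : null;
}

unittest
{
    assert(valueOf("key=value") == "value");
    assert(valueOf("novalue") is null);
    assert([1, 2, 3, 4].find(3) == [3, 4]); // same call on a non-string array
}
```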
>

Surely it can't do that if the operation requires any composition? How do you specialise a composed sequence of operations?

> > Algorithms are usually a lot simpler when performed on strings of bytes than when they are performed on strings of objects with any imaginable copying mechanisms and allocation patterns.
>
> Phobos also has lots of template specializations that take advantage of strings and arrays.
>

Again, I'm talking WRT composition specifically here.


> > Unless I wrote something myself, I can never have faith that the sort of concessions required to make it generic also make it fast in the case it happens to be performed on a byte array.
>
> Well, if you're going to insist on NIH syndrome, then you might as well write your own standard library instead of fighting with Phobos. :)
>
>
> > There's an argument that you can specialise for string types, which is true within single functions, but if you're 'composing' a function with generic parts, then you can't specialise for strings anymore... There's no way to specialise a call to a.b.c() as a compound operation.
>
> And how exactly does the C compiler specialize strchr(strcat(a,b),c) as
> a single compound operation?
>

That's equally a composed statement. It's the same as the concern I raise. I was referring to cases where D requires a composed statement as opposed to cases where other languages may have some explicit function that does a single complex thing.

And I'm not talking about specifics, I was illustrating the nature of my
psychological baggage :) .. I have an unreasonable distrust towards
requiring composed statements to do very simple things.
It's not a specific criticism, it's a comment.


> If you want a single-pass compound operation on a string, you'd have to write it out manually in C... and in D, you could write it out manually too, just use a for loop over the string -- same effort, same performance. Or you could save yourself the trouble and compose two algorithms from std.algorithm, the result of which is *also* single-pass (because ranges are lazy). Sure you can object that there's overhead introduced by using ranges, but since .front translates to just *ptr and .popFront translates to just ++ptr, the only overhead is just a few function calls if the compiler doesn't inline them. Which, for functions that small, it probably does.
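A sketch contrasting the two approaches, with hypothetical function names and current Phobos module paths assumed. Both make a single pass and stop at the first newline. One caveat: iterating a `string` as a range decodes UTF-8, so the composed version counts code points while the hand-written loop counts code units:

```d
// Sketch: count the non-space characters on the first line of `s`,
// once with a hand-written loop and once by composing two algorithms.
// The function names are hypothetical.
import std.algorithm.searching : count, until;

// One pass over the raw UTF-8 code units, stopping at the first '\n'.
size_t countLoop(string s)
{
    size_t n;
    foreach (char c; s)
    {
        if (c == '\n') break;
        if (c != ' ') ++n;
    }
    return n;
}

// Also one pass: until() is lazy and count() pulls from it up to the '\n'.
// Iterating a string as a range decodes UTF-8, so this counts code points.
size_t countComposed(string s)
{
    return s.until('\n').count!(c => c != ' ');
}

unittest
{
    assert(countLoop("ab c\nxyz") == 3 && countComposed("ab c\nxyz") == 3);
    assert(countLoop("\u00E9") == 2);     // 'é' is two UTF-8 code units...
    assert(countComposed("\u00E9") == 1); // ...but one code point
}
```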
>

Surely it can't be *ptr and ++ptr as you say, otherwise none of it would be Unicode-safe...?
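For narrow strings, Phobos's range primitives auto-decode UTF-8, so `.front` on a `string` yields a whole `dchar` code point rather than the single byte at `*ptr`. A sketch assuming a current Phobos, where `std.utf.byCodeUnit` opts out of decoding; the helper names are hypothetical:

```d
// Sketch: for narrow strings, Phobos range primitives decode UTF-8, so
// `.front` is a whole code point (dchar), not the byte at *ptr.
// std.utf.byCodeUnit opts out of decoding. Helper names are hypothetical.
import std.range.primitives : front, walkLength;
import std.utf : byCodeUnit;

dchar firstCodePoint(string s) { return s.front; }            // decodes
char firstCodeUnit(string s) { return s.byCodeUnit.front; }   // raw byte
size_t codePointCount(string s) { return s.walkLength; }      // decodes
size_t codeUnitCount(string s) { return s.length; }           // O(1)

unittest
{
    enum ete = "\u00E9t\u00E9"; // "été": 3 code points, 5 UTF-8 code units
    assert(firstCodePoint(ete) == '\u00E9');
    assert(firstCodeUnit(ete) == '\xC3');
    assert(codePointCount(ete) == 3 && codeUnitCount(ete) == 5);
}
```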


> > Like I say, it's probably psychological baggage, but I tend to unconsciously dismiss/reject that sort of thing without a second thought...  or maybe experience taught me my lesson (*cough* STL).
>
> OK, let's get one thing straight here. Comparing Phobos to STL is truly unfair. I spent almost 2 decades writing C++, and wrote code both using STL and without (from when STL didn't exist yet), and IME, Phobos's range algorithms are *orders* of magnitude better than STL in terms of usability. At least. In STL, you have to always manage pointer pairs, which become a massive pain when you need to pass multiple pairs around (very error-prone, transpose one argument, and you have a nice segfault or memory corruption bug).  Then you have stupid verbose syntax like:
>
>         // You can't even write the for-loop conditions in a single
>         // line!
>         for (std::vector<MyType<Blah> >::iterator it =
>                 myContainer.begin();
>                 it != myContainer.end();
>                 it++)
>         {
>                 // What's with this (*smartPtr)->x nonsense everywhere?
>                 doSomething((*(*it)->impl)->myDataField);
>
>                 // What, I can't even write a simple X != Y if-condition
>                 // in a single line?! Not to mention the silly
>                 // redundancy of having to write out the entire chain of
>                 // dereferences to exactly the same object twice.
>                 if (find((*(*it)->impl)->mySubContainer, key) ==
>                         (*(*it)->impl)->mySubContainer.end())
>                 {
>                         // How I long for D's .init!
>                         std::vector<MyType<Blah> >::iterator empty;
>                         return empty;
>                 }
>         }
>
> Whereas in D:
>
>         foreach (item; myContainer) {
>                 doSomething(item.impl.myDataField);
>                 if (!item.impl.mySubContainer.canFind(key))
>                         return ElementType!MyContainer.init;
>         }
>
> There's no comparison, I tell you. No comparison at all.
>

Yes, I'm aware that it's syntactically superior, but the quality of the
error messages isn't much better than STL.
I also find things easier to find and/or more logically named (probably
biased from past exposure, I know) in the STL than in phobos.


> > > > I actually feel a lot more productive in D than in C++ with strings.  Boost's string algorithms library helps fill the gap (and at least you only have one place to look for documentation when you are using it) but overall I prefer my experience working in D with pseudo-member chains.
> > >
> > > I found that what I got out of taking the time to learn std.algorithm and std.range was worth far more than the effort invested.
> > >
> >
> > Perhaps you're right. But I think there's ***HUGE*** room for improvement.  The key in your sentence is, it shouldn't require 'effort'; if it's not intuitive to programmers with decades of experience, then there are probably some fundamental design (or documentation/accessibility) deficiencies that need to be prioritised. How is any junior programmer meant to take to D?
>
> No offense, but IME, junior programmers tend to pick up these things much faster than experienced programmers with lots of baggage from other languages, precisely because they don't have all that baggage to slow them down. Old habits die hard, as they say.
>

Maybe you're right, but I can't imagine many juniors that would be capable of tracking down what went wrong when they inevitably make a mistake and are met with weird errors relating to ranges and template constraints and all that good stuff... Maybe they'd be doing it differently in the first place though? Who knows.


> That's not to say that the D docs don't need improvement, of course. But given all your objections about Phobos algorithms despite having barely *used* Phobos, I think the source of your difficulty lies more in the baggage than in the documentation. :)
>

I already said that myself. But I'd like to think the experience could be
smoother, more helpful, and more intuitive. I don't think you can say it's
perfect, or even particularly 'good'. It's acceptable, it does seem to
work, but it's not an easy learning curve, and it's hard to take in small
steps, or to absorb via osmosis.
Every time I try and repeat something that 'I kinda remember seeing a few
months ago' and 'it was kinda like this...', it takes me AGES to get right.
Always finicky little details that take the most time, and I often find the
phobos source code more helpful than the docs, which isn't a good sign.

That's my general point. I think there's a lot of room for case study, and improvement.


January 10, 2014
On 10 January 2014 15:48, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 1/9/14 8:21 AM, Manu wrote:
>
>> My standing opinion is that string manipulation in D is not nice, it is possibly the most difficult and time consuming I have used in any language ever. Am I alone?
>>
>
> No, but probably in the minority.
>
> The long and short of it is, you must get ranges in order to enjoy the power of D algorithms (as per http://goo.gl/dVprVT).
>
> std.{algorithm,range} are commonly mentioned as an attractive asset of D, and those who get that style of doing things have no trouble applying such notions to a variety of data, notably including strings. So going with the attitude "I don't use, know, or care for phobos... I just want to do this pesky string thing!" is bound to create frustration.
>

The thing is, that pesky string thing is usually a trivial detail in an
otherwise completely unrelated task. I'm not joking when I say I've had details
like formatting a useful error message take 90% of the time to complete
some totally unrelated task.
I guess I'm a little isolated from high level algorithms, because I spend
most of my time at the level of twiddling bits.

This is a key motivation for my kicking off this all-D game project, and getting others involved. I need an excuse to push myself to have more involvement with these types of things. Doing more high-level code than I usually do will help, and having other D users also in the project will keep me in check, and hopefully improve my D code a lot while at it ;)

> I personally find strings very easy to deal with in D. They might be easier in Perl or sometimes Python, but at a steep efficiency cost.
>
> Walter has recently written a non-trivial utility that beats the pants off (3x performance) the equivalent C program that has been highly scrutinized and honed for literally decades by dozens (hundreds?) of professionals. Walter's implementation uses ranges and algorithms (a few standard, many custom) through and through. If all goes well we'll open-source it. He himself is now a range/algorithm convert, even though he'd be the first to point out the no-nonsense nature of a function like strrchr. (And btw strrchr is after all a POS because it needs to scan the string left to right... so lastIndexOf is faster!)
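A sketch of the lastIndexOf point, assuming a current Phobos; `fileName` is a hypothetical helper (for real paths, `std.path.baseName` already covers this). Because a D slice carries its length, a last-occurrence search can start from the end, whereas strrchr has to walk the whole NUL-terminated string:

```d
// Sketch: a D slice knows its length, so searching for the last '/'
// can start from the end; strrchr has to walk the whole C string.
// `fileName` is a hypothetical helper (std.path.baseName does this for real).
import std.string : lastIndexOf;

string fileName(string path)
{
    immutable i = path.lastIndexOf('/'); // code-unit index, or -1 if absent
    return i < 0 ? path : path[i + 1 .. $];
}

unittest
{
    assert("dir/sub/file.d".lastIndexOf('/') == 7);
    assert(fileName("dir/sub/file.d") == "file.d");
    assert(fileName("file.d") == "file.d");
}
```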


How long did it take to get him there? I suspect he made the leap only when a particular task that motivated him to do so came up. I suspect I'm likely to follow that same pattern given the context; like him, I'm a somewhat no-frills practicality-oriented programmer, and don't get too excited about futuristic shiny things unless it's readily apparent they can make my workload simpler and more efficient (although I would also require that it not sacrifice computational efficiency). But my point remains, as a trivial ancillary detail - I'm not doing stuff with strings; I'm working on other stuff that just _has_ some strings - it's not presented in a way that one can just get the job done with low friction, and without at least tripling the number of imports from the std library.


January 10, 2014
On Fri, Jan 10, 2014 at 04:37:03PM +1000, Manu wrote:
> On 10 January 2014 15:48, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> 
> > On 1/9/14 8:21 AM, Manu wrote:
> >
> >> My standing opinion is that string manipulation in D is not nice, it is possibly the most difficult and time consuming I have used in any language ever. Am I alone?
> >>
> >
> > No, but probably in the minority.
> >
> > The long and short of it is, you must get ranges in order to enjoy the power of D algorithms (as per http://goo.gl/dVprVT).
> >
> > std.{algorithm,range} are commonly mentioned as an attractive asset of D, and those who get that style of doing things have no trouble applying such notions to a variety of data, notably including strings. So going with the attitude "I don't use, know, or care for phobos... I just want to do this pesky string thing!" is bound to create frustration.
> >
> 
> The thing is, that pesky string thing is usually a trivial detail in an otherwise completely unrelated task. I'm not joking when I say I've had details like formatting a useful error message take 90% of the time to complete some totally unrelated task.

You have to be doing something wrong... formatting error messages is as trivial as using std.string.format:

	if (argsAreBad(x,y,z))
		throw new Exception("Parameters x=%s y=%s z=%s are invalid!"
					.format(x,y,z));

I can't imagine what can be simpler than this. (Not to mention, %s in D just means "string format of X", so the above code will actually work for x, y, z of *any* type that has some kind of conversion to string. Try this with C/C++, and you'll be segfaulting all day.)
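A self-contained version of the idea, with hypothetical names (`Point`, `badArgsMessage`, `checkArgs`). `%s` picks a conversion based on each argument's type, including a struct with no `toString`, a `double` and an `int[]`:

```d
// Sketch: "%s" converts each argument according to its type, so the same
// format string works for ints, doubles, arrays and plain structs.
// Point, badArgsMessage and checkArgs are hypothetical names.
import std.string : format;

struct Point { int x, y; } // no toString: std.format prints the fields

string badArgsMessage(int count, double scale, Point origin, int[] ids)
{
    return "Parameters count=%s scale=%s origin=%s ids=%s are invalid!"
        .format(count, scale, origin, ids);
}

void checkArgs(int count, double scale, Point origin, int[] ids)
{
    if (count < 0 || scale <= 0)
        throw new Exception(badArgsMessage(count, scale, origin, ids));
}

unittest
{
    assert(badArgsMessage(-1, 2.5, Point(1, 2), [7, 8]) ==
        "Parameters count=-1 scale=2.5 origin=Point(1, 2) ids=[7, 8] are invalid!");
    bool threw = false;
    try { checkArgs(-1, 1.0, Point(0, 0), null); }
    catch (Exception) { threw = true; }
    assert(threw);
}
```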


> I guess I'm a little isolated from high level algorithms, because I spend most of my time at the level of twiddling bits.

That would explain your difficulty with Phobos algorithms. :)


> This is a key motivation for my kicking off this all-D game project, and getting others involved. I need an excuse to push myself to have more involvement with these types of things. Doing more high-level code than I usually do will help, and having other D users also in the project will keep me in check, and hopefully improve my D code a lot while at it ;)

Well, maybe the reward of not having to grit your teeth every time you do string manipulation in D will motivate you to learn how to use Phobos effectively? :)


> > I personally find strings very easy to deal with in D. They might be easier in Perl or sometimes Python, but at a steep efficiency cost.
> >
> > Walter has recently written a non-trivial utility that beats the pants off (3x performance) the equivalent C program that has been highly scrutinized and honed for literally decades by dozens (hundreds?) of professionals.  Walter's implementation uses ranges and algorithms (a few standard, many custom) through and through. If all goes well we'll open-source it. He himself is now a range/algorithm convert, even though he'd be the first to point out the no-nonsense nature of a function like strrchr. (And btw strrchr is after all a POS because it needs to scan the string left to right... so lastIndexOf is faster!)
> 
> How long did it take to get him there? I suspect he made the leap only when a particular task that motivated him to do so came up. I suspect I'm likely to follow that same pattern given the context; like him, I'm a somewhat no-frills practicality-oriented programmer, and don't get too excited about futuristic shiny things unless it's readily apparent they can make my workload simpler and more efficient (although I would also require that it not sacrifice computational efficiency).

I'm not the kind to get excited about futuristic shiny things either... I don't even use a GUI, for example! (Well, technically I do, since I'm running on X11, but it's so bare-bones that my manager is baffled how I could even begin to use such an interface. I barely ever touch the mouse except when browsing, for one thing. Almost everything is completely keyboard-driven.) And I'm also skeptical of new trendy overhyped things that have people jumping on the bandwagon in droves -- and usually it turns out that it's just another ordinary idea blown out of proportion by the PR machine.

Yet I had no trouble getting up to speed with Phobos algorithms.  I *will* say there's a learning curve, though -- you need to understand what ranges are and why they're the way they are, before you can fully grok Phobos algorithms. Andrei's article "On Iteration" (linked from the std.range docs) is almost a must-read. But IMO it's more than worth the time to learn this. It will revolutionize the way you think about code. ;-)


> But my point remains, as a trivial ancillary detail - I'm not doing stuff with strings; I'm working on other stuff that just _has_ some strings - it's not presented in a way that one can just get the job done with low friction, and without at least tripling the number of imports from the std library.

But that's the thing, if you have some level of facility with ranges, you could be using exactly the same algorithms for your other stuff as you'd use for strings. That's much less mental overhead than having to remember one set of API's for manipulating said other stuff, and a different set of API's for manipulating strings.

The number of imports needed, though, is a different issue. That's something that Phobos needs improvement in. At least the last time I checked, the "Phobos philosophy", as stated on dlang.org, is that you shouldn't need to import half the library just to do a single simple operation like reading a file. Unfortunately, from what I can tell, that philosophy hasn't really been carried through. Lazy imports, discussed earlier this week, are a direction I'd like to see implemented some time in the near future. Some of the code bloat just from importing a single std module is a bit excessive, and bugs me quite a bit.

Nevertheless, I haven't experienced any "high friction" issues in getting stuff done with strings. Once you learn where things are and what is available, it's pretty straightforward to throw something together. It does take a bit of time to learn this, but honestly, that's not any more effort than learning C for the first time and learning what strchr or memset means, and when to use strcat and when not to. In fact, I'd argue that learning the C string functions is a lot more effort, because they have so many pitfalls and gotchas that you must memorize and constantly keep in mind, otherwise your program suddenly acquires gratuitous segfaults, pointer bugs, and buffer overruns. IME, it takes *more* effort to write string manipulation code in C, rather than less, since so many more things can go wrong.


T

-- 
Turning your clock 15 minutes ahead won't cure lateness---you're just making time go faster!
January 10, 2014
On 2014-01-10 01:57, Manu wrote:

> I've heard that, and I think that's a lame argument. Would people rather
> break people's code *who deliberately chose to use a beta feature, and
> accept the contract while doing so (that it would later be moved to
> 'std' proper)*, or consistently produce features that have very little
> proven foundation in practical application? It takes year(/s) before
> enough people can have had a crack at a new API in enough scenarios to
> reveal where it went right, and where it went wrong.

I think it's a good idea, others don't.

-- 
/Jacob Carlborg
January 10, 2014
On 2014-01-10 02:04, Jesse Phillips wrote:

> Interesting, I've had the opposite experience. I keep trying to perform
> range operations and C# doesn't have them. Slicing is of course ever
> more desired.
>
> That isn't to say C# is bad, but
>
>      if(string.IsNullOrEmpty(str))
>
> vs
>
>      if(str.empty)
>
> keeps throwing me off.

Or as in Ruby on Rails:

if str.blank?
end

"str" is considered blank if:

* it's nil (null)
* it's empty (its length is 0)
* it only contains whitespace

BTW, it works on all objects, not just strings. For arrays it will check the length as well, but for other objects it will just check for nil.
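A rough D analogue of `blank?`, as a hypothetical helper (`isBlank` is not a Phobos function), assuming a current Phobos. In D a `null` string is just an empty slice, so `.empty` already covers both of C#'s null and empty cases, and `all!isWhite` is vacuously true for an empty string:

```d
// Sketch of a Ruby-style blank? check for D strings.
// `isBlank` is hypothetical, not a Phobos function.
import std.algorithm.searching : all;
import std.range.primitives : empty;
import std.uni : isWhite;

// True for null, "" and whitespace-only strings: all!isWhite is vacuously
// true for an empty range, and a null string is just an empty slice.
bool isBlank(string s)
{
    return s.all!isWhite;
}

unittest
{
    string nothing = null;
    assert(nothing.empty);  // one check covers C#'s IsNullOrEmpty
    assert(isBlank(nothing) && isBlank("") && isBlank(" \t\n"));
    assert(!isBlank(" x "));
}
```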

-- 
/Jacob Carlborg
January 10, 2014
10-Jan-2014 05:16, Adam D. Ruppe wrote:
> On Friday, 10 January 2014 at 00:56:36 UTC, Manu wrote:
>> So is it 'correct'?
>
> Yes, with the caveat that it might find a surrogate pair (like H
> followed by an accent code point). That's what byGrapheme is about:
> combining those pairs.

Not at all. Take time to read the Unicode standard.
Surrogate pairs are a part of UTF-16 encoding and little else.


-- 
Dmitry Olshansky
January 10, 2014
10-Jan-2014 11:49, Dmitry Olshansky wrote:
> 10-Jan-2014 05:16, Adam D. Ruppe wrote:
>> On Friday, 10 January 2014 at 00:56:36 UTC, Manu wrote:
>>> So is it 'correct'?
>>
>> Yes, with the caveat that it might find a surrogate pair (like H
>> followed by an accent code point). That's what byGrapheme is about:
>> combining those pairs.
>
> Not at all. Take time to read the Unicode standard.
> Surrogate pairs are a part of UTF-16 encoding and little else.
>

To clarify: a grapheme cluster is not a pair, nor is it a surrogate pair, but H with an accent is a grapheme cluster ;)
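A sketch of the distinction, assuming a current Phobos. 'H' followed by U+0301 COMBINING ACUTE ACCENT is two code points but one grapheme cluster, with no surrogates in any encoding, while a code point outside the BMP, such as U+1F600, is a single code point that UTF-16 stores as a surrogate pair:

```d
// Sketch: grapheme clusters and surrogate pairs are different concepts.
import std.range.primitives : walkLength;
import std.uni : byGrapheme;

// 'H' + U+0301 COMBINING ACUTE ACCENT: 2 code points, 1 grapheme cluster.
enum hAcute = "H\u0301";
// U+1F600 is outside the BMP: 1 code point, a surrogate pair in UTF-16.
enum smiley16 = "\U0001F600"w;

unittest
{
    assert(hAcute.length == 3);                  // UTF-8 code units
    assert(hAcute.walkLength == 2);              // code points
    assert(hAcute.byGrapheme.walkLength == 1);   // grapheme clusters
    assert("H\u0301"w.length == 2);              // UTF-16: no surrogates here
    assert(smiley16.length == 2);                // surrogate pair
    assert(smiley16.walkLength == 1);            // still one code point
}
```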

-- 
Dmitry Olshansky
January 10, 2014
On 2014-01-10 06:48, Andrei Alexandrescu wrote:
> On 1/9/14 8:21 AM, Manu wrote:
>> My standing opinion is that string manipulation in D is not nice, it is
>> possibly the most difficult and time consuming I have used in any
>> language ever. Am I alone?
>
> No, but probably in the minority.
>
> The long and short of it is, you must get ranges in order to enjoy the
> power of D algorithms (as per http://goo.gl/dVprVT).
>
> std.{algorithm,range} are commonly mentioned as an attractive asset of
> D, and those who get that style of doing things have no trouble applying
> such notions to a variety of data, notably including strings. So going
> with the attitude "I don't use, know, or care for phobos... I just want
> to do this pesky string thing!" is bound to create frustration.

Even if you do get how ranges work it can be difficult to figure out where a function is located: in std.algorithm, std.string, std.array, std.uni or std.range. Like, "is this a string operation or a general container algorithm?". Why is there a std.string.indexOf function? Isn't that a general array operation or algorithm? Isn't std.string.(left|right)Justify a general operation as well?
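One concrete difference that may explain part of the split, sketched with current Phobos module paths: `std.string.indexOf` returns an index in code units, usable for slicing, while `std.algorithm.searching.countUntil` counts range elements, which for a `string` are decoded code points. The two disagree as soon as a multi-byte character precedes the match:

```d
// Sketch: indexOf counts code units, countUntil counts range elements
// (decoded code points for a string).
import std.algorithm.searching : countUntil;
import std.string : indexOf;

enum word = "h\u00E9llo"; // "héllo": 'é' is two UTF-8 code units

unittest
{
    assert(word.indexOf('l') == 3);                // code-unit index
    assert(word.countUntil('l') == 2);             // code-point count
    assert(word[word.indexOf('l') .. $] == "llo"); // indexOf result slices safely
}
```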

-- 
/Jacob Carlborg
January 10, 2014
On 2014-01-10 00:34, H. S. Teoh wrote:

> Yeah, any public imports should be mentioned somewhere in the docs,
> otherwise it's just random invisible magic as far as the end-user is
> concerned ("Hmm, I imported std.string in one module, and array.front
> works, but in this other module, array.front doesn't work! Why? Who
> knows.");
>
> Please submit a pull request to add that to the docs.

I agree, and it should be automatic.

-- 
/Jacob Carlborg
January 10, 2014
On 2014-01-10 02:34, Manu wrote:

> Or just alias the functions useful for string processing...

I agree. It already has some aliases, for converting to lower and upper case.

-- 
/Jacob Carlborg