Should this work? (page 7)

On Friday, 10 January 2014 at 00:56:36 UTC, Manu wrote:
> The D docs are pretty terrible, they don't do much to help you find what
> you're looking for.
> You have a massive block of function names at the top of the page, you have
> to carefully scan through one by one, hoping that it's named something
> obvious that will stand out to you, and in the event it doesn't have a
> helper function, you need to work out the proper sequence of
> algorithm/range/whatever operations to do what you want (and then repeat
> the process finding the small parts you need across a bunch of modules).

I find this to be true in other languages, except for the "block of function names." When I want to do something in C# but don't know where it is, I Google it and find some StackOverflow page with the answer. In Java, I Google it and find a Java API page (this was mostly before StackOverflow took over). In D, I have a general idea of where I need to be. Maybe there are a couple of modules to look at. Searching isn't as effective; there just aren't enough arbitrary tutorials on how to do the most basic of things to be able to find those basic things.

Trying to look over the API pages for C# or Java to find what you need isn't fun. But it can be a little better if you know which class you need.

January 10, 2014

Re: Should this work?

Posted by H. S. Teoh

On Fri, Jan 10, 2014 at 11:33:35AM +1000, Manu wrote:
> On 10 January 2014 06:27, H. S. Teoh <hsteoh@quickfur.ath.cx> wrote:
> 
> > On Thu, Jan 09, 2014 at 06:25:33PM +0000, Brad Anderson wrote:
> > > On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
[...]
> > > >I also find the names of the generic algorithms are often unrelated to the name of the string operation.  My feeling is, everyone is always on about how cool D is at string, but other than 'char[]', and the builtin slice operator, I feel really unproductive whenever I do any heavy string manipulation in D.
> >
> > Really?? I find myself much more productive, because I only have to learn one set of generic algorithms, and I can use them not just for strings but for all sorts of other stuff that implement the range API.
> >
> 
> That sounds good in theory, but if any time you try and actually use D's generic algorithms you end up with many of the kind of errors you refer to in your prior paragraph, then that basically undermines the whole experience.

Really? I only encounter those kinds of errors once in a while. They *are* extremely annoying when they happen, but on the whole, they're relatively rare. You must be doing something wrong if you're seeing them all the time.

> I don't like wasting my time, and I don't like pushing my way through learning something that I feel is obtuse to begin with, so I usually take a side path and work around it (most things can be done easily with a couple of nested foreach-es). So, perhaps embarrassingly, despite my 3+ years spent hanging around here, part of the problem is that I barely know/use phobos. Call me lazy, but I don't think it's an unrealistic experience for any end-user. If it saves me time/headache (and bloat) not using it, why would I?
>
> ** Yes, it's the 'standard' library, and I like that concept in essence, and feel like I should make use of it on principle... but it's like, you need to already know phobos intimately to think it's awesome, which creates a weird barrier to entry. And the docs don't help a lot.

I think you're tainted by your experience with C. :-) Using Phobos effectively requires that you take the time to understand and use ranges; or, as somebody else said, stick with std.string. But if that doesn't do what you need, then you need to ... er, understand and use ranges. :-P  Expecting to use things the same way as in C is probably the root cause for your frustrations.

> > Whereas in languages like C, sure you get familiar with string-specific functions, but then when you need a similar-operating function for an array of ints, you have to name it something else, and then basically the same algorithm reimplemented for linked lists, called by yet another name, etc.. Added together, it's many times more mental load than just learning a single set of generic algorithms that work on (almost) everything.
> >
> > The composability of generic algorithms also allow me to think on a more abstract level -- instead of thinking about manipulating individual chars, I can figure out OK, if I split the string by "," then I can filter for the strings I'm looking for, then join them back again with another delimiter. Since the same set of algorithms work with other ranges too, I can apply exactly the same thought process for working with arrays, linked lists, and other containers, without having to remember 5 different names of essentially the same algorithm but applied to 5 different types.
> >
> 
> See, I get that idea about composability. Maybe it's just baggage from C, but I just don't think that way. Maybe that's a large part of why I always go wrong with phobos.

Yes, the baggage is slowing you down. Cast it overboard and lighten the boat, man. ;-)
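
For illustration, the split / filter / join idea quoted above could look roughly like the sketch below. The function name keepFields and the sample data are made up for this example; splitter, filter, startsWith and join are the Phobos pieces being composed.

```d
import std.algorithm : filter, splitter, startsWith;
import std.array : join;

/// Hypothetical helper: keep only the comma-separated fields that start
/// with `prefix`, then glue them back together with `delim`.
string keepFields(string line, string prefix, string delim)
{
    return line.splitter(",")                              // lazy range of slices
               .filter!(field => field.startsWith(prefix)) // lazy as well
               .join(delim);                               // allocates the result once
}

unittest
{
    assert(keepFields("apple,banana,avocado,cherry", "a", ";") == "apple;avocado");
}
```

The same chain should work on an array of strings if you drop the splitter step, which is the point being made about not relearning the algorithm per container.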

> I would never think of doing something fundamental like string processing with a sequence of generic algorithm. I'd freak out about the relatively unknown performance characteristics.

I think your caution is misplaced. Things like std.algorithm.find are actually quite efficient -- don't be misled by the verbose layers of template abstractions surrounding the code; for the common cases, it translates to a simple loop. And recently, certain cases even translate straight to C's strchr / memchr, and so are on par with C.
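
As a small sketch of what that looks like from the caller's side (the wrapper name fromFirst is hypothetical and only there to make the example testable): find hands back the rest of the haystack starting at the first match, or an empty range when nothing matches.

```d
import std.algorithm : find;

/// Hypothetical wrapper: the tail of `haystack` starting at the first
/// occurrence of `needle`, or an empty string if there is none.
string fromFirst(string haystack, char needle)
{
    return find(haystack, needle);
}

unittest
{
    assert(fromFirst("hello, world", 'w') == "world");
    assert(fromFirst("hello, world", 'z').length == 0);
}
```

Whether a given call ends up as a plain loop or a memchr call depends on the Phobos version and the argument types, so measure before relying on it.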

> Algorithms are usually a lot simpler when performed on strings of bytes than they are performed on strings of objects with any imaginable copying mechanisms and allocations patterns.

Phobos also has lots of template specializations that take advantage of strings and arrays.

> Unless I wrote something myself, I can never have faith that the sort of concessions required to make it generic also make it fast in the case it happens to be performed in a byte array.

Well, if you're going to insist on NIH syndrome, then you might as well write your own standard library instead of fighting with Phobos. :)

> There's an argument that you can specialise for string types, which is true within single functions, but if you're 'composing' a function with generic parts, then you can't specialise for strings anymore... There's no way to specialise a call to a.b.c() as a compound operation.

And how exactly does the C compiler specialize strchr(strcat(a,b),c) as a single compound operation?

If you want a single-pass compound operation on a string, you'd have to write it out manually in C... and in D, you could write it out manually too, just use a for loop over the string -- same effort, same performance. Or you could save yourself the trouble and compose two algorithms from std.algorithm, the result of which is *also* single-pass (because ranges are lazy). Sure you can object that there's overhead introduced by using ranges, but since .front translates to just *ptr and .popFront translates to just ++ptr, the only overhead is just a few function calls if the compiler doesn't inline them. Which, for functions that small, it probably does.
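
A rough sketch of the two options, assuming a made-up task (summing the decimal digits in a string). Both function names are hypothetical. The first composes filter and map, which are lazy, so each character is pulled through both stages once and no intermediate array is built; the second is the hand-written loop.

```d
import std.algorithm : filter, map;
import std.ascii : isDigit;

/// Composed version: filter and map only do work when foreach asks
/// for the next element, so this is a single pass over `s`.
int digitSum(string s)
{
    int total = 0;
    foreach (d; s.filter!(c => isDigit(c)).map!(c => cast(int)(c - '0')))
        total += d;
    return total;
}

/// Hand-written version of the same single pass.
int digitSumLoop(string s)
{
    int total = 0;
    foreach (char c; s)
        if (c >= '0' && c <= '9')
            total += c - '0';
    return total;
}

unittest
{
    assert(digitSum("a1b22c3") == 8);
    assert(digitSumLoop("a1b22c3") == 8);
}
```

One caveat: the range version iterates a string by decoded code point while the loop above goes byte by byte, so the overhead is not always just .front and .popFront. For ASCII input the results are the same.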

> Like I say, it's probably psychological baggage, but I tend to unconsciously dismiss/reject that sort of thing without a second thought...  or maybe experience learned me my lesson (*cough* STL).

OK, let's get one thing straight here. Comparing Phobos to STL is truly unfair. I spent almost 2 decades writing C++, and wrote code both using STL and without (from when STL didn't exist yet), and IME, Phobos's range algorithms are *orders* of magnitude better than STL in terms of usability. At least. In STL, you have to always manage pointer pairs, which become a massive pain when you need to pass multiple pairs around (very error-prone, transpose one argument, and you have a nice segfault or memory corruption bug).  Then you have stupid verbose syntax like:

	// You can't even write the for-loop conditions in a single
	// line!
	for (std::vector<MyType<Blah> >::iterator it =
		myContainer.begin();
		it != myContainer.end();
		it++)
	{
		// What's with this (*smartPtr)->x nonsense everywhere?
		doSomething((*(*it)->impl)->myDataField);

		// What, I can't even write a simple X != Y if-condition
		// in a single line?! Not to mention the silly
		// redundancy of having to write out the entire chain of
		// dereferences to exactly the same object twice.
		if (find((*(*it)->impl)->mySubContainer, key) ==
			(*(*it)->impl)->mySubContainer.end())
		{
			// How I long for D's .init!
			std::vector<MyType<Blah> >::iterator empty;
			return empty;
		}
	}

Whereas in D:

	foreach (item; myContainer) {
		doSomething(item.impl.myDataField);
		if (!item.impl.mySubContainer.canFind(key))
			return ElementType!MyContainer.init;
	}

There's no comparison, I tell you. No comparison at all.

> > > I actually feel a lot more productive in D than in C++ with strings.  Boost's string algorithms library helps fill the gap (and at least you only have one place to look for documentation when you are using it) but overall I prefer my experience working in D with pseudo-member chains.
> >
> > I found that what I got out of taking the time to learn std.algorithm and std.range was worth far more than the effort invested.
> >
> 
> Perhaps you're right. But I think there's ***HUGE*** room for improvement.  The key in your sentence is, it shouldn't require 'effort'; if it's not intuitive to programmers with decades of experience, then there are probably some fundamental design (or documentation/accessibility) deficiencies that need to be prioritised. How is any junior programmer meant to take to D?

No offense, but IME, junior programmers tend to pick up these things much faster than experienced programmers with lots of baggage from other languages, precisely because they don't have all that baggage to slow them down. Old habits die hard, as they say.

That's not to say that the D docs don't need improvement, of course. But given all your objections about Phobos algorithms despite having barely *used* Phobos, I think the source of your difficulty lies more in the baggage than in the documentation. :)

T

-- 
Give me some fresh salted fish, please.

On Friday, 10 January 2014 at 00:56:36 UTC, Manu wrote:
> On 10 January 2014 04:00, Adam D. Ruppe <destructionator@gmail.com> wrote:
>
>> On Thursday, 9 January 2014 at 17:54:05 UTC, Dicebot wrote:
>>
>>> It is not the same thing as sample with byGrapheme though.
>>
>> Right, but it works for ascii (and others) and shows std.string isn't as
>> weak as being said in this thread.
>
> So is it 'correct'?

It is interesting that you ask this about the D code but not about the C function you're trying to mimic, which is not correct.

On 01/10/2014 02:19 AM, Brad Anderson wrote:
> On Friday, 10 January 2014 at 00:52:27 UTC, H. S. Teoh wrote:
>>
>> <snip>
>>
>> So to summarize:
>> (1) use sig constraints to define the scope of an overload; and
>> (2) use static if inside the function body (or template body) to enforce
>> type requirements within that scope.
>>
>> This solves the problem of needing the compiler to somehow read your
>> mind and figure out exactly which of the 56 overloads of find() you
>> intended to match but failed to.
>>
>>
>> T
>
> Ok, you've convinced me. I still think highlighting which constraints
> failed should happen but for well implemented modules like those in the
> standard library your approach offers even more helpful and tight error
> messages.

static assert is not a good way to implement custom error messages because it also changes the behaviour of the declaration.
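
To make the two-step pattern from the quoted summary concrete, here is a minimal sketch. The function total is hypothetical and not taken from Phobos: the signature constraint only states which family of arguments the overload claims, and the static if / static assert inside the body reports the finer requirement with a readable message.

```d
import std.range : ElementType, isInputRange;
import std.traits : isNumeric, Unqual;

/// Hypothetical example: sums the elements of any input range of numbers.
auto total(R)(R range)
    if (isInputRange!R)              // (1) scope of this overload
{
    alias E = Unqual!(ElementType!R);

    static if (isNumeric!E)          // (2) requirement inside that scope
    {
        E sum = 0;
        foreach (x; range)
            sum += x;
        return sum;
    }
    else
    {
        static assert(false,
            "total: element type " ~ E.stringof ~ " is not numeric");
    }
}

unittest
{
    assert(total([1, 2, 3]) == 6);
    assert(total([0.5, 1.5]) == 2.0);
    // A string is an input range of dchar, so the constraint matches,
    // but the body rejects it with the custom message.
    static assert(!__traits(compiles, total("abc")));
}
```

This also shows the trade-off raised in the reply above: because the constraint accepts every input range, the declaration now claims calls such as total("abc") and fails inside the body, instead of simply not matching and leaving room for another overload.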

On 10 January 2014 12:40, Brad Anderson <eco@gnuk.net> wrote:
> On Friday, 10 January 2014 at 00:56:36 UTC, Manu wrote:
>
>> The D docs are pretty terrible, they don't do much to help you find what
>> you're looking for.
>> You have a massive block of function names at the top of the page, you
>> have to carefully scan through one by one, hoping that it's named
>> something obvious that will stand out to you, and in the event it doesn't
>> have a helper function, you need to work out the proper sequence of
>> algorithm/range/whatever operations to do what you want (and then repeat
>> the process finding the small parts you need across a bunch of modules).
>
> DDox improves on this a bit by giving a table with brief descriptions right up top: http://vibed.org/temp/dlang.org/library/std/string.html
>
> Still plenty left to do though.

I prefer this immeasurably.

On 1/9/14 12:53 PM, Craig Dillabaugh wrote:
> At the very least the documentation for std.string should say something
> along the lines of:
>
> "The libraries std.unicode and std.array also include a number of
> functions that operate on strings, so if what you are looking for isn't
> here, try looking there."

Pull request please.

Andrei

On 1/9/14 4:34 PM, Manu wrote:
> On 10 January 2014 03:40, John Colvin <john.loughran.colvin@gmail.com> wrote:
>
>> On Thursday, 9 January 2014 at 17:39:00 UTC, Adam D. Ruppe wrote:
>>
>>> On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
>>>
>>>> string y = find(retro("Hello"), 'H');
>>>
>>> import std.string;
>>> auto idx = lastIndexOf("Hello", 'H');
>>>
>>> Wow, that's unbelievably difficult. D sucks.
>>
>> How on earth did I miss that...
>
> I have to wonder the same thing.
> It's just not anything like anything I've ever called it before I guess.
> I guess I started with find, and then it refers you to retro if you want
> to reverse find, and of course, by this time I'm nowhere near std.string
> anymore. Hard to find something if you're not even looking in the same
> file :/

Probably an xref of indexOf/lastIndexOf in find would be useful. PRP

Andrei
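
For comparison, a sketch of both routes side by side. The two wrapper names are invented for this example, and the retro version is deliberately restricted to ASCII needles by searching the raw bytes. Note that find(retro(...)) returns a reversed range rather than a string, which is why the `string y = ...` line quoted above does not compile as written.

```d
import std.algorithm : find;
import std.range : retro;
import std.string : lastIndexOf, representation;

/// The std.string route: index of the last `c` in `s`, or -1.
ptrdiff_t lastViaString(string s, char c)
{
    return lastIndexOf(s, c);
}

/// The generic route, spelled out for ASCII needles only.
ptrdiff_t lastViaRetro(string s, char c)
{
    // Search a reversed view of the raw bytes.
    auto rest = find(retro(s.representation), cast(ubyte) c);
    // `rest` runs from the match back to the start of `s`, so its
    // length is index + 1; an empty result therefore yields -1.
    return cast(ptrdiff_t) rest.length - 1;
}

unittest
{
    assert(lastViaString("Hello, Hello", 'H') == 7);
    assert(lastViaRetro("Hello, Hello", 'H') == 7);
    assert(lastViaString("Hello", 'z') == -1);
    assert(lastViaRetro("Hello", 'z') == -1);
}
```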

On 1/9/14 6:00 PM, H. S. Teoh wrote:
> On Fri, Jan 10, 2014 at 01:34:46AM +0000, Adam D. Ruppe wrote:
> [...]
>> Some code differences from the old days:
>>
>> * before: converting to and from string was in std.string. Functions
>> like toInt, toString, etc. Nowadays, this is all done with
>> std.conv.to. The new way is way cool, but a newbie's first place to
>> look might be for std.string.toString rather than std.conv.to!string.
>
> Right, so it should be mentioned in std.string.
>
> But probably your idea of more concept-oriented overview pages is
> better. It doesn't seem like the right solution to just insert
> hyperlinks to std.conv in every other Phobos module.

A tutorial on string manipulation in D would be awesome.

Andrei
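
As a quick pointer for anyone who went looking for std.string.toInt or toString, a minimal sketch of the std.conv.to equivalents. The helper toIntOr is hypothetical; to and ConvException are the std.conv names.

```d
import std.conv : ConvException, to;

/// Hypothetical helper: parse an int, falling back to `fallback`
/// when the text is not a valid number.
int toIntOr(string text, int fallback)
{
    try
    {
        return to!int(text);
    }
    catch (ConvException)
    {
        return fallback;
    }
}

unittest
{
    assert(to!int("42") == 42);        // string -> int
    assert(to!string(42) == "42");     // int -> string
    assert(to!double("2.5") == 2.5);   // string -> double
    assert(toIntOr("forty-two", -1) == -1);
}
```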

On 1/9/14 8:21 AM, Manu wrote:
> My standing opinion is that string manipulation in D is not nice, it is
> possibly the most difficult and time consuming I have used in any
> language ever. Am I alone?

No, but probably in the minority. The long and short of it is, you must get ranges in order to enjoy the power of D algorithms (as per http://goo.gl/dVprVT). std.{algorithm,range} are commonly mentioned as an attractive asset of D, and those who get that style of doing things have no trouble applying such notions to a variety of data, notably including strings. So going with the attitude "I don't use, know, or care for phobos... I just want to do this pesky string thing!" is bound to create frustration.

I personally find strings very easy to deal with in D. They might be easier in Perl or sometimes Python, but at a steep efficiency cost.

Walter has recently written a non-trivial utility that beats the pants off (3x performance) the equivalent C program that has been highly scrutinized and honed for literally decades by dozens (hundreds?) of professionals. Walter's implementation uses ranges and algorithms (a few standard, many custom) through and through. If all goes well we'll open-source it. He himself is now a range/algorithm convert, even though he'd be the first to point out the no-nonsense nature of a function like strrchr. (And btw strrchr is after all a POS because it needs to scan the string left to right... so lastIndexOf is faster!)

Andrei
