January 09, 2014
On Thursday, 9 January 2014 at 17:15:43 UTC, Regan Heath wrote:

clip

>
> In other words, why can't we alias or wrap the generic routines in std.string such that the expected operations are easy to find and do exactly what you'd expect, for strings.
>
> If someone is dealing with generic code where the ranges involved might be strings/arrays or might be something else of course they will call std.range functions, but if they are only dealing with strings there should be string specific functions for them to call - which may/may not use std.range or std.algorithm functions etc behind the scenes.
>
> R

I think this would be a nice solution.  I only rarely use D for string processing, and as a result I always struggle a bit because I can never remember where to look for things.  Happily, my most recent experience with it was fairly smooth.

A while ago I was trying to do something with splitter on a string and ended up asking a question on D.learn.  I got into a very confusing debate because the person trying to help me thought I was using the splitter in std.array, when I was actually using the one from another module (see the last few posts here):

http://www.digitalmars.com/d/archives/digitalmars/D/learn/splitting_numbers_from_a_test_file_39448.html

It would be nice if std.string provided simple, easy string manipulation that swept most of the difficulties under the rug, with links in the documentation to the functions it wraps for when people want to do more complex things.
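A minimal sketch of the split/splitter distinction behind that confusion, assuming a current Phobos release (the module layout was different in 2014): std.array.split eagerly builds an array, while the splitter from std.algorithm returns a lazy range of slices that can be collected with .array when needed.

```d
import std.algorithm : equal, splitter;
import std.array : array, split;

void main()
{
    string line = "10 20 30";

    // std.array.split: eager, allocates and returns a string[]
    string[] eager = line.split(" ");
    assert(eager == ["10", "20", "30"]);

    // std.algorithm's splitter: lazy, yields slices of `line` on demand
    auto lazyParts = line.splitter(' ');
    assert(lazyParts.equal(["10", "20", "30"]));

    // Collect the lazy range with .array when an actual array is needed
    string[] collected = lazyParts.array;
    assert(collected == eager);
}
```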

January 09, 2014
On Thursday, 9 January 2014 at 18:57:26 UTC, Craig Dillabaugh wrote:
> A while ago I was trying to do something with splitter on a string and I ended up asking a question on D.learn. [...]
>
> It would be nice if std.string in D provided a nice, easy, string manipulation that swept most of the difficulties under the table

http://dlang.org/phobos/std_array.html#split

Note that std.array is publicly imported by std.string, so this works:

void main() {
        import std.string;
        auto parts = "hello".split("l"); // ["he", "", "o"]

        import std.stdio;
        writeln(parts);
}


> provided links in the documentation to the functions they wrap for when people want to do more complex things.

Actually, when writing my D book, I decided to spend more time on the Unicode aspects of strings than on these basic operations, since I thought these were pretty straightforward.

But maybe the docs suck more than I thought. I learned most of D's string handling from Phobos1, which kept it all simple...
January 09, 2014
On 2014-01-09 17:35, Marco Leise wrote:

> I think Phobos should follow OpenGL in this regard and use a
> prefix like `etc` for useful but not finalized modules, so
> early adopters can try out new modules, compare them with any
> existing API in Phobos where applicable (e.g. streams,
> json, ...) and report any issues. I have a feeling that right
> now most modules are tested by 2 people prior to the merge,
> because they spend their lives in obscurity.

That has been suggested before, and the counterargument is that people will start using the modules anyway and complain when they change, even if they are marked experimental. Someone here said that the javax.* packages were originally experimental, so they continued to live in the javax namespace to avoid breaking changes.

-- 
/Jacob Carlborg
January 09, 2014
On 2014-01-09 18:20, Manu wrote:

> That's great and all, but it's no good if I have to pay for it (time and
> money!) even when that's not a requirement. I'm dealing with ascii right
> now.

There are a couple of functions in std.ascii, but not the one you needed here.
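A minimal sketch of the ASCII-only route, assuming current Phobos: std.ascii character functions do plain ASCII tests without Unicode tables, and std.string.representation views a string as immutable(ubyte)[] so range algorithms iterate bytes instead of decoding UTF-8.

```d
import std.algorithm : count, equal, map;
import std.ascii : isDigit, toUpper;
import std.string : representation;

void main()
{
    string s = "abc123";

    // std.ascii predicates are simple per-character ASCII tests
    assert(s.count!isDigit == 3);
    assert(s.map!toUpper.equal("ABC123"));

    // representation() reinterprets the string as raw bytes, so
    // algorithms over `bytes` skip UTF-8 decoding entirely
    immutable(ubyte)[] bytes = s.representation;
    assert(bytes.length == 6);
    assert(bytes.count!(b => isDigit(cast(char) b)) == 3);
}
```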

-- 
/Jacob Carlborg
January 09, 2014
On 2014-01-09 15:07, Manu wrote:
> This works fine:
>    string x = find("Hello", 'H');
>
> This doesn't:
>    string y = find(retro("Hello"), 'H');
>    > Error: cannot implicitly convert expression (find(retro("Hello"),
> 'H')) of type Result!() to string
>
> Is that wrong? That seems to be how the docs suggest it should be used.

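As others have said, the problem is that "find" returns a range of the same type it was given: for a string that is a string, but for retro("Hello") it is the Retro range, which is not implicitly convertible to "string". The main reason for returning lazy ranges is to avoid temporary allocations when chaining algorithms.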

If it was the other way around you would probably be complaining it wasn't efficient enough ;)
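A small sketch of that difference, assuming current Phobos: find on a plain string yields a string slice, while find on retro(...) yields the Retro range and needs an explicit conversion. std.string.lastIndexOf is shown as one possible alternative, on the assumption that the original goal was to search from the end.

```d
import std.algorithm : find;
import std.conv : to;
import std.range : retro;
import std.string : lastIndexOf;

void main()
{
    // On a string, find returns a slice of that string: still a string
    string x = find("Hello", 'H');
    assert(x == "Hello");

    // On retro(...), find returns the lazy Retro range type it was given,
    // which is why the implicit conversion to string fails
    auto r = find(retro("Hello"), 'l');

    // An explicit conversion works, but the characters are still reversed
    string y = r.to!string;
    assert(y == "lleH");

    // To search from the end and keep a normal slice, lastIndexOf may fit better
    auto i = "Hello".lastIndexOf('l');
    assert(i == 3);
    assert("Hello"[i .. $] == "lo");
}
```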

> On a side note, am I the only one that finds std.algorithm/std.range/etc
> for string processing really obtuse?
> I can rarely understand the error messages, so to say it's better than STL
> is optimistic.
> Using std.algorithm and std.range to do string manipulation feels really
> lame to me.
> I hate looking through the docs of 3-4 modules to understand the
> complete set of useful string operations (std.string, std.uni,
> std.algorithm, std.range... at least).

You forgot std.array ;)

> I also find the names of the generic algorithms are often unrelated to
> the name of the string operation.
> My feeling is, everyone is always on about how cool D is at strings, but
> other than 'char[]', and the builtin slice operator, I feel really
> unproductive whenever I do any heavy string manipulation in D.

You have built-in appending, concatenation, using strings in switch statements and so on.
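A short sketch of the built-in features listed above (appending, concatenation, slicing, and switch on strings), none of which needs an import:

```d
void main()
{
    string s = "Hello";
    s ~= ", ";                 // built-in append
    string t = s ~ "world";    // built-in concatenation
    assert(t == "Hello, world");
    assert(t[0 .. 5] == "Hello"); // built-in slicing, no copy

    // Strings can be used directly as switch operands
    int code;
    switch (t[7 .. $])
    {
        case "world": code = 1; break;
        case "there": code = 2; break;
        default: code = 0; break;
    }
    assert(code == 1);
}
```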

> I also hate that I need to import at least 4-5 modules to do anything
> useful with strings... I feel my program bloating and cringe with every
> gigantic import that sources exactly one symbol.

I agree with you. Over the years I have built up a small library that basically allows me to import a single module to do most of the string operations I need.

You probably won't like it, but you could have a look at Tango as well. It contains two modules that are useful in this case: one for generic array operations and one for string operations.

tango.core.Array
tango.text.Util

https://github.com/SiegeLord/Tango-D2
http://siegelord.github.io/Tango-D2/

-- 
/Jacob Carlborg
January 09, 2014
On Thu, Jan 09, 2014 at 09:19:40PM +0100, Jacob Carlborg wrote:
> On 2014-01-09 17:35, Marco Leise wrote:
> 
> >I think Phobos should follow OpenGL in this regard and use a
> >prefix like `etc` for useful but not finalized modules, so
> >early adopters can try out new modules, compare them with any
> >existing API in Phobos where applicable (e.g. streams,
> >json, ...) and report any issues. I have a feeling that right
> >now most modules are tested by 2 people prior to the merge,
> >because they spend their lives in obscurity.
> 
> That has been suggested before, and the counterargument is that people will start using the modules anyway and complain when they change, even if they are marked experimental. Someone here said that the javax.* packages were originally experimental, so they continued to live in the javax namespace to avoid breaking changes.
[...]

Maybe instead of calling it 'etc' we should outright call it 'experimental'. If you have code like:

	import experimental.myawesomemodule;
	...

I doubt you'd object very much when you have to rename it to:

	import std.myawesomemodule;
	...

since the word 'experimental' staring you in the face every time you open up the file will be a constant nagging reminder that you're depending on something unstable, giving you motivation to want to move it to something stable as soon as you can.


T

-- 
"I speak better English than this villain Bush" -- Mohammed Saeed al-Sahaf, Iraqi Minister of Information
January 09, 2014
On Thu, Jan 09, 2014 at 06:25:33PM +0000, Brad Anderson wrote:
> On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
[...]
> >On a side note, am I the only one that finds std.algorithm/std.range/etc for string processing really obtuse?  I can rarely understand the error messages, so to say it's better than STL is optimistic.
> 
> I absolutely hate the "does not match any template declaration" error. It's extremely unhelpful for figuring out what you need to do and anytime I try to do something fun with ranges I can expect to see it a dozen times.

Yeah, that error drives me up the wall too. I often get screenfuls of errors, dumping 25 or so overloads of some obscure Phobos internal function (like toImpl) as though an end-user would understand any of it. You have to parse all the sig constraints (and boy some of them are obscure), *understand* what they mean (which requires understanding how Phobos works internally), and *then* try to figure out, by elimination, which is the one that you intended to match, and why your code failed to match it.

I'm almost tempted to say that using sig constraints to differentiate between template overloads is a bad idea. Instead, consider this alternative implementation of toImpl:

	template toImpl(S,T)
		// N.B.: no sig constraints here
	{
		static if (... /* sig constraint conditions for overload #1 */)
		{
			S toImpl(T t)
			{
				// implementation here
			}
		}
		else static if (... /* sig constraint conditions for overload #2 */)
		{
			S toImpl(T t)
			{
				// implementation here
			}
		}
		...
		else // N.B.: user-readable error message
		{
			static assert(0, "Unable to convert " ~
				T.stringof ~ " to " ~ S.stringof);
		}
	}

By putting all overloads inside a single template, we can give a useful default message when no overloads match.
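A compilable sketch of that single-template pattern, using a hypothetical myConv that handles only int-to-string and string-to-int (the real std.conv.toImpl covers far more cases). The final else branch turns "no overload matches" into one readable static assert message:

```d
import std.conv : to;
import std.traits : isIntegral, isSomeString;

// Hypothetical converter: one template, one static if branch per "overload"
template myConv(S, T)
{
    static if (isSomeString!S && isIntegral!T)
    {
        S myConv(T t) { return t.to!S; }   // integer -> string
    }
    else static if (isIntegral!S && isSomeString!T)
    {
        S myConv(T t) { return t.to!S; }   // string -> integer
    }
    else
    {
        // Single user-readable error instead of a list of rejected candidates
        static assert(0, "Unable to convert " ~ T.stringof ~ " to " ~ S.stringof);
    }
}

void main()
{
    assert(myConv!(string, int)(42) == "42");
    assert(myConv!(int, string)("42") == 42);

    // Uncommenting the next line fails with "Unable to convert double to int"
    // auto bad = myConv!(int, double)(1.5);
    static assert(!__traits(compiles, myConv!(int, double)(1.5)));
}
```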

Alternatively, maybe sig constraints can have an additional string parameter that specifies a message that explains why that particular overload was rejected. These messages are not displayed if at least one overload matches; only if no overload matches, they will be displayed (so that the user can at least see why each of the overloads didn't match).


[...]
> >I also find the names of the generic algorithms are often unrelated to the name of the string operation.  My feeling is, everyone is always on about how cool D is at strings, but other than 'char[]', and the builtin slice operator, I feel really unproductive whenever I do any heavy string manipulation in D.

Really?? I find myself much more productive, because I only have to learn one set of generic algorithms, and I can use them not just for strings but for all sorts of other stuff that implement the range API. Whereas in languages like C, sure you get familiar with string-specific functions, but then when you need a similar-operating function for an array of ints, you have to name it something else, and then basically the same algorithm is reimplemented for linked lists, called by yet another name, etc. Added together, it's many times more mental load than just learning a single set of generic algorithms that work on (almost) everything.

The composability of generic algorithms also allows me to think on a more abstract level -- instead of thinking about manipulating individual chars, I can figure out OK, if I split the string by "," then I can filter for the strings I'm looking for, then join them back again with another delimiter. Since the same set of algorithms work with other ranges too, I can apply exactly the same thought process for working with arrays, linked lists, and other containers, without having to remember 5 different names of essentially the same algorithm but applied to 5 different types.
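A sketch of the split/filter/join pipeline just described, with made-up input data and assuming current Phobos; the same filter and map calls are then reused unchanged on an int array:

```d
import std.algorithm : equal, filter, map, splitter, startsWith;
import std.array : join;

void main()
{
    auto csv = "apple,banana,avocado,cherry";

    // Split on ',', keep only fields starting with "a", rejoin with ';'
    auto result = csv.splitter(',')
                     .filter!(f => f.startsWith("a"))
                     .join(";");
    assert(result == "apple;avocado");

    // The same algorithms apply to an array of ints
    auto nums = [1, 2, 3, 4, 5, 6];
    auto scaled = nums.filter!(n => n % 2 == 0).map!(n => n * 10);
    assert(scaled.equal([20, 40, 60]));
}
```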


> I actually feel a lot more productive in D than in C++ with strings. Boost's string algorithms library helps fill the gap (and at least you only have one place to look for documentation when you are using it) but overall I prefer my experience working in D with pseudo-member chains.

I found that what I got out of taking the time to learn std.algorithm and std.range was worth far more than the effort invested.


T

-- 
Claiming that your operating system is the best in the world because more people use it is like saying McDonalds makes the best food in the world. -- Carl B. Constantine
January 09, 2014
Marco Leise <Marco.Leise@gmx.de> writes:

> Am Thu, 09 Jan 2014 15:20:13 +0000
> schrieb "John Colvin" <john.loughran.colvin@gmail.com>:
>

> The point about graphemes is good. D's functions still stop
> mid-way. From UTF-8 you can iterate UTF-32 code points, but
> grapheme clusters are the new characters. I.e. the basic need
> to iterate Unicode _characters_ is not supported!
> I cannot even come up with use cases for working with code
> points and think they are a conceptual black hole. Something
> carried over from a time when grapheme clusters didn't exist.

Actually, you can do tons of NLP without grapheme clusters.  If you're paranoid, you standardize on a specific Unicode normalization first.

You can probably get a bit better results by paying attention to clusters, but I suspect it will be a marginal improvement.

That said, I do agree with the OP that the string API is currently more complex to understand than I'd like.  However, it's significantly easier to use than what's in standard C++ for anything beyond ASCII.
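A sketch of the code unit / code point / grapheme distinction discussed above, and of normalizing first, assuming current std.uni (byGrapheme, and normalize with NFC/NFD):

```d
import std.range : walkLength;
import std.uni : byGrapheme, normalize, NFC, NFD;

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: one user-perceived character
    string decomposed = "e\u0301";
    assert(decomposed.length == 3);                // UTF-8 code units
    assert(decomposed.walkLength == 2);            // code points (auto-decoded)
    assert(decomposed.byGrapheme.walkLength == 1); // grapheme clusters

    // NFC composes the pair into the single code point U+00E9
    string composed = normalize!NFC(decomposed);
    assert(composed == "\u00E9");
    assert(composed.walkLength == 1);

    // NFD splits it back into base letter + combining mark
    assert(normalize!NFD(composed) == decomposed);
}
```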

Jerry
January 09, 2014
On Thursday, 9 January 2014 at 19:05:19 UTC, Adam D. Ruppe wrote:
> On Thursday, 9 January 2014 at 18:57:26 UTC, Craig Dillabaugh wrote:
>> A while ago I was trying to do something with splitter on a string and I ended up asking a question on D.learn. [...]
>>
>> It would be nice if std.string in D provided a nice, easy, string manipulation that swept most of the difficulties under the table
>
> http://dlang.org/phobos/std_array.html#split
>
> Note that std.array is publicly imported from std.string so this works:
>
> void main() {
>         import std.string;
>         auto parts = "hello".split("l");
>
>         import std.stdio;
>         writeln(parts);
> }
>
>
>> provided links in the documentation to the functions they wrap for when people want to do more complex things.
>
> Actually, when writing my D book, I decided to spend more time on the unicode stuff in strings than these basic operations, since I thought these were pretty straightforward.
>
> But maybe the docs suck more than I thought. I learned most of D string stuff from Phobos1 which kept it all simple...

That's the thing.  In most cases the correct way to do something in D does end up being rather nice.  However, it's often a bit of a challenge finding that correct way!

When I had my troubles I expected to find the library solutions in std.string (remember, I rarely use D's string processing utilities). It never really occurred to me that I might want to check std.array for the function I wanted. So what if std.array is imported when I import std.string? As a programmer I still had no idea 'split()' was there!

At the very least the documentation for std.string should say something along the lines of:

"The modules std.uni and std.array also include a number of functions that operate on strings, so if what you are looking for isn't here, try looking there."

January 09, 2014
On Thursday, 9 January 2014 at 20:40:33 UTC, H. S. Teoh wrote:
> On Thu, Jan 09, 2014 at 06:25:33PM +0000, Brad Anderson wrote:
>> On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
> [...]
>> >On a side note, am I the only one that finds
>> >std.algorithm/std.range/etc for string processing really obtuse?  I
>> >can rarely understand the error messages, so to say it's better than STL
>> >is optimistic.
>> 
>> I absolutely hate the "does not match any template declaration"
>> error. It's extremely unhelpful for figuring out what you need to do
>> and anytime I try to do something fun with ranges I can expect to
>> see it a dozen times.
>
> Yeah, that error drives me up the wall too. I often get screenfuls of
> errors, dumping 25 or so overloads of some obscure Phobos internal
> function (like toImpl) as though an end-user would understand any of it.
> You have to parse all the sig constraints (and boy some of them are
> obscure), *understand* what they mean (which requires understanding how
> Phobos works internally), and *then* try to figure out, by elimination,
> which is the one that you intended to match, and why your code failed to
> match it.
>
> I'm almost tempted to say that using sig constraints to differentiate
> between template overloads is a bad idea. Instead, consider this
> alternative implementation of toImpl:
>
> 	template toImpl(S,T)
> 		// N.B.: no sig constraints here
> 	{
> 		static if (... /* sig constraint conditions for overload #1 */)
> 		{
> 			S toImpl(T t)
> 			{
> 				// implementation here
> 			}
> 		}
> 		else static if (... /* sig constraint conditions for overload #2 */)
> 		{
> 			S toImpl(T t)
> 			{
> 				// implementation here
> 			}
> 		}
> 		...
> 		else // N.B.: user-readable error message
> 		{
> 			static assert(0, "Unable to convert " ~
> 				T.stringof ~ " to " ~ S.stringof);
> 		}
> 	}
>
> By putting all overloads inside a single template, we can give a useful
> default message when no overloads match.
>

Interesting, and there is a lot of flexibility there. It does make the functions a lot more verbose, though, for something that is really the compiler's job (clearly describing errors).

> Alternatively, maybe sig constraints can have an additional string
> parameter that specifies a message that explains why that particular
> overload was rejected. These messages are not displayed if at least one
> overload matches; only if no overload matches, they will be displayed
> (so that the user can at least see why each of the overloads didn't
> match).
>

Each constraint would have a string? I think that would help for some of the more obscure constraints that aren't wrapped up in an eponymous template helper but I don't think it'd help with the problem generally because the problem is identifying which exact constraint failed.

Example:

    void main()
    {
      import std.algorithm, std.range;
      struct A { }
      auto a = recurrence!"n"(0).take(5).find(A());
    }

This is the error message you get:

---
/d14/f101.d(5): Error: template std.algorithm.find does not match any function template declaration. Candidates are:
/opt/compilers/dmd2/include/std/algorithm.d(3650):        std.algorithm.find(alias pred = "a == b", R, E)(R haystack, E needle) if (isInputRange!R && is(typeof(binaryFun!pred(haystack.front, needle)) : bool))
/opt/compilers/dmd2/include/std/algorithm.d(3713):        std.algorithm.find(alias pred = "a == b", R1, R2)(R1 haystack, R2 needle) if (isForwardRange!R1 && isForwardRange!R2 && is(typeof(binaryFun!pred(haystack.front, needle.front)) : bool) && !isRandomAccessRange!R1)
/opt/compilers/dmd2/include/std/algorithm.d(3749):        std.algorithm.find(alias pred = "a == b", R1, R2)(R1 haystack, R2 needle) if (isRandomAccessRange!R1 && isBidirectionalRange!R2 && is(typeof(binaryFun!pred(haystack.front, needle.front)) : bool))
/opt/compilers/dmd2/include/std/algorithm.d(3821):        std.algorithm.find(alias pred = "a == b", R1, R2)(R1 haystack, R2 needle) if (isRandomAccessRange!R1 && isForwardRange!R2 && !isBidirectionalRange!R2 && is(typeof(binaryFun!pred(haystack.front, needle.front)) : bool))
/opt/compilers/dmd2/include/std/algorithm.d(4053):        std.algorithm.find(alias pred = "a == b", Range, Ranges...)(Range haystack, Ranges needles) if (Ranges.length > 1 && is(typeof(startsWith!pred(haystack, needles))))
---

Where do you even begin with that flood of information? To fix it all you really want to see is which constraint you didn't satisfy. An error message like this would help greatly:

---
/d539/f571.d(5): Error: template std.algorithm.find call fails all constraints. Candidates are:
/opt/compilers/dmd2/include/std/algorithm.d:
  (3650) find(alias pred = "a == b", R, E)(R haystack, E needle):
              isInputRange!R
           && is(typeof(binaryFun!pred(haystack.front, needle)) : bool) <- FAILS
  (3713) find(alias pred = "a == b", R1, R2)(R1 haystack, R2 needle):
              isForwardRange!R1
           && isForwardRange!R2 <- FAILS
           && is(typeof(binaryFun!pred(haystack.front, needle.front)) : bool)
           && !isRandomAccessRange!R1
  (3749) find(alias pred = "a == b", R1, R2)(R1 haystack, R2 needle):
              isRandomAccessRange!R1 <- FAILS
           && isBidirectionalRange!R2
           && is(typeof(binaryFun!pred(haystack.front, needle.front)) : bool)
  (3821) find(alias pred = "a == b", R1, R2)(R1 haystack, R2 needle):
              isRandomAccessRange!R1 <- FAILS
           && isForwardRange!R2
           && !isBidirectionalRange!R2
           && is(typeof(binaryFun!pred(haystack.front, needle.front)) : bool)
  (4053) find(alias pred = "a == b", Range, Ranges...)(Range haystack, Ranges needles):
              Ranges.length > 1 <- FAILS
           && is(typeof(startsWith!pred(haystack, needles)))

---

The NG line limit will probably mangle that and I'm assuming constraints are short-circuited. The exact appearance isn't as important as just pointing out the failing constraints as strongly as you can.
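Until the compiler points out which clause failed, one workaround is to test the clauses yourself with static assert. A sketch against the example above, assuming current Phobos; the clauses are copied from the candidate list, and the exact constraints may differ between releases:

```d
import std.functional : binaryFun;
import std.range : isForwardRange, isInputRange, recurrence, take;

struct A { }

void main()
{
    auto hay = recurrence!"n"(0).take(5);
    alias H = typeof(hay);

    // Overload (R haystack, E needle): the range check passes...
    static assert(isInputRange!H);
    // ...but int elements cannot be compared with an A, so this clause fails
    static assert(!is(typeof(binaryFun!"a == b"(hay.front, A())) : bool));

    // Range-needle overloads: A is not a range at all
    static assert(!isForwardRange!A);
}
```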

>
> [...]
>> >I also find the names of the generic algorithms are often unrelated
>> >to the name of the string operation.  My feeling is, everyone is
>> >always on about how cool D is at string, but other than 'char[]', and
>> >the builtin slice operator, I feel really unproductive whenever I do
>> >any heavy string manipulation in D.
>
> Really?? I find myself much more productive, because I only have to
> learn one set of generic algorithms, and I can use them not just for
> strings but for all sorts of other stuff that implement the range API.
> Whereas in languages like C, sure you get familiar with string-specific
> functions, but then when you need a similar-operating function for an
> array of ints, you have to name it something else, and then basically
> the same algorithm reimplemented for linked lists, called by yet another
> name, etc. Added together, it's many times more mental load than just
> learning a single set of generic algorithms that work on (almost)
> everything.
>
> The composability of generic algorithms also allows me to think on a more
> abstract level -- instead of thinking about manipulating individual
> chars, I can figure out OK, if I split the string by "," then I can
> filter for the strings I'm looking for, then join them back again with
> another delimiter. Since the same set of algorithms work with other
> ranges too, I can apply exactly the same thought process for working
> with arrays, linked lists, and other containers, without having to
> remember 5 different names of essentially the same algorithm but applied
> to 5 different types.
>
>
>> I actually feel a lot more productive in D than in C++ with strings.
>> Boost's string algorithms library helps fill the gap (and at least
>> you only have one place to look for documentation when you are using
>> it) but overall I prefer my experience working in D with
>> pseudo-member chains.
>
> I found that what I got out of taking the time to learn std.algorithm
> and std.range was worth far more than the effort invested.
>

Agreed. Except for some hiccups and those terrible error messages I find std.algorithm and std.range to be a work of genius. I envy them every day while I'm stuck using C++ at work.

>
> T