October 12, 2010 improving the join function

I'm looking at http://d.puremagic.com/issues/show_bug.cgi?id=3313 and that got me looking at std.string.join, which currently has the signature:

    string join(in string[] words, string sep);

A narrow fix:

    Char[] join(Char)(in Char[][] words, in Char[] sep) if (isSomeChar!Char);

I think it's reasonable to assume that people would want to join things that aren't necessarily arrays of characters, so T could be pretty much any type. An obvious step towards generalization is:

    T[] join(T)(in T[][] items, T[] sep);

But join doesn't really need random access for words; an input range should suffice. So a generally useful join, almost worth putting in std.algorithm, would be:

    ElementType!R1[] join(R1, R2)(R1 items, R2 sep)
    if (isInputRange!R1 && isForwardRange!R2
        && is(ElementType!R2 : ElementType!R1));

Notice how the separator must be a forward range because it gets spanned multiple times, whereas the items need only be an input range as they are spanned once. This is at the same time a very general and very precise interface.

One thing is still bothering me: the array output type. Why would the "default" output range be an array? What can be done to make join() at the same time a general function and also one that works for strings the way the old join did? For example, if I want to join things into an already-existing buffer, or if I want to write them straight to a file, there's no way to do so without having an array allocation in the loop. I have a couple of ideas but I wouldn't want to bias yours.

I also have a question for people who dislike Phobos. Was there a point in the changes of signature above where you threw up your hands, thinking, "do the darn string version already and cut all that crap!"?

Thanks,
Andrei
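One way to picture the output-range question raised above is a join that writes into a caller-supplied sink instead of returning an array. The following is only a minimal sketch under stated assumptions: the name joinInto is hypothetical (it is not a Phobos function), the items are assumed to be a range of ranges, and the sink is assumed to accept both whole items and the separator through std.range.primitives.put.

```d
import std.array : appender;
import std.range.primitives : ElementType, isForwardRange, isInputRange, put, save;

// Hypothetical helper, not part of Phobos.
// Writes each item of `items` to `sink`, with `sep` between consecutive items.
void joinInto(Sink, R1, R2)(ref Sink sink, R1 items, R2 sep)
if (isInputRange!R1 && isInputRange!(ElementType!R1) && isForwardRange!R2)
{
    bool first = true;
    foreach (item; items)          // items are traversed exactly once
    {
        if (!first)
            put(sink, sep.save);   // the separator is traversed once per gap
        put(sink, item);
        first = false;
    }
}

void main()
{
    // Joining strings into an existing string buffer.
    auto text = appender!string();
    joinInto(text, ["a", "b", "c"], ", ");
    assert(text.data == "a, b, c");

    // The same function works for non-character element types.
    auto nums = appender!(int[])();
    joinInto(nums, [[1, 2], [3]], [0]);
    assert(nums.data == [1, 2, 0, 3]);
}
```

With this shape, an array-returning join for strings could be a thin wrapper that creates the appender itself, so existing string calls would not need to know about the sink.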
October 12, 2010 Re: improving the join function

Posted in reply to Andrei Alexandrescu

Andrei:

> One thing is still bothering me: the array output type. Why would the "default" output range be an array?

The chain() function that returns a range is already present.

> What can be done to make join() at the same time a general function and also one that works for strings the way the old join did?

> I also have a question from people who dislike Phobos. Was there a point in the changes of signature above where you threw your hands thinking, "do the darn string version already and cut all that crap!"?

Too much over-generalization is bad, and not just for D newbies. So std.string may contain wrappers specialized for strings. You may implement a generic std.algorithm.join, and then implement the std.string.join that uses just strings (the second argument may be a single char too) and calls std.algorithm.join for its implementation.

Bye,
bearophile
October 12, 2010 Re: improving the join function

Posted in reply to bearophile

> You may implement a generic std.algorithm.join, and then implement the std.string.join that uses just strings (the second argument may be a single char too) and calls std.algorithm.join for its implementation.
If you don't like that name collision, the std.algorithm one may be named joinRange or something else.
Bye,
bearophile
October 12, 2010 Re: improving the join function

Posted in reply to bearophile

On 10/11/2010 08:02 PM, bearophile wrote:
>> You may implement a generic std.algorithm.join, and then implement the std.string.join that uses just strings (the second argument may be a single char too) and calls std.algorithm.join for its implementation.
>
> If you don't like that name collision, the std.algorithm one may be named joinRange or something else.
This is not a matter of name collision. The new join should be a backward-compatible generalization of the existing one, so it should just work for existing calls.
Andrei
October 12, 2010 Re: improving the join function

Posted in reply to bearophile

bearophile wrote:
> Andrei:
>
>> One thing is still bothering me: the array output type. Why would the "default" output range be an array?
>
> The chain() function that returns a range is already present.
>
>
>> What can be done to make join() at the same time a general function and also one that works for strings the way the old join did?
>
>> I also have a question from people who dislike Phobos. Was there a point in the changes of signature above where you threw your hands thinking, "do the darn string version already and cut all that crap!"?
>
> Too much over-generalization is bad, and not just for D newbies. So std.string may contain wrappers specialized for strings. You may implement a generic std.algorithm.join, and then implement the std.string.join that uses just strings (the second argument may be a single char too) and calls std.algorithm.join for its implementation.
>
> Bye,
> bearophile
I like that idea.
I don't like the name "join" - especially for general ranges.
When I hear join, I think of database-like joins. These may not be horribly interesting for strings, but they certainly are for general ranges (*).
union() or concat() would be better names for doing what std.string.join does.
(*) Something like
Range!(Tuple!(T1, T2)) join(T1, T2)(Range!(T1) r1, Range!(T2) r2, BinaryPredicate!(T1, T2) joinPred)
just pseudo-code, I'm not really familiar with D2 and std.algorithm.
The idea is you have a Range r1 with elements of type T1, a Range r2 with elements of type T2, and a predicate that takes a T1 value and a T2 value and returns true if they match; in that case a Tuple with those two values is part of the Range that is returned.
October 12, 2010 Re: improving the join function

Posted in reply to Daniel Gibson

Daniel Gibson <metalcaedes@gmail.com> wrote:

> (*) Something like
> Range!(Tuple!(T1, T2)) join(T1, T2)(Range!(T1) r1, Range!(T2) r2, BinaryPredicate!(T1, T2) joinPred)
> just pseudo-code, I'm not really familiar with D2 and std.algorithm.
> The idea is you have a Range r1 with elements of type T1, a Range r2 with elements of type T2 and a predicate that gets a T1 value and a T2 value and returns bool if they match and in that case a Tuple with those two values is part of the Range that is returned.

Once again I see the combinatorial range in the background. Man, why does this have to be so hard?

That is, your join could be implemented as follows, given the combinatorial product range combine:

auto join( alias fun, R... )( R ranges ) if ( allSatisfy!( isForwardRange, R ) ) {
    return filter!fun( combine( ranges ) );
}

--
Simen
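The filter-over-product idea above can be tried with current Phobos, which ships std.algorithm.setops.cartesianProduct. The sketch below is restricted to two ranges for brevity and uses it where the post assumes a combine range. The name thetaJoin is hypothetical, and this is a quadratic nested-loop style join, not an optimized one.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : filter;
import std.algorithm.setops : cartesianProduct;
import std.range.primitives : isForwardRange;
import std.typecons : tuple;

// Hypothetical helper: keeps every pair (a, b) from r1 x r2 for which pred(a, b) holds.
auto thetaJoin(alias pred, R1, R2)(R1 r1, R2 r2)
if (isForwardRange!R1 && isForwardRange!R2)
{
    return cartesianProduct(r1, r2).filter!(t => pred(t[0], t[1]));
}

void main()
{
    auto left = [1, 2, 3];
    auto right = [2, 3, 4];

    // Equality predicate: behaves like an inner equi-join on the element values.
    auto pairs = thetaJoin!((a, b) => a == b)(left, right);
    assert(equal(pairs, [tuple(2, 2), tuple(3, 3)]));
}
```

The result is lazy: pairs are produced only as the returned range is consumed.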
October 12, 2010 Re: improving the join function

Posted in reply to Daniel Gibson

Daniel Gibson wrote:
> bearophile wrote:
> > Andrei:
> >
> >> One thing is still bothering me: the array output type. Why would the "default" output range be an array?
> >
> > The chain() function that returns a range is already present.
> >
> >
> >> What can be done to make join() at the same time a general function and also one that works for strings the way the old join did?
> >
> >> I also have a question from people who dislike Phobos. Was there a point in the changes of signature above where you threw your hands thinking, "do the darn string version already and cut all that crap!"?
> >
> > Too much over-generalization is bad, and not just for D newbies. So std.string may contain wrappers specialized for strings. You may implement a generic std.algorithm.join, and then implement the std.string.join that uses just strings (the second argument may be a single char too) and calls std.algorithm.join for its implementation.
> >
> > Bye,
> > bearophile
>
> I like that idea.
>
> I don't like the name "join" - especially for general ranges.
> When I hear join I think of database like joins. These may not be horribly interesting for strings
> but certainly are for general ranges (*).
> union() or concat() would be better names for doing what std.string.join does.
>
> (*) Something like
> Range!(Tuple!(T1, T2)) join(T1, T2)(Range!(T1) r1, Range!(T2) r2, BinaryPredicate!(T1, T2) joinPred)
> just pseudo-code, I'm not really familiar with D2 and std.algorithm.
> The idea is you have a Range r1 with elements of type T1, a Range r2 with elements of type T2 and a
> predicate that gets a T1 value and a T2 value and returns bool if they match and in that case a
> Tuple with those two values is part of the Range that is returned.
>
Yes, the reference should learn from Java's naming philosophy, so that ordinary programmers in non-English-speaking countries can use it easily; not every programmer is a master.
Thanks
October 12, 2010 Re: improving the join function

Posted in reply to Simen kjaeraas

On Tue, Oct 12, 2010 at 03:28, Simen kjaeraas <simen.kjaras@gmail.com> wrote:
> Daniel Gibson <metalcaedes@gmail.com> wrote:
>
>> (*) Something like
>> Range!(Tuple!(T1, T2)) join(T1, T2)(Range!(T1) r1, Range!(T2) r2,
>> BinaryPredicate!(T1, T2) joinPred)
>> just pseudo-code, I'm not really familiar with D2 and std.algorithm.
>> The idea is you have a Range r1 with elements of type T1, a Range r2 with
>> elements of type T2 and a predicate that gets a T1 value and a T2 value and
>> returns bool if they match and in that case a Tuple with those two values is
>> part of the Range that is returned.
>
> Once again I see the combinatorial range in the background. Man, why does this have to be so hard?
>
> That is, your join could be implemented as follows, given the combinatorial product range combine:
>
>
> auto join( alias fun, R... )( R ranges ) if ( allSatisfy!( isForwardRange, R ) ) {
>     return filter!fun( combine( ranges ) );
> }
And IIRC, there is a difference between outer join, inner join, and
some other versions.
So
filter!fun(zip(ranges))
(that is, filtering in parallel) is also a possibility. I should read
up on DB joins again.
There is also the need to create a range of ranges for this one
(aka a tensor product, but that scares people when I say it).
Anyway, that's derailing the thread, so I'll stop now.
October 12, 2010 Re: improving the join function

Posted in reply to Simen kjaeraas

Simen kjaeraas wrote:
> Daniel Gibson <metalcaedes@gmail.com> wrote:
>
>> (*) Something like
>> Range!(Tuple!(T1, T2)) join(T1, T2)(Range!(T1) r1, Range!(T2) r2, BinaryPredicate!(T1, T2) joinPred)
>> just pseudo-code, I'm not really familiar with D2 and std.algorithm.
>> The idea is you have a Range r1 with elements of type T1, a Range r2 with elements of type T2 and a predicate that gets a T1 value and a T2 value and returns bool if they match and in that case a Tuple with those two values is part of the Range that is returned.
>
> Once again I see the combinatorial range in the background. Man, why does
> this have to be so hard?
>
> That is, your join could be implemented as follows, given the
> combinatorial product range combine:
>
>
> auto join( alias fun, R... )( R ranges ) if ( allSatisfy!( isForwardRange, R ) ) {
>     return filter!fun( combine( ranges ) );
> }
>
Yes, but if:
* at least the second input range is a sorted random-access range, the join could be computed a lot more cheaply, especially in the (common) case that the predicate checks for equality (=> binary search)
* both ranges are sorted and the predicate checks for equality, the join can even be done in linear time (instead of quadratic, as when using a cross product/combinatorial product)
However, for generic cases combine() would certainly be very helpful (on the other hand, if there were a proper join() you could get combine() by just using a predicate that is always true).
But right now the point is: join() does something completely different and should be renamed (or deprecated in std.string and replaced by union(); a real join isn't needed in std.string anyway, but once join() is deprecated in std.string you can implement a real join in std.algorithm without causing too much confusion).
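The linear-time case mentioned in the second bullet can be sketched as a classic merge join. This is an illustration only: the name mergeJoin is hypothetical, both inputs are assumed to be sorted ascending, and each value is assumed to occur at most once per input (handling duplicate keys would need an extra inner loop over each group of equal values).

```d
import std.typecons : Tuple, tuple;

// Hypothetical helper: equi-join of two sorted arrays in O(a.length + b.length).
// Assumes both arrays are sorted ascending and contain no duplicate values.
Tuple!(T, T)[] mergeJoin(T)(T[] a, T[] b)
{
    Tuple!(T, T)[] result;
    size_t i = 0, j = 0;
    while (i < a.length && j < b.length)
    {
        if (a[i] < b[j])
            ++i;                        // a[i] is too small to match anything left in b
        else if (b[j] < a[i])
            ++j;                        // b[j] is too small to match anything left in a
        else
        {
            result ~= tuple(a[i], b[j]); // equal values: emit the matching pair
            ++i;
            ++j;
        }
    }
    return result;
}

void main()
{
    assert(mergeJoin([1, 3, 5, 7], [3, 4, 5, 8]) == [tuple(3, 3), tuple(5, 5)]);
}
```

Each step advances at least one index, which is where the linear bound comes from, in contrast to filtering the full cross product.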
October 12, 2010 Re: improving the join function

Posted in reply to Andrei Alexandrescu

On Tue, Oct 12, 2010 at 02:33, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> ElementType!R1[] join(R1, R2)(R1 items, R2 sep)
> if (isInputRange!R1 && isForwardRange!R2
> && is(ElementType!R2 : ElementType!R1));
> Notice how the separator must be a forward range because it gets spanned multiple times, whereas the items need only be an input range as they are spanned once. This is at the same time a very general and very precise interface.

I like this and I've nothing against this signature, but I'm probably biased. When I look at this, I don't even look for the function name: the constraints (i.e., the interface) are what catch my eye.

> One thing is still bothering me: the array output type. Why would the "default" output range be an array? What can be done to make join() at the same time a general function and also one that works for strings the way the old join did? For example, if I want to join things into an already-existing buffer, or if I want to write them straight to a file, there's no way to do so without having an array allocation in the loop. I have a couple of ideas but I wouldn't want to bias yours.

Left to my own devices, I'd make that a lazy Join struct range: an input range that delivers R1 elements one by one, interspersed with R2 elements.

Hmm, now that I think a bit more, I was taking them both (or at least R1) to be ranges of ranges: join(["the","quick","red","fox"], " ").

Man, it's 4 pm now, I'll stop.
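For readers trying this today: current Phobos provides a lazy range in this spirit, std.algorithm.iteration.joiner, which treats its first argument as a range of ranges. The sketch below shows usage only; it does not claim to be the design the thread eventually settled on.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : joiner;
import std.array : appender;
import std.range.primitives : put;

void main()
{
    auto words = ["the", "quick", "red", "fox"];

    // Lazy: no joined array is allocated; elements are produced on demand.
    assert(equal(words.joiner(" "), "the quick red fox"));

    // The lazy range can be drained into an existing buffer element by element.
    auto buf = appender!string();
    foreach (c; words.joiner(" "))
        put(buf, c);
    assert(buf.data == "the quick red fox");

    // Works for non-character ranges of ranges as well.
    assert(equal([[1, 2], [3, 4]].joiner([0]), [1, 2, 0, 3, 4]));
}
```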