October 12, 2010 Re: improving the join function
Posted in reply to Andrei Alexandrescu
Andrei Alexandrescu wrote:
> I'm looking at http://d.puremagic.com/issues/show_bug.cgi?id=3313 and that got me looking at std.string.join, which currently has the sig:
>
> string join(in string[] words, string sep);
>
> A narrow fix:
>
> Char[] join(Char)(in Char[][] words, in Char[] sep)
> if (isSomeChar!Char);
>
> I think it's reasonable to assume that people would want to join things that aren't necessarily arrays of characters, so T could be pretty much any type. An obvious step towards generalization is:
>
> T[] join(T)(in T[][] items, T[] sep);
>
> But join doesn't really need random access for words - really, an input range should suffice. So a generally useful join, almost worth putting in std.algorithm, would be:
>
> ElementType!R1[] join(R1, R2)(R1 items, R2 sep)
> if (isInputRange!R1 && isForwardRange!R2
> && is(ElementType!R2 : ElementType!R1));
>
> Notice how the separator must be a forward range because it gets spanned multiple times, whereas the items need only be an input range as they are spanned once. This is at the same time a very general and very precise interface.
>
> One thing is still bothering me: the array output type. Why would the "default" output range be an array? What can be done to make join() at the same time a general function and also one that works for strings the way the old join did? For example, if I want to join things into an already-existing buffer, or if I want to write them straight to a file, there's no way to do so without having an array allocation in the loop. I have a couple of ideas but I wouldn't want to bias yours.
>
> I also have a question for people who dislike Phobos. Was there a point in the changes of signature above where you threw up your hands thinking, "do the darn string version already and cut all that crap!"?
>
>
> Thanks,
>
> Andrei
Btw: Is "join" not just a (rather trivial) generalization of reduce?
auto inRange = ...; // range of char[]
char[] sep = " ".dup; // string literals are immutable in D2, so copy into a char[]
auto joined = reduce!( (char[] res, char[] x) {return res~sep~x;}) (inRange);
October 12, 2010 Re: improving the join function
Posted in reply to Daniel Gibson
On Tue, Oct 12, 2010 at 6:37 AM, Daniel Gibson <metalcaedes@gmail.com> wrote:
>
> Btw: Is "join" not just a (rather trivial) generalization of reduce?
>
> auto inRange = ...; // range of char[]
> char[] sep = " ";
> auto joined = reduce!( (char[] res, char[] x) {return res~sep~x;})
> (inRange);
>
Not generalization, I meant specialization. (I should probably go to bed.)
October 12, 2010 Re: improving the join function
Posted in reply to Daniel Gibson
On Tue, 12 Oct 2010 00:47:33 -0400, Daniel Gibson <metalcaedes@gmail.com> wrote:
> On Tue, Oct 12, 2010 at 6:37 AM, Daniel Gibson <metalcaedes@gmail.com> wrote:
>>
>> Btw: Is "join" not just a (rather trivial) generalization of reduce?
>>
>> auto inRange = ...; // range of char[]
>> char[] sep = " ";
>> auto joined = reduce!( (char[] res, char[] x) {return res~sep~x;})
>> (inRange);
>>
>
> Not generalization, I meant specialization. (I should probably go to bed.)
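Well, except for the N memory allocations. Also, for generic ranges you'd want to use chain rather than "~", but chain won't compose properly in a reduce: each step would produce a new, more deeply nested Chain type, while reduce needs a fixed accumulator type.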
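To make the allocation point concrete, here is a minimal D sketch, assuming plain `string[]` input, of an eager join that sizes one buffer up front instead of allocating a new string at every `reduce` step. `joinOnce` is a hypothetical name, not a Phobos function.

```d
import std.array : appender;

// Hypothetical helper (not part of Phobos): eager join into one reserved
// buffer, instead of one fresh allocation per reduce step.
string joinOnce(string[] words, string sep)
{
    // Compute the exact output length first.
    size_t total = 0;
    foreach (w; words)
        total += w.length;
    if (words.length > 1)
        total += sep.length * (words.length - 1);

    auto app = appender!string();
    app.reserve(total); // a single allocation for the whole result
    foreach (i, w; words)
    {
        if (i > 0)
            app.put(sep);
        app.put(w);
    }
    return app.data;
}
```

The reserve step is optional; without it the appender still grows geometrically, which already avoids the quadratic copying of repeated `~`.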
October 12, 2010 Re: improving the join function
Posted in reply to Daniel Gibson
On Mon, 11 Oct 2010 23:34:41 -0400, Daniel Gibson <metalcaedes@gmail.com> wrote:
> Andrei Alexandrescu wrote:
>> On 10/11/2010 08:57 PM, Daniel Gibson wrote:
>>> But right now the point is: join() does something completely different
>>> and should be renamed (or deprecated in std.string and replaced by
>>> union() - a real join isn't needed in std.string anyway, but when join()
>>> is deprecated in std.string you can implement a real join in
>>> std.algorithm without causing too much confusion).
>> I think union() is a worse name than join(). The discussion was to generalize within reason std.string.join, which is present under that name and with that functionality in many other languages and libraries.
>> Andrei
>
> Okay, union does kind of suck, because it implies set semantics (and thus no ordering).
>
> What about concat()?
> It seems like join() is expected to work this way for strings... but as a generic algorithm working on kind-of-cursors?
> std.algorithm already has some operations that are also in the relational algebra (setDifference, setIntersection, setUnion, filter, even group (like in group by) etc.), so adding a join (as in relational algebra join) implementation would only make sense. But what are you going to call that thing if join() is already taken for some kind of "concatenation with additional separator"?
> Sure, "setJoin" would be available, but having both join and setJoin doing completely different things would be confusing.
>
> What about something like
> char[] concat(char[][] words, char[] sep="") // or sep=null
> in the string case and something equivalent in the ranges case?
>
> Cheers,
> - Daniel

Regarding the bike shed: well, std.range already has transversal(range_of_ranges, Nth) and frontTransversal(range_of_ranges). So there is some opportunity for both a transversal over all elements, i.e. transversal(range_of_ranges), and one over interleaved elements, i.e. transversal(range_of_ranges, separator).
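For reference, the two std.range functions this naming idea builds on already exist; the one-argument and separator forms of transversal are only a proposal here, not Phobos API. A small D sketch of the existing ones (the helper names `column` and `fronts` are hypothetical):

```d
import std.array : array;
import std.range : frontTransversal, transversal;

// transversal(rr, n): the n-th element of each inner range (a "column").
int[] column(int[][] rows, size_t n)
{
    return transversal(rows, n).array;
}

// frontTransversal(rr): the front element of each inner range.
int[] fronts(int[][] rows)
{
    return frontTransversal(rows).array;
}
```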
October 12, 2010 Re: improving the join function
Posted in reply to Andrei Alexandrescu
On 10/12/2010 02:33 AM, Andrei Alexandrescu wrote:
> I'm looking at http://d.puremagic.com/issues/show_bug.cgi?id=3313 and
> that got me looking at std.string.join, which currently has the sig:
>
> string join(in string[] words, string sep);
>
> A narrow fix:
>
> Char[] join(Char)(in Char[][] words, in Char[] sep)
> if (isSomeChar!Char);
>
> I think it's reasonable to assume that people would want to join things
> that aren't necessarily arrays of characters, so T could be pretty much
> any type. An obvious step towards generalization is:
>
> T[] join(T)(in T[][] items, T[] sep);
>
> But join doesn't really need random access for words - really, an input
> range should suffice. So a generally useful join, almost worth putting
> in std.algorithm, would be:
>
> ElementType!R1[] join(R1, R2)(R1 items, R2 sep)
> if (isInputRange!R1 && isForwardRange!R2
> && is(ElementType!R2 : ElementType!R1));
>
> Notice how the separator must be a forward range because it gets spanned
> multiple times, whereas the items need only be an input range as they
> are spanned once. This is at the same time a very general and very
> precise interface.
>
> One thing is still bothering me: the array output type. Why would the
> "default" output range be an array? What can be done to make join() at
> the same time a general function and also one that works for strings the
> way the old join did? For example, if I want to join things into an
> already-existing buffer, or if I want to write them straight to a file,
> there's no way to do so without having an array allocation in the loop.
> I have a couple of ideas but I wouldn't want to bias yours.
>
> I also have a question for people who dislike Phobos. Was there a point
> in the changes of signature above where you threw up your hands thinking,
> "do the darn string version already and cut all that crap!"?
>
>
> Thanks,
>
> Andrei
I think the function signature should be more like isInputRange!R1 && isInputRange!(ElementType!R1), and the same for the is(), since the first argument should be a range of ranges.
I think this should be a lazy range of ElementType!(ElementType!R1), or perhaps the common type. No reason to be overly eager :-)
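A minimal D sketch of an eager join using the range-of-ranges constraint described above. `eagerJoin` is a hypothetical name and this is not the Phobos implementation; it is shown with `int` arrays because narrow strings auto-decode, so for `string` input `ElementType` is `dchar` and this sketch would return a `dchar[]`.

```d
import std.array : appender;
import std.range.primitives : ElementType, isForwardRange, isInputRange, save;
import std.traits : Unqual;

// Hypothetical eager join: items is an input range whose elements are
// themselves input ranges; sep is a forward range because it is re-read
// (via save) before every separator.
auto eagerJoin(RoR, R)(RoR items, R sep)
if (isInputRange!RoR && isInputRange!(ElementType!RoR) && isForwardRange!R
    && is(ElementType!R : ElementType!(ElementType!RoR)))
{
    alias E = Unqual!(ElementType!(ElementType!RoR));
    auto app = appender!(E[])();
    bool first = true;
    foreach (item; items)
    {
        if (!first)
            app.put(sep.save); // a fresh copy of the separator each time
        first = false;
        app.put(item);
    }
    return app.data;
}
```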
October 12, 2010 Re: improving the join function
Posted in reply to Daniel Gibson
Daniel Gibson <metalcaedes@gmail.com> wrote:
> inner join is the "normal" join, outer join means that, if a to-be-joined element has no "partner" in the other set (range), it's included in the output anyway with the partner having a NULL value. (This can be done for either the first, the second or both partners.)
> natural join is like an inner join, but has no explicit predicate, the implicit predicate being that (in database tables) columns with equal names have to contain equal values. So natural joins are rather uninteresting for ranges I guess.

Natural join could easily be done in D for ranges of structs or classes (not sure how it would cope with polymorphism, though). It's trivial to automatically generate a predicate that uses __traits(allMembers) to check that all fields with the same name have the same value (and even to statically reject natural join on types with same-named fields of incompatible types).

-- 
Simen
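A rough D sketch of such a generated natural-join predicate, assuming plain structs. `naturalMatch` is a hypothetical name; it uses `std.traits.FieldNameTuple` (which lists only data fields) instead of a raw `__traits(allMembers)` loop so that methods are skipped, and it does not address class hierarchies.

```d
import std.traits : FieldNameTuple;

// Hypothetical natural-join predicate: true when every field that A and B
// share by name holds equal values. The loop is unrolled at compile time.
bool naturalMatch(A, B)(A a, B b)
{
    foreach (name; FieldNameTuple!A)
    {
        static if (__traits(hasMember, B, name))
        {
            // Statically reject same-named fields that cannot be compared.
            static assert(is(typeof(__traits(getMember, a, name) == __traits(getMember, b, name))),
                "field '" ~ name ~ "' has incompatible types");
            if (__traits(getMember, a, name) != __traits(getMember, b, name))
                return false;
        }
    }
    return true;
}
```

Used as the predicate of a nested-loop or indexed join, this gives natural-join semantics without the caller naming the shared fields.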
October 12, 2010 Re: improving the join function
Posted in reply to Philippe Sigaud
On 10/11/10 21:05 CDT, Philippe Sigaud wrote:
> On Tue, Oct 12, 2010 at 02:33, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> wrote:
>> One thing is still bothering me: the array output type. Why would the
>> "default" output range be an array? What can be done to make join() at the
>> same time a general function and also one that works for strings the way the
>> old join did? For example, if I want to join things into an already-existing
>> buffer, or if I want to write them straight to a file, there's no way to do
>> so without having an array allocation in the loop. I have a couple of ideas
>> but I wouldn't want to bias yours.
>
> Left to my own devices, I'd make that a lazy Join struct range: an input range
> that delivers R1 elements one by one, interspersed with R2 elements.
> Hmm, now that I think a bit more, I was taking them both (or at least
> R1) to be ranges of ranges: join(["the","quick","red","fox"], " ").
> Man, it's 4 pm now, I'll stop.
You must mean 4am :o). The abstraction you talk about is already implemented in std.algorithm.joiner(). Here I'm discussing eager join.
Andrei
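A small D sketch of the lazy alternative mentioned above. In current Phobos, joiner lives in std.algorithm.iteration (and is still reachable through std.algorithm). `joinLazily` is a hypothetical wrapper; because narrow strings auto-decode, the joined elements are `dchar`, so materializing with `array()` yields a `dchar[]`.

```d
import std.algorithm.iteration : joiner;
import std.array : array;

// Hypothetical helper around std.algorithm's lazy joiner. joiner itself
// builds no buffer: it yields one element at a time as it is iterated.
dchar[] joinLazily(string[] words, string sep)
{
    auto lazily = words.joiner(sep); // nothing joined yet
    return lazily.array;             // the eager pass happens here
}
```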
October 12, 2010 Re: improving the join function
Posted in reply to Daniel Gibson
On 10/11/10 23:37 CDT, Daniel Gibson wrote:
> Btw: Is "join" not just a (rather trivial) generalization of reduce?
>
> auto inRange = ...; // range of char[]
> char[] sep = " ";
> auto joined = reduce!( (char[] res, char[] x) {return res~sep~x;})
> (inRange);
It is, but things are a bit messed up by empty ranges.
auto joined = inRange.empty
    ? ""
    : reduce!( (char[] res, char[] x) {return res~sep~x;})(inRange);
Andrei
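A self-contained D version of the reduce-based join with the empty case handled, assuming `string[]` input. `joinByReduce` is a hypothetical name. Seedless `reduce` uses the first element as the seed and throws on an empty range, hence the check; each step still allocates a new string.

```d
import std.algorithm.iteration : reduce;

// Hypothetical reduce-based join: guard the empty range first, because
// seedless reduce throws on it. Every step allocates (res ~ sep ~ x).
string joinByReduce(string[] words, string sep)
{
    return words.length == 0
        ? ""
        : reduce!((string res, string x) => res ~ sep ~ x)(words);
}
```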
October 12, 2010 Re: improving the join function
Posted in reply to Daniel Gibson
On 10/11/10 23:00 CDT, Daniel Gibson wrote:
> Of course indexes would speed things up, but as mentioned before join()
> would work ok on almost(*) all ranges (with O(n^2) complexity) and a lot
> better on std.range.SortedRange.
> Because the user would provide a predicate (that should use the same
> comparator that was used to sort the range) no additional structure
> (metadata like needed for natural join) would be needed.
>
> (*) the inner range needs to be a ForwardRange so it can be traversed
> multiple times

From http://www.hookedonlinq.com/JoinOperator.ashx (see the "loop count" section), the way it works is not O(n*n); an index is created automatically.

Andrei
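A rough D analogue of the "index created automatically" approach, assuming equality keys and using a built-in associative array as the index. `hashJoin` is a hypothetical name, not a Phobos function and not LINQ's actual implementation: the inner array is indexed once, then the outer array is streamed, so the cost is roughly linear in the input plus output sizes instead of O(n*m).

```d
import std.typecons : Tuple, tuple;

// Hypothetical hash join: one pass to index `inner` by key, one pass over
// `outer` doing expected O(1) lookups. Output follows the order of `outer`.
Tuple!(O, I)[] hashJoin(alias outerKey, alias innerKey, O, I)(O[] outer, I[] inner)
{
    alias K = typeof(innerKey(I.init));
    I[][K] index; // key -> all inner elements with that key
    foreach (x; inner)
        index[innerKey(x)] ~= x;

    Tuple!(O, I)[] result;
    foreach (o; outer)
        if (auto bucket = outerKey(o) in index)
            foreach (x; *bucket)
                result ~= tuple(o, x);
    return result;
}
```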
October 12, 2010 Re: improving the join function
Posted in reply to Pelle
On 10/12/2010 01:25 AM, Pelle wrote:
> I think the function signature should be more of isInputRange!R1 &&
> isInputRange!(ElementType!R1), same with the is(). As the first one
> should be a range of ranges.

Correct. I figured out my mistake when I started playing with an implementation.

> I think this should be a lazy range of ElementType!(ElementType!R1), or
> perhaps the common type. No reason to be overly eager :-)

That's already present, see std.algorithm.joiner(). The problem with joiner() is that it's rather slow: there are a few tests for each element iterated. An eager join() is still necessary.

Andrei
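On the output-type question raised in the original post, one possible shape for an eager join is to write into whatever output range the caller supplies (an appender, or a file writer), so no intermediate array is forced. This is a hypothetical sketch (`joinInto` is not a Phobos function); it relies on `std.range.primitives.put`, which forwards to the sink's own `put` when one exists.

```d
import std.range.primitives : ElementType, isForwardRange, isInputRange, put, save;

// Hypothetical eager join into a caller-provided output range: the caller
// decides where the elements go, so join() itself allocates nothing.
void joinInto(Sink, RoR, R)(ref Sink sink, RoR items, R sep)
if (isInputRange!RoR && isInputRange!(ElementType!RoR) && isForwardRange!R)
{
    bool first = true;
    foreach (item; items)
    {
        if (!first)
            put(sink, sep.save); // separator is re-read, hence forward range
        first = false;
        put(sink, item);
    }
}
```

A string-returning join can then be a thin wrapper that passes an appender and returns its data.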
Copyright © 1999-2021 by the D Language Foundation