October 12, 2010 Re: improving the join function
Posted in reply to Daniel Gibson
Daniel Gibson:
> But right now the point is: join() does something completely different and should be renamed (or deprecated in std.string and replaced by union() - a real join isn't needed in std.string anyway, but when join() is deprecated in std.string you can implement a real join in std.algorithm without causing too much confusion).
I like the std.string.join() function, in Python I use the str.join() method often... :-)
Bye,
bearophile
October 12, 2010 Re: improving the join function
Posted in reply to Philippe Sigaud
Philippe Sigaud wrote:
> On Tue, Oct 12, 2010 at 03:28, Simen kjaeraas <simen.kjaras@gmail.com> wrote:
>> Daniel Gibson <metalcaedes@gmail.com> wrote:
>>
>>> (*) Something like
>>> Range!(Tuple!(T1, T2)) join(T1, T2)(Range!(T1) r1, Range!(T2) r2,
>>> BinaryPredicate!(T1, T2) joinPred)
>>> just pseudo-code, I'm not really familiar with D2 and std.algorithm.
>>> The idea is you have a Range r1 with elements of type T1, a Range r2 with
>>> elements of type T2 and a predicate that gets a T1 value and a T2 value and
>>> returns true if they match, and in that case a Tuple with those two values is
>>> part of the Range that is returned.
>> Once again I see the combinatorial range in the background. Man, why does
>> this have to be so hard?
>>
>> That is, your join could be implemented as follows, given the
>> combinatorial product range combine:
>>
>>
>> auto join( alias fun, R... )( R ranges ) if ( allSatisfy!( isForwardRange, R
>> ) ) {
>> return filter!fun( combine( ranges ) );
>> }
>
> And IIRC, there is a difference between outer join, inner join and
> some other versions.
> So
>
> filter!fun(zip(ranges))
>
> (that is, filtering in parallel) is also a possibility. I should read
> some again on DB joins.
> There is also the need for creating a range of ranges on this one
> (aka, tensor product, but that scares people when I say that)
> Anyway, that's derailing the thread, so I'll stop now.
zip doesn't work here because it doesn't create a combinatorial/cartesian product[1], which (logically) is the foundation of a join[2]. It just combines the first element of range one with the first element of range two, ..., the i-th element of range one with the i-th element of range two, etc.
Inner join is the "normal" join. Outer join means that if a to-be-joined element has no "partner" in the other set (range), it's included in the output anyway, with the partner having a NULL value. (This can be done for the first partner, the second, or both.)
Natural join is like an inner join, but has no explicit predicate, the implicit predicate being that (in database tables) columns with equal names have to contain equal values. So natural joins are rather uninteresting for ranges, I guess.
[1] http://en.wikipedia.org/wiki/Cartesian_product // I called this cross product before, but "cross product" seems to be normally used for something else
[2] http://en.wikipedia.org/wiki/Join_%28relational_algebra%29#Joins_and_join-like_operators
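To make the zip vs. cartesian product point concrete, here is a minimal sketch. It assumes a Phobos release that ships std.algorithm.cartesianProduct (as far as I know it was added after this thread was written); zipMatches and productMatches are hypothetical helper names, not library functions. Both apply the same equality predicate, but only the product version sees every combination.

```d
import std.algorithm : cartesianProduct, filter;
import std.range : zip;

// Hypothetical helper: count positional pairs (i-th with i-th) that match.
size_t zipMatches(int[] left, int[] right)
{
    size_t n;
    foreach (pair; zip(left, right).filter!(t => t[0] == t[1]))
        ++n;
    return n;
}

// Hypothetical helper: count matches over every (left, right) combination,
// which is the set an inner join with an equality predicate would return.
size_t productMatches(int[] left, int[] right)
{
    size_t n;
    foreach (pair; cartesianProduct(left, right).filter!(t => t[0] == t[1]))
        ++n;
    return n;
}

unittest
{
    int[] ids = [1, 2, 3];
    int[] refs = [3, 1, 4];
    assert(zipMatches(ids, refs) == 0);     // (1,3), (2,1), (3,4): nothing matches
    assert(productMatches(ids, refs) == 2); // (1,1) and (3,3) are found
}
```

Compiling with something like `dmd -unittest -main` should run the checks. An outer join would additionally need a way to represent the missing partner, which this sketch leaves out.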
October 12, 2010 Re: improving the join function
Posted in reply to bearophile
bearophile wrote:
> Daniel Gibson:
>
>> But right now the point is: join() does something completely different and should be renamed (or deprecated in std.string and replaced by union() - a real join isn't needed in std.string anyway, but when join() is deprecated in std.string you can implement a real join in std.algorithm without causing too much confusion).
>
> I like the std.string.join() function, in Python I use the str.join() method often... :-)
>
> Bye,
> bearophile
Then the name in Python sucks as well :P
IMHO when using the word "join" in a programming context, especially when dealing with (kinds of) iterators, it should mean the relational algebra/database join and not some kind of concatenation.
But maybe I just had too many database lectures at university ;-)
October 12, 2010 Re: improving the join function
Posted in reply to bearophile
On 10/11/2010 10:08 PM, bearophile wrote:
> Daniel Gibson:
>
>> But right now the point is: join() does something completely different and should be renamed (or
>> deprecated in std.string and replaced by union() - a real join isn't needed in std.string anyway,
>> but when join() is deprecated in std.string you can implement a real join in std.algorithm without
>> causing too much confusion).
>
> I like the std.string.join() function, in Python I use the str.join() method often... :-)
>
> Bye,
> bearophile
Most other languages call their equivalent function join. Renaming it would be confusing.
October 12, 2010 Re: improving the join function
Posted in reply to Daniel Gibson
On 10/11/2010 08:57 PM, Daniel Gibson wrote:
> But right now the point is: join() does something completely different
> and should be renamed (or deprecated in std.string and replaced by
> union() - a real join isn't needed in std.string anyway, but when join()
> is deprecated in std.string you can implement a real join in
> std.algorithm without causing too much confusion).
I think union() is a worse name than join(). The discussion was to generalize within reason std.string.join, which is present under that name and with that functionality in many other languages and libraries.
Andrei
October 12, 2010 Re: improving the join function
Posted in reply to Andrei Alexandrescu
Andrei Alexandrescu wrote:
> On 10/11/2010 08:57 PM, Daniel Gibson wrote:
>> But right now the point is: join() does something completely different
>> and should be renamed (or deprecated in std.string and replaced by
>> union() - a real join isn't needed in std.string anyway, but when join()
>> is deprecated in std.string you can implement a real join in
>> std.algorithm without causing too much confusion).
>
> I think union() is a worse name than join(). The discussion was to generalize within reason std.string.join, which is present under that name and with that functionality in many other languages and libraries.
>
> Andrei
Okay, union does kind of suck, because it implies set semantics (and thus no ordering).
What about concat()?
It seems like join() is expected to work this way for strings.. but as a generic algorithm working on kind-of-cursors?
std.algorithm already has some operations that are also in the relational algebra (setDifference, setIntersection, setUnion, filter, even group (like in GROUP BY) etc.), so adding a join (as in relational algebra join) implementation would only make sense. But how are you gonna name that thing if join() is already taken for some kind of "concatenation with additional separator"?
Sure, "setJoin" would be available, but having both join and setJoin doing completely different things would be confusing.
What about something like
char[] concat(char[][] words, char[] sep="") // or sep=null
in the string case and something equivalent in the ranges case?
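A minimal sketch of what that could look like for the string case. concat is a hypothetical name following the signature above, not a Phobos function, and the const qualifiers on the parameters are my own assumption so that string literals can be passed in directly.

```d
// Hypothetical function following the proposed signature; not a Phobos API.
char[] concat(const(char[])[] words, const(char)[] sep = "")
{
    char[] result;
    foreach (i, word; words)
    {
        if (i > 0)
            result ~= sep; // separator goes between elements only
        result ~= word;
    }
    return result;
}

unittest
{
    assert(concat(["a", "b", "c"], ", ") == "a, b, c");
    assert(concat(["a", "b", "c"]) == "abc"); // default separator is empty
    assert(concat([]) == "");
}
```

The behaviour is the same as what std.string.join does today; only the name differs.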
Cheers,
- Daniel
October 12, 2010 Re: improving the join function
Posted in reply to Daniel Gibson
On Monday 11 October 2010 20:34:41 Daniel Gibson wrote:
> Andrei Alexandrescu wrote:
> > On 10/11/2010 08:57 PM, Daniel Gibson wrote:
> >> But right now the point is: join() does something completely different
> >> and should be renamed (or deprecated in std.string and replaced by
> >> union() - a real join isn't needed in std.string anyway, but when join()
> >> is deprecated in std.string you can implement a real join in
> >> std.algorithm without causing too much confusion).
> >
> > I think union() is a worse name than join(). The discussion was to generalize within reason std.string.join, which is present under that name and with that functionality in many other languages and libraries.
> >
> > Andrei
>
> Okay, union does kind of suck, because it implies set semantics (and thus
> no ordering).
>
> What about concat()?
> It seems like join() is expected to work this way for strings.. but as a
> generic algorithm working on kind-of-cursors?
> std.algorithm already has some operations that are also in the relational
> algebra (setDifference, setIntersection, setUnion, Filter, even Group
> (like in group by) etc), adding a join (as in relational algebra join)
> implementation would only make sense - but how are you gonna name that
> thing if join() is already taken for some kind of "concatenation with
> additional separator"? Sure, "setJoin" would be available, but having both
> join and setJoin doing completely different things would be confusing.
>
> What about something like
> char[] concat(char[][] words, char[] sep="") // or sep=null
> in the string case and something equivalent in the ranges case?
>
> Cheers,
> - Daniel
Really. It's not that hard to have a function with a name that means different stuff in different contexts. join is an excellent name for what join() does. Yes, there are joins in databases which are different, but so what? Nothing in std.algorithm has anything to do with databases. We may end up with a module that does, and maybe it'll have a join() function too, but that doesn't mean that std.algorithm can't have one. As others have pointed out, there are other languages which have a join() function which does essentially the same thing as the one in std.string. I say leave it as join(). It's a fine name, doesn't conflict with anything, and doesn't preclude the name being used in database code later.
- Jonathan M Davis
October 12, 2010 Re: improving the join function
Posted in reply to Jonathan M Davis
Jonathan M Davis wrote:
> On Monday 11 October 2010 20:34:41 Daniel Gibson wrote:
>> Andrei Alexandrescu wrote:
>>> On 10/11/2010 08:57 PM, Daniel Gibson wrote:
>>>> But right now the point is: join() does something completely different
>>>> and should be renamed (or deprecated in std.string and replaced by
>>>> union() - a real join isn't needed in std.string anyway, but when join()
>>>> is deprecated in std.string you can implement a real join in
>>>> std.algorithm without causing too much confusion).
>>> I think union() is a worse name than join(). The discussion was to
>>> generalize within reason std.string.join, which is present under that
>>> name and with that functionality in many other languages and libraries.
>>>
>>> Andrei
>> Okay, union does kind of suck, because it implies set semantics (and thus
>> no ordering).
>>
>> What about concat()?
>> It seems like join() is expected to work this way for strings.. but as a
>> generic algorithm working on kind-of-cursors?
>> std.algorithm already has some operations that are also in the relational
>> algebra (setDifference, setIntersection, setUnion, Filter, even Group
>> (like in group by) etc), adding a join (as in relational algebra join)
>> implementation would only make sense - but how are you gonna name that
>> thing if join() is already taken for some kind of "concatenation with
>> additional separator"? Sure, "setJoin" would be available, but having both
>> join and setJoin doing completely different things would be confusing.
>>
>> What about something like
>> char[] concat(char[][] words, char[] sep="") // or sep=null
>> in the string case and something equivalent in the ranges case?
>>
>> Cheers,
>> - Daniel
>
> Really. It's not that hard to have a function with a name that means different stuff in different contexts. join is an excellent name for what join() does. Yes, there are joins in databases which are different, but so what? Nothing in std.algorithm has anything to do with databases. We may end up with a module that does, and maybe it'll have a join() function too, but that doesn't mean that std.algorithm can't have one. As others have pointed out, there are other languages which have a join() function which does essentially the same thing as the one in std.string. I say leave it as join(). It's a fine name, doesn't conflict with anything, and doesn't preclude the name being used in database code later.
>
> - Jonathan M Davis
It's not about database code (and not primarily about strings or std.string), it's about std.algorithm code.
It makes perfect sense to use database-like operations on arrays/containers/iterators (and thus ranges), see LINQ[1].
[1] http://en.wikipedia.org/wiki/LINQ
October 12, 2010 Re: improving the join function
Posted in reply to Daniel Gibson
On 10/11/2010 10:34 PM, Daniel Gibson wrote:
> Andrei Alexandrescu wrote:
>> On 10/11/2010 08:57 PM, Daniel Gibson wrote:
>>> But right now the point is: join() does something completely different
>>> and should be renamed (or deprecated in std.string and replaced by
>>> union() - a real join isn't needed in std.string anyway, but when join()
>>> is deprecated in std.string you can implement a real join in
>>> std.algorithm without causing too much confusion).
>>
>> I think union() is a worse name than join(). The discussion was to
>> generalize within reason std.string.join, which is present under that
>> name and with that functionality in many other languages and libraries.
>>
>> Andrei
>
> Okay, union does kind of suck, because it implies set semantics (and
> thus no ordering).
>
> What about concat()?
> It seems like join() is expected to work this way for strings.. but as a
> generic algorithm working on kind-of-cursors?
I for one would expect join() in its relational sense to work on things quite a bit more structured than just ranges (there's need for indexes etc). Therefore, if relational join() will be introduced later, overloading will disambiguate it. There's no reason to worry.
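As a rough illustration of how two functions can share the name, here is a sketch. Both functions are hypothetical and simplified, neither is the Phobos implementation, and the relational version here is distinguished by an alias predicate parameter rather than by the more structured argument types mentioned above.

```d
import std.typecons : Tuple, tuple;

// Hypothetical sketch of the string-style join: concatenation with a separator.
// The empty template parameter list makes this a template, like the overload below.
string join()(string[] words, string sep)
{
    string result;
    foreach (i, word; words)
    {
        if (i > 0)
            result ~= sep;
        result ~= word;
    }
    return result;
}

// Hypothetical sketch of a relational-style join: all pairs satisfying pred.
Tuple!(A, B)[] join(alias pred, A, B)(A[] left, B[] right)
{
    Tuple!(A, B)[] result;
    foreach (a; left)
        foreach (b; right)
            if (pred(a, b))
                result ~= tuple(a, b);
    return result;
}

unittest
{
    // No predicate given: only the string-style overload can match.
    assert(join(["a", "b", "c"], ", ") == "a, b, c");
    // Explicit predicate: only the relational-style overload can match.
    assert(join!((a, b) => a == b)([1, 2, 3], [2, 3, 4]) == [tuple(2, 2), tuple(3, 3)]);
}
```

Whether a real library design would separate the two this way is an open question; the sketch only shows that the call sites do not collide.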
Andrei
October 12, 2010 Re: improving the join function
Posted in reply to Andrei Alexandrescu
Andrei Alexandrescu wrote:
> On 10/11/2010 10:34 PM, Daniel Gibson wrote:
>> Andrei Alexandrescu wrote:
>>> On 10/11/2010 08:57 PM, Daniel Gibson wrote:
>>>> But right now the point is: join() does something completely different
>>>> and should be renamed (or deprecated in std.string and replaced by
>>>> union() - a real join isn't needed in std.string anyway, but when join()
>>>> is deprecated in std.string you can implement a real join in
>>>> std.algorithm without causing too much confusion).
>>>
>>> I think union() is a worse name than join(). The discussion was to
>>> generalize within reason std.string.join, which is present under that
>>> name and with that functionality in many other languages and libraries.
>>>
>>> Andrei
>>
>> Okay, union does kind of suck, because it implies set semantics (and
>> thus no ordering).
>>
>> What about concat()?
>> It seems like join() is expected to work this way for strings.. but as a
>> generic algorithm working on kind-of-cursors?
>
> I for one would expect join() in its relational sense to work on things quite a bit more structured than just ranges (there's need for indexes etc). Therefore, if relational join() will be introduced later, overloading will disambiguate it. There's no reason to worry.
>
> Andrei
Of course indexes would speed things up, but as mentioned before join() would work OK on almost(*) all ranges (with O(n^2) complexity) and a lot better on std.range.SortedRange.
Because the user would provide a predicate (that should use the same comparator that was used to sort the range), no additional structure (metadata like that needed for a natural join) would be needed.
(*) the inner range needs to be a ForwardRange so it can be traversed multiple times
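A minimal sketch of both variants, under some assumptions: nestedLoopJoin and sortedEquiJoin are hypothetical names, both build their result eagerly as an array instead of returning a lazy range, and the sorted variant only handles an equality predicate on ints with the default ordering.

```d
import std.range : assumeSorted;
import std.range.primitives : ElementType, isForwardRange, isInputRange, save;
import std.typecons : Tuple, tuple;

// Hypothetical helper, not a Phobos function: eager nested-loop inner join, O(n*m).
Tuple!(ElementType!R1, ElementType!R2)[] nestedLoopJoin(alias pred, R1, R2)(R1 outer, R2 inner)
if (isInputRange!R1 && isForwardRange!R2)
{
    typeof(return) result;
    foreach (a; outer)
    {
        // save yields an independent copy, so inner can be traversed once per outer element.
        foreach (b; inner.save)
        {
            if (pred(a, b))
                result ~= tuple(a, b);
        }
    }
    return result;
}

// Hypothetical helper: equality join against sorted keys, using binary search per lookup.
Tuple!(int, int)[] sortedEquiJoin(int[] left, int[] sortedRight)
{
    typeof(return) result;
    auto keys = assumeSorted(sortedRight);
    foreach (a; left)
        foreach (b; keys.equalRange(a)) // all elements equal to a
            result ~= tuple(a, b);
    return result;
}

unittest
{
    auto pairs = nestedLoopJoin!((s, n) => s.length == n)(["ann", "bob", "cy"], [2, 3]);
    assert(pairs == [tuple("ann", 3), tuple("bob", 3), tuple("cy", 2)]);
    assert(sortedEquiJoin([2, 5, 7], [1, 2, 2, 5]) == [tuple(2, 2), tuple(2, 2), tuple(5, 5)]);
}
```

The nested-loop version only asks for a forward range on the inner side, which matches the footnote. The sorted version swaps the inner scan for SortedRange.equalRange, so each lookup costs a binary search instead of a full pass.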
Copyright © 1999-2021 by the D Language Foundation