June 09, 2015

On 06/09/2015 10:56 AM, luminousone via Digitalmars-d wrote:
> On Tuesday, 9 June 2015 at 17:05:19 UTC, Andrei Alexandrescu wrote:
>> My work on allocators takes the last turn before the straight line. I've arranged with Dicebot to overlap the review period with finalizing details so I can act on feedback quickly.
>>
>> After that I'm ready for some major library work, and I had two things in mind.
>>
>> One would be a good pass of std.container, in particular (a) a design review with the DbI glasses on; (b) better documentation - sadly it seems to me so inadequate as to make containers themselves unusable; (c) investigate use of UFCS - std.container's design predates UFCS yet is a perfect fit for it, and most likely other cool language improvements we've added since.
>>
>> The other would be database connectivity. Erik Smith has shown some cool ideas at DConf, and I encourage him to continue working on them, but it seems to me this is an area where more angles mean more connectivity options.
>>
>> For database connectivity I'm thinking of using ODBC. What I see is that on all major platforms, vendors offer mature, good quality ODBC drivers, and most programs that have anything to do with databases offer ODBC connectivity. So connecting with ODBC means the individual database drivers are already there; no need to waste effort on creating drivers for each (or asking vendors to, which we can't afford).
>>
>> So I gave myself ten minutes the other night just before I went to sleep to see if I can get an ODBC rig on my OSX machine starting from absolutely nothing. I got http://www.odbcmanager.net but then got confused about where to find some dumb driver (text, csv) and gave up.
>>
>> Last night I gave myself another ten minutes, and lo and behold I got up and running. Got a demo CSV driver from http://www.actualtech.com/product_access.php (which also supports Access and others). Then I headed to http://www.easysoft.com/developer/languages/c/odbc_tutorial.html and was able to run a simple ODBC application that lists the available drivers. Nice!
>>
>> It's trivial work to convert the C headers to D declarations. Then it's straight library design to offer convenient libraries on top of the comprehensive but pedestrian ODBC C API. Then, voilà - we'll have database connectivity for all databases out there!
>>
>> Please help me choose what to work on next.
>>
>>
>> Andrei
>
> I would think a good database lib would depend on a good container lib, so IMO the order is self-evident.
>
> 1000+ to containers.
>
I'd say what it depends on is good serialization.  If you extend containers to include disk based B+Trees then I'd agree with you, otherwise not.
June 09, 2015
On 6/9/15 10:44 AM, Daniel Kozak via Digitalmars-d wrote:
>
> On Tue, 09 Jun 2015 10:05:24 -0700
> Andrei Alexandrescu via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>>
>> For database connectivity I'm thinking of using ODBC. What I see is that
>> on all major platforms, vendors offer mature, good quality ODBC drivers,
>> and most programs that have anything to do with databases offer ODBC
>> connectivity. So connecting with ODBC means the individual database
>> drivers are already there; no need to waste effort on creating drivers
>> for each (or asking vendors to, which we can't afford).
>
> Having ODBC support in D is definitely important for some kinds of applications.
> But for most applications that work with some (kind of) database it does not
> scale. We really need individual drivers for each of the most popular databases
> (even as a C/C++ lib with a D binding around it).

I derive the exact opposite conclusion from the same facts.

* Individual drivers for each database engine: spend effort on designing an API, THEN spend effort on writing or adapting (and then maintaining) one driver per database engine. That has "does not scale" written all over it.

* ODBC: design an API on top of ODBC, then ENJOY all the hard work various database engines have put into their drivers. That scales.


Andrei
June 09, 2015
On Tuesday, 9 June 2015 at 17:05:19 UTC, Andrei Alexandrescu wrote:
> [snip]
> One would be a good pass of std.container, in particular (a) a design review with the DbI glasses on; (b) better documentation - sadly it seems to me so inadequate as to make containers themselves unusable; (c) investigate use of UFCS - std.container's design predates UFCS yet is a perfect fit for it, and most likely other cool language improvements we've added since.
>
> [snip]

Containers, without a doubt. They are so fundamental to my day-to-day programming that it's amazing D has gone for so long without a good set of them (dynamic arrays and associative arrays do go a long way though).

I might as well share some thoughts on it.

Containers that should be in the standard library in order of preference (from my C++ STL and Boost experience primarily):

1.  vector/array
2.  hash map
3.  hash set
4.  flat map
5.  flat set
6.  map
7.  set
8.  deque
9.  stack
10. queue
11. linked list
12. hash multimap
13. hash multiset
14. flat multimap
15. flat multiset
16. multimap
17. multiset
18. priority queue

We already have Array for 1 but the interface needs improvement (I seem to recall having trouble with mutating items during iteration or something like that, which caused me to go back to dynamic arrays). I don't think I like that it's RefCounted either. Rvalue references have made C++'s vector much more pleasant to work with efficiently. We also have map in the form of RedBlackTree. I think you might be able to use RedBlackTree for a set too but I haven't tried it. I think RedBlackTree isn't a very good name though. It's long and it exposes an implementation detail.
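
A minimal sketch of RedBlackTree used as a set, assuming a reasonably recent Phobos. The tree rejects duplicates unless its allowDuplicates template parameter is set, so the default already behaves like an ordered set. The helper name distinctSorted is made up for illustration and is not a library function:

```d
import std.algorithm.comparison : equal;
import std.array : array;
import std.container.rbtree : redBlackTree;

/// Hypothetical helper: returns the distinct values of `xs` in ascending
/// order by passing them through a RedBlackTree used as a set.
int[] distinctSorted(int[] xs)
{
    // allowDuplicates defaults to false, so repeated values are dropped.
    auto set = redBlackTree(xs);
    return set[].array;
}

void main()
{
    auto set = redBlackTree(3, 1, 2);

    set.insert(2);                // already present, so nothing is added
    assert(set.length == 3);

    assert(2 in set);             // membership test
    assert(!(4 in set));

    set.removeKey(1);
    assert(equal(set[], [2, 3])); // iteration is in sorted order

    assert(distinctSorted([3, 1, 2, 3, 1]) == [1, 2, 3]);
}
```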

We have hash maps in the form of the built-in associative arrays but something that uses the allocators rather than GC is needed.

"Flat" refers to containers that are backed by sorted arrays rather than something like a red-black tree. Insertion speed suffers, but in practice, thanks to cache locality, move semantics, and fast contiguous memory copying, they are generally much faster than the traditional tree-backed map for the majority of operations. Andrei put them in Loki, so I'm preaching to the choir here, but for anyone who doesn't know about their advantages, here's an article: http://lafstern.org/matt/col1.pdf and the Boost container library's rationale: http://www.boost.org/doc/libs/1_56_0/doc/html/container/non_standard_containers.html#container.non_standard_containers.flat_xxx

Chandler Carruth spent a lot of time talking about this stuff at CppCon 2014 as well: https://www.youtube.com/watch?v=fHNmRkzxHWs
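
To make the "flat" idea concrete, here is a rough sketch of a map backed by sorted arrays. FlatMap is a hypothetical name, not a Phobos or Boost type, and the layout (two parallel arrays, keys and values) is just one assumption among several possible designs. Lookups are a binary search over contiguous memory; insertions have to shift elements:

```d
import std.array : insertInPlace;
import std.range : assumeSorted;

/// Hypothetical flat map: keys and values live in two parallel arrays
/// that are kept sorted by key.
struct FlatMap(K, V)
{
    private K[] keys;
    private V[] values;

    /// Index of the first key that is not less than `key` (binary search).
    private size_t lowerBoundIndex(K key)
    {
        return keys.assumeSorted.lowerBound(key).length;
    }

    /// Inserts or overwrites. O(n) when later elements must shift.
    void opIndexAssign(V value, K key)
    {
        immutable i = lowerBoundIndex(key);
        if (i < keys.length && keys[i] == key)
        {
            values[i] = value;
            return;
        }
        keys.insertInPlace(i, key);
        values.insertInPlace(i, value);
    }

    /// Pointer to the value stored for `key`, or null if absent. O(log n).
    V* opBinaryRight(string op)(K key) if (op == "in")
    {
        immutable i = lowerBoundIndex(key);
        return (i < keys.length && keys[i] == key) ? &values[i] : null;
    }

    size_t length() const { return keys.length; }
}

void main()
{
    FlatMap!(string, int) ages;
    ages["carol"] = 41;
    ages["alice"] = 30;
    ages["bob"] = 25;
    ages["bob"] = 26; // overwrite, no new entry

    assert(ages.length == 3);
    assert(*("bob" in ages) == 26);
    assert(("dave" in ages) is null);
}
```

A real implementation would also want removal, iteration, and allocator support; this only shows why lookups stay cheap while insertion pays for the shift.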

As far as more exotic containers go, I rather like Boost.MultiIndex, which lets you have a container with several different characteristics (e.g., a single container where you can look up based on name, based on id, based on insertion order, based on lexicographic order of name, etc.). You basically design the perfect container for your needs ("I need random access and fast lookups and a sorted list and bidirectional lookups between keys and values"). It's the Swiss Army knife of containers. I don't need it often but when I do I love it. I believe someone said they implemented it for D but I haven't looked into it.

Although less versatile, building multi-indexing into most containers could prove useful too. Spitballing an example with plenty of room for improvement:

struct A { string name; int id; }

Map!(A, ((a, b) => a.name < b.name),
        ((a, b) => a.id < b.id)) a_map;
a_map.insert(A("bob", 7));
assert(a_map.index!0["bob"].id == 7);
assert(a_map.index!1[7].name == "bob");
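
For comparison, the same two lookups can be approximated today with built-in associative arrays. This is only a sketch under loud assumptions: TwoIndex, byName, and byId are invented names, both fields are assumed unique, and the indexes are hashed rather than ordered by the comparison functions used above:

```d
struct A { string name; int id; }

/// Hypothetical two-index container: one copy of each record, reachable
/// by name or by id. Assumes both fields are unique.
struct TwoIndex
{
    private A[] items;               // records in insertion order
    private size_t[string] byNameIx; // name -> position in items
    private size_t[int] byIdIx;      // id   -> position in items

    void insert(A a)
    {
        byNameIx[a.name] = items.length;
        byIdIx[a.id] = items.length;
        items ~= a;
    }

    // Both accessors throw a RangeError when the key is missing.
    ref A byName(string name) { return items[byNameIx[name]]; }
    ref A byId(int id) { return items[byIdIx[id]]; }
}

void main()
{
    TwoIndex t;
    t.insert(A("bob", 7));
    assert(t.byName("bob").id == 7);
    assert(t.byId(7).name == "bob");
}
```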


A trie would be nice to have even though I rarely use them.

Boost has a stable_vector as well which is basically just an array that uses indirection to keep iterators valid through insertions/deletions. I could see it being useful but I haven't used it.
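
A simplified sketch of that indirection idea, not Boost's actual design: the array holds pointers to individually allocated elements, so an insertion moves only the pointers and the addresses of existing elements stay valid. StableArray and insertAt are hypothetical names:

```d
import std.array : insertInPlace;

/// Hypothetical sketch: an array of pointers to GC-allocated elements.
struct StableArray(T)
{
    private T*[] slots;

    /// Inserts `value` at `pos` and returns a pointer that remains valid
    /// across later insertions.
    T* insertAt(size_t pos, T value)
    {
        auto p = new T;
        *p = value;
        slots.insertInPlace(pos, p); // shifts pointers, not elements
        return p;
    }

    ref T opIndex(size_t i) { return *slots[i]; }

    size_t length() const { return slots.length; }
}

void main()
{
    StableArray!int v;
    auto p = v.insertAt(0, 10);
    v.insertAt(0, 5);    // 10 now lives at index 1
    assert(v[0] == 5 && v[1] == 10);
    assert(*p == 10);    // the earlier pointer still refers to 10
    assert(p is &v[1]);
}
```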
June 09, 2015
On Tue, 09 Jun 2015 10:05:24 -0700, Andrei Alexandrescu wrote:

> One would be a good pass of std.container, in particular (a) a design review with the DbI glasses on; (b) better documentation - sadly it seems to me so inadequate as to make containers themselves unusable; (c) investigate use of UFCS - std.container's design predates UFCS yet is a perfect fit for it, and most likely other cool language improvements we've added since.
> 
> The other would be database connectivity. Erik Smith has shown some cool ideas at DConf, and I encourage him to continue working on them, but it seems to me this is an area where more angles mean more connectivity options.

My vote would be for containers.  At EMSI we do make heavy use of MySQL but have our own idiomatic D wrapper around the C library and would have little incentive to switch to a generic ODBC implementation.
June 09, 2015
On Tue, 09 Jun 2015 11:41:37 -0700
Andrei Alexandrescu via Digitalmars-d <digitalmars-d@puremagic.com> wrote:

> On 6/9/15 10:44 AM, Daniel Kozak via Digitalmars-d wrote:
> >
> > On Tue, 09 Jun 2015 10:05:24 -0700
> > Andrei Alexandrescu via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> >>
> >> For database connectivity I'm thinking of using ODBC. What I see is that on all major platforms, vendors offer mature, good quality ODBC drivers, and most programs that have anything to do with databases offer ODBC connectivity. So connecting with ODBC means the individual database drivers are already there; no need to waste effort on creating drivers for each (or asking vendors to, which we can't afford).
> >
> > Having ODBC support in D is definitely important for some kinds of applications. But for most applications that work with some (kind of) database it does not scale. We really need individual drivers for each of the most popular databases (even as a C/C++ lib with a D binding around it).
> 
> I derive the exact opposite conclusion from the same facts.
> 
> * Individual drivers for each database engine: spend effort on designing an API, THEN spend effort on writing or adapting (and then maintaining) one driver per database engine. That has "does not scale" written all over it.
> 
> * ODBC: design an API on top of ODBC, then ENJOY all the hard work various database engines have put into their drivers. That scales.
> 
> 
> Andrei

Yep, this is the other side of the coin :), and I agree with that. But I do not believe that performance and features would be on the same level as with individual connectors.
June 09, 2015
On Tuesday, 9 June 2015 at 17:05:19 UTC, Andrei Alexandrescu wrote:
> Please help me choose what to work on next.

> One would be a good pass of std.container, in particular (a) a design review with the DbI glasses on; (b) better documentation - sadly it seems to me so inadequate as to make containers themselves unusable; (c) investigate use of UFCS - std.container's design predates UFCS yet is a perfect fit for it, and most likely other cool language improvements we've added since.

No plans to use std.allocator? I think containers are the next logical step after allocators, and will also serve as a proving ground for the allocator API.

One thing I'd like to note is that for any database connectivity solution to be future-proof, it needs to be based on (or compatible with) async I/O. Seeing as we still don't have async I/O in Phobos, I don't know how much that would influence the implementation.

There are some database drivers in the Vibe.d ecosystem, but my knowledge of these extends to just the above fact.

June 09, 2015
On 6/9/15 4:40 PM, Vladimir Panteleev wrote:
> On Tuesday, 9 June 2015 at 17:05:19 UTC, Andrei Alexandrescu wrote:
>> Please help me choose what to work on next.
>
>> One would be a good pass of std.container, in particular (a) a design
>> review with the DbI glasses on; (b) better documentation - sadly it
>> seems to me so inadequate as to make containers themselves unusable;
>> (c) investigate use of UFCS - std.container's design predates UFCS yet
>> is a perfect fit for it, and most likely other cool language
>> improvements we've added since.
>
> No plans to use std.allocator? I think containers are the next logical
> step after allocators, and will also serve as a proving ground for the
> allocator API.

I agree. In fact RedBlackTree from dcollections used an allocator, and the apparatus is still pretty much in std.container.rbtree.

Part of me hoped to make another stab at getting dcollections suitable for inclusion in Phobos (much of my design philosophy has changed over the last few years), but I don't think I have the cycles :(

-Steve
June 09, 2015
On Tuesday, 9 June 2015 at 17:05:19 UTC, Andrei Alexandrescu wrote:
> Please help me choose what to work on next.

Well, containers matter for almost every program, whereas databases don't. They matter for a lot of programs, but many programs don't have them. And we already have std.container, even if it needs more work. So, I think that it would make more sense if std.container were completed than to embark on std.database. And we've been saying for years now (be it right or wrong) that one of the main reasons that we weren't doing more with std.container is that we were expecting to have to change it after std.allocator was done. So, it seems well past time that std.container gets sorted out. Of course, that doesn't necessarily mean that you need to be the one who does it, but you are the one who started it, so it would make sense if you tackled it next.

- Jonathan M Davis
June 09, 2015
On 6/9/15 1:56 PM, Steven Schveighoffer wrote:
> On 6/9/15 4:40 PM, Vladimir Panteleev wrote:
>> On Tuesday, 9 June 2015 at 17:05:19 UTC, Andrei Alexandrescu wrote:
>>> Please help me choose what to work on next.
>>
>>> One would be a good pass of std.container, in particular (a) a design
>>> review with the DbI glasses on; (b) better documentation - sadly it
>>> seems to me so inadequate as to make containers themselves unusable;
>>> (c) investigate use of UFCS - std.container's design predates UFCS yet
>>> is a perfect fit for it, and most likely other cool language
>>> improvements we've added since.
>>
>> No plans to use std.allocator? I think containers are the next logical
>> step after allocators, and will also serve as a proving ground for the
>> allocator API.
>
> I agree. In fact RedBlackTree from dcollections used an allocator, and
> the apparatus is still pretty much in std.container.rbtree.

Interesting, I didn't know about that.

> Part of me hoped to make another stab at getting dcollections suitable
> for inclusion in Phobos (much of my design philosophy has changed over
> the last few years), but I don't think I have the cycles :(

Regarding the projects we discussed that you are considering, I suggest we focus on putting std.container in good shape and opt for a redesign only if there are great benefits. Also, I think we should stay with libc-based I/O.


Andrei

June 09, 2015
On Tuesday, 9 June 2015 at 20:17:01 UTC, Daniel Kozak wrote:
> Yep, this is the other side of the coin :), and I agree with that. But I do not
> believe that performance and features would be on the same level as with individual
> connectors.

I confess that, from what little I've heard about ODBC, it's a terrible idea to use it, but I'm not much of a database guy, so I'm not really in a position to judge. Regardless, it would probably make sense to support both ODBC _and_ the individual drivers. Yes, that's more work, but it provides greater flexibility as well, and we could start with ODBC support so that we have _something_ and add support for individual drivers later so that we get the better performance. Then folks can use whichever they prefer, and we get something working sooner rather than later, even if it takes longer to get it all.

- Jonathan M Davis