March 03, 2016
On Thursday, 3 March 2016 at 01:49:22 UTC, Chris Wright wrote:
> If you're trying to connect to a SQL database or a document database, as I'd expect for something called "std.database"

The thing is I strongly encourage to not reserve std.database for external database clients and even what is more limiting to SQL ones only.

> , it's a pretty terrible API:
>  * No index scans, lookups, or range queries.

Indexes can be supported by strings and CTFE, can't they?
e.g.
filter!q{item.elements.length < 10 && item.model == "Sport"}

>  * No support for complex queries.

Not sure what you mean by complex queries. Also I think the API allows arbitrary complex queries.

>  * No support for joins.

Can be done by @attributes or other linking functionality between DbCollections.

>  * No support for projections.

You mean something like referring to part of the item's fields?
I see no problem here.

>  * No support for transactions.
>  * If you add support for transactions, you'll crash all the time because
> the transactions got too large, thanks to the full table scan mentality.

Isn't it just the "index support" case?

>  * In your implementation, updates must bring every affected row over the
> wire, then send back the modified row.

In my implementation there is no wire (that's why I call it embedded). However I thought we talk about API and not particular implementation. I don't see how this API excludes RPC. Query strings (e.g. SQL) can be provided in old fashioned way.

>  * Updates affect an entire row. If one process updates one field in a
> row and another one updates a different field, one of those writes gets
> clobbered.

I think this is just a "must have" for any db engine. I don't see how it applies to the proposed API other than any implementation of db engine has to handle it properly.

When I say DbCollection should behave similar to an ordinal array I don't mean it should be an ordinal array.

>  * The API assumes a total ordering for each DbCollection. This is not valid.

I don't know what you mean here. Example would be good.

>  * If there are multiple rows that compare as equals, there's no way to
> update only one of them in your implementation.
>  * In your implementation, updating one row is a ϴ(N) operation. It still
> costs ϴ(N) when the row you want to update is the first one in the collection.

I'm still not sure if you are referring to my implementation or hypothetical API. To be clear: my current implementation is still proof of concept and surly *unfinished*. And in case you refer to my implementation I plan to support O(1), O(log n) and O(n) access patterns with its "rights and duties".

Cheers,
Piotrek


March 03, 2016
On Thursday, 3 March 2016 at 15:07:43 UTC, Kagamin wrote:
> Also member names: methods are named after verbs, you use nouns. Method `next` is ambiguous: is the first row the next row? `fetch` or `fetchRow` would be better.

Those are actually internal methods not intended for the interface user.  They might get exposed in some way to the implementer so I'll have to consider this issue.

However, If I apply what you are suggesting to the the interface methods, it would look like this:

auto r = db.connection().statement("select from t").range();  // nouns

auto r = db.getConnection().getStatement("select from t").getRange();  // verbs

Maybe there is a more creative way to change the names rather than prepending with get (let me know if you have suggestions), but the nouns seem cleaner to me.  I'll have to look at more phobos code to get a better sense of the conventions.  Also range will be replaced by an opSlice operator.

erik






March 03, 2016
On Thursday, 3 March 2016 at 11:16:03 UTC, Dejan Lekic wrote:
> On Tuesday, 1 March 2016 at 21:00:30 UTC, Erik Smith wrote:
>> I'm back to actively working on a std.database specification & implementation.  It's still unstable, minimally tested, and there is plenty of work to do, but I wanted to share an update on my progress.
>
> I suggest you call the package stdx.db - it is not (and may not become) a standard package, so `std` is out of question. If it is supposed to be *proposed* as standard package, then `stdx` is good because that is what some people have used in the past (while others used the ugly std.experimental. for the same purpose).
>
> I humbly believe that this effort **must** be collaborative as such package is doomed to fail if done wrong.

std.experimental, ugly or not, is what is in phobos.
See std.experimental.allocator, std.experimental.logger and std.experimental.ndslice
March 03, 2016
On Wednesday, 2 March 2016 at 17:13:32 UTC, Erik Smith wrote:
> There are a number of areas where this design is an improvement over DDBC: ease-of-use, better resource management (no scope, no GC), phobos compatibility, to name a few.  There is a lot more that needs to be added to make it standards grade.

I agree with you we need database manipulation in Phobos. However modules like db, gui, xml or similar are too much work for a one developer. And as you can see from time to time there apears someone with its own vision.

That's why, long time ago, I suggested DIP73 (http://wiki.dlang.org/DIP73) so the collaborative work would be controlled by the D community (or the D foundation). But I am aware that there is no agreement nor resources for that.

> Your engine project is interesting.  I think there is common ground in the interfaces in the two projects, particularly in how the interface for the results might work. I will look more closely at the details to see what might be workable.
>
> erik

I agree that we (as a community) should work on common and effective APIs. Maybe when D foundation is big enough...

Piotrek

March 03, 2016
On Thursday, 3 March 2016 at 15:53:28 UTC, Erik Smith wrote:
> auto r = db.connection().statement("select from t").range();  // nouns

db.execute("select from t").range();
`range` is probably ok.
Or
auto connection = db.createConnection();
connection.execute("select from t").range();

> auto r = db.getConnection().getStatement("select from t").getRange();  // verbs

Getters must be actually nouns, but these are not getters, but factory methods: createConnection, createStatement, probably low level and/or private.
March 03, 2016
Other options:
db.execute("select from t").reader; //row range
db.execute("select from t").get!long; //scalar
db.execute("select from t"); //non-query
March 03, 2016
On the other hand execute can simply return the reader with extra getter for scalar result. Just don't do stuff until it's iterated. Is it possible?

auto rows = db.execute("select * from t");
March 03, 2016
On Thursday, 3 March 2016 at 16:08:03 UTC, Piotrek wrote:
> I agree with you we need database manipulation in Phobos. However modules like db, gui, xml or similar are too much work for a one developer. And as you can see from time to time there apears someone with its own vision.
>
> That's why, long time ago, I suggested DIP73 (http://wiki.dlang.org/DIP73) so the collaborative work would be controlled by the D community (or the D foundation). But I am aware that there is no agreement nor resources for that.
>
> I agree that we (as a community) should work on common and effective APIs. Maybe when D foundation is big enough...


Your process proposal (DIP73) was helpful and gives me a better perspective on the standardization process. Thanks for referencing that Piotrek.  You are right that areas like this are too much work for one developer.  The only leverage I might have is a lot of  familiarity working with many of the native C client interfaces and experience implementing higher level interfaces on top.  That and is also a lot of existing work to draw on that can inform the design.  Also, my sense is that while there is less process for standardization in D at present, the rate at which progress can occur should be much higher (compared to ISOCCP, for example).   The last thing I want, however, is to get something accepted into std.experimental that is highly contentious or of subpar quality.



March 03, 2016
On Thursday, 3 March 2016 at 17:03:58 UTC, Kagamin wrote:
> On Thursday, 3 March 2016 at 15:53:28 UTC, Erik Smith wrote:
>> auto r = db.connection().statement("select from t").range();  // nouns
>
> db.execute("select from t").range();
> `range` is probably ok.
> Or
> auto connection = db.createConnection();
> connection.execute("select from t").range();
>
>> auto r = db.getConnection().getStatement("select from t").getRange();  // verbs
>
> Getters must be actually nouns, but these are not getters, but factory methods: createConnection, createStatement, probably low level and/or private.

Good point.   I will track this as a design option to debate.

> db.execute("select from t").reader; //row range
> db.execute("select from t").get!long; //scalar
> db.execute("select from t"); //non-query

More good options (the 3rd one is there).   Also at the value access level there are several options to consider: v.get!long, v.as!long, v.to!long, etc.

> On the other hand execute can simply return the reader with extra getter for scalar
> result. Just don't do stuff until it's iterated. Is it possible?
> auto rows = db.execute("select * from t");

I'm hedging a bit on this because there are other capabilities that need to be introduced that might present a problem.   Another issue is that this might conflict with the notion of a container and the use of opSlice.  Great feedback though and I'm tracking it.

erik


March 03, 2016
On Thu, 03 Mar 2016 15:50:04 +0000, Piotrek wrote:

> On Thursday, 3 March 2016 at 01:49:22 UTC, Chris Wright wrote:
>> If you're trying to connect to a SQL database or a document database, as I'd expect for something called "std.database"
> 
> The thing is I strongly encourage to not reserve std.database for external database clients and even what is more limiting to SQL ones only.
> 
>> , it's a pretty terrible API:
>>  * No index scans, lookups, or range queries.
> 
> Indexes can be supported by strings and CTFE, can't they?
> e.g.
> filter!q{item.elements.length < 10 && item.model == "Sport"}

You were a bit vague before. I interpreted you as saying "just offer a range and an array-like API, and then you can use it with std.algorithm". But if you meant to offer an API that is similar to std.algorithm and also array-like, that's more feasible.

You're still left with the task of transpiling D to SQL.

This model does not work with CouchDB.

You must avoid using std.algorithm and std.range functions assiduously because they would offer terrible performance.

>>  * No support for complex queries.
> 
> Not sure what you mean by complex queries. Also I think the API allows arbitrary complex queries.

Aggregates, especially with joins. Computed fields.

>>  * No support for joins.
> 
> Can be done by @attributes or other linking functionality between DbCollections.

With attributes, you need users to define aggregate types instead of just using Row and the like. That's ORM territory. At a previous job I maintained an internal BI site that exposed 50-100 different queries, each with their own set of result fields. We didn't want to use ORM there; it would have been cumbersome and inappropriate.

Also, that assumes that you will always want a join when querying a table. I maintained an application once, using ORM, in which we sometimes wanted an eager join and sometimes wanted a lazy one. This posed a nontrivial performance impact.

I'm not sure ORM would be a candidate for phobos.

>>  * No support for projections.
> 
> You mean something like referring to part of the item's fields? I see no problem here.

Let me point you to the existence of the TEXT and BLOB datatypes. They can each hold 2**32 bytes of data in MySQL.

I'm not splitting those off into a separate table to port my legacy database to your API. I'm not dragging in multiple megabytes of data in every query.

If you're going full ORM, you can add lazy fields. That adds complexity. It's also inefficient when I know in advance that I need those fields.

>>  * No support for transactions.
>>  * If you add support for transactions, you'll crash all the
>> time because the transactions got too large, thanks to the full table
>> scan mentality.
> 
> Isn't it just the "index support" case?

You didn't mention transactions at all in the initial outline. After that, yes, in large portion index support addresses this. DB-side aggregation also helps.

>>  * In your implementation, updates must bring every affected
>> row over the wire, then send back the modified row.
> 
> In my implementation there is no wire (that's why I call it embedded). However I thought we talk about API and not particular implementation. I don't see how this API excludes RPC. Query strings (e.g. SQL) can be provided in old fashioned way.

I'm running a website and decide that, with the latest changes, existing users need to get the new user email. So I write:

  UPDATE users SET sent_join_email = FALSE;
  -- ok; 1,377,212 rows affected

Or I'm using your database system. If it uses std.algorithm, I have to iterate through the users list, pulling each row into my process's memory from the database server, and then I have to write everything back to the database server.

Depending on the implementation, it's using a database cursor or issuing a new query for every K results. If it's using a database cursor, those might not be valid across transaction boundaries. I'm not sure. If they aren't, you get a large transaction, which causes problems.

If your database system instead offers a string-based API similar to std.algorithm, you might be able to turn this into a single query, but it's going to be a lot of work for you.

>>  * Updates affect an entire row. If one process updates one
>> field in a row and another one updates a different field, one of those
>> writes gets clobbered.
> 
> I think this is just a "must have" for any db engine. I don't see how it applies to the proposed API other than any implementation of db engine has to handle it properly.

Without transactions, MySQL supports writing to two different columns in two different queries without those writes clobbering each other.

That's handling it properly.

> When I say DbCollection should behave similar to an ordinal array I don't mean it should be an ordinal array.
> 
>>  * The API assumes a total ordering for each DbCollection. This
>> is not valid.
> 
> I don't know what you mean here. Example would be good.

opIndex(size_t offset) assumes the database supports a one-to-one mapping between offsets and rows.

SQLite, for one, does not guarantee query result ordering if the query does not include an ORDER BY clause. So offering on opIndex(size_t offset) operation -- either you load the entire table into memory in advance, or you might get the same row returned for every index.

Alternatively, you must examine the table and generate a sufficiently unique ordering for it. Any mutation to the table's contents can change the indices for all items in the table. This isn't invalid -- C# collection iterators throw an exception if you modify the collection during iteration. But, to avoid silent errors you can't defend against, you have to dump the whole table into a transaction or have the database somehow tell you when someone else has modified it.

Speaking of modifications, let's say I write a simple loop like:
  for (size_t i = 0; i < dbTable.length; i++) {
    writeln(dbTable[i].id);
  }

This can go wrong in about four ways:
 * Someone inserts a row whose index is less than i. This prints out the
same id twice.
 * Someone deletes a row whose index is less than i. This skips a
different row.
 * Someone deletes a row when i == dbTable.length - 1, just before opIndex
executes. I get an index out of bounds error.
 * This is issuing two queries per iteration. It's going to take probably
fifty times longer than using a cursor.

This is a terrible usage pattern, but by offering opIndex and length operations, you are recommending it.

>>  * If there are multiple rows that compare as equals, there's
>> no way to update only one of them in your implementation.
>>  * In your implementation, updating one row is a ϴ(N)
>> operation. It still costs ϴ(N) when the row you want to update is the
>> first one in the collection.
> 
> I'm still not sure if you are referring to my implementation or
> hypothetical API. To be clear: my current implementation is still proof
> of concept and surly *unfinished*. And in case you refer to my
> implementation I plan to support O(1), O(log n) and O(n) access patterns
> with its "rights and duties".

I specifically said "in your implementation" for these two because I was referring to your implementation rather than your proposal in general. The rest refers to the basic idea.