March 04, 2016
On Thursday, 3 March 2016 at 18:48:08 UTC, Chris Wright wrote:
> You were a bit vague before. I interpreted you as saying "just offer a range and an array-like API, and then you can use it with std.algorithm". But if you meant to offer an API that is similar to std.algorithm and also array-like, that's more feasible.

I agree I could have described the concept better. But it was just a sketch of the idea.

> You're still left with the task of transpiling D to SQL.
If someone wants to use SQL in its *full power*, no D API (nor any other language's) will suffice, mainly because it will always be a translation layer. The only thing we can do is provide aids like those suggested by Andrei (SQL parser, value binding, etc.).

> This model does not work with CouchDB.
I don't know CouchDB so I can't comment.

> You must avoid using std.algorithm and std.range functions assiduously because they would offer terrible performance.

For big data in a DB, plain vanilla std.algorithm won't be sufficient. I agree.

>>>  * No support for complex queries.
>> 
>> Not sure what you mean by complex queries. Also I think the API allows arbitrary complex queries.
>
> Aggregates, especially with joins. Computed fields.

Regarding computed fields and other database-vendor-specific features, you are right.
But on the other hand, aggregations and joins can be represented as objects and proxies of objects.
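
For instance, an aggregation over a collection could itself come back as a typed object. Everything below is hypothetical, just to show the direction:

struct Score { int id; int points; }     // element type of the collection

struct ScoreSummary                      // the aggregate, itself a typed object
{
  long count;
  double average;
}

auto scores  = db.collection!Score("scores");
// hypothetical: computed DB-side and materialized into ScoreSummary
auto summary = scores.aggregate!ScoreSummary("count(*), avg(points)");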

>>>  * No support for joins.
>> 
>> Can be done by @attributes or other linking functionality between DbCollections.
>
> With attributes, you need users to define aggregate types instead of just using Row and the like. That's ORM territory.

I don't like ORM layered on top of SQL. But a quasi object database, which can look similar to an ORM, is not a problem for me.

> At a previous job I maintained an internal BI site that exposed 50-100 different queries, each with their own set of result fields. We didn't want to use ORM there; it would have been cumbersome and inappropriate.

I can see your point. But the problem can be solved by not using SQL.

> Also, that assumes that you will always want a join when querying a table. I maintained an application once, using ORM, in which we sometimes wanted an eager join and sometimes wanted a lazy one. This posed a nontrivial performance impact.

Something like DbProxy would handle lazy "joins".
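
A minimal sketch of what such a proxy could look like (nothing of this exists yet, the names are made up):

// Hypothetical lazy proxy: the linked row is fetched only on first access.
struct DbProxy(T)
{
  private T delegate() load;   // how to fetch the row from the DB
  private T value;
  private bool loaded;

  ref T get()
  {
    if (!loaded) { value = load(); loaded = true; }
    return value;
  }
  alias get this;              // use the proxy as if it were a T
}

An eager "join" would fill the proxies up front, a lazy one would defer the fetch until the field is actually touched.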

> I'm not sure ORM would be a candidate for phobos.

As I don't plan to use a (traditional) ORM, I'm not directly involved. However, if other people find it worthwhile, I don't object.

>>>  * No support for projections.
>> 
>> You mean something like referring to part of the item's fields? I see no problem here.
>
> Let me point you to the existence of the TEXT and BLOB datatypes. They can each hold 2**32 bytes of data in MySQL.

This is something a DbProxy would handle. For example:

struct OriginalObject
{
  int id;
  string bigString;
}

struct StrippedObject   // projection: only the fields we want fetched
{
  int id;
}

then

auto collA = db.collection!OriginalObject("Big");
auto collB = db.collection!StrippedObject("Big");

With collB the big string is not fetched.

> I'm not splitting those off into a separate table to port my legacy database to your API. I'm not dragging in multiple megabytes of data in every query.
>
> If you're going full ORM, you can add lazy fields. That adds complexity. It's also inefficient when I know in advance that I need those fields.
>
>
>>>  * In your implementation, updates must bring every affected
>>> row over the wire, then send back the modified row.
>> 
>> In my implementation there is no wire (that's why I call it embedded). However I thought we talk about API and not particular implementation. I don't see how this API excludes RPC. Query strings (e.g. SQL) can be provided in old fashioned way.
>
> I'm running a website and decide that, with the latest changes, existing users need to get the new user email. So I write:
>
>   UPDATE users SET sent_join_email = FALSE;
>   -- ok; 1,377,212 rows affected
>
> Or I'm using your database system. If it uses std.algorithm, I have to iterate through the users list, pulling each row into my process's memory from the database server, and then I have to write everything back to the database server.
>
> Depending on the implementation, it's using a database cursor or issuing a new query for every K results. If it's using a database cursor, those might not be valid across transaction boundaries. I'm not sure. If they aren't, you get a large transaction, which causes problems.
>
> If your database system instead offers a string-based API similar to std.algorithm, you might be able to turn this into a single query, but it's going to be a lot of work for you.

For a client-server approach I agree with the above. For an embedded design (as in my project) this is not the case.

>>>  * Updates affect an entire row. If one process updates one
>>> field in a row and another one updates a different field, one of those
>>> writes gets clobbered.
>> 
>> I think this is just a "must have" for any db engine. I don't see how it applies to the proposed API other than any implementation of db engine has to handle it properly.
>
> Without transactions, MySQL supports writing to two different columns in two different queries without those writes clobbering each other.
>
> That's handling it properly.

Designing a good locking mechanism will be a challenging task, that's for sure :)

>> When I say DbCollection should behave similar to an ordinal array I don't mean it should be an ordinal array.
>> 
>>>  * The API assumes a total ordering for each DbCollection. This
>>> is not valid.
>> 
>> I don't know what you mean here. Example would be good.
>
> opIndex(size_t offset) assumes the database supports a one-to-one mapping between offsets and rows.
[...]
> This is a terrible usage pattern, but by offering opIndex and length operations, you are recommending it.

I don't recommend it. I just added it for evaluation. I'm aware it only works when the collection is not mutated. I think the same goes for all shared collections (also those in memory).

Finally, IMO any DB API will be biased toward one solution.

Cheers
Piotrek


March 04, 2016
On Thursday, 3 March 2016 at 18:08:26 UTC, Erik Smith wrote:
>> db.execute("select from t").reader; //row range
>> db.execute("select from t").get!long; //scalar
>> db.execute("select from t"); //non-query
>
> More good options (the 3rd one is there).   Also at the value access level there are several options to consider: v.get!long, v.as!long, v.to!long, etc.

to!long looks ok.

>> On the other hand execute can simply return the reader with extra getter for scalar
>> result. Just don't do stuff until it's iterated. Is it possible?
>> auto rows = db.execute("select * from t");
>
> I'm hedging a bit on this because there are other capabilities that need to be introduced that might present a problem.

Can you elaborate?

On the other hand, these helper methods are built on top of the abstraction API, so you would have to duplicate them in all drivers. Maybe it's better to have them as extension methods in a single module that works with all drivers? They are effectively a sort of minimal frontend already.
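
Something like this, roughly (executeRaw is a made-up name for whatever low-level primitive the driver exposes):

// Hypothetical single frontend module shared by all drivers via UFCS.
module std.database.frontend;

auto reader(Db)(Db db, string sql)
{
    return db.executeRaw(sql);              // the driver's row range
}

T get(T, Db)(Db db, string sql)
{
    return db.executeRaw(sql).front.as!T;   // scalar shortcut
}

Then db.reader("...") and db.get!long("...") work for any driver without each driver having to implement them.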

> Another issue is that this might conflict with the notion of a container and the use of opSlice.  Great feedback though and I'm tracking it.

The result set is really an iterator/stream, i.e. conceptually it has even fewer container properties than ranges themselves. Why would you think of it as a container? I think it's ok as an input range. Anyway, this `execute` method is a frontend, i.e. it's replaceable without touching the driver.
March 04, 2016
On Friday, 4 March 2016 at 11:57:49 UTC, Kagamin wrote:
> On Thursday, 3 March 2016 at 18:08:26 UTC, Erik Smith wrote:
>>> db.execute("select from t").reader; //row range
>>> db.execute("select from t").get!long; //scalar
>>> db.execute("select from t"); //non-query
>>
>> More good options (the 3rd one is there).   Also at the value access level there are several options to consider: v.get!long, v.as!long, v.to!long, etc.
>
> to!long looks ok.
>
>>> On the other hand execute can simply return the reader with extra getter for scalar
>>> result. Just don't do stuff until it's iterated. Is it possible?
>>> auto rows = db.execute("select * from t");
>>
>> I'm hedging a bit on this because there are other capabilities that need to be introduced that might present a problem.
>
> Can you elaborate?

Actually I like this and I think it can work.  I'm trying to keep a single execute function name for both row and no-row queries. I can still return a range proxy for no-row queries that is either empty or throws on access.
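
Roughly this shape (a sketch only, not the final code):

// A row range that is simply empty for no-row statements and throws if
// someone tries to read from it anyway.
struct RowProxy(Row)
{
    private Row[] rows;      // stand-in for the streamed result
    private bool hasRows;    // false for INSERT/UPDATE/DDL

    @property bool empty() { return !hasRows || rows.length == 0; }

    @property ref Row front()
    {
        if (!hasRows) throw new Exception("statement did not return rows");
        return rows[0];
    }

    void popFront() { rows = rows[1 .. $]; }
}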

> On the other hand these helper methods are built on top of abstraction API and you have to duplicate them in all drivers. Maybe better have them as extension methods in a single module that will work with all drivers? They are effectively a sort of minimal frontend already.

I'm just waiting for the code to settle a bit in a basic working state before I refactor into a two layer design.

> The result set is even iterator/stream, i.e. conceptually has even less container properties than ranges themselves. Why would you think of it as a container? I think it's ok as input range. Anyway, this `execute` method is a frontend, i.e. it's replaceable without touching the driver.

The range itself is an InputRange.  The statement is acting as a container only in that it is the source of the range and holds the data.  I agree that it is confusing to use the term container although it seems to fit the definition of one.



March 04, 2016
On Fri, 04 Mar 2016 06:16:59 +0000, Piotrek wrote:

> For client-server approach I agree with the above. For embedded design (as in my project) this is not a case.

Which is all I'm saying. Something in std.database sounds like it should allow you to interact with databases, like JDBC.

With embedded databases, there's a lot of variety out there, probably a decent selection of tradeoffs, so I'm not sure any one would be appropriate to phobos.
March 04, 2016
On Friday, 4 March 2016 at 14:44:48 UTC, Erik Smith wrote:
> Actually I like this and I think it can work.  I'm trying to keep a single execute function name for both row/no-row queries. I can still return the range proxy for no-row queries that is either empty or throws on access.

Yes, that's the idea.

> I'm just waiting for the code to settle a bit in a basic working state before I refactor into a two layer design.

Another idea from previous attempts: rename `execute` to `query`:

foreach(row; db.query("select * from t"))
{ ... }

And a name for the package: std.sql

>> The result set is even iterator/stream, i.e. conceptually has even less container properties than ranges themselves. Why would you think of it as a container? I think it's ok as input range. Anyway, this `execute` method is a frontend, i.e. it's replaceable without touching the driver.
>
> The range itself is an InputRange.  The statement is acting as a container only in that it is the source of the range and holds the data.  I agree that it is confusing to use the term container although it seems to fit the definition of one.

`execute` should not return a statement, but a result set reader, that would be a range. Yeah, the result set is indeed a range conceptually: it has a definite number of ordered rows, a beginning and an end.
March 04, 2016
On Thursday, 3 March 2016 at 17:46:02 UTC, Erik Smith wrote:
> The only leverage I might have is a lot of  familiarity working with many of the native C client interfaces and experience implementing higher level interfaces on top.  That and is also a lot of existing work to draw on that can inform the design.

BTW, in the Oracle driver you use the ODBC idiom of passing strings as pointer+length pairs. Why don't you use it in the ODBC driver?
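
I mean roughly this, instead of going through a null-terminated copy (assuming the usual ODBC bindings):

// pass the D string directly as pointer + explicit length
SQLPrepare(stmt, cast(SQLCHAR*) sql.ptr, cast(SQLINTEGER) sql.length);

rather than toStringz + SQL_NTS.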
March 04, 2016
On Friday, 4 March 2016 at 16:43:00 UTC, Kagamin wrote:
> On Friday, 4 March 2016 at 14:44:48 UTC, Erik Smith wrote:
>> Actually I like this and I think it can work.  I'm trying to keep a single execute function name for both row/no-row queries. I can still return the range proxy for no-row queries that is either empty or throws on access.
>
> Yes, that's the idea.
>
>> I'm just waiting for the code to settle a bit in a basic working state before I refactor into a two layer design.
>
> Another idea from previous attempts: rename `execute` to `query`.
> foreach(row; db.query("select * from t"))
> { ... }
> And name for package: std.sql
>
>>> The result set is even iterator/stream, i.e. conceptually has even less container properties than ranges themselves. Why would you think of it as a container? I think it's ok as input range. Anyway, this `execute` method is a frontend, i.e. it's replaceable without touching the driver.
>>
>> The range itself is an InputRange.  The statement is acting as a container only in that it is the source of the range and holds the data.  I agree that it is confusing to use the term container although it seems to fit the definition of one.
>
> `execute` should not return a statement, but a result set reader, that would be a range. Yeah, the result set is indeed a range conceptually: it has a definite number of ordered rows, a beginning and an end.



On further thought, execute() should definitely return something, but I think it needs to return a Result (alternative name RowSet), the actual container, rather than the range.  This more cleanly separates the post-execute state management into an object distinct from Statement.  The user will often want other things from the execute than just the range, such as meta-data (# of columns, column names/types) describing the returned table, and RowSet provides convenient access to it. This also suggests that execute should be explicit and once-only, which I think is better.  Here is an updated example to illustrate:

    auto db = createDatabase("file:///testdb");
    auto rowSet = db.connection().statement("select name,score from score").execute;
    foreach (r; rowSet) writeln(r[0].as!string,",",r[1].as!int);
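
And for the meta-data side I'm picturing something along these lines (method names are tentative, not implemented yet):

    writeln(rowSet.columns);                              // number of columns
    foreach (c; 0 .. rowSet.columns)
        writeln(rowSet.name(c), ": ", rowSet.type(c));    // column names/types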

I'll track query as an alternative name for execute() and std.sql as alternative for std.database.




March 04, 2016
On Friday, 4 March 2016 at 18:42:45 UTC, Erik Smith wrote:
>     auto db = createDatabase("file:///testdb");
>     auto rowSet = db.connection().statement("select name,score from score").execute;
>     foreach (r; rowSet) writeln(r[0].as!string,",",r[1].as!int);
>
> I'll track query as an alternative name for execute() and std.sql as alternative for std.database.

Why not only:

auto rowSet = db.connection().query("select name,score from score");

Or instead of query maybe "returnRows" or just "rows"? I think the other way is too verbose.

Bubba.
March 04, 2016
On Friday, 4 March 2016 at 19:27:47 UTC, Bubbasaur wrote:
> On Friday, 4 March 2016 at 18:42:45 UTC, Erik Smith wrote:
>>     auto db = createDatabase("file:///testdb");
>>     auto rowSet = db.connection().statement("select name,score from score").execute;
>>     foreach (r; rowSet) writeln(r[0].as!string,",",r[1].as!int);
>>
>> I'll track query as an alternative name for execute() and std.sql as alternative for std.database.
>
> Why not only:
>
> auto rowSet = db.connection().query("select name,score from score");
>
> Or instead of query maybe "returnRows" or just "rows", I think the other way is too verbose.
>
> Bubba.

Agreed, connection execute is simpler and it is supported, although I need to add the bind version to connection as well.   The statement should only get involved when you need more specific control over the input binding.  I'll add returnRows/rows to the execute/query alternative name list.
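
The bind version on the connection would look roughly like this (still tentative):

    auto rowSet = db.connection().execute("select name from score where score > ?", 50);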

erik

March 04, 2016
On Friday, 4 March 2016 at 16:54:33 UTC, Kagamin wrote:
> On Thursday, 3 March 2016 at 17:46:02 UTC, Erik Smith wrote:
> BTW in the oracle driver you use that ODBC idiom of passing strings as pointer+length pairs. Why don't you use it in ODBC driver?

That will be fixed in the next push.

erik