October 09, 2011
Jonathan M Davis wrote:

> On Saturday, October 08, 2011 12:00:37 Andrei Alexandrescu wrote:
>> 1. If we build a D wrapper for ODBC, then we allow people to write code for any database that has an ODBC driver. This, assuming we commit to ODBC as D's standard database interface, would complete the project.
>> 
>> 2. If we want to go the route of "one std.database API with drivers for each DBMS" and consider ODBC one of several DBMSs, then we need to define our own driver architecture, write a few drivers ourselves (including probably ODBC), and hope that people will add more drivers. That's a larger project but it unties us from ODBC.
>> 
>> 3. If we want to go the route of "similar but not identical specialized APIs per database system" and consider ODBC again only one of the database systems, then we need to define one specialized API per DBMS and force users to essentially choose upfront what DBMS they'll use, and code for it. It's an even larger project and I don't see obvious advantages for it than less of a need for upfront design.
> 
> I definitely vote for #2 or #3. One of our projects at work uses it (though not one that I personally work on), and I've never heard good things about it. Supporting it makes good sense, but I wouldn't want us to design Phobos' database solution around it.
> 
> We should probably explore #2 first, and then if it doesn't work well
> enough, then at least we have a solid base for the API's being similar
> with #3. However, given what I've heard about ODBC and its attempts to
> unify databases, I'm skeptical of how well we'll be able to have a unified
> DBMS API without harming performance. And from what I understand, it's
> pretty rare to change DBMSes for a project. You just pick one and use it.
> And then in the rare case where you have to change, you do the work to do
> it (and as long as the DB is appropriately modularized with regards to the
> rest of the program, it doesn't have a hugely negative affect on the rest
> of the program). So, I question that
> #2 gains us a whole lot over #3 ultimately (_especially_ if it ends up
> costing
> much of anything in terms of performance or usability), but I'm not a DB
> expert, and I do think that it's at least worth exploring #2 first - if
> nothing else because it could lead to the APIs for #3 being better unified
> without harming their performance or usability.
> 
> - Jonathan M Davis
I agree that once a project picks its database it rarely changes, but I find most people tend to use common functionality on the application code side, and it's the SQL which tends to get optimised heavily (though there are some traps in the application code that can reduce performance, like improper use of MySQL prepared statements). I think with #2 we should be able to cover the main DB requirements so people can at least develop database-connected applications. The drivers would give us the flexibility to handle more databases at a basic level (an ODBC driver would let us achieve this quickly, until someone wants to provide a native D driver for a particular DB where it offers some advantage).

That being said, how do people tend to handle the SQL itself? Unless you are using a library to create the SQL for you, like Hibernate, do you find you run into SQL syntax variations across different databases? For example, MySQL and MSSQL handle result pagination differently (http://forums.asp.net/p/1427310/3186427.aspx#3186427). I know there is ANSI SQL, but my understanding is that most databases don't implement it well (not all of it, or with poor performance, etc.). It seems to me that even if you used a common API, you would still need to abstract over the different databases somehow, just because of the SQL itself.
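To make the pagination difference concrete, here is how fetching rows 21-30 of a result might look in the two dialects (the table and column names are made up for illustration):

```sql
-- MySQL: LIMIT/OFFSET
SELECT id, title FROM posts ORDER BY id LIMIT 10 OFFSET 20;

-- MS SQL Server (2005+): no LIMIT clause; ROW_NUMBER() is the usual workaround
SELECT id, title FROM (
    SELECT id, title, ROW_NUMBER() OVER (ORDER BY id) AS rn
    FROM posts
) AS numbered
WHERE rn BETWEEN 21 AND 30;
```

A common API can paper over this kind of difference either by generating the dialect-specific SQL itself or by leaving the SQL entirely up to the user.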

Even if #2 is completed, #3 could still be worked on, if someone is willing. A common API is likely to miss some features from the specific APIs (I've only looked into MySQL, so this is an assumption). The implementations from #3 could then be used to implement drivers for #2, allowing those who want flexibility to use #2, while those with more specific requirements, who care about performance, can use #3. That is more work than #2 or #3 separately, but like I said, "if someone is willing".

My plan for my personal implementation was to develop specific APIs (#3) for MySQL and PostgreSQL, trying to keep them similar where possible, then look into providing a wrapper API for common/basic functionality (#2). That project was as much about getting to grips with D as about building something I intend to use, which is why I chose the long way round to implement it.

October 09, 2011
Steve Teale Wrote:

> I use this title at Andrei's suggestion, and repeat his idea that it be used as a prefix for discussions as we navigate toward a design. Unless there is resistance to the idea, I will take on the job of implementing whatever we decide is appropriate. I am retired, and have the time to do it.
> 
> It seems that every man, and possibly his dog, has a private implementation for at least a favorite DB, so we should have plenty of material to build on.
> 
> At this point I would like to get responses from those who feel they are likely to contribute to the design through to completion.
> 
> I'd also like to get a feel for the magnitude of the task, so I'd like to ask what database systems you think should be supported.
> 
> I have started a github account, and will put my mysqld stuff there shortly, then you can kick me out if you don't like what you see.
> 
> Steve


Please look at the successful experience of the industry: EJB, JDBC, JDO, JPA, ADO.NET, the Delphi DB API, Ruby on Rails' ActiveRecord, and ORMs such as Hibernate. Absorb their advantages to create a database API framework that is both powerful and simple for enterprise applications. Achieving this goal will require cooperation, unless one person happens to be proficient in all of the above APIs. I hope we can start directly at the ORM level, with no dependence (or only a low dependence) on a JDBC-style layer, unlike Hibernate, and improve ORM performance.

I hope more database experts get involved in the D database API design!


thanks all !

dolive


October 09, 2011
On 2011-10-08 19:00, Andrei Alexandrescu wrote:
> 1. If we build a D wrapper for ODBC, then we allow people to write code
> for any database that has an ODBC driver. This, assuming we commit to
> ODBC as D's standard database interface, would complete the project.
>
> 2. If we want to go the route of "one std.database API with drivers for
> each DBMS" and consider ODBC one of several DBMSs, then we need to
> define our own driver architecture, write a few drivers ourselves
> (including probably ODBC), and hope that people will add more drivers.
> That's a larger project but it unties us from ODBC.
>
> 3. If we want to go the route of "similar but not identical specialized
> APIs per database system" and consider ODBC again only one of the
> database systems, then we need to define one specialized API per DBMS
> and force users to essentially choose upfront what DBMS they'll use, and
> code for it. It's an even larger project and I don't see obvious
> advantages for it than less of a need for upfront design.
>
>
> Andrei

I would say that we declare a high-level interface for database drivers. std.database can use this interface to connect to any database for which there is a driver.

We then provide driver implementations for this interface for the databases we choose to support. It should also be possible for a user to create his/her own driver implementation for a database we haven't yet implemented or choose not to implement.
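To make that concrete, a minimal sketch of such a driver layer in D might look like the following. Every name here is invented purely for illustration (only std.variant's Variant is real); this is a sketch of the idea, not a proposal of the final API.

```d
import std.variant : Variant;

// Hypothetical std.database driver interfaces - a sketch only.
interface ResultSet
{
    bool next();                    // advance to the next row
    Variant opIndex(size_t column); // fetch a column of the current row
}

interface Statement
{
    void bind(size_t index, Variant value);
    ResultSet execute();
}

interface Connection
{
    Statement prepare(string sql);
    void close();
}

interface Driver
{
    // e.g. connect("mysql://user@host/dbname")
    Connection connect(string uri);
}
```

A MySQL, SQLite or ODBC driver would then just be a set of classes implementing these interfaces, and std.database could keep a registry mapping URI schemes to Driver instances.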

Driver implementations could be:

* MySQL/MariaDB
* PostgreSQL
* SQLite
* ODBC
* Oracle
* SQL Server

The smart thing would probably be to implement ODBC as the first database driver since it would allow most of the common databases to be used.

But I don't want ODBC to be the only driver. I have some bad experience with ODBC from work. It can't handle multiple result sets and it's not very good at handling invalid Unicode characters. This might not be true for ODBC in general, but it's a problem we have with the implementation we currently use. Also, ODBC adds an extra layer between the application and the database.

-- 
/Jacob Carlborg
October 09, 2011
On 2011-10-08 23:11, Adam D. Ruppe wrote:
> Microsoft SQL Server is important to cover too. I'm pretty sure
> ODBC works fine for that (there's ODBC bindings for D already,
> it's part of the Windows headers) and I wrote a little something
> for my database.d, but I haven't actually tested it yet!

I have some bad experience with ODBC, connecting to SQL Server, from work. It can't handle multiple result sets and it's not very good at handling invalid Unicode characters. This might not be true for ODBC in general, but it's a problem we have with the implementation we currently use.

If we don't want to completely roll our own implementation, I suggest we use FreeTDS directly instead (ODBC uses FreeTDS).

I think it would be good to have an ODBC implementation but it should not be the only way to connect to SQL Server.

-- 
/Jacob Carlborg
October 09, 2011
On 2011-10-08 23:12, Jonathan M Davis wrote:
> On Saturday, October 08, 2011 12:00:37 Andrei Alexandrescu wrote:
>> 1. If we build a D wrapper for ODBC, then we allow people to write code
>> for any database that has an ODBC driver. This, assuming we commit to
>> ODBC as D's standard database interface, would complete the project.
>>
>> 2. If we want to go the route of "one std.database API with drivers for
>> each DBMS" and consider ODBC one of several DBMSs, then we need to
>> define our own driver architecture, write a few drivers ourselves
>> (including probably ODBC), and hope that people will add more drivers.
>> That's a larger project but it unties us from ODBC.
>>
>> 3. If we want to go the route of "similar but not identical specialized
>> APIs per database system" and consider ODBC again only one of the
>> database systems, then we need to define one specialized API per DBMS
>> and force users to essentially choose upfront what DBMS they'll use, and
>> code for it. It's an even larger project and I don't see obvious
>> advantages for it than less of a need for upfront design.
>
> I definitely vote for #2 or #3. One of our projects at work uses it (though not
> one that I personally work on), and I've never heard good things about it.
> Supporting it makes good sense, but I wouldn't want us to design Phobos'
> database solution around it.

We have some trouble with ODBC at work too.

> We should probably explore #2 first, and then if it doesn't work well enough,
> then at least we have a solid base for the API's being similar with #3.
> However, given what I've heard about ODBC and its attempts to unify databases,
> I'm skeptical of how well we'll be able to have a unified DBMS API without
> harming performance. And from what I understand, it's pretty rare to change
> DBMSes for a project. You just pick one and use it. And then in the rare case
> where you have to change, you do the work to do it (and as long as the DB is
> appropriately modularized with regards to the rest of the program, it doesn't
> have a hugely negative effect on the rest of the program). So, I question that
> #2 gains us a whole lot over #3 ultimately (_especially_ if it ends up costing
> much of anything in terms of performance or usability), but I'm not a DB
> expert, and I do think that it's at least worth exploring #2 first - if nothing
> else because it could lead to the APIs for #3 being better unified without
> harming their performance or usability.
>
> - Jonathan M Davis

I think it can be quite useful to change DBMSes for a project. For example, start with SQLite because it's easy to set up and then move to MySQL or similar. It's especially easy to do with Ruby on Rails on Mac OS X. Rails uses SQLite as the default database and Mac OS X comes bundled with SQLite. There are no extra steps to set up this environment.

-- 
/Jacob Carlborg
October 09, 2011
Am 08.10.2011 23:11, schrieb Adam D. Ruppe:
> Microsoft SQL Server is important to cover too.

Yeah, it's probably pretty widely used in the Windows world so it should be supported.
And Oracle should probably be supported as well.

But if we have a generic DB API support for these can be added later (based on ODBC, some product specific API or whatever).

I think support for SQLite, MySQL and PostgreSQL would be a good start.

Cheers,
- Daniel
October 09, 2011
Steve Teale wrote:
> I use this title at Andrei's suggestion, and repeat his idea that it be used
> as a prefix for discussions as we navigate toward a design. Unless there is
> resistance to the idea, I will take on the job of implementing whatever we decide
> is appropriate. I am retired, and have the time to do it.
>
> It seems that every man, and possibly his dog, has a private implementation
> for at least a favorite DB, so we should have plenty of material to build on.
>
> At this point I would like to get responses from those who feel they are
> likely to contribute to the design through to completion.
>
> I'd also like to get a feel for the magnitude of the task, so I'd like to ask
> what database systems you think should be supported.
>
> I have started a github account, and will put my mysqld stuff there shortly,
> then you can kick me out if you don't like what you see.

1. I think that we should not design this API using the least-common-denominator approach, so as not to limit the more capable databases. For example, PostgreSQL has many great features not available in MySQL. That's why I started with postgres in my ddb project. I think the DB API should be designed to support the most featureful databases; those that have fewer features can then be easily adapted to that API.

Some PostgreSQL vs MySQL differences to note:

- the former supports arrays and composite types that may be stored in a single column, the latter doesn't
- the former supports asynchronous queries, but I guess that's not that important
- the former supports async notifications (for example, fired by a trigger, or when the server is shutting down)
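For concreteness, here is what those PostgreSQL features look like in SQL (the table, type and channel names are made up for illustration):

```sql
-- Array and composite types stored in single columns
CREATE TYPE address AS (street text, city text);
CREATE TABLE person (
    name text,
    tags text[],   -- array column
    addr address   -- composite column
);

-- Async notifications: a client subscribes with LISTEN, and any session
-- (or trigger) can fire NOTIFY on that channel
LISTEN table_changed;
NOTIFY table_changed;
```

A common API restricted to flat scalar columns would have no natural way to surface either feature, which is the argument for designing around the most capable systems first.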

2. Compile-time mapping of fields should support postgres's composites and arrays. For example, this is obvious:

execQuery!(int, string)("select 5, 'abc'");

but this is not:

execQuery!(int, string)("select ROW(5, 'abc')"); // valid postgres query

Does it return two columns or one composite column? I already addressed these ambiguities in ddb; please see the documentation on github (http://pszturmaj.github.com/ddb/db.html). See also the 'advanced example' on that page.

3. I think that compile-time mapping of fields should be generalized beyond the DB API. It could also be used with CSV files or other tabular data. It could even be extended to support tree structures with XML (mapping XML to structs/tuples/arrays).
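A rough sketch of that generalization in D, assuming plain structs as the mapping target (std.array, std.conv and the tupleof iteration are real D; the mapRow name and the example data are made up):

```d
import std.array : split;
import std.conv : to;

// Fill any plain struct's fields from a row of strings, using only
// compile-time knowledge of the field types - the same mechanism could
// serve a DB result row, a CSV line, or other tabular data.
T mapRow(T)(string[] fields)
{
    T result;
    foreach (i, ref member; result.tupleof)
        member = to!(typeof(member))(fields[i]);
    return result;
}

struct Person { string name; int age; }

void main()
{
    auto p = mapRow!Person("Walter,53".split(","));
    assert(p.name == "Walter" && p.age == 53);
}
```

A DB driver would call the same template with a fetched row instead of a split CSV line, so the mapping logic lives in one place.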
October 09, 2011
On 10/9/11 7:28 AM, Piotr Szturmaj wrote:
> 1. I think that we should not design this API using the least common
> denominator approach. This is to not limit some databases. For example
> PostgreSQL has many great features not available in MySQL. That's why I
> started with postgres in my ddb project. I think DB API should be
> designed to support the most featureful databases and those that have
> less features may be easily adapted to that API.
>
> Some PostgreSQL vs MySQL differences to note:
>
> - the former does support arrays and composite types that may be stored
> in single column, the latter doesn't

Yah, Hive also has some really interesting data types, such as parameterized arrays and maps and JSON data. Our API should be generic enough to accommodate such types seamlessly.

> - the former support asynchronous queries, but I guess its not that
> important
> - the former support async notifications (for example, fired by a
> trigger, or when server is shutting down)

Not sure how we could support such. Ideas would be welcome.

> 2. Compile type mapping of fields should support postgres's composites
> and arrays. For example, this is obvious:
>
> execQuery!(int, string)("select 5, 'abc'");
>
> but this is not:
>
> execQuery!(int, string)("select ROW(5, 'abc')"); // valid postgres query

I think that should be

execQuery!(Tuple!(int, string))("select ROW(5, 'abc')");

> Does it return two columns or one composite column? I already addressed
> this ambiguities in ddb, please see documentation on github
> (http://pszturmaj.github.com/ddb/db.html). See also 'advanced example'
> on that page.
>
> 3. I think that compile type mapping of fields should be generalized to
> not only support DB API. It could also be used with CSV files or other
> tabular data. It may also be extended to support tree structures with
> XML (mapping xml to structs/tuples/arrays).

Interesting.


Andrei
October 09, 2011
There was some discussion prior to this thread about the relative virtues of binding to structs or binding to arrays of Variants. I was thinking about this, and have experimented with Variants in my trial MySQL implementation. My conclusions below - do they make sense?

Using Variant to capture the output of prepared queries or provide input, as opposed to a struct, gets you out of the realm of what must be determined at compile time, but only at the expense of extra DB server round trip(s).

If you want to use them in a deterministic way with a database table of known and stable structure, you have to bind them. To bind them you must be able to determine their types, but with Variants you can't do that until they are initialized. So, with a struct you must have something like:

struct MyTableQuery42
{
   byte col1;
   float col2;
   char[] col3;
}

This can be bound for output without doing any explicit initialization of an instance, since you can write a template function to bind it that discovers everything you need to know about the struct at compile time.
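Such a template function might look like the following sketch. FieldTypeTuple is real std.traits; bindColumn stands in for a hypothetical driver primitive and is stubbed out here just to show the shape:

```d
import std.stdio : writeln;
import std.traits : FieldTypeTuple;

// Hypothetical driver primitive - stubbed out for illustration.
void bindColumn(size_t col, TypeInfo ti, void* storage)
{
    writeln("binding column ", col, " as ", ti);
}

// Walk the struct's fields at compile time and bind each one; nothing
// about the struct needs to be discovered at run time.
void bindOutput(S)(ref S s)
{
    foreach (i, T; FieldTypeTuple!S)
        bindColumn(i, typeid(T), &s.tupleof[i]);
}

struct MyTableQuery42 { byte col1; float col2; char[] col3; }

void main()
{
    MyTableQuery42 q;
    bindOutput(q); // binds the three columns from the field types alone
}
```

The real bindColumn would map each TypeInfo (or the compile-time type T) to the client library's column type code.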

If using Variants for a set of out parameters, you must have something equivalent to:

Variant[3] va;
va[0] = cast(byte) 0;
va[1] = 0.0F;
va[2] = cast(char[]) [];

So you have to have exactly the same information at compile time for the Variant array as you do for the struct - you still have to specify a set of types.

The difference is that if you have prepared a statement, you can go to the server and ask for the relevant metadata. With a struct you can use this to check whether the struct matches the query. With an array of Variants you can make it conform to the query.

However, in a large number of cases, using the struct you won't need to bother with the metadata, because you 'know' the types of the query result. You don't have to bother with them for the Variant array either, but in that case you have to provide a function like

Variant[] MyTableQuery42Init() { ... }

which 'knows' the same stuff.

It's probably true to say that the syntax/semantics of the interface will suck slightly more in the Variant case than in the struct case.

Steve
October 09, 2011
Further question: should we assume, in the first instance, that we only attempt to accommodate those DBs that are free or have some free version that may be limited in some way - e.g. the developer version of MS SQL Server?

Presumably when D reaches the point of being all-conquering, then Oracle, IBM and so on will chip in.

Steve