Thread overview | |||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
December 31, 2016 Databases and the D Standard Library | ||||
---|---|---|---|---|
| ||||
Hi Everyone, I've seen a lot of talk on the forums over the past year about the need for database support in the D Standard Library and I completely agree. At the end of the day the purpose of any programming language and its attendant libraries is to allow the developer to solve their problems quickly and efficiently; and a large subset of those solutions require some form of structured data store. To my mind, this makes some form of interface(s) to a data-store an essential component of the D Standard Library. And since this is something that my particular problem spaces also need, I thought it would be useful to attempt to do something about it. First, I've seen a couple of promising projects, the most complete, and recent of which, dstddb (Github: https://github.com/cruisercoder/dstddb) hasn't seen a commit since June. An additional setback came when I tried to use it and was greeted with a litany of compiler errors. This is *not* a problem, it's the natural course of a volunteer community such as ours; and I want to thank Erik Smith profusely for his work. Priorities and circumstances change and that means that valuable projects are inexplicably dropped. But we still lack a critical component, and to get the conversation started, I'd like to break down the issues I've seen brought up in past threads on this subject and encourage you to bring your own. I may have ideas, but I can't possibly know the entire problem space. 1. Isn't this an enormous amount of work? My answer: Absolutely, depending on your preferred scope of work. In general, I've seen two distinct camps on this issue. One says that we should implement everything in D from the ground up, including re-implementation of the database drivers themselves in D. If this is your preferred scope of work then you will invariably become disheartened at the truly stupendous amount of work you face and give up. The other camp says that we should make use of existing drivers and include them in the D Standard Library. This is difficult path to follow as the vanilla build of the D Standard Library now requires a significant number of foreign libraries, all with differing licenses, be built and distributed to everyone; regardless of whether or not they use them in their project. This is more-or-less than path the dstddb is/was on. My idea: Focus on defining the interface, not the individual driver implementations. If instead we focused on defining an interface that a "conforming implementation" had to follow, we would allow developers to only pull in the library they need or build a from-scratch library if they so desire. Indeed this is the model that both Java (JDO) and .NET (ADO.NET) follow and I think we would be well advised to follow their lead here. Not only is the methodology battle-proven, it is also well understood by a significant portion of D's potential user-base. By way of example, Npgsql is the ADO.NET implementation of a driver for PostgreSQL. 2. There are so many different types of data storage systems, how do you design a system generic enough for all of them? My answer: You don't. Nobody else has bothered trying, and I believe that our worry over that question is a large part of why we don't have anything substantive today. My idea: Split the data storage systems out by category of data-store. For example: - SQL: std.database.sql (PostgreSQL, MySQL, MSSQL, etc.) - Document: std.database.document (Mongo, CouchDB, etc.) - Key-Value: std.database.keyvalue (Redis, etcd2, etc.) If you want something that doesn't fit into a category above, you're own your own, but you were also on your own in other languages. 3. We need to provide a single interface for all data-stores in the SQL/Document/Key-Value category. My answer: Are you sure? The problem is that each underlying data-store has it's own dialect. For example, PostgreSQL and MSSQL are both ostensibly ANSI-SQL, except where they aren't. Re-targeting data-stores, even in the same category, is never going to be as simple as changing a connection string. And additionally, you will have to implement a super-set of features in the interface to support all the variations and throw exceptions where the chosen implementation does not support a specific feature. My idea: Each data store has it's own implementation with it's own naming convention. For example (ADO.NET): - SqlConnection (MSSQL) - NpgsqlConnection (Npgsql) Yes, this means that you have to change names in your code if you switch data-stores, but since you are already changing your queries, which is a much more difficult change, this isn't a significant additional cost. Also, the code becomes clearer to those who take over maintenance duties from the original author, especially when you are mixing data-stores. But in all honest, most developers will pick on technology and stick with it for the entirety of the software's lifespan. 4. We should hide querying from the developer because they are bad at it, security flaws, etc. My answer: While agree in principal, especially with the security concerns, in reality what you are asking for is an ORM. In my opinion, that is a separate concern from a database interface, and is typically implemented as layer over the DB interface. My idea: Don't do it. Save it for a different project. 5. D has so many useful features for data access, we should use as many as possible! My answer: D absolutely has many useful features for data access and manipulation. But that doesn't mean that a good interface has to use any of them. The first job of a Database Interface, and indeed any library, is to get the job done with a minimum of overhead. Let's worry about that before going crazy adding in all the D goodness. Ranges have been a particular target for abuse here, and while I love ranges, I think the mechanics of data-store manipulation don't lend themselves well to working with ranges. I'd love to hear your ideas on this though. My idea: Focus on a more conservative implementation in the style of JDO or ADO.NET. This will allow us to ship something that works in a reasonable time frame. I'm not saying that we can't use any of D's unique talents, but using those talents should be subordinate to designing an interface that works efficiently. That is all I have for now. I am looking forward to hear your thoughts on this topic! Until then, I am going to go close out 2016 (PST) with family and friends and I wish you all a Happy New Year! -- Adam Wilson IRC: LightBender import quiet.dlang.dev; |
January 01, 2017 Re: Databases and the D Standard Library | ||||
---|---|---|---|---|
| ||||
Posted in reply to Adam Wilson | We do indeed need a good database abstraction. But a core requirement for any implementation has yet to be met. There has to be a standard way for asynchronous sockets. To implement this we need to take into consideration the event loop that it uses and more importantly allow it to be integrated for e.g. windowing. So here is a dependency before we get a database abstraction into Phobos, a nice fast event loop manager that is generic. |
December 31, 2016 Re: Databases and the D Standard Library | ||||
---|---|---|---|---|
| ||||
Posted in reply to rikki cattermole | On 12/31/16 7:31 PM, rikki cattermole wrote: > We do indeed need a good database abstraction. > But a core requirement for any implementation has yet to be met. > > There has to be a standard way for asynchronous sockets. To implement > this we need to take into consideration the event loop that it uses and > more importantly allow it to be integrated for e.g. windowing. > > So here is a dependency before we get a database abstraction into > Phobos, a nice fast event loop manager that is generic. Or, alternatively, use existing drivers that have this capability built in? -- Adam Wilson IRC: LightBender import quiet.dlang.dev; |
January 01, 2017 Re: Databases and the D Standard Library | ||||
---|---|---|---|---|
| ||||
Posted in reply to Adam Wilson | On 01/01/2017 4:46 PM, Adam Wilson wrote:
> On 12/31/16 7:31 PM, rikki cattermole wrote:
>> We do indeed need a good database abstraction.
>> But a core requirement for any implementation has yet to be met.
>>
>> There has to be a standard way for asynchronous sockets. To implement
>> this we need to take into consideration the event loop that it uses and
>> more importantly allow it to be integrated for e.g. windowing.
>>
>> So here is a dependency before we get a database abstraction into
>> Phobos, a nice fast event loop manager that is generic.
>
> Or, alternatively, use existing drivers that have this capability built in?
Which is fine if all you use is c's sockets or only that database connection for a thread.
Which is not very realistic for game or web development.
|
January 01, 2017 Re: Databases and the D Standard Library | ||||
---|---|---|---|---|
| ||||
Posted in reply to rikki cattermole | On Sunday, 1 January 2017 at 03:51:52 UTC, rikki cattermole wrote:
> Which is fine if all you use is c's sockets or only that database connection for a thread.
The C drivers typically offer handles of some sort (Windows HANDLE, *nix file descriptor, that kind of thing) that you can integrate into other event loops.
|
January 01, 2017 Re: Databases and the D Standard Library | ||||
---|---|---|---|---|
| ||||
Posted in reply to Adam Wilson | On Sunday, 1 January 2017 at 03:24:31 UTC, Adam Wilson wrote:
> interface(s) to a data-store an essential component of the D Standard Library.
Eh, I count it as would-be-nice just because it isn't that hard to just use the C ones, or another third party lib; it doesn't have to be Phobos itself.
That said though, a basic db interface is quite simple and would be a nice batteries included bit - it is one of the things IMO that PHP did quite successfully (even if its interface sucked, it still just worked)
|
January 01, 2017 Re: Databases and the D Standard Library | ||||
---|---|---|---|---|
| ||||
Posted in reply to Adam D. Ruppe | On 01/01/2017 5:19 PM, Adam D. Ruppe wrote:
> On Sunday, 1 January 2017 at 03:51:52 UTC, rikki cattermole wrote:
>> Which is fine if all you use is c's sockets or only that database
>> connection for a thread.
>
> The C drivers typically offer handles of some sort (Windows HANDLE, *nix
> file descriptor, that kind of thing) that you can integrate into other
> event loops.
That's fine and all, but you've still got to deal with it on D's side so you can mix and match libraries that require access to the same event loop (such as Windows).
|
January 01, 2017 Re: Databases and the D Standard Library | ||||
---|---|---|---|---|
| ||||
Posted in reply to Adam Wilson | On Sat, 31 Dec 2016 19:24:31 -0800, Adam Wilson wrote: > My idea: Split the data storage systems out by category of data-store. > For example: > - SQL: std.database.sql (PostgreSQL, MySQL, MSSQL, etc.) This is doable; SQL is an ANSI and ISO standard, and it strongly constrains what you can do with your data. > - Document: std.database.document (Mongo, CouchDB, etc.) > - Key-Value: std.database.keyvalue (Redis, etcd2, etc.) I'm not so certain about this. CouchDB has a rather different approach to things than MongoDB -- possibly not as divergent as Mongo from MySQL, but far more than Postgres from MySQL. Likewise, there are many key/value stores in existence, and they support many different operations. For instance, it looks like etcd2 has a notion of directories, where you can list items in a directory. Redis just lets you list keys with a given prefix. Redis lets you modify values in-place; etcd2 doesn't. We could define a common subset of operations for document databases and key/value stores, but most people probably wouldn't be satisfied with it. There's also a question of where you'd put Cassandra in that, since it's decidedly not a SQL database but tries to pretend it is. > 4. We should hide querying from the developer because they are bad at it, security flaws, etc. > > My answer: While agree in principal, especially with the security concerns, in reality what you are asking for is an ORM. In my opinion, that is a separate concern from a database interface, and is typically implemented as layer over the DB interface. We can encourage people to use prepared queries with documentation and naming. |
January 01, 2017 Re: Databases and the D Standard Library | ||||
---|---|---|---|---|
| ||||
Posted in reply to Adam Wilson | On 2017-01-01 04:24, Adam Wilson wrote: > My idea: Each data store has it's own implementation with it's own > naming convention. For example (ADO.NET): > - SqlConnection (MSSQL) > - NpgsqlConnection (Npgsql) > > Yes, this means that you have to change names in your code if you switch > data-stores, but since you are already changing your queries, which is a > much more difficult change, this isn't a significant additional cost. I don't think we should try to make implementations different just because. If you have an SQL builder or an ORM on top of the interface that abstract the differences in the SQL syntax, it's possible to switch driver, within reason. -- /Jacob Carlborg |
January 01, 2017 Re: Databases and the D Standard Library | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jacob Carlborg | On Sun, 01 Jan 2017 10:29:28 +0100, Jacob Carlborg wrote:
> On 2017-01-01 04:24, Adam Wilson wrote:
>
>> My idea: Each data store has it's own implementation with it's own
>> naming convention. For example (ADO.NET):
>> - SqlConnection (MSSQL)
>> - NpgsqlConnection (Npgsql)
>>
>> Yes, this means that you have to change names in your code if you switch data-stores, but since you are already changing your queries, which is a much more difficult change, this isn't a significant additional cost.
>
> I don't think we should try to make implementations different just because. If you have an SQL builder or an ORM on top of the interface that abstract the differences in the SQL syntax, it's possible to switch driver, within reason.
Those both limit your ability to use the underlying database to its full potential. They offer a chance for queries that seem simple and efficient to become horribly inefficient.
I ran across a problem in NHibernate about a decade ago. We had a straightforward HQL query involving joins. It took over a minute to run. We wrote the simple equivalent in SQL and it completed in milliseconds. Fortunately, NHibernate had the ability to run raw SQL queries.
|
Copyright © 1999-2021 by the D Language Foundation