Databases and the D Standard Library (page 4)

January 03, 2017

Re: Databases and the D Standard Library

Posted by Nicholas Wilson
in reply to Chris Wright

On Tuesday, 3 January 2017 at 08:09:54 UTC, Chris Wright wrote:
> On Mon, 02 Jan 2017 21:25:42 -0800, Adam Wilson wrote:
>> As far as I am aware, the only way to meet those requirements is to use a base-class model. Is there something I am missing?
>
> Templates. Templates everywhere.
>
> Every method in your application that might possibly touch a database, or touch anything that touches a database, and so on, needs to be templated according to what type of database might be used.

That limits you to one DB per compilation or craploads of template bloat.
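A minimal sketch of the tradeoff, with made-up backends (`Postgres`, `MySql`, `query` and `loadUsers` are illustrative names only, not any real driver's API). Every function that touches the database is templated on the backend, so the compiler emits one copy per backend actually used:

```d
import std.stdio : writeln;

// Hypothetical backends; the names and the `query` method are assumptions.
struct Postgres
{
    string query(string sql) { return "postgres: " ~ sql; }
}

struct MySql
{
    string query(string sql) { return "mysql: " ~ sql; }
}

// Anything that touches the database takes the backend as a template
// parameter, so each backend in use gets its own instantiation.
string loadUsers(DB)(ref DB db)
{
    return db.query("SELECT * FROM users");
}

void main()
{
    auto pg = Postgres();
    auto my = MySql();
    writeln(loadUsers(pg)); // instantiates loadUsers!Postgres
    writeln(loadUsers(my)); // instantiates loadUsers!MySql
}
```

With a single backend there is one instantiation and no dispatch cost; with several, every templated function in the call chain is duplicated per backend.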

There are a number of variables here: the number of DB backends you wish to support (b), the number of DB backends you actually use at runtime (r), the number of symbols (not quite the word I'm looking for but, oh well) you need to represent an abstract backend API (s), the number of class types you use to abstract the backend (c), and the number of templates you use to abstract the backend (t).

b is ideally fixed at "all the backends"
r is variable and dependent on the application (e.g. I may only care for Postgres, but someone else may wish to support many SQL DBs). If r == 1 then a template approach is acceptable.

s is a function of the dissimilarity of the backends you wish to support. Breaking the problem up into SQL-like, graph-like and KV-store is a tradeoff somewhere between having "one DB (interface) to rule them all" and one interface for each backend.

c + t = s

What this represents is a tradeoff between compile time dispatch and runtime dispatch. As s moves from being all classes to more templates + structs (from the "bottom up"), the last layer of dynamic dispatch before the static dispatch of the templates becomes an algebraic type selection (i.e. check the tag, choose the type, and then static dispatch).
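A rough sketch of that "check the tag, choose the type, then static dispatch" layer, assuming a compiler recent enough to ship `std.sumtype` (it was not in Phobos when this thread was written). The backend structs, `exec`, `runQuery` and `run` are hypothetical names for illustration:

```d
import std.stdio : writeln;
import std.sumtype : SumType, match;

// Hypothetical backends; names and the `exec` method are assumptions.
struct Postgres
{
    string exec(string sql) { return "postgres: " ~ sql; }
}

struct Sqlite
{
    string exec(string sql) { return "sqlite: " ~ sql; }
}

// The tag inside the SumType records which backend was chosen at runtime.
alias Backend = SumType!(Postgres, Sqlite);

// Statically dispatched: one instantiation per backend type.
string runQuery(DB)(DB db, string sql)
{
    return db.exec(sql);
}

// The single dynamic step: check the tag, pick the type, then call the
// template instantiation for that type.
string run(Backend backend, string sql)
{
    return backend.match!(
        (Postgres db) => runQuery(db, sql),
        (Sqlite db) => runQuery(db, sql)
    );
}

void main()
{
    auto chosen = Backend(Sqlite());
    writeln(run(chosen, "SELECT 1"));
}
```

Everything below `run` stays templated and free of virtual calls; only the one `match!` depends on the runtime choice.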

I believe the sweet spot for this lies at the point where the dissimilarity of similar backends becomes apparent after the start of a logical operation. Or, put another way, the point where I know the result that I want and no longer care about any implementation details.

As an example using a compute API (sorry I don't know much about DBs): launching a kernel represents a single logical operation but is in fact many driver calls. If one wishes to abstract the compute API then this point becomes the point I would choose.

Finding those points will probably not be easy and may be different for different people, but it is worth considering.

</ramble>

On 2017-01-03 09:38, Chris Wright wrote:
> You are unable to interact with two different databases in the same executable using the same library. For instance, if you're using hibernated, either you compiled it to connect to mysql, or you compiled it to connect to oracle.

That's true. And that's why I said it's difficult to design an API without trying it in code :)

> In exchange, you get...slightly less GC usage. It's not *no* GC usage -- you'll see a bunch of buffers allocated to hold incoming and outgoing messages. You'll just peel back one layer of it.

1. I hope there won't be that many buffers in the API, at least not in the user facing API
2. Buffers say nothing how they're allocated. With classes on the other hand, you're basically forced to allocate with the GC

> You'd be much better off asking that we encourage the use of std.experimental.allocator in the driver interface.

Then I'll ask for that as well :)

--
/Jacob Carlborg

On Tue, 03 Jan 2017 13:23:55 +0100, Jacob Carlborg wrote:
> On 2017-01-03 09:38, Chris Wright wrote:
>
>> You are unable to interact with two different databases in the same executable using the same library. For instance, if you're using hibernated, either you compiled it to connect to mysql, or you compiled it to connect to oracle.
>
> That's true. And that's why I said it's difficult to design an API without trying it in code :)

I didn't try it in code.

>> In exchange, you get...slightly less GC usage. It's not *no* GC usage -- you'll see a bunch of buffers allocated to hold incoming and outgoing messages. You'll just peel back one layer of it.
>
> 1. I hope there won't be that many buffers in the API, at least not in the user facing API

The returned row data is mandatory, and its size can be much larger than the stack limit. (A MySQL MEDIUMBLOB field will likely break your stack limit.)

I suppose you could have a streaming API for row data, one that has a stack-allocated buffer and returns slices of that:

    string fieldName;
    ubyte[] data;
    ubyte[][string] fields;
    db.query("SELECT * FROM USERS")
      // have to revisit this if a db allows large names
      .onFieldStart((fieldName) => field = fieldName)
      .onFieldData((fragment) => data ~= fragment)
      .onFieldEnd(() { fields[field] = data; data = null; })
      .onRowEnd(() => process(fields))
      .onResultsEnd!(() => writeln("done"))
      .exec();

This looks pretty terrible, to be honest. I get this sort of thing from nodejs because it doesn't want to potentially block and also doesn't want to delay letting me process things, but the worst I get there is usually two callbacks.

This would also result in more GC use for the majority of people who use the GC.

> 2. Buffers say nothing how they're allocated. With classes on the other hand, you're basically forced to allocate with the GC

You haven't looked at std.experimental.allocator, have you?

http://dpldocs.info/experimental-docs/std.conv.emplace.3.html
http://dpldocs.info/experimental-docs/std.experimental.allocator.make.html
http://dpldocs.info/experimental-docs/std.experimental.allocator.dispose.2.html
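For readers who have not followed those links, a small sketch of allocating a class instance outside the GC heap with `make` and `dispose`. The `Row` class is a hypothetical stand-in, not a type from any real driver, and `Mallocator` is just one possible allocator choice:

```d
import std.experimental.allocator : make, dispose;
import std.experimental.allocator.mallocator : Mallocator;

// Hypothetical result type; not taken from any real driver.
class Row
{
    int id;
    this(int id) { this.id = id; }
}

void main()
{
    // Construct the class instance in malloc'd memory instead of the GC heap.
    Row row = Mallocator.instance.make!Row(42);
    assert(row.id == 42);

    // The caller owns the memory and must release it explicitly.
    Mallocator.instance.dispose(row);
}
```

The cost is that ownership becomes the caller's problem: nothing frees `row` if `dispose` is forgotten.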

On 2017-01-03 18:13, Chris Wright wrote:
> The returned row data is mandatory, and its size can be much larger than the stack limit. (A MySQL MEDIUMBLOB field will likely break your stack limit.)
>
> I suppose you could have a streaming API for row data, one that has a stack-allocated buffer and returns slices of that:
>
>     string fieldName;
>     ubyte[] data;
>     ubyte[][string] fields;
>     db.query("SELECT * FROM USERS")
>       // have to revisit this if a db allows large names
>       .onFieldStart((fieldName) => field = fieldName)
>       .onFieldData((fragment) => data ~= fragment)
>       .onFieldEnd(() { fields[field] = data; data = null; })
>       .onRowEnd(() => process(fields))
>       .onResultsEnd!(() => writeln("done"))
>       .exec();
>
> This looks pretty terrible, to be honest. I get this sort of thing from nodejs because it doesn't want to potentially block and also doesn't want to delay letting me process things, but the worst I get there is usually two callbacks.
>
> This would also result in more GC use for the majority of people who use the GC.

Look, I didn't say that using the GC should be completely forbidden. I just said we should try to avoid it. For example, I've been using the ddb Postgres driver [1]. It uses classes for most of its types, even if it might not be necessary. Here's one example [2]. Unless there's some intention to have some form of higher level, DB independent, API on top of this, I don't see a reason why that type needs to be a class.

>> 2. Buffers say nothing how they're allocated. With classes on the other hand, you're basically forced to allocate with the GC
>
> You haven't looked at std.experimental.allocator, have you?

I know it's possible to allocate a class without the GC, hence the "basically". I'm not sure how others write their code, but at least I make the assumption that all objects are allocated with the GC.

[1] https://github.com/pszturmaj/ddb
[2] https://github.com/pszturmaj/ddb/blob/master/source/ddb/postgres.d#L904

--
/Jacob Carlborg

On Sunday, 1 January 2017 at 03:24:31 UTC, Adam Wilson wrote:
> Hi Everyone,
>
> I've seen a lot of talk on the forums over the past year about the need for database support in the D Standard Library and I completely agree. At the end of the day the purpose of any programming language and its attendant libraries is to allow the developer to solve their problems quickly and efficiently; and a large subset of those solutions require some form of structured data store. To my mind, this makes some form of interface(s) to a data-store an essential component of the D Standard Library. And since this is something that my particular problem spaces also need, I thought it would be useful to attempt to do something about it.

The only thing I want, database related, in the standard library is the API! - Nothing else! There should be a standard implementation of that API (libd-db.so for an example), but it should be separated from Phobos.

In general, Phobos should only contain the APIs in my humble opinion. We should handle XML processing the same way (API in Phobos, libd-xml.so for the reference implementation), Image processing the same way, GUI, etc... Why? Phobos is enormous already!
