Is it possible to collect object usage information during compilation?

January 10, 2015

Posted by DaveG

Let me preface this by saying I only have a general conceptual understanding of compilers and know nothing about actual implementation.

One common problem with Object-Relational Mapping (ORM) is deciding what data to load and when. There are basically two options:
1. Load everything: This certainly works, but it is very inefficient, particularly when you have a large number of fields or, even worse, a derived field that causes a join on one or more tables. If you need all the data this is fine, but most of the time only a small subset is actually used. (Lazy loading can mitigate some issues, but introduces other problems.)
2. Specify which fields to populate: This can work, but it makes more work for the user, adds complexity to user code, and often introduces bugs, particularly over time as code changes. Implementation details leak into the interface.
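
As a rough illustration of option 2, here is a minimal sketch in which the explicitly requested fields are at least checked at compile time. The `Person` struct and the `selectSql` helper are made up for this example and do not come from any existing library; the table name is simply assumed to match the type name.

```d
import std.stdio : writeln;

// Hypothetical mapped type; the fields are made up for illustration.
struct Person
{
    string name;
    int age;
    string email;
}

// Builds a SELECT statement for the listed fields of T.
// Each field name is a compile-time value, so a misspelled
// name is rejected by the static assert during compilation.
string selectSql(T, fields...)()
{
    static assert(fields.length > 0, "at least one field is required");

    string cols;
    foreach (i, f; fields)
    {
        static assert(__traits(hasMember, T, f),
            T.stringof ~ " has no field named " ~ f);
        cols ~= (i == 0 ? "" : ", ") ~ f;
    }
    return "SELECT " ~ cols ~ " FROM " ~ T.stringof;
}

void main()
{
    // `enum` forces the call to be evaluated at compile time.
    enum sql = selectSql!(Person, "name", "age")();
    writeln(sql); // SELECT name, age FROM Person

    // selectSql!(Person, "nmae")() would be rejected by the compiler.
}
```

This removes one class of bugs (stale or misspelled field names), but the user still has to keep the field list in sync with what the code actually reads.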

Basically, I'm looking for a way to "look ahead" to see which properties on an object are actually referenced (or not referenced) so we know what data needs to be loaded. Simple analysis for things like unused scope variables already exists, but this is needed for properties on each instance of a class (or struct). I guess this would require the compiler to make two passes: one to trace each object and a second to do something with the data collected. This would potentially cost a lot in compilation time, so there would probably need to be some sort of annotation on the definition to indicate that this type of check is necessary.

I might be crazy, but it seems like the compiler has all the information necessary to figure this out, and it would make user code simpler, less error-prone, and more efficient. So does anybody have any idea how to actually achieve this?

-Dave

January 10, 2015

Re: Is it possible to collect object usage information during compilation?

Posted by Jacob Carlborg
in reply to DaveG

On 2015-01-10 07:46, DaveG wrote:
> Let me preface this by saying I only have a general conceptual
> understanding of compilers and know nothing about actual implementation.
>
> One common problem with Object-Relational Mapping (ORM) is what data to
> load and when. There is basically 2 options:
> 1. Load everything: This certainly works, but is very inefficient,
> particularly when you have a large number of fields or, even worse, have
> a derived field that causes a join on one or more tables. If you need
> all the data this is fine, but most of the time only a small subset is
> actually used. (lazy loading can mitigate some issues, but introduces
> other problems)
> 2. Specify what fields to populate: This can work, but makes more work
> for the user, adds complexity to user code, and often introduces bugs
> particularly over time as code changes. Implementation details are
> leaking into the interface.
>
> Basically, I'm looking for a way to "look ahead" to see what properties
> on an object are actually referenced (or not referenced) so we know what
> data needs to be loaded. Simple analysis for things like unused scope
> variables already exist, but this is needed for properties on each
> instance of a class (or struct). I guess this would require the compiler
> to make 2 passes once to trace each object and a second to do something
> with the data collected. This would potential cost a lot in compilation
> time so there would probably need to be some sort of annotation on the
> definition to indicate this type of check is necessary.
>
> I might be crazy, but it seems like the compiler has all the information
> necessary to figure this out and it would make user code simpler, less
> error prone, and more efficient. So does anybody have any idea on how to
> actually achieve this?

I'm not exactly sure if this is what you want but you can implement the "opDispatch" [1] method in a class or struct. This method will be called if no other method exists with the same name. There's also something called "alias this" [2] that allows you to do something similar.

class Foo
{
    void foo () {}
    void opDispatch (string name)() {}
}

auto f = new Foo;
f.foo(); // will call "foo"
f.bar(); // will be lowered to f.opDispatch!("bar")();
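
To connect this to the original question, here is a minimal sketch of a proxy that uses opDispatch to record which member names are requested. `AccessRecorder` is a made-up type for illustration, the recording happens at run time, and the returned placeholder stands in for a real column value.

```d
import std.stdio : writeln;

// Hypothetical proxy: records every member name requested through
// opDispatch. Not part of any real ORM.
struct AccessRecorder
{
    string[] accessed;

    // Called for any member that AccessRecorder does not define itself.
    // `name` is a compile-time string, so each distinct member name
    // instantiates its own copy of this template.
    string opDispatch(string name)()
    {
        accessed ~= name;
        return "<" ~ name ~ ">"; // placeholder instead of a real value
    }
}

void main()
{
    AccessRecorder p;
    auto n = p.name; // lowered to p.opDispatch!("name")()
    auto a = p.age;  // lowered to p.opDispatch!("age")()
    writeln(p.accessed); // the names requested so far
}
```

Because `accessed` is a real field, `p.accessed` resolves normally and is not routed through opDispatch.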

If you're implementing an ORM I would recommend executing queries lazily. You can do something like this:

class Person : ORM.Base
{
    String name;
    Int age;

    // this method returns a range/proxy that implements the range API [3]
    static ORM.Range!(Person) all () {}
}

"String" would look something like this:

struct String
{
    alias get this;

    // this method will fetch the data from the database
    private string get ();
}

Using the interface would look something like this:

auto p = Person.all(); // no database query has been performed yet

// the range interface makes it possible to use a foreach
// the first query happens when the foreach loop starts
foreach (e ; p)
{
    // this call will trigger a call to the "get" method in "String"
    // via the "alias this"
    string name = e.name;
    writeln(name);
}

[1] http://dlang.org/operatoroverloading.html#dispatch
[2] http://dlang.org/class.html#alias-this
[3] http://dlang.org/phobos/std_range.html#isInputRange

-- 
/Jacob Carlborg

January 10, 2015

Re: Is it possible to collect object usage information during compilation?

Posted by Martin Nowak
in reply to Jacob Carlborg

On 01/10/2015 11:20 AM, Jacob Carlborg wrote:
> On 2015-01-10 07:46, DaveG wrote:
>> I might be crazy, but it seems like the compiler has all the information
>> necessary to figure this out and it would make user code simpler, less
>> error prone, and more efficient. So does anybody have any idea on how to
>> actually achieve this?
>
> I'm not exactly sure if this is what you want but you can implement the
> "opDispatch" [1] method in a class or struct. This method will be called
> if no other method exists with the same name. There's also something
> called "alias this" [2] that allows you to do something similar.
>
> class Foo
> {
>      void foo () {}
>      void opDispatch (string name)() {}
> }
>
> auto f = new Foo;
> f.foo(); // will call "foo"
> f.bar(); // will be lowered to f.opDispatch!("bar")();
>
> If you're implementing an ORM I would recommend executing queries
> lazily. You can do something like this:
>
> class Person : ORM.Base
> {
>      String name;
>      Int age;
>
> // this method returns a range/proxy that implements the range api [3]
>      static ORM.Range!(Person) all () {}
> }
>
> "String" would look something like this:
>
> struct String
> {
>      alias get this;
>
>      // this method will fetch the data from the database
>      private string get ();
> }
>
> Using the interface would look something like this:
>
> auto p = Person.all(); // no database query has been performed yet
>
> // the range interface makes it possible to use a foreach
> // when starting the foreach loop is when the first query will happen
> foreach (e ; p)
> {
>      // this call will trigger a call to the "get" method in "String"
>      // via the "alias this"
>      string name = e.name;
>      writeln(name);
> }

The idea isn't bad, but the performance will suck. This is generally known as the N+1 query problem, except that this is even worse, as each field is queried individually.
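
To make the cost concrete, here is a minimal sketch that only counts queries. `FakeDb` and its two methods are made up for illustration and stand in for a real database driver; nothing here talks to an actual database.

```d
import std.stdio : writeln;

// Hypothetical in-memory stand-in for a database; it only counts queries.
struct FakeDb
{
    string[string][] rows; // one associative array (column -> value) per row
    size_t queries;

    // One query per (row, column) pair, as a per-field lazy proxy would issue.
    string fetchField(size_t row, string column)
    {
        ++queries;
        return rows[row][column];
    }

    // One query that returns the requested columns for every row.
    string[][] fetchColumns(string[] columns)
    {
        ++queries;
        string[][] result;
        foreach (row; rows)
        {
            string[] values;
            foreach (c; columns)
                values ~= row[c];
            result ~= values;
        }
        return result;
    }
}

void main()
{
    auto db = FakeDb([
        ["name": "Ann", "age": "31"],
        ["name": "Bob", "age": "45"],
    ]);

    // Per-field access: rows * columns queries.
    foreach (i; 0 .. db.rows.length)
    {
        db.fetchField(i, "name");
        db.fetchField(i, "age");
    }
    writeln("per-field queries: ", db.queries); // 4

    // Batched access: a single query for the same data.
    db.queries = 0;
    db.fetchColumns(["name", "age"]);
    writeln("batched queries: ", db.queries); // 1
}
```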

Here is a sketch of an optimal solution. I'm eagerly waiting for someone to finally implement it.

http://dpaste.dzfl.pl/cd375ac594cf

January 10, 2015

Re: Is it possible to collect object usage information during compilation?

Posted by Jacob Carlborg
in reply to Martin Nowak

On 2015-01-10 13:36, Martin Nowak wrote:

> The idea isn't bad, but the performance will suck. This is generally
> known as N+1 query, only that this is even worse, as each field is
> queried individually.

Since the "all" method was called, I would assume all rows in the person table are fetched in one single query. Although I don't know whether that will work if only part of each row should be loaded.

> Here is a sketch for an optimal solution. I'm actually eagerly waiting
> that someone finally implements it.
>
> http://dpaste.dzfl.pl/cd375ac594cf

How would you handle fetching multiple rows and a foreach loop, i.e. my example?

Perhaps a detail, but by using a wrapped type instead of the raw types in Person you could handle things like null in the database.

-- 
/Jacob Carlborg

January 10, 2015

Re: Is it possible to collect object usage information during compilation?

Posted by Jacob Carlborg
in reply to Martin Nowak

On 2015-01-10 13:36, Martin Nowak wrote:

> I'm actually eagerly waiting that someone finally implements it.

There are two ORM libraries at code.dlang.org [1] [2], although I don't know how usable they are.

[1] http://code.dlang.org/packages/hibernated
[2] http://code.dlang.org/packages/dvorm

-- 
/Jacob Carlborg

January 10, 2015

Re: Is it possible to collect object usage information during compilation?

Posted by Martin Nowak
in reply to Martin Nowak

On 01/10/2015 01:36 PM, Martin Nowak wrote:
>
> The idea isn't bad, but the performance will suck. This is generally
> known as N+1 query, only that this is even worse, as each field is
> queried individually.
>
> Here is a sketch for an optimal solution. I'm actually eagerly waiting
> that someone finally implements it.
>
> http://dpaste.dzfl.pl/cd375ac594cf

I also added a where clause, with a very simple expression template capture.

http://dpaste.dzfl.pl/cd375ac594cf#line-140
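
For readers who cannot open the paste, here is a generic sketch of what capturing a where clause as an expression object can look like in D. This is not the code from the link: `Cond`, `Column`, `gt` and `lt` are made-up names, and comparisons are spelled as named methods to keep the sketch simple.

```d
import std.conv : to;
import std.stdio : writeln;

// A captured condition; it only holds the SQL text built so far.
struct Cond
{
    string sql;

    // `a & b` and `a | b` combine two conditions instead of evaluating them.
    Cond opBinary(string op)(Cond rhs) const
        if (op == "&" || op == "|")
    {
        enum word = op == "&" ? " AND " : " OR ";
        return Cond("(" ~ sql ~ word ~ rhs.sql ~ ")");
    }
}

// Hypothetical column handle that produces conditions instead of booleans.
struct Column
{
    string name;

    Cond gt(long v) const { return Cond(name ~ " > " ~ to!string(v)); }
    Cond lt(long v) const { return Cond(name ~ " < " ~ to!string(v)); }
}

void main()
{
    auto age = Column("age");
    auto cond = age.gt(18) & age.lt(65);
    writeln("SELECT name FROM person WHERE ", cond.sql);
}
```

Only integer literals are supported here on purpose; a real implementation would bind values as query parameters rather than splice them into the SQL text.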

January 10, 2015

Re: Is it possible to collect object usage information during compilation?

Posted by Martin Nowak
in reply to Jacob Carlborg

On 01/10/2015 01:52 PM, Jacob Carlborg wrote:
> On 2015-01-10 13:36, Martin Nowak wrote:
>
>> The idea isn't bad, but the performance will suck. This is generally
>> known as N+1 query, only that this is even worse, as each field is
>> queried individually.
>
> Since the "all" method was called I would assume all rows in the person
> table are fetched in one single query. Although I don't know if that
> will work if not the whole row should be loaded.

For row- or document-oriented databases you want to query all fields in parallel. For columnar stores it might be possible to efficiently query specific fields for many documents.

>
>> Here is a sketch for an optimal solution. I'm actually eagerly waiting
>> that someone finally implements it.
>>
>> http://dpaste.dzfl.pl/cd375ac594cf
>
> How would you handled fetching multiple rows and a foreach loop, i.e. my
> example?

I'd simply produce multiple rows; the principle remains the same.

> Perhaps a detail but using a wrapped type instead of the raw types in
> Person you could handle things like null in the database.

The example already uses Variant.

January 10, 2015

Re: Is it possible to collect object usage information during compilation?

Posted by Jacob Carlborg
in reply to Martin Nowak

On 2015-01-10 14:19, Martin Nowak wrote:

> I'd simple produce multiple rows, the principle remains the same.

Ok, I think I understand the code now. You managed to register the fields at compile time. Pretty neat. I thought the query would need to be delayed to the first call to opDispatch.

> The example already uses Variant.

Yes, but what about when you get the value out of the Variant? I think one also needs to be able to check whether a field is "null" or not. Or am I missing something?
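
One way to get such a check, sketched here with Phobos' `std.typecons.Nullable` rather than Variant. The `PersonRow` type and the `describeAge` helper are made up for illustration; this is only one possible design, not what the linked example does.

```d
import std.conv : to;
import std.stdio : writeln;
import std.typecons : Nullable;

// Hypothetical row type; the field names are made up for illustration.
struct PersonRow
{
    Nullable!string name; // starts out null, like a NULL column
    Nullable!int age;
}

// Returns a printable form of a possibly-null age.
string describeAge(Nullable!int age)
{
    return age.isNull ? "unknown" : to!string(age.get);
}

void main()
{
    PersonRow row;
    row.name = "Ann";              // assigning a value clears the null state
    writeln(row.name.get);         // Ann
    writeln(describeAge(row.age)); // unknown
    row.age = 31;
    writeln(describeAge(row.age)); // 31
}
```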

-- 
/Jacob Carlborg

January 10, 2015

Re: Is it possible to collect object usage information during compilation?

Posted by DaveG
in reply to Martin Nowak

On Saturday, 10 January 2015 at 13:19:19 UTC, Martin Nowak wrote:
> On 01/10/2015 01:52 PM, Jacob Carlborg wrote:
>> On 2015-01-10 13:36, Martin Nowak wrote:
>>
>>> The idea isn't bad, but the performance will suck. This is generally
>>> known as N+1 query, only that this is even worse, as each field is
>>> queried individually.
>>
>> Since the "all" method was called I would assume all rows in the person
>> table are fetched in one single query. Although I don't know if that
>> will work if not the whole row should be loaded.

The issue is not with the rows returned, but with the columns (or object properties, which may map to multiple tables or be derived in some other way). Which rows need to be returned is determined by some type of filtering mechanism, which is not an issue because that (logically) has to be explicit. The issue is determining which properties (for each "row") actually need to be returned without having to request them explicitly (the information is already implicit in the user code itself).

>>
>>> Here is a sketch for an optimal solution. I'm actually eagerly waiting
>>> that someone finally implements it.
>>>
>>> http://dpaste.dzfl.pl/cd375ac594cf

Martin, that is brilliant! It seemed like all the pieces were there, I just couldn't put them together. I'm glad I'm not the only one thinking about this.

I have never been able to find an ORM (in any language) that comes close to working for us. We are currently looking into switching away from PHP, and the front-runner is C# because it's a safe bet, we run Windows, and some people are sold on the concept of Entity Framework. Entity Framework is (or was) built into .NET, so they could theoretically do some neat tricks like compiling query logic at compile time and inferring what data is actually needed by the program (the issue being discussed). It turns out they do query caching, but that's about it.

I'm not sure I can sell the idea of D (this is a very small and conservative group). I would also have to sell the idea of writing an ORM, which is certainly not on the roadmap, but this will certainly help my argument.

Oh, we will also need a good SQL Server library which, to my knowledge, D is lacking. This is going to be a hard sell...

-Dave

January 10, 2015

Re: Is it possible to collect object usage information during compilation?

Posted by Paolo Invernizzi
in reply to DaveG

On Saturday, 10 January 2015 at 17:31:42 UTC, DaveG wrote:
> On Saturday, 10 January 2015 at 13:19:19 UTC, Martin Nowak wrote:
>> Here is a sketch for an optimal solution. I'm actually eagerly waiting that someone finally implements it.
>>
>> http://dpaste.dzfl.pl/cd375ac594cf
>
> I would also have to sell the idea of writing an ORM which is certainly not on the roadmap, but this will certainly help my argument.

Maybe not; something simpler than a full ORM could be compelling too.

I guess you know about the ORM Vietnam [1], but also this [2] can be of some help in selling a simple D solution.

I would like to see, someday, something in D that:

 - can check the syntax of SQL at compile time;
 - can check an SQL query statement against the current DB schema at compile time;
 - can read the output of a DB schema dump at compile time and parse it into what is needed for the previous points (more complicated).

The first point should be easy today; the second and the last one involve more work...
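
A minimal sketch of the first point, assuming a deliberately naive notion of "syntax": the check only accepts text of the form SELECT ... FROM ... and is nowhere near a real SQL grammar. `looksLikeSelect` and `checkedSql` are made-up names for illustration.

```d
import std.algorithm.searching : canFind, startsWith;
import std.stdio : writeln;

// Deliberately naive check: accepts only "SELECT ... FROM ..." text.
// A real checker would need an actual SQL grammar.
bool looksLikeSelect(string sql)
{
    return sql.startsWith("SELECT ") && sql.canFind(" FROM ");
}

// Yields the query unchanged, but instantiating the template only
// compiles if the check passes when evaluated at compile time.
template checkedSql(string sql)
{
    static assert(looksLikeSelect(sql), "malformed query: " ~ sql);
    enum checkedSql = sql;
}

void main()
{
    enum q = checkedSql!"SELECT name, age FROM person";
    writeln(q);

    // checkedSql!"SELEKT name FROM person" would fail to compile.
}
```

The same pattern extends to the schema checks: the function called inside the static assert can be anything that runs at compile time.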

[1] http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[2] http://wozniak.ca/what-orms-have-taught-me-just-learn-sql
---
Paolo
