Interpolated strings and SQL (page 4)

January 09, 2024

Re: Interpolated strings and SQL

Posted by Paolo Invernizzi
in reply to Walter Bright

On Tuesday, 9 January 2024 at 23:21:34 UTC, Walter Bright wrote:

>> Constructing it at compile time is essential so that we can validate the generated SQL and abort compilation, as Paolo [demonstrated](https://forum.dlang.org/post/qbtbyxcglwijjbeygtvi@forum.dlang.org).
>
> That only checks one aspect of correctness - nested string interpolations.

<snip>

>> DIP1036 has no such limitation (demonstrated in point 2 [here](https://forum.dlang.org/post/lizjwxdgsnmgykaoczyf@forum.dlang.org)).
>
> DIP1036 cannot detect other problems with the string literals. It seems like a lot of complexity to deal with only one issue with malformed strings at compile time rather than runtime.

You are underestimating the value of catching SQL problems at compile time instead of at runtime. And, believe me, it's not a matter of mocking the DB and relying on unit tests and coverage.

CTFE capability is needed.

/P

January 09, 2024

Re: Interpolated strings and SQL

Posted by Walter Bright
in reply to Nickolay Bukreyev

On 1/9/2024 12:04 AM, Nickolay Bukreyev wrote:
> I’ve just realized DIP1036 has an excellent feature that is not evident right away. Look at the signature of `execi`:
> 
> ```d
> auto execi(Args...)(Sqlite db, InterpolationHeader header, Args args, InterpolationFooter footer) { ... }
> ```
> 
> `InterpolationHeader`/`InterpolationFooter` _require_ you to pass an istring. Consider this example:
> 
> ```d
> db.execi(i"INSERT INTO items VALUES ($(x))".text);
> ```
> 
> Here, we accidentally added `.text`. It would be an SQL injection… but the compiler rejects it! `typeof(i"...".text)` is `string`, and `execi` cannot be called with `(Sqlite, string)`.

The compiler will indeed reject it (the error message would be a bit baffling to those who don't know what Interpolation types are), along with any attempt to call execi() with a pre-constructed string.

The end result is that to do manipulation with istring tuples, the programmer is alternately faced with adding Interpolation elements or filtering them out. Is that really what we want? Will that impede the use of tuples generally, or just impede the use of istrings?

---

P.S. most keyboarding bugs result from neglecting to add needed syntax, not typing extra stuff. This is why:

    int* p;

is initialized to zero, while:

    int* p = void;

is left uninitialized. The user is unlikely to accidentally type "= void".

January 10, 2024

Re: Interpolated strings and SQL

Posted by Nickolay Bukreyev
in reply to Walter Bright

On Tuesday, 9 January 2024 at 23:21:34 UTC, Walter Bright wrote:

> A compile time way is DIP1027 can be modified to reject any arguments that consist of tuples with other than one element. This would eliminate nested istring tuples at compile time.

To sum up, it works with nested istrings poorly; it may even be sensible to forbid them entirely for DIP1027. Glad we’ve reached a consensus on this point. This case doesn’t seem crucial at the moment though; now we can focus on more relevant questions.

> DIP1036 cannot detect other problems with the string literals. It seems like a lot of complexity to deal with only one issue with malformed strings at compile time rather than runtime.

DIP1036 puts full CTFE capabilities at your disposal. You can validate anything about a format string; any hypothetical compile-time-executable `validateSql(query)` will fit. I guess none of the examples presented so far featured such validation because it tends to be long and not illustrative.
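To make the idea concrete, here is a minimal sketch of a CTFE check. `validateSql` and `checked` are hypothetical names invented for this illustration, and the check itself (balanced parentheses and quotes) is deliberately trivial; a real validator could parse the statement or consult a schema.

```d
/// Hypothetical validator: returns an error message, or an empty string
/// if the query passes. It only checks that parentheses and single
/// quotes are balanced.
string validateSql(string query)
{
    int depth = 0;
    bool inString = false;
    foreach (char c; query)
    {
        if (c == '\'')
            inString = !inString;
        else if (!inString && c == '(')
            ++depth;
        else if (!inString && c == ')')
        {
            if (depth == 0)
                return "unmatched ')'";
            --depth;
        }
    }
    if (inString)
        return "unterminated string literal";
    if (depth != 0)
        return "unmatched '('";
    return "";
}

/// Takes the query as a template argument, so the validator runs in CTFE
/// and a malformed query stops compilation.
string checked(string query)()
{
    enum error = validateSql(query);
    static assert(error.length == 0, "invalid SQL: " ~ error);
    return query;
}

void main()
{
    import std.stdio : writeln;

    writeln(checked!"INSERT INTO items VALUES (?1)"());
    // checked!"INSERT INTO items VALUES (?1"() would be rejected by the
    // static assert above at compile time.
}
```

The same function can also be called at runtime, which is one reason a plain function evaluated through `enum` is a convenient shape for such checks.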

However, another of Adam’s examples does perform non-trivial compile-time validation. Here is how it is implemented.

>> Constructing it at compile time is essential so that we can validate the generated SQL and abort compilation, as Paolo demonstrated.
>
> That only checks one aspect of correctness - nested string interpolations.

They check a lot more. I agree it is hard to spot the error messages in the linked post so I’ll copy them here:

    relation "snapshotsssss" does not exist. SQL: select size_mm, size_px from snapshotsssss where snapshot_id = $1

    role "dummyuser" can't select on table "snapshots". SQL: select size_mm, size_px from snapshots where snapshot_id = $1

As you can see, they check sophisticated business logic expressed in terms of relational databases. And all of that happens at compile time. Isn’t that a miracle?

>> I explained here why these two arguments are valuable. "Aren’t free of cost": correct unless you enable inlining. execi may require some changes (like filterOutEmpty I showed above) to make them free of cost, but it is doable.
>
> You'd have to also make every formatted writer a template,

Err… every formatted writer has to be a template anyway, doesn’t it? It needs to accept argument lists that may contain values of arbitrary types.

> …and add the filter to them.

Yeah. I admit this is a problem. As a rule of thumb, the most obvious code should yield the best results. With DIP1036, this is not the case at the moment: when you pass an interpolation sequence to a function not specifically designed for it, it wastes more stack space than necessary and passes useless junk in registers.

Others have mentioned that DIP1027 performs much worse in terms of speed (due to runtime parsing). While that is undoubted, I think DIP1036 should still be tweaked to behave as well as possible.

There was an idea in this thread to improve the ABI so that it ignores empty structs, but I’m rather sceptical about it.

Instead, let us note there are basically two patterns of usage for istrings:

1. Passing to a function that processes an istring and does something non-trivial. execi is a good example.
2. Passing to a function that simply stringifies every fragment, one after another. writeln is a good example.

Somewhat counterintuitively, case 1 is easier to address: the function already traverses the received sequence and transforms it. So it is only necessary to write it in such a way that it is inline-friendly.
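A rough sketch of such a case-1 function follows. The four structs are stand-ins declared locally under the names used in this thread, not the actual DIP1036 definitions, and `sqlOf` is a hypothetical helper. The SQL text is derived from the argument types alone, so it is available as a compile-time constant inside `execi`.

```d
import std.conv : to;

// Stand-ins mirroring the names used in this thread (assumption: the
// literal and expression text are carried as template arguments).
struct InterpolationHeader {}
struct InterpolationFooter {}
struct InterpolatedLiteral(string s) {}
struct InterpolatedExpression(string s) {}

/// Hypothetical helper: builds the statement text from the argument
/// types. Each interpolated expression becomes a numbered placeholder.
string sqlOf(Args...)()
{
    string sql;
    int placeholders = 0;
    foreach (T; Args) // unrolled over the type sequence
    {
        static if (is(T == InterpolatedLiteral!lit, string lit))
            sql ~= lit;
        else static if (is(T == InterpolatedExpression!code, string code))
        {
            ++placeholders;
            sql ~= "?" ~ placeholders.to!string;
        }
        // Header, footer and the value types contribute no SQL text.
    }
    return sql;
}

/// Returns the statement text only; a real implementation would also
/// bind the non-marker arguments and run the statement.
string execi(Args...)(Args args)
{
    enum sql = sqlOf!Args(); // forced compile-time evaluation
    static assert(sql.length > 0, "empty statement");
    return sql;
}

void main()
{
    import std.stdio : writeln;

    int x = 42;
    // Written out by hand in the shape the thread describes for
    // i"INSERT INTO items VALUES ($(x))".
    writeln(execi(
        InterpolationHeader(),
        InterpolatedLiteral!"INSERT INTO items VALUES ("(),
        InterpolatedExpression!"x"(),
        x,
        InterpolatedLiteral!")"(),
        InterpolationFooter()));
}
```

Because `sql` is an `enum`, any compile-time check can be applied to it before the function body does anything at runtime.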

By the way, what functions do we have in Phobos that fall into the case-2 category? write/writeln, std.conv.text, std.logger.core.log, and… is that all? Must be something else!..

Turns out there are only a handful of relevant functions in the entire stdlib. It shouldn’t be hard to put a filter in each of them. It also hints they are probably not that common in the wild.
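For illustration, a filter inside a write-like function might look roughly like this. `render` is a hypothetical function and the marker structs are local stand-ins named after the ones in this thread; the sketch returns a string instead of writing to stdout so that it stays easy to check.

```d
import std.conv : text;

// Stand-ins mirroring the names used in this thread.
struct InterpolationHeader {}
struct InterpolationFooter {}
struct InterpolatedLiteral(string s) {}
struct InterpolatedExpression(string s) {}

/// Hypothetical write-like function that filters the sequence itself:
/// markers are skipped, literal text is taken from the type, and only
/// real values are converted to text.
string render(Args...)(Args args)
{
    string result;
    foreach (arg; args)
    {
        alias T = typeof(arg);
        static if (is(T == InterpolationHeader) || is(T == InterpolationFooter))
        {
            // Markers carry no text.
        }
        else static if (is(T == InterpolatedLiteral!lit, string lit))
            result ~= lit; // the text lives in the type
        else static if (is(T == InterpolatedExpression!code, string code))
        {
            // The source text of the expression is not printed.
        }
        else
            result ~= text(arg); // an actual interpolated value
    }
    return result;
}

void main()
{
    import std.stdio : writeln;

    int baz = 42;
    writeln(render(
        InterpolationHeader(),
        InterpolatedLiteral!"prefix "(),
        InterpolatedExpression!"baz + 4"(),
        baz + 4,
        InterpolatedLiteral!" suffix"(),
        InterpolationFooter()));
}
```

All branching happens in `static if`, so the markers are dropped during compilation; whether they are still passed as (empty) arguments depends on inlining, which is the concern discussed above.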

However, when one encounters a third-party write-like function that is unaware of InterpolationHeader/etc., they should have a means to fix it from outside, i.e., without touching its source and ideally without writing a wrapper by hand. Unfortunately, I could not come up with a satisfactory solution for this. Will keep thinking. Perhaps someone else manages to find it faster.

An idea in a different direction. Currently, InterpolationHeader/etc. structs interoperate with write-like functions seamlessly (at the expense of passing zero-sized arguments) due to the fact they all have an appropriate toString method. If we remove those methods (and do nothing else), then write(i"a$(x)b") would produce something like:

    InterpolationHeader()InterpolatedLiteral!"a"()InterpolatedExpression!"x"()42InterpolatedLiteral!"b"()InterpolationFooter()

The program, rather than introducing a silent inefficiency, immediately tells the user they need to account for these types.

And one more idea. The current implementation of DIP1036 can emit empty chunks, i.e., InterpolatedLiteral!""; see for example i"$(x)". If I had to guess why it does so, I would say it strives to produce consistent, regular sequences. On the one hand, this might ease the job of interpolation-sequence handlers: they can count on expressions and literals always alternating inside a sequence. On the other hand, they have to check whether a literal is empty and drop it if so, which actually makes their job harder.

I do not know whether not producing empty literals in the first place would be a positive or a negative change, but it is worth considering.

Slightly off-topic: when I was thinking about this, I was astonished by the fact istrings can work with readf/formattedRead/scanf. Just wanted to share this observation.

    readf(i" $(&x) $(&y)");

> The compiler will indeed reject it (The error message would be a bit baffling to those who don't know what Interpolation types are)

This is true. I suppose the docs should mention InterpolationHeader and friends when talking about istrings, explain what an istring is lowered to, and show examples. Then a programmer who has read the docs will have a mental association between “istring” and “InterpolationHeader/Footer/etc.” Those who don’t read the docs won’t; only googling will save them.

To be honest, I’m not concerned about this point too much.

> along with any attempt to call execi() with a pre-constructed string.
>
> The end result is that to do manipulation with istring tuples, the programmer is alternately faced with adding Interpolation elements or filtering them out. Is that really what we want?

I’d argue it is wonderful that execi cannot be called with a pre-constructed string. The API should provide another function instead—say, execDynamicStatement(Sqlite, string, Args...). execi should be used for statically known SQL with interpolated arguments, and execDynamicStatement—for arbitrary SQL constructed at runtime. A verbose name is intentional to discourage its usage in favour of execi.
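A small sketch of that split, under stated assumptions: `Sqlite` and the marker structs are empty stand-ins, both functions are stubs that only report which path was taken, and the header/footer requirement is expressed as a template constraint rather than as the exact signature quoted earlier.

```d
// Stand-ins for illustration only.
struct InterpolationHeader {}
struct InterpolationFooter {}
struct Sqlite {}

/// Accepts only argument lists framed by the header/footer markers, so a
/// pre-built string cannot be passed by accident.
string execi(Args...)(Sqlite db, Args args)
if (Args.length >= 2
    && is(Args[0] == InterpolationHeader)
    && is(Args[$ - 1] == InterpolationFooter))
{
    return "static";
}

/// Deliberately verbose escape hatch for SQL assembled at runtime.
string execDynamicStatement(Args...)(Sqlite db, string sql, Args args)
{
    return "dynamic";
}

void main()
{
    Sqlite db;
    string prebuilt = "INSERT INTO items VALUES (1)";

    // A plain string does not satisfy execi's constraint.
    static assert(!__traits(compiles, execi(db, prebuilt)));
    // A marker-framed sequence does.
    static assert(__traits(compiles,
        execi(db, InterpolationHeader(), InterpolationFooter())));

    assert(execDynamicStatement(db, prebuilt) == "dynamic");
}
```

With this shape, the accidental `.text` call from the earlier example fails at the call site, while runtime-built SQL remains possible through the explicitly named function.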

> P.S. most keyboarding bugs result from neglecting to add needed syntax, not typing extra stuff.

That makes sense. Though you’ll never guess what beast can be spawned by careless refactoring. Extra protection won’t hurt, especially if it’s zero-cost.

P.S. Zero-initialization of variables is one of D’s cool features, indeed.

January 10, 2024

Re: Interpolated strings and SQL

Posted by Nickolay Bukreyev
in reply to Walter Bright

On Monday, 8 January 2024 at 03:05:17 UTC, Walter Bright wrote:

> On 1/7/2024 6:30 PM, Walter Bright wrote:
>> On 1/7/2024 3:50 PM, Timon Gehr wrote:
>>> This cannot work:
>>>
>>>     int x=readln.strip.split.to!int;
>>>     db.execi(xxx!i"INSERT INTO sample VALUES ($(id), $(2*x))");
>>
>> True, you got me there. It's the 2*x that is not turnable into an alias. I'm going to think about this a bit.
>
> I wonder if what we're missing are functions that operate on tuples and return tuples. We almost have them in the form of:
>
>     template tuple(A ...) { alias tuple = A; }
>
> but the compiler wants A to only consist of symbols, types and expressions that can be computed at compile time. This is so the name mangling will work. But what if we don't bother doing name mangling for this kind of template?

Yes! It would be brilliant if alias could refer to any Expression, not just symbols. If that was the case, we could just pass InterpolationHeader/Footer/etc. to template parameters (as opposed to runtime parameters, where they go now).

    // Desired syntax:
    db.execi!i"INSERT INTO sample VALUES ($(id), $(2*x))";
    // Desugars to:
    db.execi!(
        InterpolationHeader(),
        InterpolatedLiteral!"INSERT INTO sample VALUES ("(),
        InterpolatedExpression!"id"(),
        id,
        InterpolatedLiteral!", "(),
        InterpolatedExpression!"2*x"(),
        2*x, // Currently illegal (`2*x` is not aliasable).
        InterpolatedLiteral!")"(),
        InterpolationFooter(),
    );
    // `execi!(...)` would expand to:
    db.execImpl("INSERT INTO sample VALUES (?1, ?2)", id, 2*x);

With this approach, they are processed entirely via compile-time sequence manipulations. Zero-sized structs are never passed as arguments. Inlining is not necessary to get rid of them.

An example with writeln (or just about any function alike):

    writeln(interpolate!i"prefix $(baz + 4) suffix");
    // Desugars to:
    writeln(interpolate!(
        InterpolationHeader(),
        InterpolatedLiteral!"prefix "(),
        InterpolatedExpression!"baz + 4"(),
        baz + 4,
        InterpolatedLiteral!" suffix"(),
        InterpolationFooter(),
    ));
    // `interpolate!(...)` would expand to:
    writeln("prefix ", baz + 4, " suffix");

January 10, 2024

Re: Interpolated strings and SQL

Posted by Nickolay Bukreyev
in reply to Nickolay Bukreyev

On Wednesday, 10 January 2024 at 15:07:42 UTC, Nickolay Bukreyev wrote:

>     writeln(interpolate!i"prefix $(baz + 4) suffix");
>     // Desugars to:
>     writeln(interpolate!(
>         InterpolationHeader(),
>         InterpolatedLiteral!"prefix "(),
>         InterpolatedExpression!"baz + 4"(),
>         baz + 4,
>         InterpolatedLiteral!" suffix"(),
>         InterpolationFooter(),
>     ));
>     // `interpolate!(...)` would expand to:
>     writeln("prefix ", baz + 4, " suffix");

Well, InterpolatedLiteral and InterpolatedExpression don’t have to be templates anymore:

    writeln(interpolate!i"prefix $(baz + 4) suffix");
    // Desugars to:
    writeln(interpolate!(
        InterpolationHeader(),
        InterpolatedLiteral("prefix "),
        InterpolatedExpression("baz + 4"),
        baz + 4,
        InterpolatedLiteral(" suffix"),
        InterpolationFooter(),
    ));
    // `interpolate!(...)` would expand to:
    writeln("prefix ", baz + 4, " suffix");

January 10, 2024

Re: enum Format

Posted by Walter Bright
in reply to Timon Gehr

On 1/9/2024 2:38 PM, Timon Gehr wrote:
> %s7 8 9

Yes, I used writeln instead of writefln. The similarity between the two names is a source of error, but if that was a festering problem we'd have seen a lot of complaints about it by now.

> And you can get rid of the runtime overhead by adding a `pragma(inline, true)` `writeln` overload. (I guess with DMD that will still bloat the executable,

Try it and see.

I didn't mention the other kind of bloat - the rather massive number and size of template names being generated that go into the object file, as well as all the uncalled functions generated only to be removed by the linker.

As far as I can tell, the only advantage of DIP1036 is the use of inserted templates to "key" the tuples to specific functions. Isn't that what the type system is supposed to do? Maybe the real issue is that a format string should be a different type than a conventional string. For example:

```d
extern (C) pragma(printf) int printf(const(char*), ...);

enum Format : string;

void foo(Format f) { printf("Format %s\n", f.ptr); }
void foo(string s) { printf("string %s\n", s.ptr); }

void main()
{
    Format f = cast(Format)"f";
    foo(f);
    string s = "s";
    foo(s);
}
```
which prints:

Format f
string s

If we comment out `foo(string s)`:

test2.d(14): Error: function `test2.foo(Format f)` is not callable using argument types `(string)`
test2.d(14):        cannot pass argument `s` of type `string` to parameter `Format f`

If we comment out `foo(Format f)`:

string f
string s

This means that if execi()'s first parameter is of type `Format`, and the istring generates the format string with type `Format`, this key will fit the lock. A string generated by other means, such as `.text`, will not fit that lock.

January 10, 2024

Re: Overhead of DIP1036

Posted by Walter Bright
in reply to Steven Schveighoffer

On 1/9/2024 3:33 PM, Steven Schveighoffer wrote:
> I find it bizarre to be concerned about the call performance of zero-sized structs and empty strings to writeln or writef, like the function is some shining example of performance or efficient argument passing. If you do not have inlining or optimizations enabled, do you think the call tree of writefln is going to be compact? Not to mention it eventually just calls into C opaquely.
> 
> Note that you can write a simple wrapper that can be inlined, which will mitigate all of this via compile-time transformations.
> 
> If you like, I can write it up and you can try it out!

I've been aware for a long time that writeln and writefln are very inefficient, and could use a re-engineering.

A big part of the problem is the blizzard of templates resulting from using them. This issue doubles the number of templates. Even if they are optimized away, they sit in the object file.

Anyhow, see my other reply to Timon. I may have found a solution. I'm interested in your thoughts on it.

January 10, 2024

Re: Overhead of DIP1036

Posted by Hipreme
in reply to Walter Bright

On Wednesday, 10 January 2024 at 20:19:46 UTC, Walter Bright wrote:
> On 1/9/2024 3:33 PM, Steven Schveighoffer wrote:
>> I find it bizarre to be concerned about the call performance of zero-sized structs and empty strings to writeln or writef, like the function is some shining example of performance or efficient argument passing. If you do not have inlining or optimizations enabled, do you think the call tree of writefln is going to be compact? Not to mention it eventually just calls into C opaquely.
>> 
>> Note that you can write a simple wrapper that can be inlined, which will mitigate all of this via compile-time transformations.
>> 
>> If you like, I can write it up and you can try it out!
>
> I've been aware for a long time that writeln and writefln are very inefficient, and could use a re-engineering.
>
> A big part of the problem is the blizzard of templates resulting from using them. This issue doubles the number of templates. Even if they are optimized away, they sit in the object file.
>
> Anyhow, see my other reply to Timon. I may have found a solution. I'm interested in your thoughts on it.

Are you sure you really want to keep optimizing debug logging functionality? Come on. The only reason to keep using `printf` and `writeln` is for debug logging. If you're going to show your log function to a user, it is going to be completely different.

They are super easy to disable by simply creating a wrapper.
If you want to know what increases the compilation time on them, it is `std.conv.to!float`. I have said this many times on the forums already. I don't know about other people's hobbies, but caring about performance on logging is simply too much.

Do me a favor: press F12 to open your browser's console, then type into it: `for(let i = 0; i < 10000; i ++) console.log(i);`

You'll notice how slow it is. And this is not a JS problem. Logging is always slow, no matter how much you optimize. I personally find this a great loss of time that could be directed into a lot more useful tasks, such as:
- Improving debugging symbols in DMD and for macOS
- Improving importC until it actually works
- Listen to rikki's complaint about how slow it is to import UTF Tables
- Improving support for shared libraries on DMD (like not making it collect an interfaced object)
- Solve the problem with `init` property of structs containing memory reference which can easily be corrupted
- Fix the problem when an abstract class implements an interface
- Make a D compiler daemon
- Help in the project of DMD as a library focused on helping WebFreak in code-d and serve-d
- Implement DMD support for Apple Silicon
- Revive newCTFE engine
- Implement ctfe caching


Those are the only things I can think of off the top of my head right now. Anyway, I'm not here to demand anything at all. I'm only giving examples of what could be done in fields where I have no experience in making things better, but I know people out there can do it. But for me, it is just a pity to see such a genius wasting time on improving a rather antiquated debug functionality.

January 10, 2024

Re: Overhead of DIP1036

Posted by Walter Bright
in reply to Hipreme

On 1/10/2024 12:56 PM, Hipreme wrote:
> - Improving debugging symbols in DMD and for macOS
> - Improving importC until it actually works
> - Listen to rikki's complaint about how slow it is to import UTF Tables
> - Improving support for shared libraries on DMD (like not making it collect an interfaced object)
> - Solve the problem with `init` property of structs containing memory reference which can easily be corrupted
> - Fix the problem when an abstract class implements an interface
> - Make a D compiler daemon
> - Help in the project of DMD as a library focused on helping WebFreak in code-d and serve-d
> - Implement DMD support for Apple Silicon
> - Revive newCTFE engine
> - Implement ctfe caching

I regularly work on many of those problems. For example, without looking it up, I think I've fixed maybe 20 ImportC issues in the last month. I've also done a number of recent PRs aimed at making D more tractable as a library. So has Razvan.

January 11, 2024

Re: enum Format

Posted by Nickolay Bukreyev
in reply to Walter Bright

On Wednesday, 10 January 2024 at 19:53:48 UTC, Walter Bright wrote:

> I may have found a solution. I'm interested in your thoughts on it.

It looks very similar to what I presented in my later posts (this one and the one following). It’s inspiring: we are probably getting closer to a common understanding of things.

> As far as I can tell, the only advantage of DIP1036 is the use of inserted templates to "key" the tuples to specific functions. Isn't that what the type system is supposed to do? Maybe the real issue is that a format string should be a different type than a conventional string.

Exactly. Let me try to explain why DIP1036 is doing what it is doing. For illustrative purposes, I’ll be drastically simplifying code; please excuse me for that.

Let there be foo, a function that would like to receive an istring. Inside it, we would like to transform its argument list at compile time into a new argument list. So what we essentially want is to pass an istring to a template parameter so that it is available to foo at compile time:

    int x;
    foo!(cast(Format)"prefix ", 2 * x); // foo!(alias Format, alias int)()

Unfortunately, this does not work because 2 * x cannot be passed to an alias parameter. This is the root of the problem. The only way to do that is to pass them to runtime parameters:

    int x;
    foo(cast(Format)"prefix ", 2 * x); // foo!(Format, int)(Format, int)

However, now foo cannot access the format string at compile time—its type is simply Format, and its value becomes known only at runtime. So we encode the value into the type:

    int x;
    foo(Format!"prefix "(), 2 * x); // foo!(Format!"prefix ", int)(Format!"prefix ", int)

This is more or less what DIP1036 is doing at the moment. I hope it is clear now.
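The last step can be tried out in isolation. In this sketch `Format` is a hypothetical template struct (unrelated to the `enum Format : string` from the quoted post) and `foo` is a toy function; the point is only that a value encoded in the type is readable at compile time even though the argument is passed at runtime.

```d
// Hypothetical marker type: the format text is a template argument, so
// it is part of the type rather than a runtime value.
struct Format(string s)
{
    enum value = s;
}

/// Receives the format as an ordinary runtime argument, yet can still
/// read its text at compile time through the parameter's type.
string foo(F, T)(F fmt, T arg)
{
    import std.conv : text;

    enum prefix = F.value; // known at compile time
    static assert(prefix.length > 0, "format string must not be empty");
    return prefix ~ text(arg); // `arg` is only known at runtime
}

void main()
{
    import std.stdio : writeln;

    int x = 21;
    writeln(foo(Format!"prefix "(), 2 * x));
}
```

The `static assert` stands in for any compile-time check on the format text; an empty format here is rejected during compilation rather than at the call.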

I’d say DIP1036, as we see it now, relies on a clever workaround of a limitation imposed by the language. If that limitation is gone, the DIP will become simpler.
