October 02, 2012
Re: Proposal: clean up semantics of array literals vs string literals
On 10/2/12 7:11 AM, Don Clugston wrote:
> The problem
> -----------
>
> String literals in D are a little bit magical; they have a trailing \0.
[snip]

I don't mean to be Debbie Downer on this because I reckon it addresses 
an issue that some have, although I never do. With that warning, a few 
candid opinions follow.

First, I think zero-terminated strings shouldn't be needed frequently 
enough in D code to make this necessary.

Second, a simple and workable solution would be to address the matter 
dynamically: make toStringz opportunistically check whether there's a 
\0 just past the end of the string, EXCEPT when the string happens to 
end exactly at a page boundary (in which case accessing memory beyond 
the end of the string may produce a page fault). With this simple 
dynamic test we don't need precise and stringent rules for the 
implementation.
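
The dynamic test described above could look roughly like the following. This is only a sketch under stated assumptions: the name toStringzOpportunistic and the fixed 4 KiB page size are hypothetical, and this is not presented as the actual Phobos implementation. Peeking past the end of a slice is also not memory-safe in general, since the byte found there may belong to another object and change later.

```d
// Assumption: 4 KiB pages. A real implementation would query the OS.
enum size_t assumedPageSize = 4096;

/// Hypothetical helper: returns a zero-terminated pointer for `s`,
/// reusing the existing memory when a \0 already follows the slice.
immutable(char)* toStringzOpportunistic(string s) @system
{
    if (s.length == 0)
        return "".ptr; // string literals always carry a trailing \0

    immutable(char)* end = s.ptr + s.length;

    // Only peek one past the end when that byte lies on the same page
    // as the last character; otherwise the read could page-fault.
    const bool onPageBoundary = (cast(size_t) end) % assumedPageSize == 0;
    if (!onPageBoundary && *end == '\0')
        return s.ptr;

    // Fall back to an explicit copy with the terminator appended.
    return (s ~ '\0').ptr;
}
```

Whichever branch is taken, the caller gets a pointer to the characters of s followed by a \0; the only difference is whether a copy was made.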

Third, the complex set of rules proposed increases the number of cases 
in which the \0 is guaranteed, but doesn't make for a clear and 
easy-to-remember boundary. Therefore people will need to remember some 
more rules to make sure they can, well, avoid a call to toStringz.

On 10/2/12 10:55 AM, Regan Heath wrote:
> Recent discussions on the zero terminated string problems and
> inconsistency of string literals has me, again, wondering why D
> doesn't have a 'type' to represent C's zero terminated strings.  It
> seems to me that having a type, and typing C functions with it would
> solve a lot of problems.
[snip]
> I am probably missing something obvious, or I have forgotten one of
> the array/slice complexities which makes this a nightmare.

You're not missing anything, and defining a zero-terminated type is 
something I considered doing and have been highly interested in. My 
interest is motivated by the fact that sentinel-terminated structures 
are a very interesting example of forward ranges that are also 
contiguous. That sets them apart from both singly-linked lists and 
simple arrays, and gives them interesting properties.

I'd be interested in defining the more general:

struct SentinelTerminatedSlice(T, T terminator)
{
    private T* data;
    ...
}

That would be a forward range and the instantiation 
SentinelTerminatedSlice!(char, 0) would be CString.
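
To make the idea concrete, here is one possible way such a range could be filled in. It is a minimal sketch, not the design the post has in mind: the constructor and all member bodies are assumptions, and the alias spells the terminator as '\0'.

```d
import std.range.primitives : isForwardRange;

struct SentinelTerminatedSlice(T, T terminator)
{
    private T* data;

    // Assumption: the caller guarantees a terminator is reachable from p.
    this(T* p) { data = p; }

    // The range is exhausted once the sentinel is reached.
    @property bool empty() const { return *data == terminator; }
    @property ref T front() { return *data; }
    void popFront() { ++data; }

    // Copying the pointer is enough to save the position, which is
    // what makes this a forward range.
    @property SentinelTerminatedSlice save() { return this; }
}

alias CString = SentinelTerminatedSlice!(char, '\0');

static assert(isForwardRange!CString);
```

Note that the length is never stored: finding the end requires walking the data, which is why this is a forward range rather than a random-access one even though the memory is contiguous.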

However, so far I held off of defining such a range because C-strings 
are seldom useful in D code and there are not many other compelling 
examples of sentinel-terminated ranges. Maybe it's time to dust off that 
idea; I'd love it if we gathered enough motivation for it.


Andrei
October 02, 2012
Re: Proposal: clean up semantics of array literals vs string literals
On Tuesday, 2 October 2012 at 15:14:10 UTC, Andrei Alexandrescu 
wrote:
> However, so far I held off of defining such a range because 
> C-strings are seldom useful in D code [...]

I think your view of what is common in D code is not 
representative. You are primarily a library writer, which means 
you rarely have to interface with other code. Please correct me 
if I'm wrong, but I don't believe you've written much 
application-level D code.

For people that write applications, we have the unfortunate chore 
of having to call lots of C APIs to get things done. There's a 
long list of things for which there is no D interface (graphics, 
audio, input, GUI, database, platform APIs, various 3rd party 
libs). Invariably these interfaces require C strings. In short, 
if you write applications in D, you need C strings.

I don't know what the right decision is here, but please do not 
say that C-strings are seldom useful in D code.
October 04, 2012
Re: Proposal: clean up semantics of array literals vs string literals
On 02/10/12 17:14, Andrei Alexandrescu wrote:
> On 10/2/12 7:11 AM, Don Clugston wrote:
>> The problem
>> -----------
>>
>> String literals in D are a little bit magical; they have a trailing \0.
> [snip]
>
> I don't mean to be Debbie Downer on this because I reckon it addresses
> an issue that some have, although I never do. With that warning, a few
> candid opinions follow.
>
> First, I think zero-terminated strings shouldn't be needed frequently
> enough in D code to make this necessary.

[snip]

You're missing the point, a bit. The zero-terminator is only one symptom 
of the underlying problem: string literals and array literals have the 
same type but different semantics.
The other symptoms are:

* The implicit .dup that happens with array literals, but not with 
  string literals. This is a silent performance killer. It's probably 
  the most common performance bug we find in our code, and it's 
  completely ungreppable.

* String literals are polysemous with width (c, w, d), but array 
  literals are not (they are polysemous with constness). For example,
      "abc" ~ 'ü'
  is legal, but
      ['a', 'b', 'c'] ~ 'ü'
  is not.

This has nothing to do with the zero terminator.
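
The first symptom, together with the trailing \0, can be observed directly. The following is a small illustrative sketch; the function names are made up for the example.

```d
string stringLiteral() { return "abc"; }
int[] arrayLiteral() { return [1, 2, 3]; }

void demo()
{
    // A string literal refers to static data: no allocation per call.
    auto s1 = stringLiteral();
    auto s2 = stringLiteral();
    assert(s1.ptr is s2.ptr);

    // A mutable array literal is allocated anew each time it is
    // evaluated, which is the implicit .dup mentioned above.
    auto a1 = arrayLiteral();
    auto a2 = arrayLiteral();
    assert(a1.ptr !is a2.ptr);
    a1[0] = 42;
    assert(a2[0] == 1); // the two arrays do not share storage

    // The string literal carries a \0 just past its last character.
    assert("abc".ptr[3] == '\0');
}
```

Nothing in the source text of arrayLiteral hints at the allocation, which is what makes this kind of cost hard to find by searching.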
October 04, 2012
Re: Proposal: clean up semantics of array literals vs string literals
On Tuesday, 2 October 2012 at 14:03:36 UTC, monarch_dodra wrote:
> If you want 0 termination, then make it explicit, that's my 
> opinion.

That ship has long since sailed. You'll break code in an
incredibly dangerous way if you were to change it now.
October 04, 2012
Re: Proposal: clean up semantics of array literals vs string literals
On Tuesday, 2 October 2012 at 15:14:10 UTC, Andrei Alexandrescu 
wrote:
> First, I think zero-terminated strings shouldn't be needed 
> frequently enough in D code to make this necessary.

My experience has been much different. Interfacing with C occurs
in nearly every D program I write, and I usually end up passing
a string literal. Anecdotes!
October 04, 2012
Re: Proposal: clean up semantics of array literals vs string literals
On Thursday, 4 October 2012 at 07:57:16 UTC, Bernard Helyer wrote:
> On Tuesday, 2 October 2012 at 15:14:10 UTC, Andrei Alexandrescu 
> wrote:
>> First, I think zero-terminated strings shouldn't be needed 
>> frequently enough in D code to make this necessary.
>
> My experience has been much different. Interfacing with C occurs
> in nearly every D program I write, and I usually end up passing
> a string literal. Anecdotes!

Agreed. I'm always happy when I find that the particular C API I 
am working with supports passing strings as a pointer/length pair 
:)
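
As an illustration of the pointer/length style, C's printf family accepts a length through the "%.*s" format, so a D slice can be passed without a terminator. A minimal sketch, where the helper name copyViaC and the 64-byte buffer are arbitrary choices for the example:

```d
import core.stdc.stdio : snprintf;

/// Hypothetical helper: round-trips a slice through a C formatting call
/// by passing it as a length/pointer pair, with no \0 required.
string copyViaC(const(char)[] s)
{
    char[64] buf;
    int n = snprintf(buf.ptr, buf.length, "%.*s",
                     cast(int) s.length, s.ptr);
    if (n < 0)
        return null;                    // formatting error
    if (n >= buf.length)
        n = cast(int) buf.length - 1;   // output was truncated
    return buf[0 .. n].idup;
}
```

For example, copyViaC(text[0 .. 5]) with text = "hello world" yields "hello", even though the character after the slice is a space rather than a \0.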

Anyway, toStringz (and the wchar and dchar equivalents in 
std.utf) needs to be fixed regardless: it currently performs a 
dangerous optimization if the string is immutable; otherwise it 
unconditionally concatenates. We cannot rely on strings being GC 
allocated based on mutability. Memory is outside the scope of the 
D type system, so we cannot make assumptions about memory based 
on types.