October 28, 2022
On 10/27/22 22:43, jmh530 wrote:
> On Thursday, 27 October 2022 at 17:37:59 UTC, Quirin Schroll wrote:
>> On Friday, 21 October 2022 at 15:13:44 UTC, Ola Fosheim Grøstad wrote:
>>> On Friday, 21 October 2022 at 13:53:28 UTC, Guillaume Piolat wrote:
>>>> But building and agreeing on abstractions is what defines the ecosystem.
>>>
>>> One big weakness in C++ and D is that generic definitions cannot be checked without instantiation.
>>
>> There aren’t many languages that have both templates and generics; C++/CLI and C++/CX are the only ones I know and I happen to have some experience with them. (In this nomenclature: *template* = independent copy of the implementation for every distinct instantiation, cf. C++ or D templates, especially class templates with mutable static variables; *generic* = one implementation, cf. Java or C#, and static variables cannot depend on the argument types.)
>>
>> [snip]
> 
> Timon argued on the C++ pattern matching thread [1] that we need "a way to parameterize functions and aggregates that is completely erased at runtime (generally useful, not only for lifetimes; e.g., this is how to fix `inout`.)". This bears some similarity with generics.
> 
> [1] https://forum.dlang.org/post/tj633h$1g4e$1@digitalmars.com

You can use that for generics if you allow type parameters, but I am not convinced that D is currently willing to add all the type system features (type constraints etc) and backend support [2] that would need to come with that to make it convenient to use. I agree that it would be pretty useful though (and it would enable type checking for some templates too). It can actually be designed to play nice with templates in general (which `inout` and lifetimes would also require). (If you instantiate a template with something that has generic arguments in it, implicitly add generic parameters to the template instance and instantiate it with the appropriate arguments.)


[2] Although interestingly, some of that is already implemented in typeinfo; the original non-templated built-in AA implementation was basically a generic type.
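
As a minimal sketch (mine, not from the thread) of the template/generic distinction described in the quote above: in D, every distinct instantiation of a template gets its own copy of the template's static variables, which a single-implementation generic by definition cannot have.

```d
// Each instantiation of Counter gets its own `instances` variable;
// a C#/Java-style generic would have one erased implementation instead.
struct Counter(T)
{
    static int instances;       // one copy per distinct T
    this(int) { ++instances; }
}

void main()
{
    auto a = Counter!int(0);
    auto b = Counter!int(0);
    auto c = Counter!string(0);
    assert(Counter!int.instances == 2);     // independent of Counter!string
    assert(Counter!string.instances == 1);
}
```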

October 27, 2022
On 10/24/2022 1:04 PM, Dukc wrote:
> its UTF-8 string type. Not only is it guaranteed to point to valid memory, it is statically guaranteed to point to valid UTF-8!

The trouble with that is much of the UTF-8 out there is not valid. You don't want, for example, your html page to refuse to display at all because there's a couple invalid UTF-8 sequences in it. You don't want your text editor to refuse to load a file with invalid UTF-8 in it, either. You don't want your forms processor to summarily reject anything with invalid UTF-8 in it.

A better approach is to have the string processing be tolerant of invalid UTF-8.
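
For illustration (a rough sketch, not code from Phobos or the thread): any operation keyed on an ASCII byte never needs to decode at all, because multi-byte UTF-8 sequences only use bytes >= 0x80, so invalid sequences simply pass through untouched. The `splitKeyValue` helper below is made up for the example.

```d
import std.string : representation;

// Hypothetical helper: split "key=value" input without ever validating
// or decoding the value part.
bool splitKeyValue(const(char)[] line, out const(char)[] key,
                   out const(char)[] value)
{
    foreach (i, b; line.representation)    // const(ubyte)[] view, no copy
    {
        if (b == '=')                      // '=' can only ever be a real '='
        {
            key   = line[0 .. i];
            value = line[i + 1 .. $];      // may contain invalid UTF-8; that's fine
            return true;
        }
    }
    return false;
}

unittest
{
    const(char)[] k, v;
    assert(splitKeyValue("name=caf\xE9", k, v));   // 0xE9 alone is not valid UTF-8
    assert(k == "name" && v.length == 4);
}
```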
October 27, 2022
On 10/25/2022 10:16 AM, Don Allen wrote:
> Not all GCs are stop-the-world. And, in my opinion, the negativity about garbage collectors is exaggerated. There's an awful lot of software out there written in gc-ed languages, e.g., Python, Go, Javascript, that we all use every day, even on our phones, and that performs adequately or more than adequately.

The fact that D's GC is what enables advanced CTFE programming is often overlooked. It's a killer feature enabling a killer feature.

After all, whatcha gonna do with malloc/free in CTFE?
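
To make the point concrete (a small sketch of my own, not from the thread): ordinary GC-style allocation, array appending and string building just work in CTFE, which is what makes compile-time code generation like the mixin below so convenient.

```d
// Build D source code at compile time; the ~ and ~= operations allocate
// via the GC at run time, and the CTFE engine handles them transparently
// at compile time.
string makeEnum(string name, string[] members)
{
    string code = "enum " ~ name ~ " { ";
    foreach (m; members)
        code ~= m ~ ", ";
    return code ~ "}";
}

mixin(makeEnum("Color", ["red", "green", "blue"]));
static assert(Color.green == 1);
```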
October 28, 2022
On 28/10/2022 12:40 PM, Walter Bright wrote:
> After all, whatcha gonna do with malloc/free in CTFE?

Allocate and free memory?

CTFE is just an application VM; even if we have limitations, they are not inherent to the theory ;)
October 27, 2022
On Thu, Oct 27, 2022 at 04:37:12PM -0700, Walter Bright via Digitalmars-d wrote:
> On 10/24/2022 1:04 PM, Dukc wrote:
> > its UTF-8 string type. Not only is it guaranteed to point to valid memory, it is statically guaranteed to point to valid UTF-8!
> The trouble with that is much of the UTF-8 out there is not valid. You don't want, for example, your html page to refuse to display at all because there's a couple invalid UTF-8 sequences in it. You don't want your text editor to refuse to load a file with invalid UTF-8 in it, either. You don't want your forms processor to summarily reject anything with invalid UTF-8 in it.

You don't have to refuse anything.  Just substitute it with the Unicode replacement character in your standard library, and no downstream code will need to worry about it anymore.

And should you ever need to process invalid sequences (e.g., in a utility to repair broken encodings), just read it as binary and process it that way.


> A better approach is to have the string processing be tolerant of invalid UTF-8.

Which makes string-processing code more fragile and possibly more complex. Better to let the standard library replace all invalid sequences with the replacement character so that downstream code doesn't have to worry about it anymore.
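
As a sketch of how this looks with current Phobos (assuming std.utf.byDchar behaves as its documentation describes, i.e. it decodes lazily and substitutes U+FFFD for ill-formed sequences instead of throwing):

```d
import std.algorithm : count;
import std.utf : byDchar, replacementDchar;

void main()
{
    string dirty = "caf\xE9";        // 0xE9 alone is not valid UTF-8
    auto cleaned = dirty.byDchar;    // lazy view; no copy is made here
    // Downstream code only ever sees valid code points, with the bad
    // sequence replaced by U+FFFD.
    assert(cleaned.count(replacementDchar) == 1);
}
```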


T

-- 
Doubtless it is a good thing to have an open mind, but a truly open mind should be open at both ends, like the food-pipe, with the capacity for excretion as well as absorption. -- Northrop Frye
October 28, 2022
On 28/10/2022 12:55 PM, H. S. Teoh wrote:
>> A better approach is to have the string processing be tolerant of
>> invalid UTF-8.
> Which makes string-processing code more fragile and possibly more
> complex. Better to let the standard library replace all invalid
> sequences with the replacement character so that downstream code doesn't
> have to worry about it anymore.

Officially you are meant to support only well-formed UTF; anything else you are expected to reject.

In practice, yes, the replacement character can be what you decode to (which is what I do).
October 27, 2022
On 10/27/2022 4:51 PM, rikki cattermole wrote:
> On 28/10/2022 12:40 PM, Walter Bright wrote:
>> After all, whatcha gonna do with malloc/free in CTFE?
> 
> Allocate and free memory?

I meant this at a more meta level - people write their own allocators a lot.


> CTFE is just an application VM, even if we have limitations, they are not inherent to the theory ;)

Yes, it is technically possible. But there's no actual purpose to malloc/free in CTFE. The GC works just fine, and it's memory safe, and it's much more convenient.
October 27, 2022
On 10/27/2022 4:55 PM, H. S. Teoh wrote:
> You don't have to refuse anything.  Just substitute it with the Unicode
> replacement character in your standard library, and no downstream code
> will need to worry about it anymore.

That's one way to deal with it. But until it is so processed, it isn't a string if the string requires strict UTF-8.


> And should you ever need to process invalid sequences (e.g., in a
> utility to repair broken encodings), just read it as binary and process
> it that way.

Yes, but you can't do it with strings, if strings don't allow invalid sequences.


>> A better approach is to have the string processing be tolerant of
>> invalid UTF-8.
> 
> Which makes string-processing code more fragile and possibly more
> complex.

I've coded a lot of Phobos to be tolerant of invalid UTF-8. It turns out that it's *unusual* to need to decode UTF-8 at all. It's robust, not fragile.


> Better to let the standard library replace all invalid
> sequences with the replacement character so that downstream code doesn't
> have to worry about it anymore.

Then you have another processing step, and have to make a copy of the string. As I wrote, I have some experience with this. Being tolerant of invalid UTF-8 is a winning strategy.

October 28, 2022
On Thursday, 27 October 2022 at 23:40:11 UTC, Walter Bright wrote:
> The fact that D's GC is what enables advanced CTFE programming is often overlooked. It's a killer feature enabling a killer feature.

It is clearly better to use high-level code in CTFE than system-like code. Maybe you could consider adding more high-level features such as comprehensions and generators for the purpose of improving CTFE?

October 28, 2022
On Friday, 28 October 2022 at 04:27:25 UTC, Walter Bright wrote:
>>> A better approach is to have the string processing be tolerant of
>>> invalid UTF-8.
>>
>> Which makes string-processing code more fragile and possibly more
>> complex.
>
> I've coded a lot of Phobos to be tolerant of invalid UTF-8. It turns out that it's *unusual* to need to decode UTF-8 at all. It's robust, not fragile.

Good point. But it could be easily solved by making the naturally tolerant functions accept ubytes.

>> Better to let the standard library replace all invalid
>> sequences with the replacement character so that downstream code doesn't
>> have to worry about it anymore.
>
> Then you have another processing step, and have to make a copy of the string. As I wrote, I have some experience with this. Being tolerant of invalid UTF-8 is a winning strategy.

Don't you remember? Ranges are lazy. No copy needed. And IIRC Rust also has a lazy iterator over an unvalidated binary blob to accomplish the same.

And it's not an extra step. If you don't validate a string, then the string processing functions (that need to decode) have to do that anyway.

The Rust way has the advantages that:

  • No string handling function needs to throw anything. They could all be nothrow.
  • If two string handling functions that need to decode are chained to each other, they don't both need to redundantly check for invalid UTF-8.
  • You don't accidentally forget to check for invalid UTF-8, or recheck an already checked string.

The first two could also be accomplished by asserting on invalid UTF-8 instead of throwing an exception, but only static guarantees give the third advantage.
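
For what it's worth, here is a hypothetical sketch of what that third, static guarantee could look like in library code today (ValidUtf8 and codePointCount are made-up names, not an existing API): validate exactly once at the boundary, and everything downstream can be nothrow because ill-formed input can no longer reach it.

```d
import std.utf : validate;   // throws UTFException on ill-formed input

struct ValidUtf8
{
    private string data;

    // The only way in: validation happens exactly once, at the boundary.
    this(string s)
    {
        validate(s);          // may throw, but only here
        data = s;
    }

    string get() const nothrow @nogc { return data; }
}

// Downstream code never re-checks and never throws.
size_t codePointCount(ValidUtf8 s) nothrow @nogc
{
    size_t n;
    // Count bytes that are not UTF-8 continuation bytes (0b10xxxxxx);
    // this is only correct because the payload is known to be well formed.
    foreach (b; cast(const(ubyte)[]) s.get)
        if ((b & 0xC0) != 0x80)
            ++n;
    return n;
}
```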