Thread overview
Re: D mentioned on Rust discussions site
May 24, 2020
H. S. Teoh
May 25, 2020
Walter Bright
May 25, 2020
H. S. Teoh
May 23, 2020
On Sat, May 23, 2020 at 09:05:47PM -0700, Walter Bright via Digitalmars-d wrote:
> On 5/23/2020 6:18 AM, Dibyendu Majumdar wrote:
> > > https://www.digitalmars.com/articles/C-biggest-mistake.html
> > 
> > I do not think it was a mistake at all. Treating pointers and arrays uniformly is one of the great innovations in C.
> 
> The amount of buffer overflows enabled by it is legendary.

On Sat, May 23, 2020 at 09:15:18PM -0700, Walter Bright via Digitalmars-d wrote:
> On 5/23/2020 4:50 PM, Andrei Alexandrescu wrote:
> > This "no fat pointers" decision cascaded into another disastrous choice - zero terminated strings. Worse than even autodecoding :o).
> 
> Ironically, a lot of C's speed advantage is lost in endlessly scanning strings to find their length.

Hear, hear.

Having worked in the industry for the past 25 years, primarily with C code, I have to say that the amount of C code with hidden potential buffer overflows is staggering. In spite of a big push in the past 10 or so years towards safer C coding practices, I still regularly come across code that has potential buffer overflows hidden deep within a large codebase, and freshly written code that *still* depends on unfounded assumptions about array length -- because manually passing the length is just too cumbersome, and people either don't bother with it or make mistakes while doing it (nothing like passing the wrong length to screw up your day -- esp. when it gets missed by QA and proceeds to explode in the customer's production environment -- or worse, it *doesn't* get noticed by the customer until a hacker decides to exploit it).

Not to mention the gigantic pain of writing said code in C in the first place -- you constantly have to burn mental calories keeping track of which pointer goes with which length, manually checking bounds, thinking thrice about your loops to prevent overruns, and making endless calls to strlen(). I'm serious: I work with a huge codebase of almost 2 million LOC, and strlen calls are everywhere, including inside macros that nobody really takes the time to understand but everybody sprinkles all over their code. The complexity of managing all of this is just beyond any normal person's capacity given the tight deadlines we have to meet, so people just call strlen willy-nilly without even thinking about the performance consequences. And of course, the huge bulk of code that expects null-terminated strings means it's just not worth trying to encapsulate the length in any sort of way -- you make yourself incompatible with just about everything else, and end up spending endless effort converting to/from char* instead, which is not any better. So the problem perpetuates itself, and becomes an accumulated cost that nobody can feasibly improve without rewriting the entire darned thing from the ground up. (And even then, all it takes is for *someone* to start using char* again, and suddenly you're back to square one. No sane C coder uses anything *but* char* for strings in "normal" C code. Esp. code that needs to talk to other C libraries not written by your team. It's inextricably ingrained in C culture.)
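
To make the strlen point concrete, here is a minimal C sketch. The function names are made up for illustration and are not from any real codebase. The first version calls strlen() in the loop condition, a pattern that may rescan the string on every iteration unless the optimizer happens to hoist the call; the second takes the length from the caller and makes a single pass.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical example: counts spaces while calling strlen() in the loop
   condition. Each call scans the whole string again, so the loop can be
   quadratic in the string length if the compiler does not hoist the call. */
size_t count_spaces_rescan(const char *s)
{
    size_t n = 0;
    for (size_t i = 0; i < strlen(s); i++)
        if (s[i] == ' ')
            n++;
    return n;
}

/* Same result when the length travels with the pointer: one linear pass,
   and no scan for the terminator at all. */
size_t count_spaces_len(const char *s, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (s[i] == ' ')
            n++;
    return n;
}
```

Both functions return the same count; the difference is only in how often the string gets walked to rediscover a length that somebody, somewhere, already knew.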

And strlen isn't even the end of it; because of manual memory management, everyone defensively strdup's every darned string that they intend to keep, because there's just no other sane way of ensuring the pointer won't get invalidated later (and no guarantee, thanks to default mutability, that someone won't change the contents of the string down the line and screw up the assumptions in your code). So in addition to strlen, typical C code is plagued with endless calls to strdup or equivalent, even for simple things like taking substrings.
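
A rough sketch of the substring case, in C. The names substr_copy, str_view and substr_view are hypothetical helpers invented for this illustration. With null-terminated strings, a substring needs its own terminator and therefore its own allocation; with a pointer-plus-length view, it is just a narrower window onto the same bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, the null-terminated way: the substring needs its own
   '\0', so it is copied into a fresh allocation that the caller must free().
   Returns NULL on a bad range or on allocation failure. */
char *substr_copy(const char *s, size_t start, size_t count)
{
    size_t len = strlen(s);
    if (start > len || count > len - start)
        return NULL;
    char *out = malloc(count + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, s + start, count);
    out[count] = '\0';
    return out;
}

/* Hypothetical pointer-plus-length view of a string (a "fat pointer"). */
typedef struct {
    const char *ptr;
    size_t len;
} str_view;

/* The fat-pointer way: no allocation and no copy, only a new pointer and
   length. Returns an empty view on a bad range. */
str_view substr_view(str_view s, size_t start, size_t count)
{
    str_view empty = { s.ptr, 0 };
    if (start > s.len || count > s.len - start)
        return empty;
    str_view out = { s.ptr + start, count };
    return out;
}
```

Note that in C the view is only valid while the underlying buffer stays alive and unmodified, which is exactly the worry that drives the defensive strdup calls in the first place.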

Some people complain about D strings being immutable(char)[], but I tell you, encapsulated in that seemingly-trivial construct is a ton of experience-backed insight about string management in C-like programming paradigms that's worth careful study.  People like bashing D because of the GC, but nobody talks about how *not* needing to call strlen() or strdup() endlessly is such a performance booster, not to mention it makes your code a LOT simpler and your APIs a lot cleaner to work with.

Consider just how much work has been poured into mitigating the horrible consequences of "no fat pointers" and null-terminated strings. Think about how much effort has gone into dealing with buffer overflow bugs over the past 10 years: entire cottage industries have grown up around developing tools for detecting and fixing these sorts of things, and who knows how much money has been poured into cleaning up after the countless security exploits enabled by said buffer overflows. Against all that, the laughable "benefits" of saving a couple of bytes by using only lean pointers amount to nothing less than a colossal design mistake in retrospect. If you can even call them "benefits": think of all the costly workarounds people have had to put up with. Everyone invents their own way of passing array length instead of having a standardized API, inevitably doing it poorly or with costly slip-ups, and then there's the memory cost of inventing, storing, and managing the data structures needed to keep track of all of this. I surmise that this in itself already counteracts any meager savings one may have gained by avoiding fat pointers. (Just think: any persistent struct that carries pointers to arrays already has to store the length somehow, so you're essentially already paying for fat pointers, except that everyone invents their own way of doing it, with none of the benefits of a standardized fat pointer type that's been tested to do it correctly. You run the risk of human error at every turn.)
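
For illustration, this is roughly what a hand-rolled fat pointer tends to look like in C. The type int_slice and the functions slice_get and slice_sum are hypothetical names for this sketch, not a standard API, which is rather the point: every project ends up writing its own variant of this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fat pointer for int arrays: the pointer and its length are
   stored and passed together, so they cannot drift apart. */
typedef struct {
    int *ptr;
    size_t len;
} int_slice;

/* Bounds-checked read: stores the element in *out and returns true, or
   returns false (leaving *out untouched) when i is out of range. */
bool slice_get(int_slice s, size_t i, int *out)
{
    if (i >= s.len)
        return false;
    *out = s.ptr[i];
    return true;
}

/* No separate length parameter, so there is no wrong length to pass in. */
long slice_sum(int_slice s)
{
    long total = 0;
    for (size_t i = 0; i < s.len; i++)
        total += s.ptr[i];
    return total;
}
```

The struct costs one extra size_t next to the pointer, which is the same price a struct with a separate, ad hoc length field pays anyway.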

D arrays being fat pointers is a HUGE step toward getting rid of the nonsensical churn that C's array-pointers lead to. And immutable(char)[] is a huge saver in terms of performance. The two taken together are one of D's strengths.


T

-- 
"I suspect the best way to deal with procrastination is to put off the procrastination itself until later. I've been meaning to try this, but haven't gotten around to it yet. " -- swr
May 24, 2020
You should write this up and post it on Reddit / Hackernews!
May 24, 2020
On Sun, May 24, 2020 at 06:02:53PM -0700, Walter Bright via Digitalmars-d wrote:
> You should write this up and post it on Reddit / Hackernews!

Feel free to copy-n-paste it, you have my full permission.  I'm done with Reddit and have no interest in investing in another timesink like Hackernews.  I have too much on my hands and not enough time to do it.


T

-- 
Never ascribe to malice that which is adequately explained by incompetence. -- Napoleon Bonaparte