Why Bloat Is Still Software’s Biggest Vulnerability
February 12

I thought I would get a discussion started on software bloat.

Maybe D can be part of the solution to this problem?

February 12

On Monday, 12 February 2024 at 14:49:02 UTC, tim wrote:

> I thought I would get a discussion started on software bloat.
>
> Maybe D can be part of the solution to this problem?

Oops, forgot the link to the article:

https://spectrum.ieee.org/lean-software-development

February 12

On Monday, 12 February 2024 at 15:03:01 UTC, tim wrote:

> On Monday, 12 February 2024 at 14:49:02 UTC, tim wrote:
>
>> I thought I would get a discussion started on software bloat.
>>
>> Maybe D can be part of the solution to this problem?
>
> Oops, forgot the link to the article:
>
> https://spectrum.ieee.org/lean-software-development

Agreed ... two days ago I needed to pull a 13 GB Docker image from the Nvidia repository ... a totally out-of-control mess.

/P

February 12
On Mon, Feb 12, 2024 at 03:55:50PM +0000, Paolo Invernizzi via Digitalmars-d wrote:
> On Monday, 12 February 2024 at 15:03:01 UTC, tim wrote:
> > On Monday, 12 February 2024 at 14:49:02 UTC, tim wrote:
> > > I thought I would get a discussion started on software bloat.
> > > 
> > > Maybe D can be part of the solution to this problem?

No amount of D innovation is going to stop programmers infected with the madness of dynamic remote dependencies that pull in an arbitrary number of external modules. Potentially a different set of them every time you build.  Tools like cargo or dub actively encourage this model of software development.

Which is utterly crazy, if you think about it. Unless you pin every dependency to exact versions (who even does that?!), every time you build your code you're potentially getting a (subtly) different set of dependencies. That means the program you were debugging 5 mins ago may not even be the same program you're debugging now. Of course it's possible to turn off this behaviour while debugging, but still, the fact that it's the default behaviour is just nuts.
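
(For the record, pinning *is* possible with most of these tools; a minimal dub.json sketch, with a hypothetical package name, looks something like this:)

    {
        "name": "myapp",
        "dependencies": {
            "somepkg": "==1.2.3"
        }
    }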

Over the long term, this means that you cannot reliably reproduce older versions of your software -- because the dependency versions that version 1.0 was built against may not even exist anymore, now that your program is at version 2.0.  If your customer reports a problem, you have no way of debugging it; you can't even reproduce the exact image your customer is running anymore, let alone make any fixes to it. The only thing left to do is to tell them "just upgrade to the latest version". Which is the kind of insanity that's familiar to every one of us these days.  Never mind the fallacy that "newer == better". Especially not in the current atmosphere of software development, where so-called "patch" releases are not patch releases at all, but full-featured new releases complete with full-fledged new, untested features (because why waste resources making a patch release plus a separate feature release, when you can bundle the two together, save development costs, and give Marketing all the more excuse to push new features onto customers and thereby make more money).  The number of bugs introduced with each "patch" release may well exceed the number of bugs fixed.
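
(Dub at least records the exact resolved versions in dub.selections.json; committing that file with each release keeps the *resolution* reproducible, assuming the packages themselves stay available. It looks something like this, again with a hypothetical package:)

    {
        "fileVersion": 1,
        "versions": {
            "somepkg": "1.2.3"
        }
    }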

All this not even to mention the insanity that sometimes specifying just *one* dependency will pull in tens or even hundreds of recursive dependencies. A hello world program depends on a standard I/O package, which in turn depends on a date-formatting package, which in turn depends on the locales package, which in turn depends on the internet timeserver client package, which depends on the cryptography package, ad nauseam.  And so it takes a totally insane number of packages just to print Hello World on the screen.
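
(For contrast, the equivalent D program needs nothing beyond the standard library that ships with the compiler -- zero third-party packages:)

    import std.stdio;

    void main()
    {
        writeln("Hello, World!");
    }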

Not to mention that the whole concept of depending on 3rd party code that lives on some remote server somewhere out there on the wild wild west (www) of the 'net is just crazy.  The article linked below alludes to obsolete NPM / Node packages being taken over by malicious actors in order to inject malicious code into unwitting software.  There's also the problem that your code is not compilable if for whatever reason you lose network connectivity. Which means if you suddenly find yourself in an emergency and have to make a small fix to your program, you won't be able to recompile it. Good luck.
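
(One way out, sketched here with a hypothetical URL and package name: vendor a copy of the source into your own tree and point dub at it, so builds stop touching the network:)

    # clone once and keep the copy in your own repo (hypothetical URL)
    git clone https://example.com/somepkg.git vendor/somepkg
    # tell dub to resolve this package from the local copy
    dub add-local vendor/somepkg 1.2.3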


> > https://spectrum.ieee.org/lean-software-development
> 
> Agreed ... two days ago I needed to pull a 13 GB Docker image from the Nvidia repository ... a totally out-of-control mess.
[...]

Reducing code size is, to paraphrase Walter, plugging one hole in a cheese grater. There are so many other things wrong with the present state of software that code size doesn't even begin to address.

Today's web app scene is exemplary of the insanity in software development. It takes GBs of memory and multicore GHz CPUs to run a ridiculously complex web browser in order to run some bloated web app with tons of external dependencies at the *same speed* as an equivalent lean native program ran in the 80's on 64 KB of memory and a 16 MHz single-core CPU.  What's wrong with this picture?

And don't even get me started on the IoT scene, which is a mind-bogglingly insane concept in and of itself. Why does my toaster need to run a million-LoC operating system sporting an *internet connection*?!  Or indeed, a *stuffed animal toy* that some well-meaning parent gave my son as a "gift", that has a built-in internet interface for downloading audio clips (it's cute, it downloaded a clip of my son's name so that the toy could address him by name -- WHY OH WHY... argh).  I betcha the OS running on this thing has not been updated (and isn't ever going to be) for at least 5 years, and carries who knows how many unpatched security vulnerabilities. I wouldn't be surprised if a good chunk of today's botnets consist of exploited household appliances running far more software than they actually require for their primary operations. Perhaps this internet-"enabled" stuffed animal is among the esteemed members of such a botnet. (Thankfully the battery has since run out -- and I'm not planning to replace it, ever. Sorry, botnet.)  These are just milder examples of the IoT madness.  Don't get me started on internet-enabled webcams that can be (and have been) used for far more nefarious purposes than running some script kiddie's botnet.

Years ago, if somebody had told me that some random car driving by the house could hack into my babycam and make it emit a scary noise to scare the baby, I'd have laughed them out of the house as a delusional paranoid.  Unfortunately, today this is actual reality, no thanks to insecure, misconfigured WiFi routers whose OSes haven't been updated in eons, and household appliances with internet access they have no business having.

In principle, the same thing applies to Docker images that contain far more stuff than they rightly should.  No thanks to these non-solutions to security issues, nowadays it's no longer enough to keep up with your OS's security patches, because patching the host OS does not patch the OSes bundled with each Docker image. And for many applications, nobody's gonna patch their Docker images (the whole reason they went the Docker route is that they can't be bothered with actual, proper integration with the host OS; they just want to target a static, known OS that works for their broken code, and therefore have zero incentive to change anything now that their code works).  So your host OS may very well be completely patched, but thanks to these needlessly bloated Docker images your PC still has as many security holes as a cheese grater.
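
(The lean alternative exists, assuming your program can be statically linked: an image containing the binary and nothing else, hence no bundled OS to go unpatched. A minimal sketch:)

    # ./app is assumed to be a fully statically linked binary
    FROM scratch
    COPY ./app /app
    ENTRYPOINT ["/app"]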

//

And there's the totally insane concept of running arbitrary code from unknown, untrusted online sources. Javascript, ActiveX, scripting in emails, in documents, etc.  Eye-candy for the customer, completely unnecessary functionally speaking, and an absolute catastrophe security-wise. The entire concept is flawed to begin with, and things like sandboxing are merely afterthoughts, bandages that don't actually fix the festering wound underneath.  Sooner or later something will give.  And the past 20 or so years of internet history prove this over and over again, to this very day.  But in spite of the countless arbitrary-code-execution vulnerabilities, nobody is ready to tackle the root of the problem: 3rd party code from unknown, untrusted online sources has NO BUSINESS running on my PC. Yet almost every major application these days is practically dying in its eagerness to run such code -- by default. Your browser, your email reader, your word processor, your spreadsheet app, just about everything, really, just can't wait to get its hands on some fresh unknown 3rd party code in order to run it at the user's expense.

And the usual anemic response when a major exploit happens shows that what the security community is doing -- all they can do given the circumstances, really -- is, to quote Walter again, merely plugging individual holes in a cheese grater.

//

The underlying problem is that the incentives in software development are all wrong these days. Instead of incentivising code quality, security, and conservation of resources, the primary incentive is money. I.e., ship software as early as possible in order to beat your competitors, which in practice means doing as little work as you can possibly get away with in order to get the product out the door. Code quality is a secondary concern (we're gonna throw it all out by next release anyway), conservation of resources is a non-issue (resources are cheap; just tell the customer to buy the latest and greatest hardware, and our hardware partners will give us a kickback for the free promotion), and security isn't even on the list.  Developing software the "right" way is not profitable; questionable practices like importing millions of LoC from dynamic remote dependencies get the job done faster and lead to more profit, therefore that's what people will do.

And of course, this state of incentives is good for big companies that are making huge profits off it, so they're not going to let things change for the better as long as they have a say in it. And they're the ones that are employing and paying programmers to produce this trash, so anyone who doesn't agree with them won't last very long in this career. Therefore guess what kind of code the majority of programmers are producing every day.  Definitely not lean, security-conscious code.

As someone once joked, the most profitable software venture is a business of two departments: virus writers and anti-virus development. Welcome to software development hell.


T

-- 
Life is complex. It consists of real and imaginary parts. -- YHL
February 12
On Monday, 12 February 2024 at 17:30:23 UTC, H. S. Teoh wrote:
> On Mon, Feb 12, 2024 at 03:55:50PM +0000, Paolo Invernizzi via Digitalmars-d wrote:
>> On Monday, 12 February 2024 at 15:03:01 UTC, tim wrote:
>> > On Monday, 12 February 2024 at 14:49:02 UTC, tim wrote:

<snips>

>> > https://spectrum.ieee.org/lean-software-development
>> 
>> Agreed ... two days ago I needed to pull a 13 GB Docker image from the Nvidia repository ... a totally out-of-control mess.
> [...]
>
> Reducing code size is, to paraphrase Walter, plugging one hole in a cheese grater. There are so many other things wrong with the present state of software that code size doesn't even begin to address.

Hey, after all, the title of the post is: Why Bloat Is Still Software’s __Biggest__ Vulnerability.
Let's start by plugging the biggest! :-P

Long story short, the Docker image was the last resort after losing a three-hour battle against pip and conflicting dependencies, trying to run 2-year-old code (Python ML environments are sometimes just crazy). Note that using pip also involved GBs of downloads: tensorflow, keras, etc.
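
(For what it's worth, the standard defence is to freeze the environment while it still works and restore from that later -- assuming the pinned packages remain downloadable:)

    pip freeze > requirements.txt      # record the exact working versions
    pip install -r requirements.txt    # recreate that environment later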

February 12
On Mon, Feb 12, 2024 at 05:48:32PM +0000, Paolo Invernizzi via Digitalmars-d wrote:
> On Monday, 12 February 2024 at 17:30:23 UTC, H. S. Teoh wrote:
[...]
> > Reducing code size is, to paraphrase Walter, plugging one hole in a cheese grater. There are so many other things wrong with the present state of software that code size doesn't even begin to address.
> 
> Hey, after all, the title of the post is: Why Bloat Is Still Software’s __Biggest__ Vulnerability.
> Let's start by plugging the biggest! :-P

I'm skeptical whether it's the biggest. There are many holes in a cheese grater; plugging each one individually will always leave you with more holes afterwards. And they are all more-or-less the same size. :-D

However, nobody seems willing to entertain the possibility of removing the cheese grater altogether, which would be a much better solution.


> Long story short, the Docker image was the last resort after losing a three-hour battle against pip and conflicting dependencies, trying to run 2-year-old code (Python ML environments are sometimes just crazy). Note that using pip also involved GBs of downloads: tensorflow, keras, etc.

Which is why I said that these are all just holes in a cheese grater. Conflicting dependencies and the inability to compile old code are well-known (to me) symptoms of today's model of software development. I won't go so far as to say that anything requiring GBs of downloads is inherently broken -- perhaps for some applications, large amounts of code / data *are* unavoidable. But I can't believe that the *majority* of dependencies would require such incommensurate amounts of resources.  At most I'd expect one or two specialised dependencies that might need this, not every other package in your typical online code repo.

//

When I was in college in the 90's, code reuse was a big topic. Everyone was talking about coding for libraries so that you don't have to reinvent the wheel. Eventually that led to DLL hell in the Windows world and .so hell in the POSIX world.  After 30 years, people have moved away from OS-level dependencies (DLLs and shared libs) to the likes of cargo, npm, dub, and the like. However, the underlying problem of dependency hell has not been solved.  I'm at the point where I'm ready to call BS on the whole concept of code reuse.

So I've gradually come to the conclusion that code reuse, i.e., dependencies, is inherently evil, and should be avoided like the plague unless you absolutely have no other choice. And where it can't be avoided, it should be as shallow as possible. The best dependencies are single-file dependencies like Adam's arsd libs, where you can literally copy the file into your workspace and just compile.  The second best dependency is the single package, where you copy/clone the files into some subdir in your workspace and off you go.  The worst kind of dependency is the one that recursively depends on other packages.  These should be avoided as much as possible, because it's here that NP-complete version resolution and dependency hell begin, and it's here where madness like multi-GB Docker images is born.
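
(To illustrate: with a self-contained module like arsd's terminal.d -- assuming a version of it that still stands alone -- the entire "dependency management" is one copy and one extra file on the compiler command line:)

    cp path/to/arsd/terminal.d .
    dmd app.d terminal.d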

Copy-pasta is oft-maligned, and I agree that it's evil when it happens within a project.  But I'm at the point where I'm almost ready to declare that copy-pasta is actually good and beneficial when it happens across projects. Much better to just copy the darned code into your local repo and modify it to do whatever you need, than to declare a dreaded dependency that's the beginning of the slippery slope into dependency hell and the inclusion of millions of lines of code bloat into your project.


T

-- 
What are you when you run out of Monet? Baroque.
February 12
> The European Union has launched three pieces of legislation to this effect

Well, that'll fix it!
February 12
On Monday, 12 February 2024 at 17:30:23 UTC, H. S. Teoh wrote:
> On Mon, Feb 12, 2024 at 03:55:50PM +0000, Paolo Invernizzi via Digitalmars-d wrote:
>> > > [...]
>
> No amount of D innovation is going to stop programmers infected with the madness of dynamic remote dependencies that pull in an arbitrary number of external modules. Potentially a different set of them every time you build.  Tools like cargo or dub actively encourage this model of software development.
>
> [...]

I enjoyed reading this. I largely agree with what you said. I also agree with your later post about ideal dependencies (like single files from arsd or single packages).
February 12
On Monday, 12 February 2024 at 18:45:26 UTC, Walter Bright wrote:
>> The European Union has launched three pieces of legislation to this effect
>
> Well, that'll fix it!

This software has dependencies, do you agree?
February 12
On Monday, 12 February 2024 at 17:30:23 UTC, H. S. Teoh wrote:
> All this not even to mention the insanity that sometimes specifying just *one* dependency will pull in tens or even hundreds of recursive dependencies. A hello world program depends on a standard I/O package, which in turn depends on a date-formatting package, which in turn depends on the locales package, which in turn depends on the internet timeserver client package, which depends on the cryptography package, ad nauseam.  And so it takes a totally insane number of packages just to print Hello World on the screen.
>

"Funny" example of that.

I wanted to learn how to do a React project from scratch. Not using a framework or anything, just piecing the stuff together to make it work myself.

So babel, webpack, react, jest for testing, and stylex for CSS. That's it. Arguably a lot by some standards, but by no means anything wild -- the JS equivalent of a build system and a test framework.

The project currently has 1103 dependencies. Voila. Pure madness.
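
(That number comes straight from the tooling; something like this counts every installed package, direct and transitive, give or take the root entry:)

    npm ls --all --parseable | wc -l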