malloc and buffer overflow attacks (page 3)

On Friday, 31 December 2021 at 00:13:56 UTC, Walter Bright wrote:

> What if len*T.sizeof overflows? malloc() will succeed, but the result will be too small for the data.

Makes me wish we had access to the raw multiplication result and the carry flag; that could have been useful here in some way, although the language might not have liked it.

As a quick overview for non-ASM-savvy types: x86's mul takes two operands. You put one in AX and pass the other to mul. So...

mov AX, 0x155;
mov BX, 0x123;
mul BX; //result: AX=839F, DX=0001

The result ends up in DX:AX, with the overflow (if any) in DX, letting you get a 32-bit result from 16-bit registers (on 64-bit machines you'd get a 128-bit result in RDX:RAX).

Most modern languages just ignore the upper half of the result, though, which brings us to this topic. In the 16-bit example above you'd be short 64k.
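To make the worked example above concrete, here is a minimal C sketch of the same multiplication. The helper name `mul16_wide` and the struct are hypothetical, made up for illustration; the idea is simply to widen both operands to 32 bits first, so that the half a 16-bit multiply would throw away stays visible.

```c
#include <stdint.h>

/* Hypothetical helper for illustration: mirrors how the 16-bit MUL
   leaves its result split across two registers. */
typedef struct {
    uint16_t lo; /* the half AX would hold */
    uint16_t hi; /* the half DX would hold, dropped by a plain 16-bit multiply */
} wide_product16;

static wide_product16 mul16_wide(uint16_t a, uint16_t b)
{
    /* Widen first: the largest product, 0xFFFF * 0xFFFF, still fits in 32 bits. */
    uint32_t full = (uint32_t)a * (uint32_t)b;
    wide_product16 r = { (uint16_t)(full & 0xFFFFu), (uint16_t)(full >> 16) };
    return r;
}
```

Calling `mul16_wide(0x155, 0x123)` gives `lo == 0x839F` and `hi == 0x0001`. A language that keeps only `lo` reports 0x839F where the true product is 0x1839F, which is the missing 64k.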

Too bad we don't have the cent type yet. Failing that, I'd think calculating the result as a ulong and passing that along would be the safest (assuming malloc takes a 64-bit size), or else checking the upper bits to see whether the result is too big.
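For what the "wider type, then check the upper bits" idea might look like in practice, here is a rough C sketch. It assumes GCC or Clang on a 64-bit target, where the non-standard `unsigned __int128` type is available; the function name `alloc_array_wide` is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not a standard function.
   Assumes GCC/Clang on a 64-bit target (unsigned __int128 is an extension). */
static void *alloc_array_wide(uint64_t len, uint64_t elem_size)
{
    /* A 64x64-bit product always fits in 128 bits, so nothing is lost here. */
    unsigned __int128 total = (unsigned __int128)len * elem_size;

    /* Any bit set in the upper half means the 64-bit result would have wrapped. */
    if ((total >> 64) != 0 || total > SIZE_MAX)
        return NULL;

    return malloc((size_t)total);
}
```

The caller treats NULL as an allocation failure, the same as for a plain malloc, so an oversized request is refused instead of being silently truncated.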

January 01, 2022

Re: malloc and buffer overflow attacks

Posted by max haughton
in reply to Era Scarecrow

On Saturday, 1 January 2022 at 01:44:42 UTC, Era Scarecrow wrote:
> On Friday, 31 December 2021 at 00:13:56 UTC, Walter Bright wrote:
>> What if len*T.sizeof overflows? malloc() will succeed, but the result will be too small for the data.
>
> Makes me wish we had access to the raw multiplication result and the carry flag; that could have been useful here in some way, although the language might not have liked it.
>
> As a quick overview for non-ASM-savvy types: x86's mul takes two operands. You put one in AX and pass the other to mul. So...
>
> mov AX, 0x155;
> mov BX, 0x123;
> mul BX; //result: AX=839F, DX=0001
>
> The result ends up in DX:AX, with the overflow (if any) in DX, letting you get a 32-bit result from 16-bit registers (on 64-bit machines you'd get a 128-bit result in RDX:RAX).
>
> Most modern languages just ignore the upper half of the result, though, which brings us to this topic. In the 16-bit example above you'd be short 64k.

Cleanly expressing access to the flags sounds quite hard. Also note that some architectures don't have flags, and some others make writing to the flags optional.

ARM chose optional. It might be one of a few things that could lead to them beating RISC-V in the long run (both as a design decision and as a point about embracing pragmatism).

On Saturday, 1 January 2022 at 01:44:42 UTC, Era Scarecrow wrote:
> Makes me wish we had access to the raw multiplication result and the carry flag; that could have been useful here in some way, although the language might not have liked it.

You can check carry using core.checkedint (https://dlang.org/phobos/core_checkedint.html).
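core.checkedint provides functions such as `mulu`, which report overflow through a flag parameter. The same pattern can be sketched in C, assuming GCC or Clang, whose `__builtin_mul_overflow` returns true when the product does not fit in the result type. The wrapper names `checked_size_mul` and `alloc_array_checked` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: returns true and stores the product in *out
   only when len * elem_size fits in size_t.
   Assumes GCC/Clang (__builtin_mul_overflow is a compiler extension). */
static bool checked_size_mul(size_t len, size_t elem_size, size_t *out)
{
    return !__builtin_mul_overflow(len, elem_size, out);
}

/* Hypothetical allocator built on the checked multiply. */
static void *alloc_array_checked(size_t len, size_t elem_size)
{
    size_t nbytes;
    if (!checked_size_mul(len, elem_size, &nbytes))
        return NULL; /* refuse rather than allocate a truncated buffer */
    return malloc(nbytes);
}
```

On x86 the builtin typically compiles down to a multiply followed by a test of the overflow or carry flag, so this is close to the "access to the flags" being wished for, without having to expose the flags in the language itself.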

On Friday, 31 December 2021 at 22:10:10 UTC, H. S. Teoh wrote:
> On Fri, Dec 31, 2021 at 12:12:45PM -0800, Walter Bright via Digitalmars-d wrote:
>> On 12/30/2021 4:37 PM, sarn wrote:
>> > Good thing to do, but Walter's talking about integer overflow with the `len * T.sizeof` calculation itself.
>> >
>> > calloc() doesn't have this problem.
>>
>> The calculation of `len` can also have overflow problems. `calloc` is not sufficient. The provenance of `len` needs to be carefully checked.
>
> At my day job we use Coverity to identify potential issues with our C codebase. One of the issues it reports is using external inputs as the length of a memory allocation, the typical case being reading an int or long from a file/socket and then passing that to malloc, et al (potentially with a sizeof multiplier). So imagine a carefully-crafted malicious input designed to overflow a 64-bit integer just a little -- the malloc call would end up allocating just a tiny amount of memory while the rest of the code thinks that the buffer has more memory than can be addressed.
>
> Of course, that tempts the following check (obviously wrong, but I've seen this scarily often in "professional" code):
>
>     size_t n = ... /* read from file/socket */;
>     size_t nbytes = n * sizeof(Element); // oops
>     if (nbytes > INT64_MAX) // never true

if (__builtin_clzll(n) + __builtin_clzll(sizeof(Element)) < 64)

This is the best way to find overflow, as it avoids all the undefined-behaviour shenanigans of the compilers' optimizers. The inconvenience is that not all CPUs implement CLZ, and some do but quite slowly.

>         error(); // dead code
>     void *p = malloc(n * sizeof(Element)); // uh-oh
>     for (i=0; i < n; i++) {
>         ... /* use p: kaboom */
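For comparison, here is a sketch of two ways to test before multiplying instead of after. The function names are hypothetical. The first uses only standard C and is exact. The second is the leading-zero-count idea, assuming the GCC/Clang builtin `__builtin_clzll` and 64-bit operands; note that it is conservative, since it also flags some products that would still fit, and the builtin is undefined for a zero argument, so zero is handled separately.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Exact and portable: n * size overflows size_t exactly when n > SIZE_MAX / size. */
static bool mul_would_overflow(size_t n, size_t size)
{
    return size != 0 && n > SIZE_MAX / size;
}

/* CLZ-based variant (assumes GCC/Clang). Conservative: when the leading-zero
   counts sum to exactly 63, the product may or may not fit in 64 bits,
   and this function reports a possible overflow either way. */
static bool mul_may_overflow_clz(uint64_t a, uint64_t b)
{
    if (a == 0 || b == 0)
        return false; /* __builtin_clzll(0) is undefined; a zero product cannot overflow */
    return __builtin_clzll(a) + __builtin_clzll(b) < 64;
}

/* Hypothetical allocator using the exact check before the multiply. */
static void *alloc_elements(size_t n, size_t elem_size)
{
    if (mul_would_overflow(n, elem_size))
        return NULL;
    return malloc(n * elem_size);
}
```

The division in the exact check costs a little more than a leading-zero count, but with a compile-time constant `sizeof` the compiler can usually fold `SIZE_MAX / size` into a constant, leaving a single comparison.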

On Friday, 31 December 2021 at 00:13:56 UTC, Walter Bright wrote:
> I post this as I've recently seen reports on malware injection being enabled by presenting specially crafted input data to a program that causes an overflow on the allocation, then overwrites the data beyond the truncated allocated memory.

I guess you are talking about this [1] ...

[1] https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-into-nso-zero-click.html

On 1/1/2022 9:29 AM, Paolo Invernizzi wrote:
> I guess you are talking about this [1] ...
>
> [1] https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-into-nso-zero-click.html

Yes, thank you, that was it. The overflow in the computation of numSyms.

On Friday, 31 December 2021 at 00:13:56 UTC, Walter Bright wrote:
> While D offers buffer overflow detection, it does not protect against buffer overflows resulting from an array size calculation overflow:
>
> T* p = cast(T*)malloc(len * T.sizeof);
>
> What if `len*T.sizeof` overflows? malloc() will succeed, but the result will be too small for the data.

But the idea that code is only as safe as the functions it calls is not a new idea... right?
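To see how small the "too small" allocation can get, here is the unchecked size computation from the quoted line sketched in C. The function name is hypothetical. Unsigned arithmetic in C wraps modulo SIZE_MAX + 1 rather than trapping, so the result is well defined and simply wrong.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the unchecked size computation, with no overflow test.
   size_t arithmetic wraps around silently. */
static size_t unchecked_nbytes(size_t len, size_t elem_size)
{
    return len * elem_size;
}
```

With `len = SIZE_MAX / 4 + 2` and a 4-byte element, `unchecked_nbytes` returns 4. malloc would happily hand back a 4-byte block, while the caller goes on to index what it believes is an enormous array.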

OpenBSD has had a function for a long time to deal with this exact problem. It's called reallocarray: https://man.openbsd.org/reallocarray

Don't let the name fool you--it handles both the initial allocation and reallocation.

Perhaps D should provide a similar function (not saying it has to be reallocarray). Asking people to fix their own code is a recipe for everyone creating their own subtly different, and potentially incorrect, versions of a solution to a problem that should be solved once and then used by everyone.

~Brian
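As a rough idea of what such a function does, here is a minimal stand-in with the same signature as reallocarray. It is named `sketch_reallocarray` to make clear that it is a hypothetical illustration and not OpenBSD's implementation. It follows the contract described in the man page: if `nmemb * size` would overflow, it returns NULL and sets errno to ENOMEM; otherwise it behaves like realloc.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for illustration only, not the OpenBSD source. */
static void *sketch_reallocarray(void *ptr, size_t nmemb, size_t size)
{
    /* Reject the request if nmemb * size cannot be represented in size_t. */
    if (size != 0 && nmemb > SIZE_MAX / size) {
        errno = ENOMEM;
        return NULL;
    }
    /* With ptr == NULL, realloc acts like malloc, which is why one
       function covers both the initial allocation and later growth. */
    return realloc(ptr, nmemb * size);
}
```

Usage would look like `p = sketch_reallocarray(NULL, n, sizeof *p)` for the first allocation and `q = sketch_reallocarray(p, new_n, sizeof *p)` to grow it, keeping `p` valid if the call returns NULL.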
