Posted by Brother Bill in reply to H. S. Teoh
On Saturday, 16 August 2025 at 14:43:25 UTC, H. S. Teoh wrote:
> On Sat, Aug 16, 2025 at 11:56:43AM +0000, Brother Bill via Digitalmars-d-learn wrote:
>> It is obvious that reading or writing to invalid memory can result in
>> "undefined behavior".
>> But is merely pointing to invalid memory "harmful"?
>>
>> The documentation states that going one past the last element of a slice is acceptable.
>
> Where does it say this? This is wrong.
>
>
>> But is it also safe to go 10, 100 or 1000 items past the last element of a slice?
>
> Of course not.
>
> //
>
> It all depends on the interpretation you're using.
>
> Technically speaking, a pointer is just a memory address. An unsigned integer. There's nothing inherently "harmful" about an integer.
>
> The problem arises when you interpret it as a memory address. Once you interpret it as an address, you're likely to pass it around to code that expects to be able to read or write to memory at that address. And that's where the problem arises. There are expectations placed upon an unsigned integer that's to be interpreted as an address, such as that you can read memory from that address. The set of integers that are valid addresses is a subset of the set of all (representable) integers.
>
> It's not just about pointing to "invalid memory" either; it's also about not breaking the expectations of the type system. When handed a pointer to a string, for example, the expectation is that when you read memory at that address, you will find a valid sequence of values that represents a string. If you treat a random unsigned integer as a pointer to a string, you may end up reading a sequence of values that *aren't* a string, thereby obtaining invalid data. Or worse, if you write to that address, then somebody else (i.e. some other code) that put the previous data there may try to read it later, expecting valid data of the previous type, and get instead something that's no longer a valid value of that type. The set of memory addresses containing data of the correct type is narrower than the set of valid addresses (addresses assigned to you by the OS), and the set of valid addresses is narrower than the set of all addresses, most of which will trigger an invalid memory access from your OS because that address wasn't assigned to your program and the OS will step in to terminate your program if you try to access it.
>
> //
>
> Now in theory you can allow arbitrary values in your pointers, and only check for validity when you actually dereference them, analogous to how, given a street address handed to you on a piece of paper, you'd check whether that address actually exists before actually heading out there. In practice, though, this is unworkable, because it means every pointer dereference your program makes would have to run through some global registry of valid addresses and check whether the data currently stored there is of the correct type. This would be extremely slow and the simplest of operations would take forever to run. (Not to mention the issues of keeping said global registry up-to-date as the program runs and modifies its data.)
>
> To eliminate this onerous overhead every time you dereference a pointer, programming languages make the simplification that *all* pointers must always contain a valid address of the correct type (or a special null value, which indicates that there is no address at all). The idea is that before even assigning a given integer value to a pointer, you'd ensure that it was a valid address to begin with, so that by the time you try to dereference the pointer, you can be confident that it's a valid address and simply dereference it without further verification.
>
> This is essentially the whole point of a type system -- to ensure that a given piece of data is a valid representation of its intended type, so that you can safely manipulate it.
>
> Doing things like assigning non-pointer values to a pointer breaks the guarantees that the type system gives you, because the assumptions made by all those places in your code that dereference this pointer are now invalid, and all bets are off as to what will happen when you run that code. This is why it's invalid to point to "invalid" memory. The act of pointing itself is "harmless" -- since it's just some integer address -- but the harm comes from the broken assumptions of the rest of the code that assumes that the address contained in the pointer is a valid address containing data of the expected type.
>
>
> T
I'm not sure we are on the same page.
My question is whether having a pointer to invalid memory causes problems.
I am fully aware that reading from or writing to an invalid memory address causes problems.
I understand that even having a pointer to an invalid memory address is a huge code smell. But does merely holding such a pointer, without ever dereferencing it, cause undefined behavior?
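
To make the question concrete, here is a minimal sketch in D. The helper name `countByPointer` is made up for illustration, and the code assumes ordinary @system code where `.ptr` and pointer arithmetic are allowed. It forms a one-past-the-end pointer and only compares against it, never reading or writing through it. The commented-out line is the case I am really asking about.

```d
import std.stdio : writeln;

/// Hypothetical helper, for illustration only: counts the elements of a
/// slice by walking a pointer from the first element up to the
/// one-past-the-end position.
size_t countByPointer(const(int)[] a)
{
    const(int)* p = a.ptr;
    // One past the last element. This pointer is only compared against,
    // it is never dereferenced.
    const(int)* end = a.ptr + a.length;

    size_t n = 0;
    while (p != end)
    {
        ++p; // advance by one element; on the last step p equals end
        ++n;
    }
    return n;
}

void main()
{
    int[] a = [10, 20, 30];
    writeln(countByPointer(a)); // prints 3

    // The open question: is forming this value alone, with no read or
    // write through it, already a problem?
    // int* far = a.ptr + 1000;
}
```

Nothing in this sketch touches memory outside the slice, so whatever the answer is, it has to come from the rules for pointer values themselves and not from an invalid memory access.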