How does ptr arithmetic work??? It doesn't make any sense....
Dec 04, 2022
rempas
Dec 04, 2022
ag0aep6g
Dec 05, 2022
rempas
Dec 05, 2022
bauss
Dec 05, 2022
rempas
Dec 05, 2022
ag0aep6g
Dec 05, 2022
Salih Dincer
Dec 06, 2022
rempas
Dec 06, 2022
rempas
Dec 04, 2022
Nick Treleaven
Dec 05, 2022
rempas
Dec 04, 2022
H. S. Teoh
Dec 05, 2022
rempas
December 04, 2022

First a little bit of theory. A pointer just points to a memory address, which is a number. So when I add "10" to this pointer, it will point ten bytes after the place it was pointing to, right? Another thing with pointers is that they don't have "types". A pointer always just points to a location, so pointer types exist only for the compiler, so that we can catch bugs when we point somewhere and try to manipulate the bytes with a size we probably didn't intend. For example: if you have allocated 4 bytes and then point to them with a type of "short", you can only manipulate 2 of these 4 bytes, which is probably not what you wanted. So we do have types, and the compiler requires explicit pointer type casting (in contrast to C) so it can protect you from these bugs.

This type casting brings some problems, however. I played around with it and figured out that, to get the location you expect when returning from a function, you need to do the math first and then cast the whole expression (so the result) and return that. If you only cast the first value (the one of a different type) and then do the addition (or whatever expression you want), it will return a wrong address. But WAIT!!! This doesn't work in a different example. I'm breaking my head trying to understand why, so I thought I'd ask if anyone can help and explain it to me. Btw, all the testing was done with ldc in the BetterC "mode". Code:

import core.stdc.stdio;
import core.stdc.stdlib;

struct MemoryBlock {
  char* ptr;
  ulong length;
}

void* ptr = cast(void*)0x7a7;

void* right() {
  return cast(MemoryBlock*)(ptr + MemoryBlock.sizeof); // Cast the whole expression between parentheses. Got the right value!
}

void* wrong() {
  return cast(MemoryBlock*)ptr + MemoryBlock.sizeof; // First cast the `ptr` variable and then add the number. Got a wrong value...
}

char* return_address_wrong() {
  MemoryBlock* local_ptr = cast(MemoryBlock*)ptr;
  return cast(char*)(local_ptr + MemoryBlock.sizeof); // Casted the whole expression. BUT GOT THE WRONG VALUE!!!! Why???
}

char* return_address_right() {
  MemoryBlock* local_ptr = cast(MemoryBlock*)ptr;
  return cast(char*)local_ptr + MemoryBlock.sizeof; // Now I first casted the `local_ptr` variable and then added the number but this time this gave me the right value....
}

extern (C) void main() {
  printf("EXPECTED LOCATION: %p\n", ptr + MemoryBlock.sizeof);
  printf("RIGHT LOCATION: %p\n", right());
  printf("WRONG LOCATION: %p\n", wrong());

  printf("RETURNED ADDRESS (wrong): %p\n", return_address_wrong());
  printf("RETURNED ADDRESS (right): %p\n", return_address_right());
}
December 04, 2022

On Sunday, 4 December 2022 at 16:33:35 UTC, rempas wrote:

> First a little bit of theory. A pointer just points to a memory address which is a number. So when I add "10" to this pointer, it will point ten bytes after the place it was pointing to, right?

Not quite. Adding 10 to a T* means adding 10 * T.sizeof.
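
A minimal sketch of that rule, in case it helps to see it measured. The helper name `bytesAdvanced` is made up for this illustration; it only compares the raw addresses before and after the addition, and it assumes the pointer stays inside a real buffer.

```d
import core.stdc.stdio;

// Hypothetical helper: how many bytes lie between `p` and `p + n`
// for a pointer of type T*.
size_t bytesAdvanced(T)(T* p, size_t n) {
  return cast(size_t)(p + n) - cast(size_t)p;
}

void main() {
  int[16] ints;   // int.sizeof is 4 in D
  char[16] chars; // char.sizeof is 1

  // Adding 10 to an int* moves 10 * int.sizeof bytes.
  printf("int*  + 10 -> %zu bytes\n", bytesAdvanced(ints.ptr, 10));
  // Adding 10 to a char* moves 10 * char.sizeof bytes.
  printf("char* + 10 -> %zu bytes\n", bytesAdvanced(chars.ptr, 10));
}
```

So the "ten bytes" intuition only holds when the pointee type is one byte wide.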

December 04, 2022

On Sunday, 4 December 2022 at 16:33:35 UTC, rempas wrote:

> struct MemoryBlock {
>   char* ptr;
>   ulong length;
> }

(MemoryBlock.sizeof is 16 on my 64-bit system).

> void* ptr = cast(void*)0x7a7;
>
> void* right() {
>   return cast(MemoryBlock*)(ptr + MemoryBlock.sizeof); // Cast the whole expression between parentheses. Got the right value!
> }

The above adds 16 bytes to ptr.

> void* wrong() {
>   return cast(MemoryBlock*)ptr + MemoryBlock.sizeof; // First cast the ptr variable and then add the number. Got a wrong value...
> }

The above adds 16 * MemoryBlock.sizeof bytes (16 * 16) to ptr, because ptr is cast first. Should be + 1 to be equivalent.

https://dlang.org/spec/expression.html#pointer_arithmetic

"the resulting value is the pointer plus (or minus) the second operand multiplied by the size of the type pointed to by the first operand."

> char* return_address_wrong() {
>   MemoryBlock* local_ptr = cast(MemoryBlock*)ptr;
>   return cast(char*)(local_ptr + MemoryBlock.sizeof); // Casted the whole expression. BUT GOT THE WRONG VALUE!!!! Why???
> }

Because you are adding to a pointer that points to a 16-byte block, rather than a void* which points to a single byte.

> char* return_address_right() {
>   MemoryBlock* local_ptr = cast(MemoryBlock*)ptr;
>   return cast(char*)local_ptr + MemoryBlock.sizeof; // Now I first casted the local_ptr variable and then added the number but this time this gave me the right value....
> }

The cast pointer is a char*, which points to a single byte, so adding MemoryBlock.sizeof moves it by exactly that many bytes.
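
Putting the four cases together, here is a sketch of how the offsets could be written so that each one skips exactly one MemoryBlock. The function names are invented for this example, and it uses a real array instead of the dummy address 0x7a7 so that every pointer stays inside valid memory.

```d
import core.stdc.stdio;

struct MemoryBlock {
  char* ptr;
  ulong length;
}

// Count in bytes: void* arithmetic uses a unit of one byte.
void* skipHeaderBytes(void* base) {
  return base + MemoryBlock.sizeof;
}

// Count in elements: MemoryBlock* + 1 advances MemoryBlock.sizeof bytes.
void* skipHeaderElements(void* base) {
  return cast(MemoryBlock*)base + 1;
}

// Byte offset from a typed pointer: cast to a one-byte pointee first, then add.
char* payloadAfter(MemoryBlock* block) {
  return cast(char*)block + MemoryBlock.sizeof;
}

void main() {
  MemoryBlock[4] blocks;
  void* base = blocks.ptr;

  // All three lines print the address of blocks[1].
  printf("bytes:    %p\n", skipHeaderBytes(base));
  printf("elements: %p\n", skipHeaderElements(base));
  printf("payload:  %p\n", payloadAfter(&blocks[0]));
}
```

The unit of the addition is decided by the type of the pointer operand at the moment of the `+`, not by the cast applied to the result.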

December 04, 2022
On Sun, Dec 04, 2022 at 04:33:35PM +0000, rempas via Digitalmars-d-learn wrote:
> First a little bit of theory. A pointer just points to a memory address which is a number. So when I add "10" to this pointer, it will point ten bytes after the place it was pointing to, right?

This is true only if you're talking about pointers in the sense of pointers in assembly language.  Languages like C and D add another layer of abstraction over this.


> Another thing with pointers is that it doesn't have "types".

This is where you went wrong.  In assembly language, yes, a pointer value is just a number, and there's no type associated with it. However, experience has shown that manipulating pointers at this raw, untyped level is extremely error-prone.  Therefore, in languages like C or D, a pointer *does* have a type.  It's a way of preventing the programmer from making silly mistakes, by associating a type (at compile-time only, of course) to the pointer value.  It's a way of keeping track that address 1234 points to a short, and not to a float, for example.  At the assembly level, of course, this type information is erased, and the pointers are just integer addresses.  However, at compile-time, this type exists to prevent, or at least warn, the programmer from treating the value at the pointed-to address as the wrong type.  This is not only because of data sizes, but the interpretation of data.  A 32-bit value interpreted as an int is completely different from a 32-bit value interpreted as a float, for example.  You wouldn't want to perform integer arithmetic on something that's supposed to be a float; the result would be garbage.

In addition, although in theory memory is byte-addressable, many architectures impose alignment restrictions on values larger than a byte. For example, the CPU may require that 32-bit values (ints or floats) must be aligned to an address that's a multiple of 4 bytes.  If you add 1 to an int* address and try to access the result, it may cause performance issues (the CPU may have to load 2 32-bit values and reassemble parts of them to form the misaligned 32-bit value) or a fault (the CPU may refuse to load a non-aligned address), which could be a silent failure or may cause your program to be forcefully terminated. Therefore, typed pointers like short* and int* may not be entirely an artifact that only exists in the compiler; it may not actually be legal to add a non-aligned value to an int*, depending on the hardware you're running on.

Because of this, C and D implement pointer arithmetic in terms of the underlying value type. I.e., adding 1 to a char* will add 1 to the underlying address, but adding 1 to an int* will add int.sizeof to the underlying address instead of 1. I.e.:

	int[2] x;
	int* p = &x[0];	// let's say this is address 1234
	p++;		// p is now 1238, *not* 1235 (int.sizeof == 4)

As a consequence, when you cast a raw pointer value to a typed pointer, you are responsible for respecting any underlying alignment requirements that the machine may have. Casting a non-aligned address like 1235 to a possibly-aligned pointer like int* may cause problems if you're not careful.  Also, the value type of the pointer *does* matter; you will get different results depending on the size of the type and any alignment requirements it may have.  Pointer arithmetic involving T* operates in units of T.sizeof, *not* in terms of the raw pointer value.
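
As a rough sketch of the alignment point, one way to test an address before casting it is to compare it against the type's `.alignof` property. The helper `isAlignedFor` is hypothetical (it is not a library function), and whether a misaligned access actually faults depends on the hardware.

```d
import core.stdc.stdio;

// Hypothetical helper: true when `address` is a multiple of T.alignof.
bool isAlignedFor(T)(const(void)* address) {
  return (cast(size_t)address % T.alignof) == 0;
}

void main() {
  int[2] x;
  void* first = &x[0];
  void* oneByteIn = first + 1; // void* arithmetic counts single bytes

  // The start of an int array is suitably aligned for int.
  printf("first aligned for int:     %d\n", cast(int)isAlignedFor!int(first));
  // One byte further in is not, so casting it to int* would be risky.
  printf("oneByteIn aligned for int: %d\n", cast(int)isAlignedFor!int(oneByteIn));
}
```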


T

-- 
Change is inevitable, except from a vending machine.
December 05, 2022

On Sunday, 4 December 2022 at 16:40:17 UTC, ag0aep6g wrote:

> Not quite. Adding 10 to a T* means adding 10 * T.sizeof.

Oh! I thought it was plain addition. Is there a specific reason for that, if you are aware of one?

December 05, 2022

On Sunday, 4 December 2022 at 17:27:39 UTC, Nick Treleaven wrote:

> On Sunday, 4 December 2022 at 16:33:35 UTC, rempas wrote:
>
> (MemoryBlock.sizeof is 16 on my 64-bit system).
>
> The above adds 16 bytes to ptr.
>
> The above adds 16 * MemoryBlock.sizeof bytes (16 * 16) to ptr, because ptr is cast first. Should be + 1 to be equivalent.
>
> https://dlang.org/spec/expression.html#pointer_arithmetic
>
> "the resulting value is the pointer plus (or minus) the second operand multiplied by the size of the type pointed to by the first operand."

Thanks! This explains it. I also tried it, and I can only use "+" or "-" with a pointer, which fits that explanation.

>> char* return_address_wrong() {
>>   MemoryBlock* local_ptr = cast(MemoryBlock*)ptr;
>>   return cast(char*)(local_ptr + MemoryBlock.sizeof); // Casted the whole expression. BUT GOT THE WRONG VALUE!!!! Why???
>> }
>
> Because you are adding to a pointer that points to a 16-byte block, rather than a void* which points to a single byte.
>
>> char* return_address_right() {
>>   MemoryBlock* local_ptr = cast(MemoryBlock*)ptr;
>>   return cast(char*)local_ptr + MemoryBlock.sizeof; // Now I first casted the local_ptr variable and then added the number but this time this gave me the right value....
>> }
>
> The casted pointer points to a single byte.

I think I get it! The first part about the arithmetic explains it all well. I was also able to fix my program. The way I see it, you return from a function by first casting the first operand, and when you want to get a variable (or pass one to a function), you cast the whole expression. At least that's how it worked with my program.

December 05, 2022

On Monday, 5 December 2022 at 06:12:44 UTC, rempas wrote:

> On Sunday, 4 December 2022 at 16:40:17 UTC, ag0aep6g wrote:
>
>> Not quite. Adding 10 to a T* means adding 10 * T.sizeof.
>
> Oh! I thought it was addition. Is there a specific reason for that, if you are aware of one?

Because it's much easier to work with.

Ex. if you have an array of 4 signed 32 bit integers that you're pointing to then you can simply just increment the pointer by 1.

If it was raw bytes then you'd have to increment the pointer by 4 to move to the next element.

This is counter-intuitive if you're moving to the next element in a loop ex.

This is how you'd do it idiomatically:

foreach (i; 0 .. list.length)
{
    (*cast(int*)(ptr + i)) = i;
}

Compared to:


foreach (i; 0 .. list.length)
{
    (*cast(int*)(ptr + (i * 4))) = i;
}
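
The two loops above do not say what type `ptr` has, so here is a runnable sketch under explicit assumptions: the first function takes an `int*` (element units), the second takes a `void*` (byte units). The function names and the arrays in `main` are made up for this example.

```d
import core.stdc.stdio;

// `ptr + i` already steps in units of int.sizeof, so no scaling is needed.
void fillByElement(int* ptr, size_t count) {
  foreach (i; 0 .. count) {
    *(ptr + i) = cast(int)i;
  }
}

// With a byte-addressed pointer the offset has to be scaled by hand.
void fillByByteOffset(void* raw, size_t count) {
  foreach (i; 0 .. count) {
    *cast(int*)(raw + i * int.sizeof) = cast(int)i;
  }
}

void main() {
  int[4] a, b;
  fillByElement(a.ptr, a.length);
  fillByByteOffset(b.ptr, b.length);

  // Both arrays end up holding 0, 1, 2, 3.
  foreach (i; 0 .. a.length) {
    printf("%d %d\n", a[i], b[i]);
  }
}
```

Writing `int.sizeof` instead of a literal 4 keeps the byte-offset version tied to the element type.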
December 05, 2022
On Sunday, 4 December 2022 at 19:00:15 UTC, H. S. Teoh wrote:
> This is true only if you're talking about pointers in the sense of pointers in assembly language.  Languages like C and D add another layer of abstraction over this.
>
>
>> Another thing with pointers is that it doesn't have "types".
>
> This is where you went wrong.  In assembly language, yes, a pointer value is just a number, and there's no type associated with it. However, experience has shown that manipulating pointers at this raw, untyped level is extremely error-prone.  Therefore, in languages like C or D, a pointer *does* have a type.  It's a way of preventing the programmer from making silly mistakes, by associating a type (at compile-time only, of course) to the pointer value.  It's a way of keeping track that address 1234 points to a short, and not to a float, for example.  At the assembly level, of course, this type information is erased, and the pointers are just integer addresses.  However, at compile-time, this type exists to prevent, or at least warn, the programmer from treating the value at the pointed-to address as the wrong type.  This is not only because of data sizes, but the interpretation of data.  A 32-bit value interpreted as an int is completely different from a 32-bit value interpreted as a float, for example.  You wouldn't want to perform integer arithmetic on something that's supposed to be a float; the result would be garbage.
>
> In addition, although in theory memory is byte-addressable, many architectures impose alignment restrictions on values larger than a byte. For example, the CPU may require that 32-bit values (ints or floats) must be aligned to an address that's a multiple of 4 bytes.  If you add 1 to an int* address and try to access the result, it may cause performance issues (the CPU may have to load 2 32-bit values and reassemble parts of them to form the misaligned 32-bit value) or a fault (the CPU may refuse to load a non-aligned address), which could be a silent failure or may cause your program to be forcefully terminated. Therefore, typed pointers like short* and int* may not be entirely an artifact that only exists in the compiler; it may not actually be legal to add a non-aligned value to an int*, depending on the hardware you're running on.
>
> Because of this, C and D implement pointer arithmetic in terms of the underlying value type. I.e., adding 1 to a char* will add 1 to the underlying address, but adding 1 to an int* will add int.sizeof to the underlying address instead of 1. I.e.:
>
> 	int[2] x;
> 	int* p = &x[0];	// let's say this is address 1234
> 	p++;		// p is now 1238, *not* 1235 (int.sizeof == 4)
>
> As a consequence, when you cast a raw pointer value to a typed pointer, you are responsible for respecting any underlying alignment requirements that the machine may have. Casting a non-aligned address like 1235 to a possibly-aligned pointer like int* may cause problems if you're not careful.  Also, the value type of the pointer *does* matter; you will get different results depending on the size of the type and any alignment requirements it may have.  Pointer arithmetic involving T* operates in units of T.sizeof, *not* in terms of the raw pointer value.
>
>
> T

Wow! Seriously, thanks a lot for this detailed explanation! I want to write a compiler, and this type of explanation, which not only gives me the answer but also explains in detail why something happens, is a gift for me! I wish I could meet you in person and buy you a coffee. Maybe one day, you never know! Thanks a lot and have an amazing day!
December 05, 2022

On Monday, 5 December 2022 at 08:21:44 UTC, bauss wrote:

> Because it's much easier to work with.
>
> Ex. if you have an array of 4 signed 32 bit integers that you're pointing to then you can simply just increment the pointer by 1.
>
> If it was raw bytes then you'd have to increment the pointer by 4 to move to the next element.
>
> This is counter-intuitive if you're moving to the next element in a loop ex.
>
> This is how you'd do it idiomatically:
>
> foreach (i; 0 .. list.length)
> {
>     (*cast(int*)(ptr + i)) = i;
> }

Is this really (*cast(int*)(ptr + i)) = i;, or did you make a mistake and mean to write (*cast(int*)ptr + i) = i;? Because, like we said before, the first operand must be cast to the type for this to work right.

> Compared to:
>
> foreach (i; 0 .. list.length)
> {
>     (*cast(int*)(ptr + (i * 4))) = i;
> }

Got it! I guess they could also just allow us to use bracket notation to do the same thing. So something like:

foreach (i; 0 .. list.length) {
  (cast(int*)ptr[i]) = i;
}

This is what happens with arrays anyways. And arrays ARE pointers to a contiguous memory block anyways so they could do the same with regular pointers. The example also looks more readable.

December 05, 2022

On Monday, 5 December 2022 at 15:08:41 UTC, rempas wrote:

> Got it! I guess they could also just allow us to use bracket notation to do the same thing. So something like:
>
> foreach (i; 0 .. list.length) {
>   (cast(int*)ptr[i]) = i;
> }
>
> This is what happens with arrays anyways. And arrays ARE pointers to a contiguous memory block anyways so they could do the same with regular pointers. The example also looks more readable.

You can use bracket notation with pointers. You just need to move your closing parenthesis a bit.

Assuming that ptr is a void*, these are all equivalent:

(cast(int*) ptr)[i] = whatever;
*((cast(int*) ptr) + i) = whatever;
*(cast(int*) (ptr + i * int.sizeof)) = whatever;
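
For anyone who wants to check the equivalence, here is a small sketch that wraps each of the three forms in its own function and writes through a `void*` into the same array. The function names and the `storage` array are made up for this illustration.

```d
import core.stdc.stdio;

// Form 1: cast first, then index.
void setIndexed(void* ptr, size_t i, int value) {
  (cast(int*) ptr)[i] = value;
}

// Form 2: cast first, then add an element offset and dereference.
void setElementOffset(void* ptr, size_t i, int value) {
  *((cast(int*) ptr) + i) = value;
}

// Form 3: add a byte offset to the void*, then cast and dereference.
void setByteOffset(void* ptr, size_t i, int value) {
  *(cast(int*) (ptr + i * int.sizeof)) = value;
}

void main() {
  int[4] storage; // elements start out as 0
  setIndexed(storage.ptr, 1, 10);
  setElementOffset(storage.ptr, 2, 20);
  setByteOffset(storage.ptr, 3, 30);

  // Each call lands on the element with the index it was given.
  foreach (i; 0 .. storage.length) {
    printf("storage[%d] = %d\n", cast(int)i, storage[i]);
  }
}
```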