February 19, 2012
On 19/02/2012 17:04, Manu wrote:
<snip>
>     Indeed, how does code rely on the concept of native int at all?
>
> You can't imagine any situation where you might want to know how big an int is?
<snip>

Hang on ... are we talking here about some "native int", or the int type?

If you want to know the size of a "native int", I think you would need to know more about how the particular machine works internally in order to take advantage of it.

If you want to know the size of an int, you would just use int.sizeof.  Problem solved.
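To make Stewart's distinction concrete: in D the built-in integer types have fixed widths, so `int.sizeof` is always 4 and tells you nothing about the machine, whereas `size_t` follows the target. A minimal sketch that relies only on the language's fixed type sizes:

```d
// D fixes the width of its integer types on every target,
// so these checks hold wherever the code compiles.
static assert(byte.sizeof  == 1);
static assert(short.sizeof == 2);
static assert(int.sizeof   == 4);
static assert(long.sizeof  == 8);

// size_t is target-dependent: 4 bytes on 32-bit targets, 8 on 64-bit ones.
static assert(size_t.sizeof == 4 || size_t.sizeof == 8);

unittest
{
    import std.stdio : writefln;
    writefln("int: %s bytes, size_t: %s bytes", int.sizeof, size_t.sizeof);
}
```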

Stewart.
February 19, 2012
On 19 February 2012 19:26, Stewart Gordon <smjg_1998@yahoo.com> wrote:

> On 19/02/2012 17:04, Manu wrote:
> <snip>
>
>>    Indeed, how does code rely on the concept of native int at all?
>>
>> You can't imagine any situation where you might want to know how big an int is?
>>
> <snip>
>
> Hang on ... are we talking here about some "native int", or the int type?
>
> If you want to know the size of a "native int", I think you would need to know more about how the particular machine works internally in order to take advantage of it.


In many cases you wouldn't need to know anything other than that.

I don't really understand your resistance? I'm going to have the type when
I need it, the question is, will it be standardised, or will I (& everyone
else) invent the name and introduce a block of (probably not very portable)
version() bullshit at the top of their module?
Almost nobody here puts any thought towards portability, or any
architecture other than x86+linux. I don't trust individuals writing their
own version mess to define it correctly in their libraries... or to do it
at all for that matter, they'll just be writing less portable code.

Try this: Build phobos for x64, but define size_t as 32bits, and see what happens. You'll see what I'm talking about.


February 19, 2012
On 02/19/2012 03:59 PM, Manu wrote:
> Okay, so it came up a couple of times, but the question is, what are we
> going to do about it?
>
> size_t and ptrdiff_t are incomplete, and represent non-complementary
> signed/unsigned halves of the requirement.
> There are TWO types needed, register size, and pointer size. Currently,
> these are assumed to be the same, which is a false assumption.
>
> I propose size_t + ssize_t should both exist, and represent the native
> integer size. Also something like ptr_t, and ptrdiff_t should also
> exist, and represent the size of the pointer.
>
> Personally, I don't like the _t notation at all. It doesn't fit the rest
> of the D types, but it's established, so I don't expect it can change.
> But we do need the 2 missing types.
>
> There is also the problem that there is lots of code written using the
> incorrect types. Some time needs to be taken to correct phobos too I guess.

Currently, size_t is defined to be what you call ptr_t, ptrdiff_t is present, and what you call size_t/ssize_t does not exist. Under which circumstances is it important to have a distinct type that denotes the register size? What kind of code requires such a type? It is unportable.
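A small sketch of what Timon describes, using only the aliases druntime already declares in `object`: `size_t` is the type of `.sizeof`, and `ptrdiff_t` is the type of a pointer difference. On the targets D supported at the time, both are pointer-sized; the `static assert`s document that assumption rather than prove it for every conceivable target.

```d
// druntime's object module declares these aliases; no import is needed.
// size_t is the type of .sizeof (and of array .length), ptrdiff_t the
// type of a pointer difference.
static assert(is(size_t == typeof(int.sizeof)));
static assert(is(ptrdiff_t == typeof(cast(void*) 0 - cast(void*) 0)));

// Both are pointer-sized: this is the sense in which size_t already
// plays the role of the proposed ptr_t.
static assert(size_t.sizeof == (void*).sizeof);
static assert(ptrdiff_t.sizeof == (void*).sizeof);

unittest
{
    int[4] buf;
    int* first = &buf[0];
    int* last = &buf[3];
    ptrdiff_t gap = last - first; // counted in elements, not bytes
    assert(gap == 3);
}
```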
February 19, 2012
On 19 February 2012 20:07, Timon Gehr <timon.gehr@gmx.ch> wrote:

> On 02/19/2012 03:59 PM, Manu wrote:
>
>> Okay, so it came up a couple of times, but the question is, what are we going to do about it?
>>
>> size_t and ptrdiff_t are incomplete, and represent non-complementary
>> signed/unsigned halves of the requirement.
>> There are TWO types needed, register size, and pointer size. Currently,
>> these are assumed to be the same, which is a false assumption.
>>
>> I propose size_t + ssize_t should both exist, and represent the native integer size. Also something like ptr_t, and ptrdiff_t should also exist, and represent the size of the pointer.
>>
>> Personally, I don't like the _t notation at all. It doesn't fit the rest of the D types, but it's established, so I don't expect it can change. But we do need the 2 missing types.
>>
>> There is also the problem that there is lots of code written using the incorrect types. Some time needs to be taken to correct phobos too I guess.
>>
>
> Currently, size_t is defined to be what you call ptr_t, ptrdiff_t is present, and what you call size_t/ssize_t does not exist. Under which circumstances is it important to have a distinct type that denotes the register size? What kind of code requires such a type? It is unportable.
>

It is just as unportable as size_t itself. The reason you need it is to
improve portability, otherwise people need to create arbitrary version
mess, which will inevitably be incorrect.
Anything from calling convention code, structure layout/packing, copying
memory, basically optimising for 64bits at all... I can imagine static
branches on the width of that type to select different paths.
Even just basic efficiency, using 32bit ints on many 64bit machines requires
extra sign-extend opcodes after every single load... total waste of cpu
time.

Currently, if you're running a 64bit system with 32bit pointers, there is absolutely nothing that exists at compile time to tell you you're running a 64bit system, or to declare a variable of the machine's native type, which you're crazy if you say is not important information. What's the point of a 64bit machine, if you treat it exactly like a 32bit machine in every aspect?


February 19, 2012
On 2/19/2012 8:23 AM, Manu wrote:
> On 19 February 2012 18:03, Vladimir Panteleev <vladimir@thecybershadow.net> wrote:
>
>     On Sunday, 19 February 2012 at 15:26:27 UTC, Manu wrote:
>
>         There is code that assumes size_t is the width of the pointer
>
>
>     When is this not true? I can only think of 16-bit far pointers.
>
>
> Ignoring small embedded systems (for which it is almost always true), some that
> immediately come to mind:
>
>   NaCl (Google Native Client) - x64 arch, 32bit pointers ... <- of immediate
> concern to me
>   PPC based consoles; PS3, X360, Wii, WiiU (not released yet) - 64bit, all 32bit
> pointers
>   Android, and probably iOS; 64bit ARM chips - will certainly not fork the OS to
> use 64bit pointers

On these it appears that size_t will be (and should be) the pointer width.

> word/pointer width mismatch does happen, even if you try to argue it's uncommon,
> the language MUST be able to express these architectures. It's not an optional
> fix. Just need to name them properly, and correct existing code.

What I think you're arguing for is a "most efficient" int size, which probably would be core.stdc.config.c_int, c_uint.
February 19, 2012
On 02/19/2012 07:27 PM, Manu wrote:
> On 19 February 2012 20:07, Timon Gehr <timon.gehr@gmx.ch> wrote:
>
>     On 02/19/2012 03:59 PM, Manu wrote:
>
>         Okay, so it came up a couple of times, but the question is,
>         what are we going to do about it?
>
>         size_t and ptrdiff_t are incomplete, and represent
>         non-complementary signed/unsigned halves of the requirement.
>         There are TWO types needed, register size, and pointer size.
>         Currently, these are assumed to be the same, which is a false
>         assumption.
>
>         I propose size_t + ssize_t should both exist, and represent the
>         native integer size. Also something like ptr_t, and ptrdiff_t
>         should also exist, and represent the size of the pointer.
>
>         Personally, I don't like the _t notation at all. It doesn't fit
>         the rest of the D types, but it's established, so I don't expect
>         it can change. But we do need the 2 missing types.
>
>         There is also the problem that there is lots of code written
>         using the incorrect types. Some time needs to be taken to correct
>         phobos too I guess.
>
>
>     Currently, size_t is defined to be what you call ptr_t, ptrdiff_t is
>     present, and what you call size_t/ssize_t does not exist. Under
>     which circumstances is it important to have a distinct type that
>     denotes the register size? What kind of code requires such a type?
>     It is unportable.
>
>

Note that I agree that getting the terminology straight would be an overall improvement.

> It is just as unportable as size_t itself.

Currently, size_t is typeof(array.length). This is portable, and is basically the only place size_t commonly occurs in D code.
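Timon's point that `size_t` mostly shows up as an array length can be checked directly. A minimal sketch that assumes nothing beyond the language itself:

```d
// Array lengths and foreach indices are size_t, whatever the target.
unittest
{
    int[] a = [10, 20, 30];
    static assert(is(typeof(a.length) == size_t));

    foreach (i, x; a)
        static assert(is(typeof(i) == size_t));

    // Storing a length in a fixed-width type needs an explicit cast where
    // size_t is 64 bits; on 32-bit targets the cast changes nothing.
    uint n = cast(uint) a.length;
    assert(n == 3);
}
```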

> The reason you need it is to improve portability, otherwise people need to create arbitrary
> version mess, which will inevitably be incorrect.
> Anything from calling convention code, structure layout/packing, copying
> memory, basically optimising for 64bits at all... I can imagine static
> branches on the width of that type to select different paths.

That is not a very valid use case. In every static branch you'll know exactly what the width is.

> Even just basic efficiency, using 32bit ints on many 64bit machines
> requires extra sign-extend opcodes after every single load... total waste
> of cpu time.
>

Using 64bit ints everywhere to represent 32bit ints won't make your program go faster. Cache lines fill up faster when the data contains large amounts of unnecessary padding. Furthermore, the compiler should be able to eliminate unneeded sign-extend operations. Anyway, extra sign-extend opcodes are not worth caring about if you get up to twice the number of conflict cache misses.
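Timon's cache argument in numbers, as a sketch: the same four counters stored as `int` or as `long`. The 64-byte cache line is an assumption (common on x86, not universal).

```d
// The same four counters, stored two ways.
struct Narrow { int a, b, c, d; }   // 4 fields x 4 bytes
struct Wide   { long a, b, c, d; }  // 4 fields x 8 bytes

static assert(Narrow.sizeof == 16);
static assert(Wide.sizeof == 32);

// Assumed 64-byte cache line: one line holds four Narrow records but only
// two Wide ones, so scanning an array of Wide touches twice as many lines.
enum cacheLine = 64;
static assert(cacheLine / Narrow.sizeof == 4);
static assert(cacheLine / Wide.sizeof == 2);
```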

> Currently, if you're running a 64bit system with 32bit pointers, there
> is absolutely nothing that exists at compile time to tell you you're
> running a 64bit system,

Isn't there some version identifier for this? If there is not, such an identifier could be introduced trivially and this must be done.

> or to declare a variable of the machine's native
> type, which you're crazy if you say is not important information.

What do you do with the machine's native type other than checking its size in a static if declaration? If you don't, then the code is unportable, and using the proper fixed size types would make it portable. If you do, then you could have checked a built-in version instead. What you effectively want for optimization is the most efficient type that is at least a certain number of bits wide. And even then, it is a moot point, because storing such variables in memory will add unnecessary padding to your data structures.
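The "most efficient type that is at least N bits wide" already has a C99 name, `int_fast32_t` and friends, which druntime exposes in `core.stdc.stdint`. A sketch; `sumSmall` is a hypothetical helper, and the exact width of `int_fast32_t` is platform-defined, so only its lower bound is checked.

```d
import core.stdc.stdint : int_fast32_t;

// int_fast32_t names the fastest type with at least 32 bits. Its exact
// width is left to the platform, so only the lower bound is portable.
static assert(int_fast32_t.sizeof >= 4);

/// Hypothetical helper: sums small values in an accumulator that is at
/// least 32 bits wide but may be wider if the platform prefers it.
int_fast32_t sumSmall(const(short)[] values)
{
    int_fast32_t total = 0;
    foreach (v; values)
        total += v;
    return total;
}
```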

> What's the point of a 64bit machine, if you treat it exactly like a 32bit
> machine in every aspect?

There is none.
February 19, 2012
On 02/19/12 17:23, Manu wrote:
> On 19 February 2012 18:03, Vladimir Panteleev <vladimir@thecybershadow.net> wrote:
> 
>     On Sunday, 19 February 2012 at 15:26:27 UTC, Manu wrote:
> 
>         There is code that assumes size_t is the width of the pointer
> 
> 
>     When is this not true? I can only think of 16-bit far pointers.
> 
> 
> Ignoring small embedded systems (for which it is almost always true), some that immediately come to mind:
> 
>  NaCl (Google Native Client) - x64 arch, 32bit pointers ... <- of immediate concern to me
>  PPC based consoles; PS3, X360, Wii, WiiU (not released yet) - 64bit, all 32bit pointers
>  Android, and probably iOS; 64bit ARM chips - will certainly not fork the OS to use 64bit pointers

Not to mention Linux x32 (https://sites.google.com/site/x32abi/)

But 'size_t' is the size of an object -- so sizeof(size_t)==sizeof(void*) is
a pretty safe assumption. It would be a bit hard to work with objects that are
larger than the address space covered by the pointer... Are any of the above
platforms using segmentation tricks, and is sizeof(char*-char*)>sizeof(char*)?
I think you mean "native_int" - something that D is missing.

On 02/19/12 19:07, Timon Gehr wrote:
> Under which circumstances is it important to have a distinct type that denotes the register size? What kind of code requires such a type? It is unportable.

e.g. any time you don't want to artificially restrict the size to less than the
native one, don't want to use a type wider than the hw handles efficiently, or
need C compatibility.
Yes, you can use 'static if' and 'version' tricks, but that's inconvenient and
often obfuscates the code, so you end up reinventing c_int/c_long...
And that's not ideal either; having the right types [1] always predefined would
be much better.

On 02/19/12 18:26, Stewart Gordon wrote:
> If you want to know the size of an int, you would just use int.sizeof.  Problem solved.

Exactly. Except doing this for D's int would be kind of pointless, wouldn't it?... With a native_int type you *can* write generic code and switch on native_int.sizeof.
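A sketch of the kind of generic code Artur means, switching on the width of a native type at compile time. Both `nativeUint` and `splat` are hypothetical names; `nativeUint` is approximated by `size_t` here, which is exactly what would be wrong on x32 or NaCl.

```d
// Hypothetical stand-in for the native unsigned type under discussion.
// Aliasing it to size_t is only an approximation: on x32 or NaCl the
// registers are wider than size_t.
alias nativeUint = size_t;

/// Replicates one byte into every byte of a native word, choosing the
/// constant at compile time from the word's width.
nativeUint splat(ubyte b)
{
    static if (nativeUint.sizeof == 8)
        return 0x0101_0101_0101_0101UL * b;
    else static if (nativeUint.sizeof == 4)
        return 0x0101_0101U * b;
    else
        static assert(0, "unsupported word size");
}

unittest
{
    // The low 32 bits agree on every supported width.
    assert((splat(0xAB) & 0xFFFF_FFFF) == 0xABAB_ABAB);
    assert(splat(0) == 0);
}
```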

artur

[1] i.e. a signed/unsigned int that is as large as the CPU registers allow. [2]
[2] and note that using anything smaller can result in performance degradation,
    if the values need to be converted to a full-width format.
February 19, 2012
On 19 February 2012 21:21, Timon Gehr <timon.gehr@gmx.ch> wrote:

>
>> It is just as unportable as size_t itself.
>>
>
> Currently, size_t is typeof(array.length). This is portable, and is basically the only place size_t commonly occurs in D code.


What about pointer arithmetic? Interaction with C/C++ code? Writing OS
level code? Hitting the hardware?
And how do you define 'portable' in this context? What makes size_t more
portable than a native int? A data structure containing a size_t is not
'portable' in the direct sense...


>> The reason you need it is to improve portability, otherwise people need to
>> create arbitrary version mess, which will inevitably be incorrect.
>> Anything from calling convention code, structure layout/packing, copying
>> memory, basically optimising for 64bits at all... I can imagine static
>> branches on the width of that type to select different paths.
>>
>
> That is not a very valid use case. In every static branch you'll know exactly what the width is.


That's the point.
Branches can each implement an efficient path for the different cases.


>> Even just basic efficiency, using 32bit ints on many 64bit machines
>> requires extra sign-extend opcodes after every single load... total waste of cpu time.
>>
>>
> Using 64bit ints everywhere to represent 32bit ints won't make your program go faster. Cache lines fill up faster when the data contains large amounts of unnecessary padding. Furthermore, the compiler should be able to eliminate unneeded sign-extend operations. Anyway, extra sign-extend opcodes are not worth caring about if you get up to twice the number of conflict cache misses.


I'm talking about the stack, passing args etc. Data structures should obviously be as tight as possible.


>> Currently, if you're running a 64bit system with 32bit pointers, there
>> is absolutely nothing that exists at compile time to tell you you're running a 64bit system,
>>
>
> Isn't there some version identifier for this? If there is not, such an identifier could be introduced trivially and this must be done.


Why introduce a version identifier, when a type would be so much more useful, and also neater? (usable directly rather than ugly version blocks)


>> or to declare a variable of the machine's native
>> type, which you're crazy if you say is not important information.
>>
>
> What do you do with the machine's native type other than checking its size in a static if declaration? If you don't, then the code is unportable, and using the proper fixed size types would make it portable. If you do, then you could have checked a built-in version instead. What you effectively want for optimization is the most efficient type that is at least a certain number of bits wide. And even then, it is a moot point, because storing such variables in memory will add unnecessary padding to your data structures.


If that's all you do with it, then it's already proven its worth. There's a
major added bonus that you could USE it...
I don't like this argument that it's not portable, it's exactly as portable
as size_t is already, and there's no call to remove that.


>> What's the point of a 64bit machine, if you treat it exactly like a 32bit
>> machine in every aspect?
>>
>
> There is none.
>

Then why do so many hardware vendors feel the need to create 64bit chips
which are used in 32bit memspace platforms?
It's useful to have double width registers. Some algorithms are easier with
wider registers, you can move more data faster, it extends your range for
intermediate values during calculations, etc. These are still real
advantages, even on a 32bit memspace platform.
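One of the advantages Manu lists, extended range for intermediate values, as a minimal sketch: the hypothetical helpers `mulHigh` and `average` widen to 64 bits before the operation that could overflow. On a machine with 64-bit registers each widened operation is typically a single instruction, even if pointers stay 32 bits.

```d
/// High 32 bits of the full 64-bit product of two 32-bit values.
uint mulHigh(uint a, uint b)
{
    // Widen one operand first so the multiply is done in 64 bits and
    // cannot overflow.
    immutable ulong product = cast(ulong) a * b;
    return cast(uint)(product >> 32);
}

/// Average of two uints, computed without overflow via a 64-bit sum.
uint average(uint a, uint b)
{
    return cast(uint)((cast(ulong) a + b) / 2);
}

unittest
{
    assert(mulHigh(0xFFFF_FFFF, 0xFFFF_FFFF) == 0xFFFF_FFFE);
    assert(average(uint.max, uint.max) == uint.max);
}
```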


February 19, 2012
On 19/02/2012 17:51, Manu wrote:
<snip>
>     Hang on ... are we talking here about some "native int", or the int type?
<snip>
> I don't really understand your resistance? I'm going to have the type when I need it, the
> question is, will it be standardised, or will I (& everyone else) invent the name and
> introduce a block of (probably not very portable) version() bullshit at the top of their
> module?

I'm not saying we shouldn't have native integer types in D.  Maybe what's needed is to be clearer on what's meant exactly by a native integer type.  And then add the types to Phobos/druntime.

I particularly don't understand your suggestion of the names "size_t" and "ssize_t" for these types.  They don't seem to me to denote the size of anything.  I guess "nativeInt" and "nativeUint", defined in whatever module would be suitable, would be one possibility.  Maybe someone has a better idea.
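For illustration, a hand-rolled version of Stewart's `nativeInt`/`nativeUint`, written the way Manu fears everyone would end up writing it. The aliases are hypothetical, and `D_X32` is assumed to be a predefined version identifier for the x32 ABI; targets without a matching identifier silently fall back to the pointer-sized default.

```d
// Hypothetical module-level definitions of the names suggested above.
// D_X32 is assumed to be predefined for the x32 ABI (64-bit registers,
// 32-bit pointers). Platforms with no such identifier, such as NaCl or
// the PPC consoles, fall through to the pointer-sized default, which is
// the fragility of ad-hoc version blocks.
version (D_X32)
{
    alias nativeInt  = long;
    alias nativeUint = ulong;
}
else
{
    alias nativeInt  = ptrdiff_t;
    alias nativeUint = size_t;
}

// A native integer should at least be able to hold a pointer difference.
static assert(nativeInt.sizeof >= ptrdiff_t.sizeof);
static assert(nativeInt.sizeof == nativeUint.sizeof);
```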

But knowing what code in Phobos/druntime should be using native integer types might help put the problem in better perspective.

Stewart.
February 19, 2012
Manu <turkeyman@gmail.com> wrote:
> I propose size_t + ssize_t should both exist, and represent the native integer size.

sizediff_t (currently just aliased to ptrdiff_t but the link could be
broken).

> Also something like ptr_t, and ptrdiff_t should also exist, and represent the size of the pointer.
> 

core.stdc.stdint.uintptr_t
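A short sketch using the two names mentioned above: `sizediff_t`, which druntime declares in `object` as an alias of `ptrdiff_t`, and `uintptr_t` from `core.stdc.stdint`. The `alignUp` helper is hypothetical; it shows the usual reason to want a pointer-sized unsigned integer, namely doing bit arithmetic on an address.

```d
import core.stdc.stdint : uintptr_t;

// sizediff_t is declared next to size_t in druntime's object module.
static assert(is(sizediff_t == ptrdiff_t));
static assert(uintptr_t.sizeof == (void*).sizeof);

/// Hypothetical helper: rounds `p` up to the next multiple of `alignment`,
/// which must be a power of two. The arithmetic is done on the address
/// viewed as a uintptr_t.
void* alignUp(void* p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0,
           "alignment must be a power of two");
    const uintptr_t mask = alignment - 1;
    const uintptr_t addr = cast(uintptr_t) p;
    return cast(void*)((addr + mask) & ~mask);
}

unittest
{
    assert(alignUp(cast(void*) 13, 8) == cast(void*) 16);
    assert(alignUp(cast(void*) 16, 8) == cast(void*) 16);
}
```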