Why is size_t unsigned? - D Programming Language Discussion Forum

July 22, 2013

Why is size_t unsigned?

Posted by JS

Doing simple stuff like

for(int i = 0; i < s.length - 1; i++) fails catastrophically if s is empty. To make it right one has to reduce performance by writing extra checks.

There seems to be no real good reason why size_t is unsigned... Surely one doesn't require too many strings larger than 2^63 bits on an x64 os...

I am running into a lot of trouble because of the way D deals with implicit casting between signed and unsigned.

please don't tell me to use foreach... it's not a panacea.
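
For readers following along, here is a minimal sketch of the failure described in this post. The helper names `loopBound` and `firstIterationRuns` are made up for illustration and do not come from the thread; the sketch only assumes that `s.length` has type `size_t`.

```d
// Hypothetical helper (not from the thread): the loop bound as written in the post.
size_t loopBound(string s)
{
    // s.length is a size_t, so for an empty string 0 - 1 wraps around
    // to size_t.max instead of yielding -1.
    return s.length - 1;
}

// Hypothetical helper: would the loop body run for i == 0?
bool firstIterationRuns(string s)
{
    int i = 0;
    // i is implicitly converted to an unsigned type for this comparison.
    return i < s.length - 1;
}

void main()
{
    assert(loopBound("") == size_t.max); // the bound wrapped around
    assert(firstIterationRuns(""));      // so the loop starts on an empty string
    assert(!firstIterationRuns("a"));    // one element: bound is 0, loop is skipped
}
```

On an empty string the loop therefore does not stop at zero iterations; it runs until an index goes out of bounds.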

July 22, 2013

Re: Why is size_t unsigned?

Posted by Ali Çehreli
in reply to JS

On 07/21/2013 08:47 PM, JS wrote:

> Doing simple stuff like
>
> for(int i = 0; i < s.length - 1; i++) fails catastrophically if s is
> empty. To make it right one has to reduce performance by writing extra checks.

Checks are needed for program correctness: if not in the source code, then in compiler-generated code or in the microprocessor itself. The compiler and the microprocessor do not insert such checks, for performance reasons, because sometimes only the programmer knows that a check is unnecessary.

> There seems to be no real good reason why size_t is unsigned...

How about: every addressable memory location must be countable?

> Surely one doesn't require too many strings larger than 2^63 bits on an x64
> os...

Agreed.

> I am running into a lot of trouble because of the way D deals with implicit
> casting between signed and unsigned.

D is behaving the same way as C and C++ there.

> please don't tell me to use foreach... it's not a panacea.

I would still prefer foreach: it is more convenient, and safer because it needs less code.

Ali

July 22, 2013

Re: Why is size_t unsigned?

Posted by H. S. Teoh
in reply to JS

On Mon, Jul 22, 2013 at 05:47:34AM +0200, JS wrote:
> Doing simple stuff like
> 
> for(int i = 0; i < s.length - 1; i++) fails catastrophically if s is empty. To make it right one has to reduce performance by writing extra checks.

I'm not sure if it's your intention, but your code above has an off-by-1 error (unless you were planning on iterating over one less element than there are).

> There seems to be no real good reason why size_t is unsigned...
[...]

The reason is that it must span the range of CPU-addressable memory addresses. Note that due to the way virtual memory works, that may have nothing to do with the actual size of your data (e.g. on Linux, it's possible to allocate more memory than you actually have, as long as you don't actually use it all -- the kernel simply maps the addresses in your page tables into a single zeroed-out page, and marks it as copy-on-write, so you can actually have an array bigger than available memory as long as most of the elements are binary zeroes (though I don't know if druntime currently actually supports such a thing)).

T

-- 
MASM = Mana Ada Sistem, Man!

July 22, 2013

Re: Why is size_t unsigned?

Posted by JS
in reply to Ali Çehreli

On Monday, 22 July 2013 at 03:58:31 UTC, Ali Çehreli wrote:
> On 07/21/2013 08:47 PM, JS wrote:
>
> > Doing simple stuff like
> >
> > for(int i = 0; i < s.length - 1; i++) fails catastrophically if s is
> > empty. To make it right one has to reduce performance by writing extra checks.
>
> Checks are needed for program correctness: if not in the source code, then in compiler-generated code or in the microprocessor itself. The compiler and the microprocessor do not insert such checks, for performance reasons, because sometimes only the programmer knows that a check is unnecessary.
>
> > There seems to be no real good reason why size_t is unsigned...
>
> How about: every addressable memory location must be countable?

for strings themselves, I would prefer an int to be returned. The size of a string has nothing to do with its location in memory.

>
> > Surely one doesn't require too many strings larger than 2^63 bits on an x64
> > os...
>
> Agreed.
>
> > I am running into a lot of trouble because of the way D deals with implicit
> > casting between signed and unsigned.
>
> D is behaving the same way as C and C++ there.

No, surely not... Well, at least, I never had this trouble in C#.

> > please don't tell me to use foreach... it's not a panacea.
>
> I would still prefer foreach because it is more convenient and safer because of needing less code.
>
> Ali

foreach doesn't allow you to modify the index to skip over elements.

July 22, 2013

Re: Why is size_t unsigned?

Posted by JS
in reply to H. S. Teoh

On Monday, 22 July 2013 at 04:31:12 UTC, H. S. Teoh wrote:
> On Mon, Jul 22, 2013 at 05:47:34AM +0200, JS wrote:
>> Doing simple stuff like
>> 
>> for(int i = 0; i < s.length - 1; i++) fails catastrophically if s is
>> empty. To make it right one has to reduce performance by writing extra
>> checks.
>
> I'm not sure if it's your intention, but your code above has an off-by-1
> error (unless you were planning on iterating over one less element than
> there are).

yeah, I know...
>
>> There seems to be no real good reason why size_t is unsigned...
> [...]
>
> The reason is that it must span the range of CPU-addressable memory
> addresses. Note that due to the way virtual memory works, that may have
> nothing to do with the actual size of your data (e.g. on Linux, it's
> possible to allocate more memory than you actually have, as long as you
> don't actually use it all -- the kernel simply maps the addresses in
> your page tables into a single zeroed-out page, and marks it as
> copy-on-write, so you can actually have an array bigger than available
> memory as long as most of the elements are binary zeroes (though I don't
> know if druntime currently actually supports such a thing)).
>
>
> T

but a size has nothing to do with an address. Sure, on x86 we may need to allocate 3GB of data, and this would require size_t > 2^31 ==> it must be unsigned. But strings really don't need to have an unsigned length. If you really need a string of length > size_t/2, then have the string type implement a different length property.

string s;

s.length <== a signed size_t
s.size   <== an unsigned size_t

this way, for 99.99999999% of the cases where strings are actually < 1/2 size_t, one doesn't have to waste cycles doing extra comparing or typing extra code... or better, spending hours looking for some obscure bug because one compared an int to a uint and no warning was thrown.

Alternatively,

for(int i = 0; i < s.length - 1; i++) could at least check for underflow on the cmp and break the loop.

July 22, 2013

Re: Why is size_t unsigned?

Posted by Ali Çehreli
in reply to JS

On 07/21/2013 09:36 PM, JS wrote:

> On Monday, 22 July 2013 at 03:58:31 UTC, Ali Çehreli wrote:

>> > There seems to be no real good reason why size_t is unsigned...
>>
>> How about: every addressable memory location must be countable?
>
> for strings themselves, I would prefer an int to be returned. The size
> of a string has nothing to do with its location in memory.

So, you agree with the answer to the question in the subject line but you want to change the topic to strings. Fair enough...

>> D is behaving the same way as C and C++ there.
>
> No, surely not... Well, at least, I never had this trouble in C#.

C# is a completely different language from C and C++.

>> > please don't tell me to use foreach... it's not a panacea.
>>
>> I would still prefer foreach because it is more convenient and safer
>> because of needing less code.
>>
>> Ali
>
> foreach doesn't allow you to modify the index to skip over elements.

I did not claim otherwise. I said "more convenient", which is indisputable; and I said "safer", which your original code has become an example of.

Ali

July 22, 2013

Re: Why is size_t unsigned?

Posted by H. S. Teoh
in reply to JS

On Mon, Jul 22, 2013 at 06:43:47AM +0200, JS wrote:
> On Monday, 22 July 2013 at 04:31:12 UTC, H. S. Teoh wrote:
> >On Mon, Jul 22, 2013 at 05:47:34AM +0200, JS wrote:
[...]
> >>There seems to be no real good reason why size_t is unsigned...
> >[...]
> >
> >The reason is that it must span the range of CPU-addressable memory addresses. Note that due to the way virtual memory works, that may have nothing to do with the actual size of your data (e.g. on Linux, it's possible to allocate more memory than you actually have, as long as you don't actually use it all -- the kernel simply maps the addresses in your page tables into a single zeroed-out page, and marks it as copy-on-write, so you can actually have an array bigger than available memory as long as most of the elements are binary zeroes (though I don't know if druntime currently actually supports such a thing)).
> >
> >
> >T
> 
> but a size has nothing to do with an address.

Size is the absolute difference between two addresses.  So it must be able to represent up to diff(0, maxAddress).

Besides, the whole thing about size being unsigned is because negative size makes no sense.

Basically, you have to know that size_t is unsigned, and so you should be aware of the pitfalls of underflow.

> Sure, on x86 we may need to allocate 3GB of data, and this would require size_t > 2^31 ==> it must be unsigned. But strings really don't need to have an unsigned length. If you really need a string of length > size_t/2, then have the string type implement a different length property.

It would add too much complication to have some types use unsigned size and others use signed size.

[...]
> this way, for 99.99999999% of the cases where strings are actually < 1/2 size_t, one doesn't have to waste cycles doing extra comparing or typing extra code... or better, spending hours looking for some obscure bug because one compared an int to a uint and no warning was thrown.

The real issue here is not whether size_t is signed or unsigned, but the implicit conversion between them.  This, arguably, is a flaw in the language design.  Bearophile has been clamoring for a long time about not allowing implicit signed/unsigned conversion. If you search in bugzilla you should find the issues he filed for this. :)

Once implicit conversion between signed/unsigned is removed, the root problem disappears -- mistakes like (i < array.length-1) where i is an int will cause a compile error (comparing signed with unsigned). In the cases where you actually want wraparound behaviour, an explicit cast will be required, which is self-documenting and makes the programmer aware of the potential pitfalls.
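
To illustrate the implicit conversion discussed here, a small sketch follows. The function names `implicitLess` and `checkedLess` are invented for this example; the second one shows one possible way to make the conversion explicit, and is not something proposed in the thread.

```d
// Hypothetical helper: compares a signed index with an unsigned length
// the way the language does it implicitly.
bool implicitLess(int i, size_t n)
{
    // i is converted to an unsigned type, so a negative i becomes a huge value.
    return i < n;
}

// Hypothetical helper: handles the negative case first, then casts explicitly.
bool checkedLess(int i, size_t n)
{
    return i < 0 || cast(size_t) i < n;
}

void main()
{
    assert(!implicitLess(-1, 5)); // mathematically -1 < 5, but the result is false
    assert(checkedLess(-1, 5));   // the explicit version gives the expected answer
    assert(implicitLess(2, 5) && checkedLess(2, 5));
}
```

The explicit cast in `checkedLess` is the kind of self-documenting step the paragraph above argues for: the reader can see where the signed value becomes unsigned.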

> Alternatively,
> 
> for(int i = 0; i < s.length - 1; i++) could at least check for underflow on the cmp and break the loop.

If you're bent on subtracting array lengths, do this:

	assert(s.length <= int.max);
	int len = cast(int)s.length;
	for (int i=0; i < len-1; i++) {
		...
	}

The optimizer should be able to reduce len to whatever it does when you write s.length inside the loop condition. The cast incurs no runtime penalty, because the 2's complement representations of signed and unsigned numbers are identical when the numbers concerned are non-negative.

This way, you make the intent of the code clear, and force it to fail if your assumptions didn't hold. Self-documenting code is always a good thing.
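
A complete version of the snippet above might look like the sketch below. The function name `signedIterations` and the iteration counter are assumptions added for illustration; the elided loop body from the original is replaced by a simple count.

```d
// Hypothetical helper: counts how many times the loop body would run
// when the length is first converted to a signed int.
int signedIterations(string s)
{
    // Fail loudly if the assumption "the length fits in an int" does not hold.
    assert(s.length <= int.max);
    int len = cast(int) s.length;

    int iterations = 0;
    for (int i = 0; i < len - 1; i++)
    {
        // len - 1 is -1 for an empty string, so the body is never entered.
        iterations++;
    }
    return iterations;
}

void main()
{
    assert(signedIterations("") == 0);
    assert(signedIterations("a") == 0);
    assert(signedIterations("abc") == 2);
}
```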

T

-- 
Век живи - век учись. А дураком помрёшь.

July 22, 2013

Re: Why is size_t unsigned?

Posted by monarch_dodra
in reply to JS

On Monday, 22 July 2013 at 03:47:36 UTC, JS wrote:
> Doing simple stuff like
>
> for(int i = 0; i < s.length - 1; i++) fails catastrophically if s is empty. To make it right one has to reduce performance by writing extra checks.

Not really, you could instead just write your loop correctly.
1. Don't loop on int, you are handling a size_t.
2. Avoid subtractions when handling unsigned.

for(size_t i = 0; i + 1 < s.length; i++)

Problem solved?
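
As a rough sketch of the loop above in a complete function: the name `countAdjacentRepeats` and the task of comparing neighbouring characters are assumptions chosen only to give the loop something to do.

```d
// Hypothetical helper: counts positions where a character equals its successor.
size_t countAdjacentRepeats(string s)
{
    size_t count = 0;
    // i + 1 < s.length never subtracts from an unsigned value,
    // so an empty or one-character string simply skips the loop.
    for (size_t i = 0; i + 1 < s.length; i++)
    {
        if (s[i] == s[i + 1])
            count++;
    }
    return count;
}

void main()
{
    assert(countAdjacentRepeats("") == 0);
    assert(countAdjacentRepeats("a") == 0);
    assert(countAdjacentRepeats("aab") == 1);
}
```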

July 22, 2013

Re: Why is size_t unsigned?

Posted by JS
in reply to monarch_dodra

On Monday, 22 July 2013 at 07:12:07 UTC, monarch_dodra wrote:
> On Monday, 22 July 2013 at 03:47:36 UTC, JS wrote:
>> Doing simple stuff like
>>
>> for(int i = 0; i < s.length - 1; i++) fails catastrophically if s is empty. To make it right one has to reduce performance by writing extra checks.
>
> Not really, you could instead just write your loop correctly.
> 1. Don't loop on int, you are handling a size_t.
> 2. Avoid subtractions when handling unsigned.
>
> for(size_t i = 0; i + 1 < s.length; i++)
>
> Problem solved?

Oh sure... problem solved... rriiiighhhtt.....

how about s[i - 1..n]?

Are you going to throw some ifs around the statement that uses that? Use a ternary? So I'm forced to use a longer, more verbose method, and also introduce bugs, because the most obvious, simplest, and logical solution, s[max(0, i-1)..n], won't work.
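
For context, with an unsigned `i` the expression `i - 1` has already wrapped around before `max` ever sees it, which is why the clamp in the post cannot help. One possible workaround, sketched here with an invented helper name `sliceFromPrevious`, is to test before subtracting:

```d
// Hypothetical helper: returns s[i - 1 .. n], clamping the start at 0.
// Assumes the clamped start is <= n and n <= s.length; otherwise the slice
// fails its bounds check.
string sliceFromPrevious(string s, size_t i, size_t n)
{
    // Compare first, subtract second: the subtraction only happens when i > 0.
    const size_t start = i > 0 ? i - 1 : 0;
    return s[start .. n];
}

void main()
{
    assert(sliceFromPrevious("hello", 0, 3) == "hel"); // start clamped to 0
    assert(sliceFromPrevious("hello", 2, 4) == "ell"); // ordinary case: s[1 .. 4]
}
```

This is the ternary form the post objects to; it is shown only to make the trade-off concrete.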

July 22, 2013

Re: Why is size_t unsigned?

Posted by monarch_dodra
in reply to JS

On Monday, 22 July 2013 at 09:34:35 UTC, JS wrote:
> On Monday, 22 July 2013 at 07:12:07 UTC, monarch_dodra wrote:
>> On Monday, 22 July 2013 at 03:47:36 UTC, JS wrote:
>>> Doing simple stuff like
>>>
>>> for(int i = 0; i < s.length - 1; i++) fails catastrophically if s is empty. To make it right one has to reduce performance by writing extra checks.
>>
>> Not really, you could instead just write your loop correctly.
>> 1. Don't loop on int, you are handling a size_t.
>> 2. Avoid subtractions when handling unsigned.
>>
>> for(size_t i = 0; i + 1 < s.length; i++)
>>
>> Problem solved?
>
> Oh sure... problem solved... rriiiighhhtt.....
>
> how about s[i - 1..n]?
>
> Are you going to throw some ifs around the statement that uses that? Use a ternary? So I'm forced to use a longer, more verbose method, and also introduce bugs, because the most obvious, simplest, and logical solution, s[max(0, i-1)..n], won't work.

What about "s[i - 1..n]"? I don't see how having your "i" be signed saves your ass in any way, shape or form. What is your point?

Copyright © 1999-2021 by the D Language Foundation