February 16, 2011
spir wrote:
> On 02/16/2011 03:07 AM, Jonathan M Davis wrote:
>> On Tuesday, February 15, 2011 15:13:33 spir wrote:
>>> On 02/15/2011 11:24 PM, Jonathan M Davis wrote:
>>>> Is there some low level reason why size_t should be signed or something
>>>> I'm completely missing?
>>>
>>> My personal issue with unsigned ints in general as implemented in C-like
>>> languages is that the range of non-negative signed integers is half of the
>>> range of corresponding unsigned integers (for same size).
>>> * practically: known issues, and bugs if not checked by the language
>>> * conceptually: contradicts the "obvious" idea that unsigned (aka naturals)
>>> is a subset of signed (aka integers)
>>
>> It's inevitable in any systems language. What are you going to do, throw away a
>> bit for unsigned integers? That's not acceptable for a systems language. On some
>> level, you must live with the fact that you're running code on a specific machine
>> with a specific set of constraints. Trying to do otherwise will pretty much
>> always harm efficiency. True, there are common bugs that might be better
>> prevented, but part of it ultimately comes down to the programmer having some
>> clue as to what they're doing. On some level, we want to prevent common bugs,
>> but the programmer can't have their hand held all the time either.
> 
> I cannot prove it, but I really think you're wrong on that.
> 
> First, the question of 1 bit. Think at this -- speaking of 64 bit size:
> * 99.999% of all uses of unsigned fit under 2^63
> * To benefit from the last bit, you must have the need to store a value 2^63 <= v < 2^64
> * Not only this, you must step on a case where /any/ possible value for v (depending on execution data) could be >= 2^63, but /all/ possible values for v are guaranteed < 2^64
> This can only be a very small fraction of the cases where your value does not fit in 63 bits, don't you think? Has it ever happened to you (even in 32 bits)? Something like: "What luck! This value would not (always) fit in 31 bits, but (due to this constraint) I can be sure it will fit in 32 bits (always, whatever input data it depends on)."
> In fact, n bits do the job because (1) nearly all unsigned values are very small (2) the size used at a time covers the memory range at the same time.
> 
> Upon efficiency, if unsigned is not a subset of signed, then at a low level you may be forced to add checks in numerous utility routines, the kind constantly used, everywhere one type may play with the other. I'm not sure where the gain is.
> Upon correctness, intuitively I guess (just a wild guess indeed) if unsigned values form a subset of signed ones programmers will more easily reason correctly about them.
> 
> Now, I perfectly understand the "sacrifice" of one bit sounds like a sacrilege ;-)
> (*)
> 
> Denis
> 

> (*) But you know, when as a young guy you have coded for 8 & 16-bit machines, having 63 or 64...

Exactly. It is NOT the same as the 8 & 16 bit case. The thing is, the fraction of cases where the MSB is important has been decreasing *exponentially* from the 8-bit days. It really was necessary to use the entire address space (or even more, in the case of segmented architecture on the 286![1]) to measure the size of anything. D only supports 32 bit and higher, so it isn't hamstrung in the way that C is.

Yes, there are still cases where you need every bit. But they are very, very exceptional -- rare enough that I think the type could be called __uint, __ulong.

[1] What was size_t on the 286?
Note that in the small memory model (all pointers 16 bits) it really was possible to have an object of size 0xFFFF, because the code was in a different address space.
February 16, 2011
On Feb 16, 11 11:49, Michel Fortin wrote:
> On 2011-02-15 22:41:32 -0500, "Nick Sabalausky" <a@a.a> said:
>
>> I like "nint".
>
> But is it unsigned or signed? Do we need 'unint' too?
>
> I think 'word' & 'uword' would be a better choice. I can't say I'm too
> displeased with 'size_t', but it's true that the 'size_t' feels out of
> place in D code because of its name.
>
>

'word' may be confusing to Windows programmers because in WinAPI a 'WORD' means an unsigned 16-bit integer (aka 'ushort').

http://msdn.microsoft.com/en-us/library/cc230402(v=PROT.10).aspx
February 16, 2011
On Tue, 15 Feb 2011 18:18:22 -0500, Rainer Schuetze <r.sagitario@gmx.de> wrote:

>
> Steven Schveighoffer wrote:
>>  In addition size_t isn't actually defined by the compiler.  So the library controls the size of size_t, not the compiler.  This should make it extremely portable.
>>
>
> I do not consider the language and the runtime as completely separate when it comes to writing code.

You are right, in some cases the runtime just extends the compiler features.  However, I believe the runtime is meant to be used in multiple compilers.  I would expect object.di to remain the same.  Probably core too.  This should be easily checkable with the newer gdc, which I believe uses a port of druntime.

> BTW, though defined in object.di, size_t is tied to some compiler internals:
>
> 	alias typeof(int.sizeof) size_t;
>
> and the compiler will make assumptions about this when creating array literals.

This is true.  This makes it depend on the compiler.  However, I believe the spec is concrete about what the sizeof type should be (if not, it should be made concrete).
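For reference, the whole definition really does amount to that one alias; everything else follows from .sizeof being pointer-width. A minimal check (assuming a standard druntime):

```d
// object.di defines: alias typeof(int.sizeof) size_t;
// i.e. size_t is whatever type .sizeof yields: uint on 32-bit
// targets, ulong on 64-bit targets.

void main()
{
    // The alias tracks the pointer width of the target.
    static assert(size_t.sizeof == (void*).sizeof);
    static assert(is(size_t == typeof(int.sizeof)));
}
```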

>>> I don't have a perfect solution, but maybe builtin arrays could be limited to 2^^32-1 elements (or maybe 2^^31-1 to get rid of endless signed/unsigned conversions), so the normal type to be used is still "int". Ranges should adopt the type sizes of the underlying objects.
>>  No, this is too limiting.  If I have 64GB of memory (not out of the question), and I want to have a 5GB array, I think I should be allowed to.  This is one of the main reasons to go to 64-bit in the first place.
>
> Yes, that's the imperfect part of the proposal. An array of ints could still use up to 16 GB, though.

Unless you cast it to void[].  What exactly would happen there, a runtime error?  Which would mean a runtime check for an implicit cast?  I don't think it's really an option to make array length always be uint (or int).
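The void[] case is exactly where a fixed uint length would break down: the length gets re-expressed in bytes, so it scales with the element size. A small sketch:

```d
void main()
{
    auto ints = new int[](1000); // 1000 elements, 4 bytes each
    void[] raw = ints;           // same memory viewed as raw bytes
    assert(ints.length == 1000);
    assert(raw.length == 4000);  // length is now a byte count
    // With a fixed 32-bit length, an int[] of 2^30 elements would
    // already overflow at this cast; size_t avoids that on 64-bit.
}
```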

I wouldn't have a problem with using signed words for length.  Using more than 2GB for one array in 32-bit land would be so rare that having to jump through special hoops would be fine by me.  And for now, 2^63-1 elements is obviously plenty of room for today's machines in 64-bit land.

> What bothers me is that you have to deal with these "portability issues" from the very moment you store the length of an array elsewhere. Not a really big deal, and I don't think it will change, but still feels a bit awkward.

Java defines everything to be the same regardless of architecture, and the result is you just can't do certain things (like have a 5GB array).  A system-level language should support the full range of architecture capabilities, so you necessarily have to deal with portability issues.

If you want a super-portable language that runs the same everywhere, use an interpreted/bytecode language like Java, .Net or Python.  D is for getting close to the metal.

I see size_t as a way to *mostly* make things portable.  It is not perfect, and really cannot be.  It's necessary to expose the architecture so you can adapt to it; there's no getting around that.

Really, it's rare that you have to use it anyway; most code should just use auto.

-Steve
February 16, 2011
On Tue, 15 Feb 2011 16:50:21 -0500, Nick Sabalausky <a@a.a> wrote:

> "Nick Sabalausky" <a@a.a> wrote in message
> news:ijesem$brd$1@digitalmars.com...
>> "Steven Schveighoffer" <schveiguy@yahoo.com> wrote in message
>> news:op.vqx78nkceav7ka@steve-laptop...
>>>
>>> size_t works,  it has a precedent, it's already *there*, just use it, or
>>> alias it if you  don't like it.
>>>
>>
>> One could make much the same argument about the whole of C++. It works, it
>> has a precedent, it's already *there*, just use it.
>>
>
> The whole reason I came to D was because, at the time, D was more interested
> in fixing C++'s idiocy than just merely aping C++ as the theme seems to be
> now.

Nick, this isn't a feature, it's not a design, it's not a whole language, it's a *single name*, one which is easily changed if you want to change it.

module nick;

alias size_t wordsize;

Now you can use it anywhere, it's sooo freaking simple, I don't understand the outrage.
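Spelled out as a complete module (wordsize is just an example name, not a proposal):

```d
module nick;

// Old-style alias syntax as used at the time; the newer form is
// `alias wordsize = size_t;`. Either way, no new type is created,
// only a new name for the same one.
alias size_t wordsize;

void main()
{
    wordsize n = 42;
    static assert(is(wordsize == size_t)); // identical types
    assert(n + 1 == 43);
}
```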

BTW, what I meant by "it's already there" is that any change to the size_t name would have to have some benefit besides "it's a different name", because it will break any code that currently uses it.  If this whole argument is just to add another alias, then I'll just stop reading this thread, since it has no point.

-Steve
February 16, 2011
On 02/16/2011 12:21 PM, Don wrote:
> spir wrote:
>> On 02/16/2011 03:07 AM, Jonathan M Davis wrote:
>>> On Tuesday, February 15, 2011 15:13:33 spir wrote:
>>>> On 02/15/2011 11:24 PM, Jonathan M Davis wrote:
>>>>> Is there some low level reason why size_t should be signed or something
>>>>> I'm completely missing?
>>>>
>>>> My personal issue with unsigned ints in general as implemented in C-like
>>>> languages is that the range of non-negative signed integers is half of the
>>>> range of corresponding unsigned integers (for same size).
>>>> * practically: known issues, and bugs if not checked by the language
>>>> * conceptually: contradicts the "obvious" idea that unsigned (aka naturals)
>>>> is a subset of signed (aka integers)
>>>
>>> It's inevitable in any systems language. What are you going to do, throw away a
>>> bit for unsigned integers? That's not acceptable for a systems language. On
>>> some
>>> level, you must live with the fact that you're running code on a specific
>>> machine
>>> with a specific set of constraints. Trying to do otherwise will pretty much
>>> always harm efficiency. True, there are common bugs that might be better
>>> prevented, but part of it ultimately comes down to the programmer having some
>>> clue as to what they're doing. On some level, we want to prevent common bugs,
>>> but the programmer can't have their hand held all the time either.
>>
>> I cannot prove it, but I really think you're wrong on that.
>>
>> First, the question of 1 bit. Think at this -- speaking of 64 bit size:
>> * 99.999% of all uses of unsigned fit under 2^63
>> * To benefit from the last bit, you must have the need to store a value 2^63
>> <= v < 2^64
>> * Not only this, you must step on a case where /any/ possible value for v
>> (depending on execution data) could be >= 2^63, but /all/ possible values for
>> v are guaranteed < 2^64
>> This can only be a very small fraction of the cases where your value does not
>> fit in 63 bits, don't you think? Has it ever happened to you (even in 32 bits)?
>> Something like: "What luck! This value would not (always) fit in 31 bits,
>> but (due to this constraint) I can be sure it will fit in 32 bits (always,
>> whatever input data it depends on)."
>> In fact, n bits do the job because (1) nearly all unsigned values are very
>> small (2) the size used at a time covers the memory range at the same time.
>>
>> Upon efficiency, if unsigned is not a subset of signed, then at a low level
>> you may be forced to add checks in numerous utility routines, the kind
>> constantly used, everywhere one type may play with the other. I'm not sure
>> where the gain is.
>> Upon correctness, intuitively I guess (just a wild guess indeed) if unsigned
>> values form a subset of signed ones programmers will more easily reason
>> correctly about them.
>>
>> Now, I perfectly understand the "sacrifice" of one bit sounds like a
>> sacrilege ;-)
>> (*)
>>
>> Denis
>>
>
>> (*) But you know, when as a young guy you have coded for 8 & 16-bit machines,
>> having 63 or 64...
>
> Exactly. It is NOT the same as the 8 & 16 bit case. The thing is, the fraction
> of cases where the MSB is important has been decreasing *exponentially* from
> the 8-bit days. It really was necessary to use the entire address space (or
> even more, in the case of segmented architecture on the 286![1]) to measure the
> size of anything. D only supports 32 bit and higher, so it isn't hamstrung in
> the way that C is.
>
> Yes, there are still cases where you need every bit. But they are very, very
> exceptional -- rare enough that I think the type could be called __uint, __ulong.

Add this: in the case where one needs exactly all 64 bits, then the proper type to use is exactly ulong.
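In other words, when a quantity can genuinely occupy all 64 bits, spelling it ulong documents that intent regardless of what size_t happens to be on the target:

```d
void main()
{
    // A value of 2^63 or more only fits in an unsigned 64-bit type.
    ulong big = 0x8000_0000_0000_0000;      // 2^63
    assert(big > cast(ulong)long.max);      // exceeds any signed 64-bit value
}
```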

> [1] What was size_t on the 286?
> Note that in the small memory model (all pointers 16 bits) it really was
> possible to have an object of size 0xFFFF, because the code was in a
> different address space.

Denis
-- 
_________________
vita es estrany
spir.wikidot.com

February 16, 2011
On 2/16/11 9:09 AM, Steven Schveighoffer wrote:
> On Tue, 15 Feb 2011 16:50:21 -0500, Nick Sabalausky <a@a.a> wrote:
>
>> "Nick Sabalausky" <a@a.a> wrote in message

> module nick;
>
> alias size_t wordsize;
>
> Now you can use it anywhere, it's sooo freaking simple, I don't
> understand the outrage.

But that is somewhat selfish. Given that size_t causes dissatisfaction among a lot of people, people will start creating their own aliases and then you end up having 5 different versions of it around. If this type is an important one for writing architecture-independent code that can take advantage of architectural limits, then we'd better not have 5 different names for it in common code.

I don't think changing stuff like this needs to be disruptive. size_t can be marked deprecated and could be removed in a future release, giving people enough time to adapt.
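The mechanics for that already exist. A sketch of what the migration inside druntime could look like (wordsize and old_size_t are stand-in names, used here so the example compiles alongside the real object.di):

```d
// Step 1: introduce the new name with the same definition.
alias typeof(int.sizeof) wordsize;

// Step 2: keep the old name as a deprecated alias, so existing code
// still compiles (with a warning) until a future release drops it.
// old_size_t stands in for size_t itself.
deprecated alias wordsize old_size_t;

void main()
{
    wordsize n = 1;
    static assert(is(wordsize == size_t)); // same underlying type
    assert(n == 1);
}
```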

Furthermore, with the 64-bit support in dmd approaching, this is the time to do it, if ever.

February 16, 2011
On Wed, 16 Feb 2011 09:23:09 -0500, gölgeliyele <usuldan@gmail.com> wrote:

> On 2/16/11 9:09 AM, Steven Schveighoffer wrote:
>> On Tue, 15 Feb 2011 16:50:21 -0500, Nick Sabalausky <a@a.a> wrote:
>>
>>> "Nick Sabalausky" <a@a.a> wrote in message
>
>> module nick;
>>
>> alias size_t wordsize;
>>
>> Now you can use it anywhere, it's sooo freaking simple, I don't
>> understand the outrage.
>
> But that is somewhat selfish. Given that size_t causes dissatisfaction among a lot of people, people will start creating their own aliases and then you end up having 5 different versions of it around. If this type is an important one for writing architecture-independent code that can take advantage of architectural limits, then we'd better not have 5 different names for it in common code.

Sir, you've heard from the men who don't like size_t.  But what about the silent masses who do?

So we change it.  And then people don't like what it's changed to, for example, I might like size_t or already have lots of code that uses size_t.  So I alias your new name to size_t in my code.  How does this make things better/different?

bearophile doesn't like writeln.  He uses something else in his libs, it's just an alias.  Does that mean we should change writeln?

IT'S A NAME!!! One that many are used to using and knowing.  Whatever name it is, you just learn it, and once you know it, you just use it.  If we hadn't been using it for the last 10 years, I'd say, sure, let's have a vote and decide on a name.  You can't please everyone with every name.  size_t isn't so terrible that it needs to be changed, so can we focus efforts on actually important things?  This is the sheddiest bikeshed argument I've seen in a while.

I'm done with this thread...

-Steve
February 16, 2011
On 2/16/11 9:45 AM, Steven Schveighoffer wrote:

>
> I'm done with this thread...
>
> -Steve

Ok, I don't want to drag on. But there is a reason why we have a style. size_t is against the D style and obviously does not match. I use size_t as much as Walter does in my day job, and I even like it. It just does not fit into D's type names. That is all.
February 16, 2011
On Wednesday, February 16, 2011 06:51:21 gölgeliyele wrote:
> On 2/16/11 9:45 AM, Steven Schveighoffer wrote:
> > I'm done with this thread...
> > 
> > -Steve
> 
> Ok, I don't want to drag on. But there is a reason why we have a style. size_t is against the D style and obviously does not match. I use size_t as much as Walter does in my day job, and I even like it. It just does not fit into D's type names. That is all.

If we were much earlier in the D development process, then perhaps it would make some sense to change the name. But as it is, it's going to break a lot of code for a simple name change. Lots of C, C++, and D programmers are fine with size_t. I see no reason to break a ton of code just because a few people complain about a name on the mailing list.

Not to mention, size_t isn't exactly normal anyway. Virtually every type in D has a fixed size, but size_t is different. It's an alias whose size varies depending on the architecture you're compiling on. As such, perhaps the fact that it doesn't follow the normal naming scheme is a _good_ thing.

I tend to agree with Steve on this. This is core language stuff that's been the way that it is since the beginning. Changing it is just going to break code and cause even more headaches for porting code from C or C++ to D. This definitely comes across as bikeshedding. If we were way earlier in the development process of D, then I think that there would be a much better argument. But at this point, the language spec is supposed to be essentially stable. And just because the name doesn't quite fit in with the others is _not_ a good enough reason to go and change the language spec.

- Jonathan M Davis
February 16, 2011
Am 16.02.2011 19:20, schrieb Jonathan M Davis:
> On Wednesday, February 16, 2011 06:51:21 gölgeliyele wrote:
>> On 2/16/11 9:45 AM, Steven Schveighoffer wrote:
>>> I'm done with this thread...
>>>
>>> -Steve
>>
>> Ok, I don't want to drag on. But there is a reason why we have a style. size_t is against the D style and obviously does not match. I use size_t as much as Walter does in my day job, and I even like it. It just does not fit into D's type names. That is all.
> 
> If we were much earlier in the D development process, then perhaps it would make some sense to change the name. But as it is, it's going to break a lot of code for a simple name change. Lots of C, C++, and D programmers are fine with size_t. I see no reason to break a ton of code just because a few people complain about a name on the mailing list.
> 
> Not to mention, size_t isn't exactly normal anyway. Virtually every type in D has a fixed size, but size_t is different. It's an alias whose size varies depending on the architecture you're compiling on. As such, perhaps the fact that it doesn't follow the normal naming scheme is a _good_ thing.
> 
> I tend to agree with Steve on this. This is core language stuff that's been the way that it is since the beginning. Changing it is just going to break code and cause even more headaches for porting code from C or C++ to D. This definitely comes across as bikeshedding. If we were way earlier in the development process of D, then I think that there would be a much better argument. But at this point, the language spec is supposed to be essentially stable. And just because the name doesn't quite fit in with the others is _not_ a good enough reason to go and change the language spec.
> 
> - Jonathan M Davis

Well, IMHO it would be feasible to add another alias (keeping size_t), update
Phobos to use the new alias, and recommend using the new alias instead of size_t.
Or, even better, add a new *type* that behaves like size_t but prevents
non-portable use without explicit casting, use it throughout phobos and keep
size_t for compatibility reasons (and for interfacing with C).

But I really don't care much.. size_t is okay for me the way it is.
The best argument I've heard so far was from Michel Fortin: that having a more
D-ish name may encourage the use of size_t instead of uint - but hopefully
people will be more portability-aware once 64-bit DMD is out anyway.

IMHO it's definitely too late (for D2) to add a better type that is signed etc, like Don proposed. Also I'm not sure how well that would work when interfacing with C.

It may make sense for the compiler to handle unsigned/signed comparisons and operations more strictly or more securely (=> implicit casting to the next bigger signed type before comparing, or something like that), though.
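The hazard such a rule would guard against is the classic one below; note that the widening has to be to a signed type for the comparison to come out right:

```d
void main()
{
    int  i = -1;
    uint u = 1;

    // As in C, the int operand is implicitly converted to uint,
    // so -1 wraps to uint.max and the naive comparison is false:
    assert(!(i < u));

    // Widening both operands to the next larger signed type first
    // gives the mathematically expected result:
    assert(cast(long)i < cast(long)u);
}
```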

Cheers,
- Daniel