December 11, 2012
Re: OT (partially): about promotion of integers
On Tuesday, 11 December 2012 at 22:08:15 UTC, Walter Bright wrote:
> On 12/11/2012 10:44 AM, foobar wrote:
>> All of the above relies on the assumption that the safety problem is due
>> to the memory layout. There are many other programming languages that
>> solve this by using a different point of view - the problem lies in the
>> implicit casts and not the memory layout. In other words, the culprit is
>> code such as:
>> uint a = -1;
>> which compiles under C's implicit coercion rules but _really shouldn't_.
>> The semantically correct way would be something like:
>> uint a = 0xFFFF_FFFF;
>> but C/C++ programmers tend to think the "-1" trick is less verbose and
>> "better".
>
> Trick? Not at all.
>
> 1. -1 is the size of an int, which varies in C.
>
> 2. -i means "complement and then increment".
>
> 3. Would you allow 2-1? How about 1-1? (1-1)-1?
>
> Arithmetic in computers is different from the math you learned 
> in school. It's 2's complement, and it's best to always keep 
> that in mind when writing programs.

Thanks for proving my point. After all, you are a C++ developer, 
aren't you? :)
Seriously though, it _is_ a trick and a code smell.
I'm fully aware that computers use 2's complement. I'm also 
aware of the fact that the type has an "unsigned" label all over 
it. You see it right there in that 'u' prefix of 'int'. An 
unsigned type should semantically entail _no sign_ in its 
operations. You are calling a cat a dog and arguing that dogs 
bark? Yeah, I completely agree with that notion, except we are 
still talking about _a cat_.
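Walter's point 2 ("complement and then increment") and the `uint a = -1` example can be checked mechanically. Below is a minimal C++ sketch, assuming a 32-bit `std::uint32_t`; C++ is used only because its unsigned arithmetic is defined modulo 2^N, like D's `uint`. The helper name `negate_u32` is hypothetical.

```cpp
#include <cstdint>

// Two's complement negation: "complement, then increment", modulo 2^32.
constexpr std::uint32_t negate_u32(std::uint32_t x) {
    return ~x + 1u;
}

// Negating 1 gives the all-ones pattern, i.e. what `uint a = -1;` stores.
static_assert(negate_u32(1u) == 0xFFFF'FFFFu, "-1 is all ones");
// The same value the compiler produces when converting -5 to unsigned.
static_assert(negate_u32(5u) == static_cast<std::uint32_t>(-5), "matches -5");
// Zero wraps back to zero.
static_assert(negate_u32(0u) == 0u, "-0 == 0");
```

The first check is the whole story behind `uint a = -1`: under wraparound, negating 1 yields the same bits as 0xFFFF_FFFF, which is why one side calls it natural and the other calls it a trick.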

To answer your question, yes, I would enforce overflow and 
underflow checking semantics. Any negative result assigned to an 
unsigned type _is_ a logic error.
You can claim that:
uint a = -1;
is perfectly safe and has a well-defined meaning (well, for C 
programmers, that is), but what about:
uint a = b - c;
What if that calculation results in a negative number? What 
should the compiler do? Well, there are _two_ equally possible 
interpretations:
a. The overflow was intended, as in the mask = -1 case; or
b. The overflow is a _bug_.

The user should be made aware of this and should decide how to 
handle it. This should _not_ be handled implicitly by the 
compiler, allowing bugs to go unnoticed.
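To make foobar's cases (a) and (b) concrete, here is a minimal C++ sketch in which the programmer, not the compiler, picks between intended wraparound and underflow treated as an error. The names `checked_sub` and `wrapping_sub` are hypothetical, not from any library, and `std::optional` assumes C++17.

```cpp
#include <cstdint>
#include <optional>

// Case (b): a negative result is a bug, so report it instead of wrapping.
// Returns std::nullopt when c > b, i.e. when b - c would be negative.
std::optional<std::uint32_t> checked_sub(std::uint32_t b, std::uint32_t c) {
    if (c > b) {
        return std::nullopt;
    }
    return b - c;
}

// Case (a): wraparound was intended, and the call site says so explicitly.
std::uint32_t wrapping_sub(std::uint32_t b, std::uint32_t c) {
    return b - c;  // unsigned arithmetic is defined modulo 2^32
}
```

With two named operations, `uint a = b - c;` stops being ambiguous: whichever function the caller writes records which of the two meanings was intended.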

I think C# solved this _way_ better than C/D. Another data point 
would be (S)ML, which is a compiled language that requires 
_explicit conversions_ and has a very strong type system. Its 
programs are compiled to efficient native executables, and the 
strong typing lets both the compiler and the programmer reason 
better about the code. Thus programs are more likely to be 
correct and can be better optimized by the compiler. In fact, 
several languages are implemented in ML because of its stronger 
guarantees.
December 11, 2012
Walter Bright:

> I don't notice anyone reaching for Lisp or Ocaml for high 
> performance applications.

Nowadays Common Lisp is not used much for anything (though people 
at ITA use it to plan flights; their code is efficient, 
algorithmically complex, and handles heavy loads).

OCaml, on the other hand, is regarded as quite fast (though it's 
not widely used in general). It's sometimes chosen for its 
combination of high performance and greater safety; Jane Street, 
for example, uses it for automated high-speed trading:

https://ocaml.janestreet.com/?q=node/61
https://ocaml.janestreet.com/?q=node/82


>> I think the compiler doesn't perform on BigInts the 
>> optimizations it does on
>> ints, because it doesn't know about bigint properties.
>
> I think the general lack of interest in bigints indicates that 
> the builtin types work well enough for most work.

Where do you see this general lack of interest in bigints? In D 
or in other languages?

I use bigints often in D. In Python we use only bigints. In 
Scheme and other Lisp-like languages, multi-precision numbers are 
the default. I think if you give programmers better bigints 
(this means efficient and usable as naturally as ints), they will 
use them.

I think currently in D there is no way to make bigints as 
efficient as ints, because D has no way to express the full 
semantics of integral numbers that ints have. This is a 
language limitation. One way to solve this problem, and keep 
BigInts as Phobos code, is to introduce a built-in attribute 
that can mark user-defined structs as int-like.

----------------------

deadalnix:

>But OCaml is really very performant.<

It's fast considering it's a mostly functional language.

OCaml vs. C++ in the Shootout:

http://shootout.alioth.debian.org/u32/benchmark.php?test=all&lang=ocaml&lang2=gpp

Versus Haskell:
http://shootout.alioth.debian.org/u32/benchmark.php?test=all&lang=ocaml&lang2=ghc

But as usual you have to take such comparisons cum grano salis, 
because a lot more people work on the GHC compiler, and because 
the Shootout Haskell solutions are quite unidiomatic (you can see 
this on the Shootout site itself by looking at the length of the 
solutions) and are the product of several years of maniac-level 
discussions (they have patched the Haskell compiler and its 
library several times to improve the results of those 
benchmarks):

http://www.haskell.org/haskellwiki/Shootout


>I don't know how it handles integers internally.<

It uses tagged integers that are 31 or 63 bits long (on 32-bit and 
64-bit machines respectively), with the tag in the least 
significant bit:

http://stackoverflow.com/questions/3773985/why-is-an-int-in-ocaml-only-31-bits
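A rough sketch of what the low-bit tag means in practice, written in C++ for a 64-bit machine. It illustrates the encoding only and is not code from the OCaml runtime; the alias `word` and all function names are hypothetical.

```cpp
#include <cstdint>

// An integer n is stored as the word 2n + 1, so the low bit is the tag
// (1 = immediate integer, 0 = pointer) and 63 bits remain for the value.
using word = std::int64_t;

// n must fit in 63 bits, otherwise n * 2 overflows.
constexpr word tag(word n) { return n * 2 + 1; }

// Arithmetic right shift drops the tag (guaranteed arithmetic since C++20,
// and what mainstream compilers did before that).
constexpr word untag(word v) { return v >> 1; }

constexpr bool is_immediate(word v) { return (v & 1) != 0; }

// (2a + 1) + (2b + 1) - 1 == 2(a + b) + 1: one extra decrement keeps the
// sum tagged, so addition never needs to untag its operands.
constexpr word tagged_add(word x, word y) { return x + y - 1; }
```

The price of the scheme is the lost bit (hence 31/63-bit ints) and an occasional extra instruction; the gain is that the garbage collector can tell integers from pointers by looking at one bit.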

Bye,
bearophile
December 12, 2012
foobar:

>I would enforce overflow and underflow checking semantics.<

Plus one or two switches to disable such checking, if/when 
someone wants it, to regain the C performance. (Plus some syntax 
way to disable/enable such checking in a small piece of code).

Maybe someday Walter will change his mind about this topic :-)

Bye,
bearophile
December 12, 2012
On Wednesday, 12 December 2012 at 00:06:53 UTC, bearophile wrote:
> foobar:
>
>>I would enforce overflow and underflow checking semantics.<
>
> Plus one or two switches to disable such checking, if/when 
> someone wants it, to regain the C performance. (Plus some 
> syntax way to disable/enable such checking in a small piece of 
> code).
>
> Maybe someday Walter will change his mind about this topic :-)
>
> Bye,
> bearophile

Yeah, of course, that's why I said the C# semantics are _way_ 
better. (That's a self quote)

btw, here's the link for SML which does not use tagged ints -
http://www.standardml.org/Basis/word.html#Word8:STR:SPEC

"Instances of the signature WORD provide a type of unsigned 
integer with modular arithmetic and logical operations and 
conversion operations. They are also meant to give efficient 
access to the primitive machine word types of the underlying 
hardware, and support bit-level operations on integers. They are 
not meant to be a ``larger'' int. "
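For readers who don't know SML, here is a rough C++ analogue of a Word8-style type: 8-bit unsigned values with modular arithmetic. This is not the SML Basis API, and the function names are hypothetical. It also shows the integer-promotion trap that gives this thread its title.

```cpp
#include <cstdint>

// Word8-style arithmetic: results are taken mod 2^8.
// a + b on two std::uint8_t operands is computed as int (integer promotion),
// so the result must be converted back to wrap it.
std::uint8_t word8_add(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a + b);  // (a + b) mod 256
}

std::uint8_t word8_sub(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a - b);  // (a - b) mod 256, never negative
}

// Without the conversion back, the promoted int result does not wrap at all.
int promoted_sum(std::uint8_t a, std::uint8_t b) {
    return a + b;
}
```

In SML the modular type and the ordinary `int` are distinct and converting between them is explicit; in C++ the promotion happens silently, which is exactly the kind of implicit conversion foobar objects to.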
December 12, 2012
On 12/11/2012 3:15 PM, deadalnix wrote:
>> That's irrelevant to this discussion. It is not a problem with the language.
>> Anyone can improve the library one if they desire, or do their own.
> The library is part of the language. What is a language with no vocabulary?

I think it is useful to draw a distinction.


>>> I think the compiler doesn't perform on BigInts the optimizations it does on
>>> ints, because it doesn't know about bigint properties.
>>
>> I think the general lack of interest in bigints indicates that the builtin
>> types work well enough for most work.
>
> That argument is fallacious. Being more widely used doesn't mean better. Otherwise,
> PHP and C++ would be some of the best languages ever made.

I'm interested in crafting D to be a language that people will like and use. 
Therefore, what things make a language popular are of significant interest.

I.e. it's meaningless to create the best language evar and be the only user of it.

Now, if we have int with terrible problems, and bigint that solves those 
problems, and yet people still prefer int by a 1000:1 margin, that makes me very 
skeptical that those problems actually matter.

We need to be solving the *right* problems with D.
December 12, 2012
On Wed, Dec 12, 2012 at 01:26:08AM +0100, foobar wrote:
> On Wednesday, 12 December 2012 at 00:06:53 UTC, bearophile wrote:
> >foobar:
> >
> >>I would enforce overflow and underflow checking semantics.<
> >
> >Plus one or two switches to disable such checking, if/when someone
> >wants it, to regain the C performance. (Plus some syntax way to
> >disable/enable such checking in a small piece of code).
> >
> >Maybe someday Walter will change his mind about this topic :-)

I don't agree that compiler switches should change language semantics.
Just because you specify a certain compiler switch, it can cause
unrelated breakage in some obscure library somewhere, that assumes
modular arithmetic with C/C++ semantics. And this breakage will in all
likelihood go *unnoticed* until your software is running on the
customer's site and then it crashes horribly. And good luck debugging
that, because the breakage can be very subtle, plus it's *not* in your
own code, but in some obscure library code that you're not familiar
with.

I think a much better approach is to introduce a new type (or new types)
that *does* have the requisite bounds checking and static analysis.
That's what a type system is for.


[...]
> Yeah, of course, that's why I said the C# semantics are _way_
> better. (That's a self quote)
> 
> btw, here's the link for SML which does not use tagged ints -
> http://www.standardml.org/Basis/word.html#Word8:STR:SPEC
> 
> "Instances of the signature WORD provide a type of unsigned integer
> with modular arithmetic and logical operations and conversion
> operations. They are also meant to give efficient access to the
> primitive machine word types of the underlying hardware, and support
> bit-level operations on integers. They are not meant to be a
> ``larger'' int. "

It's kinda too late for D to rename int to word, say, but it's not too
late to introduce a new checked int type, say 'number' or something like
that (you can probably think of a better name).

In fact, Andrei describes a CheckedInt type that uses operator
overloading, etc., to implement an in-library solution to bounds checks.
You can probably expand that into a workable lightweight int
replacement. By wrapping an int in a struct with custom operators, you
can pretty much have an int-sized type (with value semantics, just like
"native" ints, no less!) that does what you want, instead of the usual
C/C++ int semantics.
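A minimal sketch of such a wrapper, in C++ since it has the same operator-overloading facility. This is not Andrei's CheckedInt; the struct and helper names are hypothetical, and only + and - are shown.

```cpp
#include <limits>
#include <stdexcept>

// Test for overflow *before* operating, because signed overflow on a plain
// int is undefined behavior in C++.
constexpr bool add_would_overflow(int a, int b) {
    return (b > 0 && a > std::numeric_limits<int>::max() - b) ||
           (b < 0 && a < std::numeric_limits<int>::min() - b);
}

constexpr bool sub_would_overflow(int a, int b) {
    return (b < 0 && a > std::numeric_limits<int>::max() + b) ||
           (b > 0 && a < std::numeric_limits<int>::min() + b);
}

// A plain int wrapped in a struct: value semantics and (on mainstream
// compilers) the size of an int, but + and - throw instead of wrapping.
struct CheckedInt {
    int value;

    constexpr explicit CheckedInt(int v) : value(v) {}

    friend CheckedInt operator+(CheckedInt a, CheckedInt b) {
        if (add_would_overflow(a.value, b.value)) {
            throw std::overflow_error("CheckedInt: addition overflow");
        }
        return CheckedInt(a.value + b.value);
    }

    friend CheckedInt operator-(CheckedInt a, CheckedInt b) {
        if (sub_would_overflow(a.value, b.value)) {
            throw std::overflow_error("CheckedInt: subtraction overflow");
        }
        return CheckedInt(a.value - b.value);
    }
};

static_assert(sizeof(CheckedInt) == sizeof(int), "same footprint as a plain int");
```

A usable replacement would also need *, /, unary minus, comparisons and conversions; each one is another place where the struct has to mimic built-in int behavior.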


T

-- 
In a world without fences, who needs Windows and Gates? -- Christian Surchi
December 12, 2012
On 12/11/2012 3:44 PM, foobar wrote:
> Thanks for proving my point. After all, you are a C++ developer, aren't you? :)

No, I'm an assembler programmer. I know how the machine works, and C, C++, and D 
map onto that, quite deliberately. It's one reason why D supports the vector 
types directly.


> Seriously though, it _is_ a trick and a code smell.

Not to me. There is no trick or "smell" to anyone familiar with how computers work.


> I'm fully aware that computers use 2's complement. I'm also aware of the
> fact that the type has an "unsigned" label all over it. You see it right there
> in that 'u' prefix of 'int'. An unsigned type should semantically entail _no
> sign_ in its operations. You are calling a cat a dog and arguing that dogs bark?
> Yeah, I completely agree with that notion, except we are still talking about _a
> cat_.

Andrei and I have endlessly talked about this (he argued your side). The 
inevitable result is that signed and unsigned types *are* conflated in D, and 
have to be, otherwise many things stop working.

For example, p[x]. What type is x?

Integer signedness in D is not really a property of the data, it is only how one 
happens to interpret the data in a specific context.
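A short C++ illustration of both points: the same 32 bits read as signed or unsigned, and an index that is legitimately negative. The function names are hypothetical; `std::int32_t` is two's complement by definition.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// The same 32 bits read two ways. std::memcpy copies the bit pattern without
// any value conversion, so the sign lives in the interpretation, not the data.
std::int32_t as_signed(std::uint32_t bits) {
    std::int32_t s;
    std::memcpy(&s, &bits, sizeof s);
    return s;
}

std::uint32_t as_unsigned(std::int32_t value) {
    std::uint32_t u;
    std::memcpy(&u, &value, sizeof u);
    return u;
}

// p[x] is defined as *(p + x); x may be negative (ptrdiff_t is signed),
// as long as p + x still points into the same array.
char char_at(const char* p, std::ptrdiff_t x) {
    return p[x];
}
```

0xFFFFFFFF and -1 are one bit pattern; only the declared type decides which number it is. Likewise `p[-1]` is a perfectly ordinary index when `p` points into the middle of an array.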


> To answer your question, yes, I would enforce overflow and underflow checking
> semantics. Any negative result assigned to an unsigned type _is_ a logic error.
> You can claim that:
> uint a = -1;
> is perfectly safe and has a well-defined meaning (well, for C programmers, that
> is), but what about:
> uint a = b - c;
> What if that calculation results in a negative number? What should the compiler
> do? Well, there are _two_ equally possible interpretations:
> a. The overflow was intended, as in the mask = -1 case; or
> b. The overflow is a _bug_.
>
> The user should be made aware of this and should decide how to handle
> it. This should _not_ be handled implicitly by the compiler, allowing bugs to go
> unnoticed.
>
> I think C# solved this _way_ better than C/D.

C# has overflow checking off by default. It is enabled either with a checked 
{ } block or with a compiler switch. I don't see that as "solving" the issue in 
any elegant or natural way; it's more of a clumsy hack.

But also consider that C# does not allow pointer arithmetic (outside unsafe code) or array slicing. 
Both of these rely on wraparound 2's complement arithmetic.
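To see why address arithmetic leans on wraparound, here is a minimal C++ sketch that models an address as an unsigned machine word. The helper `offset_address` is hypothetical, and `std::uintptr_t` is assumed to be available (it is on mainstream platforms).

```cpp
#include <cstdint>

// How the machine handles p + x with negative x: the offset is converted to a
// (huge) unsigned number and the addition wraps modulo 2^N back below the base.
// This is done on std::uintptr_t on purpose; forming an out-of-range pointer
// directly would be undefined behavior in C++.
std::uintptr_t offset_address(std::uintptr_t base, std::intptr_t offset) {
    return base + static_cast<std::uintptr_t>(offset);
}
```

If this addition trapped on unsigned overflow, every negative offset (and every slice starting before the pointer it is computed from) would trap too.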


> Another data point would be (S)ML,
> which is a compiled language that requires _explicit conversions_ and has a
> very strong type system. Its programs are compiled to efficient native
> executables, and the strong typing lets both the compiler and the programmer
> reason better about the code. Thus programs are more likely to be correct and
> can be better optimized by the compiler. In fact, several languages are
> implemented in ML because of its stronger guarantees.

ML has been around for 30-40 years, and has failed to catch on.
December 12, 2012
On 12/11/2012 4:06 PM, bearophile wrote:
> Plus one or two switches to disable such checking, if/when someone wants it, to
> regain the C performance. (Plus some syntax way to disable/enable such checking
> in a small piece of code).

I.e. the C# "solution".

1. The global switch "solution": What I hate about this was discussed earlier 
today in another thread. Global switches that change the semantics of the 
language are a disaster. It means you cannot write a piece of code and have 
confidence that it will behave in a certain way. It means your testing becomes a 
combinatorial explosion of cases - how many modules do you have, and you must 
(to be thorough) test every combination of switches across your whole project. 
If you have a 2-way switch and 8 modules, that's 256 test runs.
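The test-count figure is the number of ways to give every module its own switch setting independently. With s settings per switch and m modules:

```latex
N = s^{m}, \qquad s = 2,\; m = 8 \;\Longrightarrow\; N = 2^{8} = 256
```

Each additional module doubles the count, which is why the growth is called a combinatorial explosion.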

2. The checked block "solution": This is a blunt club that affects everything 
inside a block. What happens with template instantiations, inlined functions, 
and mixins, for starters? What if you want one part of the expression checked 
and not another? What a mess.


> Maybe someday Walter will change his mind about this topic :-)

Not likely :-)

What you (and anyone else) *can* do, today, is write a SafeInt struct that acts 
just like an int, but checks for overflow. It's very doable (one exists for 
C++). Write it, use it, and prove its worth. Then you'll have a far better case. 
Write a good one, and we'll consider it for Phobos.
December 12, 2012
Walter Bright:

> ML has been around for 30-40 years, and has failed to catch on.

OCaml, Haskell, F#, and so on are all languages derived more or 
less directly from ML that share many of its ideas. Has Haskell 
caught on? :-)

Bye,
bearophile
December 12, 2012
H. S. Teoh:

> Just because you specify a certain compiler switch, it can cause
> unrelated breakage in some obscure library somewhere, that 
> assumes modular arithmetic with C/C++ semantics.

The idea was about two switches, one for signed integrals, and 
the other for both signed and unsigned. But from other posts I 
guess Walter doesn't think this is a viable possibility.

So the solutions I see now are to stop using D for certain more 
important programs, or to use some kind of SafeInt and then work 
with the compiler writers to make user-defined structs usable as 
naturally as possible as ints (and, ideally, as efficiently).

Regarding SafeInt, I think today there is no way to write it 
efficiently in D, because the CPU overflow flags are not 
accessible from D, and if you use inline asm, you lose inlining 
in DMD. This is just one of the problems. Another is the syntax 
incompatibilities between user-defined structs and built-in 
ints. Yet another is the probable lack of high-level 
optimizations on such user-defined types.
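On the overflow-flag point: portable code can recompute the signed-overflow condition from sign bits instead of reading the CPU flag. A minimal C++ sketch (the name `add_overflows` is hypothetical). GCC and Clang also provide the `__builtin_add_overflow` intrinsic, which typically compiles to a flag check, but that is a compiler extension rather than standard C++.

```cpp
#include <cstdint>

// Emulating the CPU's signed-overflow flag: add in unsigned arithmetic (which
// wraps, with no undefined behavior), then inspect the sign bits. Overflow
// happened iff the result's sign differs from the signs of both operands,
// which can only occur when the operands share a sign.
bool add_overflows(std::int32_t a, std::int32_t b) {
    const std::uint32_t ua = static_cast<std::uint32_t>(a);
    const std::uint32_t ub = static_cast<std::uint32_t>(b);
    const std::uint32_t ur = ua + ub;  // wrapping sum
    return ((ua ^ ur) & (ub ^ ur) & 0x8000'0000u) != 0;
}
```

This costs a few extra ALU operations per addition instead of a single branch on the flag, which is the efficiency gap being described, unless the optimizer recognizes the pattern.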

We are very far from a good solution to such problems.

Bye,
bearophile