December 11, 2012
On Tuesday, 11 December 2012 at 22:08:15 UTC, Walter Bright wrote:
> On 12/11/2012 10:44 AM, foobar wrote:
>> All of the above relies on the assumption that the safety problem is due to the
>> memory layout. There are many other programming languages that solve this by
>> using a different point of view - the problem lies in the implicit casts and not
>> the memory layout. In other words, the culprit is code such as:
>> uint a = -1;
>> which compiles under C's implicit coercion rules but _really shouldn't_.
>> The semantically correct way would be something like:
>> uint a = 0xFFFF_FFFF;
>> but C/C++ programmers tend to think the "-1" trick is less verbose and "better".
>
> Trick? Not at all.
>
> 1. -1 works regardless of the size of an int, which varies in C.
>
> 2. -i means "complement and then increment".
>
> 3. Would you allow 2-1? How about 1-1? (1-1)-1?
>
> Arithmetic in computers is different from the math you learned in school. It's 2's complement, and it's best to always keep that in mind when writing programs.

Thanks for proving my point. After all, you are a C++ developer, aren't you? :)
Seriously though, it _is_ a trick and a code smell.
I'm fully aware that computers use 2's complement. I'm also aware that the type has an "unsigned" label all over it. You see it right there in the 'u' prefix of 'uint'. An unsigned type should semantically entail _no sign_ in its operations. You are calling a cat a dog and arguing that dogs bark? Yeah, I completely agree with that notion, except we are still talking about _a cat_.

To answer your question, yes, I would enforce overflow and underflow checking semantics. Any negative result assigned to an unsigned type _is_ a logic error.
You can claim that:
uint a = -1;
is perfectly safe and has a well defined meaning (well, for C programmers, that is), but what about:
uint a = b - c;
What if that calculation results in a negative number? What should the compiler do? Well, there are _two_ equally possible interpretations:
a. The overflow was intended, as in the mask = -1 case; or
b. The overflow is a _bug_.

The user should be made aware of this and should decide how to handle it. It should _not_ be handled implicitly by the compiler, allowing bugs to go unnoticed.

I think C# solved this _way_ better than C/D. Another data point is (S)ML, a compiled language that requires _explicit conversions_ and has a very strong type system. Its programs are compiled to efficient native executables, and the strong typing allows both the compiler and the programmer to reason better about the code. Thus programs are more correct and can be optimized by the compiler. In fact, several languages are implemented in ML because of its stronger guarantees.
December 11, 2012
Walter Bright:

> I don't notice anyone reaching for Lisp or Ocaml for high performance applications.

Nowadays Common Lisp is not used much, though people at ITA use it to plan flights; their code is efficient, algorithmically complex, and used under heavy loads.

OCaml, on the other hand, is regarded as quite fast (though it's not much used in general). It is sometimes chosen for its high performance combined with its greater safety, which is why some use it for automated high-speed trading:

https://ocaml.janestreet.com/?q=node/61
https://ocaml.janestreet.com/?q=node/82


>> I think the compiler doesn't perform on BigInts the optimizations it does on
>> ints, because it doesn't know about bigint properties.
>
> I think the general lack of interest in bigints indicates that the builtin types work well enough for most work.

Where do you see this general lack of interest in bigints? In D or in other languages?

I use bigints often in D. In Python we use only bigints. In Scheme, OCaml, and Lisp-like languages, multi-precision numbers are the default. I think if you give programmers better bigints (meaning bigints that are efficient and as natural to use as ints), they will use them.

I think there is currently no way in D to make bigints as efficient as ints, because there is no way to express in D the full semantics of integral numbers that ints have. This is a language limitation. One way to solve this problem, while keeping BigInt as Phobos code, is to introduce a built-in attribute usable to mark user-defined structs as int-like.

----------------------

deadalnix:

>But OCaml is really very performant.<

It's fast considering it's a mostly functional language.

OCaml vs C++ in the Shootout:

http://shootout.alioth.debian.org/u32/benchmark.php?test=all&lang=ocaml&lang2=gpp

Versus Haskell:
http://shootout.alioth.debian.org/u32/benchmark.php?test=all&lang=ocaml&lang2=ghc

But as usual you have to take such comparisons cum grano salis, because there are a lot more people working on the GHC compiler, and because the Shootout Haskell solutions are quite un-idiomatic (you can see this on the Shootout site itself by looking at the length of the solutions) and come from several years of maniac-level discussions (they have patched the Haskell compiler and its library several times to improve the results of those benchmarks):

http://www.haskell.org/haskellwiki/Shootout


>I don't know how it handles integers internally.<

It uses tagged integers, which are 31 or 63 bits long, with the tag in the least significant bit:

http://stackoverflow.com/questions/3773985/why-is-an-int-in-ocaml-only-31-bits

Bye,
bearophile
December 12, 2012
foobar:

>I would enforce overflow and underflow checking semantics.<

Plus one or two switches to disable such checking, if/when someone wants it, to regain the C performance (plus some syntax to enable/disable such checking in a small piece of code).

Maybe someday Walter will change his mind about this topic :-)

Bye,
bearophile
December 12, 2012
On Wednesday, 12 December 2012 at 00:06:53 UTC, bearophile wrote:
> foobar:
>
>>I would enforce overflow and underflow checking semantics.<
>
> Plus one or two switches to disable such checking, if/when someone wants it, to regain the C performance (plus some syntax to enable/disable such checking in a small piece of code).
>
> Maybe someday Walter will change his mind about this topic :-)
>
> Bye,
> bearophile

Yeah, of course; that's why I said the C# semantics are _way_ better. (That's a self-quote.)

By the way, here's the link for SML, which does not use tagged ints:
http://www.standardml.org/Basis/word.html#Word8:STR:SPEC

"Instances of the signature WORD provide a type of unsigned integer with modular arithmetic and logical operations and conversion operations. They are also meant to give efficient access to the primitive machine word types of the underlying hardware, and support bit-level operations on integers. They are not meant to be a ``larger'' int. "
December 12, 2012
On 12/11/2012 3:15 PM, deadalnix wrote:
>> That's irrelevant to this discussion. It is not a problem with the language.
>> Anyone can improve the library one if they desire, or do their own.
> The library is part of the language. What is a language with no vocabulary?

I think it is useful to draw a distinction.


>>> I think the compiler doesn't perform on BigInts the optimizations it does on
>>> ints, because it doesn't know about bigint properties.
>>
>> I think the general lack of interest in bigints indicates that the builtin
>> types work well enough for most work.
>
> That argument is fallacious. Something being more used doesn't mean it's better, or
> PHP and C++ would be some of the best languages ever made.

I'm interested in crafting D to be a language that people will like and use. Therefore, what things make a language popular are of significant interest.

I.e. it's meaningless to create the best language evar and be the only user of it.

Now, if we have int with terrible problems, and bigint that solves those problems, and yet people still prefer int by a 1000:1 margin, that makes me very skeptical that those problems actually matter.

We need to be solving the *right* problems with D.

December 12, 2012
On Wed, Dec 12, 2012 at 01:26:08AM +0100, foobar wrote:
> On Wednesday, 12 December 2012 at 00:06:53 UTC, bearophile wrote:
> >foobar:
> >
> >>I would enforce overflow and underflow checking semantics.<
> >
> >Plus one or two switches to disable such checking, if/when someone wants it, to regain the C performance (plus some syntax to enable/disable such checking in a small piece of code).
> >
> >Maybe someday Walter will change his mind about this topic :-)

I don't agree that compiler switches should change language semantics. Merely specifying a certain compiler switch can cause unrelated breakage in some obscure library somewhere that assumes modular arithmetic with C/C++ semantics. And this breakage will in all likelihood go *unnoticed* until your software is running on the customer's site, and then it crashes horribly. Good luck debugging that, because the breakage can be very subtle; plus it's *not* in your own code, but in some obscure library code that you're not familiar with.

I think a much better approach is to introduce a new type (or new types) that *does* have the requisite bounds checking and static analysis. That's what a type system is for.


[...]
> Yeah, of course, that's why I said the C# semantics are _way_
> better. (That's a self quote)
> 
> btw, here's the link for SML which does not use tagged ints - http://www.standardml.org/Basis/word.html#Word8:STR:SPEC
> 
> "Instances of the signature WORD provide a type of unsigned integer with modular arithmetic and logical operations and conversion operations. They are also meant to give efficient access to the primitive machine word types of the underlying hardware, and support bit-level operations on integers. They are not meant to be a ``larger'' int. "

It's kinda too late for D to rename int to word, say, but it's not too late to introduce a new checked int type, say 'number' or something like that (you can probably think of a better name).

In fact, Andrei describes a CheckedInt type that uses operator overloading, etc., to implement an in-library solution to bounds checks. You can probably expand that into a workable lightweight int replacement. By wrapping an int in a struct with custom operators, you can pretty much have an int-sized type (with value semantics, just like "native" ints, no less!) that does what you want, instead of the usual C/C++ int semantics.


T

-- 
In a world without fences, who needs Windows and Gates? -- Christian Surchi
December 12, 2012
On 12/11/2012 3:44 PM, foobar wrote:
> Thanks for proving my point. After all, you are a C++ developer, aren't you? :)

No, I'm an assembler programmer. I know how the machine works, and C, C++, and D map onto that, quite deliberately. It's one reason why D supports the vector types directly.


> Seriously though, it _is_ a trick and a code smell.

Not to me. There is no trick or "smell" to anyone familiar with how computers work.


> I'm fully aware that computers use 2's complement. I'm also aware that the type
> has an "unsigned" label all over it. You see it right there in the 'u' prefix
> of 'uint'. An unsigned type should semantically entail _no sign_ in its
> operations. You are calling a cat a dog and arguing that dogs bark? Yeah, I
> completely agree with that notion, except we are still talking about _a cat_.

Andrei and I have endlessly talked about this (he argued your side). The inevitable result is that signed and unsigned types *are* conflated in D, and have to be, otherwise many things stop working.

For example, p[x]. What type is x?

Integer signedness in D is not really a property of the data, it is only how one happens to interpret the data in a specific context.


> To answer your question, yes, I would enforce overflow and underflow checking
> semantics. Any negative result assigned to an unsigned type _is_ a logic error.
> You can claim that:
> uint a = -1;
> is perfectly safe and has a well defined meaning (well, for C programmers,
> that is), but what about:
> uint a = b - c;
> What if that calculation results in a negative number? What should the
> compiler do? Well, there are _two_ equally possible interpretations:
> a. The overflow was intended, as in the mask = -1 case; or
> b. The overflow is a _bug_.
>
> The user should be made aware of this and should decide how to handle it. It
> should _not_ be handled implicitly by the compiler, allowing bugs to go
> unnoticed.
>
> I think C# solved this _way_ better than C/D.

C# has overflow checking off by default. It is enabled by either using a checked { } block, or with a compiler switch. I don't see that as "solving" the issue in any elegant or natural way, it's more of a clumsy hack.

But also consider that C# does not allow pointer arithmetic, or array slicing. Both of these rely on wraparound 2's complement arithmetic.


> Another data point is (S)ML, a compiled language that requires _explicit
> conversions_ and has a very strong type system. Its programs are compiled to
> efficient native executables, and the strong typing allows both the compiler
> and the programmer to reason better about the code. Thus programs are more
> correct and can be optimized by the compiler. In fact, several languages are
> implemented in ML because of its stronger guarantees.

ML has been around for 30-40 years, and has failed to catch on.
December 12, 2012
On 12/11/2012 4:06 PM, bearophile wrote:
> Plus one or two switches to disable such checking, if/when someone wants it, to
> regain the C performance. (Plus some syntax way to disable/enable such checking
> in a small piece of code).

I.e. the C# "solution".

1. The global switch "solution": What I hate about this was discussed earlier today in another thread. Global switches that change the semantics of the language are a disaster. It means you cannot write a piece of code and have confidence that it will behave in a certain way. It means your testing becomes a combinatorial explosion of cases - how many modules do you have, and you must (to be thorough) test every combination of switches across your whole project. If you have a 2 way switch, and 8 modules, that's 256 test runs.

2. The checked block "solution": This is a blunt club that affects everything inside a block. What happens with template instantiations, inlined functions, and mixins, for starters? What if you want one part of the expression checked and not another? What a mess.


> Maybe someday Walter will change his mind about this topic :-)

Not likely :-)

What you (and anyone else) *can* do, today, is write a SafeInt struct that acts just like an int, but checks for overflow. It's very doable (one exists for C++). Write it, use it, and prove its worth. Then you'll have a far better case. Write a good one, and we'll consider it for Phobos.

December 12, 2012
Walter Bright:

> ML has been around for 30-40 years, and has failed to catch on.

OCaml, Haskell, F#, and so on are all languages derived more or less directly from ML, and share many of its ideas. Has Haskell caught on? :-)

Bye,
bearophile
December 12, 2012
H. S. Teoh:

> Merely specifying a certain compiler switch can cause unrelated breakage
> in some obscure library somewhere that assumes modular arithmetic with C/C++ semantics.

The idea was about two switches: one for signed integrals, and the other for both signed and unsigned. But from other posts I gather Walter doesn't consider this a viable possibility.

So the solutions I see now are to stop using D for certain kinds of more critical programs, or to use some kind of SafeInt and then work with the compiler writers to make user-defined structs usable as naturally (and possibly as efficiently) as ints.

Regarding SafeInt, I think there is currently no way to write it efficiently in D, because the CPU overflow flags are not accessible from D, and if you use inline asm you lose inlining in DMD. That is just one of the problems. Others are the syntax incompatibilities of user-defined structs compared to built-in ints, and the probable lack of high-level optimizations on such a user-defined type.

We are very far from a good solution to such problems.

Bye,
bearophile