Thread overview
Casts and some suggestions to avoid them
Apr 08, 2014
bearophile
Apr 08, 2014
bearophile
Apr 08, 2014
H. S. Teoh
Apr 08, 2014
bearophile
Apr 08, 2014
bearophile
Apr 08, 2014
Colden Cullen
Apr 09, 2014
Marco Leise
Apr 15, 2014
Colden Cullen
Apr 09, 2014
Rikki Cattermole
Apr 09, 2014
bearophile
Apr 09, 2014
Meta
April 08, 2014
In D (and other languages) casts are dangerous because often they punch holes in the type system, and they shut up the compiler, so nothing catches your mistakes. And even if you write correct code the first time, later you can change some types in your code and introduce some incongruity that casts will not complain about.

Phobos and D help avoid casts in several ways, like value range analysis, the new double(x) syntax, functions and templates like std.conv.signed and std.traits.Signed, the powerful converter to!, using "cast()" to convert to mutable without writing the type, using strongly pure functions to convert mutable results to immutable implicitly, using CTFE to initialize immutable data, using Unqual!, or using std.string.representation, using std.exception.assumeUnique, etc. D and Phobos are doing a lot to avoid the need to cast, but perhaps more can be done.

I've gathered some statistics on about 208 casts in code I have written. The relative frequency of the various casts depends on the kind of D code you write: whether you do a lot of OOP with dynamic casts, or a lot of low-level programming (or a lot of interfacing with C code), which often requires some casts.

Here, besides the usage frequencies, I also show some examples of each kind and some ideas for reducing the need to cast, usually with Phobos code.

- - - - - - - -

Of those casts, about 73 are conversions from a floating point value to an integral value, like:
cast(uint)(x * 1.75)
cast(int)sqrt(real(ns))

In some cases you can use the to! template instead of cast.
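A minimal sketch of the to! alternative, assuming a hypothetical helper named scaled. Unlike a cast, to! checks the range at run time and throws std.conv.ConvOverflowException when the value does not fit; the fractional part is still dropped.

```d
import std.conv : to, ConvOverflowException;
import std.exception : assertThrown;

// Hypothetical helper: same computation as cast(uint)(x * 1.75),
// but with a run-time range check instead of silent wrap-around.
uint scaled(double x)
{
    return to!uint(x * 1.75);
}

unittest
{
    assert(scaled(4.0) == 7);                              // 7.0 fits in uint
    assertThrown!ConvOverflowException(scaled(-4.0));      // negative: no uint value
    assertThrown!ConvOverflowException(to!ubyte(300.5));   // too large for ubyte
}
```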

- - - - - - - -

About 20 casts are conversions from a floating point value returned by floor/round/ceil to integral, like:
cast(ubyte)round(x)
cast(int)floor(y)

At first, looking at std.math, I was a bit puzzled by those functions returning a floating point value. 99% of the time I need to cast their result to an integral value. But which integral type? So I think I'd like those functions (or similar functions) to accept a template type argument to specify the type I want for the result:
round!ubyte(x)
floor!int(y)
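For the rounding case, std.conv.roundTo already does something close to this. The sketch below pairs it with a hypothetical floorTo helper (not part of Phobos) that floors and then does a range-checked conversion through to!.

```d
import std.conv : roundTo, to;
import std.math : floor;

// Hypothetical helper, not Phobos API: floor first, then a checked conversion.
T floorTo(T)(double x)
{
    return to!T(floor(x));
}

unittest
{
    assert(roundTo!ubyte(41.6) == 42);  // rounds to nearest, then checks the range
    assert(floorTo!int(-2.5) == -3);    // floor goes toward negative infinity
}
```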

- - - - - - - -

About 20 casts are for the return type of malloc/calloc/realloc/alloca, like:
cast(ubyte*)alloca(ubyte.sizeof * x);
cast(T*)malloc(T.sizeof * 10);

A set of 3 little wrappers around those functions in Phobos can remove those casts (this can't be done with alloca), and they are safer than using the raw C functions:
cMalloc!T(n)
cCalloc!T(n)
cRealloc(ptr, n)
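A rough sketch of what two of those wrappers might look like. The names follow the proposal above, but the bodies are only an assumption about how they could be written; nothing like this is claimed to exist in Phobos.

```d
import core.stdc.stdlib : malloc, calloc, free;

// Hypothetical wrapper: typed malloc for n items of T.
T* cMalloc(T)(size_t n)
{
    // Refuse requests whose byte count would overflow size_t.
    if (n > size_t.max / T.sizeof)
        return null;
    return cast(T*) malloc(T.sizeof * n);
}

// Hypothetical wrapper: typed, zero-filled allocation for n items of T.
T* cCalloc(T)(size_t n)
{
    return cast(T*) calloc(n, T.sizeof);
}

unittest
{
    int* p = cCalloc!int(10);
    assert(p !is null);
    assert(p[9] == 0);                         // calloc zero-fills the block
    free(p);
    assert(cMalloc!long(size_t.max) is null);  // overflowing request is rejected
}
```

The cast still exists, but it lives in one audited place and the element size can no longer disagree with the pointer type.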

- - - - - - - -

About 14 are reinterpret casts, sometimes to see a uint as a sequence of ubytes, array casts, etc:
cast(ubyte*)&x;
cast(ubyte[4]*)&data;
cast(uint[])text.to!(dchar[])
cast(ubyte[3])[x % 256, y % 256, x % 256]

- - - - - - - -

About 8 casts perform the opposite of std.string.representation, so they stand in for a missing "unrepresentation" function.

See:
https://d.puremagic.com/issues/show_bug.cgi?id=10162

With such a function in Phobos, all or most of these casts would not be needed.
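As far as I know, later Phobos releases ship std.string.assumeUTF, which fills this role. A minimal sketch of the round trip, using a hypothetical helper name:

```d
import std.string : representation, assumeUTF;

// Hypothetical helper: string -> raw code units -> string, with no cast.
string roundTrip(string s)
{
    immutable(ubyte)[] bytes = s.representation;  // view the code units as ubytes
    return bytes.assumeUTF;                       // reinterpret them as chars again
}

unittest
{
    assert(roundTrip("hello") == "hello");
}
```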

- - - - - - - -

About 7 are caused by feqrel, which requires mutable arguments:
const double x, y;
feqrel(cast()x, cast()y)
I presume this is just a Phobos bug, so such casts can eventually be removed.

https://d.puremagic.com/issues/show_bug.cgi?id=6586

- - - - - - - -

About 6 casts are used to convert an array of enums to an array of the underlying type, like:

enum C : char { A='a', B='b' }
C[50] arr;
cast(char[])arr

Keeping 'arr' as an array of C is handy for safety or for other reasons, but perhaps you need to print arr compactly or you need the char[] for other reasons.

I think you can't use to! in this case.

- - - - - - - -

About 5 casts are used to convert the result of std.file.read to a usable array type (because in some cases readText is not the right function to use), like:
cast(char[])"data1.txt".read
cast(ubyte[])"data2.txt".read

The cast can be avoided with a similar function that accepts a template type (there are perhaps ways to do this with existing Phobos functions; suggestions are welcome):
read!(char[])("data1.txt")
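One possible shape for such a function, as a sketch only. The name readAs and its constraint are assumptions made for illustration; this is not an existing Phobos function, and it simply keeps the one cast in a single place.

```d
import std.file : read;

// Hypothetical wrapper, not Phobos API: read a whole file as the requested array type.
T readAs(T)(string fileName)
    if (is(T : const(void)[]))
{
    // std.file.read returns void[]; the reinterpretation is confined to this line.
    return cast(T) read(fileName);
}
```

Call sites would then read readAs!(char[])("data1.txt") or readAs!(ubyte[])("data2.txt").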

- - - - - - - -

About 4 casts are needed because the D compiler misses some "obvious" value range propagations, like:

void foo(immutable ulong x) {
    if (x <= uint.max)
        uint y = x;

    char['z' - 'a' + 1] arr;
    foreach (immutable i, ref c; arr)
        c = 'a' + i;
}


struct Foo {
    immutable char c;

    this(in int c_)
    in {
        assert(c_ >= '0' && c_ <= '9');
    } body {
        this.c = c_;
    }
}


See:
https://d.puremagic.com/issues/show_bug.cgi?id=9570
https://d.puremagic.com/issues/show_bug.cgi?id=10594
https://d.puremagic.com/issues/show_bug.cgi?id=10685
https://d.puremagic.com/issues/show_bug.cgi?id=12514

- - - - - - - -

About 4 casts are used by hex strings, like:

ubyte[] data = cast(ubyte[])x"00 11 22 33 AB";

I think hex strings should be implicitly castable to ubyte[], avoiding the need for a cast, or if you don't like implicit casts then I think they should be of type ubyte[], because in about 100% of the cases I don't want a char[].

There are many such useless casts in Phobos:
https://d.puremagic.com/issues/show_bug.cgi?id=10453

- - - - - - - -

In about 4 cases I have used a cast to take part of a number, like taking the lower 32 bits of a ulong, and so on.

In some cases you can remove such casts using a union (like a union of one ulong and a uint[2]).

- - - - - - - -

In 2 cases I have used a cast because, although array concatenation generates a new array, concatenating two const/immutable arrays can't yield a mutable result (and I needed a mutable result):

void main() {
    const char[] a, b;
    char[] c = a ~ b;
    char d;
    char[] e = a ~ d;
}


This is an old issue:
https://d.puremagic.com/issues/show_bug.cgi?id=1654

- - - - - - - -

In 2 cases I have had to cast an array length to uint so the code compiles on both 32-bit and 64-bit systems when assigning that length to some uint value.

- - - - - - - -

In 1 case I've had to use a dynamic cast on class instances. In theory in Phobos you can add specialized upcasts, downcasts, etc, that are more explicit and safer.

- - - - - - - -

I have also counted about 38 unsorted casts that don't easily fit in the preceding categories. They are so varied that it's not easy to find ways to avoid them.

Bye,
bearophile
April 08, 2014
> round!ubyte(x)
> floor!int(y)

https://d.puremagic.com/issues/show_bug.cgi?id=12547


> cMalloc!T(n)
> cCalloc!T(n)
> cRealloc(ptr, n)

https://d.puremagic.com/issues/show_bug.cgi?id=12548

Bye,
bearophile
April 08, 2014
On Tue, Apr 08, 2014 at 06:38:46PM +0000, bearophile wrote: [...]
> I've done a little statistics on about 208 casts in code I have written.
[...]
> Of those casts about 73 casts are conversions from a floating point
> value to integral value, like:
> cast(uint)(x * 1.75)
> cast(int)sqrt(real(ns))
> 
> In some cases you can use the to! template instead of cast.

Which cases don't work? My impression is that to! should be preferred to casts in this case, because it will actually check runtime value ranges and throw an error if, say, the float exceeds the range of int. Using a cast will silently ignore overflowed values, leading to hard-to-find bugs.


[...]
> About 20 casts are for the return type of malloc/calloc/realloc/alloca,
> like:
> cast(ubyte*)alloca(ubyte.sizeof * x);
> cast(T*)malloc(T.sizeof * 10);
> 
> A set of 3 little wrappers around those functions in Phobos can remove
> those casts (this can't be done with alloca), they are safer than
> using the raw C functions:
> cMalloc!T(n)
> cCalloc!T(n)
> cRealloc(ptr, n)

This issue will (hopefully?) be addressed when Andrei finalizes his
allocators, perhaps?


[...]
> About 14 are reinterpret casts, sometimes to see a uint as a sequence
> of ubytes, array casts, etc:
> cast(ubyte*)&x;
> cast(ubyte[4]*)&data;
> cast(uint[])text.to!(dchar[])
> cast(ubyte[3])[x % 256, y % 256, x % 256]

Reinterpret casts are probably irreplaceable, because often they are used when you want to directly access the raw representation of some piece of data (e.g., to transmit a struct over the network, or serialize it to file, etc.). D does give some useful tools to do this with minimal risks (e.g., .sizeof), but still, this kind of cast is inherently dangerous and prone to breakage when you redefine your types.


[...]
> About 6 casts are used to convert an array of enums to an array of the underlying type, like:
> 
> enum C : char { A='a', B='b' }
> C[50] arr;
> cast(char[])arr
> 
> Keeping 'arr' as an array of C is handy for safety or for other reasons, but perhaps you need to print arr compactly or you need the char[] for other reasons.
> 
> I think you can't use to! in this case.

I think to! can probably be extended to perform this conversion.


> About 5 casts are used to convert the result of std.file.read to a
> usable array type (because in some cases readText is not the right
> function to use), like:
> cast(char[])"data1.txt".read
> cast(ubyte[])"data2.txt".read
> 
> The cast can be avoided with a similar function that accepts a template
> type (there are perhaps ways to do this with existing Phobos
> functions; suggestions are welcome):
> read!(char[])("data1.txt")

Agreed.


[...]
> About 4 casts are used by hex strings, like:
> 
> ubyte[] data = cast(ubyte[])x"00 11 22 33 AB";
> 
> I think hex strings should be implicitly castable to ubyte[], avoiding the need for a cast, or if you don't like implicit casts then I think they should be of type ubyte[], because in about 100% of the cases I don't want a char[].

Agreed, I can't think of any common use case where you'd want a hex string to be char[] instead of ubyte[]. The only case I can think of, (which is not common at all) is when you want to explicitly construct test cases for UTF strings with specific code point sequences (e.g., invalid sequences to test UTF error-catching code).


[...]
> In about 4 cases I have used a cast to take part of a number, like taking the lower 32 bits of a ulong, and so on.
> 
> In some cases you can remove such casts using a union (like a union of
> one ulong and a uint[2]).

Using a union here is not a good idea, because the results depend on the
endianness of the machine! It's better to just use (a & 0xFFFF) or (a >>
16) instead.


[...]
> In 2 cases I have had to cast an array length to uint so the code compiles on both 32-bit and 64-bit systems when assigning that length to some uint value.

This is inherently unsafe, since it risks silent truncation of very large arrays. Admittedly, that's unlikely on a 32-bit machine, but still... I think a cast is justified here (as a warning sign that the code may have fragile behaviour -- e.g., while running on a 64-bit machine).


[...]
> In 1 case I've had to use a dynamic cast on class instances. In theory in Phobos you can add specialized upcasts, downcasts, etc, that are more explicit and safer.

In OO, explicit downcasting is usually frowned upon as the sign of bad design (due to the Liskov Substitution Principle). Nevertheless, AFAIK, downcasting in D is actually safe:

	BaseClass b;
	auto d = cast(DerivedClass) b;
	if (d is null)
	{
		// b was not an instance of DerivedClass
	}
	else
	{
		// d is safe to use
	}

So I don't think this case counts. The cast operator was explicitly designed to handle this case (among other cases).
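The same check is often written with the declaration inside the if condition, which keeps the downcast result scoped to the branch where it is known to be non-null. A minimal sketch with made-up class names:

```d
// Hypothetical classes, for illustration only.
class Shape {}
class Circle : Shape { double radius = 1.0; }

// Returns the radius if s is really a Circle, otherwise -1.
double radiusOf(Shape s)
{
    if (auto c = cast(Circle) s)   // the cast yields null when s is not a Circle
        return c.radius;
    return -1.0;
}
```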


T

-- 
If creativity is stifled by rigid discipline, then it is not true creativity.
April 08, 2014
H. S. Teoh:

> Which cases don't work?

Example: in a nothrow function. Unless you catch the exception locally.

To solve this in Bugzilla I have proposed a nothrow function maybeTo that returns a Nullable!T:

https://d.puremagic.com/issues/show_bug.cgi?id=6840
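A minimal sketch of how such a function could be written on top of to!. This is only an assumption about the shape of the proposal, not code from the Bugzilla entry and not Phobos API; it also still pays the cost of an exception on failure.

```d
import std.conv : to;
import std.typecons : Nullable;

// Hypothetical maybeTo: a nothrow conversion that reports failure as an empty Nullable.
Nullable!T maybeTo(T, S)(S value) nothrow
{
    try
    {
        return Nullable!T(to!T(value));
    }
    catch (Exception)
    {
        return Nullable!T.init;   // empty result: the conversion failed
    }
}
```

For example, maybeTo!ubyte(200.0) would hold 200, while maybeTo!ubyte(300.0) would be empty (isNull is true).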

Also a cast is faster and lighter than to! so in some cases it's needed.


> This issue will (hopefully?) be addressed when Andrei finalizes his allocators, perhaps?

Andrei's allocators are very nice, and they help, but I think they can't replace the C allocation functions in every case.


> I think to! can probably be extended to perform this conversion.

It's not so simple, there are some constraints.


> Using a union here is not a good idea, because the results depend on the endianness of the machine! It's better to
> just use (a & 0xFFFF) or (a >> 16) instead.

Right.


> This is inherently unsafe, since it risks silent truncation of very large arrays.

In some cases you can assume you don't have huge arrays. And you can even check the length before the cast or inside the function precondition.
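One way to make that check explicit without a cast is to go through to!, which throws if the length does not fit. A small sketch with a hypothetical helper name:

```d
import std.conv : to;

// Hypothetical helper: checked conversion of an array length to uint.
// On a 64-bit build it throws ConvOverflowException if the length exceeds uint.max;
// on a 32-bit build size_t is already uint, so the conversion is trivial.
uint length32(T)(const T[] arr)
{
    return arr.length.to!uint;
}
```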

Bye,
bearophile
April 08, 2014
H. S. Teoh:

>> In some cases you can remove such casts using a union (like a union of one ulong and a uint[2]).
>
> Using a union here is not a good idea, because the results depend on the
> endianness of the machine! It's better to just use (a & 0xFFFF) or (a >> 16) instead.

Better to avoid magic constants; it's easy to drop an F or something. To take the lower 32 bits you would have to use 0xFFFF_FFFFu. This is safer and more readable:

a & uint.max
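A minimal sketch of both halves, with hypothetical helper names. It relies on value range propagation to accept the masked result as a uint without a cast, and it does not depend on the endianness of the machine.

```d
// Hypothetical helpers: split a ulong into its 32-bit halves.
uint lowHalf(ulong a)
{
    return a & uint.max;            // the mask bounds the value, so no cast is needed
}

uint highHalf(ulong a)
{
    return (a >> 32) & uint.max;    // shift the upper half down, then mask
}
```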

Bye,
bearophile
April 08, 2014
One issue I've had huge amounts of trouble with is casting to and from shared. The primary problem is that most of Phobos doesn't handle shared values at all.

If there was some inout style thing but for shared/unshared instead of mutable/immutable/const that would be super helpful.
April 09, 2014
On Tue, 08 Apr 2014 21:30:08 +0000,
"Colden Cullen" <ColdenCullen@gmail.com> wrote:

> One issue I've had huge amounts of trouble with is casting to and from shared. The primary problem is that most of phobos doesn't handle shared values at all.
> 
> If there was some inout style thing but for shared/unshared instead of mutable/immutable/const that would be super helpful.

Can you explain what level of atomicity you expect?

1) what atomicity?
2) atomic operations on single instructions
3) the whole Phobos function should be atomic with respect to
   the shared values passed to it
4) some mutex in your "business logic" will make sure there
   are no race conditions

Shared currently does two things I know of (besides
circumventing TLS):
- simply tag a variable as "multi-threaded" so you don't
  forget that fact
- the compiler will not reorder or cache access to it

So what would it add to Phobos if everything accepted shared?
In particular how would that improve thread-safety, which is
the aim of marking things shared?
It doesn't, because only the functions in core.atomic make
sense to accept shared. The reason is simply that they are
running a single instruction on a single shared operand and not
a complete algorithm. Anything longer needs to be implemented
with thought put into race conditions.

Example:

x = min(a, b);

Say a == 1 and b == 2. The function would load a from memory into a CPU register, then some other thread changes a to 3, then the function compares the register content with b and returns 1, which is no longer correct at this point in time.

It is not that it can never be what you want, but that min() alone cannot decide what is right for YOUR code.

So instead of passing shared values to generic algorithms, we only really need UNSHARED!
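A sketch of that idea using core.atomic: take unshared copies first, then hand them to the generic algorithm. The variable and function names are made up for illustration, and the two loads together are not one atomic snapshot, so the caller still has to decide whether that is acceptable (or hold a mutex around both).

```d
import core.atomic : atomicLoad;
import std.algorithm : min;

// Hypothetical shared state, for illustration only.
shared int a = 1;
shared int b = 2;

int snapshotMin()
{
    // Each load is atomic on its own; the pair is not.
    int localA = atomicLoad(a);
    int localB = atomicLoad(b);
    // min() now works on plain thread-local ints and needs no shared overload.
    return min(localA, localB);
}
```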

-- 
Marco

April 09, 2014
On Tuesday, 8 April 2014 at 18:38:47 UTC, bearophile wrote:
...
> In 2 cases I have had to cast an array length to uint so the code compiles on both 32-bit and 64-bit systems when assigning that length to some uint value.
...
> Bye,
> bearophile

Personally, I design my code around size_t/ptrdiff_t to eliminate
these issues as much as possible. Yeah, it's more memory, but it
does mean fewer issues with 32/64-bit.
April 09, 2014
> I have also counted about 38 unsorted casts that don't easily fit in the preceding categories. They are so varied that it's not easy to find ways to avoid them.

In my post I have not shown examples of the casts in this "unsorted" category. They are sometimes needed to work around compiler bugs, like this one (the code doesn't compile if you remove the cast):

void main() {
    enum E { a, b }
    int[E][E] foo =
        cast()[E.a: [E.a: 1, E.b: 2],
               E.b: [E.a: 3, E.b: 4]];
}


Bye,
bearophile
April 09, 2014
On Wednesday, 9 April 2014 at 21:18:38 UTC, bearophile wrote:
>> I have also counted about 38 unsorted casts that don't easily fit in the precedent categories. They are so varied that it's not easy to find ways to avoid them.
>
> In my post I have not shown examples of the casts in this "unsorted" category. They are sometimes needed to work around compiler bugs, like this one (the code doesn't compile if you remove the cast):
>
> void main() {
>     enum E { a, b }
>     int[E][E] foo =
>         cast()[E.a: [E.a: 1, E.b: 2],
>                E.b: [E.a: 3, E.b: 4]];
> }
>
>
> Bye,
> bearophile

I forgot that nested AAs were even possible. I was thinking about this yesterday and was positive that they weren't.