Thread overview
Comparing D vs C++ (weird behaviour of C++)
Jul 24, 2018
Daniel Kozak
Jul 24, 2018
Ecstatic Coder
Jul 24, 2018
Patrick Schluter
Jul 24, 2018
Ecstatic Coder
Jul 24, 2018
Patrick Schluter
Jul 24, 2018
Patrick Schluter
Jul 24, 2018
Ecstatic Coder
Jul 24, 2018
Patrick Schluter
Jul 24, 2018
Ecstatic Coder
Jul 24, 2018
Patrick Schluter
Jul 24, 2018
Caspar Kielwein
July 24, 2018
I am not a C++ expert, so this seems weird to me:

#include <iostream>
#include <string>

using namespace std;

int main(int argc, char **argv)
{
	char c = 0xFF;
	std::string sData = {c,c,c,c};
	unsigned int i = (((((sData[0]&0xFF)*256
					+ (sData[1]&0xFF))*256)
					+ (sData[2]&0xFF))*256
					+ (sData[3]&0xFF));
					
	if (i != 0xFFFFFFFF) { // this is true, why?
		// this prints 18446744073709551615, wow
		std::cout << "WTF: " << i  << std::endl;
	}	    	
	return 0;
}

compiled with:
g++ -O2 -Wall  -o "test" "test.cxx"
When compiled with -O0, it works as expected.

Vs. D:

import std.stdio;

void main(string[] args)
{
	char c = 0xFF;
	string sData = [c,c,c,c];
	uint i = (((((sData[0]&0xFF)*256
					+ (sData[1]&0xFF))*256)
					+ (sData[2]&0xFF))*256
					+ (sData[3]&0xFF));
	if (i != 0xFFFFFFFF) { // this is false, which makes sense
		writefln("WTF: %d", i);
	}			
}

compiled with:
dmd -release -inline -boundscheck=off -w -of"test" "test.d"

So is it a codegen bug on the C++ side, or is there something wrong with that code?





July 24, 2018
On Tuesday, 24 July 2018 at 14:08:26 UTC, Daniel Kozak wrote:
> [...]

As C++ chars are signed by default, when you accumulate several shifted 8-bit -1 values into a char result and then store it in a 64-bit unsigned buffer, you get -1 in 64 bits: 18446744073709551615.
July 24, 2018
On Tuesday, 24 July 2018 at 14:08:26 UTC, Daniel Kozak wrote:
> [...]

This is down to the integer promotion rules. char is signed here, and the 256 literals are signed ints, so the whole expression is evaluated as int. When the result goes above INT_MAX it overflows (i.e. we're in UB territory) and the result can be anything. The CPU's registers are 64 bits wide, so it sign-extends the calculation, and as the optimization removes the truncating memory write and reload, the value of the complete register is then printed by cout <<.
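
To see where the overflow happens, the arithmetic can be redone in a wider type. This is only a sketch: it assumes a 32-bit int (as on typical x86-64 platforms), and the names step, byte_value, after2..after4 and fits_in_int are made up for illustration.

```cpp
#include <climits>

// Redo the original expression in long long, which is at least 64 bits wide,
// so none of the intermediate values can overflow.
constexpr long long step(long long acc, long long byte) {
    return acc * 256 + byte;
}

constexpr long long byte_value = 0xFF;                          // value of (sData[n] & 0xFF)
constexpr long long after2 = step(byte_value, byte_value);      // 65535
constexpr long long after3 = step(after2, byte_value);          // 16777215
constexpr long long after4 = step(after3, byte_value);          // 4294967295

// True when v is representable as an int on this platform.
constexpr bool fits_in_int(long long v) {
    return v >= INT_MIN && v <= INT_MAX;
}

// Assumption: int is 32 bits. Three bytes still fit; the fourth does not,
// so the same computation done in int is signed overflow.
static_assert(fits_in_int(after3), "three bytes fit in a 32-bit int");
static_assert(!fits_in_int(after4), "the fourth byte goes past INT_MAX");
```

So the first three bytes are harmless; it is only the last multiplication by 256 that leaves the range of int.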

Conclusion: typical C(++) undefined behavior due to signed value overflow.
Fix: use 256u, and always compile with -ftrapv. In your case it would have caught the overflow.

In D, signed overflow is not UB so everything works as planned.
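
A minimal sketch of the 256u fix, assuming the string holds at least four bytes and unsigned int is 32 bits wide. The function name pack_be is hypothetical and not from the original post.

```cpp
#include <string>

// Same expression as in the original post, but with 256u instead of 256.
// Mixing int and unsigned int converts the int operand to unsigned int, so
// every multiplication and addition is done in unsigned arithmetic, which
// wraps modulo 2^N instead of being undefined on overflow.
// Precondition (assumed, not checked): s.size() >= 4.
unsigned int pack_be(const std::string& s) {
    return ((((s[0] & 0xFF) * 256u
            + (s[1] & 0xFF)) * 256u
            + (s[2] & 0xFF)) * 256u
            + (s[3] & 0xFF));
}
```

With this change the comparison against 0xFFFFFFFF no longer depends on the optimization level, because no signed overflow occurs anywhere in the expression.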



July 24, 2018
On Tuesday, 24 July 2018 at 14:08:26 UTC, Daniel Kozak wrote:
> I am not C++ expert so this seems wierd to me:
> (...)
> int main(int argc, char **argv)
> {
> 	char c = 0xFF;
> 	std::string sData = {c,c,c,c};
> 	unsigned int i = (((((sData[0]&0xFF)*256
> 					+ (sData[1]&0xFF))*256)
> 					+ (sData[2]&0xFF))*256
> 					+ (sData[3]&0xFF));
> 					
> 	if (i != 0xFFFFFFFF) { // it is true why?
> 		// this print 18446744073709551615 wow
> 		std::cout << "WTF: " << i  << std::endl;
> 	}	    	
> 	return 0;
> }
>
> compiled with:
> g++ -O2 -Wall  -o "test" "test.cxx"
> when compiled with -O0 it works as expected
>
> Vs. D: ....
> So it is code gen bug on c++ side, or there is something wrong with that code.

Signedness of char in C++ is platform dependent.
See https://en.cppreference.com/w/cpp/language/types "char"
You seem to be running into "signed overflow is undefined behaviour" shenanigans.

With all optimizations enabled, clang gives a different result than gcc.
https://godbolt.org/g/Dz5djj

Generally use unsigned char (or std::byte) when char means "memory".
And prefer a std::vector<unsigned char> to std::string in these cases as well.
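
As a rough illustration of that advice, here is a sketch that keeps the raw bytes in a std::vector<unsigned char> and assembles the value in std::uint32_t. The helper name read_be32 and its offset parameter are assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Read four bytes starting at `offset` as a big-endian 32-bit value.
// Each byte is widened to std::uint32_t before it is combined, so no
// signed arithmetic is involved. bytes.at() throws std::out_of_range
// if fewer than four bytes are available.
std::uint32_t read_be32(const std::vector<unsigned char>& bytes,
                        std::size_t offset = 0) {
    std::uint32_t value = 0;
    for (std::size_t k = 0; k < 4; ++k) {
        value = (value << 8) | static_cast<std::uint32_t>(bytes.at(offset + k));
    }
    return value;
}
```

Using a fixed-width unsigned type for the result also documents the intended width, instead of relying on what unsigned int happens to be on a given platform.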
July 24, 2018
On Tuesday, 24 July 2018 at 14:41:17 UTC, Ecstatic Coder wrote:
> On Tuesday, 24 July 2018 at 14:08:26 UTC, Daniel Kozak wrote:
>> [...]
>
> As the C++ char are signed by default, when you accumulate several shifted 8 bit -1 into a char result and then store it in a 64 bit unsigned buffer, you get -1 in 64 bits : 18446744073709551615.

That's not exactly what happens here. There's no 64-bit buffer. It's signed overflow, which is undefined behavior in C and C++.
He gets different results with and without optimization because, without optimization, the result of the calculation is spilled to the unsigned int i and then reloaded for the print call. This store and reload truncates the value to its real 32-bit value. In the optimized version, the compiler removes the spill and the overflowed value in the register is printed as is.
July 24, 2018
On Tuesday, 24 July 2018 at 15:08:35 UTC, Patrick Schluter wrote:
> On Tuesday, 24 July 2018 at 14:41:17 UTC, Ecstatic Coder wrote:
>> On Tuesday, 24 July 2018 at 14:08:26 UTC, Daniel Kozak wrote:
>>> [...]
>>
>> As the C++ char are signed by default, when you accumulate several shifted 8 bit -1 into a char result and then store it in a 64 bit unsigned buffer, you get -1 in 64 bits : 18446744073709551615.
>
> That's not exactly what happens here. There's no 64 bit buffer.

Sure about that? ;)

As "i" is printed as 18446744073709551615 when put into cout, I don't see how it couldn't be stored as a uint64...

It's actually -1 stored as a uint64.

This kind of optimizer problem is a classic when mixing signed and unsigned values in such bit-shifting expressions.

This is why you should always cast the signed input values to the unsigned result type right from the start, before starting to mix/shift them.
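
A small sketch of that "cast first" approach, assuming unsigned int is at least 32 bits wide and the string holds at least four bytes. The names to_u and pack_shift are invented for this example.

```cpp
#include <string>

// Convert a char to unsigned char first (so a negative char becomes a
// value in 0..255), then to unsigned int, before any shifting or mixing.
unsigned int to_u(char c) {
    return static_cast<unsigned int>(static_cast<unsigned char>(c));
}

// Combine four bytes, most significant first, using only unsigned operands.
// Precondition (assumed, not checked): s.size() >= 4.
unsigned int pack_shift(const std::string& s) {
    return (to_u(s[0]) << 24) | (to_u(s[1]) << 16)
         | (to_u(s[2]) << 8)  |  to_u(s[3]);
}
```

Because every operand is already unsigned before the shifts, neither the signedness of char nor signed overflow plays any role in the result.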

July 24, 2018
> He gets different results with and without optimization because without optimization the result of the calculation is spilled to the i unsigned int and then reloaded for the print call. This save and reload truncated the value to its real value. In the optimized version, the compiler removed the spill and the overflowed value contained in the register is printed as is.

Btw, you are actually confirming what I said.

if (i != 0xFFFFFFFF) ...

In the optimized version, when the 64-bit "i" value is compared to a 32-bit constant, the test fails...

Proof that the value is stored in a **64**-bit register, not a 32-bit one...

July 24, 2018
On Tuesday, 24 July 2018 at 19:24:05 UTC, Ecstatic Coder wrote:
> On Tuesday, 24 July 2018 at 15:08:35 UTC, Patrick Schluter wrote:
>> On Tuesday, 24 July 2018 at 14:41:17 UTC, Ecstatic Coder wrote:
>>> On Tuesday, 24 July 2018 at 14:08:26 UTC, Daniel Kozak wrote:
>>>> [...]
>>>
>>> As the C++ char are signed by default, when you accumulate several shifted 8 bit -1 into a char result and then store it in a 64 bit unsigned buffer, you get -1 in 64 bits : 18446744073709551615.
>>
>> That's not exactly what happens here. There's no 64 bit buffer.
>
> Sure about that ? ;)

Yes, there are no "buffers", only registers and a place on the stack for the variable i.

As I said, it's undefined behaviour, so anything goes. I just checked on godbolt what code is generated: https://godbolt.org/g/wxqfmM
With -O0 this happens:
Lines 41 to 77 contain the instructions that perform the calculation. At line 78,
mov DWORD PTR [rbp-40], eax writes 32 bits out to the space reserved for i.
At line 85, mov eax, DWORD PTR [rbp-40] reloads that value into eax, which zeroes the upper half of RAX => RAX contains 0x0000_0000_FFFF_FFFF

In the -O2 version it's even simpler. The calculation is done at compile time and the end result, -1, is passed directly to the output. The test is even removed. Everything happens in the compiler.
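
The two numbers being discussed are simply -1 viewed at two different widths. The following is an illustration in well-defined C++ only (it does not reproduce the undefined behaviour itself); the constant names are made up for this sketch.

```cpp
#include <cstdint>

// Converting a negative value to an unsigned type is well defined:
// the result is the value modulo 2^N, where N is the width of the target.
constexpr std::int64_t minus_one = -1;

// Keeping only the low 32 bits gives the value the original poster expected.
constexpr std::uint32_t as_u32 = static_cast<std::uint32_t>(minus_one);

// Reading all 64 bits as unsigned gives the number that was printed.
constexpr std::uint64_t as_u64 = static_cast<std::uint64_t>(minus_one);

static_assert(as_u32 == 0xFFFFFFFFu, "low 32 bits of -1");
static_assert(as_u64 == 18446744073709551615ull, "all 64 bits of -1");
```

This matches the -O0 listing, where the 32-bit store and reload keeps only the low half, and the -O2 output, where the full 64-bit pattern reaches the print routine.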
July 24, 2018
On Tuesday, 24 July 2018 at 19:39:10 UTC, Ecstatic Coder wrote:
>> He gets different results with and without optimization because without optimization the result of the calculation is spilled to the i unsigned int and then reloaded for the print call. This save and reload truncated the value to its real value. In the optimized version, the compiler removed the spill and the overflowed value contained in the register is printed as is.
>
> Btw you are actually confirming what I said.
>
> if (i != 0xFFFFFFFF) ...
>
> In the optimized version, when the 64 bits "i" value is compared to a 32 bits constant, the test fails...
>
> Proof that the value is stored in a **64** bits register, not 32...

We're nitpicking over vocabulary. For me, buffer != register. In my mental model, a buffer is something in memory (or a piece of hardware, like the store buffer between the registers and the cache), but I would never call a register a buffer.
July 24, 2018
On Tuesday, 24 July 2018 at 20:59:22 UTC, Patrick Schluter wrote:
> On Tuesday, 24 July 2018 at 19:24:05 UTC, Ecstatic Coder wrote:
>> On Tuesday, 24 July 2018 at 15:08:35 UTC, Patrick Schluter wrote:
>>> On Tuesday, 24 July 2018 at 14:41:17 UTC, Ecstatic Coder wrote:
>>>> On Tuesday, 24 July 2018 at 14:08:26 UTC, Daniel Kozak wrote:
>>>>> [...]
>>>>
>>>> As the C++ char are signed by default, when you accumulate several shifted 8 bit -1 into a char result and then store it in a 64 bit unsigned buffer, you get -1 in 64 bits : 18446744073709551615.
>>>
>>> That's not exactly what happens here. There's no 64 bit buffer.
>>
>> Sure about that ? ;)
>
> Yes, there are no "buffers" only register and a place on the stack for the variable i.
>
> As said it's undefined behaviour so anything goes. I just checked on godbolt what code is generated. https://godbolt.org/g/wxqfmM
> So with -O0 this happens:
> From line 41 to line 77 the instruction to make the calculation. At line 78
> mov DWORD PTR [rbp-40], eax which is writing out 32 bits to reserved space of i.
> At line 85  mov eax, DWORD PTR [rbp-40] reloads that value in eax, this annuls the high part of RAX => RAX contains 0x0000_0000_FFFF_FFFF

What I forgot to mention: for the compiler, the type deduction for the << operator is done with the i variable, so it chooses the right template with unsigned int. For the optimized code, as the calculation is done during compilation and there is no spill to the variable, the type deduction for cout's << operator is done with that internally promoted temporary value, and it deduces it as long (funnily, declaring i as volatile doesn't change that, even if the value is then spilled to the stack).

>
> On the -O2 version it's even simpler. The calculation is done at compile time and the endresult -1 is put directly to the output. The test is even removed. Everything happens in the compiler.
