Thanks Nazriel,
It's very cool that you were able to narrow the gap to within 1.5x of C++ with a few simple changes.
I checked your version; there are 3 changes (correct me if I missed any):
* Change the (float) constructor from v = [x, x, x] to v[0] = x; v[1] = x; v[2] = x;
* Get rid of the (float[]) constructor and use 3 floats instead
* Change the class methods to final
The first change alone shaved 220ms off the runtime, the 2nd one cut 130ms,
and the 3rd one cut 60ms.
Lesson learned: be very, very careful about dynamic arrays.
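
For anyone reading this later, here is roughly what the three changes look like when put together. This is only a sketch under my own assumptions: the class name Vec3, the float[3] field v, and the dot method are made up for illustration and are not Nazriel's actual code.

```d
// Hypothetical vector class illustrating the three changes.
// "final class" makes every method final, so calls need no virtual dispatch.
final class Vec3
{
    float[3] v; // assumed fixed-size array, stored inline in the object

    // Change 1: assign element by element instead of v = [x, x, x];
    // the array literal may allocate a temporary on the GC heap.
    this(float x)
    {
        v[0] = x; v[1] = x; v[2] = x;
    }

    // Change 2: take three floats instead of a float[] argument,
    // so callers do not have to build a dynamic array first.
    this(float x, float y, float z)
    {
        v[0] = x; v[1] = y; v[2] = z;
    }

    // Change 3: final (via "final class"), so the compiler is free to inline it.
    float dot(const Vec3 o) const
    {
        return v[0] * o.v[0] + v[1] * o.v[1] + v[2] * o.v[2];
    }
}

void main()
{
    auto a = new Vec3(2.0f);
    auto b = new Vec3(1.0f, 2.0f, 3.0f);
    assert(a.dot(b) == 12.0f); // 2*1 + 2*2 + 2*3
}
```

How much each change buys will depend on the compiler and flags, so the timings above should be taken as specific to this benchmark.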