September 30, 2009
Re: Null references redux
Wed, 30 Sep 2009 12:05:29 -0400, Jeremie Pelletier thusly wrote:

> Don wrote:
>> Greater risks come from using more complicated algorithms. Brute-force
>> algorithms are always the easiest ones to get right <g>.
> 
> I'm not sure I agree with that. Those algorithms are pretty isolated and
> really easy to write unittests for so I don't see where the risk is when
> writing more complex algorithms, it's obviously harder, but not riskier.

Do you recommend writing larger algorithms like a hard real-time 
distributed garbage collector (say, for 100+ processes/nodes), or even 
larger stuff like btrfs or ntfs file system drivers, in assembly? 
Don't you care about portability? Of course it would be nice to provide 
an optimal solution for each platform and for each use case, but 
unfortunately TCO-minded managers do not often agree.
September 30, 2009
Re: Null references redux + Cyclone
Saaa wrote:
> Andrei Alexandrescu wrote
>> I wonder whether this would be a good topic for TDPL. Currently I'm 
>> thinking it's too low-level. I do plan to insert a short section about 
>> implementation, just not go deep inside the object model.
>>
>> Andrei
> 
> I'd really love to see more about implementations as it makes me twitch
> to use something I don't really know the impact of.
> 
> As for using diagrams and other visual presentations:
> Please use them as much as possible;
> e.g. Pointers without arrows is like a film without moving pictures :)

I do have the classic arrow drawings that illustrate how reference 
semantics works, but by and large I'm not talented with drawing. If 
anyone in this group has such an inclination and would want to 
collaborate with me on the book, let me know. Send me your portfolio :o).

Andrei
September 30, 2009
Re: Null references redux
language_fan wrote:
> Wed, 30 Sep 2009 12:05:29 -0400, Jeremie Pelletier thusly wrote:
> 
>> Don wrote:
>>> Greater risks come from using more complicated algorithms. Brute-force
>>> algorithms are always the easiest ones to get right <g>.
>> I'm not sure I agree with that. Those algorithms are pretty isolated and
>> really easy to write unittests for so I don't see where the risk is when
>> writing more complex algorithms, it's obviously harder, but not riskier.
> 
> Do you recommend writing larger algorithms like a hard real-time 
> distributed garbage collector (say, for 100+ processes/nodes), or even 
> larger stuff like btrfs or ntfs file system drivers, in assembly? 
> Don't you care about portability? Of course it would be nice to provide 
> an optimal solution for each platform and for each use case, but 
> unfortunately TCO-minded managers do not often agree.

Why does everyone associate complexity with assembly? You can write a 
more complex algorithm in the same language as the original one and get 
quite a good performance boost (i.e. binary search vs. walking an array). 
Assembly is only useful as an optimization once you've found the optimal 
algorithm and want to lower its overhead a step further.

I don't recommend any language anyway; the base algorithm is often 
independent of its implementation language. Be it implemented in C#, D 
or assembly, it's gonna do the same thing, just at different performance 
levels.

For example, a simple binary search is already faster in D than in, say, 
JavaScript, but it's even faster in assembly than in D; that doesn't make 
your entire program harder to code, nor does it change the logic.
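
To make the comparison concrete, here is a minimal sketch of both 
approaches in D (illustrative only; linearFind and binaryFind are 
hypothetical names): the two functions live in the same language, only 
the algorithm differs.

// Linear scan: walk the array until the value is found. O(n).
bool linearFind(const(int)[] a, int key)
{
    foreach (x; a)
        if (x == key)
            return true;
    return false;
}

// Binary search over a sorted array: halve the range each step. O(log n).
bool binaryFind(const(int)[] a, int key)
{
    size_t lo = 0, hi = a.length;
    while (lo < hi)
    {
        const mid = lo + (hi - lo) / 2;
        if (a[mid] < key)
            lo = mid + 1;
        else if (a[mid] > key)
            hi = mid;
        else
            return true;
    }
    return false;
}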
September 30, 2009
Re: Null references redux
On 2009-09-30 15:30:02 -0400, "Denis Koroskin" <2korden@gmail.com> said:

> Note that C stdlib (and other libraries/bindings) will need to be 
> updated to reflect changes, e.g.
> 
> extern(C) void*? malloc(size_t size); // may return null!
> 
> which is great because it will provide additional safety. I've seen 
> quite a lot of code that doesn't test the returned value against null 
> (which is a mistake, I believe).

Which makes me think of this: pointers being non-nullable by default 
will make it easy to make mistakes when writing C bindings. A 
programmer might see this C declaration:

	void* malloc(size_t size);

and naively translate it to D like this:

	extern(C) void* malloc(size_t size);

without noticing the change in semantics.

For pointer arguments it's not much of a problem: the worst that can 
happen is that it blocks you from passing a null value when you should 
(in which case you can update the bindings). For a return value it's 
more troublesome because you're implicitly adding a promise that the 
function will not return null, and you might not realize it's wrong 
until it does indeed return null and your program crashes with a 
segfault.

Not that I think it's worth bothering too much, but it's something to 
keep in mind.

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/
September 30, 2009
Re: Null references redux
Wed, 30 Sep 2009 17:05:18 -0400, Jeremie Pelletier thusly wrote:

> language_fan wrote:
>> Wed, 30 Sep 2009 12:05:29 -0400, Jeremie Pelletier thusly wrote:
>> 
>>> Don wrote:
>>>> Greater risks come from using more complicated algorithms.
>>>> Brute-force algorithms are always the easiest ones to get right <g>.
>>> I'm not sure I agree with that. Those algorithms are pretty isolated
>>> and really easy to write unittests for so I don't see where the risk
>>> is when writing more complex algorithms, it's obviously harder, but
>>> not riskier.
>> 
>> Do you recommend writing larger algorithms like a hard real-time
>> distributed garbage collector (say, for 100+ processes/nodes), or even
>> larger stuff like btrfs or ntfs file system drivers, in assembly?
>> Don't you care about portability? Of course it would be nice to provide
>> an optimal solution for each platform and for each use case, but
>> unfortunately TCO-minded managers do not often agree.
> 
> Why does everyone associate complexity with assembly? You can write a
> more complex algorithm in the same language as the original one and get
> quite a good performance boost (i.e. binary search vs. walking an array).
> Assembly is only useful as an optimization once you've found the optimal
> algorithm and want to lower its overhead a step further.
>
> I don't recommend any language anyway; the base algorithm is often
> independent of its implementation language. Be it implemented in C#, D
> or assembly, it's gonna do the same thing, just at different performance
> levels.
>
> For example, a simple binary search is already faster in D than in, say,
> JavaScript, but it's even faster in assembly than in D; that doesn't make
> your entire program harder to code, nor does it change the logic.

Well I meant that we can assume the algorithm choice is already optimal.

Porting the high level program to assembly tends to grow the line count 
quite a bit. For instance I have experience converting Java code to 
Scala, and C++ to Haskell. In both cases the LOC will decrease by about 
50-90%. If you convert things like foreach, ranges, complex expressions, 
lambdas, and scope() constructs to assembly, it will increase the line 
count by at least an order of magnitude. Reading the lower level code is 
much harder. And you lose important safety nets like the type system.
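
As a rough illustration (a sketch, not code from the thread), even 
staying inside D and dropping from a foreach over a slice to raw 
pointers already adds lines and discards bounds checking; a real asm 
version of the same loop would be longer still and would lose the type 
system entirely.

// High level: one foreach over a slice, bounds-checked, type-safe.
int sumHigh(const(int)[] a)
{
    int s = 0;
    foreach (x; a)
        s += x;
    return s;
}

// Lower level: manual pointers, no bounds checks, easier to get wrong.
int sumLow(const(int)[] a)
{
    int s = 0;
    const(int)* p = a.ptr;
    const(int)* end = a.ptr + a.length;
    while (p != end)
    {
        s += *p;
        ++p;
    }
    return s;
}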
September 30, 2009
Re: Null references redux
language_fan wrote:
> Wed, 30 Sep 2009 17:05:18 -0400, Jeremie Pelletier thusly wrote:
> 
>> language_fan wrote:
>>> Wed, 30 Sep 2009 12:05:29 -0400, Jeremie Pelletier thusly wrote:
>>>
>>>> Don wrote:
>>>>> Greater risks come from using more complicated algorithms.
>>>>> Brute-force algorithms are always the easiest ones to get right <g>.
>>>> I'm not sure I agree with that. Those algorithms are pretty isolated
>>>> and really easy to write unittests for so I don't see where the risk
>>>> is when writing more complex algorithms, it's obviously harder, but
>>>> not riskier.
>>> Do you recommend writing larger algorithms like a hard real-time
>>> distributed garbage collector (say, for 100+ processes/nodes), or even
>>> larger stuff like btrfs or ntfs file system drivers, in assembly?
>>> Don't you care about portability? Of course it would be nice to provide
>>> an optimal solution for each platform and for each use case, but
>>> unfortunately TCO-minded managers do not often agree.
>> Why does everyone associate complexity with assembly? You can write a
>> more complex algorithm in the same language as the original one and get
>> quite a good performance boost (i.e. binary search vs. walking an array).
>> Assembly is only useful as an optimization once you've found the optimal
>> algorithm and want to lower its overhead a step further.
>>
>> I don't recommend any language anyway; the base algorithm is often
>> independent of its implementation language. Be it implemented in C#, D
>> or assembly, it's gonna do the same thing, just at different performance
>> levels.
>>
>> For example, a simple binary search is already faster in D than in, say,
>> JavaScript, but it's even faster in assembly than in D; that doesn't make
>> your entire program harder to code, nor does it change the logic.
> 
> Well I meant that we can assume the algorithm choice is already optimal.
> 
> Porting the high level program to assembly tends to grow the line count 
> quite a bit. For instance I have experience converting Java code to 
> Scala, and C++ to Haskell. In both cases the LOC will decrease by about 
> 50-90%. If you convert things like foreach, ranges, complex expressions, 
> lambdas, and scope() constructs to assembly, it will increase the line 
> count by at least an order of magnitude. Reading the lower level code is 
> much harder. And you lose important safety nets like the type system.

Yeah, but I don't rate my code based on the number of lines I write, but 
rather on how well it performs :)

I usually only go into assembly after profiling, or when I know from the 
start it's gonna be faster, such as matrix multiplication.

If lines of code were more important than performance, you'd get entire 
OSes and all their programs written in JavaScript, and you'd wait 20 
minutes for your computer to boot.
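
For reference, the kind of kernel meant here might look like the 
following naive triple-loop matrix multiply (an illustrative sketch, 
not code from the thread); it is small, easy to unittest against 
hand-checked cases, and the natural baseline to profile before deciding 
whether asm is worth it.

// Naive O(n^3) multiply of n-by-n row-major matrices: c = a * b.
void matmul(const(double)[] a, const(double)[] b, double[] c, size_t n)
{
    foreach (i; 0 .. n)
        foreach (j; 0 .. n)
        {
            double sum = 0;
            foreach (k; 0 .. n)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}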
September 30, 2009
Re: Null references redux
Michel Fortin:

> For a return value it's 
> more troublesome because you're implicitly adding a promise that the 
> function will not return null, and you might not realize it's wrong 
> until it does indeed return null and your program crashes with a 
> segfault.

I see.
It's a matter of how much you value safety in your language. If you want a safer language, like Cyclone tries to be, the compiler may disallow function signatures like:
extern(C) void* foo(size_t size);

And force you to use:
extern(C) void*? foo(size_t size);
because the D compiler can't be sure that foo() never returns null.

In such a situation you may just use the function as it is, returning a nullable pointer. This gives no overhead, and no safety.

Or you can add a bit of overhead and use something like an enforce (or an if) to create a nonnullable pointer from the nullable result of foo().

Finally, if you are very sure your C function never returns null, you really want to pass a nonnullable pointer around in your code, and you don't want to pay for the little null test in the D code, then you can hard-cast the nullable pointer to a nonnullable one; I hope this is done only in really uncommon situations.

I think this may solve the problem in a good enough way.
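
For illustration, a minimal sketch of those options (the void*? notation 
is the syntax being proposed in this thread, not something the current 
compiler accepts, and the function names are hypothetical):

// Hypothetical binding under the proposed syntax: '?' marks a pointer
// that may be null.
extern(C) void*? malloc(size_t size);

// Under the proposal, a plain void* return type would be statically
// nonnullable.
void* allocate(size_t size)
{
    void*? p = malloc(size);          // nullable result from C
    if (p is null)                    // the small runtime test (an enforce
        throw new Exception("malloc failed");  // would work the same way)
    return p;   // after the test, the proposal would let p convert to void*
}

// The last resort mentioned above: skip the test and hard-cast the
// nullable pointer to a nonnullable one, asserting it is never null.
// void* q = cast(void*) malloc(size);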

Bye,
bearophile
October 01, 2009
Re: Null references redux
On 30/09/2009 16:53, Jeremie Pelletier wrote:
> Yigal Chripun wrote:
>> On 29/09/2009 16:41, Jeremie Pelletier wrote:
>>
>>> What I argued about was your view on today's software being too big and
>>> complex to bother optimizing it.
>>>
>>
>> that is not what I said.
>> I was saying that hand optimized code needs to be kept at minimum and
>> only for visible bottlenecks, because the risk of introducing
>> low-level unsafe code is bigger in more complex and bigger software.
>
> What's wrong with taking a risk? If you know what you're doing, where is
> the risk, and if not, how will you learn? If you write your software
> correctly, you could add countless assembly optimizations and never
> compromise the security of the entire thing, because these optimizations
> are isolated, so if it crashes there you have only a narrow area to
> debug within.
>
> There are some parts where hand optimizing is almost useless, like
> network I/O since latency is already so high having a faster code won't
> make a difference.
>
> And sometimes the optimization doesn't even need assembly, it just
> requires using a different high level construct or a different
> algorithm. The first optimization is to get the most efficient data
> structures with the most efficient algorithms for a given task, and THEN
> if you can't optimize it more you dig into assembly.
>
> People seem to think assembly is something magical and incredibly hard,
> it's not.
>
> Jeremie

When I said optimizing, I meant lowering the implementation level by 
using lower level language constructs (pointers vs. references for 
example) and asm instead of D.
Assume that the choice of algorithm and data structures is optimal.

As language_fan wrote, when you lower the level you increase your LOC 
and you lose all sorts of safety features.
Statistically speaking there's about one bug per 2000 LOC on average, so 
you also increase the chance of a bug.
All that together means a higher risk.

Your asm implementation of binary search could be slightly faster than a 
comparable Haskell implementation, but the latter would be much easier 
to formally prove correct.

I don't know about you, but I prefer hospital equipment, airplanes, 
cars, etc. to be correct even if they'll be a couple of percent slower.
October 01, 2009
Re: Null references redux
Jeremie Pelletier wrote:
> Don wrote:
>> Jeremie Pelletier wrote:
>>> Yigal Chripun wrote:
>>>> On 29/09/2009 16:41, Jeremie Pelletier wrote:
>>>>
>>>>> What I argued about was your view on today's software being too big 
>>>>> and
>>>>> complex to bother optimizing it.
>>>>>
>>>>
>>>> that is not what I said.
>>>> I was saying that hand optimized code needs to be kept at minimum 
>>>> and only for visible bottlenecks, because the risk of introducing 
>>>> low-level unsafe code is bigger in more complex and bigger software.
>>>
>>> What's wrong with taking a risk? If you know what you're doing, where 
>>> is the risk, and if not, how will you learn? If you write your 
>>> software correctly, you could add countless assembly optimizations 
>>> and never compromise the security of the entire thing, because these 
>>> optimizations are isolated, so if it crashes there you have only a 
>>> narrow area to debug within.
>>>
>>> There are some parts where hand optimizing is almost useless, like 
>>> network I/O since latency is already so high having a faster code 
>>> won't make a difference.
>>>
>>> And sometimes the optimization doesn't even need assembly, it just 
>>> requires using a different high level construct or a different 
>>> algorithm. The first optimization is to get the most efficient data 
>>> structures with the most efficient algorithms for a given task, and 
>>> THEN if you can't optimize it more you dig into assembly.
>>>
>>> People seem to think assembly is something magical and incredibly 
>>> hard, it's not.
>>>
>>> Jeremie
>>
>> Also, if you're using asm on something other than a small, simple 
>> loop, you're probably doing something badly wrong. Therefore, it 
>> should always be localised, and easy to test thoroughly. I don't think 
>> local extreme optimisation is a big risk.
> 
> That's also how I do it once I find the ideal algorithm, I've never had 
> any problems or seen any risk with this technique, I did see some good 
> performance gains however.
> 
>> Greater risks come from using more complicated algorithms. Brute-force 
>> algorithms are always the easiest ones to get right <g>.
> 
> I'm not sure I agree with that. Those algorithms are pretty isolated and 
> really easy to write unittests for so I don't see where the risk is when 
> writing more complex algorithms, it's obviously harder, but not riskier.

By "riskier" I mean "more chance of containing an error".

I'm partly basing this on my recent experience with writing BigInt. The 
low-level asm routines are easy to get right, and it's easy to tell when 
you've got them wrong. They do brute-force stuff, like schoolbook O(n^2) 
multiplication, and importantly, _there are no special cases_ because it 
needs to be fast.
But the higher-level O(n^1.3) multiplication algorithms are full of 
special cases, and that's where the bugs are.
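
For illustration, the brute-force kind of routine described here might 
look like the following schoolbook multiplication over 32-bit digits (a 
sketch, not Don's BigInt code): a single uniform double loop with no 
special cases.

// Schoolbook O(n^2) multiplication of two little-endian arrays of
// base-2^32 digits; every digit pair is handled by the same two lines.
uint[] mulSchoolbook(const(uint)[] a, const(uint)[] b)
{
    auto r = new uint[a.length + b.length];
    foreach (i, x; a)
    {
        ulong carry = 0;
        foreach (j, y; b)
        {
            ulong t = cast(ulong) x * y + r[i + j] + carry;
            r[i + j] = cast(uint) t;
            carry = t >> 32;
        }
        r[i + b.length] = cast(uint) carry;
    }
    return r;
}

The subquadratic routines, by contrast, typically have to special-case 
operand lengths, unbalanced splits and recursion cut-offs, and that is 
where the bugs tend to live.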
October 01, 2009
Re: Null references redux
language_fan wrote:
> Wed, 30 Sep 2009 12:05:29 -0400, Jeremie Pelletier thusly wrote:
> 
>> Don wrote:
>>> Greater risks come from using more complicated algorithms. Brute-force
>>> algorithms are always the easiest ones to get right <g>.
>> I'm not sure I agree with that. Those algorithms are pretty isolated and
>> really easy to write unittests for so I don't see where the risk is when
>> writing more complex algorithms, it's obviously harder, but not riskier.
> 
> Do you recommend writing larger algorithms like a hard real-time 
> distributed garbage collector (say, for 100+ processes/nodes), or even 
> larger stuff like btrfs or ntfs file system drivers, in assembly? 
> Don't you care about portability? Of course it would be nice to provide 
> an optimal solution for each platform and for each use case, but 
> unfortunately TCO-minded managers do not often agree.

You deal with this by ensuring that you have a clear division between 
"simple but needs to be as fast as possible" (which you do low-level 
optimisation on) and "complicated, but less speed critical".
It's a classic problem of separation of concerns: you need to ensure 
that no piece of code has requirements to be fast AND clever at the same 
time.

Incidentally, it's usually not possible to make something optimally fast 
unless it's really simple.
So no, you should never do something complicated in asm.
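
A toy illustration of that division (a sketch with hypothetical names): 
keep the speed-critical piece trivial and self-contained, and keep the 
clever piece in ordinary, safe D that merely calls into it.

// "Simple but fast": a tiny kernel with no decisions in it. This is the
// only part that would ever be a candidate for hand-tuned asm.
double dot(const(double)[] a, const(double)[] b)
{
    double s = 0;
    foreach (i; 0 .. a.length)
        s += a[i] * b[i];
    return s;
}

// "Complicated but not speed-critical": the cleverness (selection,
// validation, bookkeeping) stays in plain D and just calls the kernel.
double weightedScore(const(double)[][] samples, const(double)[] weights,
                     bool delegate(size_t) keep)
{
    double total = 0;
    foreach (i, s; samples)
        if (keep(i) && s.length == weights.length)
            total += dot(s, weights);
    return total;
}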