topN using a heap (page 4)

On 1/18/16 6:44 PM, Ilya wrote: > On Monday, 18 January 2016 at 23:27:19 UTC, Ivan Kazmenko wrote: >> On Monday, 18 January 2016 at 23:18:03 UTC, Ilya wrote: >>> A RNGs don't improve worst case. It only changes an permutation for >>> worst case. --Ilya >> >> Still, use of RNG makes it impossible to construct the worst case >> beforehand, once and for all. In that sense, this is a regression. > > No, it is definitely possible, because RNGs are Pseudo-RNGs. --Ilya unpredictableSeed uses the system clock as a source of randomness, so we're good there. -- Andrei

January 18, 2016

Re: topN using a heap

Posted by Ilya
in reply to Andrei Alexandrescu

Permalink

Ilya

Posted in reply to Andrei Alexandrescu

Permalink

On Monday, 18 January 2016 at 23:39:02 UTC, Andrei Alexandrescu wrote:
> On 1/18/16 6:18 PM, Ilya wrote:
>> On Monday, 18 January 2016 at 20:45:56 UTC, Ivan Kazmenko wrote:
>>> On Monday, 18 January 2016 at 12:00:10 UTC, Ivan Kazmenko wrote:
>>>> Here goes the test which shows quadratic behavior for the new version:
>>>> http://dpaste.dzfl.pl/e4b3bc26c3cf
>>>> (dpaste kills the slow code before it completes the task)
>>>
>>> Correction: this is the result of removing a uniform call in pull
>>> 3921.  Since we are supposed to discuss the improvements related to
>>> pull 3934 here, I created a separate entry for the issue:
>>> https://issues.dlang.org/show_bug.cgi?id=15583.
>>
>> A RNGs don't improve worst case. It only changes an permutation for
>> worst case. --Ilya
>
> Well it does improve things. The probability of hitting the worst case repeatedly is practically zero, and it's impossible to create an attack input.
>
> I'm not sure whether we should worry about this. Probably we do because sort attacks are a press favorite.
>
> The nice thing about not relying on randomness is that pure functions can call sort, topN etc.
>
> As a sort of a compromise I was thinking of seeding the RNG with not only the length of the range, but also the integral representation of the address of the first element. This would still allow an attack if the input is always at the same address.
>
>
> Thoughts?
>
> Andrei

1. Yes, probability of hitting the worst case repeatedly is is practically zero. But RNGs do not change this probability.
2. It is possible to build attack for our RNGs, because they are Pseudo-RNGs.
--Ilya

On Monday, 18 January 2016 at 23:49:36 UTC, Andrei Alexandrescu wrote: > On 1/18/16 6:44 PM, Ilya wrote: >> On Monday, 18 January 2016 at 23:27:19 UTC, Ivan Kazmenko wrote: >>> On Monday, 18 January 2016 at 23:18:03 UTC, Ilya wrote: >>>> A RNGs don't improve worst case. It only changes an permutation for >>>> worst case. --Ilya >>> >>> Still, use of RNG makes it impossible to construct the worst case >>> beforehand, once and for all. In that sense, this is a regression. >> >> No, it is definitely possible, because RNGs are Pseudo-RNGs. --Ilya > > unpredictableSeed uses the system clock as a source of randomness, so we're good there. -- Andrei Would this work for pure functions? --Ilya

On 01/19/2016 12:39 AM, Andrei Alexandrescu wrote: > On 1/18/16 6:18 PM, Ilya wrote: >> On Monday, 18 January 2016 at 20:45:56 UTC, Ivan Kazmenko wrote: >>> On Monday, 18 January 2016 at 12:00:10 UTC, Ivan Kazmenko wrote: >>>> Here goes the test which shows quadratic behavior for the new version: >>>> http://dpaste.dzfl.pl/e4b3bc26c3cf >>>> (dpaste kills the slow code before it completes the task) >>> >>> Correction: this is the result of removing a uniform call in pull >>> 3921. Since we are supposed to discuss the improvements related to >>> pull 3934 here, I created a separate entry for the issue: >>> https://issues.dlang.org/show_bug.cgi?id=15583. >> >> A RNGs don't improve worst case. It only changes an permutation for >> worst case. --Ilya > > Well it does improve things. The probability of hitting the worst case > repeatedly is practically zero, and it's impossible to create an attack > input. > > I'm not sure whether we should worry about this. Probably we do because > sort attacks are a press favorite. > > The nice thing about not relying on randomness is that pure functions > can call sort, topN etc. > > As a sort of a compromise I was thinking of seeding the RNG with not > only the length of the range, but also the integral representation of > the address of the first element. This would still allow an attack if > the input is always at the same address. > > > Thoughts? > > Andrei https://en.wikipedia.org/wiki/Introselect

On 01/19/2016 12:50 AM, Ilya wrote: > ... > > 1. Yes, probability of hitting the worst case repeatedly is is > practically zero. But RNGs do not change this probability. > 2. It is possible to build attack for our RNGs, because they are > Pseudo-RNGs. > --Ilya You also need to predict the seed. How do you do that?

On Monday, 18 January 2016 at 23:53:53 UTC, Timon Gehr wrote: > On 01/19/2016 12:50 AM, Ilya wrote: >> ... >> >> 1. Yes, probability of hitting the worst case repeatedly is is >> practically zero. But RNGs do not change this probability. >> 2. It is possible to build attack for our RNGs, because they are >> Pseudo-RNGs. >> --Ilya > > You also need to predict the seed. How do you do that? We can not use unpredictable seed (like system clock) in pure functions. --Ilya

On Monday, 18 January 2016 at 23:49:36 UTC, Andrei Alexandrescu wrote: > unpredictableSeed uses the system clock as a source of randomness, so we're good there. -- Andrei I got problem with that even when crytographically secure randomness wasn't needed more than once. A specific case included adding jitter on some timeout to avoid all hosts to expire at the same time, and used mersenne twister with a pseudo random seed. There still were way too many hosts ending up with the same timeout. System time is not enough.

On 01/19/2016 12:51 AM, Ilya wrote: > On Monday, 18 January 2016 at 23:49:36 UTC, Andrei Alexandrescu wrote: >> On 1/18/16 6:44 PM, Ilya wrote: >>> On Monday, 18 January 2016 at 23:27:19 UTC, Ivan Kazmenko wrote: >>>> On Monday, 18 January 2016 at 23:18:03 UTC, Ilya wrote: >>>>> A RNGs don't improve worst case. It only changes an permutation for >>>>> worst case. --Ilya >>>> >>>> Still, use of RNG makes it impossible to construct the worst case >>>> beforehand, once and for all. In that sense, this is a regression. >>> >>> No, it is definitely possible, because RNGs are Pseudo-RNGs. --Ilya >> >> unpredictableSeed uses the system clock as a source of randomness, so >> we're good there. -- Andrei > > Would this work for pure functions? --Ilya Only if they accept the RNG as an additional argument.

On Monday, 18 January 2016 at 23:55:38 UTC, Timon Gehr wrote: > On 01/19/2016 12:51 AM, Ilya wrote: >> On Monday, 18 January 2016 at 23:49:36 UTC, Andrei Alexandrescu wrote: >>> On 1/18/16 6:44 PM, Ilya wrote: >>>> On Monday, 18 January 2016 at 23:27:19 UTC, Ivan Kazmenko wrote: >>>>> [...] >>>> >>>> No, it is definitely possible, because RNGs are Pseudo-RNGs. --Ilya >>> >>> unpredictableSeed uses the system clock as a source of randomness, so >>> we're good there. -- Andrei >> >> Would this work for pure functions? --Ilya > > Only if they accept the RNG as an additional argument. Exactly :) So, I hope we would not add Pseudo-RNGs in std.algorithm. --Ilya

On 01/19/2016 12:55 AM, Ilya wrote: > On Monday, 18 January 2016 at 23:53:53 UTC, Timon Gehr wrote: >> On 01/19/2016 12:50 AM, Ilya wrote: >>> ... >>> >>> 1. Yes, probability of hitting the worst case repeatedly is is >>> practically zero. But RNGs do not change this probability. >>> 2. It is possible to build attack for our RNGs, because they are >>> Pseudo-RNGs. >>> --Ilya >> >> You also need to predict the seed. How do you do that? > > We can not use unpredictable seed (like system clock) in pure functions. > --Ilya Clearly. The point is, there already was an impure implementation of topN whose expected linear running time is still specified in the documentation. The current implementation does not fulfill this bound.

Forums