April 22, 2016
On Friday, 22 April 2016 at 22:10:47 UTC, Walter Bright wrote:
> On 4/22/2016 2:38 PM, Johan Engelen wrote:
>> I don't understand exactly what you mean; do you propose to resort to linear
>> search after a removal happened? Or only do a linear search when the shadow data
>> structure says the item is present?
>> I don't know how often removals happen, but for the 140k elements list removals
>> happens _very_ often. While compiling phobos, removals happen not for all
>> modules, but for quite a few of them.
>
> I did a grep for "members.remove" and got no hits. Where are the removes happening?

One of them is https://github.com/dlang/dmd/blob/5ea445c68451152d43595c9de4797b6ec1e4f57d/src/dtemplate.d#L6503, I think.

 — David
April 22, 2016
On Friday, 22 April 2016 at 21:38:41 UTC, Johan Engelen wrote:
> I don't understand exactly what you mean; do you propose to […] do a linear search when the shadow data structure says the item is present?

That's the idea. As long as you can reduce the need to do a full linear search by, say, 1 : 1000, it would be enough to make the quadratic behaviour disappear into the noise.

That being said, I did some estimates, and since the keys are only 8 bytes in size, Bloom filters in particular might not be worth it in this case, unfortunately. Quite sad, actually. I like Bloom filters.

 — David
April 22, 2016
On 4/22/2016 3:16 PM, David Nadlinger wrote:
> One of them is
> https://github.com/dlang/dmd/blob/5ea445c68451152d43595c9de4797b6ec1e4f57d/src/dtemplate.d#L6503,
> I think.

Definitely one.

April 22, 2016
On 4/22/2016 3:07 PM, David Nadlinger wrote:
> On Friday, 22 April 2016 at 21:48:09 UTC, Walter Bright wrote:
>> Why not just use a hash table? D's builtin one?
>
> Apparently, some parts rely on the insertion order, although I'm not convinced
> they should. Module::importAll is one of them, but in a way that's trivial to
> avoid. I didn't check any other locations.
>
> If order is not important, a hash set should be fine.

The linear search is checking for membership. Another way is to have the 'this' have a field that says which says which members[] it is in, if any. Voila, the membership becomes an O(1) search.

April 22, 2016
On 4/22/2016 4:31 PM, Walter Bright wrote:
> On 4/22/2016 3:16 PM, David Nadlinger wrote:
>> One of them is
>> https://github.com/dlang/dmd/blob/5ea445c68451152d43595c9de4797b6ec1e4f57d/src/dtemplate.d#L6503,
>>
>> I think.
>
> Definitely one.
>

BTW, this looks like a particularly bad piece of engineering. The trouble is, it saves an index to the member, then does a bunch of semantic processing, then deletes what is on that index. But what if the members[] in the meantime shifts around?

There is an assert for that case, so it at least won't cause corruption, but it looks bad.
April 22, 2016
On 4/15/2016 11:33 AM, Johan Engelen wrote:
> The culprit is the linear search at the end of appendToModuleMember, because the
> members list with Dsymbols gets pretty large (50k+ members).
> https://github.com/D-Programming-Language/dmd/blob/master/src/dtemplate.d#L8012-L8026

https://github.com/dlang/dmd/pull/5693
April 23, 2016
On Friday, 22 April 2016 at 23:49:22 UTC, Walter Bright wrote:
> BTW, this looks like a particularly bad piece of engineering. The trouble is, it saves an index to the member, then does a bunch of semantic processing, then deletes what is on that index. But what if the members[] in the meantime shifts around?
>
> There is an assert for that case, so it at least won't cause corruption, but it looks bad.

IIRC I was involved in this somehow. It has been some time, but I think the deal is that it only does so when all the other members that might have been added in the meantime have also been removed. I suppose it seemed like a good idea to save that local information locally instead of increasing the size of TemplateInstance by a pointer, of which there might be a bajillion instances in a typical "modern" D program.

 — David
April 23, 2016
On Friday, 22 April 2016 at 22:37:49 UTC, David Nadlinger wrote:
> On Friday, 22 April 2016 at 21:38:41 UTC, Johan Engelen wrote:
>> I don't understand exactly what you mean; do you propose to […] do a linear search when the shadow data structure says the item is present?
>
> That's the idea. As long as you can reduce the need to do a full linear search by, say, 1 : 1000, it would be enough to make the quadratic behaviour disappear into the noise.

Well, this 140k list has a removal very early on, so the linear search is going to happen >100k times from then on ;)

> That being said, I did some estimates, and since the keys are only 8 bytes in size, Bloom filters in particular might not be worth it in this case, unfortunately. Quite sad, actually. I like Bloom filters.

I figured :-)

April 23, 2016
On Friday, 22 April 2016 at 23:33:24 UTC, Walter Bright wrote:
> On 4/22/2016 3:07 PM, David Nadlinger wrote:
>> On Friday, 22 April 2016 at 21:48:09 UTC, Walter Bright wrote:
>>> Why not just use a hash table? D's builtin one?
>>
>> Apparently, some parts rely on the insertion order, although I'm not convinced
>> they should. Module::importAll is one of them, but in a way that's trivial to
>> avoid. I didn't check any other locations.
>>
>> If order is not important, a hash set should be fine.
>
> The linear search is checking for membership. Another way is to have the 'this' have a field that says which says which members[] it is in, if any. Voila, the membership becomes an O(1) search.

Excellent. I didn't dare assume the code actually worked that simply.
April 23, 2016
On Saturday, 23 April 2016 at 09:13:01 UTC, Johan Engelen wrote:
> Well, this 140k list has a removal very early on, so the linear search is going to happen >100k times from then on ;)

No – only for searching the particular removed item (or other false positives due to hash collisions).

 — David
1 2
Next ›   Last »