April 22, 2016 Re: DMD internal: appendToModuleMember performance | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On Friday, 22 April 2016 at 22:10:47 UTC, Walter Bright wrote: > On 4/22/2016 2:38 PM, Johan Engelen wrote: >> I don't understand exactly what you mean; do you propose to resort to linear >> search after a removal happened? Or only do a linear search when the shadow data >> structure says the item is present? >> I don't know how often removals happen, but for the 140k elements list removals >> happens _very_ often. While compiling phobos, removals happen not for all >> modules, but for quite a few of them. > > I did a grep for "members.remove" and got no hits. Where are the removes happening? One of them is https://github.com/dlang/dmd/blob/5ea445c68451152d43595c9de4797b6ec1e4f57d/src/dtemplate.d#L6503, I think. — David | |||
April 22, 2016 Re: DMD internal: appendToModuleMember performance | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Johan Engelen | On Friday, 22 April 2016 at 21:38:41 UTC, Johan Engelen wrote:
> I don't understand exactly what you mean; do you propose to […] do a linear search when the shadow data structure says the item is present?
That's the idea. As long as you can reduce the need to do a full linear search by, say, 1 : 1000, it would be enough to make the quadratic behaviour disappear into the noise.
That being said, I did some estimates, and since the keys are only 8 bytes in size, Bloom filters in particular might not be worth it in this case, unfortunately. Quite sad, actually. I like Bloom filters.
— David
| |||
April 22, 2016 Re: DMD internal: appendToModuleMember performance | ||||
|---|---|---|---|---|
| ||||
Posted in reply to David Nadlinger | On 4/22/2016 3:16 PM, David Nadlinger wrote:
> One of them is
> https://github.com/dlang/dmd/blob/5ea445c68451152d43595c9de4797b6ec1e4f57d/src/dtemplate.d#L6503,
> I think.
Definitely one.
| |||
April 22, 2016 Re: DMD internal: appendToModuleMember performance | ||||
|---|---|---|---|---|
| ||||
Posted in reply to David Nadlinger | On 4/22/2016 3:07 PM, David Nadlinger wrote:
> On Friday, 22 April 2016 at 21:48:09 UTC, Walter Bright wrote:
>> Why not just use a hash table? D's builtin one?
>
> Apparently, some parts rely on the insertion order, although I'm not convinced
> they should. Module::importAll is one of them, but in a way that's trivial to
> avoid. I didn't check any other locations.
>
> If order is not important, a hash set should be fine.
The linear search is checking for membership. Another way is to have the 'this' have a field that says which says which members[] it is in, if any. Voila, the membership becomes an O(1) search.
| |||
April 22, 2016 Re: DMD internal: appendToModuleMember performance | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On 4/22/2016 4:31 PM, Walter Bright wrote:
> On 4/22/2016 3:16 PM, David Nadlinger wrote:
>> One of them is
>> https://github.com/dlang/dmd/blob/5ea445c68451152d43595c9de4797b6ec1e4f57d/src/dtemplate.d#L6503,
>>
>> I think.
>
> Definitely one.
>
BTW, this looks like a particularly bad piece of engineering. The trouble is, it saves an index to the member, then does a bunch of semantic processing, then deletes what is on that index. But what if the members[] in the meantime shifts around?
There is an assert for that case, so it at least won't cause corruption, but it looks bad.
| |||
April 22, 2016 Re: DMD internal: appendToModuleMember performance | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Johan Engelen | On 4/15/2016 11:33 AM, Johan Engelen wrote: > The culprit is the linear search at the end of appendToModuleMember, because the > members list with Dsymbols gets pretty large (50k+ members). > https://github.com/D-Programming-Language/dmd/blob/master/src/dtemplate.d#L8012-L8026 https://github.com/dlang/dmd/pull/5693 | |||
April 23, 2016 Re: DMD internal: appendToModuleMember performance | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On Friday, 22 April 2016 at 23:49:22 UTC, Walter Bright wrote:
> BTW, this looks like a particularly bad piece of engineering. The trouble is, it saves an index to the member, then does a bunch of semantic processing, then deletes what is on that index. But what if the members[] in the meantime shifts around?
>
> There is an assert for that case, so it at least won't cause corruption, but it looks bad.
IIRC I was involved in this somehow. It has been some time, but I think the deal is that it only does so when all the other members that might have been added in the meantime have also been removed. I suppose it seemed like a good idea to save that local information locally instead of increasing the size of TemplateInstance by a pointer, of which there might be a bajillion instances in a typical "modern" D program.
— David
| |||
April 23, 2016 Re: DMD internal: appendToModuleMember performance | ||||
|---|---|---|---|---|
| ||||
Posted in reply to David Nadlinger | On Friday, 22 April 2016 at 22:37:49 UTC, David Nadlinger wrote: > On Friday, 22 April 2016 at 21:38:41 UTC, Johan Engelen wrote: >> I don't understand exactly what you mean; do you propose to […] do a linear search when the shadow data structure says the item is present? > > That's the idea. As long as you can reduce the need to do a full linear search by, say, 1 : 1000, it would be enough to make the quadratic behaviour disappear into the noise. Well, this 140k list has a removal very early on, so the linear search is going to happen >100k times from then on ;) > That being said, I did some estimates, and since the keys are only 8 bytes in size, Bloom filters in particular might not be worth it in this case, unfortunately. Quite sad, actually. I like Bloom filters. I figured :-) | |||
April 23, 2016 Re: DMD internal: appendToModuleMember performance | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On Friday, 22 April 2016 at 23:33:24 UTC, Walter Bright wrote:
> On 4/22/2016 3:07 PM, David Nadlinger wrote:
>> On Friday, 22 April 2016 at 21:48:09 UTC, Walter Bright wrote:
>>> Why not just use a hash table? D's builtin one?
>>
>> Apparently, some parts rely on the insertion order, although I'm not convinced
>> they should. Module::importAll is one of them, but in a way that's trivial to
>> avoid. I didn't check any other locations.
>>
>> If order is not important, a hash set should be fine.
>
> The linear search is checking for membership. Another way is to have the 'this' have a field that says which says which members[] it is in, if any. Voila, the membership becomes an O(1) search.
Excellent. I didn't dare assume the code actually worked that simply.
| |||
April 23, 2016 Re: DMD internal: appendToModuleMember performance | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Johan Engelen | On Saturday, 23 April 2016 at 09:13:01 UTC, Johan Engelen wrote:
> Well, this 140k list has a removal very early on, so the linear search is going to happen >100k times from then on ;)
No – only for searching the particular removed item (or other false positives due to hash collisions).
— David
| |||
Copyright © 1999-2021 by the D Language Foundation
Permalink
Reply