May 06, 2010 Re: dmd 1.060 and 2.045 release | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | Walter Bright wrote: > Walter Bright wrote: >> Alex Makhotin wrote: >>> It takes ~40 seconds 50% load on the dual core processor(CentOS 5.3 kernel 2.6.32.4), to get the actual error messages about the undefined identifier. >> >> Definitely there's a problem. > > The problem is the spell checker is O(n*n) on the number of characters in the undefined identifier. Is there a way to disable it? -- Alex Makhotin, the founder of BITPROX, http://bitprox.com |
May 06, 2010 Re: dmd 1.060 and 2.045 release | ||||
---|---|---|---|---|
| ||||
Posted in reply to Alex Makhotin | Alex Makhotin wrote:
> Walter Bright wrote:
>> Walter Bright wrote:
>>> Alex Makhotin wrote:
>>>> It takes ~40 seconds 50% load on the dual core processor(CentOS 5.3 kernel 2.6.32.4), to get the actual error messages about the undefined identifier.
>>>
>>> Definitely there's a problem.
>>
>> The problem is the spell checker is O(n*n) on the number of characters in the undefined identifier.
>
> Is there a way to disable it?
>
Currently, no.
|
May 06, 2010 Re: dmd 1.060 and 2.045 release | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On Wed, 05 May 2010 23:45:50 -0400, Walter Bright <newshound1@digitalmars.com> wrote:
> Walter Bright wrote:
>> Alex Makhotin wrote:
>>> It takes ~40 seconds 50% load on the dual core processor(CentOS 5.3 kernel 2.6.32.4), to get the actual error messages about the undefined identifier.
>> Definitely there's a problem.
>
> The problem is the spell checker is O(n*n) on the number of characters in the undefined identifier.
That can't be it. The identifier shown by Alex is only 33 characters. O(n^2) is not that slow, especially for smaller variables. There must be other factors you're not considering...
-Steve
|
May 06, 2010 Re: dmd 1.060 and 2.045 release | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | Hello Walter, > Walter Bright wrote: > >> Alex Makhotin wrote: >> >>> It takes ~40 seconds 50% load on the dual core processor(CentOS 5.3 >>> kernel 2.6.32.4), to get the actual error messages about the >>> undefined identifier. >>> >> Definitely there's a problem. >> > The problem is the spell checker is O(n*n) on the number of characters > in the undefined identifier. > How about switch algos for long identifiers: you could bucket the knows by length and compare histograms on things of similar length. Or maybe just turn it off for long names. -- ... <IXOYE>< |
May 06, 2010 Re: dmd 1.060 and 2.045 release | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On 2010-05-05 23:45:50 -0400, Walter Bright <newshound1@digitalmars.com> said: > Walter Bright wrote: >> Alex Makhotin wrote: >>> It takes ~40 seconds 50% load on the dual core processor(CentOS 5.3 kernel 2.6.32.4), to get the actual error messages about the undefined identifier. >> >> Definitely there's a problem. > > The problem is the spell checker is O(n*n) on the number of characters in the undefined identifier. That's an algorithm that can't scale then. Checking the Levenshtein distance for each known identifier within a small difference in length would be a better idea. (Clang is said to use the Levenshtein distance, it probably does something of the sort.) http://en.wikipedia.org/wiki/Levenshtein_distance -- Michel Fortin michel.fortin@michelf.com http://michelf.com/ |
May 06, 2010 Re: dmd 1.060 and 2.045 release | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | Steven Schveighoffer, el 6 de mayo a las 07:17 me escribiste: > On Wed, 05 May 2010 23:45:50 -0400, Walter Bright <newshound1@digitalmars.com> wrote: > > >Walter Bright wrote: > >>Alex Makhotin wrote: > >>>It takes ~40 seconds 50% load on the dual core processor(CentOS 5.3 kernel 2.6.32.4), to get the actual error messages about the undefined identifier. > >> Definitely there's a problem. > > > >The problem is the spell checker is O(n*n) on the number of > >characters in the undefined identifier. > > That can't be it. The identifier shown by Alex is only 33 characters. O(n^2) is not that slow, especially for smaller variables. There must be other factors you're not considering... Run a profiler. -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------- No existiría el sonido del mar si faltara en la vida oreja y caracol. -- Ricardo Vaporeso. Cosquín, 1908. |
May 06, 2010 Re: dmd 1.060 and 2.045 release | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | Steven Schveighoffer wrote:
> That can't be it. The identifier shown by Alex is only 33 characters. O(n^2) is not that slow, especially for smaller variables. There must be other factors you're not considering...
I recompiled dmd with the profiler (-gt switch) which confirmed it.
|
May 06, 2010 Re: dmd 1.060 and 2.045 release | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On Thu, 06 May 2010 17:07:12 -0400, Walter Bright <newshound1@digitalmars.com> wrote:
> Steven Schveighoffer wrote:
>> That can't be it. The identifier shown by Alex is only 33 characters. O(n^2) is not that slow, especially for smaller variables. There must be other factors you're not considering...
>
> I recompiled dmd with the profiler (-gt switch) which confirmed it.
So a single unknown symbol (from Alex's example) which can be checked against each existing symbol in O(n^2) time, takes 40 seconds on a decent CPU? How many other symbols are there? 33^2 == 1089, if there are 10000 symbols, that's 10 million iterations, that shouldn't take 40 seconds to run, should it? Are there more symbols to compare against? Do you use heuristics to prune the search? For example, if the max distance is 2, and the difference in length between two strings is >2, you should be able to return immediately.
-Steve
|
May 06, 2010 Re: dmd 1.060 and 2.045 release | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | Steven Schveighoffer wrote: > On Thu, 06 May 2010 17:07:12 -0400, Walter Bright <newshound1@digitalmars.com> wrote: > >> Steven Schveighoffer wrote: >>> That can't be it. The identifier shown by Alex is only 33 characters. O(n^2) is not that slow, especially for smaller variables. There must be other factors you're not considering... >> >> I recompiled dmd with the profiler (-gt switch) which confirmed it. > > So a single unknown symbol (from Alex's example) which can be checked against each existing symbol in O(n^2) time, takes 40 seconds on a decent CPU? How many other symbols are there? 33^2 == 1089, if there are 10000 symbols, that's 10 million iterations, that shouldn't take 40 seconds to run, should it? Are there more symbols to compare against? Do you use heuristics to prune the search? For example, if the max distance is 2, and the difference in length between two strings is >2, you should be able to return immediately. Check out the profiler output. It's clearly the vast number of calls to the symbol lookup, not the time spent in each call. ----------------------------------------- Num Tree Func Per Calls Time Time Call 50409318 632285778 145858160 2 Dsymbol *syscall ScopeDsymbol::search(Loc ,Identifier *,int ) 50411264 131394915 106356855 2 void **syscall StringTable::search(char const *,unsigned ) 50409329 341960075 105532978 2 Dsymbol *syscall DsymbolTable::lookup(Identifier *) 50409329 236427096 105037393 2 StringValue *syscall StringTable::lookup(char const *,unsigned ) 12602340 613890619 67393753 5 Dsymbol *syscall Scope::search(Loc ,Identifier *,Dsymbol **) 12602178 693915197 66918360 5 void *cdecl scope_search_fp(void *,char const *) 25204505 461352920 52529164 2 Dsymbol *syscall Module::search(Loc ,Identifier *,int ) 50412137 25038474 25038474 0 unsigned cdecl Dchar::calcHash(char const *,unsigned ) 3520 1428323068 20349375 5781 void *cdecl spellerX(char const *,void *cdecl (*)(void *,char const *),void *,char const *,int ) 12602664 6811916 6811916 0 syscall Identifier::Identifier(char const *,int ) 12602178 6299089 6299089 0 void cdecl Module::clearCache() 12602183 6151175 6151175 0 Module *syscall Module::isModule() 1600 11329 4261 2 StringValue *syscall StringTable::update(char const *,unsigned ) ----------------------------------------- |
May 07, 2010 Re: dmd 1.060 and 2.045 release | ||||
---|---|---|---|---|
| ||||
Posted in reply to Michel Fortin | On 6-5-2010 22:37, Michel Fortin wrote:
> On 2010-05-05 23:45:50 -0400, Walter Bright <newshound1@digitalmars.com> said:
>
>> Walter Bright wrote:
>>> Alex Makhotin wrote:
>>>> It takes ~40 seconds 50% load on the dual core processor(CentOS 5.3 kernel 2.6.32.4), to get the actual error messages about the undefined identifier.
>>>
>>> Definitely there's a problem.
>>
>> The problem is the spell checker is O(n*n) on the number of characters
>> in the undefined identifier.
>
> That's an algorithm that can't scale then.
>
> Checking the Levenshtein distance for each known identifier within a small difference in length would be a better idea. (Clang is said to use the Levenshtein distance, it probably does something of the sort.)
>
> http://en.wikipedia.org/wiki/Levenshtein_distance
>
and especially this line:
# If we are only interested in the distance if it is smaller than a threshold k, then it suffices to compute a diagonal stripe of width 2k+1 in the matrix. In this way, the algorithm can be run in O(kl) time, where l is the length of the shortest string.
|
Copyright © 1999-2021 by the D Language Foundation