August 26, 2010
Hello Jacob,

> On 2010-08-26 08:13, Walter Bright wrote:
> 
>> Steven Schveighoffer wrote:
>> 
>>> On Tue, 24 Aug 2010 03:58:57 -0400, Walter Bright
>>> <newshound2@digitalmars.com> wrote:
>>>> Steven Schveighoffer wrote:
>>>> 
>>>>> With profiling enabled, gprof outputs this as the top hitters:
>>>>> Flat profile:
>>>>> Each sample counts as 0.01 seconds.
>>>>> % cumulative self self total
>>>>> time seconds seconds calls ms/call ms/call name
>>>>> 77.76 6.68 6.68 2952 2.26 2.26 elf_findstr(Outbuffer*, char
>>>>> const*,
>>>>> char const*)
>>>>> 2.10 6.86 0.18 4342 0.04 0.04 searchfixlist
>>>> elf_findstr definitely looks like a problem area. I can't look at
>>>> it right now, so can you post this to bugzilla please?
>>>> 
>>> http://d.puremagic.com/issues/show_bug.cgi?id=4721
>>> 
>> Let me know how this works:
>> 
>> http://www.dsource.org/projects/dmd/changeset/628
>> 
> Shouldn't machobj.c get the same optimization?
> 

Shouldn't something like a table lookup be shared rather than duplicated?

-- 
... <IXOYE><



August 26, 2010
Steven Schveighoffer wrote:
> Better, now takes 20 seconds vs over 60.  The new culprit:
> 
> Flat profile:
> 
> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>  75.79      6.51     6.51     8103     0.80     0.80  TemplateDeclaration::toJsonBuffer(OutBuffer*)

This is most peculiar, as that should have shown up on the previous profile.

> I haven't looked at toJsonBuffer at all (btw, why are we calling this function if I'm not outputting json?)

That only happens if -X is passed on the command line, or one of the files on the command line has a .json index.
August 26, 2010
On Thu, 26 Aug 2010 12:53:59 -0400, Walter Bright <newshound2@digitalmars.com> wrote:

> Steven Schveighoffer wrote:
>> Better, now takes 20 seconds vs over 60.  The new culprit:
>>  Flat profile:
>>  Each sample counts as 0.01 seconds.
>>   %   cumulative   self              self     total
>>  time   seconds   seconds    calls  ms/call  ms/call  name
>>  75.79      6.51     6.51     8103     0.80     0.80  TemplateDeclaration::toJsonBuffer(OutBuffer*)
>
> This is most peculiar, as that should have shown up on the previous profile.

I did some more testing.  I think I compiled the profiled version of the svn trunk dmd wrong.  This is what happens when you let idiots debug your code for you ;)

I recompiled it, and here is the new list:


Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 80.31     11.99    11.99    19000     0.63     0.63  searchfixlist
  0.67     12.09     0.10   203173     0.00     0.00  StringTable::search(char const*, unsigned int)
  0.60     12.18     0.09   369389     0.00     0.00  Lexer::scan(Token*)
  0.54     12.26     0.08   953613     0.00     0.00  ScopeDsymbol::search(Loc, Identifier*, int)
  0.47     12.33     0.07  1449798     0.00     0.00  calccodsize
  0.40     12.39     0.06   587814     0.00     0.00  code_calloc
  0.40     12.45     0.06    41406     0.00     0.00  pinholeopt
  0.33     12.50     0.05   901563     0.00     0.00  _aaGetRvalue(AA*, void*)
  0.33     12.55     0.05   138329     0.00     0.00  reftoident(int, unsigned long long, Symbol*, unsigned long long, int)
  0.33     12.60     0.05    26849     0.00     0.00  ecom(elem**)
  0.27     12.64     0.04   230869     0.00     0.00  Type::totym()
  0.27     12.68     0.04    62784     0.00     0.00  touchfunc(int)
  0.27     12.72     0.04    37623     0.00     0.00  optelem(elem*, int)
  0.27     12.76     0.04    28348     0.00     0.00  assignaddrc

It looks like searchfixlist is another linear search, looking back at the other profile, it was the second highest consumer of runtime at 2% of the runtime before your fix, so it catapulted up to 80% of the runtime.  It looks like a linked-list search, so it might benefit from a hash table as well?  I'm not really sure.

Also, the 2% was on only one file being compiled, with the shortened run time, I was able to build all of dcollections, that's why the # of calls for searchfixlist is much higher.

I'll update the bug.
August 26, 2010
Steven Schveighoffer wrote:
> I'll update the bug.

Thanks!
August 26, 2010
Steven Schveighoffer wrote:
> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>  80.31     11.99    11.99    19000     0.63     0.63  searchfixlist


Just for fun, searchfixlist goes back at least to 1983 or so.
August 26, 2010
Walter Bright:
> Just for fun, searchfixlist goes back at least to 1983 or so.

It contains this if (I am not able to indent it well):

			if (s->Sseg == p->Lseg &&
                (s->Sclass == SCstatic ||
#if TARGET_LINUX || TARGET_OSX || TARGET_FREEBSD || TARGET_SOLARIS
                (!(config.flags3 & CFG3pic) && s->Sclass == SCglobal)) &&
#else
                  s->Sclass == SCglobal) &&
#endif
                  s->Sxtrnnum == 0 && p->Lflags & CFselfrel) {


How do you rewrite that in good D code? A possible way is to split that messy if into two nested ifs. Between first and second if you define a boolean variable in two different ways using a static if. And in the second if you use the boolean variable and the second part of the runtime test. Something like this:

if (part1) {
	static if (versions_test) {
		bool aux = ...
	} else {
		bool aux = ...
	}

	if (aux && part2) {
		// ...
	} else {
		// ...
	}
}

aux is defined in the middle and not before the first if because performing this runtime test is not necessary when part1 fails.

Bye,
bearophile
August 27, 2010
Hello Walter,

> Steven Schveighoffer wrote:
> 
>> Each sample counts as 0.01 seconds.
>> %   cumulative   self              self     total
>> time   seconds   seconds    calls  ms/call  ms/call  name
>> 80.31     11.99    11.99    19000     0.63     0.63  searchfixlist
> Just for fun, searchfixlist goes back at least to 1983 or so.
> 

Early or late '83? I ask because *I* go back to '83 or so. :)

-- 
... <IXOYE><



August 27, 2010
BCS wrote:
> Hello Walter,
> 
>> Steven Schveighoffer wrote:
>>
>>> Each sample counts as 0.01 seconds.
>>> %   cumulative   self              self     total
>>> time   seconds   seconds    calls  ms/call  ms/call  name
>>> 80.31     11.99    11.99    19000     0.63     0.63  searchfixlist
>> Just for fun, searchfixlist goes back at least to 1983 or so.
>>
> 
> Early or late '83? I ask because *I* go back to '83 or so. :)

June 7th, 3:26 PM. Give or take 6 months.
August 27, 2010
On 2010-08-26 16:14, BCS wrote:
> Hello Jacob,
>
>> On 2010-08-26 08:13, Walter Bright wrote:
>>
>>> Steven Schveighoffer wrote:
>>>
>>>> On Tue, 24 Aug 2010 03:58:57 -0400, Walter Bright
>>>> <newshound2@digitalmars.com> wrote:
>>>>> Steven Schveighoffer wrote:
>>>>>
>>>>>> With profiling enabled, gprof outputs this as the top hitters:
>>>>>> Flat profile:
>>>>>> Each sample counts as 0.01 seconds.
>>>>>> % cumulative self self total
>>>>>> time seconds seconds calls ms/call ms/call name
>>>>>> 77.76 6.68 6.68 2952 2.26 2.26 elf_findstr(Outbuffer*, char
>>>>>> const*,
>>>>>> char const*)
>>>>>> 2.10 6.86 0.18 4342 0.04 0.04 searchfixlist
>>>>> elf_findstr definitely looks like a problem area. I can't look at
>>>>> it right now, so can you post this to bugzilla please?
>>>>>
>>>> http://d.puremagic.com/issues/show_bug.cgi?id=4721
>>>>
>>> Let me know how this works:
>>>
>>> http://www.dsource.org/projects/dmd/changeset/628
>>>
>> Shouldn't machobj.c get the same optimization?
>>
>
> Shouldn't something like a table lookup be shared rather than duplicated?

Yes, that would be better.


-- 
/Jacob Carlborg
1 2 3 4 5 6 7 8 9 10 11
Next ›   Last »