How would I optimize this parser? - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » Learn » How would I optimize this parser?

Thread overview

How would I optimize this parser?

Oct 31, 2010

Oct 31, 2010

joining lists into texts [was: Re: How would I optimize this parser?]

Oct 31, 2010

Oct 31, 2010

Nick Sabalausky

Re: higher-order functions
Oct 31, 2010 spir
Oct 31, 2010 Simen kjaeraas
Oct 31, 2010 bearophile
Nov 02, 2010 Steven Schveighoffer
Nov 01, 2010 Rainer Deyke
Nov 01, 2010 spir

Nov 01, 2010

Nov 01, 2010

Nov 01, 2010

Nov 01, 2010

Nov 01, 2010

Nov 01, 2010

Nov 01, 2010

Nov 01, 2010

Nov 03, 2010

Nov 03, 2010

October 31, 2010

How would I optimize this parser?

Posted by T.D.Spenser

T.D.Spenser

Attachments:

parser.d

I wrote a simple parser in D. The language for it looks like this:

[tagName content content [childTagName] [childTagName more content]]

The parser is here:

http://thoughtdispenser.net/files/parser.d

Unfortunately, it's quite slow. Can anyone point out what might be the issue(s)?

October 31, 2010

Re: How would I optimize this parser?

Posted by bearophile
in reply to T.D.Spenser

bearophile

Posted in reply to T.D.Spenser

T.D.Spenser:

> Unfortunately, it's quite slow. Can anyone point out what might be the issue(s)?

Very good :-)
First suggestions:
- Generally compile your D code using the -w switch (you have missing some overrides and more things)
- Add a main() with some benchmarking examples to your program, so I will have something to optimize for.
- Use the -profile compiler switch on the benchmarking examples to see what's slow.
- Use final classes or final methods where possible and keep an eye on memory allocations. Consider using structs in some places instead of classes.
- Keep in mind that ~ and ~= in arrays isn't a very fast operation. In some cases an Appender helps, and in some cases some workaround may be needed.

See you later,
bearophile

October 31, 2010

joining lists into texts [was: Re: How would I optimize this parser?]

Posted by spir
in reply to bearophile

spir

Posted in reply to bearophile

On Sat, 30 Oct 2010 23:00:59 -0400
bearophile <bearophileHUGS@lycos.com> wrote:

> - Keep in mind that ~ and ~= in arrays isn't a very fast operation. In some cases an Appender helps, and in some cases some workaround may be needed.

A use case of join (in std.string) is indeed to build a text from snippets.
But I personly use this func most often to build a text representation of list of elements of any kind. For this, join falls short:
* no auto-conversion to string
* no delimiters
Thus, I think the following may be useful in the stdlib (aside join):

=================================================
string listText(T)(T[] elements, string sep) {
    uint l = elements.length ;
    if (l == 0)
        return "" ;
    string[] texts ; texts.length = l ;
    foreach (uint i, T element ; elements)
        texts[i] = to!string(elements[i]) ;
    return join(texts, sep) ;
}
string listText(T)(T[] elements, string sep, string leftDelim, string rightDelim) {
    return format("%s%s%s", leftDelim, listText(elements, sep), rightDelim) ;
}

// testing
struct Symbol {
    string k;
    int v;
    string toString () {
        return format("%s:%s", this.k, to!string(this.v)) ;
    }
}

void main () {
    int[] ints = [1,2,3] ;
    writeln(listText(ints , "---")) ;
    writeln(listText(ints , " " , "(",")")) ;
    Symbol[3] symbols = [Symbol("a",1) , Symbol("b",2) , Symbol("c",3)] ;
    writeln(listText(symbols , " | " , "{ "," }")) ;
}

// writes:
1---2---3
(1 2 3)
{ a:1 | b:2 | c:3 }
==========================================

This works sensibly with any custom type where toString is defined. (else you get the type name ;-)

I would also love a mapping-func param, to allow expressing things like listText(ints, square, " "); but without optional parameters (AFAIK), combinations are problematic.

Also, I could not find functional methods like map, filter, reduce in std.functional. Where else? Also not in std.array. Map would be handy above to string-ify. And remove the need for a map func param in listText (I would be happy to contribute them if someone guides me on how to participate.)


Denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com

October 31, 2010

Re: joining lists into texts [was: Re: How would I optimize this parser?]

Posted by Nick Sabalausky
in reply to spir

Nick Sabalausky

Posted in reply to spir

"spir" <denis.spir@gmail.com> wrote in message news:mailman.45.1288523296.21107.digitalmars-d-learn@puremagic.com...
>
>Also, I could not find functional methods like map, filter,
> reduce in std.functional. Where else? Also not in std.array.

std.algorithm. But the docs for it do need to be improved.

October 31, 2010

Re: higher-order functions

Posted by spir
in reply to Nick Sabalausky

spir

Posted in reply to Nick Sabalausky

On Sun, 31 Oct 2010 17:31:54 -0400
"Nick Sabalausky" <a@a.a> wrote:

> "spir" <denis.spir@gmail.com> wrote in message news:mailman.45.1288523296.21107.digitalmars-d-learn@puremagic.com...
> >
> >Also, I could not find functional methods like map, filter,
> > reduce in std.functional. Where else? Also not in std.array.
> 
> std.algorithm. But the docs for it do need to be improved.

I find it very strange that map/filter/reduce take *strings* as func expressions, instead of function literals:
	int[] arr = [ 1, 2, 3, 4 ];
	auto squares = map!("a * a")(arr);
Why then have func literal? Func parameter is precisely the place where programmers like to use them, isn't it? Isn't this a sign that D func literals are not considered as practical enough?

Denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com

October 31, 2010

Re: higher-order functions

Posted by Simen kjaeraas
in reply to spir

Simen kjaeraas

Posted in reply to spir

spir <denis.spir@gmail.com> wrote:

> On Sun, 31 Oct 2010 17:31:54 -0400
> "Nick Sabalausky" <a@a.a> wrote:
>
>> "spir" <denis.spir@gmail.com> wrote in message
>> news:mailman.45.1288523296.21107.digitalmars-d-learn@puremagic.com...
>> >
>> >Also, I could not find functional methods like map, filter,
>> > reduce in std.functional. Where else? Also not in std.array.
>>
>> std.algorithm. But the docs for it do need to be improved.
>
> I find it very strange that map/filter/reduce take *strings* as func expressions, instead of function literals:
> 	int[] arr = [ 1, 2, 3, 4 ];
> 	auto squares = map!("a * a")(arr);

They take both, in fact:

auto cubes = map!((a){ return a*a*a; })(arr);


> Why then have func literal? Func parameter is precisely the place where programmers like to use them, isn't it? Isn't this a sign that D func literals are not considered as practical enough?

For very short functions, strings are better, because of the length of
the 'return' keyword. Had D instead always returned the result of the
last line of a function (unless specified to be of void return type),
map/filter/reduce would likely not take strings.

-- 
Simen

October 31, 2010

Re: higher-order functions

Posted by bearophile
in reply to Simen kjaeraas

bearophile

Posted in reply to Simen kjaeraas

Simen kjaeraas:

> They take both, in fact:
> 
> auto cubes = map!((a){ return a*a*a; })(arr);

But that needs to be compile-time constant, so if you have several functions, you need to put them inside a typetuple, or duplicate the code. And some idioms are just not possible.

You may see it well here regarding the "compose": http://rosettacode.org/wiki/First-class_functions#D


import std.stdio, std.typetuple, std.functional;
private import std.math;

void main() {
    // wrappers needed as not all built-in functions
    // have same signature, eg pure/nothrow
    auto sin  = (real x) { return std.math.sin(x); };
    auto asin = (real x) { return std.math.asin(x); };
    auto cos  = (real x) { return std.math.cos(x); };
    auto acos = (real x) { return std.math.acos(x); };
    auto cube = (real x) { return x ^^ 3; };
    auto cbrt = (real x) { return std.math.cbrt(x); };

    alias TypeTuple!(sin,  cos,  cube) dir;
    alias TypeTuple!(asin, acos, cbrt) inv;

    foreach (i, f; dir)
        writefln("%6.3f", compose!(f, inv[i])(0.5));
}

Bye,
bearophile

November 01, 2010

Re: How would I optimize this parser?

Posted by T.D.Spenser
in reply to bearophile

T.D.Spenser

Posted in reply to bearophile

== Quote from bearophile (bearophileHUGS@lycos.com)'s article
> T.D.Spenser:
> > Unfortunately, it's quite slow. Can anyone point out what might be the issue(s)?
> Very good :-)
> First suggestions:
> - Generally compile your D code using the -w switch (you have missing some
overrides and more things)
> - Add a main() with some benchmarking examples to your program, so I will have
something to optimize for.
> - Use the -profile compiler switch on the benchmarking examples to see what's slow. - Use final classes or final methods where possible and keep an eye on memory
allocations. Consider using structs in some places instead of classes.
> - Keep in mind that ~ and ~= in arrays isn't a very fast operation. In some
cases an Appender helps, and in some cases some workaround may be needed.
> See you later,
> bearophile

Thanks for replying.

Here is a version with a main function and sample file to parse:

http://thoughtdispenser.net/files/parser-test.7z

I fixed the warnings, but didn't change anything else yet.

Having built-in profiler is nice, I didn't know about it. Although, the files it spits out are quite strange. If I read them right, it seems the parser spends most time constructing nodes for the document's object model.

November 01, 2010

Re: How would I optimize this parser?

Posted by bearophile
in reply to T.D.Spenser

bearophile

Posted in reply to T.D.Spenser

T.D.Spenser:

> Here is a version with a main function and sample file to parse:

Good. Your code looks good, well formatted and readable.
But before you try to optimize code you have write some tests, otherwise you can't be sure you are not breaking the code, it's like driving with closed eyes. D has both unittests and design by contract, add them (it's also a good training in using such important things). I will try to look at your code even without that necessary safety net, but I can't be sure I am not breaking your code, so I hope you will give me some tests&contracts too.


> Although, the files it spits out are quite strange.

I think they are normal enough for a profiler. GCC produces a not too much different file.

Notes on your code:

Is this a critique of the D way to tell if an object is of a class? :-)

auto childTag = cast(TagNode) child;
if (childTag !is null && childTag.getName() == name) { //tricky, since cast is used to check type


In D you may have many constructors, so what were you trying to do here?

public this(string name, Node[] children...) { //can't have two constructors like in Java

Bye,
bearophile

November 01, 2010

Re: How would I optimize this parser?

Posted by bearophile
in reply to T.D.Spenser

bearophile

Posted in reply to T.D.Spenser

If you import:
import core.memory: GC;

And then you disable the GC just before parsing:
GC.disable();

The parsing runtime on my PC becomes about one third. Disabling the GC when you have to build a large data structure and re-enabling it after the creation is a normal optimization both in CPython and D.

You may also import:
import std.c.stdlib: exit;

And add this last line to the main:

exit(0);

to kill garbage collection at the end to remove the final collection time you weren't timing (but was there).

Now the total running time is about 0.3 seconds instead of 1.1 seconds.

All this means that your code is GC-bound :-) D GC is far worse than the Java one.

I will try to optimize other things.

Later,
bearophile

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation