Jump to page: 1 2
Thread overview
How would I optimize this parser?
Oct 31, 2010
T.D.Spenser
Oct 31, 2010
bearophile
joining lists into texts [was: Re: How would I optimize this parser?]
Oct 31, 2010
spir
Oct 31, 2010
Nick Sabalausky
Re: higher-order functions
Oct 31, 2010
spir
Oct 31, 2010
Simen kjaeraas
Oct 31, 2010
bearophile
Nov 01, 2010
Rainer Deyke
Nov 01, 2010
spir
Nov 01, 2010
T.D.Spenser
Nov 01, 2010
bearophile
Nov 01, 2010
bearophile
Nov 01, 2010
bearophile
Nov 01, 2010
bearophile
Nov 01, 2010
spir
Nov 01, 2010
bearophile
Nov 01, 2010
bearophile
Nov 03, 2010
T.D.Spenser
Nov 03, 2010
bearophile
October 31, 2010
I wrote a simple parser in D. The language for it looks like this:

[tagName content content [childTagName] [childTagName more content]]

The parser is here:

http://thoughtdispenser.net/files/parser.d

Unfortunately, it's quite slow. Can anyone point out what might be the issue(s)?
October 31, 2010
T.D.Spenser:

> Unfortunately, it's quite slow. Can anyone point out what might be the issue(s)?

Very good :-)
First suggestions:
- Generally compile your D code using the -w switch (you have missing some overrides and more things)
- Add a main() with some benchmarking examples to your program, so I will have something to optimize for.
- Use the -profile compiler switch on the benchmarking examples to see what's slow.
- Use final classes or final methods where possible and keep an eye on memory allocations. Consider using structs in some places instead of classes.
- Keep in mind that ~ and ~= in arrays isn't a very fast operation. In some cases an Appender helps, and in some cases some workaround may be needed.

See you later,
bearophile
October 31, 2010
On Sat, 30 Oct 2010 23:00:59 -0400
bearophile <bearophileHUGS@lycos.com> wrote:

> - Keep in mind that ~ and ~= in arrays isn't a very fast operation. In some cases an Appender helps, and in some cases some workaround may be needed.

A use case of join (in std.string) is indeed to build a text from snippets.
But I personly use this func most often to build a text representation of list of elements of any kind. For this, join falls short:
* no auto-conversion to string
* no delimiters
Thus, I think the following may be useful in the stdlib (aside join):

=================================================
string listText(T)(T[] elements, string sep) {
    uint l = elements.length ;
    if (l == 0)
        return "" ;
    string[] texts ; texts.length = l ;
    foreach (uint i, T element ; elements)
        texts[i] = to!string(elements[i]) ;
    return join(texts, sep) ;
}
string listText(T)(T[] elements, string sep, string leftDelim, string rightDelim) {
    return format("%s%s%s", leftDelim, listText(elements, sep), rightDelim) ;
}

// testing
struct Symbol {
    string k;
    int v;
    string toString () {
        return format("%s:%s", this.k, to!string(this.v)) ;
    }
}

void main () {
    int[] ints = [1,2,3] ;
    writeln(listText(ints , "---")) ;
    writeln(listText(ints , " " , "(",")")) ;
    Symbol[3] symbols = [Symbol("a",1) , Symbol("b",2) , Symbol("c",3)] ;
    writeln(listText(symbols , " | " , "{ "," }")) ;
}

// writes:
1---2---3
(1 2 3)
{ a:1 | b:2 | c:3 }
==========================================

This works sensibly with any custom type where toString is defined. (else you get the type name ;-)

I would also love a mapping-func param, to allow expressing things like listText(ints, square, " "); but without optional parameters (AFAIK), combinations are problematic.

Also, I could not find functional methods like map, filter, reduce in std.functional. Where else? Also not in std.array. Map would be handy above to string-ify. And remove the need for a map func param in listText (I would be happy to contribute them if someone guides me on how to participate.)


Denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com

October 31, 2010
"spir" <denis.spir@gmail.com> wrote in message news:mailman.45.1288523296.21107.digitalmars-d-learn@puremagic.com...
>
>Also, I could not find functional methods like map, filter,
> reduce in std.functional. Where else? Also not in std.array.

std.algorithm. But the docs for it do need to be improved.


October 31, 2010
On Sun, 31 Oct 2010 17:31:54 -0400
"Nick Sabalausky" <a@a.a> wrote:

> "spir" <denis.spir@gmail.com> wrote in message news:mailman.45.1288523296.21107.digitalmars-d-learn@puremagic.com...
> >
> >Also, I could not find functional methods like map, filter,
> > reduce in std.functional. Where else? Also not in std.array.
> 
> std.algorithm. But the docs for it do need to be improved.

I find it very strange that map/filter/reduce take *strings* as func expressions, instead of function literals:
	int[] arr = [ 1, 2, 3, 4 ];
	auto squares = map!("a * a")(arr);
Why then have func literal? Func parameter is precisely the place where programmers like to use them, isn't it? Isn't this a sign that D func literals are not considered as practical enough?

Denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com

October 31, 2010
spir <denis.spir@gmail.com> wrote:

> On Sun, 31 Oct 2010 17:31:54 -0400
> "Nick Sabalausky" <a@a.a> wrote:
>
>> "spir" <denis.spir@gmail.com> wrote in message
>> news:mailman.45.1288523296.21107.digitalmars-d-learn@puremagic.com...
>> >
>> >Also, I could not find functional methods like map, filter,
>> > reduce in std.functional. Where else? Also not in std.array.
>>
>> std.algorithm. But the docs for it do need to be improved.
>
> I find it very strange that map/filter/reduce take *strings* as func expressions, instead of function literals:
> 	int[] arr = [ 1, 2, 3, 4 ];
> 	auto squares = map!("a * a")(arr);

They take both, in fact:

auto cubes = map!((a){ return a*a*a; })(arr);


> Why then have func literal? Func parameter is precisely the place where programmers like to use them, isn't it? Isn't this a sign that D func literals are not considered as practical enough?

For very short functions, strings are better, because of the length of
the 'return' keyword. Had D instead always returned the result of the
last line of a function (unless specified to be of void return type),
map/filter/reduce would likely not take strings.

-- 
Simen
October 31, 2010
Simen kjaeraas:

> They take both, in fact:
> 
> auto cubes = map!((a){ return a*a*a; })(arr);

But that needs to be compile-time constant, so if you have several functions, you need to put them inside a typetuple, or duplicate the code. And some idioms are just not possible.

You may see it well here regarding the "compose": http://rosettacode.org/wiki/First-class_functions#D


import std.stdio, std.typetuple, std.functional;
private import std.math;

void main() {
    // wrappers needed as not all built-in functions
    // have same signature, eg pure/nothrow
    auto sin  = (real x) { return std.math.sin(x); };
    auto asin = (real x) { return std.math.asin(x); };
    auto cos  = (real x) { return std.math.cos(x); };
    auto acos = (real x) { return std.math.acos(x); };
    auto cube = (real x) { return x ^^ 3; };
    auto cbrt = (real x) { return std.math.cbrt(x); };

    alias TypeTuple!(sin,  cos,  cube) dir;
    alias TypeTuple!(asin, acos, cbrt) inv;

    foreach (i, f; dir)
        writefln("%6.3f", compose!(f, inv[i])(0.5));
}

Bye,
bearophile
November 01, 2010
== Quote from bearophile (bearophileHUGS@lycos.com)'s article
> T.D.Spenser:
> > Unfortunately, it's quite slow. Can anyone point out what might be the issue(s)?
> Very good :-)
> First suggestions:
> - Generally compile your D code using the -w switch (you have missing some
overrides and more things)
> - Add a main() with some benchmarking examples to your program, so I will have
something to optimize for.
> - Use the -profile compiler switch on the benchmarking examples to see what's slow. - Use final classes or final methods where possible and keep an eye on memory
allocations. Consider using structs in some places instead of classes.
> - Keep in mind that ~ and ~= in arrays isn't a very fast operation. In some
cases an Appender helps, and in some cases some workaround may be needed.
> See you later,
> bearophile

Thanks for replying.

Here is a version with a main function and sample file to parse:

http://thoughtdispenser.net/files/parser-test.7z

I fixed the warnings, but didn't change anything else yet.

Having built-in profiler is nice, I didn't know about it. Although, the files it spits out are quite strange. If I read them right, it seems the parser spends most time constructing nodes for the document's object model.
November 01, 2010
T.D.Spenser:

> Here is a version with a main function and sample file to parse:

Good. Your code looks good, well formatted and readable.
But before you try to optimize code you have write some tests, otherwise you can't be sure you are not breaking the code, it's like driving with closed eyes. D has both unittests and design by contract, add them (it's also a good training in using such important things). I will try to look at your code even without that necessary safety net, but I can't be sure I am not breaking your code, so I hope you will give me some tests&contracts too.


> Although, the files it spits out are quite strange.

I think they are normal enough for a profiler. GCC produces a not too much different file.

Notes on your code:

Is this a critique of the D way to tell if an object is of a class? :-)

auto childTag = cast(TagNode) child;
if (childTag !is null && childTag.getName() == name) { //tricky, since cast is used to check type


In D you may have many constructors, so what were you trying to do here?

public this(string name, Node[] children...) { //can't have two constructors like in Java

Bye,
bearophile
November 01, 2010
If you import:
import core.memory: GC;

And then you disable the GC just before parsing:
GC.disable();

The parsing runtime on my PC becomes about one third. Disabling the GC when you have to build a large data structure and re-enabling it after the creation is a normal optimization both in CPython and D.

You may also import:
import std.c.stdlib: exit;

And add this last line to the main:

exit(0);

to kill garbage collection at the end to remove the final collection time you weren't timing (but was there).

Now the total running time is about 0.3 seconds instead of 1.1 seconds.

All this means that your code is GC-bound :-) D GC is far worse than the Java one.

I will try to optimize other things.

Later,
bearophile
« First   ‹ Prev
1 2