Thread overview | ||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
October 31, 2010 How would I optimize this parser? | ||||
---|---|---|---|---|
| ||||
Attachments: | I wrote a simple parser in D. The language for it looks like this: [tagName content content [childTagName] [childTagName more content]] The parser is here: http://thoughtdispenser.net/files/parser.d Unfortunately, it's quite slow. Can anyone point out what might be the issue(s)? |
October 31, 2010 Re: How would I optimize this parser? | ||||
---|---|---|---|---|
| ||||
Posted in reply to T.D.Spenser | T.D.Spenser:
> Unfortunately, it's quite slow. Can anyone point out what might be the issue(s)?
Very good :-)
First suggestions:
- Generally compile your D code using the -w switch (you have missing some overrides and more things)
- Add a main() with some benchmarking examples to your program, so I will have something to optimize for.
- Use the -profile compiler switch on the benchmarking examples to see what's slow.
- Use final classes or final methods where possible and keep an eye on memory allocations. Consider using structs in some places instead of classes.
- Keep in mind that ~ and ~= in arrays isn't a very fast operation. In some cases an Appender helps, and in some cases some workaround may be needed.
See you later,
bearophile
|
October 31, 2010 joining lists into texts [was: Re: How would I optimize this parser?] | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | On Sat, 30 Oct 2010 23:00:59 -0400 bearophile <bearophileHUGS@lycos.com> wrote: > - Keep in mind that ~ and ~= in arrays isn't a very fast operation. In some cases an Appender helps, and in some cases some workaround may be needed. A use case of join (in std.string) is indeed to build a text from snippets. But I personly use this func most often to build a text representation of list of elements of any kind. For this, join falls short: * no auto-conversion to string * no delimiters Thus, I think the following may be useful in the stdlib (aside join): ================================================= string listText(T)(T[] elements, string sep) { uint l = elements.length ; if (l == 0) return "" ; string[] texts ; texts.length = l ; foreach (uint i, T element ; elements) texts[i] = to!string(elements[i]) ; return join(texts, sep) ; } string listText(T)(T[] elements, string sep, string leftDelim, string rightDelim) { return format("%s%s%s", leftDelim, listText(elements, sep), rightDelim) ; } // testing struct Symbol { string k; int v; string toString () { return format("%s:%s", this.k, to!string(this.v)) ; } } void main () { int[] ints = [1,2,3] ; writeln(listText(ints , "---")) ; writeln(listText(ints , " " , "(",")")) ; Symbol[3] symbols = [Symbol("a",1) , Symbol("b",2) , Symbol("c",3)] ; writeln(listText(symbols , " | " , "{ "," }")) ; } // writes: 1---2---3 (1 2 3) { a:1 | b:2 | c:3 } ========================================== This works sensibly with any custom type where toString is defined. (else you get the type name ;-) I would also love a mapping-func param, to allow expressing things like listText(ints, square, " "); but without optional parameters (AFAIK), combinations are problematic. Also, I could not find functional methods like map, filter, reduce in std.functional. Where else? Also not in std.array. Map would be handy above to string-ify. And remove the need for a map func param in listText (I would be happy to contribute them if someone guides me on how to participate.) Denis -- -- -- -- -- -- -- vit esse estrany ☣ spir.wikidot.com |
October 31, 2010 Re: joining lists into texts [was: Re: How would I optimize this parser?] | ||||
---|---|---|---|---|
| ||||
Posted in reply to spir | "spir" <denis.spir@gmail.com> wrote in message news:mailman.45.1288523296.21107.digitalmars-d-learn@puremagic.com... > >Also, I could not find functional methods like map, filter, > reduce in std.functional. Where else? Also not in std.array. std.algorithm. But the docs for it do need to be improved. |
October 31, 2010 Re: higher-order functions | ||||
---|---|---|---|---|
| ||||
Posted in reply to Nick Sabalausky | On Sun, 31 Oct 2010 17:31:54 -0400
"Nick Sabalausky" <a@a.a> wrote:
> "spir" <denis.spir@gmail.com> wrote in message news:mailman.45.1288523296.21107.digitalmars-d-learn@puremagic.com...
> >
> >Also, I could not find functional methods like map, filter,
> > reduce in std.functional. Where else? Also not in std.array.
>
> std.algorithm. But the docs for it do need to be improved.
I find it very strange that map/filter/reduce take *strings* as func expressions, instead of function literals:
int[] arr = [ 1, 2, 3, 4 ];
auto squares = map!("a * a")(arr);
Why then have func literal? Func parameter is precisely the place where programmers like to use them, isn't it? Isn't this a sign that D func literals are not considered as practical enough?
Denis
-- -- -- -- -- -- --
vit esse estrany ☣
spir.wikidot.com
|
October 31, 2010 Re: higher-order functions | ||||
---|---|---|---|---|
| ||||
Posted in reply to spir | spir <denis.spir@gmail.com> wrote: > On Sun, 31 Oct 2010 17:31:54 -0400 > "Nick Sabalausky" <a@a.a> wrote: > >> "spir" <denis.spir@gmail.com> wrote in message >> news:mailman.45.1288523296.21107.digitalmars-d-learn@puremagic.com... >> > >> >Also, I could not find functional methods like map, filter, >> > reduce in std.functional. Where else? Also not in std.array. >> >> std.algorithm. But the docs for it do need to be improved. > > I find it very strange that map/filter/reduce take *strings* as func expressions, instead of function literals: > int[] arr = [ 1, 2, 3, 4 ]; > auto squares = map!("a * a")(arr); They take both, in fact: auto cubes = map!((a){ return a*a*a; })(arr); > Why then have func literal? Func parameter is precisely the place where programmers like to use them, isn't it? Isn't this a sign that D func literals are not considered as practical enough? For very short functions, strings are better, because of the length of the 'return' keyword. Had D instead always returned the result of the last line of a function (unless specified to be of void return type), map/filter/reduce would likely not take strings. -- Simen |
October 31, 2010 Re: higher-order functions | ||||
---|---|---|---|---|
| ||||
Posted in reply to Simen kjaeraas | Simen kjaeraas: > They take both, in fact: > > auto cubes = map!((a){ return a*a*a; })(arr); But that needs to be compile-time constant, so if you have several functions, you need to put them inside a typetuple, or duplicate the code. And some idioms are just not possible. You may see it well here regarding the "compose": http://rosettacode.org/wiki/First-class_functions#D import std.stdio, std.typetuple, std.functional; private import std.math; void main() { // wrappers needed as not all built-in functions // have same signature, eg pure/nothrow auto sin = (real x) { return std.math.sin(x); }; auto asin = (real x) { return std.math.asin(x); }; auto cos = (real x) { return std.math.cos(x); }; auto acos = (real x) { return std.math.acos(x); }; auto cube = (real x) { return x ^^ 3; }; auto cbrt = (real x) { return std.math.cbrt(x); }; alias TypeTuple!(sin, cos, cube) dir; alias TypeTuple!(asin, acos, cbrt) inv; foreach (i, f; dir) writefln("%6.3f", compose!(f, inv[i])(0.5)); } Bye, bearophile |
November 01, 2010 Re: How would I optimize this parser? | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | == Quote from bearophile (bearophileHUGS@lycos.com)'s article > T.D.Spenser: > > Unfortunately, it's quite slow. Can anyone point out what might be the issue(s)? > Very good :-) > First suggestions: > - Generally compile your D code using the -w switch (you have missing some overrides and more things) > - Add a main() with some benchmarking examples to your program, so I will have something to optimize for. > - Use the -profile compiler switch on the benchmarking examples to see what's slow. - Use final classes or final methods where possible and keep an eye on memory allocations. Consider using structs in some places instead of classes. > - Keep in mind that ~ and ~= in arrays isn't a very fast operation. In some cases an Appender helps, and in some cases some workaround may be needed. > See you later, > bearophile Thanks for replying. Here is a version with a main function and sample file to parse: http://thoughtdispenser.net/files/parser-test.7z I fixed the warnings, but didn't change anything else yet. Having built-in profiler is nice, I didn't know about it. Although, the files it spits out are quite strange. If I read them right, it seems the parser spends most time constructing nodes for the document's object model. |
November 01, 2010 Re: How would I optimize this parser? | ||||
---|---|---|---|---|
| ||||
Posted in reply to T.D.Spenser | T.D.Spenser: > Here is a version with a main function and sample file to parse: Good. Your code looks good, well formatted and readable. But before you try to optimize code you have write some tests, otherwise you can't be sure you are not breaking the code, it's like driving with closed eyes. D has both unittests and design by contract, add them (it's also a good training in using such important things). I will try to look at your code even without that necessary safety net, but I can't be sure I am not breaking your code, so I hope you will give me some tests&contracts too. > Although, the files it spits out are quite strange. I think they are normal enough for a profiler. GCC produces a not too much different file. Notes on your code: Is this a critique of the D way to tell if an object is of a class? :-) auto childTag = cast(TagNode) child; if (childTag !is null && childTag.getName() == name) { //tricky, since cast is used to check type In D you may have many constructors, so what were you trying to do here? public this(string name, Node[] children...) { //can't have two constructors like in Java Bye, bearophile |
November 01, 2010 Re: How would I optimize this parser? | ||||
---|---|---|---|---|
| ||||
Posted in reply to T.D.Spenser | If you import: import core.memory: GC; And then you disable the GC just before parsing: GC.disable(); The parsing runtime on my PC becomes about one third. Disabling the GC when you have to build a large data structure and re-enabling it after the creation is a normal optimization both in CPython and D. You may also import: import std.c.stdlib: exit; And add this last line to the main: exit(0); to kill garbage collection at the end to remove the final collection time you weren't timing (but was there). Now the total running time is about 0.3 seconds instead of 1.1 seconds. All this means that your code is GC-bound :-) D GC is far worse than the Java one. I will try to optimize other things. Later, bearophile |
Copyright © 1999-2021 by the D Language Foundation