Thread overview | |||||||
---|---|---|---|---|---|---|---|
|
March 13, 2020 Associative Array potential performance pitfall? | ||||
---|---|---|---|---|
| ||||
In my machine the following D code compiled with release flag and LDC performs over 230ms while the similar Go code performs under 120ms. string smallestRepr(const string arg) { import std.format : format; const repeated = format!"%s%s"(arg, arg); string result; result.reserve(arg.length); result = arg; foreach (i; 1 .. arg.length) { const slice = repeated[i .. i + arg.length]; if (slice < result) result = slice; } return result; } unittest { assert("cba".smallestRepr() == "acb"); } void main(const string[] args) { string[][const string] wordTable; import std.stdio : writeln, lines, File; foreach (string word; File(args[1]).lines()) { word = word[0 .. $ - 1]; // strip the newline character const string key = word.smallestRepr(); if (key in wordTable) wordTable[key] ~= word; else wordTable[key] = [word]; } foreach (key; wordTable.keys) { if (wordTable[key].length == 4) { writeln(wordTable[key]); break; } } } --- package main import ( "bufio" "fmt" "os" ) func canonicalize(s string) string { c := s + s[:len(s)-1] best := s for i := 1; i < len(s); i++ { if c[i:i+len(s)] < best { best = c[i : i+len(s)] } } return best } func main() { seen := make(map[string][]string) inputFile, err := os.Open(os.Args[1]) defer inputFile.Close() if err != nil { os.Exit(1) } scanner := bufio.NewScanner(inputFile) for scanner.Scan() { word := scanner.Text() n := canonicalize(word) seen[n] = append(seen[n], word) } for _, words := range seen { if len(words) == 4 { fmt.Printf("%v\n", words) break } } } The input file : https://raw.githubusercontent.com/dolph/dictionary/master/enable1.txthttps://raw.githubusercontent.com/dolph/dictionary/master/enable1.txt Problem description: https://www.reddit.com/r/dailyprogrammer/comments/ffxabb/20200309_challenge_383_easy_necklace_matching Where am I losing performance? |
March 13, 2020 Re: Associative Array potential performance pitfall? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Adnan | On Fri, Mar 13, 2020 at 03:40:11AM +0000, Adnan via Digitalmars-d-learn wrote: > In my machine the following D code compiled with release flag and LDC performs over 230ms while the similar Go code performs under 120ms. > > string smallestRepr(const string arg) { > import std.format : format; > > const repeated = format!"%s%s"(arg, arg); .format is dog slow, and not exactly something you want to be calling every iteration. Try instead: auto repeated = arg ~ arg; > string result; > result.reserve(arg.length); > result = arg; This does not do what you think it does. Assigning to a string is a shallow copy; the call to .reserve is superfluous. In fact, it's triggering another allocation, only to throw it away immediately. Just write: auto result = arg; > foreach (i; 1 .. arg.length) { > const slice = repeated[i .. i + arg.length]; > if (slice < result) > result = slice; > } > return result; > } [...] > Where am I losing performance? 1) Calling .format -- it's dog slow. Just use ~. 2) Calling .reserve only to throw away the results immediately. Just skip it. I doubt your AA is the performance bottleneck here. Try the following implementation of .smallestRepr that avoids (1) and (2): string smallestRepr(const string arg) { auto repeated = arg ~ arg; string result = arg; foreach (i; 1 .. arg.length) { const slice = repeated[i .. i + arg.length]; if (slice < result) result = slice; } return result; } Note that `arg ~ arg` may allocate, but it also may not if the current buffer for `arg` is big enough to accomodate both. So you might actually save on some allocations here, that you lose out from calling .format. On my machine, a quick benchmark shows a 2x performance boost. T -- That's not a bug; that's a feature! |
March 13, 2020 Re: Associative Array potential performance pitfall? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Adnan | On Friday, 13 March 2020 at 03:40:11 UTC, Adnan wrote:
> Where am I losing performance?
It is nonsensical to say without measuring. Humans are notoriously bad at predicting performance issues. Wrap all your hardworking code into a loop with like 100 iterations, compile and run:
$ perf record -g ./app file.txt && perf report
Measure dmd debug build first, then ldc2 -release -O.
Besides what H. S. Teoh said, these things help:
- disable GC before doing the job and enable it afterwards
- use .byLineCopy instead of lines(), it's faster
- use .byKeyValue when printing the results, it's lazy and returns both key and value
90ms after vs 215ms before
|
March 13, 2020 Re: Associative Array potential performance pitfall? | ||||
---|---|---|---|---|
| ||||
Posted in reply to H. S. Teoh | On 3/13/20 5:24 AM, H. S. Teoh wrote:
> Note that `arg ~ arg` may allocate, but it also may not if the current
> buffer for `arg` is big enough to accomodate both.
That always allocates. Only appending may avoid allocation:
arg ~= arg;
But, I would instead use ranges if possible to avoid all allocations.
-Steve
|
March 13, 2020 Re: Associative Array potential performance pitfall? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On Fri, Mar 13, 2020 at 09:30:16AM -0400, Steven Schveighoffer via Digitalmars-d-learn wrote: > On 3/13/20 5:24 AM, H. S. Teoh wrote: > > Note that `arg ~ arg` may allocate, but it also may not if the current buffer for `arg` is big enough to accomodate both. > > That always allocates. Only appending may avoid allocation: > > arg ~= arg; Ah I see. Mea culpa. > But, I would instead use ranges if possible to avoid all allocations. [...] Initially I thought about avoiding all allocations, but I realized that no matter what, you need to allocate the key for the AA anyway, so I didn't go that route. That said, though, we could probably reduce the number of allocations by various means, such as having the function return char[] instead of string, and calling .idup on-demand as opposed to allocating up-front. This would allow us to reuse a static buffer for `repeated` instead of incurring an allocation each time. Given what the algorithm is doing, this should save some number of allocations if done correctly. T -- Creativity is not an excuse for sloppiness. |
Copyright © 1999-2021 by the D Language Foundation