January 17

On Wednesday, 17 January 2024 at 10:43:22 UTC, Renato wrote:
> On Wednesday, 17 January 2024 at 10:24:31 UTC, Renato wrote:
> > It's not Java writing the file, it's the bash script benchmark.sh:
> >
> > java -cp "build/util" util.GeneratePhoneNumbers 1000 > phones_1000.txt
>
> Perhaps using this option when running Java will help:
>
> java -Dfile.encoding=UTF-8 ...

I've used a PowerShell env var to set the output to UTF-8. The D version now works, but the Java one doesn't.

java -Xms20M -Xmx100M -cp build/java Main print words-quarter.txt phones_1000.txt
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 65485 out of bounds for length 10
        at Trie.completeSolution(Main.java:216)
        at Trie.forEachSolution(Main.java:192)
        at PhoneNumberEncoder.encode(Main.java:132)
        at Main.lambda$main$1(Main.java:38)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
        at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
        at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
        at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1939)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
        at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
        at Main.main(Main.java:38)
January 17

On Wednesday, 17 January 2024 at 10:50:26 UTC, evilrat wrote:
> On Wednesday, 17 January 2024 at 10:43:22 UTC, Renato wrote:
> > On Wednesday, 17 January 2024 at 10:24:31 UTC, Renato wrote:
> > > It's not Java writing the file, it's the bash script benchmark.sh:
> > >
> > > java -cp "build/util" util.GeneratePhoneNumbers 1000 > phones_1000.txt
> >
> > Perhaps using this option when running Java will help:
> >
> > java -Dfile.encoding=UTF-8 ...
>
> I've used a PowerShell env var to set the output to UTF-8. The D version now works, but the Java one doesn't.
>
> java -Xms20M -Xmx100M -cp build/java Main print words-quarter.txt phones_1000.txt
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 65485 out of bounds for length 10
>         at Trie.completeSolution(Main.java:216)
>         at Trie.forEachSolution(Main.java:192)
>         at PhoneNumberEncoder.encode(Main.java:132)
>         at Main.lambda$main$1(Main.java:38)
>         [...]

This is this line:

var digit = chars[ index ] - 48;

That means the input file is still not ASCII (or UTF-8) as it should be. Java is reading files with the ASCII encoding so it should've worked fine.
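To illustrate where an index like 65485 can come from: 65485 + 48 = 65533, which is U+FFFD, the replacement character Java substitutes for bytes it cannot decode. Below is a minimal sketch, assuming (the thread does not confirm this) that the file was written as UTF-16LE with a byte order mark. The class name and the `decodeAscii` and `digitIndex` helpers are made up for this demo and are not taken from Main.java.

```java
import java.nio.charset.StandardCharsets;

class ReplacementCharDemo {
    // Decode raw bytes the way an ASCII reader would; undecodable bytes become U+FFFD.
    static char[] decodeAscii(byte[] raw) {
        return new String(raw, StandardCharsets.US_ASCII).toCharArray();
    }

    // Same arithmetic as the line from the thread: chars[index] - 48.
    static int digitIndex(char c) {
        return c - 48;
    }

    public static void main(String[] args) {
        // Assumed input: "12" encoded as UTF-16LE with a byte order mark (0xFF 0xFE).
        byte[] raw = {(byte) 0xFF, (byte) 0xFE, '1', 0, '2', 0};
        char[] chars = decodeAscii(raw);
        System.out.println((int) chars[0]);        // 65533, i.e. U+FFFD
        System.out.println(digitIndex(chars[0]));  // 65485, the index from the exception
        System.out.println(digitIndex('7'));       // 7, what a real ASCII digit gives
    }
}
```

If the first character of a line prints as 65533, the file contains bytes that are not ASCII, whatever the actual cause turns out to be.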

January 17

On Wednesday, 17 January 2024 at 11:20:14 UTC, Renato wrote:
> That means the input file is still not ASCII (or UTF-8) as it should be. Java is reading files with the ASCII encoding so it should've worked fine.

It seems that it only works with ASCII encoding though.

January 17

On Wednesday, 17 January 2024 at 11:56:19 UTC, evilrat wrote:
> On Wednesday, 17 January 2024 at 11:20:14 UTC, Renato wrote:
> > That means the input file is still not ASCII (or UTF-8) as it should be. Java is reading files with the ASCII encoding so it should've worked fine.
>
> It seems that it only works with ASCII encoding though.

Yes, that's according to the rules - only ASCII for everything.

January 17
On Wed, Jan 17, 2024 at 07:19:39AM +0000, Renato via Digitalmars-d-learn wrote:
> On Tuesday, 16 January 2024 at 22:13:55 UTC, H. S. Teoh wrote:
> > used for the recursive calls. Getting rid of the .format ought to speed it up a bit. Will try that now...
> > 
> 
> That will make no difference for the `count` option which is where your solution was very slow.

Of course it will. Passing the data directly to the callback that bumps a counter is faster than allocating a new string, formatting the data, and then passing it to the callback that bumps a counter.  It may not look like much, but avoiding unnecessary GC allocations means the GC will have less work to do later when a collection is run, thus you save time over the long term.
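
The code under discussion is D, but the same idea can be sketched in Java, the other language used in this thread. This is only an illustration under assumptions: `forEachSolution`, `countSolutions` and `formatSolution` are hypothetical names, and the list of solutions stands in for whatever the real solver produces. The point is that the counting path never builds a string, while the printing path formats only when it actually prints.

```java
import java.util.List;
import java.util.function.BiConsumer;

class CallbackCountDemo {
    // Hypothetical stand-in for the solver: hands each solution to the callback as-is.
    static void forEachSolution(String phone, List<List<String>> solutions,
                                BiConsumer<String, List<String>> callback) {
        for (List<String> words : solutions) {
            callback.accept(phone, words);
        }
    }

    // Counting path: the callback only bumps a counter, no string is allocated.
    static long countSolutions(String phone, List<List<String>> solutions) {
        long[] count = {0};
        forEachSolution(phone, solutions, (p, words) -> count[0]++);
        return count[0];
    }

    // Printing path: formatting happens here, only when output is wanted.
    static String formatSolution(String phone, List<String> words) {
        return phone + ": " + String.join(" ", words);
    }

    public static void main(String[] args) {
        List<List<String>> solutions = List.of(List.of("mir", "Tor"), List.of("Mix", "Tor"));
        System.out.println(countSolutions("5624-82", solutions));
        forEachSolution("5624-82", solutions,
                (p, words) -> System.out.println(formatSolution(p, words)));
    }
}
```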


> To run the slow test manually use the `words_quarter.txt` dictionary (the phone numbers file doesn't matter much - it's all in the dictionary).
> 
> But pls run the benchmarks yourself as I am not going to keep running it for you, and would be nice if you posted your solution on a Gist for example, pasting lots of code in the forum makes it difficult to follow.

I'll push the code to github.


T

-- 
"No, John.  I want formats that are actually useful, rather than over-featured megaliths that address all questions by piling on ridiculous internal links in forms which are hideously over-complex." -- Simon St. Laurent on xml-dev
January 17
On Wed, Jan 17, 2024 at 07:19:39AM +0000, Renato via Digitalmars-d-learn wrote: [...]
> But pls run the benchmarks yourself as I am not going to keep running it for you, and would be nice if you posted your solution on a Gist for example, pasting lots of code in the forum makes it difficult to follow.

I can't. I spent half an hour trying to get ./benchmark.sh to run, but no matter what it could not compile benchmark_runner. It complains that my rustc is too old and some dependencies do not support it. I tried running the suggested cargo update command to pin the versions but none of them worked.  Since I'm not a Rust user, I'm not feeling particularly motivated right now to spend any more time on this.  Upgrading my rustc isn't really an option because that's the version currently in my distro and I really don't feel like spending more time to install a custom version of rustc just for this benchmark.


T

-- 
Today's society is one of specialization: as you grow, you learn more and more about less and less. Eventually, you know everything about nothing.
January 17

On Wednesday, 17 January 2024 at 16:30:08 UTC, H. S. Teoh wrote:
> On Wed, Jan 17, 2024 at 07:19:39AM +0000, Renato via Digitalmars-d-learn wrote: [...]
> > But pls run the benchmarks yourself as I am not going to keep running it for you, and would be nice if you posted your solution on a Gist for example, pasting lots of code in the forum makes it difficult to follow.
>
> I can't. I spent half an hour trying to get ./benchmark.sh to run, but no matter what it could not compile benchmark_runner. It complains that my rustc is too old and some dependencies do not support it. I tried running the suggested cargo update command to pin the versions but none of them worked. Since I'm not a Rust user, I'm not feeling particularly motivated right now to spend any more time on this. Upgrading my rustc isn't really an option because that's the version currently in my distro and I really don't feel like spending more time to install a custom version of rustc just for this benchmark.
>
> T

I've just updated the Rust version so the benchmark monitor could work on Linux (it only worked on Mac before) :D That's probably why your rustc didn't work, though as the project is still using edition2018 I would've thought even a very old compiler would have worked... Anyway, if you ever find yourself actually using Rust, you should use rustup (https://rustup.rs/), which makes it trivial to update Rust.

About the "count" option: I had assumed you didn't call format on the count option as it's never needed, there's nothing to print.

January 17
On Wed, Jan 17, 2024 at 07:57:02AM -0800, H. S. Teoh via Digitalmars-d-learn wrote: [...]
> I'll push the code to github.
[...]

Here: https://github.com/quickfur/prechelt/blob/master/encode_phone.d


T

-- 
Why do conspiracy theories always come from the same people??
January 18

On Wednesday, 17 January 2024 at 16:54:00 UTC, H. S. Teoh wrote:
> On Wed, Jan 17, 2024 at 07:57:02AM -0800, H. S. Teoh via Digitalmars-d-learn wrote: [...]
> > I'll push the code to github.
> [...]
>
> Here: https://github.com/quickfur/prechelt/blob/master/encode_phone.d
>
> T

Ok, last time I'm running this for someone else :D

Proc,Run,Memory(bytes),Time(ms)
===> ./rust
./rust,23920640,30
./rust,24018944,147
./rust,24068096,592
./rust,24150016,1187
./rust,7766016,4972
./rust,8011776,46101
===> src/d/dencoder
src/d/dencoder,44154880,42
src/d/dencoder,51347456,87
src/d/dencoder,51380224,273
src/d/dencoder,51462144,441
src/d/dencoder,18644992,4414
src/d/dencoder,18710528,43548

Congratulations on beating Rust :D but remember: you're using a much more efficient algorithm! I must conclude that the Rust translation of the Trie algorithm would be much faster still, unfortunately (you may have noticed that I am on D's side here!).

January 18

On Wednesday, 17 January 2024 at 16:54:00 UTC, H. S. Teoh wrote:
> On Wed, Jan 17, 2024 at 07:57:02AM -0800, H. S. Teoh via Digitalmars-d-learn wrote: [...]
> > I'll push the code to github.
> [...]
>
> Here: https://github.com/quickfur/prechelt/blob/master/encode_phone.d
>
> T

BTW here's your main function so it can run on the benchmark:

int main(string[] args)
{
    bool countOnly = args.length > 1 ? (() {
        final switch (args[1])
        {
        case "count":
            return true;
        case "print":
            return false;
        }
    })() : false;

    auto dictfile = args.length > 2 ? args[2] : "tests/words.txt";
    auto input = args.length > 3 ? args[3] : "tests/numbers.txt";

    Trie dict = loadDictionary(File(dictfile).byLine);

    if (countOnly)
    {
        size_t count;
        encodePhoneNumbers(File(input).byLine, dict, (phone, match) { count++; });
        writefln("%d", count);
    }
    else
    {
        encodePhoneNumbers(File(input).byLine, dict, (phone, match) {
            writefln("%s: %s", phone, match);
        });
    }

    return 0;
}