Jump to page: 1 24  
Page
Thread overview
Replacing tango.text.Ascii.isearch
Oct 05, 2022
torhu
Oct 05, 2022
torhu
Oct 05, 2022
torhu
Oct 05, 2022
torhu
Oct 05, 2022
Ali Çehreli
Oct 05, 2022
torhu
Oct 06, 2022
Siarhei Siamashka
Oct 06, 2022
Sergey
Oct 06, 2022
rassoc
Oct 06, 2022
torhu
Oct 07, 2022
rassoc
Oct 07, 2022
Siarhei Siamashka
Oct 07, 2022
Siarhei Siamashka
Oct 07, 2022
bachmeier
Oct 07, 2022
Siarhei Siamashka
Oct 08, 2022
rassoc
Oct 09, 2022
Siarhei Siamashka
Oct 09, 2022
rassoc
Oct 13, 2022
bauss
Oct 13, 2022
rikki cattermole
Oct 13, 2022
bauss
Oct 13, 2022
bauss
Oct 13, 2022
rikki cattermole
Oct 13, 2022
bauss
Oct 13, 2022
rikki cattermole
Oct 13, 2022
Patrick Schluter
Oct 25, 2022
Siarhei Siamashka
Oct 25, 2022
rikki cattermole
Oct 26, 2022
Siarhei Siamashka
Oct 26, 2022
rikki cattermole
Oct 26, 2022
Siarhei Siamashka
Oct 26, 2022
rikki cattermole
Oct 26, 2022
Ali Çehreli
Oct 28, 2022
Siarhei Siamashka
Oct 28, 2022
rikki cattermole
October 05, 2022

I need a case-insensitive check to see if a string contains another string for a "quick filter" feature. It should preferrably be perceived as instant by the user, and needs to check a few thousand strings in typical cases. Is a regex the best option, or what would you suggest?

October 05, 2022

On 10/5/22 12:59 PM, torhu wrote:

>

I need a case-insensitive check to see if a string contains another string for a "quick filter" feature. It should preferrably be perceived as instant by the user, and needs to check a few thousand strings in typical cases. Is a regex the best option, or what would you suggest?

https://dlang.org/phobos/std_uni.html#asLowerCase

bool isearch(S1, S2)(S1 haystack, S2 needle)
{
    import std.uni;
    import std.algorithm;
    return haystack.asLowerCase.canFind(needle.asLowerCase);
}

untested.

-Steve

October 05, 2022

On Wednesday, 5 October 2022 at 17:29:25 UTC, Steven Schveighoffer wrote:
[...]

I wanted to do some quick benchmarking to figure out what works.

When I run this:

import std.stdio;
import std.datetime.stopwatch;

void main()
{
	auto sw = StopWatch();	
	sw.stop();
	writeln(sw.peek().toString());
}

It prints this:

>

2 weeks, 6 days, 9 hours, 34 minutes, 43 secs, 214 ms, 946 ╬╝s, and 7 hnsecs

Am I doing something wrong here?

October 05, 2022

On Wednesday, 5 October 2022 at 20:40:46 UTC, torhu wrote:

>

Am I doing something wrong here?

Right, you can instantiate structs without arguments. It's been ten years since I last used D, I was thinking of structs like if they were classes.

October 05, 2022

On Wednesday, 5 October 2022 at 20:45:55 UTC, torhu wrote:

>

On Wednesday, 5 October 2022 at 20:40:46 UTC, torhu wrote:

>

Am I doing something wrong here?

Right, you can instantiate structs without arguments. It's been ten years since I last used D, I was thinking of structs like if they were classes.

I think there should be sensible default here, seems like an easy trap to remove.

October 05, 2022
On 10/5/22 13:40, torhu wrote:

>      auto sw = StopWatch();

Either this:

    auto sw = StopWatch(AutoStart.yes);

or this:

    auto sw = StopWatch();
    sw.start();

Ali


October 05, 2022

On Wednesday, 5 October 2022 at 17:29:25 UTC, Steven Schveighoffer wrote:

>
bool isearch(S1, S2)(S1 haystack, S2 needle)
{
    import std.uni;
    import std.algorithm;
    return haystack.asLowerCase.canFind(needle.asLowerCase);
}

untested.

-Steve

I did some basic testing, and regex was two orders of magnitude faster. So now I know, I guess.

October 06, 2022

On Wednesday, 5 October 2022 at 21:50:32 UTC, torhu wrote:

>

I did some basic testing, and regex was two orders of magnitude faster. So now I know, I guess.

Substring search functionality is currently in a very bad shape in Phobos. I discovered this myself a few weeks ago when I was trying to solve the https://www.facebook.com/codingcompetitions/hacker-cup/2022/round-1/problems/A2 puzzle using D language. A part of the solution requires a fast substring search (actually a subarray search) and Phobos doesn't offer anything with even remotely acceptable performance.

Phobos does have a Boyer-Moore implementation. This algorithm is historically famous in computer science as one of the first attempts to optimize substring search, but it also has pathologically bad performance on certain input data and probably shouldn't be recommended for any practical use nowadays. The users of old versions of Python discovered this too: https://codeforces.com/blog/entry/106849?#comment-952483

The standard 'find' function from the following simple example also becomes awfully slow when arrays 'a' and 'b' are large and/or are adversarially constructed:

import std;
void main() {
  auto a = [1, 2, 3, 4];
  auto b = [2, 3];
  writefln("Is %s a subarray of %s? The answer is %s.",
           b, a, a.find(b).empty ? "no" : "yes");
}

I think that the best fit for your use case is the https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm and there's an old issue in bugzilla about this: https://issues.dlang.org/show_bug.cgi?id=16066

BTW, if anyone is curious, one of the possible solutions for the hacker-cup/2022/round-1/problems/A2 puzzle in D language can be found here: https://codeforces.com/blog/entry/106768?#comment-952808

October 06, 2022

On Thursday, 6 October 2022 at 08:15:10 UTC, Siarhei Siamashka wrote:

>

On Wednesday, 5 October 2022 at 21:50:32 UTC, torhu wrote:

Please don’t tell us that D will be slower than Python again?)

October 06, 2022
On 10/5/22 23:50, torhu via Digitalmars-d-learn wrote:
> I did some basic testing, and regex was two orders of magnitude faster. So now I know, I guess.

And what kind of testing was that? Mind to share? Because I did the following real quick and wasn't able to measure a "two orders of magnitude" difference. Sure, the regex version came on top, but they were both faster than the ruby baseline I cooked up.

First, generate a word file with 100k entries of various lengths:

$> dmd -run words.d foobaz 100000
---
import std;

string randomWord(ulong n) {
    static chars = letters.array;
    return generate!(() => chars.choice).take(n).text;
}

void main(string[] args) {
    enforce(args.length == 3, "Usage: dmd -run words.d needle num");

    auto f = File("words.txt", "w");
    foreach (i; 0..args[2].to!ulong) {
        ulong n = uniform(0, 50), m = uniform(0, 50);
        if (i % 2 == 0)
            f.writeln(randomWord(n), args[1], randomWord(m));
        else
            f.writeln(randomWord(n + m));
    }
}
---

And then for the actual measuring:

$> dmd -O -version={range,regex} -of=search-{range,regex} search.d
$> ldc -O -d-version={range,regex} -of=search-{range,regex}-ldc search.d
$> time ./search-{range,regex}{,-ldc} foobaz
---
import std;

void main(string[] args) {
    enforce(args.length == 2, "Usage: search 'needle'");

    version (regex)
        auto rx = regex(args[1], "i");
    else version (range)
        auto needle = args[1].asLowerCase.text;
    else
        static assert(0, "use -version={regex,range}");

    ulong matches;
    foreach (line; File("words.txt").byLine) {
        version (regex)
            if (line.matchFirst(rx))
                matches++;
        version (range)
            if (line.asLowerCase.canFind(needle))
                matches++;
    }
    writeln(matches);
}
---
« First   ‹ Prev
1 2 3 4