Jump to page: 1 25  
Page
Thread overview
Why is std.regex slow, well here is one reason!
Feb 23, 2023
Walter Bright
Feb 23, 2023
Walter Bright
Feb 24, 2023
Max Samukha
Feb 24, 2023
Walter Bright
Feb 24, 2023
Max Samukha
Feb 24, 2023
Walter Bright
Feb 24, 2023
Adam D Ruppe
Feb 24, 2023
Andrea Fontana
Feb 25, 2023
Herbie Melbourne
Mar 02, 2023
Walter Bright
Mar 03, 2023
Kagamin
Mar 02, 2023
Dmitry Olshansky
Mar 02, 2023
Dmitry Olshansky
Mar 02, 2023
Walter Bright
Mar 03, 2023
Dmitry Olshansky
Mar 03, 2023
GrimMaple
Mar 03, 2023
Patrick Schluter
Mar 04, 2023
Max Samukha
Feb 24, 2023
H. S. Teoh
Feb 24, 2023
Walter Bright
Feb 25, 2023
Patrick Schluter
Feb 25, 2023
Herbie Melbourne
Feb 27, 2023
FeepingCreature
Feb 25, 2023
Dmitry Olshansky
Mar 03, 2023
Robert Schadek
Mar 03, 2023
Dmitry Olshansky
Mar 03, 2023
Robert Schadek
Feb 23, 2023
Johan
Feb 24, 2023
Johan
Feb 24, 2023
Ali Çehreli
Feb 24, 2023
Johan
Feb 25, 2023
Johan
Mar 01, 2023
Johan
February 23, 2023

As well all know std.regex slows down our builds even if all you're doing is importing it.

So on Discord we were chatting and I got annoyed about it enough to look into it (which as we all know is a good way to make me do something about it).

To start off with lets do some base timings with dmd.

Here is my test module, disable the regex call as required.

import std.regex;

void main() {
    auto r = regex(`[a-z]`); // remove me
}

To compile this its 2.2s, to compile it without the regex call its 1.2s.

Okay that's quite a big jump but at least we're using it. Now on to modifying std.regex or should I say std.uni.

That's right, we will be modifying std.uni not std.regex!

All we need to do is add -version=std_uni_bootstrap to our call to dmd to get this working and apply the changes at the end of this post.

Now the times are 1.2s and 0.9s.

Why is turning on bootstrap version in std.uni decreasing compile times so significantly? This is almost certainly because of the Unicode tables being compressed. std.regex is triggering decompression and bringing a whole pile of logic that wouldn't be required otherwise. Which costs an awful lot CPU and ram during CTFE. newCTFE anyone?

If you want to repeat, you'll need the below changes to std.uni (just add at bottom of file).

public:
version(std_uni_bootstrap) {
    int icmp(S1, S2)(S1 r1, S2 r2) { return 0;}
    dchar toLower()(dchar c) { return c; }
    dchar toUpper()(dchar c) { return c; }
    void toLowerInPlace(C)(ref C[] s){}
    void toUpperInPlace(C)(ref C[] s){}
    size_t graphemeStride(C)(const scope C[] input, size_t index) {return 0;}
    bool isGraphical()(dchar c) { return false;}
    struct unicode {
        static @property auto opDispatch(string name)() {
            return CodepointSet.init;
        }

        static CodepointSet parseSet(Range)(ref Range range, bool casefold=false) {
            return CodepointSet.init;
        }

        static CodepointSet parsePropertySpec(Range)(ref Range p,
        bool negated, bool casefold) {
         return CodepointSet.init;
        }
        static dchar parseControlCode(Parser)(ref Parser p) {
        return 0;
        }
    }
    alias Escapables = AliasSeq!('[', ']', '\\', '^', '$', '.', '|', '?', ',', '-',
    ';', ':', '#', '&', '%', '/', '<', '>', '`',  '*', '+', '(', ')', '{', '}',  '~');

    struct Stack(T) {
    @safe:
    T[] data;
    @property bool empty(){ return data.empty; }

    @property size_t length(){ return data.length; }

    void push(T val){ data ~= val;  }

    @trusted T pop()
    {
        assert(!empty);
        auto val = data[$ - 1];
        data = data[0 .. $ - 1];
        if (!__ctfe)
            cast(void) data.assumeSafeAppend();
        return val;
    }

    @property ref T top()
    {
        assert(!empty);
        return data[$ - 1];
    }
    }

    bool isAlpha()(dchar c) {return false;}
    CodepointSet wordCharacter()() { return CodepointSet.init;}
    dchar parseUniHex(Range)(ref Range str, size_t maxDigit) {
        return 0;
    }
    auto simpleCaseFoldings()(dchar ch) {
            static struct Range
    {
    @safe pure nothrow:
        uint idx; //if == uint.max, then read c.
        union
        {
            dchar c; // == 0 - empty range
            uint len;
        }
        @property bool isSmall() const { return idx == uint.max; }

        this(dchar ch)
        {
            idx = uint.max;
            c = ch;
        }

        this(uint start, uint size)
        {
            idx = start;
            len = size;
        }

        @property dchar front() const
        {
            return 0;
        }

        @property bool empty() const
        {
            if (isSmall)
            {
                return c == 0;
            }
            return len == 0;
        }

        @property size_t length() const
        {
            if (isSmall)
            {
                return c == 0 ? 0 : 1;
            }
            return len;
        }

        void popFront()
        {
            if (isSmall)
                c = 0;
            else
            {
                idx++;
                len--;
            }
        }
    }
    return Range.init;
    }
}
February 23, 2023
> std.regex is triggering decompression and bringing a whole pile of logic that wouldn't be required otherwise

This is good detective work. At minimum, please file a bugzilla issue with your analysis.

How about making the decompression code lazy?
February 24, 2023
On 24/02/2023 9:26 AM, Walter Bright wrote:
>  > std.regex is triggering decompression and bringing a whole pile of logic that wouldn't be required otherwise
> 
> This is good detective work. At minimum, please file a bugzilla issue with your analysis.

Sounds good, its just not the best idea to do it when you should be asleep ;)

> How about making the decompression code lazy?

It should already be lazy. Its just the wrong type of lazy.

Everything about std.uni and its tables is about tradeoffs. It is designed to be opt-in and to be small in binary. If you didn't care about binary sizes it would be easy enough to have it all in ROM ready to go, but it'll be over 8mb if you did that (mine is).

On that note, I recently looked at Unicode symbols for identifiers; we can shrink the is alpha LUT in dmd to ~1/9th its current size by updating to C11 :)

Unicode keeps growing, which is good for compilers, but horrible for standard libraries!
February 23, 2023

On Thursday, 23 February 2023 at 17:06:30 UTC, Richard (Rikki) Andrew Cattermole wrote:

>

As well all know std.regex slows down our builds even if all you're doing is importing it.

So on Discord we were chatting and I got annoyed about it enough to look into it (which as we all know is a good way to make me do something about it).

To start off with lets do some base timings with dmd.

Here is my test module, disable the regex call as required.

import std.regex;

void main() {
    auto r = regex(`[a-z]`); // remove me
}

To compile this its 2.2s, to compile it without the regex call its 1.2s.

Can you try compiling this with LDC's --ftime-trace?

On my machine, it takes much shorter than 2.2s, and std.uni.unicode.parseSet!(Parser!(string, CodeGen)).parseSet (the only big std.uni piece) takes about 1/6th of the total semantic analysis time (-o-).

I'm curious to hear whether --ftime-trace would have helped you for this or not :)

Thanks!
Cheers,
Johan

February 23, 2023
On 2/23/2023 12:50 PM, Richard (Rikki) Andrew Cattermole wrote:
> Everything about std.uni and its tables is about tradeoffs. It is designed to be opt-in and to be small in binary. If you didn't care about binary sizes it would be easy enough to have it all in ROM ready to go, but it'll be over 8mb if you did that (mine is).

Another way is to generate the tables into a separate file when Phobos is built, and import that file.


> On that note, I recently looked at Unicode symbols for identifiers; we can shrink the is alpha LUT in dmd to ~1/9th its current size by updating to C11 :)

Let's do it!


> Unicode keeps growing, which is good for compilers, but horrible for standard libraries!

Unicode is a brilliant idea, but its doom comes from the execrable decision to apply semantic meaning to glyphs.
February 24, 2023
On Thursday, 23 February 2023 at 23:11:56 UTC, Walter Bright wrote:

>
> Unicode is a brilliant idea, but its doom comes from the execrable decision to apply semantic meaning to glyphs.

Unicode did not start that. For example, all Cyrillic encodings encode Latin А, K, H, etc. differently than the similarly looking Cyrillic counterparts. Whether that decision was execrable is highly debatable.
February 24, 2023
On 24/02/2023 12:11 PM, Walter Bright wrote:
> On 2/23/2023 12:50 PM, Richard (Rikki) Andrew Cattermole wrote:
>> Everything about std.uni and its tables is about tradeoffs. It is designed to be opt-in and to be small in binary. If you didn't care about binary sizes it would be easy enough to have it all in ROM ready to go, but it'll be over 8mb if you did that (mine is).
> 
> Another way is to generate the tables into a separate file when Phobos is built, and import that file.

We already do this. But instead we just commit the generated files, which is a lot better than not having to modify them by hand... *shudder*

>> On that note, I recently looked at Unicode symbols for identifiers; we can shrink the is alpha LUT in dmd to ~1/9th its current size by updating to C11 :)
> 
> Let's do it!

I probably should've looked at C23 draft spec to see how they were doing it before saying something like this. Because they are not doing the ranges anymore. Its a lot more complicated with TR31.

https://open-std.org/JTC1/SC22/WG14/www/docs/n3054.pdf

This will be in the realm of a rewrite I think and certainly DIP territory (guess I'm on it for the export/symbol/shared library DIP).

That DIP keeps getting larger and larger... with same scope, who knew those innocent looking symbols, all in their tables could be so complicated!
February 24, 2023
I'm going to be totally honest, I have no idea how to use that information.

Its not in a format that is easy to figure out.

What I would want is stuff like this:

```
module
|- function
| |- initialize template module thing @~200ms
| | |- ran CTFE on thingie @~150us
```

Give me that, and this type of hunting would be a cake walk I think.

Seeing what triggers a template to instantiate or CTFE is just as important as knowing how long it took.
February 24, 2023

On Friday, 24 February 2023 at 10:52:37 UTC, Richard (Rikki) Andrew Cattermole wrote:

>

I'm going to be totally honest, I have no idea how to use that information.

Its not in a format that is easy to figure out.

The format is chromium's: https://www.chromium.org/developers/how-tos/trace-event-profiling-tool/

If you use the Chrome browser, go to about:tracing.
See here for some screenshots of what it looks like: https://aras-p.info/blog/2019/01/16/time-trace-timeline-flame-chart-profiler-for-Clang/

I have not yet seen a tool that converts it into ASCII as you propose, might be a nice project for someone to work on :)

(the chromium time trace format is used by other tools too, notably Clang, so the ASCII tool would be appreciated by more than just LDC users)

cheers,
Johan

February 25, 2023
Okay I have found something that can be improved very easily!

std.regex.internal.parser:

Inside of Parser struct:

```d
    @trusted void error(string msg)
    {
        import std.array : appender;
        import std.format.write : formattedWrite;
        auto app = appender!string();

        app ~= msg;
        app ~= "\nPattern with error: `";
        app ~= origin[0..$-pat.length];
        app ~= "` <--HERE-- `";
        app ~= pat;
        app ~= "`";

        throw new RegexException(app.data);
    }
```

That'll cut out ~100ms by removing formattedWrite!


Oooo ``static immutable CharMatcher matcher = CharMatcher(wordCharacter);`` is causing 541ms of slowness. And that line isn't showing up in the profile, so that could be improved, we need to also have CTFE initialization of say globals in it.

Next big jump for the above:

```d
@property auto wordMatcher()()
{
    return CharMatcher(unicode.Alphabetic | unicode.Mn | unicode.Mc
        | unicode.Me | unicode.Nd | unicode.Pc);
}
```

Add some pure annotations to CharMatcher and BitTable constructors.

These two things take out a good 700ms!

Looks like constructors are not showing up at all. KickStart from std.regex.internal.kickstart is not showing up for postprocess from std.regex.internal.parser. Not that we can do anything there just by simply removing stuff (text call shows up but it doesn't benefit much).

Okay looks like I'm at the 62ms mark. There is certainly more things to do but its starting to get into premature optimize territory individually, I'll do a PR for the above sets of changes.
« First   ‹ Prev
1 2 3 4 5