November 02, 2022
On Thu, Nov 03, 2022 at 12:34:11AM +1300, rikki cattermole via Digitalmars-d wrote:
> On 03/11/2022 12:20 AM, Hipreme wrote:
> > The greatest bug on std.regex is it being too slow to compile, do you have any idea on what it could be right now? Are you looking for fixes or an entire rework on it?
> 
> A feature that is known to have been useless is ctRegex, that needs to be deprecated. Perhaps that'll help things once removed?

While ctRegex probably should be removed, I don't think that's the problem.  Even when you don't use ctRegex, using regex() alone slows down compile times by 2-3 seconds.  I think it may be the excessive use of nested templates / CTFE deep inside std.regex's internal implementation.  I'm not sure if this can be fixed without rewriting from scratch (which we don't want to do -- that would be too big of an effort), but perhaps some careful profiling of the compiler might help pinpoint the most egregious parts of the code that could be improved.
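
For reference, a minimal sketch of the two entry points being compared, assuming a made-up pattern and input chosen only for illustration. `regex()` builds the matcher at run time, while `ctRegex` generates it at compile time via CTFE. The sketch shows usage only and makes no claim about how much compile time either one costs.

```d
import std.regex : regex, ctRegex, matchFirst;

void main()
{
    // Pattern is parsed and compiled when the program runs.
    auto rt = regex(`(\d+)-(\d+)`);

    // Pattern is parsed and turned into D code during compilation (CTFE).
    auto ct = ctRegex!(`(\d+)-(\d+)`);

    // Both objects are used the same way by the matching functions.
    auto m1 = matchFirst("10-20", rt);
    auto m2 = matchFirst("10-20", ct);

    assert(m1[1] == "10" && m1[2] == "20");
    assert(m2[1] == "10" && m2[2] == "20");
}
```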


T

-- 
Designer clothes: how to cover less by paying more.
November 02, 2022

On Wednesday, 2 November 2022 at 11:20:52 UTC, Hipreme wrote:
> On Wednesday, 24 August 2022 at 18:20:53 UTC, Dmitry Olshansky wrote:
>> Time flies by and my work on D's std library halted a long time ago, mostly due to personal health issues.
>>
>> Since lots of people ask what they can do to help push the D language forward, I thought one great way is to take on the responsibility for std modules that have lost their maintainers.
>>
>> In particular, I am willing to guide a volunteer into the low-level pits of std.regex and std.uni and hopefully let him or her continue the work I once envisioned for them, or maybe choose a different track of evolution altogether. Anyhow, I'm willing to spend the time to transfer the knowledge so that at minimum there is someone more active than me to hold the line. std.regex is a 2011 product with all of the language bugs and quirks of that time; std.uni is from 2012 and in pretty much the same position.
>>
>> Anyway, reply to this message or mail me
>>
>> dmitry at olshansky dot me
>>
>> --
>> Dmitry Olshansky
>
> The greatest bug on std.regex is it being too slow to compile, do you have any idea on what it could be right now?

Basically this most likely has to do with static immutable tables initialized at compile time, which invokes heavy CTFE. Lazy initialization could be an option. Again, someone has to look into it to be certain.
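
A minimal sketch of the two approaches, assuming a hypothetical `buildTable` function that stands in for whatever std.regex actually computes (the names `eagerTable` and `lazyTable` are likewise made up for illustration). The first form forces the compiler to run the builder through CTFE on every build; the second defers the work to first use at run time.

```d
// Hypothetical table builder, not taken from std.regex.
uint[] buildTable() pure nothrow @safe
{
    auto t = new uint[](256);
    foreach (i, ref v; t)
        v = cast(uint)(i * i);
    return t;
}

// Eager: the initializer of a module-level immutable must be known at
// compile time, so the compiler evaluates buildTable() via CTFE.
static immutable uint[] eagerTable = buildTable();

// Lazy: nothing is evaluated during compilation; the table is built
// the first time it is requested. The cache is thread-local, so each
// thread builds its own copy once.
const(uint)[] lazyTable()
{
    static uint[] cache;
    if (cache is null)
        cache = buildTable();
    return cache;
}

void main()
{
    assert(eagerTable[3] == 9);
    assert(lazyTable()[3] == 9);
    // A second call returns the same cached array.
    assert(lazyTable() is lazyTable());
}
```

The trade-off is a small run-time cost on first access (and per-thread duplication in this simple form) in exchange for removing the CTFE work from every compilation.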

> Are you looking for fixes or an entire rework on it?

What would be the point in the entire rework? It took about 4 months to write it from scratch, plus the bug fixes found after that. If anything I had a plan to support JITing regexes which should work faster than CTFE static regex codegen plus avoiding the compile-time penalty. Even that doesn’t require an entire rework, in fact it would nicely fit into what is there.

November 02, 2022
On Wednesday, 2 November 2022 at 15:32:56 UTC, H. S. Teoh wrote:
> On Thu, Nov 03, 2022 at 12:34:11AM +1300, rikki cattermole via Digitalmars-d wrote:
>> On 03/11/2022 12:20 AM, Hipreme wrote:
>> > The greatest bug on std.regex is it being too slow to compile, do you have any idea on what it could be right now? Are you looking for fixes or an entire rework on it?
>> 
>> A feature that is known to have been useless is ctRegex, that needs to be deprecated.

I guess CTFEing big tables didn’t work since it’s been 10 years and we are exactly where it started - a proof of concept that is incredibly slow to compile with minor speed gains at run-time.

>> Perhaps that'll help things once removed?
>
> While ctRegex probably should be removed, I don't think that's the problem.  Even when you don't use ctRegex, using regex() alone slows down compile times by 2-3 seconds.  I think it may be the excessive use of nested templates / CTFE deep inside std.regex's internal implementation.

Regex is fairly simple in its use of templates - the whole thing is templated by Char which is hardly a big problem considering that Phobos is made of templates.

> I'm not sure if this can be fixed without rewriting from scratch (which we don't want to do -- that would be too big of an effort), but perhaps some careful profiling of the compiler might help pinpoint the most egregious parts of the code that could be improved.

I do not think one needs to go that deep; just look for immutable globals, since that’s where the CTFE is, which is a synonym for slow.
>
> T


November 02, 2022
On Wednesday, 2 November 2022 at 18:57:27 UTC, Dmitry Olshansky wrote:
> On Wednesday, 2 November 2022 at 15:32:56 UTC, H. S. Teoh wrote:
>> On Thu, Nov 03, 2022 at 12:34:11AM +1300, rikki cattermole via Digitalmars-d wrote:
>>> On 03/11/2022 12:20 AM, Hipreme wrote:
>>
>> I'm not sure if this can be fixed without rewriting from scratch (which we don't want to do -- that would be too big of an effort), but perhaps some careful profiling of the compiler might help pinpoint the most egregious parts of the code that could be improved.
>
> I do not think one needs to go that deep; just look for immutable globals, since that’s where the CTFE is, which is a synonym for slow.

Like this thingie:
https://github.com/dlang/phobos/blob/bf3ff35b8f1d40cb70a7584a563dc731a2c3ddad/std/regex/internal/ir.d#L52


November 03, 2022
On 03/11/2022 7:46 AM, Dmitry Olshansky wrote:
>> Are you looking for fixes or an entire rework on it?
> 
> What would be the point in the entire rework? It took about 4 months to write it from scratch, plus the bug fixes found after that. If anything I had a plan to support JITing regexes which should work faster than CTFE static regex codegen plus avoiding the compile-time penalty. Even that doesn’t require an entire rework, in fact it would nicely fit into what is there.

One of the libraries I listed from day one that ImportC had to support is sljit.

They have their own regex implementation which of course is JIT'd.

It would be a good candidate to be included in Phobos.

https://github.com/zherczeg/sljit
November 03, 2022
On Thursday, 3 November 2022 at 05:39:17 UTC, rikki cattermole wrote:
> On 03/11/2022 7:46 AM, Dmitry Olshansky wrote:
>>> Are you looking for fixes or an entire rework on it?
>> 
>> What would be the point in the entire rework? It took about 4 months to write it from scratch, plus the bug fixes found after that. If anything I had a plan to support JITing regexes which should work faster than CTFE static regex codegen plus avoiding the compile-time penalty. Even that doesn’t require an entire rework, in fact it would nicely fit into what is there.
>
> One of the libraries I listed from day one that ImportC had to support is sljit.
>
> They have their own regex implementation which of course is JIT'd.
>
> It would be a good candidate to be included in Phobos.
>
> https://github.com/zherczeg/sljit

Having a JIT in Phobos would be fantastic. On the other hand, if doing it in std is not a requirement, a regex dub package that depends on e.g. this JIT library should work as well.

November 03, 2022
On Thursday, 3 November 2022 at 05:39:17 UTC, rikki cattermole wrote:
> On 03/11/2022 7:46 AM, Dmitry Olshansky wrote:
>>> Are you looking for fixes or an entire rework on it?
>> 
>> What would be the point in the entire rework? It took about 4 months to write it from scratch, plus the bug fixes found after that. If anything I had a plan to support JITing regexes which should work faster than CTFE static regex codegen plus avoiding the compile-time penalty. Even that doesn’t require an entire rework, in fact it would nicely fit into what is there.
>
> One of the libraries I listed from day one that ImportC had to support is sljit.
>
> They have their own regex implementation which of course is JIT'd.
>
> It would be a good candidate to be included in Phobos.
>
> https://github.com/zherczeg/sljit

Uses inline assembly so it’s pretty unlikely.
November 04, 2022
On 04/11/2022 5:58 AM, Dave P. wrote:
> Uses inline assembly so it’s pretty unlikely.

We could upstream switches that disable it and use our own implementation in its place.

It shouldn't be a problem.