[Issue 7442] New: ctRegex!`\p{Letter}` uses a lot memory in compilation
February 05, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=7442

           Summary: ctRegex!`\p{Letter}` uses a lot memory in compilation
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Keywords: CTFE
          Severity: critical
          Priority: P2
         Component: DMD
        AssignedTo: nobody@puremagic.com
        ReportedBy: kennytm@gmail.com


--- Comment #0 from kennytm@gmail.com 2012-02-05 02:30:37 PST ---
Test case:

------------------------
import std.regex;
enum bug7442 = ctRegex!`\p{Letter}`;
------------------------

Compile with `dmd -c test7442.d`. DMD will very quickly use up all system memory. (I'll try to reduce the test case later.)

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
March 20, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=7442


Jay Norwood <jayn@prismnet.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jayn@prismnet.com


--- Comment #1 from Jay Norwood <jayn@prismnet.com> 2012-03-19 20:11:58 PDT ---
I see what is probably a related issue when compiling either of these functions on Win7:

void wcpx(string fn)
{
    enum ctr =  ctRegex!(r"\w+","g");
}

void wcpx(string fn)
{
    enum ctr =  regex(r"\w+","g");
}

------ Build started: Project: a7, Configuration: Release Win32 ------
Building Release\a7.exe...
Error: out of memory
Building Release\a7.exe failed!

March 25, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=7442


josvanuden@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |josvanuden@gmail.com


--- Comment #2 from josvanuden@gmail.com 2012-03-25 12:18:47 PDT ---
Same here (Win7). Either of these causes an out-of-memory error:

   enum r1 = ctRegex!`\w`;
   enum r2 = ctRegex!`\W`;

April 11, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=7442


Don <clugdbug@yahoo.com.au> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|CTFE                        |
                 CC|                            |clugdbug@yahoo.com.au
          Component|DMD                         |Phobos


--- Comment #3 from Don <clugdbug@yahoo.com.au> 2012-04-11 01:06:51 PDT ---
This is slow because the code makes 15000 concatenations (and I didn't check
what else it does, there may be other performance issues). That would be poor
performance even in runtime code.
The memory usage is exacerbated by CTFE bug 1382 and bug 6498, but basically
this is a Phobos bug.
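
To illustrate the general point about concatenation cost, here is a minimal sketch. It is not the actual std.regex code; the function names and the 15_000 count are only illustrative. Each `s ~ piece` expression allocates a fresh array, while an `Appender` with reserved capacity grows one buffer.

```d
import std.array : appender;

// Hypothetical helper: builds a string by repeated concatenation.
// Every `s ~ "x"` creates a new array and copies the old contents.
string buildByConcat(size_t n)
{
    string s;
    foreach (i; 0 .. n)
        s = s ~ "x";
    return s;
}

// Hypothetical helper: builds the same string through an Appender
// whose capacity is reserved up front.
string buildByAppender(size_t n)
{
    auto app = appender!string();
    app.reserve(n);
    foreach (i; 0 .. n)
        app.put("x");
    return app.data;
}

void main()
{
    // Both approaches produce the same result; only the allocation
    // behaviour differs.
    assert(buildByConcat(15_000) == buildByAppender(15_000));
    assert(buildByAppender(15_000).length == 15_000);
}
```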

April 11, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=7442


Dmitry Olshansky <dmitry.olsh@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmitry.olsh@gmail.com


--- Comment #4 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2012-04-11 03:25:39 PDT ---
15000? Wow. At any rate, I tried hard to make sure the thing never reallocates unless required (but CTFE has no support for assumeSafeAppend), so these should mostly be in-place appends.

Still, I'll recheck whether the run-time version is fast or slow; maybe I missed something.
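
For context, a minimal run-time sketch of what `assumeSafeAppend` does, following the pattern shown in the druntime documentation. The function names are only illustrative, and this says nothing about how CTFE behaves.

```d
// Without assumeSafeAppend: appending to a slice that ends before the
// end of the used data must relocate, so the original array is untouched.
bool appendRelocates()
{
    int[] a = [1, 2, 3, 4];
    int[] b = a[0 .. 3];
    b ~= 5;
    return a.ptr !is b.ptr && a[3] == 4;
}

// With assumeSafeAppend: the runtime is told the data past the slice
// is no longer needed, so the append overwrites it in place.
bool appendInPlace()
{
    int[] a = [1, 2, 3, 4];
    int[] c = a[0 .. 3];
    c.assumeSafeAppend();
    c ~= 5;
    return a.ptr is c.ptr && a[3] == 5;
}

void main()
{
    assert(appendRelocates());
    assert(appendInPlace());
}
```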

April 13, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=7442



--- Comment #5 from Don <clugdbug@yahoo.com.au> 2012-04-13 07:35:32 PDT ---
(In reply to comment #4)
> 15000? Wow. At any rate, I tried hard to make sure the thing never reallocates unless required (but CTFE has no support for assumeSafeAppend), so these should mostly be in-place appends.
> 
> Still, I'll recheck whether the run-time version is fast or slow; maybe I missed something.

By the way, if you go into the compiler source, file interpret.c, line 34 is
#define SHOWPERFORMANCE 0
If you change that 0 to 1, and then recompile, it will tell you some basic CTFE
stats, eg how many assignments it performed.

April 13, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=7442



--- Comment #6 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2012-04-13 11:34:43 PDT ---
(In reply to comment #5)
> 
> By the way, if you go into the compiler source, file interpret.c, line 34 is
> #define SHOWPERFORMANCE 0
> If you change that 0 to 1, and then recompile, it will tell you some basic CTFE
> stats, eg how many assignments it performed.

Cool. I'm going to use this extensively. I might as well reduce ctRegex into some kind of benchmark for CTFE.

April 17, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=7442



--- Comment #7 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2012-04-17 04:26:21 PDT ---
I investigated this further and concluded that there are 2 factors at work.

I removed a few thousand code points from Letter so that it doesn't run out of RAM
outright (see code below).
I also separated the parse step from the build step.

Here are the collected CTFE stats.

parse only:

        ---- CTFE Performance ----
max call depth = 20     max stack = 44
array allocs = 2761     assignments = 430837

build:
        ---- CTFE Performance ----
max call depth = 20     max stack = 73
array allocs = 8264     assignments = 1293421

Parsing creates all the data structures for the Unicode table machinery;
it already takes slightly less than half of all allocs.
Another thing to notice is that it features more allocations per assignment.

Then comes the codegen step, which is CTFE-only and far more alloc-happy.
Frankly, I see no way to reduce all of this alloc fun, because COW
will ruin any attempt to preallocate a buffer for the generated code.
Am I right that arrays do a dup on every write?
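
For reference, the preallocate-and-write pattern in question looks roughly like the sketch below. The function name and sizes are hypothetical. The sketch only shows the pattern being evaluated under CTFE; it does not answer whether the interpreter copies the array on each indexed write.

```d
// Hypothetical helper: allocate the buffer once, then fill it by index
// instead of appending.
char[] fill(size_t n)
{
    auto buf = new char[](n);
    foreach (i; 0 .. n)
        buf[i] = cast(char)('a' + i % 26);
    return buf;
}

// Initializing an enum forces evaluation at compile time (CTFE).
enum compiled = fill(52).idup;

static assert(compiled.length == 52);
static assert(compiled[0 .. 3] == "abc");
static assert(compiled[26] == 'a');

void main()
{
    // The same function also runs at run time and gives the same result.
    assert(fill(52) == compiled);
}
```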

--- test program ---

import std.regex;
void main(){
    version(parse)
        static r = regex(set);
    else //build
        static r = ctRegex!set;
}

enum set =
`[A-Za-z\u00AA\u00B5\u00BA\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02C1\u02C6-\u02D1\u02E0-\u02E4\u02EC\u02EE\u0370-\u0374\u0376\u0377
\u037A-\u037D\u0386\u0388-\u038A\u038C\u038E-\u03A1\u03A3-\u03F5\u03F7-\u0481\u048A-\u0527\u0531-\u0556\u0559\u0561-\u0587\u05D0-\u05EA
\u05F0-\u05F2\u0620-\u064A\u066E\u066F\u0671-\u06D3\u06D5\u06E5\u06E6\u06EE\u06EF\u06FA-\u06FC\u06FF\u0710\u0712-\u072F\u074D-\u07A5
\u07B1\u07CA-\u07EA\u07F4\u07F5\u07FA\u0800-\u0815\u081A\u0824\u0828\u0840-\u0858\u08A0\u08A2-\u08AC\u0904-\u0939\u093D\u0950\u0958-\u0961
\u0971-\u0977\u0979-\u097F\u0985-\u098C\u098F\u0990\u0993-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09BD\u09CE\u09DC\u09DD\u09DF-\u09E1
\u09F0\u09F1\u0A05-\u0A0A\u0A0F\u0A10\u0A13-\u0A28\u0A2A-\u0A30\u0A32\u0A33\u0A35\u0A36\u0A38\u0A39\u0A59-\u0A5C\u0A5E\u0A72-\u0A74
\u0A85-\u0A8D\u0A8F-\u0A91\u0A93-\u0AA8\u0AAA-\u0AB0\u0AB2\u0AB3\u0AB5-\u0AB9\u0ABD\u0AD0\u0AE0\u0AE1\u0B05-\u0B0C\u0B0F\u0B10
\u0B13-\u0B28\u0B2A-\u0B30\u0B32\u0B33\u0B35-\u0B39\u0B3D\u0B5C\u0B5D\u0B5F-\u0B61\u0B71\u0B83\u0B85-\u0B8A\u0B8E-\u0B90
\u0B92-\u0B95\u0B99\u0B9A\u0B9C\u0B9E\u0B9F\u0BA3\u0BA4\u0BA8-\u0BAA\u0BAE-\u0BB9\u0BD0\u0C05-\u0C0C\u0C0E-\u0C10\u0C12-\u0C28
\u0C2A-\u0C33\u0C35-\u0C39\u0C3D\u0C58\u0C59\u0C60\u0C61\u0C85-\u0C8C\u0C8E-\u0C90\u0C92-\u0CA8\u0CAA-\u0CB3\u0CB5-\u0CB9
\u0CBD\u0CDE\u0CE0\u0CE1\u0CF1\u0CF2\u0D05-\u0D0C\u0D0E-\u0D10\u0D12-\u0D3A\u0D3D\u0D4E\u0D60\u0D61\u0D7A-\u0D7F\u0D85-\u0D96
\u0D9A-\u0DB1\u0DB3-\u0DBB\u0DBD\u0DC0-\u0DC6\u0E01-\u0E30\u0E32\u0E33\u0E40-\u0E46\u0E81\u0E82\u0E84\u0E87\u0E88\u0E8A\u0E8D
\u0E94-\u0E97\u0E99-\u0E9F\u0EA1-\u0EA3\u0EA5\u0EA7\u0EAA\u0EAB\u0EAD-\u0EB0\u0EB2\u0EB3\u0EBD\u0EC0-\u0EC4\u0EC6\u0EDC-\u0EDF
\u0F00\u0F40-\u0F47\u0F49-\u0F6C\u0F88-\u0F8C\u1000-\u102A\u103F\u1050-\u1055\u105A-\u105D\u1061\u1065\u1066\u106E-\u1070\u1075-\u1081
\u108E\u10A0-\u10C5\u10C7\u10CD\u10D0-\u10FA\u10FC-\u1248\u124A-\u124D\u1250-\u1256\u1258\u125A-\u125D\u1260-\u1288\u128A-\u128D
\u1290-\u12B0\u12B2-\u12B5\u12B8-\u12BE\u12C0\u12C2-\u12C5\u12C8-\u12D6\u12D8-\u1310\u1312-\u1315\u1318-\u135A\u1380-\u138F\u13A0-\u13F4
\u1401-\u166C\u166F-\u167F\u1681-\u169A\u16A0-\u16EA\u1700-\u170C\u170E-\u1711\u1720-\u1731\u1740-\u1751\u1760-\u176C\u176E-\u1770\u1780-\u17B3
\u17D7\u17DC\u1820-\u1877\u1880-\u18A8\u18AA\u18B0-\u18F5\u1900-\u191C\u1950-\u196D\u1970-\u1974\u1980-\u19AB\u19C1-\u19C7\u1A00-\u1A16\u1A20-\u1A54
\u1AA7\u1B05-\u1B33\u1B45-\u1B4B\u1B83-\u1BA0\u1BAE\u1BAF\u1BBA-\u1BE5\u1C00-\u1C23\u1C4D-\u1C4F\u1C5A-\u1C7D\u1CE9-\u1CEC\u1CEE-\u1CF1\u1CF5\u1CF6
\u1D00-\u1DBF\u1E00-\u1F15\u1F18-\u1F1D\u1F20-\u1F45\u1F48-\u1F4D\u1F50-\u1F57\u1F59\u1F5B\u1F5D\u1F5F-\u1F7D\u1F80-\u1FB4\u1FB6-\u1FBC\u1FBE\u1FC2-\u1FC4
\u1FC6-\u1FCC\u1FD0-\u1FD3\u1FD6-\u1FDB\u1FE0-\u1FEC\u1FF2-\u1FF4\u1FF6-\u1FFC\u2071\u207F\u2090-\u209C\u2102\u2107\u210A-\u2113\u2115\u2119-\u211D\u2124
\u2126\u2128\u212A-\u212D\u212F-\u2139\u213C-\u213F\u2145-\u2149\u214E\u2183\u2184\u2C00-\u2C2E\u2C30-\u2C5E\u2C60-\u2CE4\u2CEB-\u2CEE\u2CF2
\u2CF3\u2D00-\u2D25\u2D27\u2D2D\u2D30-\u2D67\u2D6F\u2D80-\u2D96\u2DA0-\u2DA6\u2DA8-\u2DAE\u2DB0-\u2DB6\u2DB8-\u2DBE\u2DC0-\u2DC6\u2DC8-\u2DCE
\u2DD0-\u2DD6\u2DD8-\u2DDE\u2E2F\u3005\u3006\u3031-\u3035\u303B\u303C\u3041-\u3096\u309D-\u309F\u30A1-\u30FA\u30FC-\u30FF\u3105-\u312D\u3131-\u318E
\u31A0-\u31BA\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FCC\uA000-\uA48C\uA4D0-\uA4FD\uA500-\uA60C\uA610-\uA61F\uA62A\uA62B\uA640-\uA66E\uA67F-\uA697\uA6A0-\uA6E5
\uA717-\uA71F\uA722-\uA788\uA78B-\uA78E\uA790-\uA793\uA7A0-\uA7AA\uA7F8-\uA801\uA803-\uA805\uA807-\uA80A\uA80C-\uA822\uA840-\uA873\uA882-\uA8B3\uA8F2-\uA8F7
\uA8FB\uA90A-\uA925\uA930-\uA946\uA960-\uA97C\uA984-\uA9B2\uA9CF\uAA00-\uAA28\uAA40-\uAA42\uAA44-\uAA4B\uAA60-\uAA76\uAA7A\uAA80-\uAAAF\uAAB1\uAAB5
\uAAB6\uAAB9-\uAABD\uAAC0\uAAC2\uAADB-\uAADD\uAAE0-\uAAEA\uAAF2-\uAAF4\uAB01-\uAB06\uAB09-\uAB0E\uAB11-\uAB16\uAB20-\uAB26\uAB28-\uAB2E\uABC0-\uABE2
\uAC00-\uD7A3\uD7B0-\uD7C6\uD7CB-\uD7FB\uF900-\uFA6D\uFA70-\uFAD9\uFB00-\uFB06\uFB13-\uFB17\uFB1D\uFB1F-\uFB28\uFB2A-\uFB36\uFB38-\uFB3C\uFB3E\uFB40
\uFB41\uFB43\uFB44\uFB46-\uFBB1\uFBD3-\uFD3D\uFD50-\uFD8F\uFD92-\uFDC7\uFDF0-\uFDFB\uFE70-\uFE74\uFE76-\uFEFC\uFF21-\uFF3A\uFF41-\uFF5A\uFF66-\uFFBE
\uFFC2-\uFFC7\uFFCA-\uFFCF\uFFD2-\uFFD7\uFFDA-\uFFDC\U00010000-\U0001000B\U0001000D-\U00010026\U00010028-\U0001003A\U0001003C\U0001003D\U0001003F-\U0001004D
\U00010050-\U0001005D\U00010080-\U000100FA\U00010280-\U0001029C\U000102A0-\U000102D0\U00010300-\U0001031E\U00010330-\U00010340\U00010342-\U00010349\U00010380-\U0001039D
\U000103A0-\U000103C3\U000103C8-\U000103CF\U00010400-\U0001049D\U00010800-\U00010805\U00010808\U0001080A-\U00010835\U00010837\U00010838\U0001083C\U0001083F-\U00010855
\U00010900-\U00010915\U00010920-\U00010939\U00010980-\U000109B7\U000109BE\U000109BF\U00010A00\U00010A10-\U00010A13\U00010A15-\U00010A17\U00010A19-\U00010A33
\U00010A60-\U00010A7C\U00010B00-\U00010B35\U00010B40-\U00010B55\U00010B60-\U00010B72\U00010C00-\U00010C48\U00011003-\U00011037\U00011083-\U000110AF
\U000110D0-\U000110E8\U00011103-\U00011126\U00011183-\U000111B2\U000111C1-\U000111C4\U00011680-\U000116AA\U00012000-\U0001236E\U00013000-\U0001342E\U00016800-\U00016A38
\U00016F00-\U00016F44\U00016F50\U00016F93-\U00016F9F\U0001B000\U0001B001\U0001D400-\U0001D454\U0001D456-\U0001D49C\U0001D49E\U0001D49F\U0001D4A2\U0001D4A5\U0001D4A6\U0001D4A9-\U0001D4AC
\U0001D4AE-\U0001D4B9\U0001D4BB\U0001D4BD-\U0001D4C3\U0001D4C5-\U0001D505\U0001D507-\U0001D50A\U0001D50D-\U0001D514\U0001D516-\U0001D51C\U0001D51E-\U0001D539\U0001D53B-\U0001D53E
\U0001D540-\U0001D544\U0001D546\U0001D54A-\U0001D550\U0001D552-\U0001D6A5\U0001D6A8-\U0001D6C0\U0001D6C2-\U0001D6DA\U0001D6DC-\U0001D6FA\U0001D6FC-\U0001D714
\U0001D716-\U0001D734\U0001D736-\U0001D74E\U0001D750-\U0001D76E\U0001D770-\U0001D788\U0001D78A-\U0001D7A8\U0001D7AA-\U0001D7C2\U0001D7C4-\U0001D7CB
\U0001EE00-\U0001EE03\U0001EE05-\U0001EE1F\U0001EE21\U0001EE22\U0001EE24\U0001EE27\U0001EE29-\U0001EE32\U0001EE34-\U0001EE37\U0001EE39\U0001EE3B\U0001EE42
\U0001EEA1-\U0001EEA3\U0001EEA5-\U0001EEA9\U0001EEAB-\U0001EEBB\U00020000-\U0002A6D6\U0002A700-\U0002B734\U0002B740-\U0002B81D\U0002F800-\U0002FA1D]`;

April 17, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=7442



--- Comment #8 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2012-04-17 04:35:31 PDT ---
Created an attachment (id=1096)
Benchmark unicode Trie

The benchmark runs the core part of the parse step with huge character classes. Currently it chokes on 2 iterations. Unless it can swallow at least 5, CTFE _parsing_ is almost unusable.

April 19, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=7442


Dmitry Olshansky <dmitry.olsh@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |thornik@gmail.com


--- Comment #9 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2012-04-19 08:53:00 PDT ---
*** Issue 7928 has been marked as a duplicate of this issue. ***
