View mode: basic / threaded / horizontal-split · Log in · Help
February 05, 2012
[Issue 7442] New: ctRegex!`\p{Letter}` uses a lot memory in compilation
http://d.puremagic.com/issues/show_bug.cgi?id=7442

          Summary: ctRegex!`\p{Letter}` uses a lot memory in compilation
          Product: D
          Version: D2
         Platform: All
       OS/Version: All
           Status: NEW
         Keywords: CTFE
         Severity: critical
         Priority: P2
        Component: DMD
       AssignedTo: nobody@puremagic.com
       ReportedBy: kennytm@gmail.com


--- Comment #0 from kennytm@gmail.com 2012-02-05 02:30:37 PST ---
Test case:

------------------------
import std.regex;
enum bug7442 = ctRegex!`\p{Letter}`;
------------------------

Compile with `dmd -c test7442.d`. DMD will very quickly use up all system
memory. (I'll try to reduce the test case later.)

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
March 20, 2012
[Issue 7442] ctRegex!`\p{Letter}` uses a lot memory in compilation
http://d.puremagic.com/issues/show_bug.cgi?id=7442


Jay Norwood <jayn@prismnet.com> changed:

          What    |Removed                     |Added
----------------------------------------------------------------------------
                CC|                            |jayn@prismnet.com


--- Comment #1 from Jay Norwood <jayn@prismnet.com> 2012-03-19 20:11:58 PDT ---
I see probably a related issue when compiling either of these expressions in
win7

void wcpx(string fn)
{
   enum ctr =  ctRegex!(r"\w+","g");
}

void wcpx(string fn)
{
   enum ctr =  regex(r"\w+","g");
}

------ Build started: Project: a7, Configuration: Release Win32 ------
Building Release\a7.exe...
Error: out of memory
Building Release\a7.exe failed!

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
March 25, 2012
[Issue 7442] ctRegex!`\p{Letter}` uses a lot memory in compilation
http://d.puremagic.com/issues/show_bug.cgi?id=7442


josvanuden@gmail.com changed:

          What    |Removed                     |Added
----------------------------------------------------------------------------
                CC|                            |josvanuden@gmail.com


--- Comment #2 from josvanuden@gmail.com 2012-03-25 12:18:47 PDT ---
Same here (win7). Either of these cause out of memory error:

  enum r1 = ctRegex!`\w`;
  enum r2 = ctRegex!`\W`;

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 11, 2012
[Issue 7442] ctRegex!`\p{Letter}` uses a lot memory in compilation
http://d.puremagic.com/issues/show_bug.cgi?id=7442


Don <clugdbug@yahoo.com.au> changed:

          What    |Removed                     |Added
----------------------------------------------------------------------------
          Keywords|CTFE                        |
                CC|                            |clugdbug@yahoo.com.au
         Component|DMD                         |Phobos


--- Comment #3 from Don <clugdbug@yahoo.com.au> 2012-04-11 01:06:51 PDT ---
This is slow because the code makes 15000 concatenations (and I didn't check
what else it does, there may be other performance issues). That would be poor
performance even in runtime code.
The memory usage is exacerbated by CTFE bug 1382 and bug 6498, but basically
this is a Phobos bug.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 11, 2012
[Issue 7442] ctRegex!`\p{Letter}` uses a lot memory in compilation
http://d.puremagic.com/issues/show_bug.cgi?id=7442


Dmitry Olshansky <dmitry.olsh@gmail.com> changed:

          What    |Removed                     |Added
----------------------------------------------------------------------------
                CC|                            |dmitry.olsh@gmail.com


--- Comment #4 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2012-04-11 03:25:39 PDT ---
15000 ? Wow. At any rate I tried hard to make it so that the thing never
reallocates unless required (but CTFE has no support for assumeSafeAppend) so
these should mostly be an in-place appends.

Still I'll recheck if the R-T version is fast/slow maybe I missed something.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 13, 2012
[Issue 7442] ctRegex!`\p{Letter}` uses a lot memory in compilation
http://d.puremagic.com/issues/show_bug.cgi?id=7442



--- Comment #5 from Don <clugdbug@yahoo.com.au> 2012-04-13 07:35:32 PDT ---
(In reply to comment #4)
> 15000 ? Wow. At any rate I tried hard to make it so that the thing never
> reallocates unless required (but CTFE has no support for assumeSafeAppend) so
> these should mostly be an in-place appends.
> 
> Still I'll recheck if the R-T version is fast/slow maybe I missed something.

By the way, if you go into the compiler source, file interpret.c, line 34 is
#define SHOWPERFORMANCE 0
If you change that 0 to 1, and then recompile, it will tell you some basic CTFE
stats, eg how many assignments it performed.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 13, 2012
[Issue 7442] ctRegex!`\p{Letter}` uses a lot memory in compilation
http://d.puremagic.com/issues/show_bug.cgi?id=7442



--- Comment #6 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2012-04-13 11:34:43 PDT ---
(In reply to comment #5)
> 
> By the way, if you go into the compiler source, file interpret.c, line 34 is
> #define SHOWPERFORMANCE 0
> If you change that 0 to 1, and then recompile, it will tell you some basic CTFE
> stats, eg how many assignments it performed.

Cool. I'm going to use this extensively. Might as well reduce ctRegex as some
kind of benchmark for CTFE.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 17, 2012
[Issue 7442] ctRegex!`\p{Letter}` uses a lot memory in compilation
http://d.puremagic.com/issues/show_bug.cgi?id=7442



--- Comment #7 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2012-04-17 04:26:21 PDT ---
I investigated this further and conclude that there are 2 factors at work.

I removed few thousands of codepoints from Letter, so it doesn't run out of RAM
outright.(see code below)
Also separated parse from build steps.

Here are collected stats on CTFE.

parse only:

       ---- CTFE Performance ----
max call depth = 20     max stack = 44
array allocs = 2761     assignments = 430837

build:
       ---- CTFE Performance ----
max call depth = 20     max stack = 73
array allocs = 8264     assignments = 1293421

Parsing creates all the datastructures for unicode table machinery
it takes slightly less then half of all allocs already.
Another thing to notice is it fetures higher allocations per assigment.

Then comes the codegen step and it's CTFE only and far more alloc happy. 
Frankly I see no way to reduce all of this alloc fun because of COW
that will ruin any attempt to preallocate buffer for generated code. 
Am I right that arrays do dup on every write?

--- test program ---

import std.regex;
void main(){
   version(parse)
       static r = regex(set);
   else //build
       static r = ctRegex!set;
}

enum set =
`[A-Za-z\u00AA\u00B5\u00BA\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02C1\u02C6-\u02D1\u02E0-\u02E4\u02EC\u02EE\u0370-\u0374\u0376\u0377
\u037A-\u037D\u0386\u0388-\u038A\u038C\u038E-\u03A1\u03A3-\u03F5\u03F7-\u0481\u048A-\u0527\u0531-\u0556\u0559\u0561-\u0587\u05D0-\u05EA
\u05F0-\u05F2\u0620-\u064A\u066E\u066F\u0671-\u06D3\u06D5\u06E5\u06E6\u06EE\u06EF\u06FA-\u06FC\u06FF\u0710\u0712-\u072F\u074D-\u07A5
\u07B1\u07CA-\u07EA\u07F4\u07F5\u07FA\u0800-\u0815\u081A\u0824\u0828\u0840-\u0858\u08A0\u08A2-\u08AC\u0904-\u0939\u093D\u0950\u0958-\u0961
\u0971-\u0977\u0979-\u097F\u0985-\u098C\u098F\u0990\u0993-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09BD\u09CE\u09DC\u09DD\u09DF-\u09E1
\u09F0\u09F1\u0A05-\u0A0A\u0A0F\u0A10\u0A13-\u0A28\u0A2A-\u0A30\u0A32\u0A33\u0A35\u0A36\u0A38\u0A39\u0A59-\u0A5C\u0A5E\u0A72-\u0A74
\u0A85-\u0A8D\u0A8F-\u0A91\u0A93-\u0AA8\u0AAA-\u0AB0\u0AB2\u0AB3\u0AB5-\u0AB9\u0ABD\u0AD0\u0AE0\u0AE1\u0B05-\u0B0C\u0B0F\u0B10
\u0B13-\u0B28\u0B2A-\u0B30\u0B32\u0B33\u0B35-\u0B39\u0B3D\u0B5C\u0B5D\u0B5F-\u0B61\u0B71\u0B83\u0B85-\u0B8A\u0B8E-\u0B90
\u0B92-\u0B95\u0B99\u0B9A\u0B9C\u0B9E\u0B9F\u0BA3\u0BA4\u0BA8-\u0BAA\u0BAE-\u0BB9\u0BD0\u0C05-\u0C0C\u0C0E-\u0C10\u0C12-\u0C28
\u0C2A-\u0C33\u0C35-\u0C39\u0C3D\u0C58\u0C59\u0C60\u0C61\u0C85-\u0C8C\u0C8E-\u0C90\u0C92-\u0CA8\u0CAA-\u0CB3\u0CB5-\u0CB9
\u0CBD\u0CDE\u0CE0\u0CE1\u0CF1\u0CF2\u0D05-\u0D0C\u0D0E-\u0D10\u0D12-\u0D3A\u0D3D\u0D4E\u0D60\u0D61\u0D7A-\u0D7F\u0D85-\u0D96
\u0D9A-\u0DB1\u0DB3-\u0DBB\u0DBD\u0DC0-\u0DC6\u0E01-\u0E30\u0E32\u0E33\u0E40-\u0E46\u0E81\u0E82\u0E84\u0E87\u0E88\u0E8A\u0E8D
\u0E94-\u0E97\u0E99-\u0E9F\u0EA1-\u0EA3\u0EA5\u0EA7\u0EAA\u0EAB\u0EAD-\u0EB0\u0EB2\u0EB3\u0EBD\u0EC0-\u0EC4\u0EC6\u0EDC-\u0EDF
\u0F00\u0F40-\u0F47\u0F49-\u0F6C\u0F88-\u0F8C\u1000-\u102A\u103F\u1050-\u1055\u105A-\u105D\u1061\u1065\u1066\u106E-\u1070\u1075-\u1081
\u108E\u10A0-\u10C5\u10C7\u10CD\u10D0-\u10FA\u10FC-\u1248\u124A-\u124D\u1250-\u1256\u1258\u125A-\u125D\u1260-\u1288\u128A-\u128D
\u1290-\u12B0\u12B2-\u12B5\u12B8-\u12BE\u12C0\u12C2-\u12C5\u12C8-\u12D6\u12D8-\u1310\u1312-\u1315\u1318-\u135A\u1380-\u138F\u13A0-\u13F4
\u1401-\u166C\u166F-\u167F\u1681-\u169A\u16A0-\u16EA\u1700-\u170C\u170E-\u1711\u1720-\u1731\u1740-\u1751\u1760-\u176C\u176E-\u1770\u1780-\u17B3
\u17D7\u17DC\u1820-\u1877\u1880-\u18A8\u18AA\u18B0-\u18F5\u1900-\u191C\u1950-\u196D\u1970-\u1974\u1980-\u19AB\u19C1-\u19C7\u1A00-\u1A16\u1A20-\u1A54
\u1AA7\u1B05-\u1B33\u1B45-\u1B4B\u1B83-\u1BA0\u1BAE\u1BAF\u1BBA-\u1BE5\u1C00-\u1C23\u1C4D-\u1C4F\u1C5A-\u1C7D\u1CE9-\u1CEC\u1CEE-\u1CF1\u1CF5\u1CF6
\u1D00-\u1DBF\u1E00-\u1F15\u1F18-\u1F1D\u1F20-\u1F45\u1F48-\u1F4D\u1F50-\u1F57\u1F59\u1F5B\u1F5D\u1F5F-\u1F7D\u1F80-\u1FB4\u1FB6-\u1FBC\u1FBE\u1FC2-\u1FC4
\u1FC6-\u1FCC\u1FD0-\u1FD3\u1FD6-\u1FDB\u1FE0-\u1FEC\u1FF2-\u1FF4\u1FF6-\u1FFC\u2071\u207F\u2090-\u209C\u2102\u2107\u210A-\u2113\u2115\u2119-\u211D\u2124
\u2126\u2128\u212A-\u212D\u212F-\u2139\u213C-\u213F\u2145-\u2149\u214E\u2183\u2184\u2C00-\u2C2E\u2C30-\u2C5E\u2C60-\u2CE4\u2CEB-\u2CEE\u2CF2
\u2CF3\u2D00-\u2D25\u2D27\u2D2D\u2D30-\u2D67\u2D6F\u2D80-\u2D96\u2DA0-\u2DA6\u2DA8-\u2DAE\u2DB0-\u2DB6\u2DB8-\u2DBE\u2DC0-\u2DC6\u2DC8-\u2DCE
\u2DD0-\u2DD6\u2DD8-\u2DDE\u2E2F\u3005\u3006\u3031-\u3035\u303B\u303C\u3041-\u3096\u309D-\u309F\u30A1-\u30FA\u30FC-\u30FF\u3105-\u312D\u3131-\u318E
\u31A0-\u31BA\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FCC\uA000-\uA48C\uA4D0-\uA4FD\uA500-\uA60C\uA610-\uA61F\uA62A\uA62B\uA640-\uA66E\uA67F-\uA697\uA6A0-\uA6E5
\uA717-\uA71F\uA722-\uA788\uA78B-\uA78E\uA790-\uA793\uA7A0-\uA7AA\uA7F8-\uA801\uA803-\uA805\uA807-\uA80A\uA80C-\uA822\uA840-\uA873\uA882-\uA8B3\uA8F2-\uA8F7
\uA8FB\uA90A-\uA925\uA930-\uA946\uA960-\uA97C\uA984-\uA9B2\uA9CF\uAA00-\uAA28\uAA40-\uAA42\uAA44-\uAA4B\uAA60-\uAA76\uAA7A\uAA80-\uAAAF\uAAB1\uAAB5
\uAAB6\uAAB9-\uAABD\uAAC0\uAAC2\uAADB-\uAADD\uAAE0-\uAAEA\uAAF2-\uAAF4\uAB01-\uAB06\uAB09-\uAB0E\uAB11-\uAB16\uAB20-\uAB26\uAB28-\uAB2E\uABC0-\uABE2
\uAC00-\uD7A3\uD7B0-\uD7C6\uD7CB-\uD7FB\uF900-\uFA6D\uFA70-\uFAD9\uFB00-\uFB06\uFB13-\uFB17\uFB1D\uFB1F-\uFB28\uFB2A-\uFB36\uFB38-\uFB3C\uFB3E\uFB40
\uFB41\uFB43\uFB44\uFB46-\uFBB1\uFBD3-\uFD3D\uFD50-\uFD8F\uFD92-\uFDC7\uFDF0-\uFDFB\uFE70-\uFE74\uFE76-\uFEFC\uFF21-\uFF3A\uFF41-\uFF5A\uFF66-\uFFBE
\uFFC2-\uFFC7\uFFCA-\uFFCF\uFFD2-\uFFD7\uFFDA-\uFFDC\U00010000-\U0001000B\U0001000D-\U00010026\U00010028-\U0001003A\U0001003C\U0001003D\U0001003F-\U0001004D
\U00010050-\U0001005D\U00010080-\U000100FA\U00010280-\U0001029C\U000102A0-\U000102D0\U00010300-\U0001031E\U00010330-\U00010340\U00010342-\U00010349\U00010380-\U0001039D
\U000103A0-\U000103C3\U000103C8-\U000103CF\U00010400-\U0001049D\U00010800-\U00010805\U00010808\U0001080A-\U00010835\U00010837\U00010838\U0001083C\U0001083F-\U00010855
\U00010900-\U00010915\U00010920-\U00010939\U00010980-\U000109B7\U000109BE\U000109BF\U00010A00\U00010A10-\U00010A13\U00010A15-\U00010A17\U00010A19-\U00010A33
\U00010A60-\U00010A7C\U00010B00-\U00010B35\U00010B40-\U00010B55\U00010B60-\U00010B72\U00010C00-\U00010C48\U00011003-\U00011037\U00011083-\U000110AF
\U000110D0-\U000110E8\U00011103-\U00011126\U00011183-\U000111B2\U000111C1-\U000111C4\U00011680-\U000116AA\U00012000-\U0001236E\U00013000-\U0001342E\U00016800-\U00016A38
\U00016F00-\U00016F44\U00016F50\U00016F93-\U00016F9F\U0001B000\U0001B001\U0001D400-\U0001D454\U0001D456-\U0001D49C\U0001D49E\U0001D49F\U0001D4A2\U0001D4A5\U0001D4A6\U0001D4A9-\U0001D4AC
\U0001D4AE-\U0001D4B9\U0001D4BB\U0001D4BD-\U0001D4C3\U0001D4C5-\U0001D505\U0001D507-\U0001D50A\U0001D50D-\U0001D514\U0001D516-\U0001D51C\U0001D51E-\U0001D539\U0001D53B-\U0001D53E
\U0001D540-\U0001D544\U0001D546\U0001D54A-\U0001D550\U0001D552-\U0001D6A5\U0001D6A8-\U0001D6C0\U0001D6C2-\U0001D6DA\U0001D6DC-\U0001D6FA\U0001D6FC-\U0001D714
\U0001D716-\U0001D734\U0001D736-\U0001D74E\U0001D750-\U0001D76E\U0001D770-\U0001D788\U0001D78A-\U0001D7A8\U0001D7AA-\U0001D7C2\U0001D7C4-\U0001D7CB
\U0001EE00-\U0001EE03\U0001EE05-\U0001EE1F\U0001EE21\U0001EE22\U0001EE24\U0001EE27\U0001EE29-\U0001EE32\U0001EE34-\U0001EE37\U0001EE39\U0001EE3B\U0001EE42
\U0001EEA1-\U0001EEA3\U0001EEA5-\U0001EEA9\U0001EEAB-\U0001EEBB\U00020000-\U0002A6D6\U0002A700-\U0002B734\U0002B740-\U0002B81D\U0002F800-\U0002FA1D]`;

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 17, 2012
[Issue 7442] ctRegex!`\p{Letter}` uses a lot memory in compilation
http://d.puremagic.com/issues/show_bug.cgi?id=7442



--- Comment #8 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2012-04-17 04:35:31 PDT ---
Created an attachment (id=1096)
Benchmark unicode Trie

Benchmark runs core part of parse step with huge character classes. 
Currently it chokes on 2 iterations. 
Unless it can swallow at least 5 CTFE _parsing_ is almost unusable.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 19, 2012
[Issue 7442] ctRegex!`\p{Letter}` uses a lot memory in compilation
http://d.puremagic.com/issues/show_bug.cgi?id=7442


Dmitry Olshansky <dmitry.olsh@gmail.com> changed:

          What    |Removed                     |Added
----------------------------------------------------------------------------
                CC|                            |thornik@gmail.com


--- Comment #9 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2012-04-19 08:53:00 PDT ---
*** Issue 7928 has been marked as a duplicate of this issue. ***

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
« First   ‹ Prev
1 2
Top | Discussion index | About this forum | D home