July 16, 2010 [Issue 4474] New: Safer stdin.byLine()

http://d.puremagic.com/issues/show_bug.cgi?id=4474

Summary: Safer stdin.byLine()
Product: D
Version: D2
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Phobos
AssignedTo: nobody@puremagic.com
ReportedBy: bearophile_hugs@eml.cc

--- Comment #0 from bearophile_hugs@eml.cc 2010-07-16 16:21:32 PDT ---

This relates to pages 16-17 of The D Programming Language, which explain stdin.byLine() and the 'rather hard to find' bugs that can result from not duplicating the input data.

If I use D to write 20-line scripts I really don't want to have to remember to dup everything (in D1 code I sometimes end up dupping too much, to be on the safe side).

So I suggest a different API for line reading:

- stdin.byLineMutable() (or a similar name, longer than "byLine", that makes it clear it doesn't copy): the current behaviour, which avoids a memory allocation for each line read. This is faster but less safe.
- stdin.byLine(): allocates a new string for each line. This is safer, as in Python (Python also uses heuristics to speed up this method as much as possible, because this is often a very common and performance-critical operation in scripts).

D's general design policy is that unsafe but faster things must be asked for explicitly, and that the defaults must be less bug-prone.

If I write a small D script I can use byLine(), hoping to avoid some bugs. If profiling later shows that it's too slow, I can replace byLine() with the other method and carefully optimize the code, removing some heap allocations.

(An alternative design strategy is to keep just the byLine() method but give it an optional argument, like stdin.byLine(bool copy=true) or stdin.byLine(bool COPY=true)(), that by default copies the line with a new memory allocation.)

-- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
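The copying behaviour proposed above can be approximated on top of the existing API. What follows is only a minimal sketch, not the Phobos implementation: `copiedLines` is a hypothetical helper name, and the sketch assumes that one allocation per line (via `.idup`) is an acceptable cost.

```d
import std.algorithm : map;
import std.array : array;
import std.stdio;

// Hypothetical helper (not part of Phobos): wraps byLine() so that every
// line is copied into a freshly allocated immutable string.
auto copiedLines(File f)
{
    return f.byLine().map!(line => line.idup);
}

void main()
{
    // Safe to keep: each element owns its own memory, so reading the
    // next line cannot overwrite an earlier one.
    string[] lines = copiedLines(stdin).array;
    foreach (i, line; lines)
        writeln(i + 1, ": ", line);
}
```

Code that only inspects each line once and never stores it could keep calling `byLine()` directly and skip the allocation.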
July 17, 2010 [Issue 4474] Safer stdin.byLine()

Posted in reply to bearophile_hugs@eml.cc

http://d.puremagic.com/issues/show_bug.cgi?id=4474

Andrei Alexandrescu <andrei@metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrei@metalanguage.com

--- Comment #1 from Andrei Alexandrescu <andrei@metalanguage.com> 2010-07-17 08:00:52 PDT ---

byLine is safe.
July 17, 2010 [Issue 4474] Better stdin.byLine()

Posted in reply to bearophile_hugs@eml.cc

http://d.puremagic.com/issues/show_bug.cgi?id=4474

--- Comment #2 from bearophile_hugs@eml.cc 2010-07-17 08:29:20 PDT ---

OK, I have changed the title to "Better" instead of "Safer".
July 17, 2010 [Issue 4474] Better stdin.byLine()

Posted in reply to bearophile_hugs@eml.cc

http://d.puremagic.com/issues/show_bug.cgi?id=4474

--- Comment #3 from bearophile_hugs@eml.cc 2010-07-17 09:10:49 PDT ---

This is a small test program (dmd v2.047):

    import std.string, std.stdio;
    void main() {
        int[string] aa;
        foreach (line; stdin.byLine())
            foreach (word; line.split())
                aa[word]++;
        foreach (word, freq; aa)
            writeln(freq, " ", word);
    }

Running it with itself as input data:

    test.exe < test.d

Prints:

    1 eln(fr
    1 q, " ", wo
    1 writeln
    1 }
    1 "
    1 }
    1 }
    1 writeln
    2 wri
    1 wri
    1 ", word); ))
    1 , w
    1 q, " ", word);
    1 eln(fr
    1 q, "
    1 freq,
    1 ",
    1 eln(freq, "
    1 writeln(fr
    1 word);
    1 writeln(freq,
    1 fre
    1 e

This shows that byLine() is bug-prone (unsafe): the keys stored in the associative array are slices of a line buffer that is overwritten as later lines are read.

While this program:

    import std.string, std.stdio;
    void main() {
        int[string] aa;
        foreach (line; stdin.byLine())
            foreach (word; line.split())
                aa[word.dup]++;
        foreach (word, freq; aa)
            writeln(freq, " ", word);
    }

Prints a more correct output:

    1 (word,
    1 std.stdio;
    1 int[string]
    1 }
    1 "
    1 void
    1 import
    3 foreach
    1 main()
    1 aa)
    1 line.split())
    1 stdin.byLine())
    1 (line;
    1 freq;
    1 (word;
    1 ",
    1 std.string,
    1 word);
    1 writeln(freq,
    1 aa[word.dup]++;
    1 aa;
    1 {

It's easy to forget dupping/idupping.
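The corrupted keys in the first run are an aliasing effect: a slice that points into a reused buffer changes whenever that buffer is overwritten. The sketch below reproduces the effect deterministically with a hand-managed buffer instead of `stdin`. The names `keepSlice` and `keepCopy` are illustrative only, and the buffer merely stands in for whatever `byLine()` keeps internally.

```d
import std.stdio;

// Returns a slice that still points into `buffer` (no copy is made).
char[] keepSlice(char[] buffer)
{
    return buffer[0 .. 5];
}

// Returns an independent immutable copy of the same five characters.
string keepCopy(char[] buffer)
{
    return buffer[0 .. 5].idup;
}

void main()
{
    char[] buffer = "first".dup;   // stands in for the reused line buffer

    char[] aliased = keepSlice(buffer);
    string copied = keepCopy(buffer);

    buffer[0 .. 5] = "other";      // simulates reading the next line

    writeln(aliased); // prints "other": the slice changed with the buffer
    writeln(copied);  // prints "first": the copy is unaffected
}
```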
July 17, 2010 [Issue 4474] Better stdin.byLine()

Posted in reply to bearophile_hugs@eml.cc

http://d.puremagic.com/issues/show_bug.cgi?id=4474

--- Comment #4 from Andrei Alexandrescu <andrei@metalanguage.com> 2010-07-17 11:06:02 PDT ---

That example is the manifestation of another bug:

http://d.puremagic.com/issues/show_bug.cgi?id=2954
July 17, 2010 [Issue 4474] Better stdin.byLine()

Posted in reply to bearophile_hugs@eml.cc

http://d.puremagic.com/issues/show_bug.cgi?id=4474

--- Comment #5 from bearophile_hugs@eml.cc 2010-07-17 11:46:28 PDT ---

If you think this bug report is invalid and byLine() is safe (because the type system is enough, being able to tell apart char[] and string), then you can close this bug report.
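The type-system argument mentioned here can be checked at compile time. The sketch below is only an illustration under stated assumptions: it assumes a Phobos version in which `byLine()` yields `char[]` by default, and `storeLine` is a hypothetical helper, not a library function.

```d
import std.stdio;

// char[] does not convert implicitly to string (immutable(char)[]),
// so keeping a line as a string forces an explicit copy.
static assert(!is(char[] : string));

// Hypothetical helper: appends an owned copy of `line` to `store`.
void storeLine(ref string[] store, const(char)[] line)
{
    // store ~= line;   // would not compile: const(char)[] is not string
    store ~= line.idup; // the explicit copy satisfies the type checker
}

void main()
{
    string[] kept;
    foreach (line; stdin.byLine())
    {
        // Assumption stated in the lead-in: each line is a mutable char[].
        static assert(is(typeof(line) == char[]));
        storeLine(kept, line);
    }
    writeln(kept.length, " lines kept");
}
```

Under this view the compiler flags the missing copy wherever a `string` is required, which is the sense in which the type system alone is claimed to be enough.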
July 25, 2010 [Issue 4474] Better stdin.byLine()

Posted in reply to bearophile_hugs@eml.cc

http://d.puremagic.com/issues/show_bug.cgi?id=4474

bearophile_hugs@eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID

--- Comment #6 from bearophile_hugs@eml.cc 2010-07-24 19:07:33 PDT ---

Bug closed because Andrei says byLine() is safe :-)