Thread overview | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
February 27, 2017 [Issue 17229] File.byChunk w/ stdout.lockingTextWriter is very slow | ||||
---|---|---|---|---|
| ||||
https://issues.dlang.org/show_bug.cgi?id=17229 --- Comment #1 from Jon Degenhardt <jrdemail2000-dlang@yahoo.com> --- (In reply to Jon Degenhardt from comment #0) > Results for the count 9's program, against the 2.7, 14 million line file: > byLine: 8.98 seconds > byChunk: 1.64 seconds Update: The byLine Count 9s program given above is being affected by autodecode. Changing counting line as follows: from: chunkedStream.each!(x => count += x.count('9')); to: chunkedStream.each!(x => count += (cast(ubyte[])x).count('9')); Gives updated times: byLine: 3.13 byChunk: 1.64 This is more consistent. byChunk is faster than byLine in the Count 9s task (read and access data), by not by 5x. The 15x performance deficit of byChunk relative to byLine on the file copy task remains. -- |
February 28, 2017 [Issue 17229] File.byChunk w/ stdout.lockingTextWriter is very slow | ||||
---|---|---|---|---|
| ||||
https://issues.dlang.org/show_bug.cgi?id=17229 Steven Schveighoffer <schveiguy@yahoo.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |schveiguy@yahoo.com --- Comment #2 from Steven Schveighoffer <schveiguy@yahoo.com> --- The issue is that LockingTextWriter accepts anything but character range. What is happening (I think) is that lockingTextWriter.put(someUbyteArray) is actually putting each ubyte into the stream one at a time after converting each ubyte into a dchar through integer promotion. This can't be the intention. I think instead of the constraint is(ElementType!A : const(dchar)) what we need is something more like: is(Unqual!(ElementType!A) == dchar) This might break code, but I think code that is already broken, such as the example, no? -- |
March 01, 2017 [Issue 17229] File.byChunk w/ stdout.lockingTextWriter is very slow | ||||
---|---|---|---|---|
| ||||
https://issues.dlang.org/show_bug.cgi?id=17229 --- Comment #3 from Jon Degenhardt <jrdemail2000-dlang@yahoo.com> --- I've confirmed that File.byChunk with lockingTextWriter corrupts utf-8 encoded files. I used the unicode test file: http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt and the example given with the File.byChunk documentation: // Efficient file copy, 1MB at a time. import std.algorithm, std.stdio; void main() { stdin.byChunk(1024 * 1024).copy(stdout.lockingTextWriter()); } This file copy program corrupts the unicode characters as described in Steven's comment. This is a quite problematic, both because of character corruption and because it is an example in the documentation. The new method, lockingBinaryWriter, copies the file correctly. It is available starting with 2.073.1. lockingBinaryWriter also copies the file quickly, eliminating the performance issue. It is appears from the PR for lockingBinaryWriter (https://github.com/dlang/phobos/pull/2011) that there was discussion of the roles of Binary and Text writer. Regardless of availability of the lockingBinaryWriter, the lockingTextWriter certainly looks broken when used with the ubyte data type. Personally, I think it makes sense for lockingTextWriter to assume ubyte arrays are correctly encoded, or perhaps are utf-8 encoded. This would potentially allow newline translation, something that the lockingBinaryWriter would presumably not do. -- |
March 01, 2017 [Issue 17229] File.byChunk (ubyte) w/ stdout.lockingTextWriter corrupts utf-8 data (and is very slow) | ||||
---|---|---|---|---|
| ||||
https://issues.dlang.org/show_bug.cgi?id=17229 Jon Degenhardt <jrdemail2000-dlang@yahoo.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Summary|File.byChunk w/ |File.byChunk (ubyte) w/ |stdout.lockingTextWriter is |stdout.lockingTextWriter |very slow |corrupts utf-8 data (and is | |very slow) --- Comment #4 from Jon Degenhardt <jrdemail2000-dlang@yahoo.com> --- Changing the subject to reflect the more serious problem, corruption of utf-8 encoded data. As described in one of the comments, this corruption occurs in a file copy example in the documentation. -- |
March 01, 2017 [Issue 17229] File.byChunk (ubyte) w/ stdout.lockingTextWriter corrupts utf-8 data (and is very slow) | ||||
---|---|---|---|---|
| ||||
https://issues.dlang.org/show_bug.cgi?id=17229 --- Comment #5 from anonymous4 <dfj1esp02@sneakemail.com> --- static assert(is(ubyte:dchar)); This assert succeeds. Is it intended? It's why LockingTextWriter accepts bytes event though it's a text writer. -- |
March 01, 2017 [Issue 17229] File.byChunk (ubyte) w/ stdout.lockingTextWriter corrupts utf-8 data (and is very slow) | ||||
---|---|---|---|---|
| ||||
https://issues.dlang.org/show_bug.cgi?id=17229 --- Comment #6 from Steven Schveighoffer <schveiguy@yahoo.com> --- (In reply to anonymous4 from comment #5) > static assert(is(ubyte:dchar)); > This assert succeeds. Is it intended? It's why LockingTextWriter accepts > bytes event though it's a text writer. Yes, in the compiler, ubyte and dchar are fundamentally unsigned integer types. You can implicitly convert via integer promotion. Technically, char can integer promote to dchar as well. However, foreach(dchar d; someCharArray) is specialized in the compiler to go through auto decoding. The forums contain volumes of battles on auto-decoding and how the choice has affected D, I don't think we need to rehash it here. What we need to do is make it so obviously wrong code is not accepted for this function. I'll try and get a PR together. -- |
March 02, 2017 [Issue 17229] File.byChunk (ubyte) w/ stdout.lockingTextWriter corrupts utf-8 data (and is very slow) | ||||
---|---|---|---|---|
| ||||
https://issues.dlang.org/show_bug.cgi?id=17229 Steven Schveighoffer <schveiguy@yahoo.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Keywords| |pull Hardware|x86 |All OS|Mac OS X |All Severity|enhancement |major --- Comment #7 from Steven Schveighoffer <schveiguy@yahoo.com> --- PR: https://github.com/dlang/phobos/pull/5229 -- |
March 17, 2017 [Issue 17229] File.byChunk (ubyte) w/ stdout.lockingTextWriter corrupts utf-8 data (and is very slow) | ||||
---|---|---|---|---|
| ||||
https://issues.dlang.org/show_bug.cgi?id=17229 --- Comment #8 from github-bugzilla@puremagic.com --- Commits pushed to master at https://github.com/dlang/phobos https://github.com/dlang/phobos/commit/9af2d0267f054503dc5f73410b1ab0d0e418f006 Fix issue 17229 - do not use integer promotion to translate ubyte ranges to dchar ranges. https://github.com/dlang/phobos/commit/ca20f0b667fa7f9602eee7ed18c23f56b2f06f34 Add unittest for Bug 17229 https://github.com/dlang/phobos/commit/3568cb38e709d03dcc894358a977c042248aab2d Merge pull request #5229 from schveiguy/fixtextwriter Fix issue 17229 - File.byChunk (ubyte) w/ stdout.lockingTextWriter corrupts utf-8 data (and is very slow) merged-on-behalf-of: Jack Stouffer <jack@jackstouffer.com> -- |
March 22, 2017 [Issue 17229] File.byChunk (ubyte) w/ stdout.lockingTextWriter corrupts utf-8 data (and is very slow) | ||||
---|---|---|---|---|
| ||||
https://issues.dlang.org/show_bug.cgi?id=17229 --- Comment #9 from github-bugzilla@puremagic.com --- Commits pushed to stable at https://github.com/dlang/phobos https://github.com/dlang/phobos/commit/9af2d0267f054503dc5f73410b1ab0d0e418f006 Fix issue 17229 - do not use integer promotion to translate ubyte ranges https://github.com/dlang/phobos/commit/ca20f0b667fa7f9602eee7ed18c23f56b2f06f34 Add unittest for Bug 17229 https://github.com/dlang/phobos/commit/3568cb38e709d03dcc894358a977c042248aab2d Merge pull request #5229 from schveiguy/fixtextwriter -- |
August 07, 2017 [Issue 17229] File.byChunk (ubyte) w/ stdout.lockingTextWriter corrupts utf-8 data (and is very slow) | ||||
---|---|---|---|---|
| ||||
https://issues.dlang.org/show_bug.cgi?id=17229 --- Comment #10 from github-bugzilla@puremagic.com --- Commits pushed to newCTFE at https://github.com/dlang/phobos https://github.com/dlang/phobos/commit/9af2d0267f054503dc5f73410b1ab0d0e418f006 Fix issue 17229 - do not use integer promotion to translate ubyte ranges https://github.com/dlang/phobos/commit/ca20f0b667fa7f9602eee7ed18c23f56b2f06f34 Add unittest for Bug 17229 https://github.com/dlang/phobos/commit/3568cb38e709d03dcc894358a977c042248aab2d Merge pull request #5229 from schveiguy/fixtextwriter -- |
Copyright © 1999-2021 by the D Language Foundation