March 20, 2015
> You may want to answer there, not here. I've also posted a response.
>
> Andrei

Nitpick: Your solutions that use readText validate their input, and the Python version probably doesn't. You could mention that (I cannot comment on SO).

Interestingly, readText is faster than byChunk.joiner regardless.

Nitpick 2: According to http://www.unicode.org/versions/Unicode7.0.0/ch05.pdf (chapter 5.8), splitLines is still incomplete: it does not break on U+0085, U+000B, or U+000C. Would a PR for this be accepted?
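To make the nitpick concrete, here is a minimal sketch in Python 3 (used only because Python is the other language in this thread; it says nothing about how splitLines is implemented in Phobos). It checks which separators Python's built-in str.splitlines() treats as line boundaries. The SAMPLES table and split_report helper are hypothetical names made up for this illustration, and the character list is taken from this post plus the usual suspects, not copied from the Unicode chapter.

```python
# Separators under discussion, plus the common ones.
# Assumption: this list comes from the post above and common usage,
# not verbatim from the Unicode standard.
SAMPLES = {
    "LF  U+000A": "a\nb",
    "VT  U+000B": "a\x0bb",
    "FF  U+000C": "a\x0cb",
    "CR  U+000D": "a\rb",
    "NEL U+0085": "a\x85b",
    "LS  U+2028": "a\u2028b",
    "PS  U+2029": "a\u2029b",
}


def split_report(samples):
    """Return {label: number of lines str.splitlines() produces}."""
    return {label: len(text.splitlines()) for label, text in samples.items()}


report = split_report(SAMPLES)
for label, count in report.items():
    print(label, "->", count, "lines")

# CR LF together counts as a single break, not two.
crlf_lines = "a\r\nb".splitlines()
print(crlf_lines)
```

A test suite for a D fix could follow the same shape: one short input per separator, each expected to yield two lines.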

I'd say the coolest answer to this question would have been: D has not only one of the fastest, but the only correct solution to this that works with UTF8, UTF16 and UTF32 at the same time.
March 20, 2015
On Thursday, 19 March 2015 at 00:52:12 UTC, Walter Bright wrote:
> On 3/18/2015 5:34 PM, cym13 wrote:
>> Maybe there should be a "part 2" to the C-to-D little tutorial, one that shows
>> how to code at a higher level introducing gently functional structures instead
>> of just teaching how to write C in D. To stop thinking about steps and start
>> thinking about transformations isn't an easy thing.
>
> Andrei and I talked about this, how we are failing to point out how to idiomatically use D to advantage. We need to do a lot better at this.
Related:
https://p0nce.github.io/d-idioms/
http://wiki.dlang.org/Tutorials
March 20, 2015
On 3/20/2015 3:59 AM, Paulo Pinto wrote:
> On Friday, 20 March 2015 at 10:50:44 UTC, Walter Bright wrote:
>> Since 'line' is never referred to again after constructed, even a simple
>> optimizer could elide it.
>>
>> It would be easy to test - accumulate the lines in an array, and check the times.
>
> Which the default Python implementation doesn't have, hence my comment.

After all these years, the default Python implementation doesn't do fairly basic optimizations? I find that a bit hard to believe.

> Also even if it did have one, it cannot elide it as it cannot guarantee the
> semantics of the generators/iterators side effects will be kept.

I wonder why keeping a string would be a side effect.

I'm not saying you're wrong, I don't know Python well enough to make such a judgement. It just causes me to raise one eyebrow like Spock if it does work this way.

March 20, 2015
On 3/20/2015 8:25 AM, weaselcat wrote:
> All of the content on rosettacode appears to be licensed under GNU FDL, I
> believe it would just have to be released under the GNU FDL or a similar
> copyleft license that fulfills the GNU FDL.

http://www.gnu.org/licenses/fdl-1.2.html

Sigh, looks like we can't use it.
March 20, 2015
On Friday, 20 March 2015 at 20:34:36 UTC, Walter Bright wrote:
> On 3/20/2015 8:25 AM, weaselcat wrote:
>> All of the content on rosettacode appears to be licensed under GNU FDL, I
>> believe it would just have to be released under the GNU FDL or a similar
>> copyleft license that fulfills the GNU FDL.
>
> http://www.gnu.org/licenses/fdl-1.2.html
>
> Sigh, looks like we can't use it.

Isn't all content on the dlang wiki already under the same license?
March 20, 2015
On Friday, 20 March 2015 at 20:31:49 UTC, Walter Bright wrote:
> On 3/20/2015 3:59 AM, Paulo Pinto wrote:
>> On Friday, 20 March 2015 at 10:50:44 UTC, Walter Bright wrote:
>>> Since 'line' is never referred to again after constructed, even a simple
>>> optimizer could elide it.
>>>
>>> It would be easy to test - accumulate the lines in an array, and check the times.
>>
>> Which the default Python implementation doesn't have, hence my comment.
>
> After all these years, the default Python implementation doesn't do fairly basic optimizations? I find that a bit hard to believe.
>
>> Also even if it did have one, it cannot elide it as it cannot guarantee the
>> semantics of the generators/iterators side effects will be kept.
>
> I wonder why keeping a string would be a side effect.
>
> I'm not saying you're wrong, I don't know Python well enough to make such a judgement. It just causes me to raise one eyebrow like Spock if it does work this way.

The side effect is not keeping the string, rather generating it.

for var in exp:
   do_something()

If exp is an iterable or a generator, then even if var is thrown away, the loop needs to be preserved to keep the semantics of calling next() on the iterator object that drives the for..in loop.
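A minimal sketch of that point in Python 3 syntax: the generator below has an observable side effect on every step, so the loop cannot be dropped even though the loop variable is never read. The names numbered_lines, count_items and log are hypothetical, made up for this illustration.

```python
def numbered_lines(log):
    """Hypothetical generator: records a side effect each time it advances."""
    for n in range(3):
        log.append(f"produced line {n}")  # observable side effect
        yield f"line {n}\n"


def count_items(iterable):
    """Count the items; the loop variable itself is never read."""
    count = 0
    for _unused in iterable:
        count += 1
    return count


log = []
total = count_items(numbered_lines(log))
print(total)
print(log)
```

Removing the loop would leave log empty, so an optimizer that cannot see inside the iterator has to keep every next() call.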

Put another way, does DMD throw away foreach loops even if the compiler cannot prove whether opApply() or popFront() have side effects, assuming the variable isn't being used?

--
Paulo
March 21, 2015
On 3/19/15 4:29 PM, weaselcat wrote:
> voldemort types sort of feel like a hack to work around the lack of real
> compile-time interfaces(concepts,)

No, they're largely unrelated. -- Andrei
March 21, 2015
On 3/20/15 12:42 AM, Paulo Pinto wrote:
> On Friday, 20 March 2015 at 05:17:11 UTC, Walter Bright wrote:
>> On 3/19/2015 9:59 AM, "Ola Fosheim Grøstad"
>> <ola.fosheim.grostad+dlang@gmail.com> wrote:
>>> On Thursday, 19 March 2015 at 00:42:51 UTC, weaselcat wrote:
>>>> On Wednesday, 18 March 2015 at 12:59:17 UTC, bearophile wrote:
>>>>> High level constructs in D are often slower than low-level code, so
>>>>> in some
>>>>> cases you don't want to use them.
>>>>
>>>> I actually found that LDC does an _amazing_ job of optimizing high
>>>> level
>>>> constructs and converting "low level" code to higher level
>>>> functional code
>>>> resulted in minor speedups in a lot of cases.
>>>>
>>>>
>>>> (Other performance benefits include the algorithm primitives being
>>>> extensively
>>>> optimized in phobos.)
>>>
>>> If the code/compiler generates suboptimal code in the first place then
>>> improvements can be somewhat random. But if you write code with good
>>> cache
>>> locality, filling the pipeline properly then there is no alternative
>>> to going
>>> low level.
>>>
>>> Btw, take a look at this:
>>> http://stackoverflow.com/questions/28922323/improving-line-wise-i-o-operations-in-d
>>>
>>>
>>> That's really bad marketing...
>>
>> Sigh. The Python version:
>> -----------
>> import sys
>>
>> if __name__ == "__main__":
>>     if (len(sys.argv) < 2):
>>         sys.exit()
>>     infile = open(sys.argv[1])
>>     linect = 0
>>     for line in infile:
>>         linect += 1
>>     print "There are %d lines" % linect
>> ----------
>> does not allocate memory. The splitLines() version:
>> [...] cut
>
> Of course it does allocate memory.

Yah, and uses reference counting for management. -- Andrei

March 21, 2015
On 3/20/15 3:50 AM, Walter Bright wrote:
> On 3/20/2015 12:42 AM, Paulo Pinto wrote:
>> On Friday, 20 March 2015 at 05:17:11 UTC, Walter Bright wrote:
>>> Sigh. The Python version:
>>> -----------
>>> import sys
>>>
>>> if __name__ == "__main__":
>>>     if (len(sys.argv) < 2):
>>>         sys.exit()
>>>     infile = open(sys.argv[1])
>>>     linect = 0
>>>     for line in infile:
>>>         linect += 1
>>>     print "There are %d lines" % linect
>>> ----------
>>> does not allocate memory. The splitLines() version:
>>> [...] cut
>>
>> Of course it does allocate memory.
>>
>> Python's "for...in" makes use of iterators and generators, already
>> there you
>> have some allocations going around.
>>
>> Not only that, there is one string being allocated for each line in
>> the file
>> being read, even if it isn't used.
>
> Since 'line' is never referred to again after constructed, even a simple
> optimizer could elide it.

It's not elided to the best of my knowledge. But it's reference counted so it goes from 0 to 1 and back. A simple caching allocator will have no trouble with this pattern. -- Andrei
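A rough way to watch that pattern, assuming CPython's reference counting (other Python implementations may behave differently). Plain strings cannot report their own destruction, so the sketch uses a hypothetical stand-in class, Line, that counts live instances; fake_file and count_lines are likewise made-up names.

```python
class Line:
    """Stand-in for a per-line string; tracks how many instances are alive."""
    alive = 0
    peak = 0
    created = 0

    def __init__(self):
        Line.alive += 1
        Line.created += 1
        Line.peak = max(Line.peak, Line.alive)

    def __del__(self):
        # Under reference counting this runs as soon as the last
        # reference goes away, i.e. when the loop variable is rebound.
        Line.alive -= 1


def fake_file(n):
    """Hypothetical iterator yielding one fresh object per 'line'."""
    for _ in range(n):
        yield Line()


def count_lines(n):
    count = 0
    for line in fake_file(n):
        count += 1
    return count


count = count_lines(1000)
print(count, Line.created, Line.peak)
```

Every object is still allocated, but only a couple are ever alive at once, which is the kind of workload a simple caching allocator handles cheaply.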


March 21, 2015
On 3/20/15 10:26 AM, Tobias Pankrath wrote:
>> You may want to answer there, not here. I've also posted a response.
>>
>> Andrei
>
> Nitpick: Your solutions that use readText validate their input, and the
> Python version probably doesn't. You could mention that (I cannot
> comment on SO).

Yah, nitpicks should go there too. We need to have an understanding that statistically everybody is on SO and nobody here :o).

> Interestingly, readText is faster than byChunk.joiner regardless.

I'm not sure it does the checking.

> Nitpick 2: According to http://www.unicode.org/versions/Unicode7.0.0/ch05.pdf
> (chapter 5.8), splitLines is still incomplete: it does not break on
> U+0085, U+000B, or U+000C. Would a PR for this be accepted?

Prolly. Walter?

> I'd say the coolest answer to this question would have been: D has not
> only one of the fastest, but the only correct solution to this that
> works with UTF8, UTF16 and UTF32 at the same time.

Well then write that answer.


Andrei