August 08, 2010
"Andrei Alexandrescu" <SeeWebsiteForEmail@erdani.org> wrote in message news:i3ldk4$2ci0$1@digitalmars.com...
>
> Very nice! You may as well guard the write with an if (result != fileStr). With source control etc. in the mix it's always polite to not touch files unless you are actually modifying them.
>

I'm fairly sure SVN doesn't commit touched files unless there are actual changes. (Or maybe it's TortoiseSVN that adds that intelligence?)


August 08, 2010
"Norbert Nemec" <Norbert@Nemec-online.de> wrote in message news:i3lq17$99u$1@digitalmars.com...
> I usually do the same thing with a shell pipe
> expand | sed 's/ *$//;s/\r$//;s/\r/\n/'
>

Filed under "Why I don't like regex for non-trivial things" ;)


August 08, 2010
bearophile wrote:
> In the D code I have added an idup to make the comparison more fair, because
> in the Python code the "line" is a true newly allocated line, you can safely
> use it as dictionary key.

So it is with byLine, too. You've burdened D with double the amount of allocations.

Also, I object in general to this method of making things "more fair". Using a less efficient approach in X because Y cannot use such an approach is not a legitimate comparison.
August 08, 2010
"bearophile" <bearophileHUGS@lycos.com> wrote in message news:i3lb30$26vf$1@digitalmars.com...
> Jonathan M Davis:
>> I would have thought that being more idiomatic would have resulted in slower code than what Walter did, but interestingly enough, both programs are faster with my code. They might take more memory though. I'm not quite sure how to check that.
>> In any case, you wanted some idiomatic D2 solutions, so there you go.
>
> Your code looks better.
>
> My (probably controversial) opinion on this is that the idiomatic D solution for those text "scripts" is to use a scripting language such as Python :-)
>

I can respect that. Personally, though, I find a lot of value in not needing to switch languages for that sort of thing. Too much "context switch" for my brain ;)


August 08, 2010
"Walter Bright" <newshound2@digitalmars.com> wrote in message news:i3mpnb$2hcf$1@digitalmars.com...
> bearophile wrote:
>> In the D code I have added an idup to make the comparison more fair, because in the Python code the "line" is a true newly allocated line, you can safely use it as dictionary key.
>
> So it is with byLine, too. You've burdened D with double the amount of allocations.
>

I thought byLine just re-uses the same buffer each time?


August 08, 2010
Walter Bright:
> bearophile wrote:
> > In the D code I have added an idup to make the comparison more fair, because in the Python code the "line" is a true newly allocated line, you can safely use it as dictionary key.
> 
> So it is with byLine, too. You've burdened D with double the amount of allocations.

I think you are wrong two times:

1) byLine() doesn't return a newly allocated line, you can see it with this small program:

import std.stdio: File, writeln;

void main(string[] args) {
    char[][] lines;
    auto file = File(args[1]);
    foreach (rawLine; file.byLine()) {
        writeln(rawLine.ptr);
        lines ~= rawLine;
    }
    file.close();
}


Its output shows that all "strings" (char[]) share the same pointer:

14E5E00
14E5E00
14E5E00
14E5E00
14E5E00
14E5E00
14E5E00
...


2) You can't use the result of byLine() as a string key for an associative array, as I have said you can in Python. Currently you can, but according to Andrei this is a bug. And if it's not a bug then I'll reopen this closed bug 4474:

http://d.puremagic.com/issues/show_bug.cgi?id=4474


> Also, I object in general to this method of making things "more fair". Using a less efficient approach in X because Y cannot use such an approach is not a legitimate comparison.

I generally agree, but this is not the case here.
In some situations you indeed don't need a newly allocated string for each loop iteration, because for example you just want to read the lines and process them, not change or store them. You can't do this in Python, but this is not what I want to test. As I have explained in bug 4474 this behaviour is useful, but it is acceptable only if explicitly requested by the programmer, not as the default. The language is safe, as Andrei explains there, because you are supposed to idup the char[] to use it as a key for an associative array (if your associative array is declared as int[char[]] then it can accept such rawLine values as keys, but you can clearly see those aren't strings. This is why I have closed bug 4474).
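
To make the point concrete, here is a minimal sketch of the pattern being benchmarked: each line from byLine() is copied with idup before it is used as an associative array key, because the char[] it yields points into a buffer that is reused. The function name countLines is only an illustration, not something from Phobos.

```d
import std.stdio : File, writefln;

/// Hypothetical helper (name chosen for illustration): counts how often
/// each line occurs. Works with file.byLine() or any range of char[].
size_t[string] countLines(Range)(Range lines)
{
    size_t[string] counts;
    foreach (rawLine; lines)
    {
        // rawLine may alias a buffer that is overwritten on the next
        // iteration, so take an immutable copy before storing it as a key.
        counts[rawLine.idup]++;
    }
    return counts;
}

void main(string[] args)
{
    auto file = File(args[1]);
    auto counts = countLines(file.byLine());
    foreach (line, n; counts)
        writefln("%s\t%s", n, line);
}
```

If the lines are only read and never stored, the idup can be dropped and the loop works directly on the shared buffer.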

Bye,
bearophile
August 08, 2010
On 08/08/2010 12:28 PM, Nick Sabalausky wrote:
> "Andrei Alexandrescu"<SeeWebsiteForEmail@erdani.org>  wrote in message
> news:i3ldk4$2ci0$1@digitalmars.com...
>>
>> Very nice! You may as well guard the write with an if (result != fileStr).
>> With source control etc. in the mix it's always polite to not touch files
>> unless you are actually modifying them.
>>
>
> I'm fairly sure SVN doesn't commit touched files unless there are actual
> changes. (Or maybe it's TortoiseSVN that adds that intelligence?)

It doesn't, but it still shows them as changed etc.
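
A minimal sketch of the guarded write, assuming the tool reads the whole file into fileStr and produces result from it. The helper stripTrailingWhitespace is a made-up stand-in for whatever transformation is actually applied; only the comparison before the write is the point here.

```d
import std.array : appender;
import std.file : readText, write;
import std.string : splitLines, stripRight;

/// Hypothetical transformation used only for illustration: strips
/// trailing whitespace from every line and normalizes line endings to \n.
string stripTrailingWhitespace(string text)
{
    auto output = appender!string();
    foreach (line; text.splitLines())
    {
        output.put(line.stripRight());
        output.put('\n');
    }
    return output.data;
}

void main(string[] args)
{
    foreach (name; args[1 .. $])
    {
        auto fileStr = readText(name);
        auto result = stripTrailingWhitespace(fileStr);
        // Only touch the file when the contents actually changed, so
        // timestamps and source control status stay untouched otherwise.
        if (result != fileStr)
            write(name, result);
    }
}
```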

Andrei
August 08, 2010
bearophile wrote:
> Walter Bright:
>> bearophile wrote:
>>> In the D code I have added an idup to make the comparison more fair,
>>> because in the Python code the "line" is a true newly allocated line, you
>>> can safely use it as dictionary key.
>> So it is with byLine, too. You've burdened D with double the amount of
>> allocations.
> 
> I think you are wrong two times:
> 
> 1) byLine() doesn't return a newly allocated line, you can see it with this
> small program:
> 
> import std.stdio: File, writeln;
> 
> void main(string[] args) {
>     char[][] lines;
>     auto file = File(args[1]);
>     foreach (rawLine; file.byLine()) {
>         writeln(rawLine.ptr);
>         lines ~= rawLine;
>     }
>     file.close();
> }
> 
> 
> Its output shows that all "strings" (char[]) share the same pointer:
> 
> 14E5E00
> 14E5E00
> 14E5E00
> 14E5E00
> 14E5E00
> 14E5E00
> 14E5E00
> ...

Eh, you're right. The Phobos documentation for byLine needs to be fixed.


> You can't do this in Python, but this is not what I want to test.

If you want to conclude that Python is better at processing files, you need to show it using each language doing it in a way well suited to that language, rather than burdening one so it uses the same method as the less powerful one.
August 08, 2010
Walter Bright:
> If you want to conclude that Python is better at processing files, you need to show it using each language doing it in a way well suited to that language, rather than burdening one so it uses the same method as the less powerful one.

byLine() yields a char[], so if you want to do most kinds of string processing or you want to store the line (or parts of it), you have to idup it. So in this case Python is not significantly less powerful than D.

You can of course use the raw char[], but then you lose the advantages advertised when you introduced the safer immutable D2 strings. And in many situations you have to dup the char[] anyway, otherwise you get all kinds of bugs that Python lacks. In D1, to avoid them, I used to use dup more often than necessary. I have explained this in bug 4474.

In this newsgroup my purpose is to show D's faults, suggest improvements, etc. In this case my purpose was just to show that byLine()+idup is slow. And you have to be thankful for my benchmarks. In my dlibs1 for D1 I have an xio module that reads files by line and is faster than iterating on a BufferedFile, so it's not a limit of the language; it's Phobos that has a performance bug that can be improved.

Bye,
bearophile
August 08, 2010
Andrei used to!string() in an early example in TDPL for some line-by-line processing. I'm not sure of the advantages/disadvantages of to!type vs .dup.
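
A small sketch of the difference as I understand it, not an authoritative statement: on a char[] both to!string and .idup give an independent immutable copy, while to!string applied to a value that is already a string just returns it, and .idup always allocates.

```d
import std.conv : to;

void main()
{
    // A mutable buffer, standing in for what byLine() hands out.
    char[] buffer = "hello".dup;

    string viaIdup = buffer.idup;       // immutable copy
    string viaTo = to!string(buffer);   // also an immutable copy

    // Overwriting the buffer does not affect either copy.
    buffer[0] = 'j';
    assert(viaIdup == "hello");
    assert(viaTo == "hello");

    // On an existing string, to!string returns its argument unchanged,
    // whereas idup still makes a fresh allocation.
    string s = "abc";
    assert(to!string(s) is s);
    assert(s.idup !is s);
}
```

So in generic code to!string avoids a needless copy when the input is already immutable, and for char[] input the two appear to be equivalent.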

On Sun, Aug 8, 2010 at 11:44 PM, bearophile <bearophileHUGS@lycos.com> wrote:

> Walter Bright:
> > If you want to conclude that Python is better at processing files, you need to show it using each language doing it in a way well suited to that language, rather than burdening one so it uses the same method as the less powerful one.
>
> byLine() yields a char[], so if you want to do most kinds of string processing or you want to store the line (or parts of it), you have to idup it. So in this case Python is not significantly less powerful than D.
>
> You can of course use the raw char[], but then you lose the advantages advertised when you introduced the safer immutable D2 strings. And in many situations you have to dup the char[] anyway, otherwise you get all kinds of bugs that Python lacks. In D1, to avoid them, I used to use dup more often than necessary. I have explained this in bug 4474.
>
> In this newsgroup my purpose is to show D's faults, suggest improvements, etc. In this case my purpose was just to show that byLine()+idup is slow. And you have to be thankful for my benchmarks. In my dlibs1 for D1 I have an xio module that reads files by line and is faster than iterating on a BufferedFile, so it's not a limit of the language; it's Phobos that has a performance bug that can be improved.
>
> Bye,
> bearophile
>