August 08, 2010
> so it's not a limit of the language, it's Phobos that has a performance bug that can be improved.

I don't know where the performance bug is; it may be a GC issue rather than a Phobos performance bug.

Bye,
bearophile
August 08, 2010
Nick Sabalausky, on August 8 at 13:31 you wrote to me:
> "Norbert Nemec" <Norbert@Nemec-online.de> wrote in message news:i3lq17$99u$1@digitalmars.com...
> >I usually do the same thing with a shell pipe
> > expand | sed 's/ *$//;s/\r$//;s/\r/\n/'
> >
> 
> Filed under "Why I don't like regex for non-trivial things" ;)

Are those regexes non-trivial?

Maybe you're confusing sed statements with regexes. That sed program contains 3 trivial regexes:

regex   replace with
 *$     (nothing)
\r$     (nothing)
\r      \n

They are the most trivial regexes you'd ever find! =)
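
To make the table above concrete, here is a minimal sketch that runs each of the three substitutions on its own. It assumes GNU sed, since the \r escape in a pattern and the \n escape in a replacement are not guaranteed by POSIX sed. The sample inputs and the variable names are made up for illustration.

```shell
# Assumes GNU sed. Each substitution runs separately so its effect is
# visible in isolation; inputs and variable names are illustrative only.

# 1. ' *$' matches a run of spaces at the end of the line: strip trailing blanks.
stripped=$(printf 'foo   \n' | sed 's/ *$//')

# 2. '\r$' matches a carriage return at the end of the line: CRLF becomes LF.
unix=$(printf 'bar\r\n' | sed 's/\r$//')

# 3. '\r' matches a remaining bare carriage return: a lone CR becomes LF.
#    Without a trailing 'g' flag only the first CR on each line is replaced.
split=$(printf 'one\rtwo\n' | sed 's/\r/\n/')

# Show each result between brackets so stray whitespace would be visible.
printf '[%s]\n' "$stripped" "$unix" "$split"
```

Running them separately also exposes an ordering detail: in the original one-liner the trailing-space rule runs while a CR is still the last character of the line, so spaces sitting right before a CRLF are not stripped. Swapping the first two expressions would probably cover that case.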

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
Vaporeso staunchly upheld the No-Water theory, which was his own and went as follows: "To turn the other cheek to fire, it must be put out with barely damp espadrilles".
August 08, 2010
Andrei Alexandrescu, on August 8 at 14:44 you wrote to me:
> On 08/08/2010 12:28 PM, Nick Sabalausky wrote:
> >"Andrei Alexandrescu"<SeeWebsiteForEmail@erdani.org>  wrote in message news:i3ldk4$2ci0$1@digitalmars.com...
> >>
> >>Very nice! You may as well guard the write with an if (result != fileStr). With source control etc. in the mix it's always polite to not touch files unless you are actually modifying them.
> >>
> >
> >I'm fairly sure SVN doesn't commit touched files unless there are actual changes. (Or maybe it's TortoiseSVN that adds that intelligence?)
> 
> It doesn't, but it still shows them as changed etc.

Nope, not really:

/tmp$ svnadmin create x
/tmp$ svn co file:///tmp/x xwc
Revisión obtenida: 0
/tmp$ cd xwc/
/tmp/xwc$ echo hello > hello
/tmp/xwc$ svn add hello
A         hello
/tmp/xwc$ svn commit -m 'test'
Añadiendo      hello
Transmitiendo contenido de archivos .
Commit de la revisión 1.
/tmp/xwc$ touch hello
/tmp/xwc$ svn status
/tmp/xwc$ echo changed > hello
/tmp/xwc$ svn status
M       hello
/tmp/xwc$

(Sorry about the Spanish messages; I noticed them only after copying the test, and I'm too lazy to repeat it with a different LANG environment variable :)


You might still want to set the mtime back to that of the original file for build purposes, though: you know you're changing the file in a way that doesn't alter its semantics, so you may want to avoid an unnecessary recompilation.
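
A rough sketch of that idea in shell, combined with the "only write if something changed" guard suggested earlier in the thread. Everything here is an assumption made for the demonstration: the temporary files, the sample content and the single trailing-space substitution stand in for whatever cleanup is really being done. It relies only on touch -r and touch -t, which POSIX specifies.

```shell
# Sketch only: file names and contents are made up for the demonstration.
file=$(mktemp)    # stands in for a source file tracked by the build
stamp=$(mktemp)   # empty helper file that only carries timestamps
tmp=$(mktemp)     # holds the cleaned-up text

printf 'int x;   \n' > "$file"
touch -t 201008081331 "$file"        # pretend the file was last edited back then

touch -r "$file" "$stamp"            # remember the original atime/mtime
sed 's/ *$//' "$file" > "$tmp"       # strip trailing spaces into the temp file

# Write back only if something changed, then put the old timestamps back
# so a timestamp-based build tool does not see the file as modified.
if ! cmp -s "$tmp" "$file"; then
    cat "$tmp" > "$file"
    touch -r "$stamp" "$file"
fi
rm -f "$tmp"
# $file and $stamp are left in place here; remove them when finished.
```

Hiding a change from the build only makes sense for edits known to be semantically neutral, such as whitespace cleanup; for anything else the new mtime is exactly what the build tool needs to see.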

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
... which are susceptible to a growing variety of predictable attacks,
such as buffer overflow, parameter tampering, ...
	-- Stealth - ISS LLC - IT Security
August 08, 2010
On Sun, 08 Aug 2010 16:44:09 -0500, bearophile <bearophileHUGS@lycos.com> wrote:

> Walter Bright:
>> If you want to conclude that Python is better at processing files, you need to
>> show it using each language doing it a way well suited to that language, rather
>> than burdening one so it uses the same method as the less powerful one.
>
> byLine() yields a char[], so if you want to do most kinds of string processing or you want to store the line (or parts of it), you have to idup it. So in this case Python is not significantly less powerful than D.
>
> [snip] And you have to [be] thankful for my benchmarks. [snip]
>
> Bye,
> bearophile

<g> What's next? Will you demand attribution like the time Andrei presented the ranges design?
August 08, 2010
Yao G.:
> <g> What's next? Will you demand attribution like the time Andrei presented the ranges design?

Of course. In the end all D will be mine <evil laugh with echo effects> :-)

Bye,
bearophile
August 08, 2010
On Sun, 08 Aug 2010 17:27:04 -0500, bearophile <bearophileHUGS@lycos.com> wrote:

> Yao G.:
>> <g> What's next? Will you demand attribution like the time Andrei
>> presented the ranges design?
>
> Of course. In the end all D will be mine <evil laugh with echo effects> :-)
>
> Bye,
> bearophile

 :D That was a good comeback.


August 09, 2010
On 08/08/2010 04:44 PM, bearophile wrote:
> Walter Bright:
>> If you want to conclude that Python is better at processing files, you need to
>> show it using each language doing it a way well suited to that language, rather
>> than burdening one so it uses the same method as the less powerful one.
>
> byLine() yields a char[], so if you want to do most kinds of string processing or you want to store the line (or parts of it), you have to idup it. So in this case Python is not significantly less powerful than D.
>
> You can of course use the raw char[], but then you lose the advantages advertised when you introduced the safer immutable D2 strings. And in many situations you have to dup the char[] anyway, otherwise you get all kinds of bugs that Python lacks. In D1, to avoid them, I used to dup more often than necessary. I have explained this in bug 4474.
>
> In this newsgroup my purpose is to show D's faults, suggest improvements, etc. In this case my purpose was just to show that byLine()+idup is slow. And you have to be thankful for my benchmarks. In my dlibs1 for D1 I have an xio module that reads files by line faster than iterating on a BufferedFile, so it's not a limit of the language, it's Phobos that has a performance bug that can be improved.

Thanks for your analysis. Where does xio derive its performance advantage from?

Andrei
August 09, 2010
On 08/08/2010 04:48 PM, Andrej Mitrovic wrote:
> Andrei used to!string() in an early example in TDPL for some
> line-by-line processing. I'm not sure of the advantages/disadvantages of
> to!type vs .dup.

For example, to!string(someString) does not duplicate the string.

Andrei

August 09, 2010
On 08/08/2010 05:17 PM, Yao G. wrote:
> On Sun, 08 Aug 2010 16:44:09 -0500, bearophile
> <bearophileHUGS@lycos.com> wrote:
>
>> Walter Bright:
>>> If you want to conclude that Python is better at processing files,
>>> you need to
>>> show it using each language doing it a way well suited to that
>>> language, rather
>>> than burdening one so it uses the same method as the less powerful one.
>>
>> byLine() yields a char[], so if you want to do most kinds of string
>> processing or you want to store the line (or parts of it), you have to
>> idup it. So in this case Python is not significantly less powerful
>> than D.
>>
>> [snip] And you have to [be] thankful for my benchmarks. [snip]
>>
>> Bye,
>> bearophile
>
> <g> What's next? Will you demand attribution like the time Andrei
> presented the ranges design?

Well, I understand his frustration. I asked him for a comparison and he took the time to write one and play with it. I think the proper answer to that is to see what we can do to improve the situation, not to defend the status quo. Whatever weaknesses the benchmark has should be fixed, and then whatever weaknesses the library has should be addressed.

Andrei
August 09, 2010
Andrei:

>Where does xio derive its performance advantage from?<

I'd like to give you a good answer, but I can't. dlibs1 (which you can still find online) has a Python Licence, so to create xio.xfile() I just translated to D1 the C code of CPython's file object implementation, which I have already linked here.

I think it minimizes heap allocations, and its performance is tuned for a line length found to be the "average one" for normal files. So I presume that if your text file has very short lines (like 5 chars each) or very long ones (like 1000 chars each) it becomes less efficient.

So it's probably a matter of good usage of the C I/O functions, and probably of putting less pressure on the GC.

Phobos uses the Boost Licence, but I don't think the Python devs will get mad if you take a look at how Python reads lines lazily :-) Someone has tried to implement a Python-style associative array in a similar way.

Bye,
bearophile