View mode: basic / threaded / horizontal-split · Log in · Help
February 18, 2012
[your code here]
// Outputs a randomly selected line from standard input with equal
// likelihood.
import std.random;
import std.stdio;

void main() {
	auto n = 0;
	string choice;
	foreach (line; stdin.byLine()) {
		n++;
		if (uniform(0,n) == 0)
			choice = line.idup;
	}
	writeln(choice);
}


P.S. Proof that the probability any line is selected is exactly 1/n
(where n is the total number of lines read) is left as an exercise for
the reader. ;-)

P.S.S. The .idup is a bit ugly, but necessary, since apparently byLine()
calls readln() with a static buffer, so choice will be silently
overwritten if the .idup is omitted.


T

-- 
Ruby is essentially Perl minus Wall.
February 18, 2012
Re: [your code here]
On 18.02.2012 02:39, H. S. Teoh wrote:
> // Outputs a randomly selected line from standard input with equal
> // likelihood.
> import std.random;
> import std.stdio;
>
> void main() {
> 	auto n = 0;
> 	string choice;
> 	foreach (line; stdin.byLine()) {
> 		n++;
> 		if (uniform(0,n) == 0)
> 			choice = line.idup;
> 	}
> 	writeln(choice);
> }
>
>
> P.S. Proof that the probability any line is selected is exactly 1/n
> (where n is the total number of lines read) is left as an exercise for
> the reader. ;-)

Assuming that by "any" you mean "any particular", you would have to read 
all the lines first. Otherwise, if the code selects the first line with 
probability 1/K, then I can just input some other number of lines.


> P.S.S. The .idup is a bit ugly, but necessary, since apparently byLine()
> calls readln() with a static buffer, so choice will be silently
> overwritten if the .idup is omitted.

That sounds ominous. One should never have to be aware of low level 
details in order to do simple string assignment or initialization, when 
the source already is a string. Does one really have to do that in D?


Cheers & hth.,

- Alf
February 18, 2012
Re: [your code here]
On 02/18/2012 02:52 AM, Alf P. Steinbach wrote:
> On 18.02.2012 02:39, H. S. Teoh wrote:
>> // Outputs a randomly selected line from standard input with equal
>> // likelihood.
>> import std.random;
>> import std.stdio;
>>
>> void main() {
>>     auto n = 0;
>>     string choice;
>>     foreach (line; stdin.byLine()) {
>>         n++;
>>         if (uniform(0,n) == 0)
>>             choice = line.idup;
>>     }
>>     writeln(choice);
>> }
>>
>>
>> P.S. Proof that the probability any line is selected is exactly 1/n
>> (where n is the total number of lines read) is left as an exercise for
>> the reader. ;-)
>
> Assuming that by "any" you mean "any particular", you would have to read
> all the lines first. Otherwise, if the code selects the first line with
> probability 1/K, then I can just input some other number of lines.
>
>
>> P.S.S. The .idup is a bit ugly, but necessary, since apparently byLine()
>> calls readln() with a static buffer, so choice will be silently
>> overwritten if the .idup is omitted.

If the .idup is omitted the code does not compile. It does not silently 
misbehave. That is why we have a const system.

>
> That sounds ominous. One should never have to be aware of low level
> details in order to do simple string assignment or initialization, when
> the source already is a string.

The source is not a string, it is a char[].

> Does one really have to do that in D?
>
>
> Cheers & hth.,
>
> - Alf

File.byLine re-uses the buffer in order to be more efficient. This 
appears in the documentation, and the buffer is typed appropriately.
February 18, 2012
File Api [Was: [your code here]]
H. S. Teoh:

> P.S.S. The .idup is a bit ugly, but necessary, since apparently byLine()
> calls readln() with a static buffer, so choice will be silently
> overwritten if the .idup is omitted.

An alternative File API that to me looks nice. This is a first part, it's for scripting-like or not high performance purposes, it looks essentially like Python code, every line is a newly allocated string:

import std.stdio;
void main() {
   string[] lines;
   foreach (line; File("data.dat")) {
       static assert(is(line == string));
       lines ~= line;
   }
}

If you don't want a new buffer every line you use something like this:

import std.stdio;
void main() {
   string[] lines;
   foreach (line; File("data.dat").fastLines()) {
       static assert(is(line == char[]));
       lines ~= line.idup;
   }
}

So on default it's handy, short and safe, and with a method you avoid an allocation every line and get a mutable char[].

Maybe even this works, downloads a page and scans its lines, but maybe it's better to add a bit of extra safety to this:
foreach (line; File("http://www.dlang.org/faq.html")) {}

Bye,
bearophile
February 18, 2012
Re: File Api [Was: [your code here]]
On 02/18/2012 03:20 AM, bearophile wrote:
> H. S. Teoh:
>
>> P.S.S. The .idup is a bit ugly, but necessary, since apparently byLine()
>> calls readln() with a static buffer, so choice will be silently
>> overwritten if the .idup is omitted.
>
> An alternative File API that to me looks nice. This is a first part, it's for scripting-like or not high performance purposes, it looks essentially like Python code, every line is a newly allocated string:
>
> import std.stdio;
> void main() {
>      string[] lines;
>      foreach (line; File("data.dat")) {
>          static assert(is(line == string));
>          lines ~= line;
>      }
> }
>
> If you don't want a new buffer every line you use something like this:
>
> import std.stdio;
> void main() {
>      string[] lines;
>      foreach (line; File("data.dat").fastLines()) {
>          static assert(is(line == char[]));
>          lines ~= line.idup;
>      }
> }
>
> So on default it's handy, short and safe, and with a method you avoid an allocation every line and get a mutable char[].
>
> Maybe even this works, downloads a page and scans its lines, but maybe it's better to add a bit of extra safety to this:
> foreach (line; File("http://www.dlang.org/faq.html")) {}
>
> Bye,
> bearophile

Note that what we have now is clear, handy, quite short and safe and 
efficient. What you propose takes away the clarity and efficiency parts. 
foreach(foo; File("data.dat")) {} // by what does this iterate?
February 18, 2012
Re: File Api [Was: [your code here]]
Timon Gehr:

> Note that what we have now is clear, handy, quite short and safe and 
> efficient. What you propose takes away the clarity and efficiency parts. 

What we have now:
- by Line() is explicit, but if you don't use it is not that clear, if a Python programmer iterates on File it gives an almost mysterious error message:
test.d(4): Error: invalid foreach aggregate (File __ctmp834 = 0;

, __ctmp834).this("data.dat","rb")
- It's less handy because you have to add idup if you need strings (and most times you need a string);
- It's longer than what I have suggested;
- It's not safe, because if you need to work mutable char[], it yields the same buffer. This is a bug-prone default. A better API allocates every line on default, and not allocates it on request. (This is how I designed my dlibs1). It's "safe" only if you need a string, because it forces you to convert it to a string with idup.


> foreach(foo; File("data.dat")) {} // by what does this iterate?

Iterating the lines of a file is a very common operation, so you remember what it does.

Anyway, if you don't like File to be iterable, I'd like that attempt to give better error message, and byLine() to yield a newly allocated string (and another "fast"-annotated method that yields a slice of the same mutable buffer).

Bye,
bearophile
February 18, 2012
Re: [your code here]
> Assuming that by "any" you mean "any particular", you would 
> have to read all the lines first. Otherwise, if the code 
> selects the first line with probability 1/K, then I can just 
> input some other number of lines.

I'm not sure if I understood your post correctly, but it seems to 
me that you didn't read the code carefully. The code selects the 
last read line with probability 1/n, where n is the number of 
strings already read, not the number of all strings. Say you have 
have already read n strings and they are all selected with 
probability 1/n. Now you read a new string and select it with 
probability  1 / (n+1). The probability that you didn't select 
the new string is then n / (n + 1) and the first n strings are 
selected with probability 1 / n * n / (n + 1) = 1 / (n + 1), so 
you now have n+1 lines all selected with probability 1 / (n + 1). 
So if the property that input strings are all selected with equal 
probability holds for n, it must also hold for n+1. Since it 
obviously holds for n = 1 (we just select the only string with 
probability 1 in that case), it must hold for all n.
Top | Discussion index | About this forum | D home