January 31, 2013
On 01/31/2013 02:27 PM, bioinfornatics wrote:
> On Thursday, 31 January 2013 at 22:20:25 UTC, Ali Çehreli wrote:
>> This is not related to your actual problem but I have noticed that you
>> have side-effects in your FastqReader.front. I think you will benefit
>> from a design where front simply returns the front element and all of
>> the side-effects are inside popFront().
>>
>> Ali
>
> But since the fastq instance used to iterate and the fastq instance being
> called are not the same, any fastq method that depends on the position in
> the given range won't work. You need to return every value that could be used.

Apparently I didn't understand the code. :)

My comments should be generally correct: Calling front() multiple times should return the same element and it should not change the state of the range.
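
As a minimal sketch of that design (the type LetterRange and its newlines counter are hypothetical and not taken from the code under discussion), front only reads, and every state change lives in popFront:

```d
import std.range : isInputRange;

/// Hypothetical byte-wise reader; names are illustrative only.
struct LetterRange
{
    private const(ubyte)[] data;   // remaining input
    size_t newlines;               // example of state tracked while iterating

    this(const(ubyte)[] bytes) { data = bytes; }

    @property bool empty() const { return data.length == 0; }

    // No side effects: repeated calls return the same element.
    @property ubyte front() const { return data[0]; }

    // All state changes happen here, once per consumed element.
    void popFront()
    {
        if (data[0] == '\n')
            ++newlines;
        data = data[1 .. $];
    }
}

static assert(isInputRange!LetterRange);
```

With this layout the counter is updated when an element is consumed, so calling front any number of times before popFront leaves the range unchanged.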

Ali

February 01, 2013
My code works like you said: you can call front multiple times and it
will return the same thing. In front I check the returned letter to
decide whether to set the state and the section number. Maybe this
should move to popFront.

To explain: I iterate over a fastq file through a memory-mapped
file, letter by letter. For each letter I need to return the
letter itself and the type of line it belongs to. I do not
use \n or \r\n to identify a line, because the fastq format allows whitespace
and newlines inside sequence and quality lines. Each time I see a new
identifier line I increase a counter, so that at the end I can say there
are xxx sections in the file.
As it is a memory-mapped file I read ubyte values; for example, 64 is @. 64
could be a quality letter or the letter that marks the start of an
identifier.
So I need to count how many sequence letters are in the section to
know the number of quality letters, because the two are equal (whitespace should not be counted but skipped).
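
The counting rule described above could be sketched roughly as follows. This is only an illustration, not the actual reader: the names Section, isSpace and countRecords are made up, and it assumes well-formed input in which the '@' and '+' header lines end at '\n'.

```d
/// Which part of a record the scanner is currently in (hypothetical names).
enum Section { identifier, sequence, plus, quality }

/// Whitespace test on raw bytes; whitespace is skipped, never counted.
bool isSpace(ubyte b)
{
    return b == ' ' || b == '\t' || b == '\n' || b == '\r';
}

/// Counts records in FASTQ-like bytes (assumes well-formed input).
size_t countRecords(const(ubyte)[] data)
{
    size_t records;
    size_t seqLetters;    // letters seen in the current sequence section
    size_t qualLetters;   // letters seen in the current quality section
    auto state = Section.quality;   // with 0 == 0 the first '@' opens a record

    foreach (b; data)
    {
        final switch (state)
        {
        case Section.identifier:
            if (b == '\n') state = Section.sequence;
            break;
        case Section.sequence:
            if (b == '+') state = Section.plus;
            else if (!isSpace(b)) ++seqLetters;
            break;
        case Section.plus:
            if (b == '\n') state = Section.quality;
            break;
        case Section.quality:
            if (isSpace(b)) break;
            if (qualLetters < seqLetters)
                ++qualLetters;       // '@' (64) here is just a quality letter
            else if (b == '@')
            {
                // Quality is complete, so this '@' starts a new identifier.
                ++records;
                seqLetters = 0;
                qualLetters = 0;
                state = Section.identifier;
            }
            break;
        }
    }
    return records;
}
```

Because a quality section is only considered finished once it holds as many letters as the sequence, a 64 inside the quality data is never mistaken for the start of an identifier.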

As I use a memory-mapped file, I don't want to copy my struct just to be
able to loop, because I do not want to map the file twice; for a big
file that is a real performance problem. A memory-mapped file is
used to read a file quickly, so mapping it twice would be nonsense.

February 01, 2013
On Friday, 1 February 2013 at 09:09:10 UTC, bioinfornatics wrote:
> As I use a memory-mapped file, I don't want to copy my struct just to be
> able to loop, because I do not want to map the file twice; for a big
> file that is a real performance problem. A memory-mapped file is
> used to read a file quickly, so mapping it twice would be nonsense.

In a word, the problem is that your "FastqReader" is both a container and a range. Those are completely different notions that should not be confused.

The fastqreader should be a container, that holds your binary payload. You should be able to extract a range from the container, and iterate on the container, without modifying the container.

Currently, iterating on your fastqreader will mutate it. This is very bad. The only cases I know of where this happens are with pure input ranges, but in those cases, you virtually never have access to the underlying container (which usually doesn't exist anyways).

--------
The easy workaround is to start by renaming your "popFront" to "removeFront": popFront is an iteration primitive, removeFront is a container primitive. By changing this name, your fastqreader will cease to adhere to any range interface, protecting you from wrong usage.

Once there, you need to define a Range, or just have "opSlice()" and "opSlice(size_t, size_t)" return said range.

Ideally, I'd avoid defining an actual "Range" type, and simply return the string (as you are doing).

However, given you seem to be doing something with dna, and require random access, I'm not sure "string" is the best (string is unicode aware, so doesn't actually adhere to RA). I'd stick to just returning a "const(ubyte)[]".
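
A rough sketch of that split, assuming a hypothetical container named FastqData (not the original FastqReader) that hands out const(ubyte)[] slices:

```d
import std.range : isInputRange, isRandomAccessRange;

/// Hypothetical container: it holds the payload and is never
/// modified by iteration.
struct FastqData
{
    private const(ubyte)[] payload;

    this(const(ubyte)[] bytes) { payload = bytes; }

    /// The whole payload as a range: container[]
    const(ubyte)[] opSlice() const { return payload; }

    /// A sub-range: container[from .. to]
    const(ubyte)[] opSlice(size_t from, size_t to) const
    {
        return payload[from .. to];
    }
}

// The container itself deliberately offers no range primitives.
static assert(!isInputRange!FastqData);

// string is iterated as decoded dchar, so Phobos does not treat it as
// random access; a slice of bytes is.
static assert(!isRandomAccessRange!string);
static assert(isRandomAccessRange!(const(ubyte)[]));
```

Popping elements off a slice obtained with container[] only shrinks that slice, so the container can be sliced again later and still yields the full payload.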

February 01, 2013
On Friday, 1 February 2013 at 09:09:10 UTC, bioinfornatics wrote:
> As I use a memory-mapped file, I don't want to copy my struct just to be
> able to loop, because I do not want to map the file twice; for a big
> file that is a real performance problem. A memory-mapped file is
> used to read a file quickly, so mapping it twice would be nonsense.

Oh yeah: MmFile is a class, so copying it is free. The problem though is that the copy is shallow, so modifying a struct copy will (partially) modify the original struct.
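
To illustrate the shallow copy without mapping a real file, here is a small sketch in which a made-up class Buffer stands in for MmFile and Reader is a hypothetical struct holding it:

```d
/// Stand-in for a class such as MmFile: a reference type.
class Buffer
{
    ubyte[] bytes;
    this(ubyte[] b) { bytes = b; }
}

/// Hypothetical reader mixing a class reference with value state.
struct Reader
{
    Buffer buf;   // copied by reference: both copies share one object
    size_t pos;   // copied by value: each copy has its own position
}

void shallowCopyDemo()
{
    auto original = Reader(new Buffer([64, 65, 66]), 0);
    auto copy = original;          // cheap: no data is duplicated

    copy.pos = 2;                  // only the copy moves
    copy.buf.bytes[0] = 0;         // visible through the original too

    assert(original.pos == 0);
    assert(original.buf is copy.buf);
    assert(original.buf.bytes[0] == 0);
}
```

The value fields diverge while the class reference stays shared, which is the "partial" modification mentioned above.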