about std.csv and derived format

Feb 29, 2012

bioinfornatics


Dear,

I would like to parse this file: http://genome.ucsc.edu/goldenPath/help/ItemRGBDemo.txt

struct Bed{
    string    chrom;        // 0
    size_t    chromStart;   // 1
    size_t    chromEnd;     // 2
    string    name;         // 3
    size_t    score;        // 4
    char      strand;       // 5
    size_t    thickStart;   // 6
    size_t    thickEnd;     // 7
    size_t[3] itemRgb;      // 8
    size_t    blockCount;   // 9
    size_t    blockSizes;   // 10
    size_t    blockStarts;  // 11
}

In addition, fields 3 to 11 are optional. So a line can contain:
* fields 0 - 3
* fields 0 - 4
* fields 0 - 5
... up to 0 - 11

On Wednesday, 29 February 2012 at 12:42 +0100, bioinfornatics wrote:
> Dear,
>
> I would like to parse this file: http://genome.ucsc.edu/goldenPath/help/ItemRGBDemo.txt
>
> struct Bed{
>     string    chrom;        // 0
>     size_t    chromStart;   // 1
>     size_t    chromEnd;     // 2
>     string    name;         // 3
>     size_t    score;        // 4
>     char      strand;       // 5
>     size_t    thickStart;   // 6
>     size_t    thickEnd;     // 7
>     size_t[3] itemRgb;      // 8
>     size_t    blockCount;   // 9
>     size_t    blockSizes;   // 10
>     size_t    blockStarts;  // 11
> }
>
> In addition, fields 3 to 11 are optional. So a line can contain:
> * fields 0 - 3
> * fields 0 - 4
> * fields 0 - 5
> ... up to 0 - 11
>

Lines 0 to 2 of ItemRGBDemo.txt are metadata, so they have to be parsed by hand:

browser position chr7:127471196-127495720
browser hide all
track name="ItemRGBDemo" description="Item RGB demonstration" visibility=2 itemRgb="On"

My problem is:
- I need to parse the data in CSV format
- how do I handle the optional fields?
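Editor's sketch: one possible way to parse the track line by hand is to collect its key=value pairs with std.regex. The function name parseTrackLine and the choice of an associative array are assumptions made for illustration; they do not come from the thread or from the gists linked in it.

```d
import std.regex : matchAll, regex;
import std.stdio : writeln;

// Hypothetical helper: collects the key=value attributes of a "track" line.
// Values may be bare (visibility=2) or double-quoted (name="ItemRGBDemo").
string[string] parseTrackLine(string line)
{
    string[string] attrs;
    auto pattern = regex(`(\w+)=("[^"]*"|\S+)`);
    foreach (m; matchAll(line, pattern))
    {
        string value = m[2];
        // Strip the surrounding double quotes, if any.
        if (value.length >= 2 && value[0] == '"' && value[$ - 1] == '"')
            value = value[1 .. $ - 1];
        attrs[m[1]] = value;
    }
    return attrs;
}

void main()
{
    auto attrs = parseTrackLine(
        `track name="ItemRGBDemo" description="Item RGB demonstration" visibility=2 itemRgb="On"`);
    writeln(attrs);
}
```

The two browser lines are not key=value pairs, so they would need separate handling, for example by splitting on whitespace.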

On Wednesday, 29 February 2012 at 11:51:29 UTC, bioinfornatics wrote:
> On Wednesday, 29 February 2012 at 12:42 +0100, bioinfornatics wrote:
>> Dear,
>>
>> I would like to parse this file:
>> http://genome.ucsc.edu/goldenPath/help/ItemRGBDemo.txt
>
> My problem is:
> - I need to parse the data in CSV format
> - how do I handle the optional fields?

It looks like the data is tab delimited, so the separator is a tab. There are no optional fields in CSV, but you can disable exceptions.

auto records = csvReader!(Bed,Malformed.ignore)(str,'\t');
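Editor's sketch of the suggestion above, assuming that with Malformed.ignore a row with fewer columns than the struct simply leaves the remaining members at their default values. The struct BedRow, the helper parseRows, and the two sample rows are made up for illustration and cover fields 0 to 4 only.

```d
import std.array : array;
import std.csv : csvReader, Malformed;
import std.stdio : writeln;

// Hypothetical layout for BED fields 0 to 4.
struct BedRow
{
    string chrom;       // 0
    size_t chromStart;  // 1
    size_t chromEnd;    // 2
    string name;        // 3
    size_t score;       // 4 (optional in the file)
}

// Parses tab-separated rows; Malformed.ignore suppresses CSV format errors.
BedRow[] parseRows(string data)
{
    return csvReader!(BedRow, Malformed.ignore)(data, '\t').array;
}

void main()
{
    // The second row stops after field 3, so its score keeps size_t.init.
    string data = "chr7\t127471196\t127472363\tPos1\t0\n"
                ~ "chr7\t127472363\t127473530\tPos2";
    foreach (row; parseRows(data))
        writeln(row.chrom, " ", row.chromStart, "-", row.chromEnd,
                " ", row.name, " score=", row.score);
}
```

A default value cannot be told apart from a real 0 in the file, so code that needs to know how many fields a line had would have to count the columns itself.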

On Wednesday, 29 February 2012 at 13:23 +0100, Jesse Phillips wrote:
> On Wednesday, 29 February 2012 at 11:51:29 UTC, bioinfornatics wrote:
> > On Wednesday, 29 February 2012 at 12:42 +0100, bioinfornatics wrote:
> >> Dear,
> >>
> >> I would like to parse this file: http://genome.ucsc.edu/goldenPath/help/ItemRGBDemo.txt
> >
> > My problem is:
> > - I need to parse the data in CSV format
> > - how do I handle the optional fields?
>
> It looks like the data is tab delimited, so the separator is a tab. There are no optional fields in CSV, but you can disable exceptions.
>
> auto records = csvReader!(Bed,Malformed.ignore)(str,'\t');

Thanks, Jesse.

How can I convert the input range's element type to Bed?
csvReader returns a type that changes with its template arguments, so if I use a template function the type is never the same and I can't hard-code a copy into the Bed type.
For example, if I use BedData3 or BedData4:

-------------------------
struct BedData3{
    string    chrom;        // 0
    size_t    chromStart;   // 1
    size_t    chromEnd;     // 2
    string    name;         // 3
}

struct BedData4{
    string    chrom;        // 0
    size_t    chromStart;   // 1
    size_t    chromEnd;     // 2
    string    name;         // 3
    size_t    score;        // 4
}
------------------------

I tried to work with ReturnType, but I failed.

Paste: https://gist.github.com/1946288

At line 294, bedReader takes one of BedData3 to BedData11.
Then at line 338, how do I get an array of records and store that array in the struct Bed (line 192)?

Thanks a lot

March 01, 2012

Re: about std.csv and derived format

Posted by bioinfornatics
in reply to bioinfornatics


On Thursday, 01 March 2012 at 01:52 +0100, bioinfornatics wrote:
> On Wednesday, 29 February 2012 at 13:23 +0100, Jesse Phillips wrote:
> > On Wednesday, 29 February 2012 at 11:51:29 UTC, bioinfornatics wrote:
> > > On Wednesday, 29 February 2012 at 12:42 +0100, bioinfornatics wrote:
> > >> Dear,
> > >> 
> > >> I would like to parse this file: http://genome.ucsc.edu/goldenPath/help/ItemRGBDemo.txt
> > 
> > > My problem is:
> > > - I need to parse the data in CSV format
> > > - how do I handle the optional fields?
> > 
> > It looks like the data is tab delimited, so the separator is a tab. There are no optional fields in CSV, but you can disable exceptions.
> > 
> > auto records = csvReader!(Bed,Malformed.ignore)(str,'\t');
> 
> Thanks, Jesse.
> 
> How can I convert the input range's element type to Bed?
> csvReader returns a type that changes with its template arguments, so
> if I use a template function the type is never the same and I can't
> hard-code a copy into the Bed type.
> For example, if I use BedData3 or BedData4:
> 
> -------------------------
> struct BedData3{
>     string    chrom;        // 0
>     size_t    chromStart;   // 1
>     size_t    chromEnd;     // 2
>     string    name;         // 3
> }
> 
> struct BedData4{
>     string    chrom;        // 0
>     size_t    chromStart;   // 1
>     size_t    chromEnd;     // 2
>     string    name;         // 3
>     size_t    score;        // 4
> }
> ------------------------
> 
> I tried to work with ReturnType, but I failed.
> 
> Paste: https://gist.github.com/1946288
> 
> At line 294, bedReader takes one of BedData3 to BedData11.
> Then at line 338, how do I get an array of records and store that
> array in the struct Bed (line 192)?
> 
> 
> Thanks a lot
> 

It is OK, I have found a way. It may not be efficient, but it works: https://gist.github.com/1946669

A minor bug remains in parsing the track line; it will be fixed tomorrow. Time for bed.


Big thanks to all

On Thursday, 1 March 2012 at 02:07:44 UTC, bioinfornatics wrote:
> It is OK, I have found a way. It may not be efficient, but it works:
> https://gist.github.com/1946669
>
> A minor bug remains in parsing the track line; it will be fixed tomorrow. Time for bed.
>
>
> Big thanks to all

You can edit a gist instead of creating a new one.

This seems like a very fragile implementation, and hard to follow. My quick untested code:

import std.array, std.csv, std.file, std.range, std.string;

auto str = readText(filePath);

// Ignoring first three lines.
str = str.splitLines().drop(3).join("\n");

auto bedInstances = csvReader!(BedData11,Malformed.ignore)(str,'\t');

But if you must keep the separate structs, I don't have any better suggestions.
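Editor's sketch of the header-skipping idea as a self-contained program. It works on an in-memory string rather than a file, and the function parseBedText, its headerLines parameter, and the sample text are assumptions made for illustration. The struct is the BedData3 layout quoted earlier in the thread.

```d
import std.array : array, join;
import std.csv : csvReader, Malformed;
import std.range : drop;
import std.stdio : writeln;
import std.string : splitLines;

struct BedData3
{
    string chrom;       // 0
    size_t chromStart;  // 1
    size_t chromEnd;    // 2
    string name;        // 3
}

// Hypothetical helper: drops the leading metadata lines, then hands the
// remaining tab-separated text to csvReader.
BedData3[] parseBedText(string text, size_t headerLines = 3)
{
    string data = text.splitLines().drop(headerLines).join("\n");
    return csvReader!(BedData3, Malformed.ignore)(data, '\t').array;
}

void main()
{
    // Made-up sample in the style of ItemRGBDemo.txt.
    string text = "browser position chr7:127471196-127495720\n"
                ~ "browser hide all\n"
                ~ "track name=\"ItemRGBDemo\"\n"
                ~ "chr7\t127471196\t127472363\tPos1\n"
                ~ "chr7\t127472363\t127473530\tPos2";
    foreach (rec; parseBedText(text))
        writeln(rec.name, ": ", rec.chromStart, "-", rec.chromEnd);
}
```

Skipping a fixed number of lines assumes the file always has exactly three metadata lines; filtering on lines that start with "browser" or "track" might be more robust, but that is a design choice the thread does not settle.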

On Thursday, 01 March 2012 at 04:36 +0100, Jesse Phillips wrote:
> On Thursday, 1 March 2012 at 02:07:44 UTC, bioinfornatics wrote:
>
> > It is OK, I have found a way. It may not be efficient, but it works:
> > https://gist.github.com/1946669
> >
> > A minor bug remains in parsing the track line; it will be fixed tomorrow. Time for bed.
> >
> >
> > Big thanks to all
>
> You can edit a gist instead of creating a new one.
>
> This seems like a very fragile implementation, and hard to follow. My quick untested code:
>
> import std.array, std.csv, std.file, std.range, std.string;
>
> auto str = readText(filePath);
>
> // Ignoring first three lines.
> str = str.splitLines().drop(3).join("\n");
>
> auto bedInstances = csvReader!(BedData11,Malformed.ignore)(str,'\t');
>
> But if you must keep the separate structs, I don't have any better suggestions.

And how do I convert the bedInstances input range to BedData11[]?
Add a constructor to BedData11 and use std.algorithm.map?
map!"BedData11(a.field1, a.field2...)"(bedInstances);
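Editor's sketch addressing the question above. When csvReader is instantiated with a struct type, each element of the returned range is already that struct, so std.array.array should be enough to obtain an array. std.algorithm.map only becomes necessary when copying into a different type. The Bed struct with a constructor and the helper names below are hypothetical, and BedData4 stands in for BedData11 to keep the example short.

```d
import std.algorithm : map;
import std.array : array;
import std.csv : csvReader, Malformed;
import std.stdio : writeln;

struct BedData4
{
    string chrom;       // 0
    size_t chromStart;  // 1
    size_t chromEnd;    // 2
    string name;        // 3
    size_t score;       // 4
}

// Hypothetical target type with a converting constructor.
struct Bed
{
    string chrom;
    size_t chromStart;
    size_t chromEnd;
    string name;
    size_t score;

    this(BedData4 r)
    {
        chrom = r.chrom;
        chromStart = r.chromStart;
        chromEnd = r.chromEnd;
        name = r.name;
        score = r.score;
    }
}

// The range elements are BedData4 values, so .array yields BedData4[].
BedData4[] readRecords(string data)
{
    return csvReader!(BedData4, Malformed.ignore)(data, '\t').array;
}

// Copy each record into the other type through its constructor.
Bed[] toBeds(BedData4[] records)
{
    return records.map!(r => Bed(r)).array;
}

void main()
{
    string data = "chr7\t127471196\t127472363\tPos1\t0\n"
                ~ "chr7\t127472363\t127473530\tPos2\t0";
    writeln(toBeds(readRecords(data)));
}
```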
