dataframe implementations (page 2)

On Wednesday, 18 November 2015 at 22:46:01 UTC, jmh530 wrote:
> My sense is that any data frame implementation should try to build on the work that's being done with n-dimensional slices.

I've been watching that development, but I don't have a feel for where it could be applied in this case, since it appears to be focused on multi-dimensional slices of the same data type, slicing up a single range.

The dataframes often consist of different data types by column.

How did you see the nd slices being used?

Maybe the nd slices could be applied if you considered each row to be the same structure, and slice by rows rather than operating on columns. Pandas supports a multi-dimension panel.
Maybe this would be the application for nd slices by row.

On Thursday, 19 November 2015 at 06:33:06 UTC, Jay Norwood wrote:
> On Wednesday, 18 November 2015 at 22:46:01 UTC, jmh530 wrote:
>> My sense is that any data frame implementation should try to build on the work that's being done with n-dimensional slices.
>
> I've been watching that development, but I don't have a feel for where it could be applied in this case, since it appears to be focused on multi-dimensional slices of the same data type, slicing up a single range.
>
> The dataframes often consist of different data types by column.
>
> How did you see the nd slices being used?
>
> Maybe the nd slices could be applied if you considered each row to be the same structure, and slice by rows rather than operating on columns. Pandas supports a multi-dimension panel.
> Maybe this would be the application for nd slices by row.

You might not build on the nd slice type itself, but implementing the same API (where possible/appropriate) would be good.

On Thursday, 19 November 2015 at 06:33:06 UTC, Jay Norwood wrote:
> On Wednesday, 18 November 2015 at 22:46:01 UTC, jmh530 wrote:
>> My sense is that any data frame implementation should try to build on the work that's being done with n-dimensional slices.
>
> I've been watching that development, but I don't have a feel for where it could be applied in this case, since it appears to be focused on multi-dimensional slices of the same data type, slicing up a single range.
>
> The dataframes often consist of different data types by column.
>
> How did you see the nd slices being used?
>
> Maybe the nd slices could be applied if you considered each row to be the same structure, and slice by rows rather than operating on columns. Pandas supports a multi-dimension panel.
> Maybe this would be the application for nd slices by row.

How about using an nd slice of Variant(s), or a more specialized Algebraic type?

[1]: http://dlang.org/phobos/std_variant

On Thursday, 19 November 2015 at 06:33:06 UTC, Jay Norwood wrote:
>
> Maybe the nd slices could be applied if you considered each row to be the same structure, and slice by rows rather than operating on columns. Pandas supports a multi-dimension panel.
> Maybe this would be the application for nd slices by row.

I meant in the sense that Pandas is built upon Numpy.

On Thursday, 19 November 2015 at 22:14:01 UTC, ZombineDev wrote:
> On Thursday, 19 November 2015 at 06:33:06 UTC, Jay Norwood wrote:
>> On Wednesday, 18 November 2015 at 22:46:01 UTC, jmh530 wrote:
>>> My sense is that any data frame implementation should try to build on the work that's being done with n-dimensional slices.
>>
>> I've been watching that development, but I don't have a feel for where it could be applied in this case, since it appears to be focused on multi-dimensional slices of the same data type, slicing up a single range.
>>
>> The dataframes often consist of different data types by column.
>>
>> How did you see the nd slices being used?
>>
>> Maybe the nd slices could be applied if you considered each row to be the same structure, and slice by rows rather than operating on columns. Pandas supports a multi-dimension panel.
>> Maybe this would be the application for nd slices by row.
>
> How about using an nd slice of Variant(s), or a more specialized Algebraic type?
>
> [1]: http://dlang.org/phobos/std_variant

Not sure it is a great idea to use a variant as the basic option when very often you will know that every cell in a particular column will be of the same type.
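
The trade-off between tagged cells and typed columns can be sketched in D roughly as follows. This is an illustration only, not code from the thread: the `Cell` alias and the `sumTagged` and `sumTyped` helpers are hypothetical names, and the only library facility assumed is `Algebraic` from `std.variant`.

```d
import std.variant : Algebraic;

// Hypothetical cell type for illustration: a cell holds exactly one of these.
alias Cell = Algebraic!(long, double, string);

// Sum a column of tagged cells; the stored type is checked per element at run time.
double sumTagged(Cell[] column)
{
    double total = 0;
    foreach (cell; column)
        total += cell.get!double; // throws VariantException if the cell is not a double
    return total;
}

// Sum a column whose element type is fixed at compile time.
double sumTyped(double[] column)
{
    double total = 0;
    foreach (x; column)
        total += x;
    return total;
}

void main()
{
    auto tagged = [Cell(1.5), Cell(2.5), Cell(4.0)];
    auto typed  = [1.5, 2.5, 4.0];
    assert(sumTagged(tagged) == sumTyped(typed));
}
```

Both functions compute the same sum, but the tagged version pays for a type check on every cell and can only report a wrong type at run time, whereas the typed column rules that out when the code is compiled.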

December 03, 2015

Re: dataframe implementations

Posted by Jay Norwood
in reply to Laeeth Isharc

On Saturday, 21 November 2015 at 14:16:26 UTC, Laeeth Isharc wrote:
>
> Not sure it is a great idea to use a variant as the basic option when very often you will know that every cell in a particular column will be of the same type.

I'm reading today about an n-dim extension to pandas named xray. Maybe I should try to understand how that fits. They support I/O from netCDF, and are adding extensions to support blocked input using dask, so they can process data larger than in-memory limits.

http://xray.readthedocs.org/en/stable/data-structures.html
https://www.continuum.io/content/xray-dask-out-core-labeled-arrays-python

In general, pandas and xray have to support pulling in data from storage where the column names, index names, and data types are initially unknown. Julia adds JIT compilation and operations specialized for different data types.

It seems to me that D's strength would be in a quick compile, which would then allow you to replace the dictionary tag implementations and variants with something that uses compile-time symbol names and data types. That seems like it would provide more efficient processing, as well as better tab completion support when creating expressions.
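
One way to picture compile-time column names and types is the minimal sketch below. It is not an existing library or anything proposed in the thread: the `Trade` row layout and the `Frame` template are hypothetical, and it assumes a compiler that supports `static foreach`, which is newer than this discussion.

```d
import std.algorithm : sum;
import std.traits : FieldNameTuple;

// Hypothetical row layout: its field names and types act as the column schema.
struct Trade
{
    string symbol;
    double price;
    long volume;
}

// Hypothetical frame type: one typed array per field of Row, generated at compile time.
struct Frame(Row)
{
    // Declares e.g. `double[] price;` for each field of Row.
    static foreach (name; FieldNameTuple!Row)
    {
        mixin("typeof(Row." ~ name ~ ")[] " ~ name ~ ";");
    }

    // Appends one row by pushing each field onto its column.
    void append(Row r)
    {
        static foreach (name; FieldNameTuple!Row)
        {
            mixin(name ~ " ~= r." ~ name ~ ";");
        }
    }
}

void main()
{
    Frame!Trade df;
    df.append(Trade("ABC", 10.0, 100));
    df.append(Trade("XYZ", 20.0, 300));

    // Column access is a plain member access, checked by the compiler.
    static assert(is(typeof(df.price) == double[]));
    assert(df.price.sum / df.price.length == 15.0);

    // A misspelled column such as df.prise fails to compile instead of
    // failing a key lookup at run time.
    static assert(!__traits(compiles, df.prise));
}
```

Because each column is an ordinary typed member, there is no per-cell tag to check, and an editor that understands the struct can offer the column names for completion. The cost is that the schema must be known when the program is compiled, which is the opposite of the load-then-discover workflow described for pandas and xray.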
