Pandas like features - D Programming Language Discussion Forum

October 23, 2020

Pandas like features

Posted by bioinfornatics

As a researcher in bioinformatics I use Python, NumPy, pandas and SciPy a lot. But I am fed up with the slowness of Python, even with cpython code, thanks to the GIL and un-optimized tail recursion.
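To make the tail-recursion point concrete, here is a minimal sketch. CPython does not eliminate tail calls, so a function that is tail-recursive in form still consumes one stack frame per call and fails once the interpreter's recursion limit is reached. The function names are made up for illustration.

```python
import sys


def sum_to_recursive(n, acc=0):
    """Tail-recursive in form, but CPython still pushes one frame per call."""
    if n == 0:
        return acc
    return sum_to_recursive(n - 1, acc + n)


def sum_to_iterative(n):
    """Same computation written as a loop: constant stack depth."""
    acc = 0
    while n > 0:
        acc += n
        n -= 1
    return acc


# Pick a depth well beyond whatever recursion limit is currently configured.
depth = sys.getrecursionlimit() * 10

try:
    sum_to_recursive(depth)
    recursion_failed = False
except RecursionError:
    recursion_failed = True

print("recursive version failed:", recursion_failed)
print("iterative result:", sum_to_iterative(depth))
```

The usual workaround in Python is to rewrite the recursion as a loop by hand, which is exactly the transformation a tail-call-optimizing compiler would do automatically.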

So I really think that D could play a big role in this field with Mir and dcompute.

1/ What is the state of Magpie, which was a GSoC 2019 project:
 - Mir Data Analysis and Processing Library

2/ Is scientific computing a field that the D language wants to grow in?

Thanks

Best regards

October 23, 2020

Re: Pandas like features

Posted by Imperatorn
in reply to bioinfornatics

On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics wrote:
> As a researcher in bioinformatics I use Python, NumPy, pandas and SciPy a lot. But I am fed up with the slowness of Python, even with cpython code, thanks to the GIL and un-optimized tail recursion.
>
> So I really think that D could play a big role in this field with Mir and dcompute.
>
> 1/ What is the state of Magpie, which was a GSoC 2019 project:
>  - Mir Data Analysis and Processing Library
>
> 2/ Is scientific computing a field that the D language wants to grow in?
>
> Thanks
>
> Best regards

2. Yes!

October 23, 2020

Re: Pandas like features

Posted by mw
in reply to bioinfornatics

On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics wrote:
> As a researcher in bioinformatics I use Python, NumPy, pandas and SciPy a lot. But I am fed up with the slowness of Python, even with cpython code, thanks to the GIL and un-optimized tail recursion.
>
> So I really think that D could play a big role in this field with Mir and dcompute.
>
> 1/ What is the state of Magpie, which was a GSoC 2019 project:
>  - Mir Data Analysis and Processing Library
>
> 2/ Is scientific computing a field that the D language wants to grow in?
>

I think it's definitely the biggest area and opportunity for D to become more popular. The GIL, the lack of performance, and the huge memory bloat are such a pain in Python.

Probably the best way to move forward is to provide libmir as a NumPy/pandas *drop-in* replacement. (And I've suggested renaming Mir to NumD from a marketing / promotional perspective.)

For the time being, from the language/lib user's perspective, we can just use D/libmir to pre-process the data, and maybe save the result as csv/npz for further processing (by ... Python). Building or wrapping something like TensorFlow will, I think, need many more resources than the D community currently has, and I'm not sure it is worth the effort.

And from the language perspective, maybe D should adopt Python/Numpy's array indexing syntax, specifically:

1) use Python's arr[start:end], in addition to D's arr[start..end]

2) and also allow negative indices, instead of [$-1]. (This $ is an improvement over Java/C++'s arr[arr.length - 1], but it is still less convenient than Python’s negative index syntax.)
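For readers less familiar with the Python side, a short sketch of the two features being proposed. The D spellings in the comments are the existing D equivalents mentioned above (`..` slices and `$`), and the variable names are only illustrative.

```python
arr = [10, 20, 30, 40, 50]

# Half-open slice, written arr[0 .. 3] in D.
first_three = arr[0:3]

# Negative index counts from the end, written arr[$ - 1] in D.
last = arr[-1]

# Negative bounds also work in slices, written arr[$ - 2 .. $] in D.
last_two = arr[-2:]

# The explicit-length form that both $ and negative indices replace.
same_last = arr[len(arr) - 1]

print(first_three, last, last_two, same_last)
```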

That Python gained such popularity in scientific computing over the past ~10 years is not an accident; Guido actually made that happen by extending Python's syntax:

https://en.wikipedia.org/wiki/NumPy#History

"""
The Python programming language was not originally designed for numerical computing, but attracted the attention of the scientific and engineering community early on. In 1995 the special interest group (SIG) matrix-sig was founded with the aim of defining an array computing package; among its members was Python designer and maintainer Guido van Rossum, who extended Python's syntax (in particular the indexing syntax) to make array computing easier.[6]
"""

Maybe Walter should join one such SIG as well :-)

October 23, 2020

Re: Pandas like features

Posted by mw
in reply to mw

On Friday, 23 October 2020 at 22:38:39 UTC, mw wrote:
> And from the language perspective, maybe D should adopt Python/Numpy's array indexing syntax, specifically:
>
> 1) use Python's arr[start:end], in addition to D's arr[start..end]
>
> 2) and also allow negative indices, instead of [$-1]. (This $ is an improvement over Java/C++'s arr[arr.length - 1], but it is still less convenient than Python’s negative index syntax.)
>
> That Python gained such popularity in scientific computing over the past ~10 years is not an accident; Guido actually made that happen by extending Python's syntax:
>
> https://en.wikipedia.org/wiki/NumPy#History
>
> """
> The Python programming language was not originally designed for numerical computing, but attracted the attention of the scientific and engineering community early on. In 1995 the special interest group (SIG) matrix-sig was founded with the aim of defining an array computing package; among its members was Python designer and maintainer Guido van Rossum, who extended Python's syntax (in particular the indexing syntax) to make array computing easier.[6]
> """
>
> Maybe Walter should join one such SIG as well :-)

Let me further quote from [6]

"""
During these early years, there was considerable interaction between the standard and scientific Python communities. In fact, Guido van Rossum, Python's Benevolent Dictator For Life (BDFL), was an active member of the matrix-sig. This close interaction resulted in Python gaining new features and syntax specifically needed by the scientific Python community. While there were miscellaneous changes, such as the addition of complex numbers, many changes focused on providing a more succinct and easier to read syntax for array manipulation. For instance, the parenthesis around tuples were made optional so that array elements could be accessed through, for example, a[0,1] instead of a[(0,1)]. The slice syntax gained a step argument— a[::2] instead of just a[:], for example—and an ellipsis operator, which is useful when dealing with multidimensional data structures.
"""

[6] https://www.computer.org/csdl/magazine/cs/2011/02/mcs2011020009/13rRUx0xPMx
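The syntax additions listed in that quote can be observed from plain Python, without NumPy. Below is a minimal sketch using a probe class (`Probe` is a hypothetical name used only for illustration) whose `__getitem__` simply returns the key it receives, so you can see what the interpreter passes for each indexing form.

```python
class Probe:
    """Returns the raw key so the indexing syntax can be inspected."""

    def __getitem__(self, key):
        return key


p = Probe()

# Parentheses around the tuple are optional: both forms pass the same key.
assert p[0, 1] == p[(0, 1)] == (0, 1)

# The slice syntax with a step becomes a slice object with three fields.
assert p[::2] == slice(None, None, 2)

# The ellipsis operator arrives as the built-in Ellipsis singleton.
assert p[..., 0] == (Ellipsis, 0)

# The forms combine freely, which is what array libraries build on.
print(p[1:5:2, ..., -1])
```

An array library only has to interpret these key objects in its own `__getitem__`; the language provides the notation.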

October 23, 2020

Re: Pandas like features

Posted by bachmeier
in reply to bioinfornatics

On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics wrote:
> As a researcher in bioinformatics I use Python, NumPy, pandas and SciPy a lot. But I am fed up with the slowness of Python, even with cpython code, thanks to the GIL and un-optimized tail recursion.
>
> So I really think that D could play a big role in this field with Mir and dcompute.
>
> 1/ What is the state of Magpie, which was a GSoC 2019 project:
>  - Mir Data Analysis and Processing Library
>
> 2/ Is scientific computing a field that the D language wants to grow in?
>
> Thanks
>
> Best regards

There is some activity in this space:
https://code.dlang.org/?sort=updated&category=library.scientific

This project doesn't seem too active, but it was an earlier attempt:
http://dlangscience.github.io/

October 23, 2020

Re: Pandas like features

Posted by mw
in reply to mw

On Friday, 23 October 2020 at 22:38:39 UTC, mw wrote:
> 1) use Python's arr[start:end], in addition to D's arr[start..end]

BTW, Python has arr[start:end:step]; is this `step` possible in D now, and if so, how?
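For reference, the Python behaviour in question can be spelled out with explicit index arithmetic, as in the sketch below (`strided` is a made-up helper name). On the D side, the closest existing counterpart I am aware of is `stride` from `std.range` applied to an ordinary `arr[start .. end]` slice; treat that as a pointer rather than a full answer.

```python
def strided(seq, start, stop, step):
    """Compute seq[start:stop:step] by hand.

    slice.indices() resolves None and negative bounds against the
    sequence length, giving the concrete (start, stop, step) to iterate.
    """
    indices = range(*slice(start, stop, step).indices(len(seq)))
    return [seq[i] for i in indices]


data = list(range(10))

# Every third element from index 1 up to (not including) index 8.
print(strided(data, 1, 8, 3), data[1:8:3])

# Omitted bounds and a negative step reverse the sequence.
print(strided(data, None, None, -1) == data[::-1])
```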

October 23, 2020

Re: Pandas like features

Posted by bioinfornatics
in reply to bachmeier

On Friday, 23 October 2020 at 22:48:16 UTC, bachmeier wrote:
> On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics wrote:
>> As a researcher in bioinformatics I use Python, NumPy, pandas and SciPy a lot. But I am fed up with the slowness of Python, even with cpython code, thanks to the GIL and un-optimized tail recursion.
>>
>> So I really think that D could play a big role in this field with Mir and dcompute.
>>
>> 1/ What is the state of Magpie, which was a GSoC 2019 project:
>>  - Mir Data Analysis and Processing Library
>>
>> 2/ Is scientific computing a field that the D language wants to grow in?
>>
>> Thanks
>>
>> Best regards
>
> There is some activity in this space:
> https://code.dlang.org/?sort=updated&category=library.scientific
>
> This project doesn't seem too active, but it was an earlier attempt:
> http://dlangscience.github.io/

To me, a scientific library needs to be HPC-oriented, able
- to perform parallel computation on CPU or GPU
- to use a divide-and-conquer strategy in order to compute across multiple nodes
- to have dataframe features
- to have SciPy features
Such a library would be awesome, as Python's slowness matters more and more these days, with data growing exponentially year after year.
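As a rough illustration of the divide-and-conquer item in that list, here is a single-machine sketch in Python: split the data into chunks, compute a partial result per chunk in a worker pool, then combine the partials. All function names are hypothetical. Note the GIL caveat from earlier in the thread: with a thread pool, pure-Python CPU-bound work like this does not actually run in parallel in CPython, so a real speedup needs a process pool (`concurrent.futures.ProcessPoolExecutor`) or native code, and multi-node execution needs more machinery than shown here.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n_chunks):
    """Divide: cut data into at most n_chunks contiguous pieces."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division, never 0
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_of_squares(chunk):
    """Conquer: the per-chunk work, independent of every other chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, n_chunks=4):
    """Combine: add up the partial results from all workers."""
    chunks = split(data, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)


data = list(range(1000))
print(parallel_sum_of_squares(data) == sum(x * x for x in data))
```

The structure (independent chunks plus an associative combine step) is the part that carries over to multi-core and multi-node settings; the executor is the piece a faster runtime would replace.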

October 23, 2020

Re: Pandas like features

Posted by mw
in reply to mw

On Friday, 23 October 2020 at 22:53:29 UTC, mw wrote:
> On Friday, 23 October 2020 at 22:38:39 UTC, mw wrote:
>> 1) use Python's arr[start:end], in addition to D's arr[start..end]
>
> BTW, Python has arr[start:end:step]; is this `step` possible in D now, and if so, how?


(Today I'm in the mood of a language historian :-)

Some of Guido's early discussions of Python array indexing:

Slices
https://mail.python.org/pipermail/matrix-sig/1996-April/000553.html

Pseudo Indices
https://mail.python.org/pipermail/matrix-sig/1996-January/000331.html

Mutli-dimensional indexing and other comments
https://mail.python.org/pipermail/matrix-sig/1995-October/000077.html

A problem with slicing
https://mail.python.org/pipermail/matrix-sig/1995-September/000042.html

October 24, 2020

Re: Pandas like features

Posted by Russel Winder
in reply to bioinfornatics

On Fri, 2020-10-23 at 23:00 +0000, bioinfornatics via Digitalmars-d wrote: […]
> To me, a scientific library needs to be HPC-oriented, able
> - to perform parallel computation on CPU or GPU
> - to use a divide-and-conquer strategy in order to compute across
> multiple nodes
> - to have dataframe features
> - to have SciPy features
> Such a library would be awesome, as Python's slowness matters
> more and more these days, with data growing exponentially year
> after year.

Acting somewhat as "Devil's Advocate"…

Why not just use Chapel (https://chapel-lang.org/)? It is a programming language designed to run in parallel contexts and has an awful lot of the stuff that other (invariably sequential, cf. C++, D, Rust) programming languages have trouble providing.

I am not sure Chapel has pandas-style data frames explicitly, but I'll bet something equivalent is already in there.

-- 
Russel.

October 24, 2020

Re: Pandas like features

Posted by bioinfornatics
in reply to Russel Winder

On Saturday, 24 October 2020 at 09:29:46 UTC, Russel Winder wrote:
> On Fri, 2020-10-23 at 23:00 +0000, bioinfornatics via Digitalmars-d wrote: […]
>> To me, a scientific library needs to be HPC-oriented, able
>> - to perform parallel computation on CPU or GPU
>> - to use a divide-and-conquer strategy in order to compute across
>> multiple nodes
>> - to have dataframe features
>> - to have SciPy features
>> Such a library would be awesome, as Python's slowness matters
>> more and more these days, with data growing exponentially year
>> after year.
>
> Acting somewhat as "Devil's Advocate"…
>
> Why not just use Chapel (https://chapel-lang.org/)? It is a programming language designed to run in parallel contexts and has an awful lot of the stuff that other (invariably sequential, cf. C++, D, Rust) programming languages have trouble providing.
>
> I am not sure Chapel has pandas-style data frames explicitly, but I'll bet something equivalent is already in there.

Maybe. Anyway, D has been searching for its killer app for years. I really think this area is perfect for D.
Data and business analysis is so important these days in science, the economy and other fields; D could be a good choice.

Copyright © 1999-2021 by the D Language Foundation