October 23, 2020 Pandas like features | ||||
---|---|---|---|---|
| ||||
As a researcher in bioinformatics, I use Python, NumPy, pandas and SciPy a lot. But I am annoyed by the slowness of Python, even with cpython code, thanks to the GIL and un-optimized tail recursion.

So I really think that D could play a big role in this field with Mir and dcompute.

1/ What is the state of Magpie, which was a GSoC 2019 project:
- Mir Data Analysis and Processing Library

2/ Is scientific computing a field in which the D language wants to grow?

Thanks

Best regards
|
October 23, 2020 Re: Pandas like features | ||||
---|---|---|---|---|
| ||||
Posted in reply to bioinfornatics | On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics wrote:
> As a researcher in bioinformatics, I use Python, NumPy, pandas and SciPy a lot. But I am annoyed by the slowness of Python, even with cpython code, thanks to the GIL and un-optimized tail recursion.
>
> So I really think that D could play a big role in this field with Mir and dcompute.
>
> 1/ What is the state of Magpie, which was a GSoC 2019 project:
> - Mir Data Analysis and Processing Library
>
> 2/ Is scientific computing a field in which the D language wants to grow?
>
> Thanks
>
> Best regards
2. Yes!
|
October 23, 2020 Re: Pandas like features | ||||
---|---|---|---|---|
| ||||
Posted in reply to bioinfornatics | On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics wrote:
> As a researcher in bioinformatics, I use Python, NumPy, pandas and SciPy a lot. But I am annoyed by the slowness of Python, even with cpython code, thanks to the GIL and un-optimized tail recursion.
>
> So I really think that D could play a big role in this field with Mir and dcompute.
>
> 1/ What is the state of Magpie, which was a GSoC 2019 project:
> - Mir Data Analysis and Processing Library
>
> 2/ Is scientific computing a field in which the D language wants to grow?
>
I think it's definitely the biggest area of opportunity for D to become more popular. The GIL, lack of performance, and huge memory bloat are such a pain in Python.

Probably the best way to move forward is to provide libmir as a NumPy/Pandas *drop-in* replacement. (And I've suggested renaming Mir to NumD from a marketing / promotional perspective.)

For the time being, from the language/library user's perspective, we can just use D/libmir to pre-process the data, and maybe save the result as csv/npz for further processing (by ... Python).

Building or wrapping something like TensorFlow would, I think, need far more resources than the D community currently has; also, I'm not sure it is worth the effort.

And from the language perspective, maybe D should adopt Python/NumPy's array indexing syntax, specifically:

1) use Python's arr[start:end], in addition to D's arr[start..end]

2) and also allow negative indices, instead of [$-1]. (This $ is an improvement on Java/C++'s arr[arr.length - 1], but is still less convenient than Python's negative index syntax.)

That Python gained such popularity in scientific computing over the past ~10 years is no accident; Guido actually made that happen by extending Python's syntax:

https://en.wikipedia.org/wiki/NumPy#History

"""
The Python programming language was not originally designed for numerical computing, but attracted the attention of the scientific and engineering community early on. In 1995 the special interest group (SIG) matrix-sig was founded with the aim of defining an array computing package; among its members was Python designer and maintainer Guido van Rossum, who extended Python's syntax (in particular the indexing syntax) to make array computing easier.[6]
"""

Maybe Walter should join one of these SIGs as well :-)
|
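For readers less familiar with the Python side of this comparison, here is a minimal sketch of the two indexing features mentioned above: half-open `arr[start:end]` slices and negative indices. It uses a plain list (the variable names are purely illustrative); NumPy arrays follow the same basic rules. The comments note the D spellings given in the post.

```python
# Plain Python list; the names here are illustrative only.
arr = [10, 20, 30, 40, 50]

# Half-open slice: index 1 is included, index 4 is excluded
# (the post's D counterpart is arr[1 .. 4]).
middle = arr[1:4]

# A negative index counts from the end; D spells this arr[$ - 1].
last = arr[-1]

# The same element, written the Java/C++ way with an explicit length.
last_explicit = arr[len(arr) - 1]

# Negative bounds also work inside a slice: drop the first and last element.
without_ends = arr[1:-1]

print(middle, last, last_explicit, without_ends)
```

The negative form saves a length lookup at the call site, which is the convenience the post argues for.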
October 23, 2020 Re: Pandas like features | ||||
---|---|---|---|---|
| ||||
Posted in reply to mw | On Friday, 23 October 2020 at 22:38:39 UTC, mw wrote:
> And from the language perspective, maybe D should adopt Python/NumPy's array indexing syntax, specifically:
>
> 1) use Python's arr[start:end], in addition to D's arr[start..end]
>
> 2) and also allow negative indices, instead of [$-1]. (This $ is an improvement on Java/C++'s arr[arr.length - 1], but is still less convenient than Python's negative index syntax.)
>
> That Python gained such popularity in scientific computing over the past ~10 years is no accident; Guido actually made that happen by extending Python's syntax:
>
> https://en.wikipedia.org/wiki/NumPy#History
>
> """
> The Python programming language was not originally designed for numerical computing, but attracted the attention of the scientific and engineering community early on. In 1995 the special interest group (SIG) matrix-sig was founded with the aim of defining an array computing package; among its members was Python designer and maintainer Guido van Rossum, who extended Python's syntax (in particular the indexing syntax) to make array computing easier.[6]
> """
>
> Maybe Walter should join one of these SIGs as well :-)
Let me further quote from [6]

"""
During these early years, there was considerable interaction between the standard and scientific Python communities. In fact, Guido van Rossum, Python's Benevolent Dictator For Life (BDFL), was an active member of the matrix-sig. This close interaction resulted in Python gaining new features and syntax specifically needed by the scientific Python community. While there were miscellaneous changes, such as the addition of complex numbers, many changes focused on providing a more succinct and easier to read syntax for array manipulation. For instance, the parenthesis around tuples were made optional so that array elements could be accessed through, for example, a[0,1] instead of a[(0,1)]. The slice syntax gained a step argument— a[::2] instead of just a[:], for example—and an ellipsis operator, which is useful when dealing with multidimensional data structures.
"""

[6] https://www.computer.org/csdl/magazine/cs/2011/02/mcs2011020009/13rRUx0xPMx
|
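The three syntax additions named in the quote (optional tuple parentheses, the slice step, and the ellipsis) can be observed directly from plain Python. The sketch below assumes a hypothetical helper class, `ShowKey`, whose only job is to hand back whatever object Python passes to `__getitem__`. It is not part of NumPy or any other library.

```python
class ShowKey:
    """Hypothetical helper: returns the key that [] receives, unchanged."""

    def __getitem__(self, key):
        return key


a = ShowKey()

# Optional parentheses: both spellings deliver the same tuple.
tuple_key = a[0, 1]
tuple_key_explicit = a[(0, 1)]

# The step argument arrives as the third field of a slice object.
step_key = a[::2]

# The ellipsis arrives as the built-in Ellipsis singleton.
ellipsis_key = a[..., 0]

print(tuple_key, tuple_key_explicit, step_key, ellipsis_key)
```

An array library only has to interpret these key objects; the syntax itself is handled by the language, which is the point the quoted article makes.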
October 23, 2020 Re: Pandas like features | ||||
---|---|---|---|---|
| ||||
Posted in reply to bioinfornatics | On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics wrote:
> As a researcher in bioinformatics, I use Python, NumPy, pandas and SciPy a lot. But I am annoyed by the slowness of Python, even with cpython code, thanks to the GIL and un-optimized tail recursion.
>
> So I really think that D could play a big role in this field with Mir and dcompute.
>
> 1/ What is the state of Magpie, which was a GSoC 2019 project:
> - Mir Data Analysis and Processing Library
>
> 2/ Is scientific computing a field in which the D language wants to grow?
>
> Thanks
>
> Best regards
There is some activity in this space:
https://code.dlang.org/?sort=updated&category=library.scientific

This project doesn't seem too active, but it was an earlier attempt:
http://dlangscience.github.io/
|
October 23, 2020 Re: Pandas like features | ||||
---|---|---|---|---|
| ||||
Posted in reply to mw | On Friday, 23 October 2020 at 22:38:39 UTC, mw wrote:
> 1) use Python's arr[start:end], in addition to D's arr[start..end]
BTW, Python has arr[start:end:step]; is an equivalent of this `step` possible in D today, and if so, how?
|
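To pin down what is being asked for, the sketch below shows what Python's `arr[start:end:step]` selects, written three equivalent ways. The variable names are illustrative. On the D side, Phobos provides `std.range.stride` for stepping over a range; whether that covers the use case the poster has in mind is not settled in this thread.

```python
from itertools import islice

arr = list(range(10))  # the integers 0 through 9

start, end, step = 1, 8, 3

# Extended slice: every `step`-th element from `start`, stopping before `end`.
stepped = arr[start:end:step]

# The same selection as an explicit index loop.
by_index = [arr[i] for i in range(start, end, step)]

# A lazy variant: islice yields the elements on demand instead of copying first.
lazy = list(islice(arr, start, end, step))

print(stepped, by_index, lazy)
```

All three forms visit indices 1, 4 and 7 here.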
October 23, 2020 Re: Pandas like features | ||||
---|---|---|---|---|
| ||||
Posted in reply to bachmeier | On Friday, 23 October 2020 at 22:48:16 UTC, bachmeier wrote:
> On Friday, 23 October 2020 at 19:31:08 UTC, bioinfornatics wrote:
>> As a researcher in bioinformatics, I use Python, NumPy, pandas and SciPy a lot. But I am annoyed by the slowness of Python, even with cpython code, thanks to the GIL and un-optimized tail recursion.
>>
>> So I really think that D could play a big role in this field with Mir and dcompute.
>>
>> 1/ What is the state of Magpie, which was a GSoC 2019 project:
>> - Mir Data Analysis and Processing Library
>>
>> 2/ Is scientific computing a field in which the D language wants to grow?
>>
>> Thanks
>>
>> Best regards
>
> There is some activity in this space:
> https://code.dlang.org/?sort=updated&category=library.scientific
>
> This project doesn't seem too active, but it was an earlier attempt:
> http://dlangscience.github.io/
To me, a scientific library needs to be HPC oriented, able
- to perform parallel computation on CPU or GPU
- to use a divide-and-conquer strategy in order to compute over multiple nodes
- to have dataframe features
- to have SciPy features
Such a library would be awesome, as Python's slowness matters more and more these days while data grows exponentially year after year.
|
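The divide-and-conquer requirement in the list above can be sketched on a single machine as split, map, reduce. The code below is only an illustration of that shape: `chunked` and `parallel_sum_of_squares` are hypothetical names, and it uses threads from the standard library, which, because of the GIL mentioned at the start of the thread, will not speed up pure-Python CPU-bound work. A real HPC library would hand the chunks to processes, nodes or a GPU instead.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(data, n_chunks):
    """Split data into at most n_chunks contiguous pieces."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum_of_squares(data, workers=4):
    """Divide the input, reduce each chunk, then combine the partial sums."""

    def partial(chunk):
        return sum(x * x for x in chunk)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map applies `partial` to every chunk; sum() combines the results.
        return sum(pool.map(partial, chunked(data, workers)))


total = parallel_sum_of_squares(list(range(1000)))
print(total)
```

The combine step works here because addition is associative, so the partial results can be merged in any grouping. Reductions without that property need a different strategy.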
October 23, 2020 Re: Pandas like features | ||||
---|---|---|---|---|
| ||||
Posted in reply to mw | On Friday, 23 October 2020 at 22:53:29 UTC, mw wrote:
> On Friday, 23 October 2020 at 22:38:39 UTC, mw wrote:
>> 1) use Python's arr[start:end], in addition to D's arr[start..end]
>
> BTW, Python has arr[start:end:step]; is an equivalent of this `step` possible in D today, and if so, how?
(Today I'm in the mood of a language historian :-)

Some of Guido's early discussions of Python array indexing:

Slices
https://mail.python.org/pipermail/matrix-sig/1996-April/000553.html

Pseudo Indices
https://mail.python.org/pipermail/matrix-sig/1996-January/000331.html

Multi-dimensional indexing and other comments
https://mail.python.org/pipermail/matrix-sig/1995-October/000077.html

A problem with slicing
https://mail.python.org/pipermail/matrix-sig/1995-September/000042.html
|
October 24, 2020 Re: Pandas like features | ||||
---|---|---|---|---|
| ||||
Posted in reply to bioinfornatics | On Fri, 2020-10-23 at 23:00 +0000, bioinfornatics via Digitalmars-d wrote: […]
> To me, a scientific library needs to be HPC oriented, able
> - to perform parallel computation on CPU or GPU
> - to use a divide-and-conquer strategy in order to compute over multiple nodes
> - to have dataframe features
> - to have SciPy features
> Such a library would be awesome, as Python's slowness matters more and more these days while data grows exponentially year after year.

Acting somewhat as "Devil's Advocate"…

Why not just use Chapel https://chapel-lang.org/ – it is a programming language designed to run in parallel contexts and has an awful lot of the stuff that other (invariably sequential, cf. C++, D, Rust) programming languages have trouble providing.

I am not sure Chapel has pandas-style data frames explicitly, but I'll bet something equivalent is already in there.

--
Russel.
===========================================
Dr Russel Winder      t: +44 20 7585 2200
41 Buckmaster Road    m: +44 7770 465 077
London SW11 1EN, UK   w: www.russel.org.uk
|
October 24, 2020 Re: Pandas like features | ||||
---|---|---|---|---|
| ||||
Posted in reply to Russel Winder | On Saturday, 24 October 2020 at 09:29:46 UTC, Russel Winder wrote:
> On Fri, 2020-10-23 at 23:00 +0000, bioinfornatics via Digitalmars-d wrote: […]
>> To me, a scientific library needs to be HPC oriented, able
>> - to perform parallel computation on CPU or GPU
>> - to use a divide-and-conquer strategy in order to compute over multiple nodes
>> - to have dataframe features
>> - to have SciPy features
>> Such a library would be awesome, as Python's slowness matters more and more these days while data grows exponentially year after year.
>
> Acting somewhat as "Devil's Advocate"…
>
> Why not just use Chapel https://chapel-lang.org/ – it is a programming language designed to run in parallel contexts and has an awful lot of the stuff that other (invariably sequential, cf. C++, D, Rust) programming languages have trouble providing.
>
> I am not sure Chapel has pandas style data frames explicitly but I'll bet something equivalent is already in there.
Maybe; anyway, D has been searching for its killer app for years. I really think this area is perfect for D.
Data and business analysis is so important these days in science, the economy and other fields that D could be a good choice.
|
Copyright © 1999-2021 by the D Language Foundation