December 27, 2014 Re: Data Frames in D - let's not wait for linear algebra; useful today in finance and Internet of Things | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ola Fosheim Grøstad
| On Sat, 2014-12-27 at 14:28 +0000, via Digitalmars-d-learn wrote:
[…]
> I don't disagree in principle, but if an OpenMP supporting compiler can generate code for GPGPU then D will be miles behind for many homogeneous workloads.
No-one with resources showed any interest in having a D with GPGPU capability, so I think we can more or less say that C++ has won this arena. Well, except that everyone uses C, including the Python folk. I am awaiting the Java play in this space from the IBM folk.
--
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder |
December 27, 2014 Re: Data Frames in D - let's not wait for linear algebra; useful today in finance and Internet of Things | ||||
---|---|---|---|---|
| ||||
Posted in reply to Russel Winder | Russel:
"I think we are agreeing. Very lightweight editor and executor of code
fragments is as good, if not better, than the one line REPL."
Yes - the key for me is that the absence of a shell is by no means a reason to say that D is not suited to this task. One may wish to refine what exists, but that is another question entirely.
"Part of the problem here is tribalism. Most data science people want to
use the same tools that other data science people use, even though the
issue is to differentiate themselves."
Yes - we are answering two different questions. I could not care less about persuading anyone en masse in a broad sector, those who think of themselves as being 'data scientists' included. It's silly, in my view, to think of it as an established field very distinct from others, and with a fixed way of doing things. If for no other reason than that things are in flux and the sector is growing quickly, which means that there is room for many different approaches, and it is premature to think the popularity of approach X or Y today means that approach 'D' can't be productive tomorrow.
But as I said, I am less interested in persuading anyone, and rather more concerned with getting a basic data frame in D up and running because I could certainly use it, and the hard work has been done already. The basics should be an evening's work for an advanced D hacker, but it will probably take me longer than that. In any case, since nobody else has come forward, I will keep working away at it.
"A BLAS library is certainly a precursor, as are very good data
visualization tools, graphs, diagrams etc."
Perhaps a prerequisite to D being seen as a contender, but I don't see how it's a prerequisite just to have a dataframe, which is really a very simple yet incredibly useful thing.
"Go has masses of people putting a lot of effort into Web. It's not the ideas, it's the number of people getting on board and doing things".
Also about the quality of the people. (I have no view about Go, but have a very positive view on D). When things get big there is a danger they get cluttered. That's one blessing for D.
"To get some traction in any of these areas, finance data analysis and
model building, or systems activity, it is all about people doing it,
publicizing it and making things available for others to use".
Yes - so do you have any thoughts on what a data frame structure should look like? I am trying to build one, and will make it available after that.
"But it needs to be better than Julia in some way that makes
others sit up and take notice. There has to be the ability to create
some hype."
Don't care ;) This concept of "what is your edge" is not my cup of tea because I do not see the world in those terms. Something of high quality that's highly productive will over time stand a decent chance of becoming more widely adopted, whereas trying to force it into some kind of marketing framework can prove counterproductive.
Right now, the main thing I care about is solving the problem at hand, because if it solves my problem well then I am pretty sure it will be useful to others too, and be so better than if one had adopted a more consciously 'commercial/marketing' mindset.
I would post the dataframe skeleton here, but it's too embarrassing right now and I want to read the std.variant library to see what tricks I can learn. (A data series seems kind of like a variant, but with every cell the same type). Obviously in some cases the data frame type is defined at compile time, like a struct, and that's easy. But if you are loading from a file you need to be able to have dynamic typing for the column.
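To make the idea concrete, here is a rough sketch of "a variant, but with every cell the same type". It is not the skeleton mentioned above: every name in it (`Column`, `DataFrame`, `add`, `col`, `parseColumn`) is made up for illustration, and it assumes only three element types. It leans on `std.variant.Algebraic` so that the variant wraps a whole homogeneous array rather than each cell.

```d
import std.algorithm : map;
import std.array : array;
import std.conv : ConvException, to;
import std.variant : Algebraic;

// All names below are hypothetical; this is a sketch, not a finished design.

// A column is one homogeneous array; which element type it holds is decided at run time.
alias Column = Algebraic!(long[], double[], string[]);

struct DataFrame
{
    string[] names;   // column labels, in insertion order
    Column[] columns; // columns[i] belongs to names[i]

    // Add a column whose element type is known at compile time.
    // T must be long, double or string, otherwise Column(data) will not compile.
    void add(T)(string name, T[] data)
    {
        names ~= name;
        columns ~= Column(data);
    }

    // Typed access; get! throws VariantException if the column holds another type.
    T[] col(T)(string name)
    {
        foreach (i, n; names)
            if (n == name)
                return columns[i].get!(T[]);
        throw new Exception("no such column: " ~ name);
    }
}

// Choose a column type from raw text fields, as when loading from a file:
// try long first, then double, and fall back to keeping the strings.
Column parseColumn(string[] fields)
{
    try
    {
        return Column(fields.map!(f => f.to!long).array);
    }
    catch (ConvException)
    {
        // not all integers; try the next type
    }
    try
    {
        return Column(fields.map!(f => f.to!double).array);
    }
    catch (ConvException)
    {
        // not all numeric; keep as text
    }
    return Column(fields);
}
```

With this, `df.add("price", [1.5, 2.5])` covers the compile-time case and `parseColumn(["1", "2"])` the loaded-from-file case. Because the variant wraps the array, the type check happens once per column access instead of once per cell.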
"> I don't believe I agree that we need a perfect multi-dimensional
> rectangular array library to serve as a backend before thinking and doing much on data frames (although it will certainly be very useful when ready).
Also, if there is a ready made C or C++ library that can be made use of,
do it."
Well, the hard parts of arrays themselves (and it's not that fiendishly hard, I would think) seem to need to be tightly integrated with the language, so I don't see how a C/C++ library will help so much. For the linear algebra, yes...
hyping it up.
"I recently discovered a number of hedge funds work solely on moving
average based algorithmic trading. NumPy, SciPy and Pandas all have
variations on this basic algorithm."
Well, having worked for more or less quanty hedge funds since 98, I would think it unlikely that anyone depends only on moving averages although basic old-school trend-following certainly does work - it is just a hard sell to herding institutional investors, and does not fit very well with the concept of a 'career'. (You have to be able to see the five years of subdued returns since 2009 as just part of the cycle, which indeed may be the correct view when one sees markets as a natural phenomenon, but is not the view of asset allocators, or talented people one may want to hire in other areas).
"Perceived to be fast. In fact it isn't anything like as fast as it
should be. NumPy (which underpins Pandas and provides all the data
structures and basic algorithms), is actually quite slow."
Yes - was tired when I wrote, and meant to say Pandas is fast for key things such as parsing large data files, e.g. CSVs - significantly faster than Julia, from what I have seen. And yes - I agree about NumPy, and don't need to be persuaded of the benefits of moving to something else if one can make it slightly less inconvenient. Which is how this conversation started - you really don't need a perfect BLAS implementation/wrapper to start to benefit from a dataframe.
"Guido though is I/O bound rather than CPU bound in his work and doesn't see a need for anything other than multiprocessing for accessing parallelism in Python. Sadly, it can be shown that multiprocessing is slow and inefficient at what it does and it needs replacing".
I cannot claim deep expertise here, but this was one of the things that got me looking at D originally. Just too frustrating trying to fit with the restrictions to write nogil Cython code, knowing that one might need to rewrite when one has mentally long moved on. I.e. I feel like I am short options building my platform that way, and I don't like being short options when they don't cost much to buy. Hence D. It also struck me that there was a degree of complacency amongst some Python people, whereas hunger and insecurity may be a spur to greater and more creative efforts.
"In principle this is fertile territory for a new language to take the
stage. Hence Julia. I fear D has missed the boat of this opportunity
now."
I really don't see why one can't just take the next boat arriving in fifteen minutes. Or establish a new boat service going somewhere better that hooks up with the existing network. Conditions are changing so quickly, and the gap between the talk about big data etc and what people have actually done so far so large that to me the field seems wide open. I don't see an alternative acceptable way to do what I would like, so D it will be. And if I think that way today, probably others will have the same thoughts in coming years. (Perhaps not).
"This is worth hyping up, it should be front and centre on the dlang
pages along with Facebook funding bug fixes."
I agree. Also in a few lines a punchier summary of why Sociomantic use D, what the benefits have been, and how they deal with the standard sorts of hurdles that might have been objections in a more mature and conventional company ("how are you going to hire experienced D programmers").
"But if all the libraries are C, C++ and Fortran, is there any value add
role for D?"
I guess we vote with our feet/fingers. Sounds like you don't find D especially useful (since you don't use it much currently), whereas I do. De gustibus non est disputandum, particularly when tastes reflect being in different situations.
"Lots of C++ systems embed Python or Lua for dynamic scripting capability,
lots of Python and R systems call out to C. This seems a well established
milieu. Is there a good way for D to, in an evolutionary way, establish a
permanent foothold? Certainly it cannot be a revolutionary one."
You write as if Christensen's book "The Innovator's Dilemma" had never been written, and nor had it been a standard textbook in business schools for some years. You may have good arguments as to why he is wrong, or why it doesn't apply to D, but you haven't set them out, as far as I am aware.
Not Russel:
"There will sure be some algorithms where numba/cython would do
better (especially if they cannot be easily vectorized), but
that's not the point. The thing about numpy is that it provides a
unified accepted interface (plus a reasonable set of reasonably
fast tools and algorithms) for arrays and buffers for a multitude
of scientific libraries (scipy, pytables, h5py, pandas, scikit-*,
just to name a few), which then makes it much easier to use them
together and write your own ones."
Yes. But one has to start somewhere (if not happy with the Python route), and we are starting to have equivalents of scipy and pytables/h5py. So why not pandas?
"Splunk stuff is just an example of using dataflow networks for
processing data rather than using SQL. The "Big Data using JVM"
community are already on this road, cf. various proprietary frameworks
running over Hadoop and Spark."
Yes - technically, it may well be "nothing more than". But many of the practical problems which have a high commercial return to solving are "nothing more than" quite simple things technically. One doesn't need to be a technical genius to make valuable commercial contributions. And maybe Hadoop and Spark are just the perfect solution for most people (maybe not!), but that certainly leaves some room for others.
So... data frames!?
|
December 27, 2014 Re: Data Frames in D - let's not wait for linear algebra; useful today in finance and Internet of Things | ||||
---|---|---|---|---|
| ||||
Posted in reply to Laeeth Isharc
| On Sat, 2014-12-27 at 15:33 +0000, Laeeth Isharc via Digitalmars-d-learn wrote:
[…]
> I guess we vote with our feet/fingers. Sounds like you don't find D especially useful (since you don't use it much currently), whereas I do. De gustibus non est disputandum, particularly when tastes reflect being in different situations.
[…]
For the avoidance of confusion, the reason I am not using D just now is that I am not actually doing much (other than some training workshops) just now. I was going to use D for a start-up a couple of years ago and Go for a start-up last year, but both projects fell through. These days my only real programming is tinkering with a few toy problems. Oh and tinkering with GPars and Spock, but that is JVM stuff and so likely not interesting to the folk on this list.
--
Russel. |
December 27, 2014 Re: Data Frames in D - let's not wait for linear algebra; useful today in finance and Internet of Things | ||||
---|---|---|---|---|
| ||||
Posted in reply to Laeeth Isharc
| On Sat, 2014-12-27 at 15:33 +0000, Laeeth Isharc via Digitalmars-d-learn wrote:
[…lots of agreed uncontentious stuff :-) …]
> You write as if Christensen's book "The Innovator's Dilemma" had never been written, and nor had it been a standard textbook in business schools for some years. You may have good arguments as to why he is wrong, or why it doesn't apply to D, but you haven't set them out, as far as I am aware.
In the post-production world as I know it (Nuke, etc.) the C++/Python combination has never failed to be adequate to the innovation demanded by film makers. In the image processing world the C++/Lua combination has never failed to adapt to the innovation needed by photograph tinkerers. My point was really that the customers have never found an innovative need that the extant platforms couldn't provide. I felt this was somewhat different to the Christensen argument. On the other hand, I may have missed the point…
--
Russel. |
December 28, 2014 Re: Data Frames in D - let's not wait for linear algebra; useful today in finance and Internet of Things | ||||
---|---|---|---|---|
| ||||
Posted in reply to Russel Winder | On Saturday, 27 December 2014 at 16:41:04 UTC, Russel Winder via Digitalmars-d-learn wrote:
> On Sat, 2014-12-27 at 15:33 +0000, Laeeth Isharc via Digitalmars-d-learn
> wrote:
> […lots of agreed uncontentious stuff :-) …]
>
>
>> You write as if Christensen's book "The Innovator's Dilemma" had never been written, and nor had it been a standard textbook in business schools for some years. You may have good arguments as to why he is wrong, or why it doesn't apply to D, but you haven't set them out, as far as I am aware.
>
>
> In the post-production world as I know it (Nuke, etc.) the C++/Python
> combination has never failed to be adequate to the innovation demanded
> by film makers. In the image processing world the C++/Lua combination
> has never failed to adapt to the innovation needed by photograph
> tinkerers. My point was really that the customers have never found an
> innovative need that the extant platforms couldn't provide. I felt this
> was somewhat different to the Christensen argument. On the other hand, I
> may have missed the point…
No matter how plugged in a person may be, it is impossible to be aware of everything that is going on, especially in exactly the kind of domains Christensen talks about - ones that aren't by any standard important in a spot sense to the bigger picture, but that critically provide a quiet relatively uncontested niche for the seeds of something to unfold until it is ready to break out into the broader world.
So I think the point is that one shouldn't be bothered one jot by the disinclination of the people you know to want to use D, particularly since you are so plugged in to all these other worlds (and being an insider in a sense that matters today has an opportunity cost because it means one is not spending time and attention speaking to non insiders as much at that instant). New growth will come from the fringes.
I think one should be very worried if the Adam Ruppes of the world started to say "D sucks - nice idea, but just not expressive enough for me, and I am switching back to Ruby and Python", because that would indicate a loss of ground in the home niche. But somehow I don't think so...! And meantime quietly things continue to develop.
What matters is not the challenges one faces, but how one deals with them. An outpouring of frustration in recent days, and the result is we are going to get better docs, better examples, and who knows what else. That's a sign of health.
Will post code I have in a few days.
Laeeth.
|
December 29, 2014 Re: Data Frames in D - let's not wait for linear algebra; useful today in finance and Internet of Things | ||||
---|---|---|---|---|
| ||||
Posted in reply to Laeeth Isharc | Laeeth - I am not sure exactly what your needs are but I have a fairly complete solution for generic multidimensional interfaces (template-based, bounds checked, RAII-ready, non-integer indices, the whole shebang) that I have been building. Anyway I don't want to spam the forum if I've missed the point of this discussion, but perhaps we could speak about it further over email and you could give me your opinion? I'm at vlevenfeld@gmail.com |
December 29, 2014 Re: Data Frames in D - let's not wait for linear algebra; useful today in finance and Internet of Things | ||||
---|---|---|---|---|
| ||||
Posted in reply to Vlad Levenfeld | On Monday, 29 December 2014 at 04:08:58 UTC, Vlad Levenfeld wrote:
> Laeeth - I am not sure exactly what your needs are but I have a
> fairly complete solution for generic multidimensional interfaces
> (template-based, bounds checked, RAII-ready, non-integer indices,
> the whole shebang) that I have been building. Anyway I don't want
> to spam the forum if I've missed the point of this discussion,
> but perhaps we could speak about it further over email and you
> could give me your opinion? I'm at vlevenfeld@gmail.com
Hi Vlad.
Thanks v much for getting in touch.
Your work sounds very interesting. I will drop you a line in coming days.
Happy new year.
Laeeth.
|