December 22, 2014
On Saturday, 22 March 2014 at 14:33:02 UTC, TJB wrote:
> On Saturday, 22 March 2014 at 13:10:46 UTC, Daniel Davidson wrote:
>> Data storage for high volume would also be nice. A D implementation of HDF5, via wrappers or otherwise, would be a very useful project. Imagine how much more friendly the API could be in D. Python's tables library makes it very simple. You have to choose a language to not only process and visualize data, but store and access it as well.
>>
>> Thanks
>> Dan
>
> Well, I for one, would be hugely interested in such a thing.  A
> nice D API to HDF5 would be a dream for my data problems.
>
> Did you use HDF5 in your finance industry days then?  Just
> curious.
>
> TJB

Well, for HDF5 the bindings are here now - pre-alpha, but they will get there soon enough - and wrappers are coming along also.

Any thoughts/suggestions/help appreciated.  Github here:

https://github.com/Laeeth/d_hdf5


I wonder how much work it would be to port or implement Pandas-type functionality in a D library.
December 22, 2014
On Monday, 22 December 2014 at 08:35:59 UTC, Laeeth Isharc wrote:
> On Saturday, 22 March 2014 at 14:33:02 UTC, TJB wrote:
>> On Saturday, 22 March 2014 at 13:10:46 UTC, Daniel Davidson wrote:
>>> Data storage for high volume would also be nice. A D implementation of HDF5, via wrappers or otherwise, would be a very useful project. Imagine how much more friendly the API could be in D. Python's tables library makes it very simple. You have to choose a language to not only process and visualize data, but store and access it as well.
>>>
>>> Thanks
>>> Dan
>>
>> Well, I for one, would be hugely interested in such a thing.  A
>> nice D API to HDF5 would be a dream for my data problems.
>>
>> Did you use HDF5 in your finance industry days then?  Just
>> curious.
>>
>> TJB
>
> Well, for HDF5 the bindings are here now - pre-alpha, but they will get there soon enough - and wrappers are coming along also.
>
> Any thoughts/suggestions/help appreciated.  Github here:
>
> https://github.com/Laeeth/d_hdf5
>
>
> I wonder how much work it would be to port or implement Pandas-type functionality in a D library.

@Laeeth

As a matter of fact, I've been working on HDF5 bindings for D as well -- I'm done with the binding/wrapping part so far (with automatic throwing of D exceptions whenever errors occur in the C library, and other niceties) and am hacking at the higher level OOP API -- can publish it soon if anyone's interested :) Maybe we can join efforts and make it work (that and standardizing a multi-dimensional array library in D).
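
To give a concrete flavour of the "automatic throwing of D exceptions" part, here is a minimal sketch of the error-translation idea. H5Fopen and its constants are the real C entry points; the hid_t alias, the exception type and the openFile wrapper are illustrative assumptions, not the actual API of either binding:

import std.format : format;
import std.string : toStringz;

extern (C)
{
    alias hid_t = int;   // assumption: HDF5 1.8.x handle type
    enum uint H5F_ACC_RDONLY = 0x0000u;
    enum hid_t H5P_DEFAULT = 0;
    hid_t H5Fopen(const char* name, uint flags, hid_t fapl_id);
}

class HDF5Exception : Exception
{
    this(string msg) { super(msg); }
}

// Negative return codes from the C library become D exceptions, so user
// code can rely on try/catch instead of checking every returned id.
hid_t openFile(string path)
{
    immutable id = H5Fopen(path.toStringz, H5F_ACC_RDONLY, H5P_DEFAULT);
    if (id < 0)
        throw new HDF5Exception(format("H5Fopen failed for '%s'", path));
    return id;
}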
December 22, 2014
On Monday, 22 December 2014 at 11:59:11 UTC, aldanor wrote:
> @Laeeth
>
> As a matter of fact, I've been working on HDF5 bindings for D as well -- I'm done with the binding/wrapping part so far (with automatic throwing of D exceptions whenever errors occur in the C library, and other niceties) and am hacking at the higher level OOP API -- can publish it soon if anyone's interested :) Maybe we can join efforts and make it work (that and standardizing a multi-dimensional array library in D).


Oh, well :)  I would certainly be interested to see what you have, even if not finished yet.  My focus was sadly getting something working soon in a sprint, rather than building something excellent later, and I would think your work will be cleaner.

In any case, I would very much be interested in exchanging ideas or working together - on HDF5, on multi-dim or on other projects relating to finance/quant/scientific computing and the like.  So maybe you could send me a link when you are ready - either post here or my email address is my first name at my first name.com

Thanks.
December 22, 2014
On Saturday, 22 March 2014 at 00:14:11 UTC, Daniel Davidson wrote:
> On Friday, 21 March 2014 at 21:14:15 UTC, TJB wrote:
>> Walter,
>>
>> I see that you will be discussing "High Performance Code Using D" at the 2014 DConf. This will be a very welcomed topic for many of us.  I am a Finance Professor.  I currently teach and do research in computational finance.  Might I suggest that you include some finance (say Monte Carlo options pricing) examples?  If you can get the finance industry interested in D you might see a massive adoption of the language.  Many are desperate for an alternative to C++ in that space.
>>
>> Just a thought.
>>
>> Best,
>>
>> TJB
>
> Maybe a good starting point would be to port some of QuantLib and see how the performance compares. In High Frequency Trading I think D would be a tough sell, unfortunately.
>
> Thanks
> Dan

In case it wasn't obvious from the discussion that followed: finance is a broad field with many different kinds of creature within, and there are different kinds of problems faced by different participants.

High Frequency Trading has peculiar requirements (relating to latency, amongst other things) that will not necessarily be representative of other areas.  Even within this area there is a difference between the needs of a Citadel in its option market-making activity versus the activity of a pure delta HFT player (although they also overlap).

A JP Morgan that needs to be able to price and calculate risk for large portfolios of convex instruments in its vanilla and exotic options books has different requirements, again.

You would typically use Monte Carlo (or quasi MC) to price more complex products for which there is not a good analytical approximation.  (Or to deal with the fact that volatility is not constant).  So that fits very much with the needs of large banks - and perhaps some hedge funds - but I don't think a typical HFT guy would be all that interested to know about this.  They are different domains.
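
To make the flavour of that workload concrete, here is a minimal sketch of a Monte Carlo pricer for a European call under Black-Scholes dynamics - illustrative code only (parameter values made up), with a hand-rolled Box-Muller transform since std.random has no normal variate generator:

import std.math : cos, exp, log, sqrt, PI;
import std.random : Random, uniform, unpredictableSeed;
import std.stdio : writeln;

// Box-Muller: two uniform draws -> one standard normal draw
double gauss(ref Random rng)
{
    immutable u1 = uniform!"()"(0.0, 1.0, rng);
    immutable u2 = uniform!"()"(0.0, 1.0, rng);
    return sqrt(-2.0 * log(u1)) * cos(2.0 * PI * u2);
}

double mcEuropeanCall(double s0, double k, double r, double vol,
                      double t, size_t paths)
{
    auto rng = Random(unpredictableSeed);
    immutable drift = (r - 0.5 * vol * vol) * t;
    immutable diff  = vol * sqrt(t);
    double sum = 0.0;
    foreach (i; 0 .. paths)
    {
        immutable st = s0 * exp(drift + diff * gauss(rng));  // terminal spot
        sum += st > k ? st - k : 0.0;                        // call payoff
    }
    return exp(-r * t) * sum / paths;                        // discounted mean
}

void main()
{
    // spot 100, strike 100, 1% rate, 20% vol, 1 year, 1M paths
    writeln(mcEuropeanCall(100.0, 100.0, 0.01, 0.20, 1.0, 1_000_000));
}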

Quant/CTA funds also have decent computational requirements, but these are not necessarily high frequency.  Winton Capital, for example, is one of the larger hedge funds in Europe by assets, but they have talked publicly about emphasizing longer-term horizons because even in liquid markets there simply is not the liquidity to turn over the volume they would need to trade to make an impact on their returns.  In this case, whilst execution is always important, the research side of things is where the value gets created.  And it's not unusual to have quant funds where every portfolio manager also programs.  (I will not mention names.)  One might think that rapid iteration here could have value.

http://www.efinancialcareers.co.uk/jobs-UK-London-Senior_Data_Scientist_-_Quant_Hedge_Fund.id00654869

Fwiw having spoken to a few people the past few weeks, I am struck by how hollowed-out front office has become, both within banks and hedge funds.  It's a nice business when things go well, but there is tremendous operating leverage, and if one builds up fixed costs then losing assets under management and having a poor period of performance (which is part of the game, not necessarily a sign of failure) can quickly mean that you cannot pay people (more than salaries) - which hurts morale and means you risk losing your best people.

So people have responded by paring down quant/research support to producing roles, even when that makes no sense.  (Programmers are not expensive).  In that environment, D may offer attractive productivity without sacrificing performance.
December 22, 2014
On Monday, 22 December 2014 at 12:24:52 UTC, Laeeth Isharc wrote:
>
> In case it wasn't obvious from the discussion that followed: finance is a broad field with many different kinds of creature within, and there are different kinds of problems faced by different participants.
>
> High Frequency Trading has peculiar requirements (relating to latency, amongst other things) that will not necessarily be representative of other areas.  Even within this area there is a difference between the needs of a Citadel in its option market-making activity versus the activity of a pure delta HFT player (although they also overlap).
>
> A JP Morgan that needs to be able to price and calculate risk for large portfolios of convex instruments in its vanilla and exotic options books has different requirements, again.
>
> You would typically use Monte Carlo (or quasi MC) to price more complex products for which there is not a good analytical approximation.  (Or to deal with the fact that volatility is not constant).  So that fits very much with the needs of large banks - and perhaps some hedge funds - but I don't think a typical HFT guy would be all that interested to know about this.  They are different domains.
>
> Quant/CTA funds also have decent computational requirements, but these are not necessarily high frequency.  Winton Capital, for example, is one of the larger hedge funds in Europe by assets, but they have talked publicly about emphasizing longer-term horizons because even in liquid markets there simply is not the liquidity to turn over the volume they would need to trade to make an impact on their returns.  In this case, whilst execution is always important, the research side of things is where the value gets created.  And it's not unusual to have quant funds where every portfolio manager also programs.  (I will not mention names.)  One might think that rapid iteration here could have value.
>
> http://www.efinancialcareers.co.uk/jobs-UK-London-Senior_Data_Scientist_-_Quant_Hedge_Fund.id00654869
>
> Fwiw having spoken to a few people the past few weeks, I am struck by how hollowed-out front office has become, both within banks and hedge funds.  It's a nice business when things go well, but there is tremendous operating leverage, and if one builds up fixed costs then losing assets under management and having a poor period of performance (which is part of the game, not necessarily a sign of failure) can quickly mean that you cannot pay people (more than salaries) - which hurts morale and means you risk losing your best people.
>
> So people have responded by paring down quant/research support to producing roles, even when that makes no sense.  (Programmers are not expensive).  In that environment, D may offer attractive productivity without sacrificing performance.

I agree with most of these points.

For some reason, people often associate quant finance / high-frequency trading with one of two things: either ultra-low-latency execution or option pricing, which is just wrong. In all likelihood, the execution is performed on FPGA co-located grids, so that part is out of the question; and options trading is just one of many things hedge funds do. What takes the most time and effort is the usual "data science" (which in many cases boils down to data munging), as in managing huge amounts of raw structured/unstructured high-frequency data; extracting the valuable information and learning strategies; implementing fast/efficient backtesting frameworks, simulators, etc. The need for "efficiency" here naturally comes from the fact that a typical task in the pipeline requires dozens or hundreds of GB of RAM and dozens of hours of runtime on a high-grade box (so no one would really care if the GC is going to stop the world for 0.05 seconds).

In this light, as I see it, D's main advantage is a high "runtime-efficiency / time-to-deploy" ratio (whereas one of the main disadvantages for practitioners would be the lack of standard tools for working with structured multidimensional data + linalg, something like numpy or pandas).
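
To illustrate the "runtime-efficiency / time-to-deploy" point, here's a minimal sketch of the sort of munging step I mean, written with lazy Phobos ranges - the file name and the three-column record layout are made up for the example:

import std.algorithm : filter, map, sum;
import std.array : split;
import std.conv : to;
import std.stdio : File, writeln;

struct Tick { long time; double price; double size; }

void main()
{
    // stream the file lazily; nothing is loaded into memory up front
    auto ticks = File("ticks.csv")
        .byLine
        .map!(l => l.split(','))
        .filter!(f => f.length == 3)
        .map!(f => Tick(f[0].to!long, f[1].to!double, f[2].to!double));

    // e.g. total traded notional above a size threshold
    writeln(ticks.filter!(t => t.size > 100)
                 .map!(t => t.price * t.size)
                 .sum);
}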

Cheers.
December 22, 2014
On Monday, 22 December 2014 at 13:37:55 UTC, aldanor wrote:
> For some reason, people often associate quant finance / high-frequency trading with one of two things: either ultra-low-latency execution or option pricing, which is just wrong. In all likelihood, the execution is performed on FPGA co-located grids, so that part is out of the question; and options trading is just one of many things hedge funds do. What takes the most time and effort is the usual "data science" (which in many cases boils down to data munging), as in managing huge amounts of raw structured/unstructured high-frequency data; extracting the valuable information and learning strategies;


This description feels too broad. Assume that it is the "data munging" that takes the most time and effort. That usually involves transformations like (Data -> Numeric Data -> Mathematical Data Processing -> Mathematical Solutions/Calibrations -> Math consumers (trading systems low frequency/high frequency/in general)). The quantitative "data science" is about turning data into value using numbers. The better you are at first getting to an all numbers world to start analyzing, the better off you will be. But once in the all numbers world, isn't it all about math, statistics, mathematical optimization, insight, iteration/mining, etc.? Isn't that right now the world of R, NumPy, Matlab, and more recently Julia? I don't see D attempting to tackle that at this point. If the bulk of the work for the "data sciences" piece is the maths, which I believe it is, then the attraction of D as a "data sciences" platform is muted. If the bulk of the work is preprocessing data to get to an all numbers world, then in that space D might shine.
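
To be fair, the "all numbers world" baseline in Phobos today is thin but not empty - here is a sketch of a hand-rolled correlation using std.numeric, though nothing like the breadth of R/NumPy:

import std.algorithm : sum;
import std.math : sqrt;
import std.numeric : dotProduct;
import std.stdio : writeln;

double correlation(double[] x, double[] y)
{
    immutable n = cast(double) x.length;
    immutable mx = x.sum / n, my = y.sum / n;
    immutable cov = dotProduct(x, y) / n - mx * my;  // E[xy] - E[x]E[y]
    immutable vx  = dotProduct(x, x) / n - mx * mx;
    immutable vy  = dotProduct(y, y) / n - my * my;
    return cov / sqrt(vx * vy);
}

void main()
{
    writeln(correlation([1.0, 2, 3, 4], [2.0, 4, 6, 8.1]));  // ~1.0
}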


> implementing fast/efficient backtesting frameworks, simulators, etc. The need for "efficiency" here naturally comes from the fact that a typical task in the pipeline requires dozens or hundreds of GB of RAM and dozens of hours of runtime on a high-grade box (so no one would really care if the GC is going to stop the world for 0.05 seconds).
>

What is a backtesting system in the context of Winton Capital? Is it primarily a mathematical backtesting system? If so it still may be better suited to platforms focusing on maths.
December 22, 2014
On Monday, 22 December 2014 at 17:28:39 UTC, Daniel Davidson wrote:
> I don't see D attempting to tackle that at this point.
> If the bulk of the work for the "data sciences" piece is the maths, which I believe it is, then the attraction of D as a "data sciences" platform is muted. If the bulk of the work is preprocessing data to get to an all numbers world, then in that space D might shine.
That is one of my points exactly -- the "bulk of the work", as you put it, is quite often the data processing/preprocessing pipeline (all the way from raw data parsing, aggregation, validation and storage to data retrieval, feature extraction, and then serialization, various persistence models, etc). It is one thing to fit some model on a pandas dataframe in an IPython notebook on your laptop; it is quite another to run the whole pipeline on massive datasets in production on a daily basis, which often involves very low-level technical stuff, whether you like it or not. Coming up with cool algorithms and doing fancy maths is fun and all, but it doesn't take nearly as much effort as integrating that same thing into an existing production system (or developing one from scratch). (And again, production != execution in this context.)

On Monday, 22 December 2014 at 17:28:39 UTC, Daniel Davidson wrote:
> What is a backtesting system in the context of Winton Capital? Is it primarily a mathematical backtesting system? If so it still may be better suited to platforms focusing on maths.
Disclaimer: I don't work for Winton :) Backtesting in trading is usually a very CPU-intensive (and sometimes RAM-intensive) task that can potentially be re-run millions of times to fine-tune some parameters or explore some sensitivities. Another common task is reconciling with how the actual trading system works, which is a very low-level task as well.
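
A minimal sketch of the parameter-sweep pattern I mean, using std.parallelism to spread independent backtests over cores - runBacktest is a stand-in placeholder, and the grid and "P&L" function are made up:

import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writeln;

double runBacktest(double lookback, double threshold)
{
    // placeholder "P&L", peaking at lookback = 20, threshold = 1.5;
    // a real backtest would replay the historical data set here
    immutable a = lookback - 20.0, b = threshold - 1.5;
    return -(a * a) - (b * b);
}

void main()
{
    double bestPnl = -double.infinity;

    // each lookback row of the grid runs on its own core
    foreach (lb; taskPool.parallel(iota(5, 100, 5)))
    {
        foreach (th; iota(1, 30))
        {
            immutable pnl = runBacktest(lb, th / 10.0);
            synchronized  // crude reduction, fine for a sketch
            {
                if (pnl > bestPnl)
                    bestPnl = pnl;
            }
        }
    }
    writeln("best P&L: ", bestPnl);
}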
December 22, 2014
On Monday, 22 December 2014 at 19:25:51 UTC, aldanor wrote:
> On Monday, 22 December 2014 at 17:28:39 UTC, Daniel Davidson wrote:
>> I don't see D attempting to tackle that at this point.
>> If the bulk of the work for the "data sciences" piece is the maths, which I believe it is, then the attraction of D as a "data sciences" platform is muted. If the bulk of the work is preprocessing data to get to an all numbers world, then in that space D might shine.
> That is one of my points exactly -- the "bulk of the work", as you put it, is quite often the data processing/preprocessing pipeline (all the way from raw data parsing, aggregation, validation and storage to data retrieval, feature extraction, and then serialization, various persistence models, etc). It is one thing to fit some model on a pandas dataframe in an IPython notebook on your laptop; it is quite another to run the whole pipeline on massive datasets in production on a daily basis, which often involves very low-level technical stuff, whether you like it or not. Coming up with cool algorithms and doing fancy maths is fun and all, but it doesn't take nearly as much effort as integrating that same thing into an existing production system (or developing one from scratch). (And again, production != execution in this context.)
>
> On Monday, 22 December 2014 at 17:28:39 UTC, Daniel Davidson wrote:
>> What is a backtesting system in the context of Winton Capital? Is it primarily a mathematical backtesting system? If so it still may be better suited to platforms focusing on maths.
> Disclaimer: I don't work for Winton :) Backtesting in trading is usually a very CPU-intensive (and sometimes RAM-intensive) task that can potentially be re-run millions of times to fine-tune some parameters or explore some sensitivities. Another common task is reconciling with how the actual trading system works, which is a very low-level task as well.


From what I have learned in Skills Matter presentations, for that type of use case D has to fight against Scala/F# code running on Hadoop/Spark/Azure clusters, backed by big-data databases.

--
Paulo
December 22, 2014
On Monday, 22 December 2014 at 19:25:51 UTC, aldanor wrote:
> On Monday, 22 December 2014 at 17:28:39 UTC, Daniel Davidson wrote:
>> I don't see D attempting to tackle that at this point.
>> If the bulk of the work for the "data sciences" piece is the maths, which I believe it is, then the attraction of D as a "data sciences" platform is muted. If the bulk of the work is preprocessing data to get to an all numbers world, then in that space D might shine.
> That is one of my points exactly -- the "bulk of the work", as you put it, is quite often the data processing/preprocessing pipeline (all the way from raw data parsing, aggregation, validation and storage to data retrieval, feature extraction, and then serialization, various persistence models, etc).

I don't know about low frequency, which is why I asked about Winton. Some of this is true in HFT, but it is tough to break into the pipeline that already exists in C++. Take live trading vs backtesting: for live trading you need all of that data processing, before you ever get to the math of it, to be as low-latency as possible, which is why you use C++ in the first place. Breaking into that pipeline with another language like D to add value, say for backtesting, is risky, not just because of the duplicated development cost but also because of the risk of live trading not matching backtesting.

Maybe you have some ideas in mind where D would help that data processing pipeline? Some specifics might help.
December 23, 2014
Hi.

Sorry if this is a bit long, but perhaps it may be interesting to one or two.

On Monday, 22 December 2014 at 22:00:36 UTC, Daniel Davidson wrote:
> On Monday, 22 December 2014 at 19:25:51 UTC, aldanor wrote:
>> On Monday, 22 December 2014 at 17:28:39 UTC, Daniel Davidson wrote:
>>> I don't see D attempting to tackle that at this point.
>>> If the bulk of the work for the "data sciences" piece is the maths, which I believe it is, then the attraction of D as a "data sciences" platform is muted. If the bulk of the work is preprocessing data to get to an all numbers world, then in that space D might shine.
>> That is one of my points exactly -- the "bulk of the work", as you put it, is quite often the data processing/preprocessing pipeline (all the way from raw data parsing, aggregation, validation and storage to data retrieval, feature extraction, and then serialization, various persistency models, etc).
>
> I don't know about low frequency, which is why I asked about Winton. Some of this is true in HFT, but it is tough to break into the pipeline that already exists in C++. Take live trading vs backtesting: for live trading you need all of that data processing, before you ever get to the math of it, to be as low-latency as possible, which is why you use C++ in the first place. Breaking into that pipeline with another language like D to add value, say for backtesting, is risky, not just because of the duplicated development cost but also because of the risk of live trading not matching backtesting.
>
> Maybe you have some ideas in mind where D would help that data processing pipeline? Some specifics might help.

I have been working as a PM for quantish buy-side places since '98, after starting in a quant trading role on the sell side in '96, with my first research summer job in '93.  Over time I have become less quant and more discretionary, so I am less in touch with the techniques the cool kids are using when it doesn't relate to what I do.  But more generally there is a kind of silo mentality: in a big firm, people in different groups don't know much about what the guy sitting at the next bank of desks might be doing, and even within groups the free flow of ideas might be a lot less than you might think.  Against that, firms with a pure research orientation may be a touch different, which again goes to say that from the outside it may be difficult to make useful generalisations.

A friend of mine who wrote certain parts of the networking stack in Linux is interviewing with HFT firms now, so I may soon have a better idea about whether D might be of interest.  He has heard of D but suggests Java instead (as a general option, not for HFT).  Even smart people can fail to appreciate beauty ;)

I think it's public that GS use a Python-like language internally, JPM use Python for what you would expect, and so do AHL (one of the largest lower-frequency quant firms).  More generally, in every field, but especially in finance, it seems like the data processing aspect is going to be key - not just a necessary evil.  Yes, once you have it up and running you can tick it off, but it is going to be some years before you start to tick off items faster than they appear.  Look at what Bridgewater are doing with gauging real-time economic activity (and look at Google Flu prediction if one starts to get too giddy - it worked and then didn't).

There is a spectrum of different qualities of data.  What is most objective is not necessarily what is most interesting.  Yet work on affect, media, and sentiment analysis is in its very early stages.  One can do much better than just "affect is bad - buy stocks once they stop going down"...  Some people who asked me to help with something are close to Twitter, and I have heard the number of firms, and the rough breakdown by sector, taking their full feed.  It is shockingly small in the financial services field, and that's probably in part just that it takes people time to figure out something new.

Ravenpack do interesting work from the point of view of a practitioner.  I heard a talk by their former technical architect, and he really seemed to know his stuff.  Not sure what they use as a platform.

I can't see why the choice of language will affect your backtesting results (except that it is painful to write good algorithms in a clunky language, and the risk of bugs is higher - but that isn't what you meant).

Anyway, back to D and finance.  I think this mental image people have of backtesting as being the originating driver of research may be mistaken.  It's funny, but sometimes it seems the moment you take a scientist out of his lab and put him on a trading floor he wants to know if such and such beats transaction costs.  But what you are trying to do is understand certain dynamics, and one needs to understand that markets are non-linear and have highly unstable parameters.  So one must be careful about just jumping to a backtest.  (And then of course, questions of risk management and transaction costs really matter also.)

To a certain extent one must recognise that the asset management business has a funny nature.  (This does not apply to the many HFT firms that manage partners' money.)  It doesn't take an army to make a lot of money with good people, because of the intrinsic intellectual leverage of the business.  But to do that one needs capital, and investors expect to see something tangible for the fees if you are managing size.  Warren Buffett gets away with having a tiny organisation because he is Buffett, but that may be harder for a quant firm.  So since intelligent enough people are cheap, and investors want you to hire people, it can be tempting to hire that army after all and set them to work on projects that certainly cover their costs but really may not be big determinants of variations in investment outcomes.  I.e. one shouldn't mistake the number of projects for what is truly important.

I agree that it is setting up and keeping everything in production running smoothly that creates a challenge.  So it's not just a question of doing a few studies in R.  And the more ways of looking at the world, the harder you have to think about how to combine them.  Spreadsheets don't cut the mustard anymore - they haven't for years, yet it emerged even recently with the JPM whale that lack of integrity in the spreadsheets worsened communication problems between departments (risk especially).  Maybe pypy and numpy will pick up all of the slack, but I am not so sure.

In spreadsheet world (where one is a user, not a pro), one never finishes and says "finally, I am done building sheets".  One question leads to another in the face of an unfolding and generative reality.  It's the same with quant tools for trading.  Perhaps that means there is value in tooling suited to rapid iteration and to building robust code that won't need to be totally rewritten from scratch later.

At one very big US hedge fund I worked with, the tools were initially written in Perl (some years back).  They weren't pretty, but they worked, and were fast and robust enough.  I needed many new features for my trading strategy.  But the owner - who liked to read about ideas on the internet - came to the conclusion that Perl was not institutional quality and that we should therefore cease new development and rewrite everything in C++.  Two years later a new guy took over the larger group, and one way or the other everyone left.  I never got my new tools, and that certainly didn't help on the investment front.  After he in turn left a year later, they scrapped the entire code base and bought Murex, as nobody could understand what they had.

If we had had D then, its possible the outcome might have been different.

So in any case, it's hard to generalise, and better to pick a few sympathetic people who see in D a possible solution to their pain; use patterns will emerge organically out of that.  I am happy to help where I can, and that is somewhat my own perspective - maybe D can help me solve my pain of tools that are not up to scratch, because good investment tool design requires investment and technology skills to be combined in one person, whereas each of the two is rarely found on its own.  (D makes a vast project closer to brave than foolhardy.)

It would certainly be nice to have matrices, but I also don't think it would be right to say D is dead in the water here because it is so far behind.  It also seems like the cost of writing such a library is very small versus the possible benefit.
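
For instance, a basic dense matrix needs surprisingly little D machinery (operator overloading over a flat row-major array) - a sketch only, and a real library would add slicing, views, BLAS hooks and so on:

import std.stdio : writeln;

struct Matrix
{
    size_t rows, cols;
    double[] data;  // row-major storage

    this(size_t r, size_t c)
    {
        rows = r; cols = c;
        data = new double[](r * c);
        data[] = 0.0;  // doubles default to NaN in D, so zero explicitly
    }

    // a[i, j] with reference semantics, so += works through it
    ref double opIndex(size_t i, size_t j) { return data[i * cols + j]; }

    Matrix opBinary(string op : "*")(Matrix b)
    {
        assert(cols == b.rows);
        auto r = Matrix(rows, b.cols);
        foreach (i; 0 .. rows)
            foreach (k; 0 .. cols)   // k as middle loop is cache-friendlier
                foreach (j; 0 .. b.cols)
                    r[i, j] += this[i, k] * b[k, j];
        return r;
    }
}

void main()
{
    auto a = Matrix(2, 2);
    a[0, 0] = 1; a[1, 1] = 2;
    writeln((a * a)[1, 1]);  // 4
}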

One final thought.  It's very hard to hire good young people.  We had 1,500 CVs for one job, with very impressive backgrounds - French grandes écoles and the like.  But ask a chap how he would sort a list of books without a library, and the results were shocking.  It seems like looking amongst D programmers is a nice heuristic, although perhaps the pool is too small for now.  Not hiring now, but I was thinking about it for the future.
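
For the curious, the kind of from-scratch answer that question is fishing for - a plain insertion sort with no library calls, sketch only:

void insertionSort(T)(T[] a)
{
    foreach (i; 1 .. a.length)
    {
        auto key = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > key)  // shift larger elements right
        {
            a[j] = a[j - 1];
            --j;
        }
        a[j] = key;  // drop the key into its slot
    }
}

void main()
{
    import std.stdio : writeln;
    auto books = ["Ulysses", "Dune", "Emma"];
    insertionSort(books);
    writeln(books);  // ["Dune", "Emma", "Ulysses"]
}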