Yahoo Finance Scraper
Selim
June 12, 2020
I wrote a small Yahoo Finance scraper and wanted to share it with the community. I have been using D for a while and I think it is good to contribute something back. There is an example main script and a unit test; those should get you going. It currently saves the scraped data as a JSON file in the executable's folder. I might also add a public method for accessing individual data columns inside the JSON in the next few days.

All mistakes are my own and I appreciate any feedback.

https://github.com/SelimOzel/YahooMinerD

Best,
Selim
June 12, 2020
On Friday, 12 June 2020 at 18:22:28 UTC, Selim wrote:
> I wrote a small Yahoo Finance scraper and wanted to share it with the community. I have been using D for a while and I think it is good to contribute something back. There is an example main script and a unit test; those should get you going. It currently saves the scraped data as a JSON file in the executable's folder. I might also add a public method for accessing individual data columns inside the JSON in the next few days.
>
> All mistakes are my own and I appreciate any feedback.
>
> https://github.com/SelimOzel/YahooMinerD
>
> Best,
> Selim

Thanks! There was a period when you couldn't use the Yahoo API, so I'm glad to see that people can use it again.

I haven't run it myself yet, but I have a few comments for potential changes and enhancements.

Why do you use a class for YahooMinerD (also, the name YahooFinanceD might get more people to use it)? It doesn't look like you are using any inheritance. I don't see any reason not to change to a struct and avoid new.
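
To make the difference concrete, here is a minimal sketch. MinerClass and MinerStruct are hypothetical stand-ins and are not taken from the repository; the point is only that a class instance needs new while a struct does not.

```d
import std.stdio : writeln;

// Hypothetical stand-ins for the scraper type; names are illustrative only.
class MinerClass
{
    string ticker;
    this(string ticker) { this.ticker = ticker; }
}

struct MinerStruct
{
    string ticker;
}

void main()
{
    // A class is a reference type and is allocated with `new`.
    auto c = new MinerClass("AAPL");

    // A struct is a value type; no `new` is needed to create one.
    auto s = MinerStruct("AAPL");

    writeln(c.ticker, " ", s.ticker);
}
```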

It looks like you have a lot of writeln statements. While these could be helpful, they will also prevent those functions from ever being @nogc. You could use an approach like below and give the user the opportunity to avoid the writelns and allow for attribute inference elsewhere.

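// Silent variant: selected when val is true, so it can be marked @nogc.
@nogc void fooImpl(bool val)()
    if (val)
{
}

// Logging variant: selected when val is false; writeln rules out @nogc.
void fooImpl(bool val)()
    if (!val)
{
    import std.stdio : writeln;
    writeln("here");
}

// Public entry point; forwards the compile-time flag to the matching overload.
void foo(bool val = false)()
{
    fooImpl!val;
}

@nogc void main()
{
    foo!true; // silent path, so main itself can be @nogc
}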

It looks like the primary way to get the data is through WriteToJson, correct? What if you want to use the data without writing the JSON to a file? For instance, I might want to get the data and put it in my own database, or just get the data, do some calculations, and then not save it. There should be a way to get the data out without writing to a file. std.json can be used for parsing the JSON, and there are other libraries out there.
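
As a rough sketch of the std.json route: the payload below is made up, and the field names "prices" and "close" are assumptions, since the scraper's real JSON layout may differ.

```d
import std.json : parseJSON, JSONValue;
import std.stdio : writeln;

// Hypothetical payload; the real scraper's JSON layout may differ.
enum sample = `{"prices":[{"date":"2020-06-11","close":335.9},` ~
              `{"date":"2020-06-12","close":338.8}]}`;

// Pull the closing prices out of the JSON text without touching the disk.
double[] closes(string text)
{
    JSONValue root = parseJSON(text);
    double[] result;
    foreach (row; root["prices"].array)
        result ~= row["close"].floating;
    return result;
}

void main()
{
    writeln(closes(sample));
}
```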

The WriteToJSON method should also allow writing events or prices without needing to write both.

You might consider adding the ability to control the frequency (instead of just daily).
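
Both suggestions could be expressed with two small enums. This is only a sketch: Frequency, Output, buildQuery and wants are hypothetical names, and the interval strings are an assumption based on values commonly seen in Yahoo Finance download URLs, not a documented API.

```d
import std.format : format;
import std.stdio : writeln;

// Assumed interval strings; not taken from any official documentation.
enum Frequency : string
{
    daily = "1d",
    weekly = "1wk",
    monthly = "1mo",
}

// Bit flags so a caller can ask for prices, events, or both.
enum Output { prices = 1, events = 2, both = 3 }

// Builds only the illustrative tail of a request, not a full URL.
string buildQuery(string ticker, Frequency freq)
{
    // The cast is needed because %s would otherwise print the member name.
    return format("%s?interval=%s", ticker, cast(string) freq);
}

// True when `part` is included in `selection`.
bool wants(Output selection, Output part)
{
    return (selection & part) != 0;
}

void main()
{
    writeln(buildQuery("AAPL", Frequency.weekly)); // AAPL?interval=1wk
    writeln(wants(Output.prices, Output.events));  // false
}
```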
June 12, 2020
On Friday, 12 June 2020 at 18:22:28 UTC, Selim wrote:
> I wrote a small Yahoo Finance scraper and wanted to share it with the community. I have been using D for a while and I think it is good to contribute something back. There is an example main script and a unit test; those should get you going. It currently saves the scraped data as a JSON file in the executable's folder. I might also add a public method for accessing individual data columns inside the JSON in the next few days.
>
> All mistakes are my own and I appreciate any feedback.
>
> https://github.com/SelimOzel/YahooMinerD
>
> Best,
> Selim

This could be a really cool tool to play with.

For the writing-out part, maybe the class should accept a function, a delegate, or some template hook to write out the data, so users can define the output themselves. Your WriteToJson could then serve as an example of that.
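
A minimal sketch of the delegate variant, assuming a hypothetical PriceRow record and emitRows function (neither is from the repository):

```d
import std.stdio : writeln;

// Hypothetical record type; the real library's data layout may differ.
struct PriceRow
{
    string date;
    double close;
}

// The caller supplies `sink`; this function only decides when to call it.
void emitRows(PriceRow[] rows, scope void delegate(PriceRow) sink)
{
    foreach (row; rows)
        sink(row);
}

void main()
{
    auto rows = [PriceRow("2020-06-11", 335.9), PriceRow("2020-06-12", 338.8)];

    // One possible sink: collect values in memory instead of writing a file.
    double[] closes;
    emitRows(rows, (PriceRow r) { closes ~= r.close; });
    assert(closes.length == 2);

    // Another sink: print each row.
    emitRows(rows, (PriceRow r) { writeln(r.date, " ", r.close); });
}
```

A JSON writer, a CSV writer, or a database insert would then just be different sinks passed to the same function.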
June 14, 2020
On Friday, 12 June 2020 at 19:10:06 UTC, jmh530 wrote:

> Why do you use a class for YahooMinerD (also, the name YahooFinanceD might get more people to use it)? It doesn't look like you are using any inheritance. I don't see any reason not to change to a struct and avoid new.

Thanks! I liked the new name and updated it. Yeah, I realized that about classes and converted it to a struct. I didn't like the new there either.


> It looks like you have a lot of writeln statements. While these could be helpful, they will also prevent those functions from ever being @nogc. You could use an approach like below and give the user the opportunity to avoid the writelns and allow for attribute inference elsewhere.

I implemented a framework quite similar to the one you described. I actually didn't realize how robust templates were in D until your comment. Logging is now optional. My application code and test code have examples. I don't think the MineImpl function can be @nogc at this point, but please correct me if I'm wrong.
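
For reference, the same idea can also be written as one template with static if. This is a sketch and not necessarily how the library does it; mineImpl, verbose and quiet are hypothetical names. Because function templates get attribute inference, the non-logging instantiation can be called from @nogc code.

```d
// One template instead of two constrained overloads.
int mineImpl(bool verbose)(int x)
{
    static if (verbose)
    {
        // Only compiled into the verbose instantiation.
        import std.stdio : writeln;
        writeln("mining ", x);
    }
    return x * 2;
}

@nogc int quiet(int x)
{
    // Compiles because mineImpl!false is inferred to be @nogc.
    return mineImpl!false(x);
}

void main()
{
    assert(quiet(21) == 42);
    assert(mineImpl!true(21) == 42);
}
```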


> It looks like the primary way to get the data is through WriteToJson, correct? What if you want to use the data without writing the JSON to a file? For instance, I might want to get the data and put it in my own database, or just get the data, do some calculations, and then not save it. There should be a way to get the data out without writing to a file. std.json can be used for parsing the JSON, and there are other libraries out there.

That's true. I added a data frame struct that stores all corporate actions and prices. It can be accessed from the script. It should be quite easy to convert that into CSV or something else too.
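
A CSV conversion could look roughly like this. The Frame layout below is a hypothetical simplification; the actual struct in the repository may hold different columns.

```d
import std.array : appender;
import std.format : formattedWrite;
import std.stdio : write;

// Hypothetical frame layout with two parallel columns.
struct Frame
{
    string[] dates;
    double[] closes;
}

// Renders the frame as CSV text with a header row.
string toCsv(const Frame f)
{
    assert(f.dates.length == f.closes.length, "columns must match in length");
    auto buf = appender!string();
    buf.put("date,close\n");
    foreach (i, d; f.dates)
        buf.formattedWrite("%s,%.2f\n", d, f.closes[i]);
    return buf.data;
}

void main()
{
    auto f = Frame(["2020-06-11", "2020-06-12"], [335.9, 338.8]);
    write(toCsv(f));
}
```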


> The WriteToJSON method should also allow writing events or prices without needing to write both. You might consider adding the ability to control the frequency (instead of just daily).

Added both of them!

S
June 14, 2020
On Friday, 12 June 2020 at 20:29:57 UTC, Jan Hönig wrote:
> This could be a really cool tool to play with. For the writing-out part, maybe the class should accept a function, a delegate, or some template hook to write out the data, so users can define the output themselves. Your WriteToJson could then serve as an example of that.

Thanks!! Let me know if you find any bugs when you play with it. I wrote template classes for the write operation: one for writing to a data frame and another for JSON. I think it should be relatively easy to bind that write function to MySQL at this point.

S