Thread overview
DasBetterR
Jun 29, 2023
bachmeier
Jun 30, 2023
zjh
Jun 30, 2023
Guillaume Piolat
Jun 30, 2023
jmh530
Jun 30, 2023
bachmeier
Jun 30, 2023
jmh530
Jul 07, 2023
bachmeier
Jul 07, 2023
jmh530
June 29, 2023

I've been using D and R together for a decade. I wrote a blog post for the D Blog on the eve of the pandemic. I released the embedrv2 library in late 2021. It's useful for writing D functions that are called from R, using D's metaprogramming to write the necessary bindings for you.

My programs usually take the opposite approach, where D is the primary language, and I call into R to fill in missing functionality. I've accumulated a large collection of code snippets to enable all kinds of things. The problem is that they were scattered across many projects, there was no consistency across programs, documentation didn't exist, and they were more or less useless to anyone other than me.

This GitHub repo includes D modules, tests demonstrating most of the functionality, documentation, and some posts about how I do specific things. I'm sharing publicly all the things I've been doing in case it has value to anyone else.

Examples of functionality:

  • Creating, accessing, and mutating R data structures, including vector, matrix, data frame, list, array, and time series types. Reference counting handles memory management.
  • Basic statistical functionality like calculating the mean. Many of these functions use Mir for efficiency.
  • Linear algebra
  • Random number generation and sampling
  • Parallel random number generation
  • Numerical optimization: direct access to the C libraries used by R's optim function
  • Quadratic programming
  • Passing D functions to R without creating a shared library. For example, you can use a D function as the objective function you pass to constrOptim for constrained optimization problems.

Project website

There's more detail on the website, but I used the name "Better R" because the entirety of R is available inside your D program and you can use D to improve on it as much as you'd like. Feel free to hate the name.

I was originally going to include all of this as part of embedrv2, but realized there was almost no overlap between the two use cases. Moreover, it would be strange to call R from D and call D functions from R in the same program. It simplifies things to keep them in different projects.

If you try it and have problems, you can create a discussion. You can also post in this forum, but I won't guarantee I'll see it.

June 30, 2023

On Thursday, 29 June 2023 at 23:51:44 UTC, bachmeier wrote:

> I've been using D and R together for a decade. I wrote a blog post for the D Blog on the eve of the pandemic. I released the embedrv2 library in late 2021. It's useful for writing D functions that are called from R, using D's metaprogramming to write the necessary bindings for you.

Nice.

June 29, 2023

On 6/29/23 7:51 PM, bachmeier wrote:

> I've been using D and R together for a decade. I wrote a blog post for the D Blog on the eve of the pandemic. I released the embedrv2 library in late 2021. It's useful for writing D functions that are called from R, using D's metaprogramming to write the necessary bindings for you.
>
> My programs usually take the opposite approach, where D is the primary language, and I call into R to fill in missing functionality. I've accumulated a large collection of code snippets to enable all kinds of things. The problem is that they were scattered across many projects, there was no consistency across programs, documentation didn't exist, and they were more or less useless to anyone other than me.
>
> This GitHub repo includes D modules, tests demonstrating most of the functionality, documentation, and some posts about how I do specific things. I'm sharing publicly all the things I've been doing in case it has value to anyone else.
>
> Examples of functionality:
>
>   • Creating, accessing, and mutating R data structures, including vector, matrix, data frame, list, array, and time series types. Reference counting handles memory management.
>   • Basic statistical functionality like calculating the mean. Many of these functions use Mir for efficiency.
>   • Linear algebra
>   • Random number generation and sampling
>   • Parallel random number generation
>   • Numerical optimization: direct access to the C libraries used by R's optim function
>   • Quadratic programming
>   • Passing D functions to R without creating a shared library. For example, you can use a D function as the objective function you pass to constrOptim for constrained optimization problems.
>
> Project website

This is very cool! I've never used R, but I have wanted to learn more about such languages.

> There's more detail on the website, but I used the name "Better R" because the entirety of R is available inside your D program and you can use D to improve on it as much as you'd like. Feel free to hate the name.

Awful, awful name...

-Steve

June 30, 2023

On Thursday, 29 June 2023 at 23:51:44 UTC, bachmeier wrote:

> If you try it and have problems, you can create a discussion. You can also post in this forum, but I won't guarantee I'll see it.

Super cool, congrats!

June 30, 2023

On Thursday, 29 June 2023 at 23:51:44 UTC, bachmeier wrote:

> [snip]

Glad you're continuing to do work on this front. There's a lot of great material explaining things, which is always good.

It would be cool to have another version of the link below for using a mir Slice with R.
https://bachmeil.github.io/betterr/setvar.html

June 30, 2023

On Friday, 30 June 2023 at 16:14:48 UTC, jmh530 wrote:

> On Thursday, 29 June 2023 at 23:51:44 UTC, bachmeier wrote:
>
> > [snip]
>
> Glad you're continuing to do work on this front. There's a lot of great material explaining things, which is always good.
>
> It would be cool to have another version of the link below for using a mir Slice with R.
> https://bachmeil.github.io/betterr/setvar.html

I assume you mean that you've allocated memory on the D side, like this:

import mir.ndslice.slice : Slice, sliced;

auto a = new double[24];  // allocated by D's GC, not by R
a[] = 1.6;
Slice!(double*, 1) s = a.sliced();

and you want to pass s to R for further analysis. Unfortunately, that will not work. R functions only work with memory R has allocated. It has a single struct type, so there's no way to pass s in this example to R.

The best you can do right now is something like this:

auto a = Vector(24);
Slice!(double*,1) s = a.ptr[0..24].sliced();
// Manipulate s
// Send a as an argument to R functions

In other words, you let R allocate a, and then you work with the underlying data array as a slice.
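The "let the other side allocate, then view the data as a slice" idea can be sketched without R at all. The snippet below is only an illustration under stated assumptions: `malloc` stands in for an allocation made by R, and `viewOf` is a hypothetical helper, not part of betterr or Mir. It shows that slicing a raw pointer makes no copy, so writes through the D slice land in the buffer the other side owns.

```d
import core.stdc.stdlib : malloc, free;

/// Hypothetical helper: wrap n doubles starting at p as a D slice, without copying.
double[] viewOf(double* p, size_t n)
{
    return p[0 .. n];
}

void main()
{
    enum n = 24;
    // Assumption: malloc stands in for memory allocated on the R side.
    auto p = cast(double*) malloc(n * double.sizeof);
    assert(p !is null, "allocation failed");
    scope (exit) free(p);

    double[] s = viewOf(p, n);
    s[] = 0.0;   // fill the buffer through the view
    s[3] = 2.5;  // change one element through the view

    assert(s.ptr is p);   // the slice points at the same memory
    assert(p[3] == 2.5);  // the write is visible through the raw pointer
}
```

The same pointer-plus-length view is what `a.ptr[0..24].sliced()` builds on in the post above; the owner of the memory does not change.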

A way around this limitation would be to implement the same struct (SEXPREC) in D, while avoiding issues with R's garbage collector. That's a more involved problem than I've been willing to take on. If someone has the interest, the SEXPREC struct is defined here: https://github.com/wch/r-source/blob/060f8b64a3a8e489d8684c18b269eea63f182e73/src/include/Defn.h#L184 and the internals are documented here: https://cran.r-project.org/doc/manuals/r-release/R-ints.html#SEXPs

As much fun as it is to figure these things out, I have never had sufficient time or motivation to do so.

June 30, 2023

On Friday, 30 June 2023 at 18:47:06 UTC, bachmeier wrote:

> [snip]
>
> I assume you mean that you've allocated memory on the D side, like this:
>
> auto a = new double[24];
> a[] = 1.6;
> Slice!(double*, 1) s = a.sliced();
>
> and you want to pass s to R for further analysis. Unfortunately, that will not work. R functions only work with memory R has allocated. It has a single struct type, so there's no way to pass s in this example to R.

Unfortunate, but understood. Looking at the implementation of Vector, the constructor and opAssign look like they have to copy the data over anyway.

> [snip]
>
> As much fun as it is to figure these things out, I have never had sufficient time or motivation to do so.

Yeah, that seems like it would be a bit hairy to figure out.

July 07, 2023

On Friday, 30 June 2023 at 16:14:48 UTC, jmh530 wrote:

> On Thursday, 29 June 2023 at 23:51:44 UTC, bachmeier wrote:
>
> > [snip]
>
> Glad you're continuing to do work on this front. There's a lot of great material explaining things, which is always good.
>
> It would be cool to have another version of the link below for using a mir Slice with R.
> https://bachmeil.github.io/betterr/setvar.html

I was wrong. They added custom allocators a while back, but didn't tell anyone.

Actually, what I said before is technically correct. The SEXP struct itself still has to be allocated by R and managed by the R garbage collector. It's just that you can use a custom allocator to send a pointer to the data you've allocated, and once R is done with the data, it'll call the function you've provided to free the memory before destroying the SEXP struct that wraps it.

I uploaded an example here.

It's still a bit hackish because you need to adjust the pointer for a header R inserts when it allocates arrays. Adjusting by 10*double.sizeof works in this example, but "my test didn't segfault" doesn't exactly inspire confidence. Once I am comfortable with this solution, I'll do a new release of betterr.
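The pointer adjustment can be pictured with a small mock that does not touch R. Everything here is an assumption made for illustration: `PrefixedBuffer`, `mockHostSum`, and `headerBytes` are hypothetical names, and the 80-byte header is just the `10*double.sizeof` figure from the experiment above, not a documented constant. The idea is to reserve room for the host's header in front of the data, hand the host the start of the whole block, and keep a D view that begins just past the header.

```d
import core.stdc.stdlib : malloc, free;

// Assumption: header size taken from the experiment in the post, not from R's docs.
enum size_t headerBytes = 10 * double.sizeof;

/// Hypothetical buffer with space reserved in front of the data for a host header.
struct PrefixedBuffer
{
    ubyte* base;   // start of the whole allocation (what the host would receive)
    size_t length; // number of doubles in the data region

    static PrefixedBuffer create(size_t n)
    {
        auto raw = cast(ubyte*) malloc(headerBytes + n * double.sizeof);
        assert(raw !is null, "allocation failed");
        return PrefixedBuffer(raw, n);
    }

    /// D-side view of the data region, which starts just past the header.
    double[] data()
    {
        return (cast(double*)(base + headerBytes))[0 .. length];
    }

    /// Free the block; in the real setup the host would trigger this.
    void release()
    {
        free(base);
        base = null;
        length = 0;
    }
}

/// Mock host: writes its own header at the start of the block, then reads
/// the payload it expects to find immediately after that header.
double mockHostSum(ubyte* block, size_t n)
{
    block[0 .. headerBytes] = 0xAB; // the host overwrites only its header
    auto payload = (cast(double*)(block + headerBytes))[0 .. n];
    double total = 0;
    foreach (x; payload)
        total += x;
    return total;
}

void main()
{
    auto buf = PrefixedBuffer.create(4);
    scope (exit) buf.release();

    auto d = buf.data();
    d[] = [1.0, 2.0, 3.0, 4.0];

    assert(mockHostSum(buf.base, buf.length) == 10.0);
    assert(d[2] == 3.0); // the header write left the data untouched
}
```

If the assumed header size is smaller than what the host really writes, the header spills into the data region, which is why a test that merely does not segfault is weak evidence.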

This'll be kind of a big deal if it works. For instance, if you want to use a database interface and D doesn't have one, you can use R's interface to that database without having R manage your project's memory. You could use any of the available R interfaces (databases, machine learning libraries, Qt, etc.)

July 07, 2023

On Friday, 7 July 2023 at 20:33:08 UTC, bachmeier wrote:

> [snip]
>
> I was wrong. They added custom allocators a while back, but didn't tell anyone.
>
> Actually, what I said before is technically correct. The SEXP struct itself still has to be allocated by R and managed by the R garbage collector. It's just that you can use a custom allocator to send a pointer to the data you've allocated, and once R is done with the data, it'll call the function you've provided to free the memory before destroying the SEXP struct that wraps it.
>
> I uploaded an example here.
>
> It's still a bit hackish because you need to adjust the pointer for a header R inserts when it allocates arrays. Adjusting by 10*double.sizeof works in this example, but "my test didn't segfault" doesn't exactly inspire confidence. Once I am comfortable with this solution, I'll do a new release of betterr.
>
> This'll be kind of a big deal if it works. For instance, if you want to use a database interface and D doesn't have one, you can use R's interface to that database without having R manage your project's memory. You could use any of the available R interfaces (databases, machine learning libraries, Qt, etc.)

Cool.

The main thing I want to try is rstan. They have an interface called cmdstan that you can call from the command line, which would be possible to use from D. The problem is that you have to write the data to a CSV file and then read it back in. So it would be kind of slow, and I never got around to playing with it in D. With your tool as it is, I would just have to copy the data in memory, which I would expect to carry less overhead than file I/O (but again, I haven't gotten around to doing anything with it).