November 05, 2020
On Thursday, 5 November 2020 at 20:22:45 UTC, data pulverizer wrote:
> [snip]
>
> 2. Next I would use memory mapped i/o for storage. Usually memory mapped files are only accessible by one thread for security but I believe that this can be changed. For security you could use cryptographic keys to access the files between threads. So that memory written in one language can be accessed by another.
>
> [snip]

One thread only? Sounds like GIL...
November 05, 2020
On Thursday, 5 November 2020 at 20:30:03 UTC, jmh530 wrote:
> On Thursday, 5 November 2020 at 20:22:45 UTC, data pulverizer wrote:
>> [snip]
>>
>> 2. Next I would use memory mapped i/o for storage. Usually memory mapped files are only accessible by one thread for security but I believe that this can be changed. For security you could use cryptographic keys to access the files between threads. So that memory written in one language can be accessed by another.
>>
>> [snip]
>
> One thread only? Sounds like GIL...

Not necessarily. The cryptographic keys are used to access the file, not to lock it. I believe mmap files can be secured with a password, which should be generated cryptographically rather than entered manually and stored somewhere. That protects the file from unsanctioned access, even though the file itself will probably take only a single password rather than some synchronized rotating mechanism. However it is done, the memory will need to be protected.

There should be no reason why multiple processes could not read from a file. Only writing would require a lock from other processes for obvious reasons.
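
As a rough illustration of the point about multiple readers, here is a minimal Python sketch. It is not a planned implementation: the helper names (`write_doubles`, `two_readers`) are made up for this example, two mappings inside one process stand in for two separate processes, and access control is left to ordinary file permissions rather than any key or password scheme.

```python
import mmap
import os
import struct
import tempfile


def write_doubles(path, values):
    """Write values to path as raw little-endian 64-bit floats."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}d", *values))


def two_readers(path, count):
    """Open two independent read-only mappings of the same file.

    Returns what each mapping sees, plus a flag saying whether an
    attempted write through a read-only mapping was rejected.
    """
    fmt = f"<{count}d"
    size = struct.calcsize(fmt)
    with open(path, "rb") as f1, open(path, "rb") as f2:
        with mmap.mmap(f1.fileno(), 0, access=mmap.ACCESS_READ) as a, \
             mmap.mmap(f2.fileno(), 0, access=mmap.ACCESS_READ) as b:
            first = struct.unpack(fmt, a[:size])
            second = struct.unpack(fmt, b[:size])
            try:
                # Assigning into an ACCESS_READ mapping raises TypeError.
                a[0:8] = struct.pack("<d", 9.0)
                write_rejected = False
            except TypeError:
                write_rejected = True
    return first, second, write_rejected


def demo():
    """Create a temporary file, read it through two mappings, clean up."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_doubles(path, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
        return two_readers(path, 6)
    finally:
        os.remove(path)
```

Both mappings see the same six values without any locking. A writer would need a writable mapping plus some coordination with the readers, which this sketch deliberately leaves out.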

As I said, I haven't even begun to properly plan an implementation yet, just something that I think about from time to time.


November 05, 2020
On Thursday, 5 November 2020 at 19:39:43 UTC, jmh530 wrote:
> On Thursday, 5 November 2020 at 19:18:11 UTC, bachmeier wrote:
>> [snip]
>>
>> The question for me is if you can work with the same data structures in D, R, Python, and Julia. Can your main program be written in D, but calling out to all three for loading, transforming, and analyzing the data? I'm guessing not, but would be awesome if you could do it.
>
> Yeah, that would be pretty nice. However, I would emphasize what aberba has been saying across several different threads, which is the importance of documentation and tutorials. It's nice to have the ability to do it, but if you don't make it clear for the typical user of R/Python/Julia to figure it out, then the reach will be limited.

Definitely, but you need to have the functionality first. On the homepage for embedr, I have examples showing most of the functionality: https://embedr.netlify.app/ I started writing up lecture notes but then the pandemic sent my workload through the roof.
November 05, 2020
On Thursday, 5 November 2020 at 20:22:45 UTC, data pulverizer wrote:

> 1. If I had to do this, I would first decide on a collection of common data structures to share starting with *compositions* of R/Python/Julia style multi-dimensional arrays - contiguous arrays with basic element types with dimensional information in the form of another array. So a 2x3 double matrix is a double array of length 6 with another long array containing [2, 3]. R has externalptr, Julia can interface with pointers, as can Python.

R has externalptr, but to my knowledge, that's only for transporting around C objects. I don't know of any way to call R API functions with data not allocated by R. It assumes it can do anything it wants with that data. Unless they've changed something (which is possible since I haven't looked into it in years) you'd have to copy any data you send to an R function. But if you're calling R maybe you don't care about that.
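
For readers who want to picture the layout in the quoted point 1, here is a small Python sketch of a 2x3 double matrix stored as a flat array of six doubles plus a separate integer array holding [2, 3]. The function names are invented for illustration, and the column-major indexing is an assumption chosen to match R and Julia (NumPy defaults to row-major).

```python
from array import array


def make_matrix(values, dims):
    """Build (data, shape): a flat double array plus an integer dims array."""
    data = array("d", values)   # contiguous 64-bit floats
    shape = array("q", dims)    # 64-bit signed integers, e.g. [2, 3]
    expected = 1
    for d in shape:
        expected *= d
    if expected != len(data):
        raise ValueError("dims do not match the number of elements")
    return data, shape


def get(data, shape, i, j):
    """Zero-based lookup in a 2-D matrix, assuming column-major storage."""
    nrows, ncols = shape
    if not (0 <= i < nrows and 0 <= j < ncols):
        raise IndexError("index out of range")
    return data[i + j * nrows]


def raw_pointer(data):
    """Return (address, element_count) of the underlying buffer.

    This is the kind of raw pointer another language could be handed,
    valid only while `data` is alive and not resized.
    """
    return data.buffer_info()
```

Usage: `data, shape = make_matrix([1, 2, 3, 4, 5, 6], [2, 3])` gives a matrix whose first column is (1, 2), so `get(data, shape, 0, 1)` returns `3.0`. Whether R can safely operate on such a buffer without copying is exactly the concern raised above, and this sketch does not address it.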

November 05, 2020
On Thursday, 5 November 2020 at 22:02:25 UTC, bachmeier wrote:
> On Thursday, 5 November 2020 at 20:22:45 UTC, data pulverizer wrote:
>
>> 1. If I had to do this, I would first decide on a collection of common data structures to share starting with *compositions* of R/Python/Julia style multi-dimensional arrays - contiguous arrays with basic element types with dimensional information in the form of another array. So a 2x3 double matrix is a double array of length 6 with another long array containing [2, 3]. R has externalptr, Julia can interface with pointers, as can Python.
>
> R has externalptr, but to my knowledge, that's only for transporting around C objects. I don't know of any way to call R API functions with data not allocated by R.

Yes but you make C calls in R on the pointed object. Given the choice that's how I would write any application in R. The only purpose R would serve is as an interface to the underlying dlls. I have many years of writing code in R and from my experience, apart from minor instances I would try to avoid writing production libraries or code in it.


November 05, 2020
On Thursday, 5 November 2020 at 21:57:46 UTC, bachmeier wrote:
>
> Definitely, but you need to have the functionality first. On the homepage for embedr, I have examples showing most of the functionality: https://embedr.netlify.app/ I started writing up lecture notes but then the pandemic sent my workload through the roof.

Looks cool.
November 05, 2020
On Thursday, 5 November 2020 at 22:17:12 UTC, data pulverizer wrote:
> On Thursday, 5 November 2020 at 22:02:25 UTC, bachmeier wrote:
>> R has externalptr, but to my knowledge, that's only for transporting around C objects. I don't know of any way to call R API functions with data not allocated by R.
>
> Yes but you make C calls in R on the pointed object. Given the choice that's how I would write any application in R. The only purpose R would serve is as an interface to the underlying dlls. I have many years of writing code in R and from my experience, apart from minor instances I would try to avoid writing production libraries or code in it.

p.s. I'm not saying that data shouldn't be accessible or returned in R, I'm just saying that externalptr is there for other pointed objects that R might need to interface with. I hope that's clear - avoiding writing production code in R is just my professional advice.
November 12, 2020
On Thursday, 5 November 2020 at 22:46:21 UTC, data pulverizer wrote:
> On Thursday, 5 November 2020 at 22:17:12 UTC, data pulverizer wrote:
>> On Thursday, 5 November 2020 at 22:02:25 UTC, bachmeier wrote:
>>> R has externalptr, but to my knowledge, that's only for transporting around C objects. I don't know of any way to call R API functions with data not allocated by R.
>>
>> Yes but you make C calls in R on the pointed object. Given the choice that's how I would write any application in R. The only purpose R would serve is as an interface to the underlying dlls. I have many years of writing code in R and from my experience, apart from minor instances I would try to avoid writing production libraries or code in it.
>
> p.s. I'm not saying that data shouldn't be accessible or returned in R, I'm just saying that externalptr is there for other pointed objects that R might need to interface with. I hope that's clear - avoiding writing production code in R is just my professional advice.

It really depends (which was one of the points of my earlier post about how broad this field is). For someone doing academic research or statistical analysis for, say, marketing purposes, the interactive code they write is the production code. They're not going to write two versions of their code. I know for web applications or finance or some other areas where the distinction matters.

But as far as telling people "don't write code in R", that's simply a non-starter, and there's no reason to even begin a project like this if you're going to tell people to avoid existing libraries in either R or Python. They'll just shrug when you start talking about performance because for the vast majority of what they're doing it's not an issue.
November 12, 2020
On Thursday, 12 November 2020 at 19:09:48 UTC, bachmeier wrote:

> I know for web applications or finance or some other areas where the distinction matters.

This should be

> I know for web applications or finance or some other areas performance matters enough that they'll distinguish between interactive and production code, and even write two versions.
November 13, 2020
On Thursday, 12 November 2020 at 19:09:48 UTC, bachmeier wrote:
> On Thursday, 5 November 2020 at 22:46:21 UTC, data pulverizer wrote:
>> On Thursday, 5 November 2020 at 22:17:12 UTC, data pulverizer wrote:
>>> ... I have many years of writing code in R and from my experience, apart from minor instances I would try to avoid writing production libraries or code in it.
>>
>> ... avoiding writing production code in R is just my professional advice.
>
> It really depends (which was one of the points of my earlier post about how broad this field is). For someone doing academic research or statistical analysis for, say, marketing purposes, the interactive code they write is the production code. They're not going to write two versions of their code. I know for web applications or finance or some other areas where the distinction matters.
>
> But as far as telling people "don't write code in R", that's simply a non-starter, and there's no reason to even begin a project like this if you're going to tell people to avoid existing libraries in either R or Python. They'll just shrug when you start talking about performance because for the vast majority of what they're doing it's not an issue.

You act as if I'm banning people from writing code in R - I certainly don't have the power to do that. And yes, it varies from situation to situation, as I clearly alluded to.

I've done a lot of projects in R, and I'm well aware that sometimes it is unavoidable for the client. What I am saying is that, given the choice, you should probably choose a different tool, apart from "some minor instances". I've seen R go spectacularly wrong because of the type of language it is: it makes assumptions about what the programmer means, which can cause epic bugs, and very often it does so silently. It happens all the time. You can never be sure that *any* piece of R code will work as it should; it's just the nature of the language. People write it because it's easy and has "boilerplate", which is fine if you are building a proof of concept or doing research and some other things, but use it in mission-critical production apps and it may well blow up in your face, and you might not even know. And that's before we get to performance and other things, blah, blah, blah.