June 12

On Wednesday, 12 June 2024 at 17:00:14 UTC, Vinod K Chandran wrote:

>

On Wednesday, 12 June 2024 at 10:16:26 UTC, Sergey wrote:

>

Btw are you going to use PyD or doing everything manually from scratch?

Is PyD active now? I haven't tested it. My approach is using the "ctypes" library with my DLL. Ctypes is the fastest FFI in my experience. I tested Cython, Pybind11 and CFFI, but none can beat the speed of ctypes. Currently the fastest experiments were the DLLs created in Odin & C3. Both are non-GC languages.
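For reference, the basic ctypes pattern looks something like this (a minimal sketch using the system C math library as a stand-in for a custom DLL; on Windows you would load your own DLL by name instead):

```python
import ctypes
import ctypes.util

# Load the system C math library as a stand-in for a hand-written DLL.
# On Windows this would be e.g. ctypes.CDLL("mylib.dll").
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the signature so ctypes marshals arguments correctly;
# without argtypes/restype, doubles would be mangled into ints.
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

print(libm.sqrt(9.0))  # 3.0
```

Since ctypes calls the exported function directly with C-level argument conversion and no generated wrapper layer, this is part of why it benchmarks so well against binding generators.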

It is probably not that well maintained, but it definitely works with Python 3.10 and maybe even 3.11. I use it to interface with PyTorch, NumPy and PIL, but my use case is pretty simple: I just write some wrapper Python functions to run inference and pass images back and forth using embedded py_stmts. The only problem is that it seems to leak a lot of PydObjects, so I have to manually free them; even scope doesn't help with that, which is sad.

Example classifier Python:

def inference(image: Image):
    """ Predicts the image class and returns confidences for every class
    To get the class one can use the following code
    > conf = inference(image)
    > index = conf.argmax()
    > cls = classes[index]
    """

    # this detector doesn't work with more than 3 channels
    ch = len(image.getbands())
    has_transparency = image.info.get('transparency', None) is not None
    if ch > 3 or has_transparency:
        image = image.convert("RGB")

    image_tensor = prep_transform(image).float()
    image_tensor = image_tensor.unsqueeze_(0)

    # it is fast enough to run on CPU
    #if torch.cuda.is_available():
    #    image_tensor.cuda()

    with torch.inference_mode():
        # NOTE: read the comment on model
        output = model(image_tensor)
    conf = output.data.numpy()

    return conf

And some of the D functions:

ImageData aiGoesBrrrr(string path, int strength = 50) {
    try {
        if (!pymod)
            py_stmts("import sys; sys.path.append('modules/xyz')");
        initOnce!pymod(py_import("xyz.inference"));
        if (!pymod.hasattr("model"))
            pymod.model = pymod.method("load_model", "modules/xyz/pre_trained/weights.pth");

        PydObject ipath = py(path);
        scope(exit) destroy(ipath);

        auto context = new InterpContext();
        context.path = ipath;

        context.py_stmts("
        from PIL import Image
        image = Image.open(path)
        ch = len(image.getbands())
        if ch > 3:
            image = image.convert('RGB')
        ");

        // signature: def run(model, imagepath, alpha=45) -> numpy.Array
        PydObject output = pymod.method("run", pymod.model, context.image, 100-strength);
        context.output = output;
        scope(exit) destroy(output);

        PydObject shape = output.getattr("shape");
        scope(exit) destroy(shape);

        // int n = ...;
        int c = shape[2].to_d!int;
        int w = shape[1].to_d!int;
        int h = shape[0].to_d!int;

        // numpy array
        void* raw_ptr = output.buffer_view().item_ptr([0,0,0]);

        ubyte* d_ptr = cast(ubyte*) raw_ptr;
        ubyte[] d_img = d_ptr[0..h*w*c];

        return ImageData(d_img.dup, h, w, c);
    } catch (PythonException e) {
        // oh no...
        auto context = new InterpContext();
        context.trace = new PydObject(e.traceback);
        context.py_stmts("from traceback import format_tb; trace = format_tb(trace)");
        printerr(e.py_message, "\n", context.trace.to_d!string);
    }
    return ImageData.init;
}
June 12

On Wednesday, 12 June 2024 at 18:58:49 UTC, evilrat wrote:

>

On Wednesday, 12 June 2024 at 17:00:14 UTC, Vinod K Chandran wrote:

>

[...]

It is probably not that well maintained, but it definitely works with Python 3.10 and maybe even 3.11. I use it to interface with PyTorch, NumPy and PIL, but my use case is pretty simple: I just write some wrapper Python functions to run inference and pass images back and forth using embedded py_stmts. The only problem is that it seems to leak a lot of PydObjects, so I have to manually free them; even scope doesn't help with that, which is sad.

[...]

You can use libonnx via ImportC to do inference of PyTorch models after converting them to *.onnx. This way you won't need Python at all. Please refer to etichetta. Instead of PIL for preprocessing, just use DCV.

https://github.com/trikko/etichetta

June 12

On Wednesday, 12 June 2024 at 18:58:49 UTC, evilrat wrote:

>

The only problem is that it seems to leak a lot of PydObjects, so I have to manually free them; even scope doesn't help with that, which is sad.

Oh I see. I did some experiments with nimpy and pybind11. Both experiments resulted in something slower than the ctypes DLL-calling method. That's why I didn't take much interest in binding with the Python C API. Even Cython is slower compared to ctypes. It can be used when we call the DLL in Cython and call the Cython code from Python, but then you will have to face some other obstacles. In my case, callback functions are the reason. When using a DLL in Cython, you need to pass a Cython function as the callback, and inside that function you need to convert everything to and from PyObjects. That takes time. Imagine doing some heavy lifting in a mouse-move event; no one will be happy with a snail's pace.
But yeah, Cython is a nice language and we could create an entire GUI lib in Cython, but the execution speed is 2.5x slower than my current C3 DLL.
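The callback overhead described here shows up even with plain ctypes: every time the native side invokes the callback, the arguments are wrapped into Python objects and the result converted back. A minimal sketch using libc's qsort (assuming a POSIX system where ctypes.util.find_library can locate libc):

```python
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))

# qsort calls back into Python for every comparison, so each compare
# pays the full C -> PyObject -> C conversion round trip.
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_int),
                           ctypes.POINTER(ctypes.c_int))

def py_cmp(a, b):
    # a and b arrive as pointers; dereferencing them builds Python ints
    return a[0] - b[0]

arr = (ctypes.c_int * 5)(5, 1, 7, 33, 99)
libc.qsort(arr, len(arr), ctypes.sizeof(ctypes.c_int), CMPFUNC(py_cmp))
print(list(arr))  # [1, 5, 7, 33, 99]
```

That per-call conversion cost is exactly why a callback fired on every mouse-move event gets expensive, regardless of which binding layer sits in the middle.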

June 12

On Wednesday, 12 June 2024 at 18:57:41 UTC, bachmeier wrote:

>

Try foo[10] = 1.5 and foo.ptr[10] = 1.5. The first correctly throws an out of bounds error. The second gives Segmentation fault (core dumped).

We can use it like this, I think.

struct Foo {
  double * ptr;
  uint capacity;
  uint length;
  alias data this;
}

And then when we use an index, we can perform a bounds check.
I am not sure, but I hope this will work.

June 12
On 12.06.2024 21:57, bachmeier wrote:
> On Wednesday, 12 June 2024 at 18:36:26 UTC, Vinod K Chandran wrote:
>> On Wednesday, 12 June 2024 at 15:33:39 UTC, bachmeier wrote:
>>> A SafeRefCounted example with main marked @nogc:
>>>
>>> ```
>>> import std;
>>> import core.stdc.stdlib;
>>>
>>> struct Foo {
>>>   double[] data;
>>>   double * ptr;
>>>   alias data this;
>>>
>>>   @nogc this(int n) {
>>>     ptr = cast(double*) malloc(n*double.sizeof);
>>>     data = ptr[0..n];
>>>     printf("Data has been allocated\n");
>>>   }
>>>  }
>>>
>>> ```
>>>
>> Why not just use `ptr`? Why did you use `data` with `ptr`?
> 
> Try `foo[10] = 1.5` and `foo.ptr[10] = 1.5`. The first correctly throws an out of bounds error. The second gives `Segmentation fault (core dumped)`.

I think you can use data alone, because data already contains data.ptr.
June 12

On Wednesday, 12 June 2024 at 20:31:34 UTC, Vinod K Chandran wrote:

>

On Wednesday, 12 June 2024 at 18:57:41 UTC, bachmeier wrote:

>

Try foo[10] = 1.5 and foo.ptr[10] = 1.5. The first correctly throws an out of bounds error. The second gives Segmentation fault (core dumped).

We can use it like this, I think.

struct Foo {
  double * ptr;
  uint capacity;
  uint length;
  alias data this;
}

And then when we use an index, we can perform a bounds check.
I am not sure, but I hope this will work.

Yes, you can do that, but then you're replicating what you get for free by taking a slice. You'd have to write your own opIndex, opSlice, etc., and I don't think there's any performance benefit from doing so.

June 12

On Wednesday, 12 June 2024 at 20:37:36 UTC, drug007 wrote:

>

On 12.06.2024 21:57, bachmeier wrote:

>

On Wednesday, 12 June 2024 at 18:36:26 UTC, Vinod K Chandran wrote:

>

On Wednesday, 12 June 2024 at 15:33:39 UTC, bachmeier wrote:

>

A SafeRefCounted example with main marked @nogc:

import std;
import core.stdc.stdlib;

struct Foo {
  double[] data;
  double * ptr;
  alias data this;

  @nogc this(int n) {
    ptr = cast(double*) malloc(n*double.sizeof);
    data = ptr[0..n];
    printf("Data has been allocated\n");
  }
 }

Why not just use ptr? Why did you use data with ptr?

Try foo[10] = 1.5 and foo.ptr[10] = 1.5. The first correctly throws an out of bounds error. The second gives Segmentation fault (core dumped).

I think you can use data alone, because data already contains data.ptr.

Yes, but you get all the benefits of double[] for free if you do it that way, including the more concise foo[10] syntax.

June 13
bachmeier kirjoitti 12.6.2024 klo 18.21:
> You're splitting things into GC-allocated memory and manually managed memory. There's also SafeRefCounted, which handles the malloc and free for you.

I suspect `SafeRefCounted` (or `RefCounted`) is not the best fit for this scenario. The problem with it is that it `malloc`s and `free`s individual objects, which doesn't sound efficient to me.

Maybe it performs well if the objects in question are big enough, or if they can be bundled into static arrays so there's no need to refcount individual objects. But even then, you can't just allocate and free dozens or hundreds of megabytes with one call, unlike with the GC or manual `malloc`/`free`. I honestly don't know whether calling malloc/free for, say, each 64 KiB would have performance implications over a single allocation.
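One crude way to get a feel for that question is to time both strategies directly through libc's malloc/free. A rough sketch (Python's per-call overhead dominates the chunked loop, so treat the result only as an upper bound on the real difference, not as a proper benchmark):

```python
import ctypes
import ctypes.util
import time

libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.argtypes = [ctypes.c_void_p]

TOTAL = 64 * 1024 * 1024   # 64 MiB in one shot
CHUNK = 64 * 1024          # vs. the same amount in 64 KiB pieces

# One big allocation and release
t0 = time.perf_counter()
big = libc.malloc(TOTAL)
libc.free(big)
single = time.perf_counter() - t0

# The same amount of memory in 64 KiB chunks
t0 = time.perf_counter()
ptrs = [libc.malloc(CHUNK) for _ in range(TOTAL // CHUNK)]
for p in ptrs:
    libc.free(p)
chunked = time.perf_counter() - t0

print(f"single: {single:.6f}s  chunked ({len(ptrs)} calls): {chunked:.6f}s")
```

The chunked path also loses the ability to hand the whole block back to the OS in one `munmap`-sized release, which is the other half of the concern above.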
June 13
On 12.06.2024 23:56, bachmeier wrote:
> On Wednesday, 12 June 2024 at 20:37:36 UTC, drug007 wrote:
>> On 12.06.2024 21:57, bachmeier wrote:
>>> On Wednesday, 12 June 2024 at 18:36:26 UTC, Vinod K Chandran wrote:
>>>> On Wednesday, 12 June 2024 at 15:33:39 UTC, bachmeier wrote:
>>>>> A SafeRefCounted example with main marked @nogc:
>>>>>
>>>>> ```
>>>>> import std;
>>>>> import core.stdc.stdlib;
>>>>>
>>>>> struct Foo {
>>>>>   double[] data;
>>>>>   double * ptr;
>>>>>   alias data this;
>>>>>
>>>>>   @nogc this(int n) {
>>>>>     ptr = cast(double*) malloc(n*double.sizeof);
>>>>>     data = ptr[0..n];
>>>>>     printf("Data has been allocated\n");
>>>>>   }
>>>>>  }
>>>>>
>>>>> ```
>>>>>
>>>> Why not just use `ptr` ? Why did you `data` with `ptr` ?
>>>
>>> Try `foo[10] = 1.5` and `foo.ptr[10] = 1.5`. The first correctly throws an out of bounds error. The second gives `Segmentation fault (core dumped)`.
>>
>> I think you can use data only because data contains data.ptr
> 
> Yes, but you get all the benefits of `double[]` for free if you do it that way, including the more concise foo[10] syntax.

I meant you do not need to add a `ptr` field at all:
```D
import std;
import core.stdc.stdlib;

struct Foo {
@nogc:
    double[] data;
    alias data this;

    this(int n)
    {
        auto ptr = cast(double*) malloc(n*double.sizeof);
        data = ptr[0..n];
    }
}

@nogc void main() {
    auto foo = SafeRefCounted!Foo(3);
    foo[0..3] = 1.5;
    printf("%f %f %f\n", foo[0], foo[1], foo[2]);
    foo.ptr[10] = 1.5; // no need for separate ptr field
}
```
June 12

On Wednesday, 12 June 2024 at 21:36:30 UTC, Dukc wrote:

>

bachmeier kirjoitti 12.6.2024 klo 18.21:

>

You're splitting things into GC-allocated memory and manually managed memory. There's also SafeRefCounted, which handles the malloc and free for you.

I suspect SafeRefCounted (or RefCounted) is not the best fit for this scenario. The problem with it is that it mallocs and frees individual objects, which doesn't sound efficient to me.

Maybe it performs well if the objects in question are big enough, or if they can be bundled into static arrays so there's no need to refcount individual objects. But even then, you can't just allocate and free dozens or hundreds of megabytes with one call, unlike with the GC or manual malloc/free. I honestly don't know whether calling malloc/free for, say, each 64 KiB would have performance implications over a single allocation.

Why would it be different from calling malloc and free manually? I guess I'm not understanding, because you put the same calls to malloc and free that you'd otherwise be doing inside this and ~this.