Thread overview
dhtslib v0.12.0 (high-throughput sequencing library)
Sep 01, 2021
James Blachly
Sep 02, 2021
Johan
Sep 02, 2021
James Blachly
September 01, 2021
I'm delighted to finally post an official announcement of our package for high-throughput sequencing (HTS), also called Next-generation sequencing (NGS): `dhtslib`. It's not a very clever name, and we are working on a new one. ;)

https://github.com/blachlylab/dhtslib/
https://code.dlang.org/packages/dhtslib

Once upon a time, BioD[1] was fairly active, but I am afraid D is not heavily used in bioinformatics and computational biology, especially in high-throughput (genome) sequencing applications, compared to its peers.[2] However, our group (cancer genomics) has found D an ideal language: it is easy for Python programmers to pick up, yet retains powerful features for C/C++ programmers.

`dhtslib` began as a thin wrapper over the ubiquitous but very low-level and hard-to-use `htslib` C library (https://github.com/samtools/htslib/). We use `dhtslib` extensively in both public and private projects for computational biology, and over the years it has grown from simply a (huge) set of `extern (C)` definitions into a fully featured, RAII-enabled bioinformatics package focused on genome sequencing. If you are working in this field, or know someone open to D who works in this field, I strongly encourage you to point them at `dhtslib`!

 * `htslib` namespace with complete bindings to htslib
 * `dhtslib` namespace with high-level object-oriented interfaces, many using underlying htslib calls for high performance, but exposed via convenient and idiomatic D, including RAII, forward ranges, etc.
 * htslib-backed read/write of SAM/BAM/CRAM, VCF/BCF
 * Readers for BED and GFF3/GTF (not part of htslib)
 * FASTQ streamer
 * CIGAR manipulations
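
To illustrate the general pattern behind "RAII plus ranges over a C handle", here is a minimal sketch. It is not dhtslib's API: `LineReader` and `countLines` are hypothetical names, and libc's `FILE*` stands in for an htslib handle. It also models only a plain input range, not a forward range (there is no `save`).

```d
import core.stdc.stdio : FILE, fclose, fgets, fopen;
import std.exception : enforce;
import std.string : fromStringz, toStringz;

/// Hypothetical RAII wrapper: owns a C handle and exposes it as an input range.
/// FILE* is only a stand-in here for a handle from a C library such as htslib.
struct LineReader
{
    private FILE* fp;
    private char[4096] buf; // lines longer than the buffer arrive in chunks
    private bool done;

    @disable this(this); // non-copyable, so exactly one owner closes the handle

    this(string path)
    {
        fp = fopen(path.toStringz, "r");
        enforce(fp !is null, "cannot open " ~ path);
        popFront(); // prime the range with the first line
    }

    ~this()
    {
        // Runs automatically at scope exit, even when an exception is thrown.
        if (fp !is null)
            fclose(fp);
    }

    bool empty() const { return done; }

    /// Current line (including its trailing newline); valid until popFront.
    const(char)[] front() const { return fromStringz(buf.ptr); }

    void popFront()
    {
        done = fgets(buf.ptr, cast(int) buf.length, fp) is null;
    }
}

/// Usage: count the lines (or buffer-sized chunks) in a file.
size_t countLines(string path)
{
    auto reader = LineReader(path);
    size_t n;
    // Explicit loop: foreach would copy the range, which is disabled above.
    for (; !reader.empty; reader.popFront())
        ++n;
    return n; // reader's destructor closes the file here
}
```

The caller never calls a close function by hand. The destructor releases the handle when `reader` goes out of scope, which is the convenience a RAII layer over a C library is meant to provide.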

The next version, v0.13.0, adds a novel feature "Typesafe Coordinates", which I'll post about separately in a moment!

Kind regards

James S Blachly, MD
The Ohio State University

[0] https://github.com/blachlylab/dhtslib/
    https://code.dlang.org/packages/dhtslib
[1] https://github.com/biod/BioD
[2] Here is a contemporary example of D used in high-throughput sequencing: DENTIST by Arne Ludwig at Max Planck institute
    https://github.com/a-ludi/dentist -- if you know of more, please let me know!
September 02, 2021
On Wednesday, 1 September 2021 at 05:27:38 UTC, James Blachly wrote:
> I'm delighted to finally post an official announcement of our package for high-throughput sequencing (HTS), also called Next-generation sequencing (NGS): `dhtslib`. It's not a very clever name, and we are working on a new one. ;)
>
> https://github.com/blachlylab/dhtslib/
>
> [...]
>
> [2] Here is a contemporary example of D used in high-throughput sequencing: DENTIST by Arne Ludwig at Max Planck institute
>     https://github.com/a-ludi/dentist

I am surprised to see the use of DMD (see the Dockerfile). If you want runtime performance, the first thing I would do is switch to LDC or GDC.

Perhaps DENTIST's particular use of D and dhtslib is mainly forwarding calls to htslib (C) and thus D performance is not relevant?

-Johan



September 02, 2021
On Thursday, 2 September 2021 at 10:32:19 UTC, Johan wrote:
> On Wednesday, 1 September 2021 at 05:27:38 UTC, James Blachly wrote:
>> [2] Here is a contemporary example of D used in high-throughput sequencing: DENTIST by Arne Ludwig at Max Planck institute
>>     https://github.com/a-ludi/dentist
>
> I am surprised to see the use of DMD (see the Dockerfile). If you want runtime performance, the first thing I would do is switch to LDC or GDC.
>
> Perhaps DENTIST's particular use of D and dhtslib is mainly forwarding calls to htslib (C) and thus D performance is not relevant?


DENTIST is someone else's unrelated project that does not, to my knowledge, use `dhtslib`.