Thread overview
range result in Tuple! and how to convert into assocArray by sort?
May 10, 2022
MichaelBi
May 10, 2022
rikki cattermole
May 10, 2022
Ali Çehreli
May 10, 2022
MichaelBi
May 11, 2022
Ali Çehreli
May 10, 2022
MichaelBi
May 12, 2022
forkit
May 10, 2022

s is the input string. The following line prints the result shown below it:

s.array.sort!("a<b").group.assocArray.byPair.array.sort!("a[0]<b[0]").each!writeln;

Tuple!(dchar, "key", uint, "value")('A', 231)
Tuple!(dchar, "key", uint, "value")('C', 247)
Tuple!(dchar, "key", uint, "value")('G', 240)
Tuple!(dchar, "key", uint, "value")('T', 209)

How can I then convert this into [['A',231],['C',247],['G',240],['T',209]]? I tried map!, but could only get either the key or the value. I tried array(), but then the result is no longer sorted. Thanks in advance.

May 10, 2022
If I am understanding the problem correctly, this is a super expensive method for doing something pretty simple. Even though it is a bit more code, the version below requires no memory allocation, which would not be cheap here (given how big DNA tends to be).

import std.stdio;

void main() {
	string s = "ACGTACGT";

	// One counter per base, in the order A, C, G, T.
	uint[4] counts;

	foreach(char c; s) {
		switch(c) {
			case 'A':
			case 'a':
				counts[0]++;
				break;
			case 'C':
			case 'c':
				counts[1]++;
				break;
			case 'G':
			case 'g':
				counts[2]++;
				break;
			case 'T':
			case 't':
				counts[3]++;
				break;
			default:
				// Any other character aborts the program.
				assert(0, "Unknown compound");
		}
	}

	writeln(counts);
}
May 09, 2022
On 5/9/22 20:38, rikki cattermole wrote:

> this is a super expensive
> method for doing something pretty simple.

Yes! :)

Assuming the data is indeed validated in some way, the following should be even faster. It validates the data after the fact:

import std.stdio;
import std.range;
import std.exception;
import std.algorithm;
import std.format;

const ulong[] alphabet = [ 'A', 'C', 'G', 'T' ];

void main() {
  string s = "ACGTACGT";

  auto counts = new ulong[char.max];

  foreach(char c; s) {
    counts[c]++;
  }

  validateCounts(counts);
  writeln(counts.indexed(alphabet));
}

void validateCounts(ulong[] counts) {
  // The other elements should all be zero.
  enforce(counts
          .enumerate
          .filter!(t => !alphabet.canFind(t.index))
          .map!(t => t.value)
          .sum == 0,
          format!"There were illegal letters in the data: %s"(counts));
}

Ali
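
For readers unfamiliar with std.range.indexed, which the code above uses to print only the four relevant counters: it reads a random-access range at a given list of positions, in the order the positions are listed. A minimal sketch follows; the helper name pick and the sample numbers are made up for illustration and are not part of the original post.

```d
import std.array : array;
import std.range : indexed;
import std.stdio : writeln;

// Hypothetical helper: returns the elements of `table` found at
// `positions`, in the order the positions are listed.
ulong[] pick(ulong[] table, size_t[] positions) {
    return table.indexed(positions).array;
}

void main() {
    // A small stand-in for the 256-slot counts table.
    ulong[] table = [5, 0, 7, 0, 9];

    // Read slots 4, 0 and 2, in that order.
    writeln(pick(table, [4, 0, 2])); // prints [9, 5, 7]
}
```

In the post above the positions are the character codes of 'A', 'C', 'G' and 'T', so the printed range holds the counts for those four letters.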

May 10, 2022
On Tuesday, 10 May 2022 at 03:38:08 UTC, rikki cattermole wrote:
> If I am understanding the problem correctly, this is a super expensive method for doing something pretty simple. Even if it is a bit more code, this won't require memory allocation which in this case wouldn't be cheap (given how big DNA tends to be).
>
> string s = "ACGTACGT";
>
> uint[4] counts;
>
> foreach(char c; s) {
> 	switch(c) {
> 		case 'A':
> 		case 'a':
> 			counts[0]++;
> 			break;
> 		case 'C':
> 		case 'c':
> 			counts[1]++;
> 			break;
> 		case 'G':
> 		case 'g':
> 			counts[2]++;
> 			break;
> 		case 'T':
> 		case 't':
> 			counts[3]++;
> 			break;
> 		default:
> 			assert(0, "Unknown compound");
> 	}
> }
>
> writeln(counts);

Yes, thanks, I understand this. My problem now is that after learning D I always think in terms of ranges and function composition... and forget the basic algorithm :)
May 10, 2022
On Tuesday, 10 May 2022 at 04:21:04 UTC, Ali Çehreli wrote:
> On 5/9/22 20:38, rikki cattermole wrote:
>
> > [...]
>
> Yes! :)
>
> Assuming the data is indeed validated in some way, the following should be even faster. It validates the data after the fact:
>
> [...]

This is cool! Thanks for your time, and I really like your book Programming in D :)
May 11, 2022
On 5/9/22 22:12, MichaelBi wrote:
> On Tuesday, 10 May 2022 at 04:21:04 UTC, Ali Çehreli wrote:
>> On 5/9/22 20:38, rikki cattermole wrote:
>>
>> > [...]
>>
>> Yes! :)
>>
>> Assuming the data is indeed validated in some way, the following
>> should be even faster. It validates the data after the fact:
>>
>> [...]
>
> this is cool!

I've been meaning to write about a bug in my code, which would likely cause zero issues, and which you've probably already fixed. ;)

BAD:  auto counts = new ulong[char.max];

GOOD: auto counts = new ulong[char.max - char.min + 1];
FINE: auto counts = new ulong[256];
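
A short sketch of why the BAD line is off by one: char.max is 255, so an array of that length only has the indices 0 to 254, and a char with the value 255 would index past the end. The helper name slotsForChar is an assumption made for this illustration.

```d
import std.stdio : writeln;

// Hypothetical helper: number of slots needed so that every
// possible char value is a valid index.
size_t slotsForChar() {
    return char.max - char.min + 1;
}

void main() {
    // char is an 8-bit code unit, so its values run from 0 to 255.
    static assert(char.min == 0);
    static assert(char.max == 255);

    // Length char.max gives the valid indices 0 .. 254 only.
    auto tooShort = new ulong[char.max];
    assert(tooShort.length == 255);

    // One slot per possible value: valid indices 0 .. 255.
    auto counts = new ulong[slotsForChar()];
    assert(counts.length == 256);

    // In bounds here; the same index would be out of range for tooShort.
    counts[char.max]++;
    writeln(counts.length);
}
```

The byte 255 never occurs in valid UTF-8 text, which is presumably why the bug would rarely show up in practice.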

> thanks for your time and i really like your book
> Programming in D :)

Yay! :)

Ali

May 12, 2022
On Tuesday, 10 May 2022 at 03:22:04 UTC, MichaelBi wrote:
> s is the string, and print result as following:
>
> s.array.sort!("a<b").group.assocArray.byPair.array.sort!("a[0]<b[0]").each!writeln;
>
> Tuple!(dchar, "key", uint, "value")('A', 231)
> Tuple!(dchar, "key", uint, "value")('C', 247)
> Tuple!(dchar, "key", uint, "value")('G', 240)
> Tuple!(dchar, "key", uint, "value")('T', 209)
>
> then how to transfer into [['A',231],['C',247],['G',240],['T',209]]? tried map!, but can only sortout key or value... tried array(), but result is not sorted then...thanks in advance.

Adding tuples to an AA is easy.

Sorting the output of an AA is the tricky part.

// -----

module test;
@safe:

import std;

void main()
{
    uint[dchar] myAA;
    Tuple!(dchar, uint) myTuple;

    myTuple[0] = 'C'; myTuple[1] = 247;
    myAA[ myTuple[0] ] = myTuple[1];

    myTuple[0] = 'G'; myTuple[1] = 240;
    myAA[ myTuple[0] ] = myTuple[1];

    myTuple[0] = 'A'; myTuple[1] = 231;
    myAA[ myTuple[0] ] = myTuple[1];

    myTuple[0] = 'T'; myTuple[1] = 209;
    myAA[ myTuple[0] ] = myTuple[1];

    // NOTE: associative arrays do not preserve the order of the keys inserted into the array.
    // See: https://dlang.org/spec/hash-map.html

    // if we want the output of an AA to be sorted (by key)..
    string[] orderedKeyPairSet;

    foreach(ref key, ref value; myAA.byPair)
        orderedKeyPairSet ~= key.to!string ~ ":" ~ value.to!string;

    orderedKeyPairSet.sort;

    foreach(ref str; orderedKeyPairSet)
        writeln(str);

    /+
    A:231
    C:247
    G:240
    T:209
   +/

}

// --------
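
As a possible alternative to converting the pairs to strings before sorting, here is a minimal sketch that keeps them as tuples. It rests on two observations: group on a sorted range already yields the (letter, count) tuples in key order, so the associative array step from the original question can be skipped, and the tuples produced by byPair can be sorted directly on their key field. The function names letterCounts and sortedPairs are made up for this example. D arrays hold a single element type, so an array of tuples is about the closest match to the [['A',231], ...] shape asked for.

```d
import std.algorithm : group, sort;
import std.array : array, byPair;
import std.stdio : writefln;

// Hypothetical helper: (letter, count) tuples in ascending letter order.
// s.array decodes the string into a dchar[], which sort can work on in place.
auto letterCounts(string s) {
    return s.array.sort.group.array;
}

// Hypothetical helper: the pairs of an associative array, sorted by key.
auto sortedPairs(uint[dchar] aa) {
    auto pairs = aa.byPair.array;
    pairs.sort!((a, b) => a.key < b.key);
    return pairs;
}

void main() {
    // Straight from the string, without an associative array in between.
    foreach (t; letterCounts("GATTACA"))
        writefln("%s: %s", t[0], t[1]);

    // Starting from an existing associative array.
    uint[dchar] myAA;
    myAA['C'] = 247;
    myAA['G'] = 240;
    myAA['A'] = 231;
    myAA['T'] = 209;

    foreach (p; sortedPairs(myAA))
        writefln("%s: %s", p.key, p.value);
}
```

Both helpers allocate an array, so for very large inputs the counting loops posted earlier in the thread remain the cheaper choice.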