Thread overview
how to write a string to a c pointer?
Mar 05, 2015  zhmt
Mar 05, 2015  Kagamin
Mar 05, 2015  zhmt
Mar 05, 2015  FG
Mar 05, 2015  Kagamin
Mar 05, 2015  FG
Mar 05, 2015  ketmar
Mar 06, 2015  Ali Çehreli
Mar 06, 2015  Kagamin
Mar 06, 2015  FG
March 05, 2015
I am writing an asio binding. Objects need to be serialized into a buffer (void *),

for example, write utf8 string into buffer,
write int into buffer,
write long into buffer,

Here is my class

import core.stdc.stdlib : malloc, free;

class Buffer
{
	private void *ptr;
	private int size;
	private int _cap;

	public this(int cap)
	{
		ptr = malloc(cap);
		this._cap = cap;
	}

	public ~this()
	{
		free(ptr);
	}

	public ubyte[] asArray()
	{
		ubyte[] ret = (cast(ubyte*)ptr)[0..cap];
		return ret;
	}

	public void* getPtr()
	{
		return ptr;
	}

	public int cap()
	{
		return _cap;
	}
}

How can I write a UTF-8 string into the buffer?
March 05, 2015
string s;
char[] b = cast(char[])asArray();
b[0..s.length] = s[];
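
The original question also asked about writing an int and a long. A minimal sketch of one way to do that, assuming the slice returned by asArray() is the target. The helper names putInt, putLong and putString are hypothetical, and little-endian byte order is an arbitrary choice for illustration (use whatever the receiving side expects):

```d
import std.bitmanip : write;
import std.system : Endian;

// Hypothetical helpers: each writes `value` at `offset` into `buf`
// and returns the offset just past the bytes it wrote.
size_t putInt(ubyte[] buf, size_t offset, int value)
{
    buf.write!(int, Endian.littleEndian)(value, offset);
    return offset + int.sizeof;
}

size_t putLong(ubyte[] buf, size_t offset, long value)
{
    buf.write!(long, Endian.littleEndian)(value, offset);
    return offset + long.sizeof;
}

size_t putString(ubyte[] buf, size_t offset, string s)
{
    // A D string is already UTF-8, so its bytes can be copied as-is.
    // The destination slice must lie inside `buf`; with bounds checking
    // enabled, a too-small buffer is reported as a range violation.
    buf[offset .. offset + s.length] = cast(const(ubyte)[]) s;
    return offset + s.length;
}
```

With the Buffer class from the question this could be called as putInt(buffer.asArray(), 0, 42), carrying the returned offset forward into the next call.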
March 05, 2015
On Thursday, 5 March 2015 at 09:42:53 UTC, Kagamin wrote:
> string s;
> char[] b = cast(char[])asArray();
> b[0..s.length] = s[];

Thank you very much. I should stop developing and read the dlang tutorial again.
March 05, 2015
On 2015-03-05 at 10:42, Kagamin wrote:
> string s;
> char[] b = cast(char[])asArray();
> b[0..s.length] = s[];

It's a bit more complicated than that if you also want to truncate the string for buffers with a smaller capacity, do so while respecting UTF-8, and add a '\0' sentinel, since you may want to use the string in C (if I assume correctly). The setString function does all that:



import std.stdio, std.algorithm, core.stdc.stdlib;

class Buffer {
    private void *ptr;
    private int size;
    private int _cap;

    public this(int cap) { ptr = malloc(cap); this._cap = cap; }
    public ~this() { free(ptr); }
    public ubyte[] asArray() { ubyte[] ret = (cast(ubyte*)ptr)[0..cap]; return ret; }
    public void* getPtr() { return ptr; }
    public int cap() { return _cap; }
}

int setString(Buffer buffer, string s)
{
    assert(buffer.cap > 0);
    char[] b = cast(char[])buffer.asArray();
    int len = cast(int) min(s.length, buffer.cap - 1);
    int break_at;
    // The dchar is essential in walking over UTF-8 code points.
    // break_at will hold the last position at which the string can be cleanly cut
    foreach (int i, dchar v; s) {
        if (i == len) { break_at = i; break; }
        if (i > len) break;
        break_at = i;
    }
    len = break_at;
    b[0..len] = s[0..len];

    // add a sentinel if you want to use the string in C
    b[len] = '\0';
    // you could at this point set buffer.size to len in order to use the string in D
    return len;
}

void main()
{
    string s = "ąćęłńóśźż";
    foreach (i; 1..24) {
        Buffer buffer = new Buffer(i);
        int len = setString(buffer, s);
        printf("bufsize %2d -- strlen %2d -- %s --\n", i, len, buffer.getPtr);
    }
}



Output of the program:

bufsize  1 -- strlen  0 --  --
bufsize  2 -- strlen  0 --  --
bufsize  3 -- strlen  2 -- ą --
bufsize  4 -- strlen  2 -- ą --
bufsize  5 -- strlen  4 -- ąć --
bufsize  6 -- strlen  4 -- ąć --
bufsize  7 -- strlen  6 -- ąćę --
bufsize  8 -- strlen  6 -- ąćę --
bufsize  9 -- strlen  8 -- ąćęł --
bufsize 10 -- strlen  8 -- ąćęł --
bufsize 11 -- strlen 10 -- ąćęłń --
bufsize 12 -- strlen 10 -- ąćęłń --
bufsize 13 -- strlen 12 -- ąćęłńó --
bufsize 14 -- strlen 12 -- ąćęłńó --
bufsize 15 -- strlen 14 -- ąćęłńóś --
bufsize 16 -- strlen 14 -- ąćęłńóś --
bufsize 17 -- strlen 16 -- ąćęłńóśź --
bufsize 18 -- strlen 16 -- ąćęłńóśź --
bufsize 19 -- strlen 16 -- ąćęłńóśź --
bufsize 20 -- strlen 16 -- ąćęłńóśź --
bufsize 21 -- strlen 16 -- ąćęłńóśź --
bufsize 22 -- strlen 16 -- ąćęłńóśź --
bufsize 23 -- strlen 16 -- ąćęłńóśź --
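
Note that in the output above, buffers of 19 bytes and more still report strlen 16 and drop the final ż. When the whole string fits, the loop never reaches i == len, so break_at stays at the start of the last code point. A minimal sketch of one way around that, using stride from std.utf to step over whole code points; the helper name utf8SafeLength is hypothetical and not part of the program above:

```d
import std.utf : stride;

// Hypothetical helper: returns the largest byte length <= maxBytes at which
// `s` can be cut without splitting a UTF-8 code point.
size_t utf8SafeLength(string s, size_t maxBytes)
{
    size_t pos = 0;
    while (pos < s.length)
    {
        // Number of UTF-8 code units in the code point starting at `pos`.
        immutable step = stride(s, pos);
        if (pos + step > maxBytes)
            break;
        pos += step;
    }
    return pos;
}
```

In setString this would replace the foreach loop, called as utf8SafeLength(s, buffer.cap - 1) so that one byte stays reserved for the '\0' sentinel.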


March 05, 2015
On Thursday, 5 March 2015 at 13:57:45 UTC, FG wrote:
> void main()
> {
>     string s = "ąćęłńóśźż";

Try with string s = "ąc\u0301ęłńóśźż";
March 05, 2015
On 2015-03-05 at 15:18, Kagamin wrote:
> On Thursday, 5 March 2015 at 13:57:45 UTC, FG wrote:
>> void main()
>> {
>>     string s = "ąćęłńóśźż";
>
> Try with string s = "ąc\u0301ęłńóśźż";

Yeah, I see your point: ą, ąc (missing diacritic), ąć, ąćę, ...
Damn those composite characters!
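
Cutting at code point boundaries is not enough for such input, because c followed by U+0301 forms a single user-perceived character. A sketch of cutting at grapheme cluster boundaries instead, using graphemeStride from std.uni; the helper name graphemeSafeLength is hypothetical:

```d
import std.uni : graphemeStride;

// Hypothetical helper: returns the largest byte length <= maxBytes at which
// `s` can be cut without splitting a grapheme cluster
// (a base character plus its combining marks).
size_t graphemeSafeLength(string s, size_t maxBytes)
{
    size_t pos = 0;
    while (pos < s.length)
    {
        // Number of code units in the grapheme cluster starting at `pos`.
        immutable step = graphemeStride(s, pos);
        if (pos + step > maxBytes)
            break;
        pos += step;
    }
    return pos;
}
```

For "ąc\u0301ęłńóśźż" the cluster c + U+0301 occupies bytes 2 to 4, so limits of 3 or 4 bytes cut after ą, and only a limit of 5 bytes keeps the accented c. This still does nothing for invisible characters or directionality controls.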
March 05, 2015
On Thu, 05 Mar 2015 16:36:35 +0100, FG wrote:

> Damn those composite characters!

or invisible ones. or RTL switch.

unicode sux[1].

[1] http://file.bestmx.net/ee/articles/uni_vs_code.pdf

March 06, 2015
On 03/05/2015 03:25 PM, ketmar wrote:

> unicode sux[1].
>
> [1] http://file.bestmx.net/ee/articles/uni_vs_code.pdf

Thanks. I enjoyed the article and I agree with everything said in there.

It made me happy that I was not the only person who has been ruminating over "alphabet" as the crucial piece in this whole Unicode story. I've been giving the example of if I have a company name as the string "ali & jim", the uppercase of it should be "ALİ & JIM" because the different letter 'i's belong to different alphabets. Anyway...

Here is how I attempted to define an alphabet with its implied collation orders. For example, for the Turkish alphabet:

  https://code.google.com/p/trileri/source/browse/trunk/tr/alfabe.d#796

Unfortunately, the code itself is in Turkish, was never finished, is bad, older D code, and is abandoned at this point. :-/
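
To make the "ali & jim" example concrete, here is a minimal sketch (not the code from the link above) of an alphabet-aware uppercase. The helper name toUpperTurkish is hypothetical, and it handles only the dotted/dotless i pair, falling back to std.uni.toUpper for everything else:

```d
import std.array : appender;
import std.uni : toUpper;

// Hypothetical helper: uppercases `s` with the Turkish i rules,
// where 'i' maps to 'İ' (U+0130) and 'ı' (U+0131) maps to 'I'.
string toUpperTurkish(string s)
{
    auto result = appender!string();
    foreach (dchar c; s)
    {
        dchar up;
        switch (c)
        {
            case 'i':      up = '\u0130';   break; // dotted capital İ
            case '\u0131': up = 'I';        break; // dotless ı
            default:       up = toUpper(c); break; // default Unicode mapping
        }
        result.put(up);
    }
    return result.data;
}
```

The caller still has to know which alphabet each part belongs to: toUpperTurkish("ali") ~ " & " ~ toUpper("jim") gives "ALİ & JIM", while toUpper("ali") alone gives "ALI".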

Ali

March 06, 2015
On Friday, 6 March 2015 at 00:53:49 UTC, Ali Çehreli wrote:
> It made me happy that I was not the only person who has been ruminating over "alphabet" as the crucial piece in this whole Unicode story. I've been giving the example of if I have a company name as the string "ali & jim", the uppercase of it should be "ALİ & JIM" because the different letter 'i's belong to different alphabets.

I'd say a company name should be processed verbatim; no need for uppercasing should arise.
March 06, 2015
On 2015-03-06 at 00:25, ketmar wrote:
> unicode sux[1].
>
> [1] http://file.bestmx.net/ee/articles/uni_vs_code.pdf


Great article. Thanks, Кетмар

  ⚠     ∑ ♫ ⚽ ☀ ☕ ☺  ≡  ♛