August 31, 2008
Fawzi Mohamed:
> I would have done it quite differently: keep a linked list (if you want
> to improve of batches of 10 arrays), with a pointer to the end, append
> in O(1), and keep the total length.
> only upon a call to toArray allocate an array of the requested length
> and fill it.
> An array (even with tricks) is not a good structure to append to, so it
> seems a bad idea to me to use it.

Yes, my code was experimental, my main purpose was to ask if it's correct, in its GC management, etc. Algorithmic and code tuning is something I think about later.

I'm now implementing ArrayBuilder with a deque-like, to see how it performs in comparison.
There are 3 possible things you may want to join to such ArrayBuilder:
- Fixed size arrays
- Single items
- Dynamic arrays

For the dynamic arrays can be copied just a reference, for the other is better to copy the data. The single items can be added to short (64 KB) structs with fixed size arrays inside, organized into a linked list. The static arrays can equally be copied in pieces into such blocks. The dynamic arrays instead require short arrays filled with the struct of the dynamic arrays. This makes things messy. If the user keeps alternating single items to dynamic arrays the data structure has to avoid wasting too much memory. So I don't see an easy solution yet, beside my original one or a similar one (that consists in copying all items of dynamic arrays too).

Bye and thank you,
bearophile
August 31, 2008
dsimcha <dsimcha@yahoo.com> wrote:
> static if (IsType!(Mutable!(Types[0]), bool, byte, ubyte, short, ushort, int, uint,
>                            long, ulong, float, double, real, ifloat, idouble,
>                            ireal, cfloat, cdouble, creal, char, dchar, wchar) )
> 
> Also, although it may be the least bad way to do this (I can't think of a better one), the process of elimination method this relies on seems kind of brittle because it would have to be modified for any new type that is added.  If anyone knows of a less kludge-y way, please post.

I may be missing something, but the following seems to work:

template Category(T)
{
	static if (
		is(T == interface) ||
		is(T == class) ||
		is(T Base : Base*))
	{
		const Category = "reference";
	}
	else
		const Category = "value";
}

template Category(T : T[U], U)
{
	const Category = Category!(T);
}

You need to extend this for structs and unions but that seems routine...

-- 
SnakE
August 31, 2008
Sergey Gromov <snake.scaly@gmail.com> wrote:
> template Category(T)
> {
> 	static if (
> 		is(T == interface) ||
> 		is(T == class) ||
> 		is(T Base : Base*))
> 	{
> 		const Category = "reference";
> 	}
> 	else
> 		const Category = "value";
> }
> 
> template Category(T : T[U], U)
> {
> 	const Category = Category!(T);
> }

Fixed arrays: dynamic and associative arrays always contain references, static arrays must be checked recursively:

template Category(T)
{
	static if (
		is(T == interface) ||
		is(T == class) ||
		is(T Elem : Elem[]) ||	// dyn array
		is(T Base : Base*))
	{
		const Category = "reference";
	}
	else
		const Category = "value";
}

template Category(T : T[U], U)
{
	// AAs always contain references
	const Category = "reference";
}

template Category(T : T[U], size_t U)
{
	// Static arrays may or may not contain references
	const Category = Category!(T);
}

-- 
SnakE
August 31, 2008
On 2008-08-31 17:37:13 +0200, bearophile <bearophileHUGS@lycos.com> said:

> Fawzi Mohamed:
>> I would have done it quite differently: keep a linked list (if you want
>> to improve of batches of 10 arrays), with a pointer to the end, append
>> in O(1), and keep the total length.
>> only upon a call to toArray allocate an array of the requested length
>> and fill it.
>> An array (even with tricks) is not a good structure to append to, so it
>> seems a bad idea to me to use it.
> 
> Yes, my code was experimental, my main purpose was to ask if it's correct, in its GC management, etc. Algorithmic and code tuning is something I think about later.
> 
> I'm now implementing ArrayBuilder with a deque-like, to see how it performs in comparison.
> There are 3 possible things you may want to join to such ArrayBuilder:
> - Fixed size arrays
> - Single items
> - Dynamic arrays
> 
> For the dynamic arrays can be copied just a reference, for the other is better to copy the data. The single items can be added to short (64 KB) structs with fixed size arrays inside, organized into a linked list. The static arrays can equally be copied in pieces into such blocks. The dynamic arrays instead require short arrays filled with the struct of the dynamic arrays. This makes things messy. If the user keeps alternating single items to dynamic arrays the data structure has to avoid wasting too much memory. So I don't see an easy solution yet, beside my original one or a similar one (that consists in copying all items of dynamic arrays too).
> 
> Bye and thank you,
> bearophile

I had not thought about an important thing: the content of the array can change... so copying is safer.
Probably using (as you said) a linked list of fixed arrays is the best approach for many small appends, maybe append might have a bool argument "fixed=false" to know if copying can be avoided.

Fawzi

August 31, 2008
Fawzi Mohamed:
> I had not thought about an important thing: the content of the array can change... so copying is safer.

Yes, and there are other problems, that make the situation/code a mess, so I copy all items now, keeping code simpler. I'm debugging/benchmarking it now...

Later,
bearophile
September 01, 2008
On Mon, 01 Sep 2008 01:11:46 +0400, bearophile <bearophileHUGS@lycos.com> wrote:

> Fawzi Mohamed:
>> I had not thought about an important thing: the content of the array
>> can change... so copying is safer.
>
> Yes, and there are other problems, that make the situation/code a mess, so I copy all items now, keeping code simpler. I'm debugging/benchmarking it now...
>
> Later,
> bearophile

Please, test my version, too!
While it is slighly slower, it allows appending invariant arrays (i.e. strings) without copying data (although it is currently not implemented).

An idea is to store an array (or a list) of invariant elements.
Current buffer is mutable until it is completely full. It then casted to invariant and appended to the list and an new mutable buffer (of a higher size) created.

struct ArrayBuilder(T) {
    //private BearophileArrayBuilder!(invariant(T[])) subArrays;
    private invariant(T[])[] subArrays;
    private int lengthSoFar;
	
    private T[] tmpArray;
    private int curOffset = 0;

    void opCatAssign(T x) {
        int capacity = tmpArray.length;
        if (curOffset >= capacity) {
            int old_capacity = capacity;

            if (capacity == 0)
                capacity = 4; // starting point not too much small
            else if (capacity < (1 << 26)) // 67_108_864
                capacity *= 2; // for amortized O(1) and less RAM fragmentation
            else
                capacity *= 1.3; // slower growing rate to save RAM

            if (old_capacity != 0) {
                subArrays ~= cast(invariant(T[]))tmpArray;
            }

            tmpArray = new T[capacity];
			curOffset = 0;
        }

        tmpArray[curOffset] = x;
        assert(tmpArray[curOffset] == x);
        ++curOffset;
        ++lengthSoFar;
    }

    ///
    T[] toarray() {
        T[] array = new T[lengthSoFar];
        int offset = 0;
        foreach (subArray; subArrays) {
            int newOffset = offset+subArray.length;
            array[offset..newOffset] = subArray[];
            offset = newOffset;
        }
        array[offset..$] = tmpArray[0..curOffset];
        return array;
    }
}
September 01, 2008
bearophile wrote:
> Christian Kamm:
>> Actually, I don't think the D spec requires typeof(T.init) == T.
> 
> Maybe the D specs have to be fixed then.
> 

This should do what you want:

template Init(T) { const T Init; }
September 01, 2008
Hi

I came across this the other day, and no one has mentioned this, on this news group before, I thought I would bring the subject to the communities attention, so to speak.

Google has very recently, open sourced "Protocol Buffers".

What is it you ask ?  In a couple of lines it is a language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more.

Think XML, but smaller, faster, and simpler.

Why not just use XML ?

Protocol buffers have many advantages over XML for serializing structured data.  Protocol buffers:
* are simpler
* are 3 to 10 times smaller
* are 20 to 100 times faster
* are less ambiguous
* generate data access classes that are easier to use programmatically.

PB supports Java, Python,and C++ currently.

A more detailed overview can be found here:
http://code.google.com/apis/protocolbuffers/docs/overview.html

and the FAQ here:
http://code.google.com/apis/protocolbuffers/docs/faq.html

See the answer to the question "Can I add support for a new language to protocol buffers?" inside the FAQ.

Some Tips and comments can be found here:
http://zunger.livejournal.com/164024.html


My questions.
Does the D community see this of interest ?
Is this something they might use ?
Do they see value in it being added to the respective
Tango or Phobos frameworks ?

any other comments ?

cheers

Nick B.
September 01, 2008
Nick B wrote:

> Hi
> 
> I came across this the other day, and no one has mentioned this, on this news group before, I thought I would bring the subject to the communities attention, so to speak.

That is well and fine, but you shouldn't hijack a thread for it, rather start a new one.

> Google has very recently, open sourced "Protocol Buffers".
> 
> What is it you ask ?  In a couple of lines it is a language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more.
> 
> Think XML, but smaller, faster, and simpler.
> 
> Why not just use XML ?
> 
> Protocol buffers have many advantages over XML for serializing
> structured data.  Protocol buffers:
> * are simpler
> * are 3 to 10 times smaller
> * are 20 to 100 times faster

Not necessarily a valid point if you use the right parser, say, one from Tango.

> * are less ambiguous
> * generate data access classes that are easier to use programmatically.
> 
> PB supports Java, Python,and C++ currently.
> 
> A more detailed overview can be found here: http://code.google.com/apis/protocolbuffers/docs/overview.html
> 
> and the FAQ here: http://code.google.com/apis/protocolbuffers/docs/faq.html
> 
> See the answer to the question "Can I add support for a new language to protocol buffers?" inside the FAQ.
> 
> Some Tips and comments can be found here: http://zunger.livejournal.com/164024.html
> 
> 
> My questions.
> Does the D community see this of interest ?
> Is this something they might use ?
> Do they see value in it being added to the respective
> Tango or Phobos frameworks ?

If the protocol is widely used, then it would certainly be beneficial to support it, and I would assume that it is fairly easy to implement on top of Tango.

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango
September 01, 2008
Denis Koroskin:
> Please, test my version, too!
> While it is slighly slower, it allows appending invariant arrays (i.e.
> strings) without copying data (although it is currently not implemented).

Sorry, I'm not using D 2.x. I (hopefully) will soon show an improved version of my code, so you can test your version.

Note: I have benchmarked the version with the deque-like data structure, and it's slower, the code is a bit more complex (and I may have hit a bug in the GC (I'd like to know if Tango behaves at the same way)), so I'll probably just use the first version, with the reallocs.

----------------

With Phobos:

import std.stdio: writefln;
import std.gc: malloc;

void main() {
    int M = 1 << 16;
    int N = M / int.sizeof;
    const int N_POINTERS = 18;

    int*[N_POINTERS] pointers;
    foreach (ref ptr; pointers) {
        ptr = cast(int*)malloc(N * int.sizeof);
        if (ptr is null) {
            throw new Error("OUT OF MEM");
            return 1;
        }
    }

    int last = 0;
    foreach (i, ptr; pointers) {
        int int_ptr = cast(int)ptr;
        int diff = int_ptr - last;
        writefln(i, " ", int_ptr, " ", diff, " ", M - diff);
        last = int_ptr;
    }
}

Prints:

0 9052160 9052160 -8986624
1 9117696 65536 0
2 9183232 65536 0
3 9248768 65536 0
4 9314304 65536 0
5 9379840 65536 0
6 9445376 65536 0
7 9510912 65536 0
8 9576448 65536 0
9 9641984 65536 0
10 9707520 65536 0
11 9773056 65536 0
12 9838592 65536 0
13 9904128 65536 0
14 9969664 65536 0
15 10092544 122880 -57344
16 10158080 65536 0
17 10223616 65536 0

Is that result at n = 15 normal/correct?

Bye,
bearophile