April 26, 2006 Re: Maximum array size | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ivan Kazmenko | Ivan Kazmenko wrote:
> Secondly, for any number N, say, 1 <= N <= 100_000_000, I need to have a fast way to check whether it is prime.
Sounds to me like you want an associative array. I.e:
bool[int] primes;
primes[769] = true;
if (769 in primes) {
...
} else {
...
}
The only annoyance with this is that you might as well set primes[769] to false, as you don't care about the value, only whether the key exists.
|
April 26, 2006 Re: Maximum array size | ||||
---|---|---|---|---|
| ||||
Posted in reply to jcc7 | jcc7:
>>Besides, having such an undocumented feature as a 16M limit on an array length is IMHO no good and deserves at least an explanation by someone familiar with the dmd code.
>Are you sure it's undocumented?
>
>See http://www.digitalmars.com/d/arrays.html, "Static Arrays" section
>
>"The total size of a static array cannot exceed 16Mb. A dynamic array should be used instead for such large arrays."
Thanks for pointing that out! The document I used is "spec_DMD_0.109.pdf", it's outdated and does not seem to contain that limit; where could I find a more up-to-date specification? The site docs don't build up to a complete specification, do they?
Still, a question remains: what's the reason for such a limit?
Ivan Kazmenko.
|
April 26, 2006 Re: Maximum array size | ||||
---|---|---|---|---|
| ||||
Posted in reply to Deewiant | Deewiant wrote: > Ivan Kazmenko wrote: >> Secondly, for any number N, say, 1 <= N <= 100_000_000, I need to have a fast way to check whether it is prime. > > Sounds to me like you want an associative array. I.e: > > bool[int] primes; > > primes[769] = true; > > if (769 in primes) { > ... > } else { > ... > } > > The only annoyance with this is that you might as well set primes[769] to false, as you don't care about the value, only whether the key exists. They might look clean and elegant, but associative arrays are terribly slow in this particular case. A hash map would be fast, O(1), if all the primes were already known, but this isn't the case here. A tree implementation would have O(log n) average complexity and adding keys to a hash map would be quite suboptimal too (resizing, collisions). -- Jari-Matti |
April 26, 2006 Re: Maximum array size | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ivan Kazmenko | In article <e2odih$1akv$1@digitaldaemon.com>, Ivan Kazmenko says... > >jcc7: >>>Besides, having such an undocumented feature as a 16M limit on an array length is IMHO no good and deserves at least an explanation by someone familiar with the dmd code. >>Are you sure it's undocumented? >> >>See http://www.digitalmars.com/d/arrays.html, "Static Arrays" section >> >>"The total size of a static array cannot exceed 16Mb. A dynamic array should be used instead for such large arrays." >Thanks for pointing that out! You're quite welcome. >The document I used is "spec_DMD_0.109.pdf", it's >outdated and does not seem to contain that limit; where could I find a more >up-to-date specification? The site docs don't build up to a complete >specification, do they? Well, spec_DMD_0.109.pdf is just an unofficial snapshot of the docs that existed back in December 2004. I might create another PDF sometime, but I think a lot of pages have been added since I made that PDF (and I'd have to remember how I did it). But the docs on the digitalmars website and included with DMD are the official version, so those are the best ones to use anyway (if possible). There are other PDF snapshots listed at http://www.prowiki.org/wiki4d/wiki.cgi?LanguageSpecification, but you've already found the newest one that I know of. >Still, a question remains: what's the reason for such a limit? I don't know what the reason is, but I'm sure there is one. It's probably related to issues such as RAM and garbage collection. It might be there because it's better to stop this practice at compile time than run the high risk of running out of memory at runtime. But I'm not expert at memory management, so I'm just guessing. jcc7 |
April 26, 2006 Re: Maximum array size | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ivan Kazmenko | Ivan Kazmenko wrote:
> Still, a question remains: what's the reason for such a limit?
1) Most of the time, such huge static arrays are the result of a typo.
2) It's not practical for the compiler to determine what the maximum amount is. So very large static arrays will not reliably work, and there will be bug reports filed about why the executable mysteriously fails to load, etc.
3) Such an executable may load, but may cripple the system.
4) A 16Mb limit gives a reasonable chance of the program being portable.
5) If initialized data is put into such an array, the executable size will grow by the size of the array.
6) Optlink has some troubles with very large static data.
7) Such arrays can easily be just allocated dynamically at runtime.
|
April 26, 2006 Re: Maximum array size | ||||
---|---|---|---|---|
| ||||
Posted in reply to jcc7 | jcc7:
>Well, spec_DMD_0.109.pdf is just an unofficial snapshot of the docs that existed back in December 2004. I might create another PDF sometime, but I think a lot of pages have been added since I made that PDF (and I'd have to remember how I did it). But the docs on the digitalmars website and included with DMD are the official version, so those are the best ones to use anyway (if possible).
I checked the documentation you suggested and found, among other changes, that there are no bits and no bit arrays; seems that bit is currently an alias to ubyte or such. Why is that?
Ivan Kazmenko.
|
April 26, 2006 A benchmark | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | Walter Bright:
>...
>7) Such arrays can easily be just allocated dynamically at runtime.
Thanks for the very detailed explanation! Actually, I checked the seventh argument, and it worked for me, but gave me some interesting experience in the process of investigation which I'd like to share.
First, I checked the example posted by Dave, the program compiled and ran, but I found some oddity in its behaviour: after writing the output line, it still went on working a bit and then terminated.
Then, I changed my sieve program to use dynamic array instead of static one. Imagine my astonishment when it ran about 10 seconds, while the static array version ran no more than a second! I checked different versions of my program (such as manually working with a bit array stored in ubytes), and the results were the same: they all ran 10-20 seconds. By redirecting the output from file to the screen, I found out that the output was generated fast enough like before, and the remaining time was used by the program to consume processor resources for a purpose unknown.
I then remembered that in good old days I used no gc, and happily wrote a delete line at the end of my program. Oh joy! its runtime got back to one second. Well, I expected that gc is one of the main advantages of d over c++... but manually deleting allocated objects currently does not pose a problem to me. Anyway, I usually program for contests and research, not for large scale projects. But I wonder what did it do in that ten seconds. Did it scan the array being deleted for hidden pointers?
All that encouraged me to write a simple benchmark. Below is a simple Eratosphenes sieve program. Thanks to the conditional compile capabilities of D, I was able to maintain five versions in a small, simple, readable program.
-----
import std.stdio;
const int MaxN = 16_000_000;
version (Byte) {alias ubyte main_type;}
version (Int) {alias int main_type;}
version (Static) {main_type [MaxN] a;}
version (Dynamic) {main_type [] a;}
int main ()
{
int i, j, s;
version (Dynamic) {a = new main_type [MaxN];}
a[2..MaxN - 1] = 1;
for (i = 2; i * i < MaxN; i++)
if (a[i])
for (j = i * i; j < MaxN; j += i)
a[j] = 0;
s = 0;
for (i = 0; i < MaxN; i++)
if (a[i])
s++;
version (Delete) {delete a;}
printf ("The number of primes is %d.\n", s);
return 0;
}
-----
The compile line then looks like "dmd -version=Byte -version=Static test.d".
Here is the run time for different versions of the program on my current configuration - it's Athlon64 3200+ @2500 MHz (overclocked), 768 MB DDR, Windows 2003 Server (32-bit). I measured wall-clock time since I know no simple reliable way to get process time under Windows. Didn't try dmd under Linux yet.
-----
Byte, Static: 1.44
Byte, Dynamic: 2.52
Byte, Dynamic, Delete: 1.41
Int, Dynamic: 2.23
Int, Dynamic, Delete: 2.17
-----
Interestingly, repetitive runs show that the dynamic version with delete is in fact a little bit faster than the static version - I definitely didn't expect such an effect! On the contrary, dynamic version without delete does the job in the same time and then spends a second more doing some secret internal cleanup work. Another interesting notion is that the int version behaves nearly the same with or without delete, slightly faster than the dynamic byte version without delete. Static int version doesn't compile, since it requires a ~64M array.
Ivan Kazmenko.
|
April 27, 2006 Re: A benchmark | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ivan Kazmenko | Ivan Kazmenko wrote:
> But I
> wonder what did it do in that ten seconds. Did it scan the array being deleted
> for hidden pointers?
Yup. GC is a lousy tool to use for 64Mb arrays. A program necessarily is not going to have many of those, and they should be handled with explicit allocation/delete.
|
April 27, 2006 Re: Maximum array size | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ivan Kazmenko | Ivan Kazmenko wrote:
> jcc7:
>> Well, spec_DMD_0.109.pdf is just an unofficial snapshot of the docs that existed
>> back in December 2004. I might create another PDF sometime, but I think a lot of
>> pages have been added since I made that PDF (and I'd have to remember how I did
>> it). But the docs on the digitalmars website and included with DMD are the
>> official version, so those are the best ones to use anyway (if possible).
> I checked the documentation you suggested and found, among other changes, that
> there are no bits and no bit arrays; seems that bit is currently an alias to
> ubyte or such. Why is that?
Long threads about that. But you can use std.bitarray.
|
April 27, 2006 Re: A benchmark | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | Walter Bright wrote:
> Ivan Kazmenko wrote:
>> But I
>> wonder what did it do in that ten seconds. Did it scan the array being deleted
>> for hidden pointers?
>
> Yup. GC is a lousy tool to use for 64Mb arrays. A program necessarily is not going to have many of those, and they should be handled with explicit allocation/delete.
It would be nice if the GC could be told not to scan them as well (or if this were sorted out automatically using type information), but that's something I gather will happen at some point anyway if you're working on a compacting GC :-)
Sean
|
Copyright © 1999-2021 by the D Language Foundation