Encouraging preliminary results implementing memcpy in D
June 13, 2018
I had a little fun today kicking the crap out of C's memcpy with a D implementation.

https://github.com/JinShil/memcpyD

Request for help: I don't have a Linux system running on real hardware at this time, nor do I have a wide range of platforms and machines to test with.  If you'd like to help me with this potentially foolish endeavor, please run the program on your hardware and send me the results.

Feedback, advice, and pull requests to improve the implementation are most welcome.

Mike
June 13, 2018
On Wednesday, 13 June 2018 at 06:46:43 UTC, Mike Franklin wrote:
> I had a little fun today kicking the crap out of C's memcpy with a D implementation.
>
> https://github.com/JinShil/memcpyD
>
> Request for help: I don't have a Linux system running on real hardware at this time, nor do I have a wide range of platforms and machines to test with.  If you'd like to help me with this potentially foolish endeavor, please run the program on your hardware and send me the results.

Hi Mike,
These are my results from running your program on Arch Linux x86_64 with zen-kernel 4.17.1. The hardware is an ancient "Intel(R) Core(TM)2 CPU 6600  @ 2.40GHz" with 4GB of RAM:

size memcpyC memcpyD
1 67607 56810
2 68105 57638
4 66760 58949
8 66943 61262
16 71937 43821
32 70955 48392
64 111473 54226
128 144784 77165
256 183504 113597
512 289039 180930
1024 450526 1314835
2048 782029 1890236
4096 1627622 3165319
8192 2751701 5614202
16384 6361074 11484517
32768 30931212 42805529
65536 61878379 86000892

size memcpyC memcpyD
1 66796 44745
1 66773 44343
1 66780 44157
2 66769 44370
2 66792 44529
4 66776 44298
4 66775 44412
8 66766 44409
8 70945 44359
4 66804 44367
8 71007 44432
16 75210 50656

June 13, 2018
Ubuntu 18.04 Linux 4.15.0-23-generic
AMD® Fx(tm)-8350 eight-core processor × 8

size memcpyC memcpyD
1 51089 36921
2 45896 35733
4 46079 36200
8 48443 37509
16 48669 24925
32 52917 27787
64 55631 44928
128 84282 47795
256 107350 66009
512 159310 126795
1024 247683 452560
2048 440687 673211
4096 1129135 1304085
8192 4740910 4095254
16384 8389579 8874273
32768 16630336 17370310
65536 33032013 42904705

size memcpyC memcpyD
1 52354 28365
1 48407 28445
1 50264 30273
2 51312 27708
2 46138 28973
4 52753 28535
4 52150 27418
8 52220 27276
8 49625 27804
4 49356 33510
8 48529 27668
16 52662 135357

second run

size memcpyC memcpyD
1 47248 36964
2 45624 35627
4 45535 35596
8 47920 37012
16 47960 25107
32 52798 27394
64 55444 44282
128 76819 41055
256 105852 66429
512 157629 126243
1024 253841 448974
2048 438973 667101
4096 1144280 1337549
8192 3647558 4141162
16384 8301059 8722185
32768 16413116 17506957
65536 32958933 40381270

size memcpyC memcpyD
1 48513 26288
1 46080 26842
1 48526 26989
2 48634 26419
2 43522 27150
4 48229 25737
4 52841 28117
8 49632 25913
8 46325 25487
4 40267 32343
8 45990 25220
16 46509 124042

June 13, 2018
On Wednesday, 13 June 2018 at 06:46:43 UTC, Mike Franklin wrote:
> I had a little fun today kicking the crap out of C's memcpy with a D implementation.

If I read your benchmark graphs right, they claimed that allocating 16 kilobytes takes over 10^^6 usecs, with both mallocs. Doesn't that mean over a second, 16 kilobytes? Can't be! Are you confusing usecs with nsecs?
June 13, 2018
On Wednesday, 13 June 2018 at 09:40:05 UTC, Dukc wrote:

> If I read your benchmark graphs right, they claimed that allocating 16 kilobytes takes over 10^^6 usecs, with both mallocs. Doesn't that mean over a second, 16 kilobytes? Can't be! Are you confusing usecs with nsecs?

The benchmark doesn't allocate any data; it's just copying data.  Each benchmark is run 10,000,000 times to smooth out some of the noise in the results:  https://github.com/JinShil/memcpyD/blob/2e0d3c33ea876a25a04358a3ae505b2eba9f99cb/memcpyd.d#L78  The usecs figure in the graph is the total time it takes to run the benchmark 10,000,000 times.

Mike
June 13, 2018
On Wednesday, 13 June 2018 at 09:59:52 UTC, Mike Franklin wrote:
> The benchmark doesn't allocate any data; it's just copying data.
>
> Mike

Ah, of course. I was thinking about other stuff while writing.


June 13, 2018
On Wednesday, 13 June 2018 at 10:13:13 UTC, Dukc wrote:
> On Wednesday, 13 June 2018 at 09:59:52 UTC, Mike Franklin wrote:
>> The benchmark doesn't allocate any data; it's just copying data.
>>
>> Mike
>
> Ah of course. I was thinking other stuff while writing.

Well, actually, I probably should divide that time by 10,000,000 to make a more accurate representation.

Thanks for the feedback,

Mike
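
As a rough illustration of that division, here is a minimal sketch that converts a reported total into a per-call time. It assumes the totals are in microseconds, as the graph discussion suggests; `nsPerCall` is a hypothetical helper, not part of memcpyD, and the two sample totals are the size-16384 row from the Core 2 results earlier in the thread.

```d
import std.stdio : writefln;

// Run count stated in the thread: each benchmark is repeated 10,000,000 times.
enum iterations = 10_000_000;

/// Hypothetical helper: converts a total in microseconds
/// into nanoseconds per single call.
double nsPerCall(ulong totalUsecs)
{
    return totalUsecs * 1000.0 / iterations;
}

void main()
{
    // Size-16384 totals from the Core 2 results (assumed to be usecs).
    writefln("memcpyC: %.1f ns per call", nsPerCall(6_361_074));
    writefln("memcpyD: %.1f ns per call", nsPerCall(11_484_517));
}
```

Reporting nanoseconds per call keeps the numbers independent of the repeat count, so results stay comparable if the count changes.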
June 13, 2018
On Wednesday, 13 June 2018 at 09:07:21 UTC, ToRuSer wrote:
> On Wednesday, 13 June 2018 at 06:46:43 UTC, Mike Franklin wrote:
>> I had a little fun today kicking the crap out of C's memcpy with a D implementation.
>>
>> https://github.com/JinShil/memcpyD
>>
>> Request for help: I don't have a Linux system running on real hardware at this time, nor do I have a wide range of platforms and machines to test with.  If you'd like to help me with this potentially foolish endeavor, please run the program on your hardware and send me the results.
>>
>> Feedback, advise, and pull requests to improve the implementation are most welcome.
>>
>> Mike
>
> All Tor users now apparently have their posts subjected to 'moderation'.
>
> (i.e. someone, will, perhaps, at some point, get around to reviewing their posts, and then, perhaps, it might reach the forum, or not.)
>
> So... maybe... you'll get those posts... or maybe not... and when... is anyone's guess.
>
> So well done to the D community, for discriminating against all the Tor users out there. You've done yourself proud.

The problem is more likely that someone used Tor to troll here and the endpoint used was then blacklisted.
June 13, 2018
On Wednesday, 13 June 2018 at 10:17:10 UTC, Mike Franklin wrote:
> Well, actually, I probably should divide that time by 10,000,000 to make a more accurate representation.


For rigorous benchmarking, check out the first part of Andrei's Writing Fast Code:

https://www.youtube.com/watch?v=vrfYLlR8X8k

One takeaway is that taking the average of many runtimes is not the best use of your dataset.
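
One common alternative to averaging is to keep the fastest of several timed batches, on the reasoning that measurement noise only ever adds time. The sketch below shows that idea under stated assumptions: `fastestBatch` is a hypothetical helper, the batch sizes are arbitrary, and nothing here is taken from memcpyD or claimed to be the talk's exact method.

```d
import core.stdc.string : memcpy;
import core.time : Duration;
import std.algorithm.comparison : min;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

/// Hypothetical helper: times `runs` batches of `callsPerRun` calls each
/// and returns the duration of the fastest batch.
Duration fastestBatch(scope void delegate() fn, size_t runs, size_t callsPerRun)
{
    auto best = Duration.max;
    foreach (run; 0 .. runs)
    {
        auto sw = StopWatch(AutoStart.yes);
        foreach (call; 0 .. callsPerRun)
            fn();
        best = min(best, sw.peek());
    }
    return best;
}

void main()
{
    ubyte[1024] src = 1; // every element initialized to 1
    ubyte[1024] dst;

    // Arbitrary choice: 20 batches of 100,000 copies each.
    auto best = fastestBatch(() { memcpy(dst.ptr, src.ptr, src.length); },
                             20, 100_000);
    writefln("fastest batch of 100000 copies: %s usecs", best.total!"usecs");
}
```

With optimizations enabled, a compiler may remove copies whose result is never read, so a real benchmark needs some way to keep the work observable.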
June 13, 2018
On Wednesday, 13 June 2018 at 08:55:40 UTC, drug wrote:
> Ubuntu 18.04 Linux 4.15.0-23-generic
> AMD® Fx(tm)-8350 eight-core processor × 8
>
> size memcpyC memcpyD
> 1 51089 36921
> 2 45896 35733
> 4 46079 36200
> 8 48443 37509
> 16 48669 24925
> 32 52917 27787
> 64 55631 44928
> 128 84282 47795
> 256 107350 66009
> 512 159310 126795
> 1024 247683 452560
> 2048 440687 673211
> 4096 1129135 1304085
> 8192 4740910 4095254
> 16384 8389579 8874273
> 32768 16630336 17370310
> 65536 33032013 42904705
>
> size memcpyC memcpyD
> 1 52354 28365
> 1 48407 28445
> 1 50264 30273
> 2 51312 27708
> 2 46138 28973
> 4 52753 28535
> 4 52150 27418
> 8 52220 27276
> 8 49625 27804
> 4 49356 33510
> 8 48529 27668
> 16 52662 135357

Interesting! I have an AMD 8370 running Windows 8, and I get more favorable results in Windows:

size memcpyC memcpyD
1 45361 43626
2 55091 43791
4 70507 43714
8 50910 42854
16 63328 28831
32 72817 30790
64 76307 45823
128 97180 55368
256 164935 68362
512 230508 132100
1024 502189 490590
2048 892968 823070
4096 1896480 1456353
8192 4530645 4516681
16384 10886602 9921215
32768 21717080 19116839
65536 59787610 43549445

size memcpyC memcpyD
1 48770 30084
1 49169 30921
1 43370 30144
2 51404 27571
2 56002 29729
4 69588 29804
4 63743 29510
8 55492 29002
8 46752 31793
4 72673 28858
8 48989 27547
10 55527 121628

In your results, I see that for sizes 1024 and higher (that's when it dispatches to the REP MOVSB algorithm), the performance begins to degrade on Linux.  I'm going to install Linux soon and see if I can fix that.

Thanks for the data,

Mike
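
For readers unfamiliar with size-based dispatch, here is a minimal sketch of the general shape. Only the 1024-byte cut-over comes from the post above; `copySmall`, `copyLarge`, and `copyDispatch` are hypothetical names, and the large path uses a plain slice copy as a stand-in where a real implementation would issue REP MOVSB. It is not the memcpyD code.

```d
// Cut-over size mentioned in the post; everything else is illustrative.
enum size_t repMovsbThreshold = 1024;

/// Hypothetical small-copy path: a plain byte loop.
void copySmall(ubyte* dst, const(ubyte)* src, size_t n)
{
    foreach (i; 0 .. n)
        dst[i] = src[i];
}

/// Hypothetical large-copy path. A real version would use REP MOVSB;
/// a slice copy stands in for it here (buffers must not overlap).
void copyLarge(ubyte* dst, const(ubyte)* src, size_t n)
{
    dst[0 .. n] = src[0 .. n];
}

/// Picks a copy strategy based on the number of bytes requested.
void copyDispatch(void* dst, const(void)* src, size_t n)
{
    auto d = cast(ubyte*) dst;
    auto s = cast(const(ubyte)*) src;
    if (n < repMovsbThreshold)
        copySmall(d, s, n);
    else
        copyLarge(d, s, n);
}

void main()
{
    auto src = new ubyte[2048];
    foreach (i, ref b; src)
        b = cast(ubyte) i;
    auto dst = new ubyte[2048];

    copyDispatch(dst.ptr, src.ptr, 16);   // takes the small path
    assert(dst[0 .. 16] == src[0 .. 16]);

    copyDispatch(dst.ptr, src.ptr, 2048); // takes the large path
    assert(dst == src);
}
```

Because the threshold is a single constant, it is easy to retune per platform if the large path turns out slower on some systems, as the Linux numbers above suggest.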