June 13, 2018 Encouraging preliminary results implementing memcpy in D

I had a little fun today kicking the crap out of C's memcpy with a D implementation.
https://github.com/JinShil/memcpyD
Request for help: I don't have a Linux system running on real hardware at this time, nor do I have a wide range of platforms and machines to test with. If you'd like to help me with this potentially foolish endeavor, please run the program on your hardware and send me the results.
Feedback, advice, and pull requests to improve the implementation are most welcome.
Mike
June 13, 2018 Re: Encouraging preliminary results implementing memcpy in D

Posted in reply to Mike Franklin
On Wednesday, 13 June 2018 at 06:46:43 UTC, Mike Franklin wrote:
> I had a little fun today kicking the crap out of C's memcpy with a D implementation.
>
> https://github.com/JinShil/memcpyD
>
> Request for help: I don't have a Linux system running on real hardware at this time, nor do I have a wide range of platforms and machines to test with. If you'd like to help me with this potentially foolish endeavor, please run the program on your hardware and send me the results.
Hi Mike,
These are my results running your program under Arch Linux x86_64 with zen-kernel 4.17.1. The hardware is an ancient "Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz" with 4 GB of RAM:
size memcpyC memcpyD
1 67607 56810
2 68105 57638
4 66760 58949
8 66943 61262
16 71937 43821
32 70955 48392
64 111473 54226
128 144784 77165
256 183504 113597
512 289039 180930
1024 450526 1314835
2048 782029 1890236
4096 1627622 3165319
8192 2751701 5614202
16384 6361074 11484517
32768 30931212 42805529
65536 61878379 86000892
size memcpyC memcpyD
1 66796 44745
1 66773 44343
1 66780 44157
2 66769 44370
2 66792 44529
4 66776 44298
4 66775 44412
8 66766 44409
8 70945 44359
4 66804 44367
8 71007 44432
16 75210 50656
June 13, 2018 Re: Encouraging preliminary results implementing memcpy in D

Posted in reply to Mike Franklin
Ubuntu 18.04 Linux 4.15.0-23-generic
AMD® Fx(tm)-8350 eight-core processor × 8
size memcpyC memcpyD
1 51089 36921
2 45896 35733
4 46079 36200
8 48443 37509
16 48669 24925
32 52917 27787
64 55631 44928
128 84282 47795
256 107350 66009
512 159310 126795
1024 247683 452560
2048 440687 673211
4096 1129135 1304085
8192 4740910 4095254
16384 8389579 8874273
32768 16630336 17370310
65536 33032013 42904705
size memcpyC memcpyD
1 52354 28365
1 48407 28445
1 50264 30273
2 51312 27708
2 46138 28973
4 52753 28535
4 52150 27418
8 52220 27276
8 49625 27804
4 49356 33510
8 48529 27668
16 52662 135357
second run
size memcpyC memcpyD
1 47248 36964
2 45624 35627
4 45535 35596
8 47920 37012
16 47960 25107
32 52798 27394
64 55444 44282
128 76819 41055
256 105852 66429
512 157629 126243
1024 253841 448974
2048 438973 667101
4096 1144280 1337549
8192 3647558 4141162
16384 8301059 8722185
32768 16413116 17506957
65536 32958933 40381270
size memcpyC memcpyD
1 48513 26288
1 46080 26842
1 48526 26989
2 48634 26419
2 43522 27150
4 48229 25737
4 52841 28117
8 49632 25913
8 46325 25487
4 40267 32343
8 45990 25220
16 46509 124042
June 13, 2018 Re: Encouraging preliminary results implementing memcpy in D

Posted in reply to Mike Franklin
On Wednesday, 13 June 2018 at 06:46:43 UTC, Mike Franklin wrote:
> I had a little fun today kicking the crap out of C's memcpy with a D implementation.
If I read your benchmark graphs right, they claimed that allocating 16 kilobytes takes over 10^^6 usecs, with both mallocs. Doesn't that mean over a second, 16 kilobytes? Can't be! Are you confusing usecs with nsecs?
June 13, 2018 Re: Encouraging preliminary results implementing memcpy in D

Posted in reply to Dukc
On Wednesday, 13 June 2018 at 09:40:05 UTC, Dukc wrote:
> If I read your benchmark graphs right, they claimed that allocating 16 kilobytes takes over 10^^6 usecs, with both mallocs. Doesn't that mean over a second, 16 kilobytes? Can't be! Are you confusing usecs with nsecs?
The benchmark doesn't allocate any data; it's just copying data. Each benchmark is run 10,000,000 times to smooth out some of the entropy in the results: https://github.com/JinShil/memcpyD/blob/2e0d3c33ea876a25a04358a3ae505b2eba9f99cb/memcpyd.d#L78
The usecs in the graph are the total time it takes to run the benchmark 10,000,000 times.
Mike
June 13, 2018 Re: Encouraging preliminary results implementing memcpy in D

Posted in reply to Mike Franklin
On Wednesday, 13 June 2018 at 09:59:52 UTC, Mike Franklin wrote:
> The benchmark doesn't allocate any data; it's just copying data.
>
> Mike
Ah, of course. I was thinking about other stuff while writing.
June 13, 2018 Re: Encouraging preliminary results implementing memcpy in D

Posted in reply to Dukc
On Wednesday, 13 June 2018 at 10:13:13 UTC, Dukc wrote:
> On Wednesday, 13 June 2018 at 09:59:52 UTC, Mike Franklin wrote:
>> The benchmark doesn't allocate any data; it's just copying data.
>>
>> Mike
>
> Ah of course. I was thinking other stuff while writing.
Well, actually, I probably should divide that time by 10,000,000 to make a more accurate representation.
Thanks for the feedback,
Mike
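The division Mike mentions can be sketched in a few lines of C. This is only an illustration, assuming the posted totals are microseconds for 10,000,000 calls, as Mike describes for the graph; the names `RUNS` and `ns_per_call` are made up for this example and do not come from the memcpyD repository.

```c
#include <assert.h>

/* Repetition count per benchmark, as stated in the thread. */
#define RUNS 10000000ULL

/* Convert a total time in microseconds, accumulated over `runs` calls,
   into nanoseconds per single call. */
double ns_per_call(unsigned long long total_usecs, unsigned long long runs)
{
    assert(runs > 0);
    return (double)total_usecs * 1000.0 / (double)runs;
}
```

Under that assumption, the 16384-byte memcpyC total of 6361074 from the Core 2 results above works out to roughly 636 ns per call via `ns_per_call(6361074ULL, RUNS)`.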
June 13, 2018 Re: Encouraging preliminary results implementing memcpy in D

On Wednesday, 13 June 2018 at 09:07:21 UTC, ToRuSer wrote:
> On Wednesday, 13 June 2018 at 06:46:43 UTC, Mike Franklin wrote:
>> I had a little fun today kicking the crap out of C's memcpy with a D implementation.
>>
>> https://github.com/JinShil/memcpyD
>>
>> Request for help: I don't have a Linux system running on real hardware at this time, nor do I have a wide range of platforms and machines to test with. If you'd like to help me with this potentially foolish endeavor, please run the program on your hardware and send me the results.
>>
>> Feedback, advice, and pull requests to improve the implementation are most welcome.
>>
>> Mike
>
> All Tor users now apparently have their posts subjected to 'moderation'.
>
> (i.e. someone, will, perhaps, at some point, get around to reviewing their posts, and then, perhaps, it might reach the forum, or not.)
>
> So... maybe..you'll get those posts...or maybe not...and when.. is anyone's guess.
>
> So well done to the D community, for discriminating against all the Tor users out there. You've done yourself proud.
The problem is more likely that someone used Tor to troll here, and the endpoint they used was then blacklisted.
June 13, 2018 Re: Encouraging preliminary results implementing memcpy in D

Posted in reply to Mike Franklin
On Wednesday, 13 June 2018 at 10:17:10 UTC, Mike Franklin wrote:
> Well, actually, I probably should divide that time by 10,000,000 to make a more accurate representation.
For rigorous benchmarking, check out the first part of Andrei's Writing Fast Code: https://www.youtube.com/watch?v=vrfYLlR8X8k
One takeaway is that taking the average of many runtimes is not the best use of your dataset.
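To make the point about averages concrete, here is a small C sketch comparing the mean of a set of timing samples with their minimum. Reporting the minimum is one commonly used alternative, on the reasoning that interference from the rest of the system can only add time to a run; that this is the alternative meant here is an assumption, since the post does not spell it out. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n timing samples. */
double mean_time(const double *samples, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return sum / (double)n;
}

/* Smallest of n timing samples: the run least disturbed by noise. */
double min_time(const double *samples, size_t n)
{
    assert(n > 0);
    double best = samples[0];
    for (size_t i = 1; i < n; i++)
        if (samples[i] < best)
            best = samples[i];
    return best;
}
```

With samples such as 100, 101, 100, 250, 99, a single disturbed run pulls the mean up to 130, while the minimum stays at 99.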
June 13, 2018 Re: Encouraging preliminary results implementing memcpy in D

Posted in reply to drug
On Wednesday, 13 June 2018 at 08:55:40 UTC, drug wrote:
> Ubuntu 18.04 Linux 4.15.0-23-generic
> AMD® Fx(tm)-8350 eight-core processor × 8
>
> size memcpyC memcpyD
> 1 51089 36921
> 2 45896 35733
> 4 46079 36200
> 8 48443 37509
> 16 48669 24925
> 32 52917 27787
> 64 55631 44928
> 128 84282 47795
> 256 107350 66009
> 512 159310 126795
> 1024 247683 452560
> 2048 440687 673211
> 4096 1129135 1304085
> 8192 4740910 4095254
> 16384 8389579 8874273
> 32768 16630336 17370310
> 65536 33032013 42904705
>
> size memcpyC memcpyD
> 1 52354 28365
> 1 48407 28445
> 1 50264 30273
> 2 51312 27708
> 2 46138 28973
> 4 52753 28535
> 4 52150 27418
> 8 52220 27276
> 8 49625 27804
> 4 49356 33510
> 8 48529 27668
> 16 52662 135357
Interesting! I have an AMD 8370 running Windows 8, and I get more favorable results in Windows:
size memcpyC memcpyD
1 45361 43626
2 55091 43791
4 70507 43714
8 50910 42854
16 63328 28831
32 72817 30790
64 76307 45823
128 97180 55368
256 164935 68362
512 230508 132100
1024 502189 490590
2048 892968 823070
4096 1896480 1456353
8192 4530645 4516681
16384 10886602 9921215
32768 21717080 19116839
65536 59787610 43549445
size memcpyC memcpyD
1 48770 30084
1 49169 30921
1 43370 30144
2 51404 27571
2 56002 29729
4 69588 29804
4 63743 29510
8 55492 29002
8 46752 31793
4 72673 28858
8 48989 27547
10 55527 121628
In your results, I see that for sizes 1024 and higher (that's when it dispatches to the REP MOVSB algorithm), the performance begins to degrade on Linux. I'm going to install Linux soon and see if I can fix that.
Thanks for the data,
Mike
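For readers unfamiliar with the size-based dispatch mentioned above, the following C sketch shows the general shape. It is not memcpyD's code: the 1024-byte threshold is the only detail taken from the thread, the byte loop is a stand-in for whatever the small-size path really does, and the names `copy_rep_movsb` and `copy_dispatch` are hypothetical. The inline assembly assumes GCC or Clang on x86-64; other targets fall back to the C library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Threshold from the thread: sizes of 1024 bytes and up use REP MOVSB. */
#define REP_MOVSB_THRESHOLD 1024

/* Copy n bytes with REP MOVSB on x86-64 (GCC/Clang inline asm);
   fall back to memcpy on other targets. */
void copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* RDI = destination, RSI = source, RCX = byte count. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n);
#endif
}

/* Hypothetical dispatcher: small copies take a plain byte loop,
   large copies take the REP MOVSB path. */
void *copy_dispatch(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    if (n >= REP_MOVSB_THRESHOLD) {
        copy_rep_movsb(dst, src, n);
    } else {
        unsigned char *d = dst;
        const unsigned char *s = src;
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    }
    return dst;
}
```

A dispatcher like this makes it easy to time each path on its own, which is one way to check whether the slowdown at 1024 bytes and above comes from the REP MOVSB path.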