Thread overview
Bytes with 128 bits?
Manfred Nowak (Aug 12, 2007)
Regan Heath (Aug 12, 2007)
Manfred Nowak (Aug 14, 2007)
Don Clugston (Aug 14, 2007)
Manfred Nowak (Aug 15, 2007)
Manfred Nowak (Aug 15, 2007)
Witold Baryluk (Aug 13, 2007)
August 12, 2007
As some may know, current dual-channel DDR configurations access 128 bits in each transaction.

How should this be modelled in D?

-manfred
August 12, 2007
Manfred Nowak wrote:
> As some may know, current dual-channel DDR configurations access 128 bits in each transaction.
> 
> How should this be modelled in D?

std.bitarray?

ucent? (once it is implemented)

Or am I missing the point entirely?
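Until ucent exists, a fixed 128-bit value can be emulated with two 64-bit words. The C sketch below is a minimal, hypothetical bit set in that spirit; the type bits128 and its functions are illustrative names only, not the std.bitarray API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit bit set: bit i lives in word i / 64,
   at position i % 64 within that word. */
typedef struct {
    uint64_t w[2];
} bits128;

/* Set bit i (0..127). */
static void bits128_set(bits128 *b, unsigned i)
{
    assert(i < 128);
    b->w[i / 64] |= (uint64_t)1 << (i % 64);
}

/* Clear bit i (0..127). */
static void bits128_clear(bits128 *b, unsigned i)
{
    assert(i < 128);
    b->w[i / 64] &= ~((uint64_t)1 << (i % 64));
}

/* Return 1 if bit i (0..127) is set, 0 otherwise. */
static int bits128_test(const bits128 *b, unsigned i)
{
    assert(i < 128);
    return (int)((b->w[i / 64] >> (i % 64)) & 1u);
}
```

For example, setting bit 127 sets only the top bit of w[1]; bits 0..63 are all in w[0].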

Regan
August 13, 2007
On Sun, 12 Aug 2007 20:39:24 +0000 (UTC),
Manfred Nowak <svv1999@hotmail.com> wrote:

> As some may know, current dual-channel DDR configurations access 128 bits in each transaction.
> 
> How should this be modelled in D?

It isn't. 128-bit access is used only by the hardware. If such a type becomes available to programmers, it will be called "cent" and "ucent".
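Without a native ucent, 128-bit arithmetic has to be assembled from 64-bit halves. A minimal C sketch of addition follows, assuming a hypothetical u128 struct (not a standard C or D type); the one non-obvious step is the carry out of the low word.

```c
#include <stdint.h>

/* Hypothetical stand-in for an unsigned 128-bit integer ("ucent"):
   hi holds bits 127..64, lo holds bits 63..0. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} u128;

/* Add two 128-bit values modulo 2^128. Unsigned addition of the low
   words wraps modulo 2^64; it wrapped exactly when the result is smaller
   than an operand, and that carry is added into the high word. */
static u128 u128_add(u128 a, u128 b)
{
    u128 r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (uint64_t)(r.lo < a.lo);
    return r;
}
```

Adding 1 to a value whose low word is all ones yields hi = 1, lo = 0.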

-- 
Witold Baryluk
MAIL: baryluk@smp.if.uj.edu.pl, baryluk@mpi.int.pl
JID: movax@jabber.autocom.pl
August 14, 2007
Regan Heath wrote

> Or am I missing the point entirely?

I posted in the learn group because I do not know exactly what the point is.

At first glance, it seems that current hardware emulates 8-bit bytes on top of main memory that is accessed with a granularity of 128 bits.

If this is true, then consecutive accesses to 8-bit bytes should slow down severely whenever they cross a 128-bit boundary, and both algorithms and the compiler should take this into account.
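If the hypothesis held, the cost of a run of byte reads would depend on how many 128-bit (16-byte) chunks it touches. The hypothetical helper below only does that arithmetic; it models no cache and measures no time.

```c
#include <stddef.h>

/* 128 bits = 16 bytes, the per-transaction width mentioned in the thread. */
#define CHUNK_BYTES 16u

/* Index of the 16-byte chunk that contains byte offset addr. */
static size_t chunk_of(size_t addr)
{
    return addr / CHUNK_BYTES;
}

/* Count the distinct 16-byte chunks touched by n byte reads at offsets
   start, start + stride, start + 2*stride, ...
   The offsets never decrease, so a new chunk is entered exactly when the
   chunk index changes between two consecutive reads. */
static size_t chunks_touched(size_t start, size_t stride, size_t n)
{
    if (n == 0)
        return 0;
    size_t count = 1;
    size_t prev = chunk_of(start);
    for (size_t k = 1; k < n; k++) {
        size_t cur = chunk_of(start + k * stride);
        if (cur != prev) {
            count++;
            prev = cur;
        }
    }
    return count;
}
```

With stride 1, 64 consecutive bytes fall into 4 chunks; with stride 16 or more, every read lands in a new chunk.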

-manfred
August 14, 2007
Manfred Nowak wrote:
> Regan Heath wrote
> 
>> Or am I missing the point entirely?
> 
> I posted in the learn group because I do not know exactly what the point is.
> 
> At first glance, it seems that current hardware emulates 8-bit bytes on top of main memory that is accessed with a granularity of 128 bits.
> 
> If this is true, then consecutive accesses to 8-bit bytes should slow down severely whenever they cross a 128-bit boundary, and both algorithms and the compiler should take this into account.

It would, if there were no cache. In fact, ALL access to RAM is very slow.
Access works like this:

RAM -> L2 cache -> L1 cache -> CPU.

The 128-bit width only affects the speed at which data is loaded from RAM into the L2 cache.
But normally you try to stay in the L1 cache, so it doesn't matter.
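Staying in L1 usually means restructuring loops so that the data being worked on fits in the cache. Loop tiling (cache blocking) is one common technique for this; the C sketch below applies it to a matrix transpose. The tile size TILE is an assumption, not a tuned value for any particular CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile edge: two 32x32 tiles of doubles take
   2 * 32 * 32 * 8 = 16 KiB, assumed here to fit in L1. */
#define TILE 32

/* Transpose src (rows x cols, row-major) into dst (cols x rows,
   row-major), one TILE x TILE block at a time. Edge tiles are clipped
   when rows or cols is not a multiple of TILE. */
static void transpose_blocked(const double *src, double *dst,
                              size_t rows, size_t cols)
{
    assert(src != dst); /* out-of-place only */
    for (size_t i0 = 0; i0 < rows; i0 += TILE) {
        for (size_t j0 = 0; j0 < cols; j0 += TILE) {
            size_t i_end = i0 + TILE < rows ? i0 + TILE : rows;
            size_t j_end = j0 + TILE < cols ? j0 + TILE : cols;
            for (size_t i = i0; i < i_end; i++)
                for (size_t j = j0; j < j_end; j++)
                    dst[j * rows + i] = src[i * cols + j];
        }
    }
}
```

In an untiled loop over large matrices, consecutive writes to dst are rows elements apart, so a cache line of dst is typically evicted before its neighbouring elements are written; tiling keeps the TILE lines of dst being written resident until they are filled.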

August 15, 2007
Don Clugston wrote

> RAM -> L2 cache -> L1 cache -> CPU

It seems this will soon no longer be true. Instead:

RAM -> L1 --------------|
       | -> L2----------|
             |-> L3 -> CPU

http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2939&p=9


> But normally you try to stay in the L1 cache, so it doesn't matter.

Trying to stay in the L1 cache requires at least knowing how big the L1 cache is. Where can this information be entered?
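Neither C nor D offers a standard way to ask for the L1 size. One option on Linux with glibc is to query it at run time, as in the sketch below; _SC_LEVEL1_DCACHE_SIZE is a glibc extension rather than POSIX, and the 32 KiB fallback is an assumed placeholder, not a property of any particular CPU. On Windows, GetLogicalProcessorInformation reports cache descriptors instead; that path is not shown.

```c
#include <unistd.h>

/* Assumed fallback when the size cannot be queried. */
#define L1D_FALLBACK_BYTES 32768L

/* Return the L1 data cache size in bytes. glibc's sysconf() accepts
   _SC_LEVEL1_DCACHE_SIZE but may return 0 or -1 when the value is
   unknown; in that case, or when the name is not defined at all,
   the fallback is returned. */
static long l1d_cache_bytes(void)
{
#ifdef _SC_LEVEL1_DCACHE_SIZE
    long n = sysconf(_SC_LEVEL1_DCACHE_SIZE);
    if (n > 0)
        return n;
#endif
    return L1D_FALLBACK_BYTES;
}
```

A tile size such as TILE in a blocked loop could then be derived from this value instead of being hard-coded.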

-manfred
August 15, 2007
Manfred Nowak wrote:

> As some may know, current dual-channel DDR configurations access 128 bits in each transaction.
> 
> How should this be modelled in D?

Write the code in C and call it from D?

At least until D allows 128-bit types...
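A minimal sketch of that approach, assuming a C compiler whose object files the D toolchain can link: a C function that returns the full 128-bit product of two 64-bit values as high and low halves. The name mul64x64 is hypothetical, and the function uses only 32x32->64 multiplications, so it needs no compiler-specific 128-bit type.

```c
#include <stdint.h>

/* Full 64x64 -> 128-bit unsigned multiply, split into 32-bit halves:
   a*b = p3*2^64 + (p1 + p2)*2^32 + p0. */
void mul64x64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo; /* bits 0..63   */
    uint64_t p1 = a_lo * b_hi; /* bits 32..95  */
    uint64_t p2 = a_hi * b_lo; /* bits 32..95  */
    uint64_t p3 = a_hi * b_hi; /* bits 64..127 */

    /* Middle column: upper half of p0 plus lower halves of p1 and p2.
       Each term is below 2^32, so the sum cannot overflow 64 bits. */
    uint64_t mid = (p0 >> 32) + (p1 & 0xFFFFFFFFu) + (p2 & 0xFFFFFFFFu);

    *lo = (mid << 32) | (p0 & 0xFFFFFFFFu);
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}
```

On the D side it could then be declared as extern (C) void mul64x64(ulong a, ulong b, ulong* hi, ulong* lo); since D's ulong is 64 bits wide.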

--anders
August 15, 2007
Don Clugston wrote

>> If this is true then a severe slowdown should happen
> It would, if there was no cache.

Neither is confirmed!!!

This table shows the slowdown quotient Q for various distances delta (in bytes) between consecutive accesses, relative to a distance of 1 byte.

delta:  Q
------------
    2:  0.98
    3:  1.04
    4:  1.06
    5:  1.20
    6:  1.35
    7:  1.53
    8:  1.68
    9:  1.52
   10:  1.66
   12:  2.23
   14:  2.73
   16:  3.21
   18:  3.64
   20:  4.24
   23:  5.15
   26:  6.25
   29:  7.15
   32:  8.31
   36:  9.31
   40: 10.30
   45: 10.97
   50: 11.89
   56: 12.96
   62: 13.80
   69: 14.37
   76: 14.96
   84: 15.90
   93: 16.62
  103: 16.72
  114: 14.08
  126: 10.49
  139:  9.80
  153:  9.96
  169: 10.12
  186: 10.33
  205: 10.46
  226: 10.82
  249: 10.87
  274: 11.16
  302: 11.55
  333: 11.60
  367: 11.83
  404: 12.26
  445: 12.68
  490: 13.25
  540: 13.13
  595: 12.96
  655: 13.02
  721: 13.40
  794: 13.49
  874: 14.11
  962: 14.72
 1059: 14.89
 1165: 15.23
 1282: 15.52
 1411: 16.05
 1553: 16.78
 1709: 17.44
 1880: 18.11

Tested on an Athlon64 X2 4200, winXP, dmd1.016
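The benchmark code behind this table was not posted. The C sketch below is a hypothetical illustration of the general idea only, not the code that produced these numbers: it times n byte reads at a fixed stride through a buffer, so Q for a given delta would be the time for that stride divided by the time for stride 1, with the same buffer and n. The use of clock(), the buffer size, and n are assumptions; for the reads to reach RAM at all, the buffer must be much larger than the caches.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Perform n byte reads from buf at offsets 0, stride, 2*stride, ...,
   wrapping around modulo len, and return the CPU time spent in seconds.
   The sum of the bytes read is stored in *checksum so that the reads
   have an observable result and are not optimized away. */
static double time_strided_reads(const unsigned char *buf, size_t len,
                                 size_t stride, size_t n,
                                 unsigned long *checksum)
{
    assert(len > 0 && stride < len); /* keeps the single-step wrap valid */
    unsigned long sum = 0;
    size_t off = 0;
    clock_t t0 = clock();
    for (size_t k = 0; k < n; k++) {
        sum += buf[off];
        off += stride;
        if (off >= len)
            off -= len; /* wrap around the end of the buffer */
    }
    clock_t t1 = clock();
    *checksum = sum;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```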

-manfred