May 11, 2002
The following causes the compiler to generate:
Internal error: ..\ztc\cgcs.c 348

void foo(inout bit[] test)
{
  test[0] = 1;
  test[1] = 0;
  test[2] = 1;
}

int main(char[][] args)
{

  bit[] a;
  a.length = 32;

  for(int i = 0; i < a.length; ++i)
    a[i] = 0;

  foo(a[3..6]); // foo(a) works ok

  for(int i = 0; i < a.length; ++i)
    printf("%d",a[i]);

  printf("\n");

  return 0;
}
May 11, 2002
"Pavel Minayev" <evilone@omen.ru> wrote in message news:abiv29$2lhf$1@digitaldaemon.com...
> "Patrick Down" <pat@codemoon.com> wrote in message news:Xns920B30A21022patcodemooncom@63.105.9.61...
>
> > Is this right?
> > err.d(2): cannot have out parameter of type bit
>
> There can be no pointers to bits. Since out parameters are implicitly dereferenced pointers, they cannot be bit.
>


So we do need a bool type, byte sized!

--
Stijn
OddesE_XYZ@hotmail.com
http://OddesE.cjb.net
_________________________________________________
Remove _XYZ from my address when replying by mail




May 11, 2002
"OddesE" <OddesE_XYZ@hotmail.com> wrote in news:abjvsl$fem$1 @digitaldaemon.com:
> 
> So we do need a bool type, byte sized!
> 

typedef ubyte bool;

I am sort of doubting the utility of the bit type for flags.  I think I may go back to schemes like this.

enum flags { f1=1, f2=2, f3=4, f4=8, f5=16 }

flags a = flags.f1 | flags.f4;
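
A quick sketch of how that scheme reads in use (the casts are hedging: whether the results of the bitwise ops convert back to flags implicitly depends on the enum conversion rules):

if (a & flags.f4)                    // test a flag
    a = cast(flags)(a & ~flags.f2);  // clear a flag
a = cast(flags)(a | flags.f3);       // set a flag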

May 11, 2002
Thanks for reporting this. The bit type is turning out to be a constant special case in the compiler.

"Patrick Down" <pat@codemoon.com> wrote in message news:Xns920B8C31915A4patcodemooncom@63.105.9.61...
> The following causes the compiler to generate:
> Internal error: ..\ztc\cgcs.c 348
>
> void foo(inout bit[] test)
> {
>   test[0] = 1;
>   test[1] = 0;
>   test[2] = 1;
> }
>
> int main(char[][] args)
> {
>
>   bit[] a;
>   a.length = 32;
>
>   for(int i = 0; i < a.length; ++i)
>     a[i] = 0;
>
>   foo(a[3..6]); // foo(a) works ok
>
>   for(int i = 0; i < a.length; ++i)
>     printf("%d",a[i]);
>
>   printf("\n");
>
>   return 0;
> }


May 11, 2002
Where the bit type comes into its own is in very large arrays of bits.
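
For instance, a minimal sketch of that use case, a sieve packing one flag per bit so a million entries cost about 128K rather than a megabyte:

bit[] composite;
composite.length = 1000000;

for (int i = 2; i * i < composite.length; ++i)
    if (!composite[i])                               // i is prime
        for (int j = i * i; j < composite.length; j += i)
            composite[j] = 1;                        // j is not prime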

"Patrick Down" <pat@codemoon.com> wrote in message news:Xns920BB36E7D600patcodemooncom@63.105.9.61...
> "OddesE" <OddesE_XYZ@hotmail.com> wrote in news:abjvsl$fem$1 @digitaldaemon.com:
> >
> > So we do need a bool type, byte sized!
> >
>
> typedef ubyte bool;
>
> I am sort of doubting the utility of the bit type for flags.  I think I may go back to schemes like this.
>
> enum flags { f1=1, f2=2, f3=4, f4=8, f5=16 }
>
> flags a = flags.f1 | flags.f4;
>


May 12, 2002
That sucks.  The compiler should do some behind-the-scenes manipulation there so that it *can* store into that bit, instead of telling the programmer (with an error!) that he shouldn't be returning bits from functions.  Either extend the concept of a pointer, for pointers to bit, so that it is not just a pointer to memory but also carries a bit index; the result bit can then be shifted, in the function's exit code, from the return-value buffer into the correct bit of whatever it points to.  A slightly larger pointer.  Or implement it as a *byte and have the calling code deal with putting the value there and stuffing it back into its rightful place.
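
A minimal sketch of that "slightly larger pointer" idea (the names here are hypothetical, not anything the compiler actually defines):

// a "pointer to bit": the byte address plus the bit's position in it
struct BitPtr
{
    ubyte* base;   // address of the byte holding the bit
    uint   shift;  // position of the bit within that byte, 0..7
}

// store a bit through such a pointer
void storeBit(BitPtr p, bit value)
{
    if (value)
        *p.base |= cast(ubyte)(1 << p.shift);
    else
        *p.base &= cast(ubyte)~(1 << p.shift);
}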

But don't just disallow it.  Exceptions such as this will ruin a language.  Just have the compiler Do The Right Thing, which is exactly what you asked, nay commanded, it to do.  The "how" is an implementation question.  Good programmers avoid features that are too slow, and whether a particular implementation is acceptable is up to the programmer.  I suppose Walter's D compiler could even implement it as an error; I just don't think the language spec should say you can't use bits as out parameters.

Sean

"Pavel Minayev" <evilone@omen.ru> wrote in message news:abiv29$2lhf$1@digitaldaemon.com...
> "Patrick Down" <pat@codemoon.com> wrote in message news:Xns920B30A21022patcodemooncom@63.105.9.61...
>
> > Is this right?
> > err.d(2): cannot have out parameter of type bit
>
> There can be no pointers to bits. Since out parameters are implicitly dereferenced pointers, they cannot be bit.



May 12, 2002
If there's one thing a computer is good at, it's moving, scanning, and otherwise manipulating bits.  Sucks that most languages don't extend down to the bit type... the operations on bit arrays are like "detailed" versions of the standard memcpy, memset, memmove, strchr, etc. functions, but at the bit level.  And bit arrays are fairly powerful things.  I'd like to see more attention paid to language features that deal with that kind of thing... maybe as a D 2.0 feature.
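
As a rough sketch of what "memset at the bit level" means, assuming bits packed into 32-bit words (the unoptimized version; a real one would mask whole words in the middle and loop only at the ragged ends):

// set bits [lo, hi) in a word-packed bit array
void bitFill(uint[] words, int lo, int hi)
{
    for (int i = lo; i < hi; ++i)
        words[i >> 5] |= 1u << (i & 31);
}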

The next area I'd focus on is color pixel processing.  The concept of color is fundamental to our interface to the computer, the display subsystem; it almost deserves to be a builtin type.  Some way of specifying operations on a range of array entries, and preferably even extending slicing to rectangular slices of a 2d array and cubic slices of a 3d array... that could get powerful pretty fast.  The compiler could generate SIMD code for that stuff pretty nicely.  Color arrays are palettes, 2d arrays are like images or textures, and 3d arrays are like movies or voxels.  I wonder if C++ has ever considered a "lossy" modifier ("typedef unsigned lossy long color"): a data type that will deal with potentially 64-bit values, but if the hardware couldn't do that, or longs weren't fast and you'd selected high optimization settings, the compiler could substitute a smaller type.

Standardizing filters and such (with wrapping!) could make D a great image processing language, wonderful for writing things that manipulate bitmaps.  One doesn't have to know much about modern 3D graphics to see that a lot of what goes on is specifying operations on millions of pixels, by filling triangles with gradients of values and merging that with the existing image data somehow.  D could become the next Renderman or OpenGL.  Maybe too tight a focus, but the graphics would sure make a great library.  If we can chain together operations on a source, one could parse the language itself for the operations: source * x + dest * (1 - x).
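
That inline blend, written out as a plain function (just a sketch; x is assumed to run from 0 to 1):

// linear interpolation: all dest at x = 0, all source at x = 1
float blend(float source, float dest, float x)
{
    return source * x + dest * (1 - x);
}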

I'd actually like to see bounded int types in the language.  Saturating int, call it whatever you want.  Something that the hardware "tops out", or clamps to the allowed range, before storing into.  A lot of CPUs support this concept, but no language that I know of ever has.  It's something the compiler could do for you automatically (using software emulation if necessary), and it would sure be a convenient timesaver.  Pixels usually fall into the category of saturated values:

struct Pixel
{
  saturated(0 <= byte <= 255) r,g,b,a;
  // or:
  saturated(0, 255) byte r,g,b,a;
}
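
Until there's language support, a minimal sketch of the software emulation (the helper name is made up):

// saturating add for unsigned 8-bit values
ubyte satAdd8(ubyte a, ubyte b)
{
    int sum = a + b;                           // widen so the carry isn't lost
    return cast(ubyte)(sum > 255 ? 255 : sum);
}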

That leads me to another idea: optimizing only the areas of the code that the compiler can tell are obvious inner loops (the part that needs optimizing the most).  Maybe some kind of depth-first search could traverse the code, optimizing for a given amount of time, and when it's time to run, it packs everything it has so far into an .exe and runs it.

Actually, pixel could be a basic type.  There could be a pixel type which is the ubiquitous 32-bit ARGB/RGBA whatever.  D could also have "named size" types such as pixelARGB1555, pixelRGB565, pixelRGBA4444, etc., etc.  Actually this is leading back to SIMD support, because it's far easier to code this yourself than to rely on typedefs for the thousands of kinds of pixels that are possible.  So I'd tackle the language feature at a lower level: that of saturating scalar types and chains of math on values of 2d/3d arrays.

How would one describe mathematically what's going on in a typical SIMD situation, such as compositing (blending) one black-and-white image onto another based on the values in a third image?

void composite(in byte[][] image1, in byte[][] image2, in byte[][] mask,
               out byte[][] result)
{
  // make sure all arrays are the same size.  Note the use of an
  // introspection capability to inquire the dimensions of an array.
  assert(image1.dim(1) == image2.dim(1) == mask.dim(1) &&
         image1.dim(2) == image2.dim(2) == mask.dim(2));
  assert(image1.dim == image2.dim);  // shorthand for the above

  // make the result the same size as the others
  result.resize(image1.dim(1), image1.dim(2));

  // for each value in result (recursively in the case of nested arrays)
  // do the mixing.  Writing it this way allows the compiler to implement
  // the operation using SIMD.
  result.foreach[[x]]
  {
     result[[x]] = saturate_cast(byte) (image1[[x]] * mask[[x]] +
                                        image2[[x]] * ~mask[[x]]);
  }

  // do the same thing, but now just for a slice of the arrays, and using
  // only the first column of the mask
  result.foreach[[x]][[y]] in [result.dim(1)/4 .. (result.dim(1)/4)*3]
                              [result.dim(2)/4 .. (result.dim(2)/4)*3]
  {
     result[[x]][[y]] = saturate_cast(byte) (image1[[x]][[y]] * mask[[x]][[0]] +
                                             image2[[x]][[y]] * ~mask[[x]][[0]]);
  }
}

the compiler can either transform that into
  for (int __i = 0; __i < result.dim(1); ++__i)
    for (int __j = 0; __j < result.dim(2); ++__j)
    {
       int temp = image1[__i][__j] * mask[__i][__j] +
                  image2[__i][__j] * ~mask[__i][__j];
       result[__i][__j] = (temp < 0 ? 0 : temp > 255 ? 255 : temp);
    }

or into an optimized version of that (pulling those unchanging values out of the loop helps a lot), or even directly into SIMD asm code.  That SIMD stuff would kick ass if it were integrated as tightly into the compiler and optimizer as, say, the int type is.  ;)

Anyone who has ever written a 2D memory move or, say, a highly optimized memcpy, knows how difficult it is to write this stuff without bugs, fast, *and* in a general way.  It could, however, be entirely automated by the compiler.  I think the compiler will have a much easier time changing

  result.foreach[[x]][[y]] in [result.dim(1)/4 .. (result.dim(1)/4)*3]
                              [result.dim(2)/4 .. (result.dim(2)/4)*3]
  {
     result[[x]][[y]] = saturate_cast(byte) (image1[[x]][[y]] * mask[[x]][[0]] +
                                             image2[[x]][[y]] * ~mask[[x]][[0]]);
  }

into really tight SIMD code than it would trying to optimize the following (which is the only way to express the idea to the compiler in languages such as C):

  for (int __i = result.dim(1)/4; __i < (result.dim(1)/4)*3; ++__i)
    for (int __j = result.dim(2)/4; __j < (result.dim(2)/4)*3; ++__j)
    {
       int temp = image1[__i][__j] * mask[__i][__j] +
                  image2[__i][__j] * ~mask[__i][__j];
       result[__i][__j] = (temp < 0 ? 0 : temp > 255 ? 255 : temp);
    }

If the loop limits are known at compile time, or god forbid the address is known at compile time, it could output some nice custom-crafted, highly optimized code utilizing whatever instructions and methods are available.  Doing that stuff isn't easy; I've done it, and I know how repetitive and error-prone it is.  But it wouldn't be incredibly hard to write a program that writes well-optimized traversal loops, if you start out with the concept the optimizer would otherwise have to infer: that you're going to do the exact same operation to a whole bunch of data.  With a language extension, you say this directly.  The compiler could be made to figure it out for you, but it's much easier to just tell it what you mean instead of writing the loops yourself.

One might be able to approach that level of sophisticated code generation with a really badass inline assembler and a really badass template system that let you manipulate values and execute logic at compile time.  But making those is a lot harder than just correcting a basic language deficiency: the inability to specify a set of operations on many data as a single execution unit, leaving the traversal code entirely up to the compiler.  For multidimensional arrays, Walter's work here would save many a programmer countless man-hours writing this type of code.  The code would probably be built custom for each loop, and that's ok with me.

Now if we can find a way to integrate the concept of "brush" and "polygon" and "mask" and "blend" and "filter" into the language, we're well on the way to being a useful software rendering tool.

Have you ever written code to do bilinear filtering?  Conceptually it's given 2 inputs and a 2d array, does 4 lookups and 2 "frac" modulo-1.0 operations, and blends the results together using 3 linear interpolations, of the form:

// coordinates are fixed point with 4 fractional bits; pixel values are
// treated as unsigned 0..255, so intermediates are widened to int
ubyte BilinearFilter(ubyte[][] image, int x_times_16, int y_times_16)
{
  int ix = x_times_16 >> 4;            // integer parts
  int iy = y_times_16 >> 4;
  int fx = x_times_16 & ((1<<4)-1);    // fractional parts, 0..15
  int fy = y_times_16 & ((1<<4)-1);

  int ul = image[iy][ix],   ur = image[iy][ix+1];
  int bl = image[iy+1][ix], br = image[iy+1][ix+1];

  // weight by the fraction toward the right/bottom neighbor;
  // the weights sum to 16 to match the >>4
  int terp1 = (ul * (16 - fx) + ur * fx) >> 4;   // top edge
  int terp2 = (bl * (16 - fx) + br * fx) >> 4;   // bottom edge
  return cast(ubyte)((terp1 * (16 - fy) + terp2 * fy) >> 4);
}

Not sure what kind of language feature would make that kind of mess easier to deal with.

Sean

"Walter" <walter@digitalmars.com> wrote in message news:abk9iv$n2j$3@digitaldaemon.com...
> Where the bit type comes into its own is in very large arrays of bits.
>
> "Patrick Down" <pat@codemoon.com> wrote in message news:Xns920BB36E7D600patcodemooncom@63.105.9.61...
> > "OddesE" <OddesE_XYZ@hotmail.com> wrote in news:abjvsl$fem$1 @digitaldaemon.com:
> > >
> > > So we do need a bool type, byte sized!
> > >
> >
> > typedef ubyte bool;
> >
> > I am sort of doubting the utility of the bit type for flags.  I think I may go back to schemes like this.
> >
> > enum flags { f1=1, f2=2, f3=4, f4=8, f5=16 }
> >
> > flags a = flags.f1 | flags.f4;
> >
>
>


May 12, 2002
Of course.  It's what the other types are built out of: THE most fundamental type that has storage.  I'm not surprised that it's a special case.  The implementation has to deal with it as an array packed into a series of fixed-size containers (machine words), not as one flat array.  If you've ever written bit-packing code such as a dynamic bit array, you'll find that SIMD uses the same concept; it's just an extension of the idea of ints as fixed-size containers of bits with special operations.  SIMD types are just fixed-size containers of bytes or ints or floats with special operations.

Wouldn't it be nice to generalize this entire concept into some kind of SIMD type?

You could declare

typedef bit[[32]] machine_word;

or

typedef float[[4]] vector;

then write shorthand

vector a,b,c;

a[[]] += b[[]] * c[[]];

I'm using [[]] as shorthand for "everything in not only this dimension but all smaller dimensions as well".

The compiler would be able to generate pretty optimal code using SIMD even if the type you declare isn't exactly available as a basic SIMD register type.  For instance if you declare

typedef float[[6]] vector6;

vector6 a,b,c;

a[[]] += b[[]] * c[[]];

and the compiler only had a 4-float SIMD register available it could still use that for part of the operation and either another 4-float SIMD register or 2 FPU registers to do the remainder of the work.
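
Conceptually the 6-float case lowers to this (a sketch in plain array terms):

float[6] a, b, c;

// what a[[]] += b[[]] * c[[]] means, element by element; the compiler
// could cover i = 0..3 with one 4-wide SIMD op and i = 4..5 separately
for (int i = 0; i < 6; ++i)
    a[i] += b[i] * c[i];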

Sean

"Walter" <walter@digitalmars.com> wrote in message news:abk9iu$n2j$2@digitaldaemon.com...
> Thanks for reporting this. The bit type is turning out to be a constant special case in the compiler.
>
> "Patrick Down" <pat@codemoon.com> wrote in message news:Xns920B8C31915A4patcodemooncom@63.105.9.61...
> > The following causes the compiler to generate:
> > Internal error: ..\ztc\cgcs.c 348
> >
> > void foo(inout bit[] test)
> > {
> >   test[0] = 1;
> >   test[1] = 0;
> >   test[2] = 1;
> > }
> >
> > int main(char[][] args)
> > {
> >
> >   bit[] a;
> >   a.length = 32;
> >
> >   for(int i = 0; i < a.length; ++i)
> >     a[i] = 0;
> >
> >   foo(a[3..6]); // foo(a) works ok
> >
> >   for(int i = 0; i < a.length; ++i)
> >     printf("%d",a[i]);
> >
> >   printf("\n");
> >
> >   return 0;
> > }



May 12, 2002
I've been thinking about this and it seems that it might be better to extend the slicing mechanism D already has.

D already has:

typedef float[4] vector;
vector a,b,c;
a[] += b[] * c[];

from which the compiler can easily tell what you intend.  But what are the rules for what happens when the sizes of the arrays don't match?  What happens if they're multidimensional?

D can also already do this:

a[1..3] += b[1..3] * c[1..3];

If one defined a range like so:

range(int) r = 1..3;
a[r] += b[r] * c[r];

would work as convenient shorthand.

For multidimensional slicing, it could work as follows:

typedef float[4][4] matrix;
matrix a,b,c;
range(int) i = 1..3;
range(int) j = 1..3;
a[i][j] = 2.0f * sqrt(b[i][j] + c[i][j+1]);

So I guess for this to work, ranges would have to be a first-class type in the D language.  They could be compile-time constants or variables, or a mixture of the two.  The generated code would obviously benefit from the ranges being compile-time constants, but that shouldn't be necessary.  If one writes an expression with side effects using a range in place of a value inside an array access, there should be a few rules:

Only one range may be included in any pair of brackets, along with an optional integer offset or scale.

The entire expression is evaluated once for each permutation of possibilities in each of the included ranges.  If you have a range R from 0..2 (two possibilities, 0 and 1) and a 3d array int x[][][], then x[R][R][R] += 1 would evaluate to eight increments.
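
For instance, that x[R][R][R] += 1 would amount to (a sketch of the implied loops over the 3d array x from above):

// eight increments: every (i, j, k) combination with each index in 0..1
for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j)
        for (int k = 0; k < 2; ++k)
            x[i][j][k] += 1;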

The order of evaluation is undefined; however, it would be nice if one could control this behavior so that overlapping moves do the right thing.  It should be possible to figure out whether to go forward or backward for each range by evaluating the dependencies in the graph of information flow.  I can't see how it would handle, say, a matrix transpose gracefully though:

typedef float[4][4] matrix;
matrix a;
range(int) i = 0..4;
range(int) j = 0..4;
swap(a[i][j],a[j][i]);

Writing memcpy() is hard on today's processors.  This job will be far more difficult: what's needed is a program that generates optimized loop code dealing with alignment, direction, and multiple operations, instead of writing the one program yourself that only does copying.  I know memset is about half as difficult to write as memcpy... this sounds like 4 or 5 times harder than writing an optimized memcpy.  The unoptimized version is still useful though, and it's not hard to write at all.  Walter already has most of what's needed.

Sean


"Sean L. Palmer" <spalmer@iname.com> wrote in message news:abkl59$vpu$1@digitaldaemon.com...
> Of course.  It's what the other types are built out of: THE most fundamental type that has storage.  I'm not surprised that it's a special case.  The implementation has to deal with it as an array packed into a series of fixed-size containers (machine words), not as one flat array.  If you've ever written bit-packing code such as a dynamic bit array, you'll find that SIMD uses the same concept; it's just an extension of the idea of ints as fixed-size containers of bits with special operations.  SIMD types are just fixed-size containers of bytes or ints or floats with special operations.
>
> Wouldn't it be nice to generalize this entire concept into some kind of SIMD type?
>
> You could declare
>
> typedef bit[[32]] machine_word;
>
> or
>
> typedef float[[4]] vector;
>
> then write shorthand
>
> vector a,b,c;
>
> a[[]] += b[[]] * c[[]];
>
> I'm using [[]] as shorthand for "everything in not only this dimension but all smaller dimensions as well".
>
> The compiler would be able to generate pretty optimal code using SIMD even if the type you declare isn't exactly available as a basic SIMD register type.  For instance if you declare
>
> typedef float[[6]] vector6;
>
> vector6 a,b,c;
>
> a[[]] += b[[]] * c[[]];
>
> and the compiler only had a 4-float SIMD register available it could still use that for part of the operation and either another 4-float SIMD register or 2 FPU registers to do the remainder of the work.
>
> Sean
>
> "Walter" <walter@digitalmars.com> wrote in message news:abk9iu$n2j$2@digitaldaemon.com...
> > Thanks for reporting this. The bit type is turning out to be a constant special case in the compiler.
> >
> > "Patrick Down" <pat@codemoon.com> wrote in message news:Xns920B8C31915A4patcodemooncom@63.105.9.61...
> > > The following causes the compiler to generate:
> > > Internal error: ..\ztc\cgcs.c 348
> > >
> > > void foo(inout bit[] test)
> > > {
> > >   test[0] = 1;
> > >   test[1] = 0;
> > >   test[2] = 1;
> > > }
> > >
> > > int main(char[][] args)
> > > {
> > >
> > >   bit[] a;
> > >   a.length = 32;
> > >
> > >   for(int i = 0; i < a.length; ++i)
> > >     a[i] = 0;
> > >
> > >   foo(a[3..6]); // foo(a) works ok
> > >
> > >   for(int i = 0; i < a.length; ++i)
> > >     printf("%d",a[i]);
> > >
> > >   printf("\n");
> > >
> > >   return 0;
> > > }
>
>
>


May 12, 2002
You're far beyond me in graphics code. But one possibility for easily generating loop code would be to, say, build a simple compiler for a specialized language that generates D code. I use similar kinds of things for building tables. It would be easily modifiable to generate custom code.