constness for arrays (page 2) - D Programming Language Discussion Forum

July 19, 2006

Re: constness for arrays

Posted by Andrew Fedoniouk
in reply to Chad J

>> typedef  string char[]
>> {
>>     disable opAssign;
>>     ....
>>     char[] tolower() { ..... }
>> }
>>

> I like that typedef.  Should be templatable though...
>
> typedef(T) array T[]
> {
>     ...
> }

I think so too. It would be nice to have this, but I think it is enough
to be able to define such types in each particular case.


I think that such an extended typedef makes sense for other basic types too:

typedef color uint
{
    uint red() {  .... }
    uint blue() {  .... }
    uint green() {  .... }
}

Such a typedef also makes sense for classes, to avoid vtbl pollution.
This is especially relevant for templated classes.
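The `typedef color uint { ... }` form above is a proposal, not D syntax. In present-day D a similar effect can be approximated with a struct that wraps a `uint` and forwards to it through `alias this`; struct member functions are non-virtual, so no vtbl is involved. A minimal sketch, assuming a 0xRRGGBB channel layout (the layout and the name `Color` are illustrative assumptions):

```d
// Approximates an "extended typedef" of uint (hypothetical type, assumed 0xRRGGBB layout).
struct Color
{
    uint value;
    alias value this;   // a Color converts implicitly to uint

    // Non-virtual accessors for the individual 8-bit channels.
    uint red()   const { return (value >> 16) & 0xFF; }
    uint green() const { return (value >> 8) & 0xFF; }
    uint blue()  const { return value & 0xFF; }
}

unittest
{
    auto c = Color(0x336699);
    assert(c.red == 0x33 && c.green == 0x66 && c.blue == 0x99);
    uint raw = c;       // implicit conversion through alias this
    assert(raw == 0x336699);
}
```

Unlike the proposed typedef, `alias this` cannot hide any of uint's operations: `c + 1` still compiles and yields a plain `uint`.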


>
> Or some such.  In an earlier post ("Module level operator overloading" at http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/39504) I was hoping for external functions as operator overloads and IFTI to help with things like array operations.  I just didn't know about external functions at the time.  But if this is supposed to replace external functions, how would I do the array op overloads that external functions would help me with?  Would be unfortunate to write something like this...
>
> typedef(T) T[] T[] // mmm what would this do
> {
>   void opAdd(T[] array1, T[] array2)
>   {
>     etc...
>   }
> }

What is the problem with the following:

typedef(T) array T[]
{
   void opAdd(array a1, array a2)
   {

   }
}

?

Andrew Fedoniouk.
http://terrainformatica.com

July 19, 2006

Re: constness for arrays

Posted by Andrew Fedoniouk
in reply to Dave

>
> What do you mean by external methods?
>
> This?

Positive.

>
> import std.stdio;
> void main()
> {
>     char[] str = "abc";
>     writefln(str.ucase()); // "ABC"
> }
> char[] ucase(char[] str)
> {
>     foreach(inout char c; str) if(c >= 'a' && c <= 'z') c += 'A' - 'a';
>     return str;
> }
>
> If so, that's not a bug, it's intentional. Line 4141 of expression.c.
>

People look in the docs / language specification first.

Line 4141 of expression.c is the last place where someone will look to find out which language features D has.

Andrew Fedoniouk.
http://terrainformatica.com
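In present-day D, the calling form Dave shows is specified as Uniform Function Call Syntax (UFCS) and works for any type, not only arrays. A minimal sketch of his example in modern D; since string literals are now immutable, this version returns an upper-cased copy instead of modifying its argument in place (ASCII letters only, as in the original):

```d
// A free function whose first parameter is an array; it can be called
// either as ucase(s) or, through UFCS, as s.ucase().
char[] ucase(const(char)[] str)
{
    char[] result = str.dup;   // copy: string literals are immutable in modern D
    foreach (ref c; result)
        if (c >= 'a' && c <= 'z')
            c = cast(char)(c - ('a' - 'A'));
    return result;
}

unittest
{
    string str = "abc";
    assert(str.ucase() == "ABC");
    assert(ucase(str) == str.ucase());   // both call forms are equivalent
    assert(str == "abc");                // the original is untouched
}
```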

July 19, 2006

Re: constness for arrays

Posted by Chad J
in reply to Andrew Fedoniouk

Andrew Fedoniouk wrote:
> 
> What is the problem with the following:
> 
> typedef(T) array T[]
> {
>    void opAdd(array a1, array a2)
>    {
> 
>    }
> }
> 
> ?
> 

In the usage.
How do you use it?
Something like this?

array!(short) foo = [1,2,3];
array!(short) bar = [4,5,6];
array!(short) result = foo + bar;
// result is now [5,7,9]

not too bad... how about multidimensional stuff...

array!(array!(short)) foo = [[1,2],[3,4]];
array!(array!(short)) bar = [[5,6],[7,8]];
// um, maybe a matrix multiply or something.  I don't feel like it.

Well, it's doable, but it would be better to have ordinary array syntax. Also, what if some external library passes in an ordinary array that is not set up as one of these types, and you want to use the new fancy features on it:

void gimmeAnArray( short[] foo )
{
  array!(short) bar = array!(short).convert( foo );
  // ugh
  array!(short) bar = array.convert( foo );
  // ahh IFTI is better, but this whole line should be unnecessary IMO.
  ...
}

I suppose the problem I have with your syntax suggestion is that it's impossible to add properties to existing types.  It forces you to define new types to add properties.
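The usage Chad describes can be approximated in present-day D with a templated struct that forwards to a built-in array via `alias this` and overloads `+` as a member. A minimal sketch; `Array` and the helper `wrap` are hypothetical names, and the helper exists only so that IFTI can deduce `T` from a plain array:

```d
import std.algorithm : equal;

// Hypothetical templated wrapper over T[] with element-wise addition.
struct Array(T)
{
    T[] data;
    alias data this;   // indexing, .length, slicing and foreach forward to data

    // a + b adds element by element; both operands must have the same length.
    Array opBinary(string op : "+")(Array rhs)
    {
        assert(data.length == rhs.length, "length mismatch");
        auto result = new T[data.length];
        foreach (i, x; data)
            result[i] = cast(T)(x + rhs[i]);   // cast back after integer promotion
        return Array(result);
    }
}

// IFTI deduces T, so wrap(foo) on a short[] yields an Array!short.
Array!T wrap(T)(T[] a) { return Array!T(a); }

unittest
{
    auto foo = Array!short([1, 2, 3]);
    auto bar = Array!short([4, 5, 6]);
    assert((foo + bar).data.equal([5, 7, 9]));

    short[] plain = [10, 20, 30];   // e.g. an array handed over by another library
    assert((wrap(plain) + foo).data.equal([11, 22, 33]));
}
```

The conversion Chad objects to does not disappear in this sketch, but it shrinks to `wrap(plain)`.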

July 19, 2006

Re: constness for arrays

Posted by Andrew Fedoniouk
in reply to Chad J

> void gimmeAnArray( short[] foo )
> {
>   array!(short) bar = array!(short).convert( foo );
>   // ugh
>   array!(short) bar = array.convert( foo );
>   // ahh IFTI is better, but this whole line should be unnecessary IMO.
>   ...
> }
>
> I suppose the problem I have with your syntax suggestion is that it's impossible to add properties to existing types.  It forces you to define new types to add properties.

If you define it as

alias array short[]
{
    ...
    void someNewOp(self) {    }
}

then 1) you can use this someNewOp with it and 2)

void gimmeAnArray( short[] foo )
{
    array bar = foo; // ok
    ...
}

An extended alias allows you to extend base types.
An extended typedef allows you to both extend and restrict the operations of base types.

Andrew Fedoniouk.
http://terrainformatica.com
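Andrew's split between an extending alias and a restricting typedef has a rough counterpart in present-day D: a free function called through UFCS extends `short[]` itself without introducing a new type, while a wrapper struct that forwards only some operations restricts it. A minimal sketch; `sum` and `ReadOnlyView` are hypothetical names:

```d
// Extending: a free function callable as foo.sum() on any short[].
int sum(const(short)[] a)
{
    int total = 0;
    foreach (x; a)
        total += x;
    return total;
}

// Restricting: a view that exposes reads but defines no element assignment.
struct ReadOnlyView(T)
{
    private T[] data;

    this(T[] d) { data = d; }

    T opIndex(size_t i) const { return data[i]; }
    size_t length() const { return data.length; }
}

unittest
{
    short[] foo = [1, 2, 3];
    assert(foo.sum() == 6);          // the extension works on the plain array

    auto view = ReadOnlyView!short(foo);
    assert(view[1] == 2 && view.length == 3);
    // No opIndexAssign exists, so writing through the view does not compile.
    static assert(!__traits(compiles, view[0] = 5));
}
```

Unlike the proposed alias, a plain `short[]` does not silently become a `ReadOnlyView`; the restriction has to be opted into by constructing the view.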

July 19, 2006

Re: constness for arrays

Posted by Chad J
in reply to Andrew Fedoniouk

Andrew Fedoniouk wrote:
> 
> If you will define it as
> 
> alias array short[]
> {
>     ...
>     void someNewOp(self) {    }
> }
> 
> then 1) you can use this someNewOp with it and 2)
> 
> void gimmeAnArray( short[] foo )
> {
>     array bar = foo; // ok
>     ...
> }
> 
> Extended alias allows to extend base types.
> Extended typedef allows to extend and to reduce operations of base types.
> 
> Andrew Fedoniouk.
> http://terrainformatica.com
> 

I suppose that means I could do something like

alias array short[]
{
  ...
  short[] opAdd( short[] other )
  {
    ...
  }
}

short[] gimmeAnArray( short[] foo )
{
  short[] newArray;
  for ( int i = 0; i < foo.length; i++ )
    newArray ~= cast(short) i; // i is an int; appending to short[] needs a narrowing cast

  return foo + newArray; // usage of extension on short[]
}

That would be cool.  Though it would be nice if it didn't also stick an "array" type out there (does it?).  Is there anywhere I can find a complete look at what you're proposing?
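Two notes on this in present-day D. Operator overloads must be member functions, so an external `opAdd` on `short[]` is not possible; however, element-wise arithmetic on plain arrays is built in as array operations, provided the destination slice already exists. A minimal sketch using `int` elements (chosen to avoid casts from integer promotion); `addArrays` is a hypothetical helper:

```d
// Element-wise sum of two equal-length arrays via a built-in array operation.
int[] addArrays(const(int)[] a, const(int)[] b)
{
    assert(a.length == b.length, "arrays must have equal length");
    auto result = new int[a.length];
    result[] = a[] + b[];   // evaluated element by element, no explicit loop
    return result;
}

unittest
{
    int[] foo = [1, 2, 3];
    int[] bar = [4, 5, 6];
    assert(foo.addArrays(bar) == [5, 7, 9]);   // UFCS call on a plain int[]
}
```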

July 19, 2006

Re: constness for arrays

Posted by Dave
in reply to Andrew Fedoniouk

Andrew Fedoniouk wrote:
> 
> People are looking in the doc/ language specification first.
> 
> Line 4141 of expression.c is the last place where someone will
> try to find answer on what language features D has.
> 

I agree; just pointing out that it is there by design even if that design hasn't been codified in the docs. <g>

> Andrew Fedoniouk.
> http://terrainformatica.com

July 19, 2006

Re: constness for arrays

Posted by xs0
in reply to Andrew Fedoniouk

Andrew Fedoniouk wrote:
> Dynamic constness versus static (compile time) constness is not new.

Never said it was.

> For example, in Ruby you can dynamically declare an object/array readonly, and
> its runtime will control all modifications, and note, completely, since Ruby's sandbox
> (like any other VM-based runtime) has all the facilities needed to fully control
> the immutability of such objects.

Cool! OTOH, I'm proposing making the reference readonly, not the data itself.

> In the case of natively compiled runtimes like D's, such control is not an
> option.

Because?

> I believe that the proposed runtime flag a) is not constness in any sense

It's more like readonlyness.

> b) does not solve compile-time verification of readonlyness, and

I said so myself :P But the question is whether compile-time verification is better or not. In some cases it definitely isn't:

int[] cowFoo(int[] a) { if (whatever) { a=a.dup; a[0] = 5; } }
int[] cowBar(int[] a) { if (something) { a=a.dup; a[1] = 10; } }

int[] result=cowFoo(cowBar(whatever));

How can a compile-time check ever help you avoid the (unnecessary) second .dup when both funcs decide to modify the data?

> c) can be implemented now by defining:
> struct vector
> {
>     bool readonly;
>     T*  data;
>     uint length;
> }

So? How does that help when using built-in arrays?

> Declarative constness prevents data misuse at compile time,
> whereas runtime constness moves the problem into execution time,
> when it is a) too late to do anything and b) expensive.

I disagree. A single .dup probably costs more than tens (if not hundreds) of checks of a single bit (which can even be disabled in release builds). And why would it be too late to do anything?

> I would mention an old idea again: the real solution would be to create a
> mechanism for disabling existing operations or creating new ones
> for intrinsic types.

Start your own thread :P

xs0

July 19, 2006

Re: constness for arrays

Posted by xs0
in reply to xs0

> int[] cowFoo(int[] a) { if (whatever) { a=a.dup; a[0] = 5; } }
> int[] cowBar(int[] a) { if (something) { a=a.dup; a[1] = 10; } }

Of course, both return a as well :)

xs0
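The runtime-flag approach can be illustrated with this very pair of functions. A minimal sketch in present-day D, not xs0's implementation: the `CowArray` struct, the semantics of `wantToWrite` (copy once, then clear the flag), and the `dupCount` counter are assumptions made for illustration. Because the flag travels with the reference, the second function finds it already cleared and writes in place, so the chain copies only once:

```d
// Hypothetical copy-on-write array reference carrying a runtime readonly flag.
struct CowArray
{
    int[] data;
    bool readonly;            // true: data may be shared, copy before writing

    static size_t dupCount;   // number of real copies made (illustration only)

    // Makes the reference writable, copying only if it is currently readonly.
    void wantToWrite()
    {
        if (readonly)
        {
            data = data.dup;
            readonly = false;
            ++dupCount;
        }
    }
}

CowArray cowFoo(CowArray a, bool whatever)
{
    if (whatever) { a.wantToWrite(); a.data[0] = 5; }
    return a;
}

CowArray cowBar(CowArray a, bool something)
{
    if (something) { a.wantToWrite(); a.data[1] = 10; }
    return a;
}

unittest
{
    CowArray.dupCount = 0;
    int[] original = [1, 2, 3];
    auto result = cowFoo(cowBar(CowArray(original, true), true), true);
    assert(result.data == [5, 10, 3]);
    assert(original == [1, 2, 3]);    // the shared data was never written
    assert(CowArray.dupCount == 1);   // only the first write copied
}
```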

July 19, 2006

Re: constness for arrays

Posted by Reiner Pope
in reply to Andrew Fedoniouk

Andrew Fedoniouk wrote:
> Dynamic constness versus static (compile time) constness is not new.
So what?

> In the case of natively compiled runtimes like D's, such control is not an
> option.
What do you mean by this? The runtime itself doesn't need to be able to control the code; as we know, such control could be enforced at compile time. As for the fact that the runtime check could be subverted: since D has inline assembly, static const can be subverted just as easily. If speed is the concern, read on.

> I believe that the proposed runtime flag a) is not constness in any sense
What about the sense that illegal write operations to readonly arrays could be caught in debug builds? That effectively ensures that the arrays are kept *constant*, doesn't it?
> b) does not solve compile-time verification of readonlyness, and
There seem to be two main arguments for compile-time verification of readonlyness: speed and certainty. For reasons outlined below, speed is actually likely to be _greater_ with runtime const than with compile-time const. As for certainty, readonlyness is just one of many bug-catching mechanisms. Others include:
  - Design by Contract (pre- and post- conditions and invariants)
  - Unit testing
  - Typing mechanism (partial type safety)
  - Array bounds checking
  - GC (catches memory and type-safety errors)
All of these checking mechanisms other than type safety are implemented at runtime, yet there is not much debate about that fact, even though they *could* be checked at compile time using theorem proving (see http://en.wikipedia.org/wiki/SPARK_programming_language for a programming language that does this). Because they are checked at runtime, the certainty of static checking isn't present, just as with runtime const-ness. However, runtime const-ness would still allow many more bugs to be caught than no const system at all, and I would even go so far as to say that it would catch *most* const violations if combined with good unit tests.

The main advantage of runtime checking is flexibility/speed, as well as no 'const-pollution', as xs0 put it.

You get the speed gains from avoiding all unnecessary duplications, a feat which simple (a la C++) static const-checking can't achieve. Imagine that we had a static const-checking system in D:

const char[] tolower(const char[] input)
// the input must be const, because we agree with CoW, so we won't change it
// Because of below, we also declare the output of the function const
{
  // do some stuff
  if ( a write is necessary )
  { // copy it into another variable, since we can't change input (it's const)
  }
  return something;
// This something could possibly be input, so it also needs to be declared const. So we go back and make the return value of the function also a const.
}

// Now, since the return value is const, we *must* dup it whenever we call it. This is *very* inefficient if we own the string, because we get two unnecessary dups. This is a big price to pay just to keep static const-checking.
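Present-day D does have `const`, so this scenario can be made concrete. A minimal sketch with a hypothetical ASCII-only `toLowerConst`: the input is `const`, the result may be the input itself, so the result must be `const` as well, and a caller who owns a mutable buffer and wants to keep writing to the result has to copy it again:

```d
// Hypothetical statically-const tolower: the input is const, and since the
// result may be the input itself, the result must be const too.
const(char)[] toLowerConst(const(char)[] input)
{
    foreach (i, c; input)
    {
        if (c >= 'A' && c <= 'Z')
        {
            char[] copy = input.dup;   // first copy: input may not be written
            foreach (ref d; copy[i .. $])
                if (d >= 'A' && d <= 'Z')
                    d = cast(char)(d + ('a' - 'A'));
            return copy;
        }
    }
    return input;                      // nothing to change: hand back the input
}

unittest
{
    char[] owned = "Hello".dup;        // a buffer the caller owns
    const(char)[] lowered = toLowerConst(owned);
    assert(lowered == "hello");
    // To keep writing to the result, the caller must copy it again.
    char[] mutable = lowered.dup;
    mutable[0] = 'j';
    assert(mutable == "jello");
    assert(owned == "Hello");          // the input was never modified
}
```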


> c) can be implemented now by defining:
> struct vector
> {
>     bool readonly;
>     T*  data;
>     uint length;
> }
Yes and no. It can be implemented like that because that would effectively copy exactly what an array does already, but a) it takes up more memory than what xs0 proposed, and b) it isn't supported natively by the language's arrays, so it is less likely to be used.

> 
> Declarative constness prevents data misuse at compile time,
> whereas runtime constness moves the problem into execution time,
> when it is a) too late to do anything and b) expensive.
a) Testing, especially when assisted by unit testing and the code coverage tool included in DMD, should pick up most, if not all, of the const violations in your code, when you still do have a chance to do something about it. It's impossible to rely on the compiler to pick up all your bugs in any situation.
b) It's not expensive, because it avoids unnecessary duplications, and there should be a compiler switch to turn off the readonly checks in release builds, once you're sure of safety. xs0 covered the costs and concluded they weren't significant.
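In present-day D, `assert` already behaves like the switch described in b): it is checked in ordinary builds and compiled out with `-release`. A minimal sketch of a write-checked array built on that; `CheckedArray` is a hypothetical name, and the last check assumes a non-release build:

```d
import core.exception : AssertError;
import std.exception : assertThrown;

// Hypothetical array reference whose writes are checked while `readonly` is set.
struct CheckedArray
{
    char[] data;
    bool readonly;

    char opIndex(size_t i) const { return data[i]; }

    char opIndexAssign(char c, size_t i)
    {
        // Checked in ordinary builds; removed when compiling with -release.
        assert(!readonly, "write to a readonly array");
        return data[i] = c;
    }
}

unittest
{
    auto a = CheckedArray("abc".dup, false);
    a[0] = 'x';
    assert(a[0] == 'x');

    auto r = CheckedArray("abc".dup, true);
    assertThrown!AssertError(r[0] = 'y');   // assumes a non-release build
    assert(r[0] == 'a');                     // the write never happened
}
```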

> I would mention an old idea again: the real solution would be to create a
> mechanism for disabling existing operations or creating new ones
> for intrinsic types.
>
> For example, a string definition might look like this:
> 
> typedef  string char[]
> {
>     disable opAssign;
>     ....
>     char[] tolower() { ..... }
> }
While this could be a useful tool, using this as a form of data-protection is just WAY TOO inflexible, and it removes the areas where D's string (and array) processing is so powerful.



Cheers,

Reiner

July 19, 2006

Re: constness for arrays

Posted by xs0
in reply to Don Clugston

Don Clugston wrote:
> xs0 wrote:
>> - the top bit of arrays' .length becomes an indicator of the readonlyness of the array reference
> 
> This is a really interesting idea. You're essentially chasing a performance benefit, rather than program correctness. Some benchmarks ought to be able to tell you if the performance benefit is real:
> 
> instead of char[], use
> 
> struct CharArray {
>  char [] arr;
>  bool readOnly;
> }
> 
> for both the existing and proposed behaviour (for the existing one, readonly is ignored, but include it to make the parameter passing fair).
> 
> For code that makes heavy use of COW, I suspect that the benefit could be considerable. You probably don't need to eliminate many .dups to pay for the slightly slower .length.

Well, I did an (admittedly biased :) test, and there do seem to be potentially large benefits...

I wrote an app that counts different words in the 5.6MB ASCII text from
http://www.gutenberg.org/etext/1581

The text was read and duplicated 10 times (so I could do 10 runs). Then, words were extracted (word := a sequence of alnum chars), lowercased and placed into an AA. I ran each version about 20 times, and here are the fastest results for each:

bib_current    : 3641ms
bib_str        : 3031ms
bib_str (old)  : 3625ms
bib_str (ugly) : 3109ms

Commenting out the AA stuff, the results become

bib_current    : 1281ms
bib_str        : 812ms
bib_str (old)  : 1234ms
bib_str (ugly) : 812ms

About 11% of the calls to toLower would result in .duping currently, and none do with the new system, as it's not necessary in this particular case. Had I used toUpper, ... :)

bib_current is exactly the same code, except it uses Phobos' tolower and char[] instead of string.

For some reason, bib_current is (slightly) slower even than the string version that does exactly the same thing... My guess would be that some more inlining/optimization was done in my code.

For some other reason, toLowerUgly is slower than toLowerNew, even though it potentially does fewer checks... Probably the benefit was lost completely, as words tend to have only the first character uppercase, and the extra code just slowed things down.

Well, anyway, the conclusion would be that using that bit for readonly indication does not cause slowdowns even for code that doesn't use it. If used for COW-only-when-necessary, speed gains can be considerable.


xs0


Here is the code (I snipped the boring parts in the interest of brevity; I can post the full code if someone wants it):

struct string {
    char* ptr;
    uint _length;

    public static string opCall(char[] bu, int readonly) { ... }
    public int length() { return _length & 0x7fffffff; }
    public void length(int newlen) { ... }
    public string dup() { ... }
    public char opIndex(int i) { return ptr[i]; }
    public char opIndexAssign(char c, int i) { return ptr[i] = c; }
    public char[] toString() { return ptr[0..length()]; }
    public void wantToWrite() { if (_length & 0x80000000) { ... } }
    public string slice(int start, int end) { ... }
}

string toLowerOld(string txt)
{
    int l = txt.length;
	
    for (int a=0; a<l; a++) {
        char c = txt[a];
        if (c>='A' && c<='Z') {
            txt = txt.dup;
            txt[a] = c+32;
            for (int b=a+1; b<l; b++) {
                c = txt[b];
                if (c>='A' && c<='Z')
                    txt[b]=c+32;
            }
            return txt;
        }
    }
    return txt;
}

string toLowerNew(string txt)
{
    int l = txt.length;
    for (int a=0; a<l; a++) {
        char c = txt[a];
        if (c>='A' && c<='Z') {	
            txt.wantToWrite();
            txt[a]=c+32;
        }
    }
    return txt;
}

string toLowerUgly(string txt)
{
    int l = txt.length;
    for (int a=0; a<l; a++) {
        char c = txt[a];
        if (c>='A' && c<='Z') {
            txt.wantToWrite();
            txt[a]=c+32;
            for (int b=a+1; b<l; b++) {
                c = txt[b];
                if (c>='A' && c<='Z')
                    txt[b]=c+32;
            }
            return txt;
        }
    }
    return txt;
}

void main()
{
    string[] bible;
    bible.length = 10;
    for (int a=0; a<bible.length; a++) {
        if (a==0) {
            bible[a] = string(cast(char[])read("bible.txt"), 0);
        } else {
            bible[a] = bible[a-1].dup;
        }
    }
    long start = getUTCtime();

    uint result;

    for (int q=0; q<bible.length; q++) {
        string txt = bible[q];

        int[char[]] count;

        int pos = 0;
        while (pos<txt.length) {
            if (!isalnum(txt[pos])) {
                pos++;
                continue;
            }
            int len = 1;
            while (pos+len < txt.length && isalnum(txt[pos+len]))
                len++;

            //string word = toLowerOld(txt.slice(pos, pos+len));
            string word = toLowerNew(txt.slice(pos, pos+len));
            //string word = toLowerUgly(txt.slice(pos, pos+len));
            pos+=len;

            if (auto c = word.toString() in count) {
                (*c)++;
            } else {
                count[word.toString()]=1;
            }
        }
        result = count.length;
    }
    long end = getUTCtime();

    writefln("Different words found: ", result);
    writefln("Time taken: ", (end-start));
}
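The trick being benchmarked, reusing the top bit of the length as the readonly flag, can be sketched on its own in present-day D. A minimal sketch, assuming a 32-bit length field as in the 2006 code; the names `PackedLength`, `readonlyBit`, and `setReadonly` are hypothetical:

```d
// Hypothetical 32-bit length field whose top bit marks the reference readonly.
struct PackedLength
{
    enum uint readonlyBit = 0x8000_0000;
    uint raw;

    // The visible length ignores the flag bit.
    uint length() const { return raw & ~readonlyBit; }

    bool isReadonly() const { return (raw & readonlyBit) != 0; }

    void setReadonly(bool on)
    {
        raw = on ? (raw | readonlyBit) : (raw & ~readonlyBit);
    }
}

unittest
{
    PackedLength n = PackedLength(42);
    assert(n.length == 42 && !n.isReadonly);
    n.setReadonly(true);
    assert(n.length == 42 && n.isReadonly);   // the flag does not disturb the length
    assert(n.raw == 0x8000_002A);
}
```

The price is one extra AND on every length read and a maximum length of 2^31 - 1 elements, which matches the `0x7fffffff` mask used by `length()` in the struct above.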


Copyright © 1999-2021 by the D Language Foundation