Array bounds checking causes algorithmic nasties (page 2) - D Programming Language Discussion Forum

July 14, 2004

Re: Array bounds checking causes algorithmic nasties

Posted by Matthew
in reply to Andrew Edwards

"Andrew Edwards" <ridimz_at@yahoo.dot.com> wrote in message news:cd31h6$1901$1@digitaldaemon.com...
> Matthew wrote:
> > "Walter" <newshound@digitalmars.com> wrote in message news:cd2v21$14n3$1@digitaldaemon.com...
> >
> >>"Matthew" <admin@stlsoft.dot.dot.dot.dot.org> wrote in message news:cd2mgr$je4$2@digitaldaemon.com...
> >>
> >>>   ORJRecordA *begin = &m_database.records[0];
> >>>   ORJRecordA *end = &m_database.records[m_database.records.length];
> >>>
> >>>   for(; begin != end; ++begin)
> >>>   {
> >>>        . . .
> >>>
> >>>The process halts with an ArrayBoundsError on the second line above.
> >>>
> >>>The workaround for this is
> >>>
> >>>   ORJRecordA *begin = &m_database.records[0];
> >>>   ORJRecordA *end = begin + m_database.records.length;
> >>>
> >>>but it's hardly what I'd call a good solution. I suppose D doesn't have all the
> >>>horrible pitfalls of C & C++ which mandate that &ar[N] is the only portable
> >>>(between arrays and UDTs, and between compilers) syntax. But I still think this
> >>>chews. I'd rather not have the checking.
> >>
> >>An even better workaround:
> >>    foreach (ORJRecordA o; m_database.records)
> >>    {
> >>        ...
> >>    }
> >
> >
> > Doesn't work, since it gives me copies of the record structures, and I need their
> > addresses.
> >
>
> My experience has been that I can always get an address(index) of an array by specifying an int counter in the foreach loop.
>
> typedef char[] ORJRecordA;
>
> struct database
> {
>    ORJRecordA[] records;
> }
>
> void main ()
> {
>    database m_database;
>    m_database.records ~= cast(ORJRecordA)"Contents @ Address 0";
>    m_database.records ~= cast(ORJRecordA)"Contents @ Address 1";
>
>    foreach(int address, ORJRecordA rec; m_database.records)
>     printf("%2d: %.*s\n",address,cast(char[])rec);
> }

Mate, you'll have to explain what you're doing here. This looks like a grand hack. A mighty beguiling one, to be sure, but a hack nonetheless.

July 14, 2004

Re: Array bounds checking causes algorithmic nasties

Posted by Matthew
in reply to Lars Ivar Igesund

"Lars Ivar Igesund" <larsivar@igesund.net> wrote in message news:cd33il$1c2l$1@digitaldaemon.com...
> Matthew wrote:
>
> > "Lars Ivar Igesund" <larsivar@igesund.net> wrote in message news:cd2op0$n6m$1@digitaldaemon.com...
> >
> >>Matthew wrote:
> >>
> >>>   ORJRecordA *begin = &m_database.records[0];
> >>>   ORJRecordA *end = &m_database.records[m_database.records.length];
> >>>
> >>>   for(; begin != end; ++begin)
> >>>   {
> >>>        . . .
> >>>
> >>>The process halts with an ArrayBoundsError on the second line above.
> >>
> >>Well, the last element should be:
> >>
> >>   m_database.records[m_database.records.length - 1];
> >
> >
> > Not correct. Give it another think. :)
> >
>
> Maybe I misunderstand something, but you get an ArrayBoundsError if you do arr[arr.length];

Yes, that's my point. I may well be wanting to bring too much C++ methodology over, but it's a well-established idiom in C++ to enumerate an array by taking the address of the first element, and the one-past-the-post element. Your correction to m_database.records[m_database.records.length - 1]; would mean that I'd not include the last element - the one at [m_database.records.length - 1] - in the enumeration.

The problem is that an ArrayBoundsError (should that be an exception??) is thrown, even though I'm not using the element, merely its address. This is compounded when the array is empty, since taking the address of the 0th element also causes the Error.

My proposal/wish is that the use of ArrayBoundsError checking will be elided in the cases where the element's address is taken. I justify it by suggesting that since the coder is taking the address of the elements, they've already gone for a swim in C-world, and are therefore responsible for themselves.
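
For readers following along, here is a minimal sketch of the workaround discussed above, written against current D rather than the 2004 compiler. It assumes a hypothetical Record struct standing in for ORJRecordA and a hypothetical sumIds helper; neither comes from OpenRJ. The .ptr property and pointer addition form the begin/end pair without indexing the array, so no bounds check is involved, even when the array is empty.

```d
import std.stdio;

// Hypothetical stand-in for ORJRecordA; not the real OpenRJ type.
struct Record
{
    int id;
}

// Walks the half-open range [begin, end) with raw pointers and sums the ids.
int sumIds(Record[] records)
{
    // .ptr gives the address of the first element without a bounds check,
    // so it is also usable on an empty array.
    Record* begin = records.ptr;
    // Pointer addition forms the one-past-the-end address; it never
    // evaluates records[records.length], which would fail the check.
    Record* end = begin + records.length;

    int total = 0;
    for (; begin != end; ++begin)
        total += begin.id;
    return total;
}

void main()
{
    Record[] records = [Record(1), Record(2), Record(3)];
    writeln(sumIds(records)); // prints 6
    writeln(sumIds(null));    // prints 0 (empty array, loop body never runs)
}
```

The end pointer is only ever compared against, never dereferenced, which is the same contract as the C++ idiom described in the post.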

July 14, 2004

Re: Array bounds checking causes algorithmic nasties

Posted by Ben Hinkle
in reply to Matthew

Matthew wrote:

>    ORJRecordA *begin = &m_database.records[0];
>    ORJRecordA *end = &m_database.records[m_database.records.length];
> 
>    for(; begin != end; ++begin)
>    {
>         . . .
> 
> The process halts with an ArrayBoundsError on the second line above.
> 
> The workaround for this is
> 
>    ORJRecordA *begin = &m_database.records[0];
>    ORJRecordA *end = begin + m_database.records.length;
> 
> but it's hardly what I'd call a good solution. I suppose D doesn't have all the horrible pitfalls of C & C++ which mandate that &ar[N] is the only portable (between arrays and UDTs, and between compilers) syntax. But I still think this chews. I'd rather not have the checking.

I'd just loop over indices since you get the benefit of array bounds checking on a debug build. I assume on a release build the optimizer can turn it into pointer arithmetic if it wants to so the performance should be the same.

or, you can just always tell the compiler to not do bounds checking (never build debug) - it all depends on how much you want to use C/C++ idioms in D.

July 14, 2004

Re: Array bounds checking causes algorithmic nasties

Posted by Matthew
in reply to Ben Hinkle

"Ben Hinkle" <bhinkle4@juno.com> wrote in message news:cd3922$1lkm$1@digitaldaemon.com...
> Matthew wrote:
>
> >    ORJRecordA *begin = &m_database.records[0];
> >    ORJRecordA *end = &m_database.records[m_database.records.length];
> >
> >    for(; begin != end; ++begin)
> >    {
> >         . . .
> >
> > The process halts with an ArrayBoundsError on the second line above.
> >
> > The workaround for this is
> >
> >    ORJRecordA *begin = &m_database.records[0];
> >    ORJRecordA *end = begin + m_database.records.length;
> >
> > but it's hardly what I'd call a good solution. I suppose D doesn't have all the horrible pitfalls of C & C++ which mandate that &ar[N] is the only portable (between arrays and UDTs, and between compilers) syntax. But I still think this chews. I'd rather not have the checking.
>
> I'd just loop over indices since you get the benefit of array bounds checking on a debug build. I assume on a release build the optimizer can turn it into pointer arithmetic if it wants to so the performance should be the same.
>
> or, you can just always tell the compiler to not do bounds checking (never build debug) - it all depends on how much you want to use C/C++ idioms in D.

Yeah, I'm pretty sure I'm on a losing wicket with this one, but it's a shame. I'm also looking at supporting pointer-based range algorithms. Given what I've learned with this, I'm even more inclined than I was before to not support them.

Now that's a silver lining! :)

July 14, 2004

Re: Array bounds checking causes algorithmic nasties

Posted by Matthew
in reply to Matthew

"Matthew" <admin@stlsoft.dot.dot.dot.dot.org> wrote in message news:cd3976$1ls2$1@digitaldaemon.com...
>
> "Ben Hinkle" <bhinkle4@juno.com> wrote in message news:cd3922$1lkm$1@digitaldaemon.com...
> > Matthew wrote:
> >
> > >    ORJRecordA *begin = &m_database.records[0];
> > >    ORJRecordA *end = &m_database.records[m_database.records.length];
> > >
> > >    for(; begin != end; ++begin)
> > >    {
> > >         . . .
> > >
> > > The process halts with an ArrayBoundsError on the second line above.
> > >
> > > The workaround for this is
> > >
> > >    ORJRecordA *begin = &m_database.records[0];
> > >    ORJRecordA *end = begin + m_database.records.length;
> > >
> > > but it's hardly what I'd call a good solution. I suppose D doesn't have all the horrible pitfalls of C & C++ which mandate that &ar[N] is the only portable (between arrays and UDTs, and between compilers) syntax. But I still think this chews. I'd rather not have the checking.
> >
> > I'd just loop over indices since you get the benefit of array bounds checking on a debug build. I assume on a release build the optimizer can turn it into pointer arithmetic if it wants to so the performance should be the same.
> >
> > or, you can just always tell the compiler to not do bounds checking (never build debug) - it all depends on how much you want to use C/C++ idioms in D.
>
> Yeah, I'm pretty sure I'm on a losing wicket with this one, but it's a shame. I'm
> also looking at supporting pointer-based range algorithms.

for DTL (I meant)

> Given what I've learned with this, I'm even more inclined than I was before to not support
> them.
>
> Now that's a silver lining! :)

July 14, 2004

Re: Array bounds checking causes algorithmic nasties

Posted by Andrew
in reply to Matthew

In article <cd37fn$1iv6$1@digitaldaemon.com>, Matthew says...
>
>
>"Andrew Edwards" <ridimz_at@yahoo.dot.com> wrote in message news:cd31h6$1901$1@digitaldaemon.com...
>>
>> My experience has been that I can always get an address(index) of an array by specifying an int counter in the foreach loop.
>>
>> typedef char[] ORJRecordA;
>>
>> struct database
>> {
>>    ORJRecordA[] records;
>> }
>>
>> void main ()
>> {
>>    database m_database;
>>    m_database.records ~= cast(ORJRecordA)"Contents @ Address 0";
>>    m_database.records ~= cast(ORJRecordA)"Contents @ Address 1";
>>
>>    foreach(int address, ORJRecordA rec; m_database.records)
>>     printf("%2d: %.*s\n",address,cast(char[])rec);
>> }
>
>Mate, you'll have to explain what you're doing here. This looks like a grand hack. A mighty beguiling one, to be sure, but a hack nonetheless.
>

Simply put, every element of an array (including a char[]) has both an index and a value at that index. foreach normally gives access to the value; however, you can always access the index as well by declaring it explicitly.

void main() {
    char[] string = "this is a string";

    foreach(int idx, char c; string) {
        printf("%d: %c\n",idx,c);
        string[idx] = c + 1;
    }
    printf(string);
}
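
As an aside to the foreach discussion: a short sketch, in current D, of how a loop can reach element addresses without copying the records. It assumes a hypothetical Record struct and a hypothetical addressesOf helper, both invented for illustration. Declaring the loop variable ref binds it to the element in place, so taking its address yields a pointer into the array itself.

```d
import std.stdio;

// Hypothetical stand-in for the record type in the thread.
struct Record
{
    int id;
}

// Returns a pointer to each element of records, in order.
// `ref` makes rec an alias of the element rather than a copy.
Record*[] addressesOf(Record[] records)
{
    Record*[] result;
    foreach (i, ref rec; records)
        result ~= &rec;
    return result;
}

void main()
{
    Record[] records = [Record(10), Record(20)];
    Record*[] ptrs = addressesOf(records);

    ptrs[1].id = 99;                 // writes through to the array element
    writeln(records[1].id);          // prints 99
    writeln(ptrs[0] is &records[0]); // prints true
}
```

The pointers stay valid only while the array is not reallocated, for example by appending to it, so this suits a block that is built once and then left alone.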

July 14, 2004

Re: Array bounds checking causes algorithmic nasties

Posted by Walter
in reply to Matthew

"Matthew" <admin@stlsoft.dot.dot.dot.dot.org> wrote in message news:cd2v5r$14vm$2@digitaldaemon.com...
>
> "Walter" <newshound@digitalmars.com> wrote in message news:cd2v21$14n3$1@digitaldaemon.com...
> > An even better workaround:
> >     foreach (ORJRecordA o; m_database.records)
> >     {
> >         ...
> >     }
>
> Doesn't work, since it gives me copies of the record structures, and I need their
> addresses.

For what purpose?

July 14, 2004

Re: Array bounds checking causes algorithmic nasties

Posted by Walter
in reply to Matthew

I don't know what is happening inside the loop, but what it superficially looks like here is trying to apply C style pointer arithmetic optimizations to D. With foreach, I'll argue that 1) it isn't necessary and 2) using the index form, the optimizer can transform it to the pointer form automatically.

While doing the C pointer form is still possible in D, such as:

ORJRecordA* begin = cast(ORJRecordA*)m_database.records;
ORJRecordA* end = begin + m_database.records.length;

I'd argue that one will be better off using foreach or the index form. One reason is that using the pointer form is NOT necessarily the most efficient. Another is that the pointer form can impair more aggressive optimizations. Using the higher level construct will enable advanced compilers to do a better job of code generation than if the source usurps that by going directly to pointer arithmetic.

[Note: the 'index form' would be:
    for (size_t i = 0; i < m_database.records.length; i++)
    {
        ... m_database.records[i] ...;
    }
]

Small D-style nit:

In D, declare pointers as:
    char* p;
rather than the C style:
    char *p;
because in D:
    char* p,q;    // p and q are both pointers to char
whereas in C:
    char *p,q;    // p is a pointer, q is a char

Using whitespace in this way helps illustrate the left-associativity of D's * rather than the right-associativity of C.

July 14, 2004

Re: Array bounds checking causes algorithmic nasties

Posted by Sean Kelly
in reply to Walter

In article <cd3voa$2rie$1@digitaldaemon.com>, Walter says...
>
>I'd argue that one will be better off using foreach or the index form. One reason is that using the pointer form is NOT necessarily the most efficient. Another is that the pointer form can impair more aggressive optimizations. Using the higher level construct will enable advanced compilers to do a better job of code generation than if the source usurps that by going directly to pointer arithmetic.

I've been trying to come up with a situation where the pointer method is necessary... but I can't.  Combined with slicing, foreach can take care of every situation I can think of.  But the slicing is important.  It's necessary to be able to sequence across a subset of the contents of an associative container. Speaking of which, is there any built-in support for multisets?  I haven't tried associating more than one value with a specific key.


Sean
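
To make the slicing point concrete, here is a small sketch in current D. The sumRange helper and its parameter names are made up for illustration. A slice expression takes a half-open range, so the upper bound may equal the array length; that gives foreach the same begin/one-past-the-end shape as the pointer idiom while keeping bounds checking on the slice itself.

```d
import std.stdio;

// Sums the elements of data in the half-open range [lo, hi).
int sumRange(int[] data, size_t lo, size_t hi)
{
    int total = 0;
    // data[lo .. hi] is a view onto the same memory, not a copy.
    // hi == data.length is a valid upper bound for a slice.
    foreach (x; data[lo .. hi])
        total += x;
    return total;
}

void main()
{
    int[] data = [1, 2, 3, 4, 5];
    writeln(sumRange(data, 1, 4));           // prints 9  (2 + 3 + 4)
    writeln(sumRange(data, 0, data.length)); // prints 15 (whole array)
    writeln(sumRange(data, 2, 2));           // prints 0  (empty slice)
}
```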

July 14, 2004

Re: Array bounds checking causes algorithmic nasties

Posted by Matthew
in reply to Walter

"Walter" <newshound@digitalmars.com> wrote in message news:cd3v5d$2qim$1@digitaldaemon.com...
>
> "Matthew" <admin@stlsoft.dot.dot.dot.dot.org> wrote in message news:cd2v5r$14vm$2@digitaldaemon.com...
> >
> > "Walter" <newshound@digitalmars.com> wrote in message news:cd2v21$14n3$1@digitaldaemon.com...
> > > An even better workaround:
> > >     foreach (OBJRecordA o; m_database.records)
> > >     {
> > >         ...
> > >     }
> >
> > Doesn't work, since it gives me copies of the record structures, and I need their
> > addresses.
>
> For what purpose?

So that the Record instance can hold a pointer to the underlying ORJRecordA structure, which lives in a contiguous block headed by the ORJDatabaseA structure. (This is one of the nice things about OpenRJ: there are only two memory (re-)allocations in the creation of the database from the database file. In almost all circumstances this amounts to one block, since only other threads might incur an allocation that would require the second ORJ allocation to not expand the original block.)

Copyright © 1999-2021 by the D Language Foundation