[Issue 4475] New: Improving the compiler 'in' associative array can return just a bool
Jan 07, 2012
Stewart Gordon
July 16, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4475

           Summary: Improving the compiler 'in' associative array can
                    return just a bool
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: DMD
        AssignedTo: nobody@puremagic.com
        ReportedBy: bearophile_hugs@eml.cc


--- Comment #0 from bearophile_hugs@eml.cc 2010-07-16 16:32:54 PDT ---
This relates to pages 18 and 56 of The D Programming Language.

"foo in associativeArray" returns a pointer. So can this work in SafeD too? Maybe there are ways to accept this in SafeD too (if the pointer is not used and just tested if it's null or not), but there is a cleaner alternative solution.

In normal D code there is no need to write this to find the parity of x: int parity = x & 1;

The following expression can be used instead. It is more readable, and some stage
of the compiler is able to optimize it into the first expression:
int parity = x % 2;
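
A minimal sketch of the equivalence claimed above, assuming an unsigned operand. The helper names parityMask and parityMod are hypothetical and exist only for this illustration. Note that with a signed int the two expressions differ for negative values, because D's % takes the sign of the dividend.

```d
import std.stdio : writeln;

// Hypothetical helpers, named only for this illustration.
uint parityMask(uint x) { return x & 1; }
uint parityMod(uint x)  { return x % 2; }

void main()
{
    // For unsigned operands both forms give the same result,
    // so a compiler is free to lower one to the other.
    foreach (x; 0u .. 8u)
        assert(parityMask(x) == parityMod(x));

    // With a signed operand the results differ for negative values.
    int n = -3;
    writeln(n & 1); // prints 1
    writeln(n % 2); // prints -1
}
```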


The "in" for associative arrays returns a pointer for efficiency reasons, to avoid a double lookup in some situations. But the D1 LDC compiler is now able to optimize away two "close" associative array lookups in all situations, performing just one lookup.

LDC is probably not able to perform this optimization if the pointer is stored in a variable and used much later, but this is not a common usage pattern, so I think this can be ignored.

If the compiler is able to perform this optimization, then "in" can return a boolean, and it can be used cleanly in SafeD code too.

So in this case consider returning a boolean and improving the compiler instead. DMD (currently v2.047) is probably not able to perform this optimization.
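
The two usage patterns under discussion can be sketched as follows. This is only an illustration: lookupOnce and lookupTwice are hypothetical names, and whether the second form compiles down to a single lookup depends on the compiler and its optimization settings.

```d
// Pointer-returning "in": a single lookup at the source level.
int lookupOnce(int[string] aa, string key, int fallback)
{
    if (auto p = key in aa)
        return *p;
    return fallback;
}

// Test, then index: two lookups at the source level. The proposal
// relies on the compiler merging them into one.
int lookupTwice(int[string] aa, string key, int fallback)
{
    if (key in aa)
        return aa[key];
    return fallback;
}

void main()
{
    int[string] counts = ["a": 1, "b": 2];

    // Both forms return the same values; only the lookup count differs.
    assert(lookupOnce(counts, "a", -1) == lookupTwice(counts, "a", -1));
    assert(lookupOnce(counts, "z", -1) == lookupTwice(counts, "z", -1));
}
```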

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
August 26, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4475



--- Comment #1 from bearophile_hugs@eml.cc 2010-08-26 16:59:06 PDT ---
See also bug 4625

January 07, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=4475


Stewart Gordon <smjg@iname.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |smjg@iname.com


--- Comment #2 from Stewart Gordon <smjg@iname.com> 2012-01-07 06:21:53 PST ---
From a semantic point of view, in needs to continue to return a pointer in regular D, or a boolean in SafeD.

But if it's well optimised, then in most use cases the generated code would end up the same in both cases.

January 07, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=4475



--- Comment #3 from bearophile_hugs@eml.cc 2012-01-07 06:28:08 PST ---
(In reply to comment #2)
> From a semantic point of view, in needs to continue to return a pointer in regular D, or a boolean in SafeD.
> 
> But if it's well optimised, then in most use cases the generated code would end up the same in both cases.

I think "in" returning a pointer is a case of premature optimization. LDC shows that in most real situations a compiler is able to optimize away two nearby calls to the associative array lookup function into a single call. So I think a better design for "in" is to always return a boolean, both in safe and unsafe D code.

January 07, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=4475


Alex Rønne Petersen <xtzgzorex@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |xtzgzorex@gmail.com


--- Comment #4 from Alex Rønne Petersen <xtzgzorex@gmail.com> 2012-01-07 06:28:41 PST ---
I would be against making 'in' return bool for AAs. I often do:

if (auto x = foo in someAA)
    // do something with *x

Doing a lookup after checking for foo's presence in someAA is ugly compared to this.

January 07, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=4475



--- Comment #5 from Alex Rønne Petersen <xtzgzorex@gmail.com> 2012-01-07 06:29:23 PST ---
Furthermore, such a change would break way too much code.

January 07, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=4475



--- Comment #6 from bearophile_hugs@eml.cc 2012-01-07 07:18:55 PST ---
(In reply to comment #4)
> I would be against making 'in' return bool for AAs. I often do:
> 
> if (auto x = foo in someAA)
>     // do something with *x
> 
> Doing a lookup after checking for foo's presence in someAA is ugly compared to this.

Ugly is returning a pointer in a language like D where pointers are usually not necessary.

What's bad/ugly in code like this? I think it's more readable:

if (foo in someAA) {
    // do something with someAA[foo]
}

January 07, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=4475



--- Comment #7 from Alex Rønne Petersen <xtzgzorex@gmail.com> 2012-01-07 07:22:48 PST ---
If you need to use x multiple times inside the if statement's true branch, you end up having to declare a variable, e.g.:

if (foo in someAA)
{
    auto x = someAA[foo];
    someFunction(otherStuff, x, x, moreStuff);
}

As opposed to:

if (auto x = foo in someAA)
    someFunction(otherStuff, *x, *x, moreStuff);

I don't see why pointers are so bad. While, yes, D is a high-level language, it is not C# or Java.

January 08, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=4475



--- Comment #8 from bearophile_hugs@eml.cc 2012-01-08 06:47:39 PST ---
(In reply to comment #7)

> I don't see why pointers are so bad. While, yes, D is a high-level language, it is not C# or Java.

Pointers are not evil, but they are usually more bug-prone. An example from simendsjo:

http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.learn&article_id=31482

> aa["a"] = new C();
> auto c = "a" in aa;
> aa["b"] = new C();
> // Using c here is undefined as an element was added to aa

This can't happen if "in" returns a bool.

August 15, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=4475


hsteoh@quickfur.ath.cx changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hsteoh@quickfur.ath.cx


--- Comment #9 from hsteoh@quickfur.ath.cx 2013-08-15 10:06:29 PDT ---
(In reply to comment #8)
[...]
> > aa["a"] = new C();
> > auto c = "a" in aa;
> > aa["b"] = new C();
> > // Using c here is undefined as an element was added to aa
> 
> This can't happen if "in" returns a bool.

Actually, that is not undefined. AAs are designed such that inserting new elements does not invalidate pointers to existing elements. In D, because we have a GC, even if you *delete* elements from AAs, pointers returned by 'in' continue to be valid. This holds even in the event of a rehash, because the pointer points to data in a Slot, and add/remove/rehash only shuffle pointers in the Slot; they don't move the Slot around in memory.
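
A small sketch of the insertion and rehash case described above. It checks behaviour of the runtime's AA implementation rather than something stated here as a language guarantee, so treat it as an assumption-laden experiment. The helper name survivesGrowth is hypothetical.

```d
import std.conv : text;

// Hypothetical helper: takes a pointer via "in", grows and rehashes
// the AA, then reports whether the pointer still refers to the
// same element with the same value.
bool survivesGrowth()
{
    int[string] aa;
    aa["a"] = 1;

    int* p = "a" in aa;
    if (p is null)
        return false;

    // Insert many more elements, then ask for an explicit rehash.
    foreach (i; 0 .. 1000)
        aa[text("k", i)] = i;
    aa.rehash;

    // The old pointer should still be readable and should match
    // what a fresh lookup returns.
    return *p == 1 && p is ("a" in aa);
}

void main()
{
    assert(survivesGrowth());
}
```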
