Re: Using "in" with associative arrays and then indexing them (efficiency)
January 03, 2012
On Tuesday, January 03, 2012 11:52:13 Matej Nanut wrote:
> Hello everyone,
> 
> I would like to know whether
> 
>         if (symbol in symbols)
>                 return symbols[symbol];
> 
> is any less efficient than
> 
>         auto tmp = symbol in symbols;
>         if (tmp !is null)
>                 return *tmp;
> 
> Without optimisation, it looks like the first example searches for `symbol' twice.

Of course it does. 'in' does a search and returns a pointer to the element in the AA (or null if it isn't there). The subscript operator also does a search, returning the element if it's there and blowing up if it's not (a RangeError IIRC without -release and who-knows-what with -release). So, if you use 'in' and then the subscript operator, of course it's going to search twice. Part of the point of using 'in' is to not have to do a double lookup (like you would be doing if AAs had a contains function and you called that prior to using the subscript operator).

The correct way to do it is the second way, though you should be able to reduce it to

if(auto tmp = symbol in symbols)
    return *tmp;
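
For illustration, here is a minimal sketch of the two patterns side by side. The helper names lookupTwice and lookupOnce, the string[string] table, and the "<unknown>" fallback are made up for the example and are not taken from the original code.

```d
import std.stdio;

// Hypothetical helper: tests with 'in', then indexes (two searches).
string lookupTwice(string[string] symbols, string symbol)
{
    if (symbol in symbols)          // first search
        return symbols[symbol];     // second search
    return "<unknown>";             // placeholder fallback for the sketch
}

// Hypothetical helper: keeps the pointer returned by 'in' (one search).
string lookupOnce(string[string] symbols, string symbol)
{
    if (auto tmp = symbol in symbols)   // tmp is a string*, null if absent
        return *tmp;
    return "<unknown>";
}

void main()
{
    string[string] symbols = ["pi": "3.14159", "e": "2.71828"];

    // Both helpers return the same value; only the number of searches differs.
    assert(lookupTwice(symbols, "pi") == lookupOnce(symbols, "pi"));
    assert(lookupOnce(symbols, "tau") == "<unknown>");

    writeln(lookupOnce(symbols, "pi"));
}
```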

- Jonathan M Davis
January 03, 2012
On 01/03/2012 12:07 PM, Jonathan M Davis wrote:
> Of course it does. 'in' does a search and returns a pointer to the element in
> the AA (or null if it isn't there). The subscript operator also does a search,
> returning the element if it's there and blowing up if it's not
> (a RangeError IIRC without -release and who-knows-what with -release). So,
> if you use 'in' and then the subscript operator, of course it's going to search
> twice. Part of the point of using 'in' is to not have to do a double lookup
> (like you would be doing if AAs had a contains function and you called that
> prior to using the subscript operator).
>
> The correct way to do it is the second way, though you should be able to
> reduce it to
>
> if(auto tmp = symbol in symbols)
>      return *tmp;
>
> - Jonathan M Davis

I think this is the single most ugly thing in the language. IIRC ldc will generate identical code for both code snippets.
January 03, 2012
On Tuesday, January 03, 2012 12:13:45 Timon Gehr wrote:
> I think this is the single most ugly thing in the language. IIRC ldc will generate identical code for both code snippets.

What, declaring variables in if statements? It's fantastic IMHO. It allows you to restrict the scope of the variable to the if statement's scope and still use it in the if's condition. And yes, as far as the assembly goes, the generated code is identical. But the scoping for the variable is most definitely different - it won't exist past the if statement if it's declared in the if's condition - and it saves you a line of code. The reduced scope is the more important of the two though IMHO, as nice as saving a line of code is.
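
The scoping difference can be checked at compile time. The following is only a sketch: the function name valueOr and its fallback parameter are hypothetical, chosen for the example.

```d
// Hypothetical helper: returns the stored value or a caller-supplied fallback.
int valueOr(int[string] table, string key, int fallback)
{
    if (auto tmp = key in table)
    {
        // tmp is an int* here and is known to be non-null.
        return *tmp;
    }

    // tmp was declared in the if's condition, so it no longer exists here;
    // any expression that mentions it fails to compile.
    static assert(!__traits(compiles, tmp is null));

    return fallback;
}

void main()
{
    int[string] table = ["x": 1];
    assert(valueOr(table, "x", 0) == 1);
    assert(valueOr(table, "y", -1) == -1);
}
```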

- Jonathan M Davis
January 03, 2012
On 01/03/2012 12:22 PM, Jonathan M Davis wrote:
> What, declaring variables in if statements? It's fantastic IMHO. It allows you
> to restrict the scope of the variable to the if statement's scope and still
> use it in the if's condition. And yes, as far as the assembly goes, the
> generated code is identical. But the scoping for the variable is most
> definitely different - it won't exist past the if statement if it's declared in
> the if's condition - and it saves you a line of code. The reduced scope is the
> more important of the two though IMHO, as nice as saving a line of code is.
>
> - Jonathan M Davis

No, I love declaring variables in if statements and would like it to be extended to while statements as well. What I meant is the fact that something called 'in' returns a pointer. And the two code snippets I was referring to were the two in Matej's post.
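
To make the point about the return type concrete, here is a small sketch showing that 'in' on a built-in AA yields a pointer to the stored value, which can be tested against null and written through. The function name bump is hypothetical.

```d
// Hypothetical helper: increments the value stored under key, if present.
// Returns true when the key was found.
bool bump(int[string] counts, string key)
{
    auto p = key in counts;

    // For an int[string], the result of 'in' is an int*, not a bool.
    static assert(is(typeof(p) == int*));

    if (p is null)
        return false;

    *p += 1;        // modifies the element in place, with no second lookup
    return true;
}

void main()
{
    int[string] counts = ["a": 1];
    assert(bump(counts, "a"));
    assert(counts["a"] == 2);
    assert(!bump(counts, "b"));
}
```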
January 03, 2012
On Tuesday, January 03, 2012 12:27:08 Timon Gehr wrote:
> No, I love declaring variables in if statements and would like it to be extended to while statements as well. What I meant is the fact that something called 'in' returns a pointer. And the two code snippets I was referring to were the two in Matej's post.

Those two code snippets can't possibly result in the same code without the compiler assuming that it can safely optimize the first one into the second. Certainly, with a user-defined type, that would be impossible. With the AA, since it's essentially built-in, it may decide that it can make that assumption, but it could definitely result in different behavior if you were dealing with shared or the like, and there's nothing requiring the compiler to make such an optimization.
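
As a sketch of the user-defined case, the wrapper below counts every search. The type CountingTable and its lookups field are made up for this example. Because each operator has an observable side effect, the compiler could not legally collapse the 'in' followed by the subscript into a single lookup.

```d
// Hypothetical wrapper type, used only to make each search observable.
struct CountingTable
{
    int[string] data;
    size_t lookups;     // number of searches performed so far

    // Implements `key in table`; returns a pointer like the built-in AA does.
    int* opBinaryRight(string op)(string key) if (op == "in")
    {
        ++lookups;
        return key in data;
    }

    // Implements `table[key]`.
    int opIndex(string key)
    {
        ++lookups;
        return data[key];
    }
}

void main()
{
    auto table = CountingTable(["x": 42]);
    int value;

    if ("x" in table)
        value = table["x"];
    assert(value == 42);
    assert(table.lookups == 2);     // 'in' plus the subscript

    table.lookups = 0;
    if (auto p = "x" in table)
        value = *p;
    assert(value == 42);
    assert(table.lookups == 1);     // 'in' only
}
```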

- Jonathan M Davis
January 03, 2012
On 01/03/2012 04:07 AM, Jonathan M Davis wrote:
> Of course it does. 'in' does a search and returns a pointer to the element in
> the AA (or null if it isn't there). The subscript operator also does a search,
> returning the element if it's there and blowing up if it's not
> (a RangeError IIRC without -release and who-knows-what with -release). So,
> if you use 'in' and then the subscript operator, of course it's going to search
> twice. Part of the point of using 'in' is to not have to do a double lookup
> (like you would be doing if AAs had a contains function and you called that
> prior to using the subscript operator).
>
> The correct way to do it is the second way, though you should be able to
> reduce it to
>
> if(auto tmp = symbol in symbols)
>      return *tmp;
>
> - Jonathan M Davis

+1

Very slick :)
January 03, 2012
On 3 January 2012 17:58, Kai Meyer <kai@unixlords.com> wrote:
> On 01/03/2012 04:07 AM, Jonathan M Davis wrote:
>> if(auto tmp = symbol in symbols)
>>     return *tmp;
>>
>> - Jonathan M Davis
>
>
> +1
>
> Very slick :)

Yup, I'm going with this one. Thanks!