August 25, 2010



----- Original Message ----
> From: Andrei Alexandrescu <andrei at erdani.com>
> To: Discuss the phobos library for D <phobos at puremagic.com>
> Sent: Wed, August 25, 2010 4:47:35 PM
> Subject: Re: [phobos] I think we need to make an emergency release
> 
> On 8/25/10 12:48 PDT, Steve Schveighoffer wrote:
> > One very very notable  difference from the original appender is it does not
>take
> > a pointer as  the parameter.  This means it does not update the original
>array
> >  you pass to it (well, it does, but will not modify the length).
> 
> I don't  think this is good. When you format stuff you do want to append the  formatted result incrementally to the string.

All it does, and I've made these changes in a couple places, is change this:

auto app = appender(&buffer);
...
return buffer;

to this:

auto app = appender(buffer);
...
return app.data;

You say it's needed, but in reality I didn't find a single compelling use case in phobos, and Appender is used all over phobos.  Most cases looked like this:

string buffer;
auto app = appender(&buffer);

and no reference to buffer later.  Those were trivially replaced with this:

auto app = appender!string();

There was one case in stdio.d where appender is used to convert a utf-16 file to utf-8, but appender already has code to take care of that (not added by me). The new code looks less confusing to me, you be the judge:

[steves at steveslaptop phobos]$ svn diff std/stdio.d
Index: std/stdio.d
===================================================================
--- std/stdio.d    (revision 1925)
+++ std/stdio.d    (working copy)
@@ -2099,8 +2099,8 @@
          * Read them and convert to chars.
          */
         static assert(wchar_t.sizeof == 2);
-        auto app = appender(&buf);
-        buf.length = 0;
+        auto app = appender(buf);
+        app.clear();
         for (int c = void; (c = FGETWC(fp)) != -1; )
         {
             if ((c & ~0x7F) == 0)
@@ -2120,11 +2120,13 @@
                     }
                     c = ((c - 0xD7C0) << 10) + (c2 - 0xDC00);
                 }
-                std.utf.encode(buf, c);
+                //std.utf.encode(buf, c);
+                app.put(cast(dchar)c);
             }
         }
         if (ferror(fps))
             StdioException();
+        buf = app.data;
         return buf.length;
     }

@@ -2138,20 +2140,16 @@
          * cases.
          */
       L1:
-        if(buf.ptr is null)
-        {
-            sz = 128;
-            auto p = cast(char*) GC.malloc(sz, GC.BlkAttr.NO_SCAN);
-            buf = p[0 .. 0];
-        } else {
-            buf.length = 0;
-        }
+        auto app = appender(buf);
+        app.clear();
+        if(app.capacity == 0)
+            app.reserve(128); // get at least 128 bytes available

-        auto app = appender(&buf);
         int c;
         while((c = FGETC(fp)) != -1) {
             app.put(cast(char) c);
-            if(buf[$ - 1] == terminator) {
+            if(c == terminator) {
+                buf = app.data;
                 return buf.length;
             }

===========================

I also suspect that the way this function's return value is typed is based on the 'take-an-array-pointer' mode of Appender.  I think it'd be better typed this way:

private char[] readlnImpl(FILE* fps, char[] buf = null, dchar terminator = '\n')

That would get rid of the "set buf before returning" part of the code, you just return app.data.

There were several cases where someone did something like this:

buffer.length = 0;
auto app = appender(&buffer);

or

auto app = appender(&buffer);
buffer.length = 0;

This is trivially replaced with this:

auto app = appender(buffer);
app.clear();

This tells appender, you can own and replace the data in buffer.  It will not append outside of buffer, rather it will realloc as necessary.

> 
> > Other than that, here  is the updated API (with private data shown for an
>idea of
> > how the  implementation works):
> >
> > struct Appender(A : T[], T)
> >  {
> >      private struct Data
> >       {
> >          size_t capacity;
> >           Unqual!(T)[] arr;
> >       }
> >
> >      private Data* _data;
> 
> I think this  looks good - why not make arr of type pointer to array and support append to  existing arrays?

Safety concerns.  You are most likely passing a reference to a stack-based array reference, so if you escape the Appender, you escape the stack-based reference. This isn't strictly necessary because the most important thing is the data. Now, you could escape stack data by passing a reference to a stack-allocated buffer, but I think there's no way around this.

The thing is, when requiring passing a pointer, you are *guaranteeing* the chance for escaping.  Even when you don't care about the original reference. The fact that about half of phobos usages of appender looked like this:

string buffer;
auto app = appender(&buffer); // appender contains a pointer to stack data now!

tells me that people will make this mistake.

> > /**
> > Construct an appender with a given  array.  Note that this does not copy the
> > data, but appending to the  array will copy the data, since Appender does
not
> > know if this data has  valid data residing after it.  The initial capacity
>will
> > be
> >  arr.length.
> > */
> >      this(T[]  arr);
> 
> Appender was partly created to avoid the issue the comment warns  about.

All it means is, don't expect what you do to the appender to be reflected in the original array reference.  I think designing to support this is a huge safety problem, and provides almost no useful benefit.  If you have the appender, you can get the appender's data,  why do you have to use the original reference?. Also note any aliases to the original array reference won't be updated.

-Steve




August 25, 2010
I needed Appender to be a reference type for std.format and std.stdio. The point is to be able to compose a complex stream read into a sequence of simpler reads. This is possible if you use one Appender throughout, but I don't see why preclude people from appending output to their own strings. It's not a difficult to implement feature (my convoluted code notwithstanding), and it's useful. Why decide to not provide it?

Andrei

On 8/25/10 14:13 PDT, Steve Schveighoffer wrote:
>
>
>
>
> ----- Original Message ----
>> From: Andrei Alexandrescu<andrei at erdani.com>
>> To: Discuss the phobos library for D<phobos at puremagic.com>
>> Sent: Wed, August 25, 2010 4:47:35 PM
>> Subject: Re: [phobos] I think we need to make an emergency release
>>
>> On 8/25/10 12:48 PDT, Steve Schveighoffer wrote:
>>> One very very notable  difference from the original appender is it does not
>> take
>>> a pointer as  the parameter.  This means it does not update the original
>> array
>>>   you pass to it (well, it does, but will not modify the length).
>>
>> I don't  think this is good. When you format stuff you do want to append the  formatted result incrementally to the string.
>
> All it does, and I've made these changes in a couple places, is change this:
>
> auto app = appender(&buffer);
> ...
> return buffer;
>
> to this:
>
> auto app = appender(buffer);
> ...
> return app.data;
>
> You say it's needed, but in reality I didn't find a single compelling use case in phobos, and Appender is used all over phobos.  Most cases looked like this:
>
> string buffer;
> auto app = appender(&buffer);
>
> and no reference to buffer later.  Those were trivially replaced with this:
>
> auto app = appender!string();
>
> There was one case in stdio.d where appender is used to convert a utf-16 file to utf-8, but appender already has code to take care of that (not added by me). The new code looks less confusing to me, you be the judge:
>
> [steves at steveslaptop phobos]$ svn diff std/stdio.d
> Index: std/stdio.d
> ===================================================================
> --- std/stdio.d    (revision 1925)
> +++ std/stdio.d    (working copy)
> @@ -2099,8 +2099,8 @@
>            * Read them and convert to chars.
>            */
>           static assert(wchar_t.sizeof == 2);
> -        auto app = appender(&buf);
> -        buf.length = 0;
> +        auto app = appender(buf);
> +        app.clear();
>           for (int c = void; (c = FGETWC(fp)) != -1; )
>           {
>               if ((c&  ~0x7F) == 0)
> @@ -2120,11 +2120,13 @@
>                       }
>                       c = ((c - 0xD7C0)<<  10) + (c2 - 0xDC00);
>                   }
> -                std.utf.encode(buf, c);
> +                //std.utf.encode(buf, c);
> +                app.put(cast(dchar)c);
>               }
>           }
>           if (ferror(fps))
>               StdioException();
> +        buf = app.data;
>           return buf.length;
>       }
>
> @@ -2138,20 +2140,16 @@
>            * cases.
>            */
>         L1:
> -        if(buf.ptr is null)
> -        {
> -            sz = 128;
> -            auto p = cast(char*) GC.malloc(sz, GC.BlkAttr.NO_SCAN);
> -            buf = p[0 .. 0];
> -        } else {
> -            buf.length = 0;
> -        }
> +        auto app = appender(buf);
> +        app.clear();
> +        if(app.capacity == 0)
> +            app.reserve(128); // get at least 128 bytes available
>
> -        auto app = appender(&buf);
>           int c;
>           while((c = FGETC(fp)) != -1) {
>               app.put(cast(char) c);
> -            if(buf[$ - 1] == terminator) {
> +            if(c == terminator) {
> +                buf = app.data;
>                   return buf.length;
>               }
>
> ===========================
>
> I also suspect that the way this function's return value is typed is based on the 'take-an-array-pointer' mode of Appender.  I think it'd be better typed this way:
>
> private char[] readlnImpl(FILE* fps, char[] buf = null, dchar terminator = '\n')
>
> That would get rid of the "set buf before returning" part of the code, you just return app.data.
>
> There were several cases where someone did something like this:
>
> buffer.length = 0;
> auto app = appender(&buffer);
>
> or
>
> auto app = appender(&buffer);
> buffer.length = 0;
>
> This is trivially replaced with this:
>
> auto app = appender(buffer);
> app.clear();
>
> This tells appender, you can own and replace the data in buffer.  It will not append outside of buffer, rather it will realloc as necessary.
>
>>
>>> Other than that, here  is the updated API (with private data shown for an
>> idea of
>>> how the  implementation works):
>>>
>>> struct Appender(A : T[], T)
>>>   {
>>>       private struct Data
>>>        {
>>>           size_t capacity;
>>>            Unqual!(T)[] arr;
>>>        }
>>>
>>>       private Data* _data;
>>
>> I think this  looks good - why not make arr of type pointer to array and support append to  existing arrays?
>
> Safety concerns.  You are most likely passing a reference to a stack-based array reference, so if you escape the Appender, you escape the stack-based reference. This isn't strictly necessary because the most important thing is the data. Now, you could escape stack data by passing a reference to a stack-allocated buffer, but I think there's no way around this.
>
> The thing is, when requiring passing a pointer, you are *guaranteeing* the chance for escaping.  Even when you don't care about the original reference. The fact that about half of phobos usages of appender looked like this:
>
> string buffer;
> auto app = appender(&buffer); // appender contains a pointer to stack data now!
>
> tells me that people will make this mistake.
>
>>> /**
>>> Construct an appender with a given  array.  Note that this does not copy the
>>> data, but appending to the  array will copy the data, since Appender does
> not
>>> know if this data has  valid data residing after it.  The initial capacity
>> will
>>> be
>>>   arr.length.
>>> */
>>>       this(T[]  arr);
>>
>> Appender was partly created to avoid the issue the comment warns  about.
>
> All it means is, don't expect what you do to the appender to be reflected in the original array reference.  I think designing to support this is a huge safety problem, and provides almost no useful benefit.  If you have the appender, you can get the appender's data,  why do you have to use the original reference?. Also note any aliases to the original array reference won't be updated.
>
> -Steve
>
>
>
>
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos
August 25, 2010



----- Original Message ----
> From: Andrei Alexandrescu <andrei at erdani.com>
> To: Discuss the phobos library for D <phobos at puremagic.com>
> Sent: Wed, August 25, 2010 5:36:24 PM
> Subject: Re: [phobos] I think we need to make an emergency release
> 
> I needed Appender to be a reference type for std.format and std.stdio. The  point is to be able to compose a complex stream read into a sequence of  simpler reads. This is possible if you use one Appender throughout, but I  don't see why preclude people from appending output to their own strings.  It's not a difficult to implement feature (my convoluted code notwithstanding), and it's useful. Why decide to not provide  it?

It is a reference type.  There's only one pointer in the struct, that points to the GC-allocated private Data struct.  Or maybe I didn't understand you?

-Steve




August 25, 2010
On 8/25/10 14:39 PDT, Steve Schveighoffer wrote:
> ----- Original Message ----
>> From: Andrei Alexandrescu<andrei at erdani.com>
>> To: Discuss the phobos library for D<phobos at puremagic.com>
>> Sent: Wed, August 25, 2010 5:36:24 PM
>> Subject: Re: [phobos] I think we need to make an emergency release
>>
>> I needed Appender to be a reference type for std.format and std.stdio. The  point is to be able to compose a complex stream read into a sequence of  simpler reads. This is possible if you use one Appender throughout, but I  don't see why preclude people from appending output to their own strings.  It's not a difficult to implement feature (my convoluted code notwithstanding), and it's useful. Why decide to not provide  it?
>
> It is a reference type.  There's only one pointer in the struct, that points to the GC-allocated private Data struct.  Or maybe I didn't understand you?

I know it is a reference type (I was just explaining the change I introduced). The issue with your appender is that it makes it impossible for a user to append to an existing string and have the change reflected at all times in the string.

Andrei
August 25, 2010
This broke off accidentally into a private conversation...



----- Forwarded Message ----
> From: Steve Schveighoffer <schveiguy at yahoo.com>
> To: Andrei Alexandrescu <andrei at erdani.com>
> Sent: Wed, August 25, 2010 5:54:09 PM
> Subject: Re: [phobos] I think we need to make an emergency release
> 
> 
> 
> 
> 
> ----- Original Message ----
> > From: Andrei Alexandrescu  <andrei at erdani.com>
> >
> > On  8/25/10 14:39 PDT, Steve Schveighoffer wrote:
> > > ----- Original  Message  ----
> > >> From: Andrei Alexandrescu<andrei at erdani.com>
> > >> 
> > >> I needed Appender  to be a reference type for std.format  and std.stdio. The  point  is to be able to compose a  complex stream read into a
>sequence
> > >>  of  simpler  reads. This is possible if you use one Appender
throughout,
> >  >> but I  don't see why preclude people from appending  output  to their
own
> > >> strings.  It's not a difficult to  implement  feature (my convoluted code notwithstanding),  and it's useful. Why  decide to not provide  it?
> > > 
> >  > It is a reference type.   There's only one pointer in the struct, that
>points
>
> >to
> > > the GC-allocated  private Data struct.   Or maybe I didn't understand you?
> > 
> > I know it is  a  reference type (I was just explaining the change I
>introduced).
>
> >The  issue with  your appender is that it makes it impossible for a user to append to an existing  string and have the change reflected at all  times in
>the
>
> >string.
> 
> Well, I didn't find any cases  requiring that in std.format or std.stdio, but they were mostly in  unittests.  Do you have a better example?  Note that performing  things like builtin appends on an array allocated by Appender are
>not
>
> a good  idea, since appender completely ignores and does not update the stored

> length properly.
> 
> Even an explanation of what you plan to do with the  data at the same time you

> are appending would be good in lieu of an  example.  Maybe we can find a better
>
> design, or provide an unsafe  function to get a reference to the GC-allocated array reference for those  who want to live on the edge.  I just dislike the *requirement* of  stepping into unsafe territory just to use Appender.  It
>seems
>
> to go  against the spirit of most of Phobos.  IMO the default should be the
>safe
>
> way.
> 
> -Steve
> 
> 
> 
> 
> 



August 26, 2010
On 8/25/10 14:54 PDT, Steve Schveighoffer wrote:
> Well, I didn't find any cases requiring that in std.format or std.stdio, but they were mostly in unittests.  Do you have a better example?  Note that performing things like builtin appends on an array allocated by Appender are not a good idea, since appender completely ignores and does not update the stored length properly.

Agreed.

> Even an explanation of what you plan to do with the data at the same time you are appending would be good in lieu of an example.  Maybe we can find a better design, or provide an unsafe function to get a reference to the GC-allocated array reference for those who want to live on the edge.  I just dislike the *requirement* of stepping into unsafe territory just to use Appender.  It seems to go against the spirit of most of Phobos.  IMO the default should be the safe way.

I think it's fine to require people who use Appender to re-read the string after having used it. So feel free to proceed with your design.


Andrei
August 26, 2010
Done: http://www.dsource.org/projects/phobos/changeset/1929

Let me know what you think.

BTW, can someone who does regular compiling on Windows test this?  I know Appender is ok, but I'm unsure if std/stdio.d compiles right, I updated it, but I don't have a full installation on a Windows box.

Thanks

-Steve



----- Original Message ----
> From: Andrei Alexandrescu <andrei at erdani.com>
> To: Discuss the phobos library for D <phobos at puremagic.com>
> Sent: Thu, August 26, 2010 3:32:14 AM
> Subject: Re: [phobos] I think we need to make an emergency release
> 
> On 8/25/10 14:54 PDT, Steve Schveighoffer wrote:
> > Well, I didn't find any  cases requiring that in std.format or std.stdio,
but
> > they were mostly in  unittests.  Do you have a better example?  Note that
> >  performing things like builtin appends on an array allocated by Appender are
>not
> > a good idea, since appender completely ignores and does not update  the
>stored
> > length properly.
> 
> Agreed.
> 
> > Even an  explanation of what you plan to do with the data at the same time
>you
> >  are appending would be good in lieu of an example.  Maybe we can find a
>better
> > design, or provide an unsafe function to get a reference to the
GC-allocated
> > array reference for those who want to live on the  edge.  I just dislike the *requirement* of stepping into unsafe  territory just to use Appender.  It
>seems
> > to go against the spirit  of most of Phobos.  IMO the default should be the
>safe
> >  way.
> 
> I think it's fine to require people who use Appender to re-read the  string
>after having used it. So feel free to proceed with your  design.
> 
> 
> Andrei
> _______________________________________________
> phobos  mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos
> 



August 27, 2010

Don Clugston wrote:
> BTW, one compiler bug popped up in the last release, too. Fairly
> obscure though. I created a patch:
> 4655 Regression(1.063, 2.048) goto to a try block ICEs
>
> I also made a patch for this compiler bug from three releases back: 4302 Regression(1.061, 2.046) compiler errors using startsWith in CTFE
>
> 

Done.
1 2
Next ›   Last »