May 31, 2005
In article <d7ifkp$2goj$1@digitaldaemon.com>, Brad Beveridge says...
>
>Do you have any ideas on how this could work with basic data types and arrays?

I don't know that this is worth doing for POD types because they're copied when passed as 'in' parameters anyway.  But I suppose strings could be an issue.  One possibility would be to fingerprint the memory using a checksum and verify that fingerprint in the out clause.  Perhaps someone has a better suggestion?
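A minimal sketch of the fingerprint idea, assuming a current D compiler and Phobos rather than the 2005 toolchain. The names `leavesUnchanged`, `wellBehaved` and `misbehaving` are hypothetical, and the check is wrapped around the call instead of living in an `out` clause just to keep the example short. A CRC can collide, so this would catch accidental writes, not every conceivable one.

```d
import std.digest.crc : crc32Of;
import std.stdio : writeln;

// Hypothetical helper: fingerprint `data`, call `fn`, and report whether
// the contents still hash to the same value afterwards.
bool leavesUnchanged(alias fn)(char[] data)
{
    immutable before = crc32Of(data); // 4-byte CRC-32 of the contents
    fn(data);
    return crc32Of(data) == before;
}

// Only reads its argument.
size_t wellBehaved(char[] s)
{
    return s.length;
}

// Writes through the reference it was handed.
size_t misbehaving(char[] s)
{
    if (s.length)
        s[0] = 'X';
    return s.length;
}

void main()
{
    char[] host = "example.org".dup;
    writeln(leavesUnchanged!wellBehaved(host)); // true
    writeln(leavesUnchanged!misbehaving(host)); // false
}
```

The cost is two passes over the data per call, which is probably only acceptable in debug builds.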

>With classes, is it possible for the compiler to generate accessors for all members automatically?

Certainly.  If DBC became a popular means for verifying const correctness I'd suggest that the compiler offer a means to do this.  I don't think it would be too terribly complicated, but I haven't given it much thought.

>That would ease implementation
>details.  Perhaps a template could be created so that class code could
>look like
>class C
>{
>	bit mutable;
>	mixin constcapable!(int) someInt;
>}

Definitely a possibility.  My only concern with the DBC method is that it requires the library writer to build stuff in to support it to keep the code clean.  The client could do the switching instead:

c.mutable = false;
func( c, d );
c.mutable = true;

but this is obviously pretty clunky.  It might be possible to do this with auto classes:

# auto class SetConst(T)
# {
#     this( T obj )
#     {
#         m_mutable = obj.mutable;
#         m_obj = obj;
#         m_obj.mutable = false;
#         printf( "ctor\n" );
#     }
#
#     ~this()
#     {
#         m_obj.mutable = m_mutable;
#         printf( "dtor\n" );
#     }
#
#     T opCast() { return m_obj; }
#
# private:
#     bit m_mutable;
#     T   m_obj;
# }
#
# void func( C val )
# {
#
# }
#
# void main()
# {
#     C c = new C();
#     printf( "pre\n" );
#     func( cast(C)(new SetConst!(C)( c )) );
#     printf( "post\n" );
# }

but this is very clunky and doesn't even offer the potential to streamline it because of auto lifetime rules, though it's worth noting that the above example prints this:

pre
ctor
post
dtor

and wrapping the function call in its own scope doesn't help:

{ func( cast(C)(new SetConst!(C)( c )) ); }

It's little things like this that have me cursing the restriction that structs can't have ctors.  Not only does the above code require a completely pointless memory allocation, but the lifetime of the class isn't even what it should be (though I'd consider this latter issue to be a bug).
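For comparison, later versions of the language (D2) did give structs constructors and destructors, which makes a stack-based guard possible. The following is a sketch under that assumption, not code that would have compiled in 2005. `C` is a stand-in for the class discussed above, reduced to the one flag that matters here.

```d
import std.stdio : writeln;

// Stand-in for the class from the example above; only the flag matters.
class C
{
    bool mutable = true;
}

// Stack-allocated guard: clears `mutable` on construction and restores
// the previous value when the guard goes out of scope.
struct SetConst
{
    private C obj;
    private bool saved;

    @disable this(this); // a copied guard would restore the flag twice

    this(C o)
    {
        obj = o;
        saved = o.mutable;
        o.mutable = false;
    }

    ~this()
    {
        if (obj !is null)
            obj.mutable = saved;
    }
}

void func(C val)
{
    writeln("in func, mutable = ", val.mutable);
}

void main()
{
    auto c = new C;
    {
        auto guard = SetConst(c); // no heap allocation for the guard
        func(c);                  // in func, mutable = false
    } // the guard's destructor runs here
    writeln("after scope, mutable = ", c.mutable); // after scope, mutable = true
}
```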


Sean


June 01, 2005
"Eugene Pelekhay" <pelekhay@gmail.com> wrote in message news:d7hfuh$1ejl$1@digitaldaemon.com...
> Maybe I'm a dummy, but I don't see in this example why other languages must copy it 10 times. With the reference-counted string implementation in my C++ project, the copy is also performed 0 times. And if more than one reference to the instance exists, only one copy operation is performed. I see only one advantage in the current string implementation: there is no need to check or increment/decrement a reference counter, but string duplication is required instead.

You're right that you can avoid excessive copying by doing ref counting.

Reference counting carries with it other penalties - storage must be allocated for the ref count, every copy increments the count, and every reference that goes out of scope must decrement the count. Add in exception handling, and the price is high (although C++'s mechanisms hide that price from you).

Ref counting would make it impractical to do D's array slices.

Furthermore, in the presence of garbage collection, layering on top a reference counting mechanism probably means you'll want to ditch the gc and go with a full ref counting architecture for every object. In my experience, such is slower than using mark/sweep gc.
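To make the slicing point concrete: a D slice is just a pointer and a length into memory that some other array may also refer to, so there is no header in front of it where a count could live. A small sketch, assuming a current D compiler; `lastN` is a hypothetical helper.

```d
import std.stdio : writeln;

// Hypothetical helper: returns a view of the last `n` chars, no copy made.
char[] lastN(char[] s, size_t n)
{
    return s[$ - n .. $];
}

void main()
{
    char[] buf = "terrainformatica.com".dup;
    char[] tld = lastN(buf, 3);

    // The slice points into the middle of buf's memory.
    writeln(tld.ptr == buf.ptr + buf.length - 3); // true

    // Writing through the slice is visible through the original array.
    tld[0] = 'n';
    writeln(buf); // terrainformatica.nom
}
```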


June 01, 2005
"Andrew Fedoniouk" <news@terrainformatica.com> wrote in message news:d7h66r$14s5$1@digitaldaemon.com...
>
> And what will be your advice then for:
>
> class Url {
>   char[] _hostname;
>   char[] hostname() { return _hostname; }
> }
>
> _hostname should not be changeable nor intentionally
> nor accidentally.
> hostname access pattern is primarily read. But it could possibly be
> passed in some third party functions.
>
> I am serious. I really want to know how to design it better.

Third party functions should follow the COW principle too. They should not modify strings that they don't know they are the owner of. Look at std.string.tolower for an example of this.
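A sketch of what following COW looks like inside a function: hand back the input untouched when nothing needs changing, and copy only at the moment a write becomes necessary. This is not the actual std.string.tolower source; `toLowerCow` is a hypothetical name, it handles ASCII only, and it assumes a current D compiler.

```d
import std.stdio : writeln;

// Hypothetical COW-style function: never writes into `s`.
char[] toLowerCow(char[] s)
{
    foreach (i, c; s)
    {
        if (c >= 'A' && c <= 'Z')
        {
            // First character that needs changing: copy, then edit the copy.
            char[] r = s.dup;
            foreach (ref ch; r[i .. $])
                if (ch >= 'A' && ch <= 'Z')
                    ch = cast(char)(ch + ('a' - 'A'));
            return r;
        }
    }
    return s; // nothing to change, so no allocation at all
}

void main()
{
    char[] a = "digitalmars".dup;
    char[] b = "DigitalMars".dup;

    writeln(toLowerCow(a) is a); // true: same memory handed back

    char[] r = toLowerCow(b);
    writeln(r, " ", b);          // digitalmars DigitalMars
}
```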


> I am remembering the good old days of C programming with these char[]s.
> Damned fast but not maintainable.
> In C++ I have my own nice tool::string with reliable
> copy-on-write..... sigh.

The C++ std::string is slower than D strings. www.digitalmars.com/d/cppstrings.html

C++ strings have another serious problem (that D doesn't have): you have to keep track of who the owner is, so it can be deleted (else you get a memory leak). In my experience, that is a LOT harder to get right than adhering to COW. Trying to absolutely determine ownership, like C++ does, is much harder than just being able to assume you don't own it.


June 01, 2005
On Tue, 31 May 2005 19:37:03 -0700, Walter wrote:

> "Andrew Fedoniouk" <news@terrainformatica.com> wrote in message news:d7h66r$14s5$1@digitaldaemon.com...
>>
>> And what will be your advice then for:
>>
>> class Url {
>>   char[] _hostname;
>>   char[] hostname() { return _hostname; }
>> }
>>
>> _hostname should not be changeable nor intentionally
>> nor accidentally.
>> hostname access pattern is primarily read. But it could possibly be
>> passed in some third party functions.
>>
>> I am serious. I really want to know how to design it better.
> 
> Third party functions should follow the COW principle too. They should not modify strings that they don't know they are the owner of.

Yes, and cyclists shouldn't run red lights either.

We have to code in a world in which many people using our libraries don't care about what they 'should' do; they use anything that seems like an expedient idea at the time. Yes, I know that not following the CoW rules is dangerous, but it's not as dangerous as cyclists running red lights, and they continue to do that.

-- 
Derek
Melbourne, Australia
1/06/2005 12:56:20 PM
June 01, 2005
"Derek Parnell" <derek@psych.ward> wrote in message news:7aqw8u524dge$.1hmvgp4jz3dvc.dlg@40tude.net...
> On Tue, 31 May 2005 19:37:03 -0700, Walter wrote:
>
>> "Andrew Fedoniouk" <news@terrainformatica.com> wrote in message news:d7h66r$14s5$1@digitaldaemon.com...
>>>
>>> And what will be your advice then for:
>>>
>>> class Url {
>>>   char[] _hostname;
>>>   char[] hostname() { return _hostname; }
>>> }
>>>
>>> _hostname should not be changeable nor intentionally
>>> nor accidentally.
>>> hostname access pattern is primarily read. But it could possibly be
>>> passed in some third party functions.
>>>
>>> I am serious. I really want to know how to design it better.
>>
>> Third party functions should follow the COW principle too. They should
>> not
>> modify strings that they don't know they are the owner of.
>
> Yes, and cyclists shouldn't run red lights either.
>
> We have to code in a world in which many people using our libraries don't
> care about what they 'should' do; they use anything that seems like an
> expedient idea at the time. Yes I know that not following the CoW rules is
> dangerous, but it's not as dangerous as cyclists running red lights and
> they
> continue to do that.

hmm. around here it isn't the cyclists that run red lights - it's the things with 4 wheels and that unused pedal called the "brake". :-P

But more to topic I'm with Walter that when you look at the big picture COW is a reasonable balance of trade-offs. The only suggestion I have is to put COW more front-and-center in the array help so that people see it from the start and it becomes second nature. Compiler protection against malicious code isn't that important to me since people will go out of their way to write malicious code no matter what the compiler does. I'm more worried about the accidental D-newbie who doesn't know about arrays or COW. For those cases talking about COW right away in the doc will decrease the likelihood of newbie errors.


June 01, 2005
Ben Hinkle wrote:
>>>
>>>>And what will be your advice then for:
>>>>
>>>>class Url {
>>>>  char[] _hostname;
>>>>  char[] hostname() { return _hostname; }
>>>>}
>>>>
>>>>_hostname should not be changeable nor intentionally
>>>>nor accidentally.
>>>>hostname access pattern is primarily read. But it could possibly be
>>>>passed in some third party functions.
>>>>
>>>>I am serious. I really want to know how to design it better.
>>>
>>>Third party functions should follow the COW principle too. They should not
>>>modify strings that they don't know they are the owner of.
>>
>>Yes, and cyclists shouldn't run red lights either.
>>
>>We have to code in a world in which many people using our libraries don't
>>care about what they 'should' do; they use anything that seems like an
>>expedient idea at the time. Yes I know that not following the CoW rules is
>>dangerous, but it's not as dangerous as cyclists running red lights and they
>>continue to do that.
> 
> 
> hmm. around here it isn't the cyclists that run red lights - it's the things with 4 wheels and that unused pedal called the "brake". :-P
> 
> But more to topic I'm with Walter that when you look at the big picture COW is a reasonable balance of trade-offs. The only suggestion I have is to put COW more front-and-center in the array help so that people see it from the start and it becomes second nature. Compiler protection against malicious code isn't that important to me since people will go out of their way to write malicious code no matter what the compiler does. I'm more worried about the accidental D-newbie who doesn't know about arrays or COW. For those cases talking about COW right away in the doc will decrease the likelihood of newbie errors. 


Ben; Walter;

I think perhaps you're missing a significant point being made? CoW is not the issue at stake ~ instead, what's being asked for is a mechanism to /enforce/ CoW.

For example: the little example above should not be dup'ing the content before return, if it's only being used for reference (read-only) purposes by both parties (caller and callee). I think we can all agree on that? Yes?

What's being asked for is a means whereby the compiler will 'prohibit' some other caller from using the returned array as a writable lvalue, at compile time. That is, the CoW should be performed by the caller (not the callee), /if and when the caller needs to perform a write upon it/. And only at that time.

Again, CoW is not being questioned. It's the total lack of enforcement that would be good to do something about. The compiler goes out of its way to catch out-of-bounds errors WRT arrays ~ we're asking for something similar here to avoid a source of silly, easily preventable, and hard to track down bugs. It would add some noticeable weight to any story regarding robustness.

Turn things around for a minute, and assume such a facility was available. It's not hard to see how this would be viewed in a most favourable light. And there's no downside for the code, or for the developer. Best of all worlds?
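For readers coming to this thread later: D2 did eventually add type qualifiers that express this. The sketch below assumes a D2 compiler (it is not valid in the D of 2005) and reuses the `Url` example from the thread; the constructor is an addition for the sake of a runnable example.

```d
import std.stdio : writeln;

class Url
{
    private char[] _hostname;

    this(const(char)[] host) { _hostname = host.dup; }

    // Read-only view of the internal buffer: no copy, no write access.
    const(char)[] hostname() const { return _hostname; }
}

void main()
{
    auto url = new Url("terrainformatica.com");
    const(char)[] view = url.hostname;

    // Writing through the view is rejected at compile time.
    static assert(!__traits(compiles, view[0] = 'X'));

    // Comparing needs no copy.
    writeln(view == "terrainformatica.com"); // true

    // The caller copies if and when it wants to write.
    char[] mine = view.dup;
    mine[0] = 'T';
    writeln(mine);         // Terrainformatica.com
    writeln(url.hostname); // terrainformatica.com
}
```

The copy happens on the caller's side and only on the write path, which is the division of labour described above.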

June 01, 2005
"Walter" <newshound@digitalmars.com> wrote in message news:d7j6na$d2f$1@digitaldaemon.com...
>
> "Eugene Pelekhay" <pelekhay@gmail.com> wrote in message news:d7hfuh$1ejl$1@digitaldaemon.com...
>> Maybe I'm a dummy, but I don't see in this example why other languages must copy it 10 times. With the reference-counted string implementation in my C++ project, the copy is also performed 0 times. And if more than one reference to the instance exists, only one copy operation is performed. I see only one advantage in the current string implementation: there is no need to check or increment/decrement a reference counter, but string duplication is required instead.
>
> You're right that you can avoid excessive copying by doing ref counting.
>
> Reference counting carries with it other penalties - storage must be
> allocated for the ref count, every copy increments the count, and every
> reference that goes out of scope must decrement the count. Add in
> exception
> handling, and the price is high (although C++'s mechanisms hide that price
> from you).
>
> Ref counting would make it impractical to do D's array slices.
>
> Furthermore, in the presence of garbage collection, layering on top a
> reference counting mechanism probably means you'll want to ditch the gc
> and
> go with a full ref counting architecture for every object. In my
> experience,
> such is slower than using mark/sweep gc.
>
>

Yes, the GC does a good job. In some places. In other places ref-counting is
better.
An ideal language should allow both. Period.
Everything has its own price:
the more objects you allocate (your .dup advice), the slower the GC's
scanning becomes.
And I am not sure which is actually faster in the big picture: ref-counting
for strings or GC.
So far Java and C# are not demonstrating anything spectacular here; it is
rather a defeat. At least in the real-life projects I can test by hand. In
abstract tests, though, everything is just perfect.
I know only one thing: the fewer GC cycles, the better, because a cycle
locks everything, and at an unpredictable moment. Ref-counting has a price,
but that price is acceptable because it is predictable, accountable and
evenly spread.

The best solution is, as always, in the middle: a balance between GC and non-GC.

If I have a vector of passive elements (chars), I would go with
ref-counting to create an envelope that is safe to pass back and forth.
If I have a container of active elements (objects) with
a complex and sometimes unknown system of relationships,
I'll go with GC to avoid headaches with cyclic references,
broken pointers and so on.

Strings are strange types; they are both wave and particle: scalar and aggregate at the same time.

A string as a wrapper-owner of a character buffer allows you
(not ideally!) to work with the string in both of its
forms, balancing between str1 = str2, str1 == str2 and
str1.ptr == str2.ptr.

Back to const.
Having built-in arrays and slicing creates the *prerequisites*
for optimal or suboptimal string handling.
But slicing, for example, is worth nothing without const (for strings especially).
Consider: I've found some string fragment and passed it to some function.
This function does something and passes it further. All these
functions were written with good intentions by good programmers.
But these programmers live 12 time zones apart.
The only feasible option for them is self-documenting code.
Someone decided that this particular string was safe to
zero-terminate in place. Everything is ruined, and finding the source of it is not trivial.
I bet that the second time it happens, D will be dead
for the project. When it happened to me the first time,
I decided to write a string wrapper emulating constness.
THERE IS JUST NO WAY IN D. Neither technically nor theoretically.
Neither '=' overloading (to implement ownership and refcounting)
nor const. Nothing.  Dead end.

char[] is not a string; it is an array of chars.

The usage pattern of a string is quite different from that of an array.
As a rule, an array is the heart of some container and is
frequently already wrapped. But strings fly around
everywhere. D must have const for arrays and pointers
to be considered a language for teams and serious
projects.

IMHO.
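As a rough illustration of the kind of wrapper being described, here is a sketch of a reference-counted, copy-on-write string. It assumes D2, where structs have postblits and destructors; none of this was expressible in the D of this thread. `RcString` and its members are hypothetical names, the buffer itself is still GC-allocated, and the count is used only to decide when a write must copy.

```d
import std.stdio : writeln;

struct RcString
{
    private char[] data;
    private int* count; // shared by all copies that share `data`

    this(const(char)[] s)
    {
        data = s.dup;
        count = new int(1);
    }

    this(this) // runs after each copy of the struct
    {
        if (count) ++*count;
    }

    ~this()
    {
        if (count) --*count;
    }

    // Read access never copies.
    const(char)[] get() const { return data; }
    int refs() const { return count ? *count : 0; }

    // Write access copies first if anyone else shares the buffer.
    void opIndexAssign(char c, size_t i)
    {
        if (count && *count > 1)
        {
            --*count;
            data = data.dup;
            count = new int(1);
        }
        data[i] = c;
    }
}

void main()
{
    auto a = RcString("hostname");
    auto b = a;                   // shares the buffer
    writeln(a.refs);              // 2
    b[0] = 'H';                   // b detaches before writing
    writeln(a.get, " ", b.get);   // hostname Hostname
    writeln(a.refs, " ", b.refs); // 1 1
}
```

Note the trade-off raised earlier in the thread still applies: a slice taken from `get()` knows nothing about the count.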




June 02, 2005
On Mon, 30 May 2005 22:50:29 -0700, Andrew Fedoniouk wrote:


[snip]

> class Url
> {
>    char[] _hostname;
>    ...
>    char[] hostname() { return _hostname.dup; } // Doh!
> }
> 
> if( url.hostname == "terrainformatica.com" )
> // 32 bytes less in memory, just to compare it!
>   ....
> 
> Ideal from many points of view would be a solution with const
> 
> class Url {
>   char[] _hostname;
> 
>   const char[] hostname() { return _hostname; } // Yep! this exactly what we
> need.
> 
> }
> 

Given the current semantics of D, could a workaround be to give the caller the choice, thus making them take explicit responsibility for their usage?

 class Url
 {
    private char[] _hostname;
    ...
    char[] hostname_unsafe() { return _hostname; }
    char[] hostname()   { return _hostname.dup; }
 }

 char[] a;
 char[] b;
 a = url.hostname; // Gets a string with safety
 b = url.hostname_unsafe; // Gets a string without safety
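A runnable version of the same idea, to show what each accessor actually hands out. The constructor and `main` are additions for illustration; the sketch assumes a current D compiler.

```d
import std.stdio : writeln;

class Url
{
    private char[] _hostname;

    this(const(char)[] host) { _hostname = host.dup; }

    // Caller gets the internal buffer itself and takes responsibility for it.
    char[] hostname_unsafe() { return _hostname; }

    // Caller gets a private copy at the price of an allocation.
    char[] hostname() { return _hostname.dup; }
}

void main()
{
    auto url = new Url("terrainformatica.com");

    char[] a = url.hostname;        // safe copy
    char[] b = url.hostname_unsafe; // alias of the internal buffer

    a[0] = 'A';
    writeln(url.hostname); // terrainformatica.com

    b[0] = 'B';
    writeln(url.hostname); // Berrainformatica.com
}
```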

<offtopic>
Of course, if we had function matching on return type this would be a whole
lot easier and more legible.

 typedef char[] safe_string;

 class Url
 {
    private char[] _hostname;
    ...
    char[] hostname()      { return _hostname; }
    safe_string hostname() { return cast(safe_string)_hostname.dup; }
 }

 safe_string a;
 char[] b;
 a = url.hostname; // Gets a string with safety
 b = url.hostname; // Gets a string without safety

But I should wake up from this dream now ... -)
</offtopic>

-- 
Derek
Melbourne, Australia
2/06/2005 10:12:16 AM
June 02, 2005
Well put.  Again, I think a point has been made for having a facility in the language to say "this thing shouldn't change value".  I understand that some devious programmer can find a way to change something that the compiler verified shouldn't be changed.  But I think that programmer is in the minority and the majority of programmers could use some self-documenting help from the language/compiler (and the devious programmer is specifically and intentionally going outside of the program specification, which I'd reckon could somehow be done in any language).

Again, my $0.02.  But I think many have put in their $0.02 and will continue to do so because many believe it's an important concept.  When will all the $0.02 contributions add up to be enough?

-Kramer

In article <d7li3v$2psu$1@digitaldaemon.com>, Andrew Fedoniouk says...
>[snip]


June 02, 2005
> ....for having a facility in the
> language to say "this thing shouldn't change value".

exactly.

It is enough to have const T[] and const T* as types distinct from plain T[] and T*.

The const T[] type has no opIndexAssign or length(int), and cannot be an
lvalue at all.
Simple as 1-2-3. I really don't understand the motivation for not
having them.

String literals are const char[] by definition.

Andrew.
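For reference, this is roughly how the idea looks in D2, which later added such types. The sketch assumes a D2 compiler; `countDots` is a hypothetical function. One difference from the proposal above is worth flagging: in D2, `const(char)[]` makes the elements read-only while the slice itself can still be reassigned or resized, and `const(char[])` is the fully fixed form.

```d
import std.stdio : writeln;

// Accepts mutable, const and immutable character arrays alike,
// and cannot write to any of them.
size_t countDots(const(char)[] s)
{
    size_t n;
    foreach (c; s)
        if (c == '.')
            ++n;
    return n;
}

// A mutable array converts to the read-only type, never the other way.
static assert( is(char[] : const(char)[]));
static assert(!is(const(char)[] : char[]));

// String literals are read-only by type: immutable(char)[].
static assert(is(typeof("abc") == immutable(char)[]));

// Element assignment is rejected for the read-only type.
static assert(!__traits(compiles, (const(char)[] s) { s[0] = 'x'; }));

void main()
{
    char[] buf = "www.digitalmars.com".dup;
    writeln(countDots(buf));                 // 2
    writeln(countDots("digitaldaemon.com")); // 1
}
```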




"Kramer" <Kramer_member@pathlink.com> wrote in message news:d7ljra$2qvj$1@digitaldaemon.com...
> [snip]