January 01, 2010
bearophile wrote:
> Walter Bright:
>> 3. The glaring fact that std::vector<char> and std::string are
>> different suggests something is still wrong.
> 
> In an array/vector you want O(1) access time to all items (ignoring
> RAM-cache access/transfer delays), while in a string with
> variable-width Unicode encoding that can be hard to do. So they look
> like two different data structures.

The real reason is different (multibyte support in std::string is at best nonexistent). std::vector was defined by Stepanov alone. But by the time std::string was standardized, many factions of the committee had a feature on their list. std::string is the result of that patchwork.

Andrei
January 01, 2010
Andrei Alexandrescu:
> The real reason is different (multibyte support in std::string is at best nonexistent). std::vector was defined by Stepanov alone. But by the time std::string was standardized, many factions of the committee had a feature on their list. std::string is the result of that patchwork.

Thank you for the info, I didn't know that. In practice I was mostly talking about D2, which interests me more than C++.

In D2, strings can be bidirectional ranges, while fixed-size and dynamic arrays can be random-access ranges. (A string can be a random-access range only over its underlying bytes. That may require two different syntaxes for accessing strings: the normal str[] and something else, like str.byte[], for the bytes. Usually only the second can guarantee O(1) access time, unless the string uses 32-bit-wide Unicode chars. Access with [] could use something simple, like a "skip list", to speed it up from O(n) to O(log n).)
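To make the bidirectional-versus-random-access point concrete, here is a minimal C++ sketch (C++ rather than D; the helper names are made up for illustration, and the input is assumed to be valid UTF-8). Stepping forward or backward one code point only needs to skip continuation bytes, but finding the n-th code point means walking from the front, which is O(n).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// A UTF-8 continuation byte has the bit pattern 10xxxxxx.
inline bool is_continuation(unsigned char b) { return (b & 0xC0) == 0x80; }

// Move from the start of one code point to the start of the next.
inline std::size_t next_cp(const std::string& s, std::size_t i) {
    assert(i < s.size());
    ++i;
    while (i < s.size() && is_continuation(static_cast<unsigned char>(s[i]))) ++i;
    return i;
}

// Move from the start of one code point (or the end) back to the previous one.
inline std::size_t prev_cp(const std::string& s, std::size_t i) {
    assert(i > 0 && i <= s.size());
    --i;
    while (i > 0 && is_continuation(static_cast<unsigned char>(s[i]))) --i;
    return i;
}

// "Random access" to the n-th code point has to walk from the front: O(n).
// Returns its byte offset, or s.size() if the string is shorter than that.
inline std::size_t offset_of_cp(const std::string& s, std::size_t n) {
    std::size_t i = 0;
    while (n > 0 && i < s.size()) { i = next_cp(s, i); --n; }
    return i;
}

// Counting code points is also a full O(n) walk.
inline std::size_t count_cps(const std::string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); i = next_cp(s, i)) ++count;
    return count;
}
```

With s = "h\xC3\xA9llo\xE2\x82\xAC" (9 bytes, 6 code points), offset_of_cp(s, 5) takes five steps to reach byte offset 6. An index over code point offsets, like the skip list suggested above, would trade extra memory for faster lookups.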

And to avoid silly bugs, D2 associative arrays could allow only const/immutable keys (especially when used in safe modules), as in Python. If you put a key in a set/AA and later modify it, its hash value and position don't get updated.
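For comparison, C++'s hash containers (std::unordered_map, standardized in C++11) already take the immutable-key approach: the stored key is const, so "changing" a key means erasing the entry and inserting it again. A minimal sketch; Table and rekey are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <unordered_map>

// Hypothetical table type for illustration.
using Table = std::unordered_map<std::string, int>;

// value_type is std::pair<const std::string, int>: the key is const, so code
// like `it->first += "x";` does not compile, and a key's hash cannot change
// while the entry is in the table.
static_assert(std::is_const<Table::value_type::first_type>::value,
              "unordered_map keys are const");

// Erase the entry and insert it again, which puts it in the bucket for the
// new hash. Returns false if old_key is missing or new_key is already present.
inline bool rekey(Table& t, const std::string& old_key, const std::string& new_key) {
    auto it = t.find(old_key);
    if (it == t.end() || t.count(new_key) != 0) return false;
    int value = it->second;
    t.erase(it);
    t.emplace(new_key, value);
    assert(t.count(new_key) == 1);  // the entry now lives under new_key
    return true;
}
```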

Bye,
bearophile
January 01, 2010
bearophile wrote:
> Andrei Alexandrescu:
>> The real reason is different (multibyte support in std::string is at best nonexistent). std::vector was defined by Stepanov alone. But by the time std::string was standardized, many factions of the committee had a feature on their list. std::string is the result of that patchwork.
> 
> Thank you for the info, I didn't know that. In practice I was mostly talking about D2, which interests me more than C++.
> 
> In D2, strings can be bidirectional ranges, while fixed-size and dynamic arrays can be random-access ranges. (A string can be a random-access range only over its underlying bytes. That may require two different syntaxes for accessing strings: the normal str[] and something else, like str.byte[], for the bytes. Usually only the second can guarantee O(1) access time, unless the string uses 32-bit-wide Unicode chars. Access with [] could use something simple, like a "skip list", to speed it up from O(n) to O(log n).)

Look for byCodeUnit in here:

http://dsource.org/projects/phobos/browser/trunk/phobos/std/string.d

and improve it.

> And to avoid silly bugs, D2 associative arrays could allow only const/immutable keys (especially when used in safe modules), as in Python. If you put a key in a set/AA and later modify it, its hash value and position don't get updated.

That's a long discussion, sigh.


Andrei
January 01, 2010
Hello Kevin,

> I would say these are the technical merits of C that get it chosen
> these days:
> 
> 1. The new code they're writing will be part of a large body of
> existing C code which they don't have time, permission, or inclination
> to convert to C++.

Probably the most common reason C is used (though I'm not sure that counts as "choosing" C rather than just using it).

> 
> 2. They need to be aware of every tiny low level detail anyway, so
> having the language do too many things "for you" is not the desired
> approach (security, O/S and embedded work).

Nod.

> 
> 3. C has a real ABI on almost every platform; therefore, C is chosen
> for most inter-language work such as writing python modules.
> 

Nod.
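As an illustration of point 3, here is a sketch of how C++ code is commonly exposed through the C ABI so that other languages can call it. The function checksum32 is hypothetical; the relevant part is extern "C", which disables C++ name mangling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// extern "C" turns off C++ name mangling, so the function is exported under
// the plain symbol name `checksum32` with the platform's C calling convention.
// That is what lets other languages' FFI layers find and call it.
extern "C" std::uint32_t checksum32(const unsigned char* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    // A simple additive checksum: just something concrete to export.
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < len; ++i) sum += data[i];
    return sum;
}
```

Compiled into a shared library, the function could then be loaded from Python with ctypes.CDLL (the library path depends on the platform and the build).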

> But some people really *are* choosing C for aesthetics.  Linus
> Torvalds, bless his little world-dominating heart, chose C for a
> normal app (git), and he cited that the existence of operator
> overloading in C++ is bad because it hides information -- e.g. in the
> general case you "never know what an expression is actually doing."

Is that choosing C or getting stuck with it after removing the non-options?
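A small C++ sketch of what "hiding information" means in the operator overloading argument; Vec is a hypothetical type. At the call site a + b looks like a single arithmetic operation, but it runs a user-defined function that allocates memory and loops over every element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical numeric vector type, for illustration only.
struct Vec {
    std::vector<double> data;
};

// At a call site, `a + b` reads like one cheap arithmetic operation, but it
// calls this function, which allocates a new buffer and loops over every
// element. Nothing in the expression itself reveals that cost.
inline Vec operator+(const Vec& a, const Vec& b) {
    assert(a.data.size() == b.data.size());  // sketch: sizes must match
    Vec out;
    out.data.resize(a.data.size());  // heap allocation hidden behind '+'
    for (std::size_t i = 0; i < a.data.size(); ++i)
        out.data[i] = a.data[i] + b.data[i];
    return out;
}
```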

> 
> I think this can be seen as mainly an aesthetic choice.  Avoiding a
> language because it *supports* information hiding (which is what I
> think operator overloading is) is not really an 'economic' tradeoff,
> since you could choose not to hide information by not using those
> features.  He'd just rather not be in the vicinity of language
> features that make those kinds of choices because they seem wrong to
> him (and because he wants to keep C++ies out of his code I think.)

I considered citing Linus as the counterexample... but there are also people who LIKE assembler, so I think we should stick to the other 99%.

> 
> Some people want their language to have a "WYSIWYG" relationship with
> the generated assembler code (if I'm right, it does seem consistent
> with him being an OS developer).
> 
> I also know some scientists and mathematicians who use C rather than
> C++.  I think the reason is that by using a simpler language they can
> know everything about the language.  I think the sooner they can 'get
> the computer science stuff out of the way', the sooner they can focus
> on what they see as the domain issues.  (I think once the program gets
> big enough, the CompSci aspects reassert themselves and scalability and
> maintainability issues begin to bite you in the rear.)

Odd, I'd expect that crowd to go with Fortran...

> 
> Kevin
> 


January 02, 2010
bearophile Wrote:

> Walter Bright:
> > 3. The glaring fact that std::vector<char> and std::string are different suggests something is still wrong.
> 
> In an array/vector you want O(1) access time to all items (ignoring RAM-cache access/transfer delays), while in a string with variable-width Unicode encoding that can be hard to do. So they look like two different data structures.
> 
> Bye,
> bearophile

Yeah, I think the charset issue was probably the main reason for the string/vector split, along with the desire for special features such as conversion from char*, which vector wouldn't have. Using basic_string<T> with locales is something of a historical wart: with Unicode, taking your charset from your locale is somewhat obsolete on general-purpose computers. (Perhaps very small-profile systems will keep using ASCII or the code page of whatever culture built them.)

But I don't think C++'s string can be made to index by character unless you use wchar_t as the T in basic_string<T>. I don't think string.size() is ever anything but a count of bytes or wchar_t units.
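To illustrate the point about size(): std::string::size() counts code units, so a UTF-8 string with 6 code points can report 9. wchar_t does not fully fix this where it is 16 bits wide (as on Windows), because characters outside the Basic Multilingual Plane take two UTF-16 units. C++11's std::u32string stores one char32_t per code point. A minimal sketch of a decoder; utf8_to_utf32 is a hypothetical helper and does not validate its input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decode UTF-8 into UTF-32 so that size() and operator[] work per code point.
// Assumes well-formed UTF-8; overlong forms and bad sequences are not rejected.
inline std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        assert((b & 0xC0) != 0x80 && "stray continuation byte");
        char32_t cp;
        std::size_t len;
        // The lead byte gives the sequence length and the top payload bits.
        if (b < 0x80)      { cp = b;        len = 1; }
        else if (b < 0xE0) { cp = b & 0x1F; len = 2; }
        else if (b < 0xF0) { cp = b & 0x0F; len = 3; }
        else               { cp = b & 0x07; len = 4; }
        // Each continuation byte contributes 6 more payload bits.
        for (std::size_t k = 1; k < len && i + k < in.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

The cost is memory: every code point takes 4 bytes, even plain ASCII.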

Kevin

January 04, 2010
Fri, 01 Jan 2010 00:56:04 +0000, dsimcha wrote:

[OT] the nntp client you use seems to have serious problems with line wrapping.