April 01, 2005
Hey everybody, it's Dr Nic!

(Sorry, been spending too much time watching The Simpsons of late. <g>)

I've been meaning to bring this up for a while, and Pablo's recent work on using string_view for returning slices of strings has brought it back into focus.

Those of you who've been peering inside the STLSoft components to find out just how they can be so fabulously compatible with almost anything you can shake a stick at will have noticed a whole lot of use of string access shims, in particular c_str_ptr() and, to a lesser extent, c_str_len().

(Anyone not wholly persuaded by Shims can check them out in Chapter 20 of Imperfect C++ (http://imperfectcplusplus.com) or in my August 2003 CUJ article "Generalised String Manipulation" (http://www.cuj.com/documents/s=8681/cuj0308wilson/). There's also a pretty pokey definition available at http://www.synesis.com.au/articles.html#whitepapers.)

Anyway, the main motivation for c_str_data() would be to provide a not-necessarily-nul-terminated pointer to the contiguous array of characters of a given string (or string-able type) instance. This would mean that when one intends to use the c-string form of a particular object _without_ relying on the nul-termination, there is an opportunity for non-trivial optimisation.

For example, the memory_database class in the Open-RJ/STL mapping has a generalised ctor:

class memory_database
    : public database_base
{
    . . .
    template <typename S>
    explicit memory_database(S const &contents, unsigned flags = 0)
        : parent_class_type(create_database_(::stlsoft::c_str_ptr(contents), ::stlsoft::c_str_len(contents), flags))
    {}


Since c_str_len() is being used, we may assume (and obviously one should check, in the general case!) that
create_database_() does not rely on the nul-terminator. Hence, this could be rewritten as:


class memory_database
    : public database_base
{
    . . .
    template <typename S>
    explicit memory_database(S const &contents, unsigned flags = 0)
        : parent_class_type(create_database_(::stlsoft::c_str_data(contents), ::stlsoft::c_str_len(contents), flags))
    {}

Now, for most string-able entities, this will make no difference. An MFC CString will still return the base of its nul-terminated allocation. An ATL CWindow will still have to return a temporary shim_string instance.

But for other types, such as the new string_view, and the Win32 Security API type LSA_UNICODE_STRING, this would remove the need to generate nul-terminated storage.
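To make the difference concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the slice type merely stands in for something like string_view, the three shim overloads are simplified (here c_str_ptr() just returns a std::string by value, where a real shim would return a proxy convertible to a pointer), and count_char() is a hypothetical consumer that takes pointer plus length. None of it is the actual STLSoft code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch {

// Hypothetical non-owning slice: a pointer into someone else's buffer plus
// a length. The characters are NOT followed by a nul terminator.
struct slice
{
    char const* base;
    std::size_t len;
};

// c_str_len(): the number of characters, whatever the storage looks like.
inline std::size_t c_str_len(slice const& s)       { return s.len; }
inline std::size_t c_str_len(std::string const& s) { return s.size(); }

// c_str_data(): may point at storage that is not nul-terminated, so the
// slice can hand out its own pointer without copying anything.
inline char const* c_str_data(slice const& s)       { return s.base; }
inline char const* c_str_data(std::string const& s) { return s.data(); }

// c_str_ptr(): must yield nul-terminated characters, so the slice has to
// synthesise a temporary copy (simplified here to a std::string by value).
inline std::string c_str_ptr(slice const& s)       { return std::string(s.base, s.len); }
inline char const* c_str_ptr(std::string const& s) { return s.c_str(); }

// Hypothetical consumer taking pointer + length; it never looks for a nul.
inline std::size_t count_char(char const* p, std::size_t n, char ch)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i != n; ++i)
    {
        if (p[i] == ch)
        {
            ++count;
        }
    }
    return count;
}

// Generalised caller in the style of the memory_database ctor: because it
// passes an explicit length, it can use c_str_data() and skip the copy.
template <typename S>
std::size_t count_commas(S const& s)
{
    return count_char(c_str_data(s), c_str_len(s), ',');
}

} // namespace sketch
```

With this arrangement, a slice over the middle of a larger buffer goes through count_commas() without any allocation, while a std::string behaves exactly as it would have with c_str_ptr().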

So, the picture is rosy in so far as things that already have nul-termination (e.g. CString), or have no intrinsic
c-string of their own (e.g. HWND) are not affected, but those that have storage which is not nul-terminated would result
in more efficient code. The downside is another c_str_XXX() to be aware of, but since most people understand the
concepts of the standard String models' c_str() and data() (and length()) functions, this is pretty readily grokable.

What I'm interested in is whether anyone sees a downside. (I vaguely recall having thought of one on a bike ride a few
weeks ago, but someone passed me and I had to chase 'em down. <g>) One thing that did occur to me is whether there might
be any circumstances where one might not be able to define a c_str_data(). (I can't think of one, since the
characteristics of the return value of c_str() / c_str_ptr() answer the requirements of the return value of data() /
c_str_data(), but I may well have missed something.)

Thoughts?

Cheers

Matthew



April 05, 2005
I remembered the argument against: it might (in fact it should!) incline library writers to use c_str_data()+c_str_len() instead of c_str_ptr() (alone). For cases where the 'string' is something for which a temporary shim string instance must be synthesised (e.g. ACE's ACE_INET_Addr), it'd result in two conversions instead of one. Naturally, such cases are likely to be rare.
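A rough sketch of that cost, under loudly stated assumptions: inet_addr_like is a hypothetical stand-in for a type like ACE_INET_Addr, to_text() is an invented conversion, and the shims are simplified to return std::string by value. The global counter exists only to make the number of conversions visible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

namespace sketch2 {

// Hypothetical address type with no intrinsic string form.
struct inet_addr_like
{
    unsigned char  octets[4];
    unsigned short port;
};

// Counts how many times a string form had to be synthesised.
int conversions = 0;

// Invented conversion: formats the address as "a.b.c.d:port".
std::string to_text(inet_addr_like const& a)
{
    ++conversions;
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%u.%u.%u.%u:%u",
                  static_cast<unsigned>(a.octets[0]),
                  static_cast<unsigned>(a.octets[1]),
                  static_cast<unsigned>(a.octets[2]),
                  static_cast<unsigned>(a.octets[3]),
                  static_cast<unsigned>(a.port));
    return std::string(buf);
}

// Simplified shims: each one has to synthesise the text from scratch.
std::string c_str_ptr(inet_addr_like const& a)  { return to_text(a); }
std::string c_str_data(inet_addr_like const& a) { return to_text(a); }
std::size_t c_str_len(inet_addr_like const& a)  { return to_text(a).size(); }

// Library code written against c_str_ptr() alone: one conversion, and the
// length is found by scanning for the nul terminator.
std::size_t use_ptr_only(inet_addr_like const& a)
{
    std::string const s = c_str_ptr(a);
    return std::strlen(s.c_str());
}

// Library code written against c_str_data() + c_str_len(): each shim call
// converts independently, so the address is formatted twice.
std::size_t use_data_and_len(inet_addr_like const& a)
{
    std::string const d = c_str_data(a);
    std::size_t const n = c_str_len(a);
    (void)d; // a real consumer would read n characters from d here
    return n;
}

} // namespace sketch2
```

For types that already own their characters, both forms cost the same; the doubling only shows up for types like this one, where every shim call means a fresh conversion.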

Anyway, I've added it in tonight, and it'll appear in beta 7. Each case where c_str_data() is now used was already using c_str_ptr()+c_str_len(), so it's either the same or a gain in every extant case. Time will tell how it pans out in future use.

Cheers

Matthew