How to get a substring?
October 26, 2013
Dumb Newbie Question: I've searched through the library reference, but I haven't figured out how to extract a substring from a string. I'd like something like string.substring("Hello", 0, 2) to return "Hel", for example. What method am I looking for? Thanks!
October 26, 2013
On Saturday, 26 October 2013 at 21:23:13 UTC, Gautam Goel wrote:
> Dumb Newbie Question: I've searched through the library reference, but I haven't figured out how to extract a substring from a string. I'd like something like string.substring("Hello", 0, 2) to return "Hel", for example. What method am I looking for? Thanks!

Use slices:

string msg = "Hello";
string sub = msg[0 .. 2];
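
For reference, a small sketch of what the slice does, assuming ASCII-only text. The upper bound is exclusive, so msg[0 .. 2] yields "He" and getting "Hel" needs msg[0 .. 3]. The result is a view into the original string rather than a copy. The checks below are only an illustration and can be run with dmd -unittest -main -run.

```d
unittest
{
    string msg = "Hello";
    string sub = msg[0 .. 2];        // half-open range: indices 0 and 1

    assert(sub == "He");
    assert(msg[0 .. 3] == "Hel");    // three characters need an upper bound of 3
    assert(msg[1 .. $] == "ello");   // $ stands for msg.length inside the brackets
    assert(sub.ptr == msg.ptr);      // no copy: the slice shares msg's memory
}
```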
October 26, 2013
On 10/26/2013 02:25 PM, Namespace wrote:
> Use slices:
>
> string msg = "Hello";
> string sub = msg[0 .. 2];

Yes, but that works only if the string is known to contain only ASCII characters. (In general, a string is an array of UTF-8 code units, so slice indices count code units rather than characters.)
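
To make the caveat concrete, here is a minimal sketch, assuming the ç is the single code point U+00E7 (written as an escape so the encoding is unambiguous). It shows that .length counts code units while walkLength counts code points, and that a slice boundary can land in the middle of a character. It can be run with dmd -unittest -main -run.

```d
import std.exception : assertThrown;
import std.range : walkLength;
import std.utf : UTFException, validate;

unittest
{
    string s = "abc\u00E7def";   // "abcçdef", with ç written as an escape

    assert(s.length == 8);       // code units: ç occupies two bytes in UTF-8
    assert(s.walkLength == 7);   // code points

    assert(s[2 .. 5] == "c\u00E7");  // byte offsets 2 .. 5 cover "cç"
    // Byte offsets 2 .. 4 cut ç in half, leaving invalid UTF-8.
    assertThrown!UTFException(validate(s[2 .. 4]));
}
```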

I could not find a subString() function either but it turns out to be trivial to implement with Phobos:

import std.range;
import std.algorithm;

auto subRange(R)(R s, size_t beg, size_t end)
{
    return s.dropExactly(beg).take(end - beg);
}

unittest
{
    assert("abcçdef".subRange(2, 4).equal("cç"));
}

void main()
{}

That function produces a lazy range. To convert it eagerly to a string:

import std.conv;

string subString(string s, size_t beg, size_t end)
{
    return s.subRange(beg, end).text;
}

unittest
{
    assert("Hello".subString(0, 2) == "He");
}

Ali

October 26, 2013
On Saturday, 26 October 2013 at 22:17:33 UTC, Ali Çehreli wrote:
> I could not find a subString() function either but it turns out to be trivial to implement with Phobos:

Yeah that is of course easier and nicer than C++... :D Just kidding. I think the slice should be enough. This example would have deterred me from further use if I had seen it when I was starting out.
October 26, 2013
On Saturday, 26 October 2013 at 23:19:56 UTC, Namespace wrote:
> Yeah that is of course easier and nicer than C++... :D Just kidding. I think the slice should be enough. This example would have deterred me from further use if I had seen it when I was starting out.

This functionality should really be provided in Phobos' std.string!
It is a very common function, and I have also made the mistake of
slicing a range in the past :/
October 26, 2013
On Saturday, 26 October 2013 at 22:17:33 UTC, Ali Çehreli wrote:
>> Use slices:
>>
>> string msg = "Hello";
>> string sub = msg[0 .. 2];
>
> Yes but that works only if the string is known to contain only ASCII codes. (Otherwise, a string is a collection of UTF-8 code units.)

But that isn't how substring works. At least it seems neither Java nor C# takes UTF-8 encoding into account (as expected).

Though D generally has much better functions for some situations: find, until, countUntil, startsWith.
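
As a rough illustration of those functions (the input string here is made up for the example), many substring tasks can be expressed by searching for content instead of computing indices. Note that countUntil counts code points, so its result is only safe as a slice index when everything before the match is ASCII. The checks can be run with dmd -unittest -main -run.

```d
import std.algorithm : countUntil, equal, find, startsWith, until;

unittest
{
    string s = "name=Ali \u00C7ehreli";   // hypothetical input, "name=Ali Çehreli"

    assert(s.startsWith("name"));

    // find returns the rest of the string starting at the match (a slice, no copy).
    assert(s.find('=') == "=Ali \u00C7ehreli");
    assert(s.find('=')[1 .. $] == "Ali \u00C7ehreli");  // '=' is one byte, so [1 .. $] is safe

    // until is lazy: it yields characters up to, but not including, the sentinel.
    assert(s.until('=').equal("name"));

    // countUntil reports how many code points precede the match.
    assert(s.countUntil('=') == 4);
}
```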
October 27, 2013
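I've posted a while back a string=>string substring function that doesn't
allocate: search for
"nonallocating unicode string manipulations"

code:

auto slice(T)(T a, size_t u, size_t v)
    if (is(T == string)) // TODO: generalize to isSomeString
{
    import std.exception : enforce;
    import std.utf : stride;

    immutable m = a.length;
    enforce(u <= v);

    // Skip the first u code points; i becomes the byte offset of code point u.
    // v is decremented in step, so it ends up as the number of code points to take.
    size_t i;
    while (u-- && i < m)
    {
        i += stride(a, i);
        v--;
    }

    // Advance i2 over the code points to take.
    size_t i2 = i;
    while (v-- && i2 < m)
        i2 += stride(a, i2);

    // v wraps around to size_t.max only if every requested code point was found.
    enforce(v == size_t.max);
    return a[i .. i2];
}

unittest
{
    import std.exception : assertThrown;
    import std.range : walkLength;

    auto a = "≈açç√ef";
    assert(a.slice(2, 6) == "çç√e");
    assert(a.slice(2, 6).ptr == a.slice(2, 3).ptr); // same starting byte: no copy
    assert(a.slice(0, a.walkLength) is a);           // full slice is the original string
    assertThrown(a.slice(2, 8));                      // end past the last code point
    assertThrown(a.slice(2, 1));                      // start after end
}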


October 27, 2013
On Sunday, 27 October 2013 at 00:18:41 UTC, Timothee Cour wrote:
> I've posted a while back a string=>string substring function that doesn't
> allocate: search for
> "nonallocating unicode string manipulations"

Another one, with negative indices like JavaScript's String.slice():
http://dpaste.dzfl.pl/608435c5
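
The linked paste is not reproduced here. As a purely hypothetical sketch of the idea (the name sliceFromEnd and its exact semantics are assumptions for illustration, not the code behind the link), a code-point slice that accepts negative indices counted from the end might look like this. It makes one pass to count code points and then walks to the two boundaries. It can be run with dmd -unittest -main -run.

```d
import std.range : walkLength;
import std.utf : stride;

/// Hypothetical helper: slice by code point, negative indices count from the end.
string sliceFromEnd(string s, ptrdiff_t beg, ptrdiff_t end = ptrdiff_t.max)
{
    immutable n = cast(ptrdiff_t) s.walkLength; // number of code points

    // Map a possibly negative index into the range 0 .. n.
    ptrdiff_t normalize(ptrdiff_t k)
    {
        if (k < 0) k += n;
        return k < 0 ? 0 : (k > n ? n : k);
    }

    immutable b = normalize(beg);
    immutable e = normalize(end);
    if (b >= e) return s[0 .. 0];

    size_t i = 0;
    foreach (_; 0 .. b) i += stride(s, i);  // byte offset of code point b
    size_t j = i;
    foreach (_; b .. e) j += stride(s, j);  // byte offset of code point e
    return s[i .. j];
}

unittest
{
    assert("Hello".sliceFromEnd(-3) == "llo");
    assert("Hello".sliceFromEnd(1, -1) == "ell");
    assert("abc\u00E7def".sliceFromEnd(-4, -2) == "\u00E7d");
    assert("Hello".sliceFromEnd(4, 2) == "");
}
```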
October 27, 2013
On Saturday, October 26, 2013 15:17:33 Ali Çehreli wrote:
> I could not find a subString() function either but it turns out to be trivial to implement with Phobos:

There's also std.utf.toUTFindex, which allows you to do

    auto str = "Hello";
    assert(str[0 .. str.toUTFindex(2)] == "He");

but you have to be careful with it when using anything other than 0 for the first index, because you don't want it to have to traverse the string multiple times. With your Unicode example you're forced to do something like

    auto str = "abcçdef";
    immutable first = str.toUTFindex(2);
    immutable second = str[first .. $].toUTFindex(2) + first;
    assert(str[first .. second] == "cç");

It also has the advantage of the final result being a string without having to do any conversions. So, subString should probably be defined as

    import std.traits : isSomeChar;

    inout(C)[] subString(C)(inout(C)[] str, size_t i, size_t j)
        if(isSomeChar!C)
    {
        import std.utf;
        immutable first = str.toUTFindex(i);
        // j - i code points remain to be skipped after the first index
        immutable second = str[first .. $].toUTFindex(j - i) + first;
        return str[first .. second];
    }

Using drop/dropExactly with take/takeExactly makes more sense when you want to iterate over the characters but don't need a string (especially if you're not necessarily going to iterate over them all), but if you really want a string, then finding the right index for the slice and then slicing is arguably better.

- Jonathan M Davis
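
A minimal, testable sketch of the index-then-slice approach, restricted to string for simplicity (that restriction and the assert on the argument order are assumptions of this sketch). The second call to toUTFindex starts from the first index and skips j - i code points, so the prefix is walked only once. It can be run with dmd -unittest -main -run.

```d
import std.utf : toUTFindex;

/// Returns code points i .. j of str as a slice of the original string.
string subString(string str, size_t i, size_t j)
{
    assert(i <= j, "start index must not exceed end index");
    immutable first = str.toUTFindex(i);
    // Continue from first instead of rescanning from the beginning.
    immutable second = str[first .. $].toUTFindex(j - i) + first;
    return str[first .. second];
}

unittest
{
    auto s = "abc\u00E7def";                      // "abcçdef"
    assert(s.subString(2, 4) == "c\u00E7");
    assert(s.subString(2, 4).ptr == s.ptr + 2);   // no copy, no conversion
    assert("Hello".subString(0, 2) == "He");
    assert("Hello".subString(0, 3) == "Hel");
}
```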
October 27, 2013
On Sat, Oct 26, 2013 at 6:24 PM, Nicolas Sicard <dransic@gmail.com> wrote:

> Another one, with negative indices like JavaScript's String.slice():
> http://dpaste.dzfl.pl/608435c5
>

Not as efficient as what I proposed, since it iterates over the string twice (computing the second index redoes the work done for the first). It could be adapted, though.

