October 26, 2013 How to get a substring? | ||||
---|---|---|---|---|
| ||||
Dumb Newbie Question: I've searched through the library reference, but I haven't figured out how to extract a substring from a string. I'd like something like string.substring("Hello", 0, 2) to return "Hel", for example. What method am I looking for? Thanks! |
October 26, 2013 Re: How to get a substring? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Gautam Goel | On Saturday, 26 October 2013 at 21:23:13 UTC, Gautam Goel wrote:
> Dumb Newbie Question: I've searched through the library reference, but I haven't figured out how to extract a substring from a string. I'd like something like string.substring("Hello", 0, 2) to return "Hel", for example. What method am I looking for? Thanks!
Use slices:
string msg = "Hello";
string sub = msg[0 .. 2];
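For reference, a minimal sketch of how the slice bounds behave. It assumes ASCII-only text; the upper bound is exclusive, so the "Hel" result from the question corresponds to 0 .. 3 rather than 0 .. 2.

```d
void main()
{
    string msg = "Hello";

    // The upper bound is exclusive: 0 .. 2 selects indices 0 and 1.
    assert(msg[0 .. 2] == "He");

    // The "Hel" result from the question needs an upper bound of 3.
    assert(msg[0 .. 3] == "Hel");

    // Inside the brackets, $ stands for msg.length.
    assert(msg[1 .. $] == "ello");

    // A slice is a view into the same memory, not a copy.
    assert(msg[0 .. 3].ptr == msg.ptr);
}
```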
|
October 26, 2013 Re: How to get a substring? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Namespace | On 10/26/2013 02:25 PM, Namespace wrote:
> On Saturday, 26 October 2013 at 21:23:13 UTC, Gautam Goel wrote:
>> Dumb Newbie Question: I've searched through the library reference, but
>> I haven't figured out how to extract a substring from a string. I'd
>> like something like string.substring("Hello", 0, 2) to return "Hel",
>> for example. What method am I looking for? Thanks!
>
> Use slices:
>
> string msg = "Hello";
> string sub = msg[0 .. 2];
Yes, but that works only if the string is known to contain only ASCII characters. (In general, a string is a sequence of UTF-8 code units, so slice indices count code units rather than characters.)
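A small sketch of what goes wrong, assuming the same "abcçdef" test string used in this thread: the length counts code units, and a slice bound that falls inside a multi-byte character yields invalid UTF-8.

```d
import std.exception : assertThrown;
import std.range : walkLength;
import std.utf : UTFException, validate;

void main()
{
    string s = "abcçdef";

    // length counts UTF-8 code units; ç occupies two of them.
    assert(s.length == 8);

    // walkLength decodes the string and counts code points.
    assert(s.walkLength == 7);

    // Slicing inside the ASCII prefix is fine.
    assert(s[0 .. 3] == "abc");

    // s[0 .. 4] ends in the middle of ç, so it is not valid UTF-8.
    assertThrown!UTFException(validate(s[0 .. 4]));
}
```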
I could not find a subString() function either but it turns out to be trivial to implement with Phobos:
import std.range;
import std.algorithm;
auto subRange(R)(R s, size_t beg, size_t end)
{
    return s.dropExactly(beg).take(end - beg);
}

unittest
{
    assert("abcçdef".subRange(2, 4).equal("cç"));
}

void main()
{}
That function produces a lazy range. To convert it eagerly to a string:
import std.conv;
string subString(string s, size_t beg, size_t end)
{
    return s.subRange(beg, end).text;
}

unittest
{
    assert("Hello".subString(0, 2) == "He");
}
Ali
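To make the lazy/eager distinction concrete, here is a hedged sketch that repeats subRange from above and inspects what it returns. Collecting through std.array.array and std.conv.to is just one possible way to get a string back; it is an assumption of this sketch, not a replacement for the .text version.

```d
import std.algorithm : equal;
import std.array : array;
import std.conv : to;
import std.range : dropExactly, take;

auto subRange(R)(R s, size_t beg, size_t end)
{
    return s.dropExactly(beg).take(end - beg);
}

void main()
{
    auto r = "abcçdef".subRange(2, 4);

    // The range yields decoded code points (dchar), not UTF-8 code units.
    static assert(is(typeof(r.front) == dchar));

    // Nothing has been copied yet; equal walks the range lazily.
    assert(r.equal("cç"));

    // array collects the code points eagerly into a dchar[].
    dchar[] points = r.array;
    assert(points.length == 2);

    // Converting that array to string re-encodes it as UTF-8.
    assert(points.to!string == "cç");
}
```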
|
October 26, 2013 Re: How to get a substring? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ali Çehreli | On Saturday, 26 October 2013 at 22:17:33 UTC, Ali Çehreli wrote:
> On 10/26/2013 02:25 PM, Namespace wrote:
>> On Saturday, 26 October 2013 at 21:23:13 UTC, Gautam Goel wrote:
>>> Dumb Newbie Question: I've searched through the library reference, but
>>> I haven't figured out how to extract a substring from a string. I'd
>>> like something like string.substring("Hello", 0, 2) to return "Hel",
>>> for example. What method am I looking for? Thanks!
>>
>> Use slices:
>>
>> string msg = "Hello";
>> string sub = msg[0 .. 2];
>
> Yes but that works only if the string is known to contain only ASCII codes. (Otherwise, a string is a collection of UTF-8 code units.)
>
> I could not find a subString() function either but it turns out to be trivial to implement with Phobos:
>
> import std.range;
> import std.algorithm;
>
> auto subRange(R)(R s, size_t beg, size_t end)
> {
> return s.dropExactly(beg).take(end - beg);
> }
>
> unittest
> {
> assert("abcçdef".subRange(2, 4).equal("cç"));
> }
>
> void main()
> {}
>
> That function produces a lazy range. To convert it eagerly to a string:
>
> import std.conv;
>
> string subString(string s, size_t beg, size_t end)
> {
> return s.subRange(beg, end).text;
> }
>
> unittest
> {
> assert("Hello".subString(0, 2) == "He");
> }
>
> Ali
Yeah, that is of course easier and nicer than C++... :D Just kidding. I think the slice should be enough. This example would have deterred me from further use if I had seen it when I was starting out.
|
October 26, 2013 Re: How to get a substring? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Namespace | On Saturday, 26 October 2013 at 23:19:56 UTC, Namespace wrote:
> On Saturday, 26 October 2013 at 22:17:33 UTC, Ali Çehreli wrote:
>> On 10/26/2013 02:25 PM, Namespace wrote:
>>> On Saturday, 26 October 2013 at 21:23:13 UTC, Gautam Goel wrote:
>>>> Dumb Newbie Question: I've searched through the library reference, but
>>>> I haven't figured out how to extract a substring from a string. I'd
>>>> like something like string.substring("Hello", 0, 2) to return "Hel",
>>>> for example. What method am I looking for? Thanks!
>>>
>>> Use slices:
>>>
>>> string msg = "Hello";
>>> string sub = msg[0 .. 2];
>>
>> Yes but that works only if the string is known to contain only ASCII codes. (Otherwise, a string is a collection of UTF-8 code units.)
>>
>> I could not find a subString() function either but it turns out to be trivial to implement with Phobos:
>>
>> import std.range;
>> import std.algorithm;
>>
>> auto subRange(R)(R s, size_t beg, size_t end)
>> {
>> return s.dropExactly(beg).take(end - beg);
>> }
>>
>> unittest
>> {
>> assert("abcçdef".subRange(2, 4).equal("cç"));
>> }
>>
>> void main()
>> {}
>>
>> That function produces a lazy range. To convert it eagerly to a string:
>>
>> import std.conv;
>>
>> string subString(string s, size_t beg, size_t end)
>> {
>> return s.subRange(beg, end).text;
>> }
>>
>> unittest
>> {
>> assert("Hello".subString(0, 2) == "He");
>> }
>>
>> Ali
>
> Yeah, that is of course easier and nicer than C++... :D Just kidding. I think the slice should be enough. This example would have deterred me from further use if I had seen it when I was starting out.
This functionality should really be provided in Phobos's std.string!
It is a very common function, and I have also made the mistake of
slicing a range in the past :/
|
October 26, 2013 Re: How to get a substring? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ali Çehreli | On Saturday, 26 October 2013 at 22:17:33 UTC, Ali Çehreli wrote:
>> Use slices:
>>
>> string msg = "Hello";
>> string sub = msg[0 .. 2];
>
> Yes but that works only if the string is known to contain only ASCII codes. (Otherwise, a string is a collection of UTF-8 code units.)
But that isn't how substring works. At least, it seems neither Java nor C# takes UTF-8 encoding into account (as expected). D, though, generally has much better functions for some situations: find/until/countUntil/startsWith.
|
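A rough sketch of the content-based functions named above, for cases where the substring is defined by what it contains rather than by position. The "key=value;rest" input is made up for illustration and is ASCII-only.

```d
import std.algorithm : countUntil, equal, find, startsWith, until;

void main()
{
    string s = "key=value;rest"; // hypothetical input

    // startsWith tests a prefix without computing any indices.
    assert(s.startsWith("key"));

    // find returns the rest of the string beginning at the first match.
    assert(s.find("=") == "=value;rest");

    // countUntil reports how far into the string the match starts.
    assert(s.countUntil("=") == 3);

    // until lazily yields everything before the sentinel.
    assert(s.until(';').equal("key=value"));
}
```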
October 27, 2013 Re: How to get a substring? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ali Çehreli | I posted a string=>string substring function a while back that doesn't allocate: google "nonallocating unicode string manipulations"
code:
auto slice(T)(T a, size_t u, size_t v) if (is(T == string)) { // TODO: generalize to isSomeString
    import std.exception;
    auto m = a.length;
    size_t i;
    enforce(u <= v);
    import std.utf;
    while (u-- && i < m) {
        auto si = stride(a, i);
        i += si;
        v--;
    }
    // assert(u == -1);
    // enforce(u == -1);
    size_t i2 = i;
    while (v-- && i2 < m) {
        auto si = stride(a, i2);
        i2 += si;
    }
    // assert(v == -1);
    enforce(v == -1);
    return a[i .. i2];
}
unittest {
    import std.range;
    auto a = "≈açç√ef";
    auto b = a.slice(2, 6);
    assert(a.slice(2, 6) == "çç√e");
    assert(a.slice(2, 6).ptr == a.slice(2, 3).ptr);
    assert(a.slice(0, a.walkLength) is a);
    import std.exception;
    assertThrown(a.slice(2, 8));
    assertThrown(a.slice(2, 1));
}
On Sat, Oct 26, 2013 at 3:17 PM, Ali Çehreli <acehreli@yahoo.com> wrote:
> On 10/26/2013 02:25 PM, Namespace wrote:
>
>> On Saturday, 26 October 2013 at 21:23:13 UTC, Gautam Goel wrote:
>>
>>> Dumb Newbie Question: I've searched through the library reference, but I haven't figured out how to extract a substring from a string. I'd like something like string.substring("Hello", 0, 2) to return "Hel", for example. What method am I looking for? Thanks!
>>>
>>
>> Use slices:
>>
>> string msg = "Hello";
>> string sub = msg[0 .. 2];
>>
>
> Yes but that works only if the string is known to contain only ASCII codes. (Otherwise, a string is a collection of UTF-8 code units.)
>
> I could not find a subString() function either but it turns out to be trivial to implement with Phobos:
>
> import std.range;
> import std.algorithm;
>
> auto subRange(R)(R s, size_t beg, size_t end)
> {
> return s.dropExactly(beg).take(end - beg);
> }
>
> unittest
> {
> assert("abcçdef".subRange(2, 4).equal("cç"));
> }
>
> void main()
> {}
>
> That function produces a lazy range. To convert it eagerly to a string:
>
> import std.conv;
>
> string subString(string s, size_t beg, size_t end)
> {
> return s.subRange(beg, end).text;
> }
>
> unittest
> {
> assert("Hello".subString(0, 2) == "He");
> }
>
> Ali
>
|
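The slice function above is built on std.utf.stride, which reports how many code units the character starting at a given index occupies. The sketch below isolates that step. codeUnitOffset is a hypothetical helper name introduced only for this illustration, and unlike slice it restarts from the beginning of the string for each index.

```d
import std.utf : stride;

// Hypothetical helper: code-unit offset of the k-th code point,
// clamped to s.length when the string is shorter than k code points.
size_t codeUnitOffset(string s, size_t k)
{
    size_t i = 0;
    while (k > 0 && i < s.length)
    {
        i += stride(s, i); // skip one whole character
        --k;
    }
    return i;
}

void main()
{
    string a = "≈açç√ef";

    // ≈ takes three code units, a takes one, ç takes two.
    assert(stride(a, 0) == 3);
    assert(stride(a, 3) == 1);
    assert(stride(a, 4) == 2);

    immutable lo = a.codeUnitOffset(2);
    immutable hi = a.codeUnitOffset(6);
    assert(a[lo .. hi] == "çç√e");

    // The result is a view into the original string; nothing is allocated.
    assert(a[lo .. hi].ptr == a.ptr + lo);
}
```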
October 27, 2013 Re: How to get a substring? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Timothee Cour | On Sunday, 27 October 2013 at 00:18:41 UTC, Timothee Cour wrote:
> I've posted a while back a string=>string substring function that doesn't
> allocate: google
> "nonallocating unicode string manipulations"
>
> code:
>
> auto slice(T)(T a,size_t u, size_t v)if(is(T==string)){//TODO:generalize to isSomeString
> import std.exception;
> auto m=a.length;
> size_t i;
> enforce(u<=v);
> import std.utf;
> while(u-- && i<m){
> auto si=stride(a,i);
> i+=si;
> v--;
> }
> // assert(u==-1);
> // enforce(u==-1);
> size_t i2=i;
> while(v-- && i2<m){
> auto si=stride(a,i2);
> i2+=si;
> }
> // assert(v==-1);
> enforce(v==-1);
> return a[i..i2];
> }
> unittest{
> import std.range;
> auto a="≈açç√ef";
> auto b=a.slice(2,6);
> assert(a.slice(2,6)=="çç√e");
> assert(a.slice(2,6).ptr==a.slice(2,3).ptr);
> assert(a.slice(0,a.walkLength) is a);
> import std.exception;
> assertThrown(a.slice(2,8));
> assertThrown(a.slice(2,1));
> }
>
Another one, with negative index like JavaScript's String.slice():
http://dpaste.dzfl.pl/608435c5
|
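For readers who cannot open the paste, here is a rough sketch of what negative-index slicing could look like. It is not the linked code: jsSlice and its clamping rules are assumptions made for illustration, modeled loosely on JavaScript's String.slice(), and it walks the string more than once.

```d
import std.algorithm : max, min;
import std.range : walkLength;
import std.utf : toUTFindex;

// Hypothetical helper, not the code from the paste: negative indices
// count from the end, and out-of-range indices are clamped.
string jsSlice(string s, ptrdiff_t beg, ptrdiff_t end)
{
    immutable n = cast(ptrdiff_t) s.walkLength; // length in code points

    static ptrdiff_t norm(ptrdiff_t k, ptrdiff_t n)
    {
        if (k < 0)
            k += n;                // count from the end
        return max(0, min(k, n));  // clamp into 0 .. n
    }

    immutable b = norm(beg, n);
    immutable e = norm(end, n);
    if (b >= e)
        return "";

    // Translate code-point indices into code-unit offsets, then slice.
    immutable first = s.toUTFindex(b);
    immutable second = first + s[first .. $].toUTFindex(e - b);
    return s[first .. second];
}

void main()
{
    assert(jsSlice("≈açç√ef", -3, -1) == "√e");
    assert(jsSlice("Hello", 1, 100) == "ello");
    assert(jsSlice("Hello", -2, 5) == "lo");
    assert(jsSlice("Hello", 3, 1) == "");
}
```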
October 27, 2013 Re: How to get a substring? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ali Çehreli | On Saturday, October 26, 2013 15:17:33 Ali Çehreli wrote:
> On 10/26/2013 02:25 PM, Namespace wrote:
> > On Saturday, 26 October 2013 at 21:23:13 UTC, Gautam Goel wrote:
> >> Dumb Newbie Question: I've searched through the library reference, but I haven't figured out how to extract a substring from a string. I'd like something like string.substring("Hello", 0, 2) to return "Hel", for example. What method am I looking for? Thanks!
> >
> > Use slices:
> >
> > string msg = "Hello";
> > string sub = msg[0 .. 2];
>
> Yes but that works only if the string is known to contain only ASCII codes. (Otherwise, a string is a collection of UTF-8 code units.)
>
> I could not find a subString() function either but it turns out to be trivial to implement with Phobos:
>
> import std.range;
> import std.algorithm;
>
> auto subRange(R)(R s, size_t beg, size_t end)
> {
> return s.dropExactly(beg).take(end - beg);
> }
>
> unittest
> {
> assert("abcçdef".subRange(2, 4).equal("cç"));
> }
>
> void main()
> {}
>
> That function produces a lazy range. To convert it eagerly to a string:
>
> import std.conv;
>
> string subString(string s, size_t beg, size_t end)
> {
> return s.subRange(beg, end).text;
> }
>
> unittest
> {
> assert("Hello".subString(0, 2) == "He");
> }
There's also std.utf.toUTFindex, which allows you to do
auto str = "Hello";
assert(str[0 .. str.toUTFindex(2)] == "He");
but you have to be careful with it when using anything other than 0 for the first index, because you don't want it to have to traverse the string multiple times. With your Unicode example, you're forced to do something like
auto str = "abcçdef";
immutable first = str.toUTFindex(2);
immutable second = str[first .. $].toUTFindex(2) + first;
assert(str[first .. second] == "cç");
It also has the advantage of the final result being a string without having to do any conversions. So, subString should probably be defined as
import std.traits : isSomeChar;

inout(C)[] subString(C)(inout(C)[] str, size_t i, size_t j)
    if(isSomeChar!C)
{
    import std.utf;
    immutable first = str.toUTFindex(i);
    // Advance by the remaining j - i code points, not by i again.
    immutable second = str[first .. $].toUTFindex(j - i) + first;
    return str[first .. second];
}
Using drop/dropExactly with take/takeExactly makes more sense when you want to iterate over the characters but don't need a string (especially if you're not necessarily going to iterate over them all), but if you really want a string, then finding the right index for the slice and then slicing is arguably better.
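A self-contained sketch of the index-then-slice approach, with a few checks. It assumes the second lookup should advance by j - i code points from the first offset, so the front of the string is traversed only once; the test values are the ones already used in this thread.

```d
import std.traits : isSomeChar;
import std.utf : toUTFindex;

inout(C)[] subString(C)(inout(C)[] str, size_t i, size_t j)
if (isSomeChar!C)
{
    // Code-unit offset of code point i.
    immutable first = str.toUTFindex(i);
    // Continue from there for the remaining j - i code points.
    immutable second = str[first .. $].toUTFindex(j - i) + first;
    return str[first .. second];
}

void main()
{
    assert("Hello".subString(0, 2) == "He");

    auto s = "abcçdef";
    auto sub = s.subString(2, 4);
    assert(sub == "cç");

    // The result is a slice of the original, so no conversion or copy happens.
    assert(sub.ptr == s.ptr + 2);

    // The same template also accepts UTF-16 strings.
    assert("abcçdef"w.subString(2, 4) == "cç"w);
}
```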
- Jonathan M Davis
|
October 27, 2013 Re: How to get a substring? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Nicolas Sicard | On Sat, Oct 26, 2013 at 6:24 PM, Nicolas Sicard <dransic@gmail.com> wrote:
> On Sunday, 27 October 2013 at 00:18:41 UTC, Timothee Cour wrote:
>
>> I've posted a while back a string=>string substring function that doesn't
>> allocate: google
>> "nonallocating unicode string manipulations"
>>
>> code:
>>
>> auto slice(T)(T a,size_t u, size_t v)if(is(T==string)){//TODO:generalize to isSomeString
>> import std.exception;
>> auto m=a.length;
>> size_t i;
>> enforce(u<=v);
>> import std.utf;
>> while(u-- && i<m){
>> auto si=stride(a,i);
>> i+=si;
>> v--;
>> }
>> // assert(u==-1);
>> // enforce(u==-1);
>> size_t i2=i;
>> while(v-- && i2<m){
>> auto si=stride(a,i2);
>> i2+=si;
>> }
>> // assert(v==-1);
>> enforce(v==-1);
>> return a[i..i2];
>> }
>> unittest{
>> import std.range;
>> auto a="≈açç√ef";
>> auto b=a.slice(2,6);
>> assert(a.slice(2,6)=="çç√e");
>> assert(a.slice(2,6).ptr==a.slice(2,3).ptr);
>> assert(a.slice(0,a.walkLength) is a);
>> import std.exception;
>> assertThrown(a.slice(2,8));
>> assertThrown(a.slice(2,1));
>> }
>>
>>
> Another one, with negative index like Javascript's String.slice():
> http://dpaste.dzfl.pl/608435c5
>
Not as efficient as what I proposed, since it iterates over the string twice (the 2nd index redoes the work done by the 1st index). It could be adapted, though.
|
Copyright © 1999-2021 by the D Language Foundation