strtok?
April 09, 2005
I have this C program (not written by me) that uses strtok. To port it to D, I wrote this:

//--------------------------------------------------------------
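import std.string;	// toStringz, toString

extern(C) char * strtok (char * strToken, char * strDelimit);

// Pass a non-empty str to start tokenizing it; pass an empty str to get
// the next token of the string given earlier (strtok keeps that state).
char [] tokenize(char [] str, char [] sep)
{
	char * arg1, arg2, res;	// in D, all three are char*
	arg2 = toStringz(sep);
	arg1 = (str.length>0) ? toStringz(str) : null;
	res = strtok(arg1,arg2);
	return toString(res);
}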
//--------------------------------------------------------------

I would like to have a D-only version of this. However, I'm not sure what strtok does. Does anybody know how to do this?

-- 
Carlos Santander Bernal

JP2, you'll always live in our minds
April 10, 2005
On Sat, 09 Apr 2005 11:32:02 -0500, Carlos Santander B. <csantander619@gmail.com> wrote:
> I have this C program (not written by me) that uses strtok. To port it to D, I wrote this:
>
> //--------------------------------------------------------------
> extern(C) char * strtok (char * strToken, char * strDelimit);
>
> char [] tokenize(char [] str, char [] sep)
> {
> 	char * arg1, arg2, res;
> 	arg2 = toStringz(sep);
> 	arg1 = (str.length>0) ? toStringz(str) : null;
> 	res = strtok(arg1,arg2);
> 	return toString(res);
> }
> //--------------------------------------------------------------
>
> I would like to have a D-only version of this. However, I'm not sure what strtok does.

From MSDN:

char *strtok( char *strToken, const char *strDelimit );
wchar_t *wcstok( wchar_t *strToken, const wchar_t *strDelimit );
unsigned char *_mbstok( unsigned char*strToken, const unsigned char *strDelimit );

All of these functions return a pointer to the next token found in strToken. They return NULL when no more tokens are found. Each call modifies strToken by substituting a NULL character for each delimiter that is encountered.

The strtok function finds the next token in strToken. The set of characters in strDelimit specifies possible delimiters of the token to be found in strToken on the current call. wcstok and _mbstok are wide-character and multibyte-character versions of strtok. The arguments and return value of wcstok are wide-character strings; those of _mbstok are multibyte-character strings. These three functions behave identically otherwise.

On the first call to strtok, the function skips leading delimiters and returns a pointer to the first token in strToken, terminating the token with a null character. More tokens can be broken out of the remainder of strToken by a series of calls to strtok. Each call to strtok modifies strToken by inserting a null character after the token returned by that call. To read the next token from strToken, call strtok with a NULL value for the strToken argument. The NULL strToken argument causes strtok to search for the next token in the modified strToken. The strDelimit argument can take any value from one call to the next so that the set of delimiters may vary.
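
The call sequence described above can be sketched in C roughly as follows. This is only an illustration, not code from the program being ported: the helper name split_in_place and its out/max_out parameters are made up for the example, and it assumes buf is a writable, null-terminated buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from the thread: collects up to max_out tokens
 * from buf using the strtok call sequence described above.
 * buf must be writable, because strtok overwrites delimiters in it. */
size_t split_in_place(char *buf, const char *delims,
                      char **out, size_t max_out)
{
    size_t n = 0;
    char *tok;

    assert(buf != NULL && delims != NULL && out != NULL);

    /* First call: pass the string itself. */
    tok = strtok(buf, delims);
    while (tok != NULL && n < max_out) {
        out[n++] = tok;             /* points into buf, no copy is made */
        /* Later calls: pass NULL to continue in the same string. */
        tok = strtok(NULL, delims);
    }
    return n;
}
```

Called on a writable copy of ",ab.c,,..def" with the delimiters ",.", this should yield the tokens ab, c and def. Afterwards the buffer no longer holds the original text, because the delimiter that ended each token has been overwritten with a null character.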

Warning: Each of these functions uses a static variable for parsing the string into tokens. If multiple or simultaneous calls are made to the same function, a high potential for data corruption and inaccurate results exists. Therefore, do not attempt to call the same function simultaneously for different strings and be aware of calling one of these functions from within a loop where another routine may be called that uses the same function. However, calling this function simultaneously from multiple threads does not have undesirable effects.
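
To make the warning concrete, here is a small sketch of the "loop that calls another routine using the same function" case, written as nested strtok calls. The function name count_records_nested and the ';' / ' ' record layout are assumptions for the example only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical demo, not from the thread: tries to split line into
 * ';'-separated records and each record into ' '-separated fields
 * with nested strtok calls. Returns how many records the outer loop saw. */
size_t count_records_nested(char *line)
{
    size_t records = 0;
    char *rec;

    assert(line != NULL);

    for (rec = strtok(line, ";"); rec != NULL; rec = strtok(NULL, ";")) {
        char *field;
        records++;
        /* The inner call replaces strtok's saved position with one
         * inside rec, so the outer position is lost. */
        for (field = strtok(rec, " "); field != NULL;
             field = strtok(NULL, " ")) {
            /* a real program would use field here */
        }
    }
    return records;
}
```

Given a writable copy of "a b;c d", the outer loop should see only the first record: once the inner loop has run to the end of "a b", the next outer call continues from the inner position and finds nothing, so "c d" is never visited.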

> Does anybody know how to do this?

D can do much better than C: using slices, you can tokenize a string without modifying it and return all the results in an array.

import std.stdio;
import std.string;

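// Splits input on any character in tokens (the delimiter set) and returns
// the pieces as slices of input; nothing is copied or modified.
char[][] tokenise(char[] input, char[] tokens)
{
	char[][] res = null;
	int start = -1;	// index where the current token began, -1 if none

	foreach(int i, char c; input) {
		if (tokens.find(c) == -1) {
			// c is not a delimiter: open a token if none is open
			if (start == -1) start = i;
		}
		else {
			// c is a delimiter: close the open token, if any
			if (start != -1) {
				res ~= input[start..i];
				start = -1;
			}
		}
	}
	// the last token may run to the end of the input
	if (start != -1) res ~= input[start..$];
	return res;
}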

void main()
{
	char[] input = ",ab.c,,..def,.,g,,h..i,,jkl,";

	// pass the text as an argument so any '%' in it is not read as a format
	writefln("%s", input);
	foreach(char[] s; tokenise(input,",."))
		writefln("%s", s);
}
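
For comparison, the same non-destructive idea can be sketched in C by describing each token as a pointer plus a length, which is loosely what a D slice is. This is an illustrative sketch under my own assumptions, not a drop-in replacement for strtok: the names span and next_token are hypothetical, and the caller holds the position, so no static state is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A token described as pointer + length, loosely like a D slice.
 * The type and function names here are hypothetical. */
struct span {
    const char *ptr;
    size_t len;
};

/* Finds the next token at or after *pos in text, skipping leading
 * delimiters. Stores the token in *out and advances *pos past it.
 * Returns 1 if a token was found, 0 at the end of text.
 * text is never modified and all state lives in *pos.
 * Assumes *pos is not beyond the end of text. */
int next_token(const char *text, const char *delims,
               size_t *pos, struct span *out)
{
    const char *p;

    assert(text != NULL && delims != NULL && pos != NULL && out != NULL);

    p = text + *pos;
    p += strspn(p, delims);          /* skip leading delimiters */
    if (*p == '\0') {
        *pos = (size_t)(p - text);
        return 0;
    }
    out->ptr = p;
    out->len = strcspn(p, delims);   /* token runs up to next delimiter */
    *pos = (size_t)(p - text) + out->len;
    return 1;
}
```

A caller would start with pos set to 0 and call next_token in a loop until it returns 0. Since the tokens are not null-terminated, printing one needs the length, for example printf("%.*s\n", (int)s.len, s.ptr).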

Regan
April 10, 2005
Regan Heath wrote:
> From MSDN:
> 
> char *strtok( char *strToken, const char *strDelimit );
> wchar_t *wcstok( wchar_t *strToken, const wchar_t *strDelimit );
> unsigned char *_mbstok( unsigned char*strToken, const unsigned char *strDelimit );
> 
> All of these functions return a pointer to the next token found in strToken. They return NULL when no more tokens are found. Each call modifies strToken by substituting a NULL character for each delimiter that is encountered.
> 
> The strtok function finds the next token in strToken. The set of characters in strDelimit specifies possible delimiters of the token to be found in strToken on the current call. wcstok and _mbstok are wide-character and multibyte-character versions of strtok. The arguments and return value of wcstok are wide-character strings; those of _mbstok are multibyte-character strings. These three functions behave identically otherwise.
> 
> On the first call to strtok, the function skips leading delimiters and returns a pointer to the first token in strToken, terminating the token with a null character. More tokens can be broken out of the remainder of strToken by a series of calls to strtok. Each call to strtok modifies strToken by inserting a null character after the token returned by that call. To read the next token from strToken, call strtok with a NULL value for the strToken argument. The NULL strToken argument causes strtok to search for the next token in the modified strToken. The strDelimit argument can take any value from one call to the next so that the set of delimiters may vary.
> 
> Warning: Each of these functions uses a static variable for parsing the string into tokens. If multiple or simultaneous calls are made to the same function, a high potential for data corruption and inaccurate results exists. Therefore, do not attempt to call the same function simultaneously for different strings and be aware of calling one of these functions from within a loop where another routine may be called that uses the same function. However, calling this function simultaneously from multiple threads does not have undesirable effects.
> 

Thanks for that.

> 
> D can do much better than C: using slices, you can tokenize a string without modifying it and return all the results in an array.
> 
> import std.stdio;
> import std.string;
> 
> char[][] tokenise(char[] input, char[] tokens)
> {
>     char[][] res = null;
>     int start = -1;
> 
>     foreach(int i, char c; input) {
>         if (tokens.find(c) == -1) {
>             if (start == -1) start = i;
>         }
>         else {
>             if (start != -1) {
>                 res ~= input[start..i];
>                 start = -1;
>             }
>         }
>     }
>     if (start != -1) res ~= input[start..$];
>     return res;
> }
> 
> void main()
> {
>     char[] input = ",ab.c,,..def,.,g,,h..i,,jkl,";
> 
>     writefln(input);
>     foreach(char[] s; tokenise(input,",."))
>         writefln(s);
> }
> 
> Regan

And especially thanks for that!

-- 
Carlos Santander Bernal

JP2, you'll always live in our minds