strtok?

Apr 09, 2005

Apr 10, 2005

I have this C program (not written by me) that uses strtok. To port it to D, I wrote this:

//--------------------------------------------------------------
extern(C) char * strtok (char * strToken, char * strDelimit);

char [] tokenize(char [] str, char [] sep)
{
	char * arg1, arg2, res;
	arg2 = toStringz(sep);
	arg1 = (str.length>0) ? toStringz(str) : null;
	res = strtok(arg1,arg2);
	return toString(res);
}
//--------------------------------------------------------------

I would like to have a D only version of this. However, I'm not sure what strtok does. Does anybody know how to do this?

-- 
Carlos Santander Bernal

JP2, you'll always live in our minds

April 10, 2005

Re: strtok?

Posted by Regan Heath
in reply to Carlos Santander B.

On Sat, 09 Apr 2005 11:32:02 -0500, Carlos Santander B. <csantander619@gmail.com> wrote:
> I have this C program (not written by me) that uses strtok. To port it to D, I wrote this:
>
> //--------------------------------------------------------------
> extern(C) char * strtok (char * strToken, char * strDelimit);
>
> char [] tokenize(char [] str, char [] sep)
> {
> 	char * arg1, arg2, res;
> 	arg2 = toStringz(sep);
> 	arg1 = (str.length>0) ? toStringz(str) : null;
> 	res = strtok(arg1,arg2);
> 	return toString(res);
> }
> //--------------------------------------------------------------
>
> I would like to have a D only version of this. However, I'm not sure what strtok does.

From MSDN:

char *strtok( char *strToken, const char *strDelimit );
wchar_t *wcstok( wchar_t *strToken, const wchar_t *strDelimit );
unsigned char *_mbstok( unsigned char*strToken, const unsigned char *strDelimit );

All of these functions return a pointer to the next token found in strToken. They return NULL when no more tokens are found. Each call modifies strToken by substituting a NULL character for each delimiter that is encountered.

The strtok function finds the next token in strToken. The set of characters in strDelimit specifies possible delimiters of the token to be found in strToken on the current call. wcstok and _mbstok are wide-character and multibyte-character versions of strtok. The arguments and return value of wcstok are wide-character strings; those of _mbstok are multibyte-character strings. These three functions behave identically otherwise.

On the first call to strtok, the function skips leading delimiters and returns a pointer to the first token in strToken, terminating the token with a null character. More tokens can be broken out of the remainder of strToken by a series of calls to strtok. Each call to strtok modifies strToken by inserting a null character after the token returned by that call. To read the next token from strToken, call strtok with a NULL value for the strToken argument. The NULL strToken argument causes strtok to search for the next token in the modified strToken. The strDelimit argument can take any value from one call to the next so that the set of delimiters may vary.

Warning: Each of these functions uses a static variable for parsing the string into tokens. If multiple or simultaneous calls are made to the same function, a high potential for data corruption and inaccurate results exists. Therefore, do not attempt to call the same function simultaneously for different strings and be aware of calling one of these functions from within a loop where another routine may be called that uses the same function. However, calling this function simultaneously from multiple threads does not have undesirable effects.

> Does anybody know how to do this?

D can do much better than C: using slices, you can tokenize a string without modifying it and return all the results in an array.

import std.stdio;
import std.string;

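// Splits input into tokens separated by any character in tokens.
// Each result is a slice of input: nothing is copied or modified.
char[][] tokenise(char[] input, char[] tokens)
{
	char[][] res = null;
	int start = -1;	// index where the current token began, -1 if none

	foreach(int i, char c; input) {
		if (tokens.find(c) == -1) {
			// c is not a delimiter: open a token if none is open
			if (start == -1) start = i;
		}
		else {
			// c is a delimiter: close the open token, if any
			if (start != -1) {
				res ~= input[start..i];
				start = -1;
			}
		}
	}
	// a token that runs to the end of input is still open here
	if (start != -1) res ~= input[start..$];
	return res;
}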

void main()
{
	char[] input = ",ab.c,,..def,.,g,,h..i,,jkl,";

	writefln("%s", input);
	foreach(char[] s; tokenise(input,",."))
		writefln("%s", s);
}

Regan

Regan Heath wrote:
> From MSDN:
>
> char *strtok( char *strToken, const char *strDelimit );
> wchar_t *wcstok( wchar_t *strToken, const wchar_t *strDelimit );
> unsigned char *_mbstok( unsigned char*strToken, const unsigned char *strDelimit );
>
> All of these functions return a pointer to the next token found in strToken. They return NULL when no more tokens are found. Each call modifies strToken by substituting a NULL character for each delimiter that is encountered.
>
> The strtok function finds the next token in strToken. The set of characters in strDelimit specifies possible delimiters of the token to be found in strToken on the current call. wcstok and _mbstok are wide-character and multibyte-character versions of strtok. The arguments and return value of wcstok are wide-character strings; those of _mbstok are multibyte-character strings. These three functions behave identically otherwise.
>
> On the first call to strtok, the function skips leading delimiters and returns a pointer to the first token in strToken, terminating the token with a null character. More tokens can be broken out of the remainder of strToken by a series of calls to strtok. Each call to strtok modifies strToken by inserting a null character after the token returned by that call. To read the next token from strToken, call strtok with a NULL value for the strToken argument. The NULL strToken argument causes strtok to search for the next token in the modified strToken. The strDelimit argument can take any value from one call to the next so that the set of delimiters may vary.
>
> Warning: Each of these functions uses a static variable for parsing the string into tokens. If multiple or simultaneous calls are made to the same function, a high potential for data corruption and inaccurate results exists. Therefore, do not attempt to call the same function simultaneously for different strings and be aware of calling one of these functions from within a loop where another routine may be called that uses the same function. However, calling this function simultaneously from multiple threads does not have undesirable effects.
>

Thanks for that.

>
> D can do much better than C, using slices you can tokenize a string without modification and return all the results in an array.
>
> import std.stdio;
> import std.string;
>
> char[][] tokenise(char[] input, char[] tokens)
> {
> 	char[][] res = null;
> 	int start = -1;
> 	foreach(int i, char c; input) {
> 		if (tokens.find(c) == -1) {
> 			if (start == -1) start = i;
> 		}
> 		else {
> 			if (start != -1) {
> 				res ~= input[start..i];
> 				start = -1;
> 			}
> 		}
> 	}
> 	if (start != -1) res ~= input[start..$];
> 	return res;
> }
>
> void main()
> {
> 	char[] input = ",ab.c,,..def,.,g,,h..i,,jkl,";
> 	writefln(input);
> 	foreach(char[] s; tokenise(input,",."))
> 		writefln(s);
> }
>
> Regan

And especially thanks for that!

-- 
Carlos Santander Bernal

JP2, you'll always live in our minds
