Thread overview
Checking, whether string contains only ascii.
berni, Feb 22, 2017
H. S. Teoh, Feb 22, 2017
H. S. Teoh, Feb 22, 2017
jklm, Feb 22, 2017
jklm, Feb 22, 2017
Adam D. Ruppe, Feb 22, 2017
aberba, Feb 22, 2017
ag0aep6g, Feb 22, 2017
Ali Çehreli, Feb 22, 2017
kinke, Feb 22, 2017
H. S. Teoh, Feb 22, 2017
berni, Feb 23, 2017
HeiHon, Feb 23, 2017
berni, Feb 23, 2017
February 22, 2017
In my program, I read a PostScript file. Normal PostScript files should consist only of ASCII characters, but one never knows what users give us. Therefore I'd like to make sure that the string the program reads is made up only of ASCII characters. This simplifies the code that follows, because I can then assume that code unit == code point. Is there a simple way to do so?

Here is a sketch of my function:

>void foo(string postscript)
>{
>    // throw Exception, if postscript is not all ascii
>    // other stuff, assuming codeunit=codepoint
>}
February 22, 2017
On Wed, Feb 22, 2017 at 07:26:15PM +0000, berni via Digitalmars-d-learn wrote:
> In my program, I read a postscript file. Normal postscript files should only be composed of ascii characters, but one never knows what users give us.  Therefore I'd like to make sure that the string the program read is only made up of ascii characters. This simplifies the code thereafter, because I then can assume, that codeunit==codepoint. Is there a simple way to do so?
[...]

Hmm... What about:

	import std.range.primitives;

	bool isAsciiOnly(R)(R input)
		if (isInputRange!R && is(ElementType!R : dchar))
	{
		import std.algorithm.iteration : fold;
		return input.fold!((a, b) => a && b < 0x80)(true);
	}

	unittest
	{
		assert(isAsciiOnly("abcdefg"));
		assert(!isAsciiOnly("abcбвг"));
	}

Basically, it iterates over the string / range of characters and checks that every character is less than 0x80, since anything that's 0x80 or greater cannot be ASCII.


T

-- 
INTEL = Only half of "intelligence".
February 22, 2017
On Wed, Feb 22, 2017 at 11:43:00AM -0800, H. S. Teoh via Digitalmars-d-learn wrote: [...]
> 	import std.range.primitives;
> 
> 	bool isAsciiOnly(R)(R input)
> 		if (isInputRange!R && is(ElementType!R : dchar))
> 	{
> 		import std.algorithm.iteration : fold;
> 		return input.fold!((a, b) => a && b < 0x80)(true);
> 	}
> 
> 	unittest
> 	{
> 		assert(isAsciiOnly("abcdefg"));
> 		assert(!isAsciiOnly("abcбвг"));
> 	}
[...]

Ah, missing the Exception part:

	void foo(string input)
	{
		if (!input.isAsciiOnly)
			throw new Exception("...");
	}


T

-- 
Why are you blatanly misspelling "blatant"? -- Branden Robinson
February 22, 2017
On Wednesday, 22 February 2017 at 19:26:15 UTC, berni wrote:
> In my program, I read a postscript file. Normal postscript files should only be composed of ascii characters, but one never knows what users give us. Therefore I'd like to make sure that the string the program read is only made up of ascii characters. This simplifies the code thereafter, because I then can assume, that codeunit==codepoint. Is there a simple way to do so?
>
> Here a sketch of my function:
>
>>void foo(string postscript)
>>{
>>    // throw Exception, if postscript is not all ascii
>>    // other stuff, assuming codeunit=codepoint
>>}


void foo(string postscript)
{
    import std.ascii, std.algorithm.iteration;
    if (!args[0].filter!(a => !isASCII(a)).empty)
        throw new Exception("bla");
}
February 22, 2017
On Wednesday, 22 February 2017 at 19:57:22 UTC, jklm wrote:
> On Wednesday, 22 February 2017 at 19:26:15 UTC, berni wrote:
>> In my program, I read a postscript file. Normal postscript files should only be composed of ascii characters, but one never knows what users give us. Therefore I'd like to make sure that the string the program read is only made up of ascii characters. This simplifies the code thereafter, because I then can assume, that codeunit==codepoint. Is there a simple way to do so?
>>
>> Here a sketch of my function:
>>
>>>void foo(string postscript)
>>>{
>>>    // throw Exception, if postscript is not all ascii
>>>    // other stuff, assuming codeunit=codepoint
>>>}
>
>
> void foo(string postscript)
> {
>     import std.ascii, std.algorithm.iteration;
>     if (!postscript.filter!(a => !isASCII(a)).empty)
>         throw new Exception("bla");
> }

s/args[0]/postscript/ (the condition should test postscript, not args[0])
February 22, 2017
On Wednesday, 22 February 2017 at 19:26:15 UTC, berni wrote:
> Therefore I'd like to make sure that the string the program read is only made up of ascii characters.

Easiest:

foreach(char ch; postscript)
  if(ch > 127) throw new Exception("non-ascii detected");
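Checking individual chars (UTF-8 code units) is enough because of how UTF-8 is built: ASCII characters are single bytes below 0x80, while every byte of a multi-byte sequence (lead byte and continuation bytes) is 0x80 or above. So any non-ASCII character produces at least one code unit above 127, and no decoding is needed. A small demonstration, where the helper name hasOnlyAsciiUnits is made up for this sketch:

```d
import std.string : representation;

// Hypothetical helper wrapping the loop from the post: foreach over
// `char` visits the raw UTF-8 code units and never decodes.
bool hasOnlyAsciiUnits(string s)
{
    foreach (char ch; s)
        if (ch > 127)
            return false;
    return true;
}

void main()
{
    // 'ö' (U+00F6) takes two UTF-8 code units, and both are >= 0x80.
    immutable(ubyte)[] units = "\u00F6".representation;
    assert(units.length == 2);
    assert(units[0] == 0xC3 && units[1] == 0xB6);

    assert(hasOnlyAsciiUnits("hello"));
    assert(!hasOnlyAsciiUnits("hellö"));
}
```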
February 22, 2017
On Wednesday, 22 February 2017 at 19:26:15 UTC, berni wrote:
> In my program, I read a postscript file. Normal postscript files should only be composed of ascii characters, but one never knows what users give us. Therefore I'd like to make sure that the string the program read is only made up of ascii characters. This simplifies the code thereafter, because I then can assume, that codeunit==codepoint. Is there a simple way to do so?
>
> Here a sketch of my function:
>
>>void foo(string postscript)
>>{
>>    // throw Exception, if postscript is not all ascii
>>    // other stuff, assuming codeunit=codepoint
>>}

Making full use of the standard library:

----
import std.algorithm: all;
import std.ascii: isASCII;
import std.exception: enforce;

enforce(postscript.all!isASCII);
----

That checks on the code point level (because strings are ranges of dchars). If you want to be clever, you can avoid decoding and check on the code unit level:

----
/* other imports as above */
import std.utf: byCodeUnit;

enforce(postscript.byCodeUnit.all!isASCII);
----

Or you can do it manually, avoiding all those imports:

----
foreach (char c; postscript) if (c > 0x7F) throw new Exception("not ASCII");
----
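The code-point versus code-unit distinction also matters when the input is not valid UTF-8 at all. Assuming the data was read as raw bytes and cast to string (an assumption for this sketch), the decoding version throws a std.utf.UTFException from auto-decoding before isASCII ever sees the bad byte, while the byCodeUnit version simply reports false:

```d
import std.algorithm.searching : all;
import std.ascii : isASCII;
import std.exception : assertThrown;
import std.utf : UTFException, byCodeUnit;

void main()
{
    // 'a', 'b', then a lone 0xFF byte, which is never valid in UTF-8.
    immutable ubyte[] raw = [0x61, 0x62, 0xFF];
    string bad = cast(string) raw;

    // Code-point level: auto-decoding hits the invalid byte and throws.
    assertThrown!UTFException(bad.all!isASCII);

    // Code-unit level: no decoding, 0xFF is simply not ASCII.
    assert(!bad.byCodeUnit.all!isASCII);
}
```

Which behaviour is preferable depends on whether malformed input should be reported as "not UTF-8" or just as "not ASCII".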
February 22, 2017
On 02/22/2017 12:02 PM, ag0aep6g wrote:
> On Wednesday, 22 February 2017 at 19:26:15 UTC, berni wrote:
>> In my program, I read a postscript file. Normal postscript files
>> should only be composed of ascii characters, but one never knows what
>> users give us. Therefore I'd like to make sure that the string the
>> program read is only made up of ascii characters. This simplifies the
>> code thereafter, because I then can assume, that codeunit==codepoint.
>> Is there a simple way to do so?
>>
>> Here a sketch of my function:
>>
>>> void foo(string postscript)
>>> {
>>>    // throw Exception, if postscript is not all ascii
>>>    // other stuff, assuming codeunit=codepoint
>>> }
>
> Making full use of the standard library:
>
> ----
> import std.algorithm: all;
> import std.ascii: isASCII;
> import std.exception: enforce;
>
> enforce(postscript.all!isASCII);
> ----
>
> That checks on the code point level (because strings are ranges of
> dchars). If you want to be clever, you can avoid decoding and check on
> the code unit level:
>
> ----
> /* other imports as above */
> import std.utf: byCodeUnit;
>
> enforce(postscript.byCodeUnit.all!isASCII);
> ----
>
> Or you can do it manually, avoiding all those imports:
>
> ----
> foreach (char c; postscript) if (c > 0x7F) throw new Exception("not ASCII");
> ----

One more:

bool isAscii(string s) {
    import std.string : representation;
    import std.algorithm : canFind;
    return !s.representation.canFind!(c => c >= 0x80);
}

unittest {
    assert(isAscii("hello world"));
    assert(!isAscii("hellö wörld"));
}

Ali

February 22, 2017
On Wednesday, 22 February 2017 at 20:07:34 UTC, Ali Çehreli wrote:
> One more:
>
> bool isAscii(string s) {
>     import std.string : representation;
>     import std.algorithm : canFind;
>     return !s.representation.canFind!(c => c >= 0x80);
> }
>
> unittest {
>     assert(isAscii("hello world"));
>     assert(!isAscii("hellö wörld"));
> }
>
> Ali

One more again, as I couldn't believe no one went for 'any' yet:

---
import std.algorithm;
return !s.any!"a > 127"; // code-point level
---
February 22, 2017
On Wed, Feb 22, 2017 at 09:16:24PM +0000, kinke via Digitalmars-d-learn wrote: [...]
> One more again, as I couldn't believe no one went for 'any' yet:
> 
> ---
> import std.algorithm;
> return !s.any!"a > 127"; // code-point level
> ---

You win 1 intarwebs for the shortest solution posted so far. ;-)

Though, according to the OP, an exception is wanted, so it should be more along the lines of:

	enforce(!s.any!"a > 127");
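Filled out with the imports it needs and wrapped in the foo from the original question, that could look like the following sketch. It additionally uses byCodeUnit to skip decoding (as suggested earlier in the thread) and passes a message to enforce; both tweaks are optional, and the message text is made up.

```d
import std.algorithm.searching : any;
import std.exception : assertThrown, enforce;
import std.utf : byCodeUnit;

void foo(string postscript)
{
    // Throws an Exception with this message if any code unit is above 127.
    enforce(!postscript.byCodeUnit.any!"a > 127", "non-ASCII input");
    // From here on, one char corresponds to one code point.
}

void main()
{
    foo("%!PS");
    assertThrown!Exception(foo("hellö"));
}
```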


T

-- 
A bend in the road is not the end of the road unless you fail to make the turn. -- Brian White