February 22, 2017 Checking, whether string contains only ascii.
In my program, I read a PostScript file. Normal PostScript files should only be composed of ASCII characters, but one never knows what users give us. Therefore I'd like to make sure that the string the program read is only made up of ASCII characters. This simplifies the code thereafter, because I can then assume that codeunit==codepoint. Is there a simple way to do so?
Here is a sketch of my function:
>void foo(string postscript)
>{
> // throw Exception, if postscript is not all ascii
> // other stuff, assuming codeunit=codepoint
>}

February 22, 2017 Re: Checking, whether string contains only ascii.
Posted in reply to berni

On Wed, Feb 22, 2017 at 07:26:15PM +0000, berni via Digitalmars-d-learn wrote:
> In my program, I read a PostScript file. Normal PostScript files should only be composed of ASCII characters, but one never knows what users give us. Therefore I'd like to make sure that the string the program read is only made up of ASCII characters. This simplifies the code thereafter, because I can then assume that codeunit==codepoint. Is there a simple way to do so?
[...]

Hmm... What about:

    import std.range.primitives;

    bool isAsciiOnly(R)(R input)
        if (isInputRange!R && is(ElementType!R : dchar))
    {
        import std.algorithm.iteration : fold;
        return input.fold!((a, b) => a && b < 0x80)(true);
    }

    unittest
    {
        assert(isAsciiOnly("abcdefg"));
        assert(!isAsciiOnly("abcбвг"));
    }

Basically, it iterates over the string / range of characters and checks that every character is less than 0x80, since anything that's 0x80 or greater cannot be ASCII.

T

--
INTEL = Only half of "intelligence".
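A side note on the fold version: fold walks the entire range even after a non-ASCII character has been seen, because the && only skips the comparison, not the iteration. A minimal sketch of the same check using all from std.algorithm.searching, which stops at the first element that fails the predicate. The name isAsciiOnlyEarlyExit is hypothetical and does not appear in the thread.

```d
import std.algorithm.searching : all;
import std.range.primitives : ElementType, isInputRange;

// Hypothetical name; same template constraint as isAsciiOnly above.
// all returns false at the first element that fails the predicate, so the
// rest of the range is never visited. For a string, the elements are
// decoded dchars (code points).
bool isAsciiOnlyEarlyExit(R)(R input)
    if (isInputRange!R && is(ElementType!R : dchar))
{
    return input.all!(c => c < 0x80);
}

unittest
{
    assert(isAsciiOnlyEarlyExit("abcdefg"));
    assert(!isAsciiOnlyEarlyExit("abcбвг"));
}
```

For short inputs the difference hardly matters; for a large file that fails early, it avoids scanning the remainder.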

February 22, 2017 Re: Checking, whether string contains only ascii.
On Wed, Feb 22, 2017 at 11:43:00AM -0800, H. S. Teoh via Digitalmars-d-learn wrote:
[...]
> import std.range.primitives;
>
> bool isAsciiOnly(R)(R input)
>     if (isInputRange!R && is(ElementType!R : dchar))
> {
>     import std.algorithm.iteration : fold;
>     return input.fold!((a, b) => a && b < 0x80)(true);
> }
>
> unittest
> {
>     assert(isAsciiOnly("abcdefg"));
>     assert(!isAsciiOnly("abcбвг"));
> }
[...]

Ah, missing the Exception part:

    void foo(string input)
    {
        if (!input.isAsciiOnly)
            throw new Exception("...");
    }

T

--
Why are you blatanly misspelling "blatant"? -- Branden Robinson

February 22, 2017 Re: Checking, whether string contains only ascii.
Posted in reply to berni

On Wednesday, 22 February 2017 at 19:26:15 UTC, berni wrote:
> In my program, I read a PostScript file. Normal PostScript files should only be composed of ASCII characters, but one never knows what users give us. Therefore I'd like to make sure that the string the program read is only made up of ASCII characters. This simplifies the code thereafter, because I can then assume that codeunit==codepoint. Is there a simple way to do so?
>
> Here is a sketch of my function:
>
>>void foo(string postscript)
>>{
>> // throw Exception, if postscript is not all ascii
>> // other stuff, assuming codeunit=codepoint
>>}
void foo(string postscript)
{
    import std.ascii, std.algorithm.iteration;
    if (!args[0].filter!(a => !isASCII(a)).empty)
        throw new Exception("bla");
}

February 22, 2017 Re: Checking, whether string contains only ascii.
Posted in reply to jklm

On Wednesday, 22 February 2017 at 19:57:22 UTC, jklm wrote:
> On Wednesday, 22 February 2017 at 19:26:15 UTC, berni wrote:
>> In my program, I read a PostScript file. Normal PostScript files should only be composed of ASCII characters, but one never knows what users give us. Therefore I'd like to make sure that the string the program read is only made up of ASCII characters. This simplifies the code thereafter, because I can then assume that codeunit==codepoint. Is there a simple way to do so?
>>
>> Here is a sketch of my function:
>>
>>>void foo(string postscript)
>>>{
>>> // throw Exception, if postscript is not all ascii
>>> // other stuff, assuming codeunit=codepoint
>>>}
>
>
> void foo(string postscript)
> {
>     import std.ascii, std.algorithm.iteration;
>     if (!postscript.filter!(a => !isASCII(a)).empty)
>         throw new Exception("bla");
> }
s/args[0]/postscript/ (that is, args[0] in my previous post should read postscript)

February 22, 2017 Re: Checking, whether string contains only ascii.
Posted in reply to berni

On Wednesday, 22 February 2017 at 19:26:15 UTC, berni wrote:
> Therefore I'd like to make sure that the string the program read is only made up of ASCII characters.

Easiest:

    foreach (char ch; postscript)
        if (ch > 127) throw new Exception("non-ascii detected");
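Why declaring the loop variable as char matters here: foreach over a string yields raw UTF-8 code units when the variable is typed char, and decoded code points when it is typed dchar. Every byte of a multi-byte UTF-8 sequence is 0x80 or above, so a code-unit check still catches every non-ASCII character without decoding anything. A small sketch; the helper names are hypothetical.

```d
// Hypothetical helpers: the declared type of the loop variable decides
// whether foreach walks code units (char) or decoded code points (dchar).
size_t countCodeUnits(string s)
{
    size_t n;
    foreach (char c; s)
        ++n;
    return n;
}

size_t countCodePoints(string s)
{
    size_t n;
    foreach (dchar c; s)
        ++n;
    return n;
}

unittest
{
    assert(countCodeUnits("hellö") == 6);  // 'ö' is encoded as 2 bytes
    assert(countCodePoints("hellö") == 5);
    assert(countCodeUnits("hello") == countCodePoints("hello"));
}
```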

February 22, 2017 Re: Checking, whether string contains only ascii.
Posted in reply to berni

On Wednesday, 22 February 2017 at 19:26:15 UTC, berni wrote:
> In my program, I read a PostScript file. Normal PostScript files should only be composed of ASCII characters, but one never knows what users give us. Therefore I'd like to make sure that the string the program read is only made up of ASCII characters. This simplifies the code thereafter, because I can then assume that codeunit==codepoint. Is there a simple way to do so?
>
> Here is a sketch of my function:
>
>>void foo(string postscript)
>>{
>> // throw Exception, if postscript is not all ascii
>> // other stuff, assuming codeunit=codepoint
>>}

Making full use of the standard library:

----
import std.algorithm: all;
import std.ascii: isASCII;
import std.exception: enforce;

enforce(postscript.all!isASCII);
----

That checks on the code point level (because strings are ranges of dchars). If you want to be clever, you can avoid decoding and check on the code unit level:

----
/* other imports as above */
import std.utf: byCodeUnit;

enforce(postscript.byCodeUnit.all!isASCII);
----

Or you can do it manually, avoiding all those imports:

----
foreach (char c; postscript)
    if (c > 0x7F) throw new Exception("not ASCII");
----
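To make the code point versus code unit distinction concrete, here is a small sketch; all three function names are hypothetical. For valid UTF-8 both checks give the same answer, and once a string passes, its length in code units (length) equals its length in code points (walkLength), which is the codeunit==codepoint property the original question relies on.

```d
import std.algorithm.searching : all;
import std.ascii : isASCII;
import std.range.primitives : walkLength;
import std.utf : byCodeUnit;

// Hypothetical helper: code-point level. A string is iterated as a range
// of dchar here, so each UTF-8 sequence is decoded first.
bool asciiByCodePoint(string s)
{
    return s.all!isASCII;
}

// Hypothetical helper: code-unit level. byCodeUnit yields the raw chars,
// so nothing is decoded.
bool asciiByCodeUnit(string s)
{
    return s.byCodeUnit.all!isASCII;
}

// length counts UTF-8 code units; walkLength on a string counts decoded
// code points. They agree exactly when every character is one code unit.
bool unitsEqualPoints(string s)
{
    return s.length == s.walkLength;
}

unittest
{
    assert(asciiByCodePoint("hello") && asciiByCodeUnit("hello"));
    assert(!asciiByCodePoint("hellö") && !asciiByCodeUnit("hellö"));
    assert(unitsEqualPoints("hello"));
    assert(!unitsEqualPoints("hellö")); // 'ö' is 2 code units, 1 code point
}
```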

February 22, 2017 Re: Checking, whether string contains only ascii.
Posted in reply to ag0aep6g

On 02/22/2017 12:02 PM, ag0aep6g wrote:
> On Wednesday, 22 February 2017 at 19:26:15 UTC, berni wrote:
>> In my program, I read a PostScript file. Normal PostScript files
>> should only be composed of ASCII characters, but one never knows what
>> users give us. Therefore I'd like to make sure that the string the
>> program read is only made up of ASCII characters. This simplifies the
>> code thereafter, because I can then assume that codeunit==codepoint.
>> Is there a simple way to do so?
>>
>> Here is a sketch of my function:
>>
>>> void foo(string postscript)
>>> {
>>>     // throw Exception, if postscript is not all ascii
>>>     // other stuff, assuming codeunit=codepoint
>>> }
>
> Making full use of the standard library:
>
> ----
> import std.algorithm: all;
> import std.ascii: isASCII;
> import std.exception: enforce;
>
> enforce(postscript.all!isASCII);
> ----
>
> That checks on the code point level (because strings are ranges of
> dchars). If you want to be clever, you can avoid decoding and check on
> the code unit level:
>
> ----
> /* other imports as above */
> import std.utf: byCodeUnit;
>
> enforce(postscript.byCodeUnit.all!isASCII);
> ----
>
> Or you can do it manually, avoiding all those imports:
>
> ----
> foreach (char c; postscript)
>     if (c > 0x7F) throw new Exception("not ASCII");
> ----

One more:

    bool isAscii(string s) {
        import std.string : representation;
        import std.algorithm : canFind;
        return !s.representation.canFind!(c => c >= 0x80);
    }

    unittest {
        assert(isAscii("hello world"));
        assert(!isAscii("hellö wörld"));
    }

Ali
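Since the original question asks for an exception, a variation on the representation approach could also report where the problem is. This is only a sketch: enforceAscii is a hypothetical name, and it swaps canFind for countUntil (which returns -1 when nothing matches) so that the byte offset of the first non-ASCII code unit can go into the message.

```d
import std.algorithm.searching : countUntil;
import std.conv : text;
import std.exception : collectExceptionMsg, enforce; // collectExceptionMsg: unittest only
import std.string : representation;

// Hypothetical helper: throws an Exception naming the byte offset of the
// first code unit >= 0x80. representation views the string as
// immutable(ubyte)[], so nothing is decoded.
void enforceAscii(string s)
{
    immutable idx = s.representation.countUntil!(b => b >= 0x80);
    // countUntil returns -1 when no element satisfies the predicate.
    enforce(idx == -1, text("non-ASCII byte at offset ", idx));
}

unittest
{
    assert(collectExceptionMsg(enforceAscii("hello world")) is null);
    assert(collectExceptionMsg(enforceAscii("hellö wörld"))
        == "non-ASCII byte at offset 4");
}
```

The offset is in bytes, not characters, which is usually what you want when pointing a user at a bad spot in a file.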

February 22, 2017 Re: Checking, whether string contains only ascii.
Posted in reply to Ali Çehreli

On Wednesday, 22 February 2017 at 20:07:34 UTC, Ali Çehreli wrote:
> One more:
>
> bool isAscii(string s) {
> import std.string : representation;
> import std.algorithm : canFind;
> return !s.representation.canFind!(c => c >= 0x80);
> }
>
> unittest {
> assert(isAscii("hello world"));
> assert(!isAscii("hellö wörld"));
> }
>
> Ali
One more again, as I couldn't believe no one went for 'any' yet:
---
import std.algorithm;
return !s.any!"a > 127"; // code-point level
---

February 22, 2017 Re: Checking, whether string contains only ascii.
Posted in reply to kinke

On Wed, Feb 22, 2017 at 09:16:24PM +0000, kinke via Digitalmars-d-learn wrote:
[...]
> One more again, as I couldn't believe no one went for 'any' yet:
>
> ---
> import std.algorithm;
> return !s.any!"a > 127"; // code-point level
> ---

You win 1 intarwebs for the shortest solution posted so far. ;-)

Though, according to the OP, an exception is wanted, so it should be more along the lines of:

    enforce(!s.any!"a > 127");

T

--
A bend in the road is not the end of the road unless you fail to make the turn. -- Brian White
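Putting the pieces together for the original foo sketch, one possible shape follows. It is an assumption, not something posted in the thread, that enforce is given a custom message; the any and enforce calls themselves are the ones used above.

```d
import std.algorithm.searching : any;
import std.exception : collectExceptionMsg, enforce; // collectExceptionMsg: unittest only

void foo(string postscript)
{
    // any stops at the first code point above 127; enforce then throws an
    // Exception carrying the given message.
    enforce(!postscript.any!"a > 127", "input is not pure ASCII");

    // Other stuff: past this point every char is a whole code point,
    // so postscript[i] can be treated as the i-th character.
}

unittest
{
    assert(collectExceptionMsg(foo("%!PS-Adobe-3.0")) is null);
    assert(collectExceptionMsg(foo("%!PS caf\u00e9")) == "input is not pure ASCII");
}
```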
Copyright © 1999-2021 by the D Language Foundation