January 10, 2014 Re: Should this work? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Manu | On Friday, 10 January 2014 at 00:56:36 UTC, Manu wrote:
> So is it 'correct'?
Yes, with the caveat that it might match inside a combining sequence (like H followed by an accent code point). That's what byGrapheme is about: treating such sequences as a single unit. But meh, do you really care about that?
indexOf does correctly handle the UTF formats and returns an index suitable for slicing (or -1).
    import std.string : indexOf;
    auto idx = "cool".indexOf("o");
    if(idx == -1) throw new Exception("not found");
    auto before = "cool"[0 .. idx];
    auto after = "cool"[idx + 1 .. $];
Code like that will always yield valid UTF strings. Again, it *might* break up a sequence of combining code points, but it *will* correctly handle multi-byte code points... so probably good enough for 99% of use cases.
> Looks like bytes, but then it talks
It is bytes on string, and wchars on wstring; it is whatever unit is correct for slicing the type you pass it.
> The D docs are pretty terrible, they don't do much to help you find what you're looking for.
I mostly agree (and this is partially why I started writing http://dpldocs.info/ but I never finished that so it isn't much better). I don't notice it so much because I already know where to look for most things but regardless I agree it is a pain for anything new.
|
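To make the "index suitable for slicing" point concrete, here is a minimal sketch with a non-ASCII string. The helpers `textBefore` and `textAfter` are hypothetical names invented for this illustration and are not part of Phobos; only `std.string.indexOf` is a library call.

```d
import std.stdio : writeln;
import std.string : indexOf;

// Hypothetical helper: text before the first occurrence of `needle`,
// or null if `needle` does not occur.
string textBefore(string haystack, string needle)
{
    auto idx = haystack.indexOf(needle);
    if (idx == -1)
        return null;
    return haystack[0 .. idx];
}

// Hypothetical helper: text after the first occurrence of `needle`,
// or null if `needle` does not occur.
string textAfter(string haystack, string needle)
{
    auto idx = haystack.indexOf(needle);
    if (idx == -1)
        return null;
    // Skip the whole needle, measured in code units, so the slice stays valid UTF-8.
    return haystack[idx + needle.length .. $];
}

void main()
{
    // U+00E9 (e with acute accent) occupies two UTF-8 code units,
    // so the index counts bytes, not characters.
    string s = "caf\u00E9 au lait";
    writeln(s.indexOf("au"));     // 6: three ASCII bytes, two bytes for U+00E9, one space
    writeln(textBefore(s, "au"));
    writeln(textAfter(s, "au"));
}
```

Both slices start and end on code point boundaries, which is why the results are always valid UTF even though the index is expressed in code units.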
January 10, 2014 Re: Should this work? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Adam D. Ruppe | BTW, I'll say it again: it was a *lot* easier to get started with this back in the phobos1 days, where std.string WAS the one-stop location for string stuff. At the least, we should get the docs to point people in the right place, but I think we should also do more conceptual overview pages that talk about cross-module things. |
January 10, 2014 Re: Should this work? | ||||
---|---|---|---|---|
| ||||
Posted in reply to H. S. Teoh | On Friday, 10 January 2014 at 00:52:27 UTC, H. S. Teoh wrote:
>
> <snip>
>
> So to summarize:
> (1) use sig constraints to define the scope of an overload; and
> (2) use static if inside the function body (or template body) to enforce
> type requirements within that scope.
>
> This solves the problem of needing the compiler to somehow read your
> mind and figure out exactly which of the 56 overloads of find() you
> intended to match but failed to.
>
>
> T
Ok, you've convinced me. I still think highlighting which constraints failed should happen, but for well-implemented modules like those in the standard library your approach offers even more helpful and tighter error messages.
|
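The two-step summary quoted above can be sketched as follows. This is only an illustration under stated assumptions: `total` is a hypothetical function made up for the example, not a Phobos symbol, and the wording of the error message is invented.

```d
import std.range.primitives : ElementType, isInputRange;
import std.stdio : writeln;
import std.traits : isNumeric;

// Hypothetical function, not part of Phobos.
// (1) The signature constraint only defines the scope of this overload:
//     "anything that is an input range".
auto total(R)(R range)
    if (isInputRange!R)
{
    alias E = ElementType!R;

    // (2) Finer type requirements are enforced inside the body,
    //     where a readable message can be attached.
    static if (isNumeric!E)
    {
        E sum = 0;
        foreach (x; range)
            sum += x;
        return sum;
    }
    else
    {
        static assert(0, "total: cannot sum a range of " ~ E.stringof
            ~ "; the element type must be numeric");
    }
}

void main()
{
    writeln(total([1, 2, 3])); // 6
    // total(["a", "b"]);      // rejected at compile time with the message above
}
```

With this layout a failed call reports the custom message instead of a bare "does not match any template declaration".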
January 10, 2014 Re: Should this work? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Adam D. Ruppe | On Fri, Jan 10, 2014 at 01:18:01AM +0000, Adam D. Ruppe wrote:
> BTW, I'll say it again: it was a *lot* easier to get started with this back in the phobos1 days, where std.string WAS the one-stop location for string stuff.
I thought it still is? Except that a lot of it is now implicit via public import from std.array and std.algorithm and wherever else. (But I wouldn't know, though, I wasn't around in the D1 days.)
> At the least, we should get the docs to point people in the right place,
Yeah, I think all public imports should at least get a mention in the ddoc header so that people know what's *actually* getting imported, not just what the docs say is in the module.
> but I think we should also do more conceptual overview pages that talk about cross-module things.
+1. Currently Phobos has way too many modules under std, and unless you're already familiar with where things are, you wouldn't even know where to start looking when searching for new functionality.
T
--
He who is everywhere is nowhere.
|
January 10, 2014 Re: Should this work? | ||||
---|---|---|---|---|
| ||||
| On 10 January 2014 06:27, H. S. Teoh <hsteoh@quickfur.ath.cx> wrote:
> On Thu, Jan 09, 2014 at 06:25:33PM +0000, Brad Anderson wrote:
> > On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
> [...]
> > > On a side note, am I the only one that finds std.algorithm/std.range/etc for string processing really obtuse? I can rarely understand the error messages, so to say it's better than STL is optimistic.
> >
> > I absolutely hate the "does not match any template declaration" error. It's extremely unhelpful for figuring out what you need to do and anytime I try to do something fun with ranges I can expect to see it a dozen times.
>
> Yeah, that error drives me up the wall too. I often get screenfuls of errors, dumping 25 or so overloads of some obscure Phobos internal function (like toImpl) as though an end-user would understand any of it. You have to parse all the sig constraints (and boy some of them are obscure), *understand* what they mean (which requires understanding how Phobos works internally), and *then* try to figure out, by elimination, which is the one that you intended to match, and why your code failed to match it.
>
> I'm almost tempted to say that using sig constraints to differentiate between template overloads is a bad idea. Instead, consider this alternative implementation of toImpl:
>
>     template toImpl(S,T)
>         // N.B.: no sig constraints here
>     {
>         static if (... /* sig constraint conditions for overload #1 */)
>         {
>             S toImpl(T t)
>             {
>                 // implementation here
>             }
>         }
>         else static if (... /* sig constraint conditions for overload #2 */)
>         {
>             S toImpl(T t)
>             {
>                 // implementation here
>             }
>         }
>         ...
>         else // N.B.: user-readable error message
>         {
>             static assert(0, "Unable to convert " ~
>                 T.stringof ~ " to " ~ S.stringof);
>         }
>     }
>
> By putting all overloads inside a single template, we can give a useful default message when no overloads match.
*THIS* .. I've always thought that, and intuitively written my D code that way. Funnily, I was always concerned I was being unidiomatic doing so, since the 'std' code is rarely written like that.
> Alternatively, maybe sig constraints can have an additional string parameter that specifies a message that explains why that particular overload was rejected. These messages are not displayed if at least one overload matches; only if no overload matches, they will be displayed (so that the user can at least see why each of the overloads didn't match).
>
> [...]
> > > I also find the names of the generic algorithms are often unrelated to the name of the string operation. My feeling is, everyone is always on about how cool D is at string, but other than 'char[]', and the builtin slice operator, I feel really unproductive whenever I do any heavy string manipulation in D.
>
> Really?? I find myself much more productive, because I only have to learn one set of generic algorithms, and I can use them not just for strings but for all sorts of other stuff that implement the range API.
That sounds good in theory, but if any time you try and actually use D's generic algorithms you end up with many of the kind of errors you refer to in your prior paragraph, then that basically undermines the whole experience. I don't like wasting my time, and I don't like pushing my way through learning something that I feel is obtuse to begin with, so I usually take a side path and work around it (most things can be done easily with a couple of nested foreach-es).
So, perhaps embarrassingly, despite my 3+ years spent hanging around here, part of the problem is that I barely know/use phobos. Call me lazy, but I don't think it's an unrealistic experience for any end-user. If it saves me time/headache (and bloat) not using it, why would I?
** Yes, it's the 'standard' library, and I like that concept in essence, and feel like I should make use of it on principle... but it's like, you need to already know phobos intimately to think it's awesome, which creates a weird barrier to entry. And the docs don't help a lot.
> Whereas in languages like C, sure you get familiar with string-specific functions, but then when you need a similar-operating function for an array of ints, you have to name it something else, and then basically the same algorithm reimplemented for linked lists, called by yet another name, etc.. Added together, it's many times more mental load than just learning a single set of generic algorithms that work on (almost) everything.
>
> The composability of generic algorithms also allow me to think on a more abstract level -- instead of thinking about manipulating individual chars, I can figure out OK, if I split the string by "," then I can filter for the strings I'm looking for, then join them back again with another delimiter. Since the same set of algorithms work with other ranges too, I can apply exactly the same thought process for working with arrays, linked lists, and other containers, without having to remember 5 different names of essentially the same algorithm but applied to 5 different types.
See, I get that idea about composability. Maybe it's just baggage from C, but I just don't think that way. Maybe that's a large part of why I always go wrong with phobos. I would never think of doing something fundamental like string processing with a sequence of generic algorithms. I'd freak out about the relatively unknown performance characteristics. Algorithms are usually a lot simpler when performed on strings of bytes than when they are performed on strings of objects with any imaginable copying mechanisms and allocation patterns. Unless I wrote something myself, I can never have faith that the sort of concessions required to make it generic also make it fast in the case it happens to be performed on a byte array.
There's an argument that you can specialise for string types, which is true within single functions, but if you're 'composing' a function with generic parts, then you can't specialise for strings anymore... There's no way to specialise a call to a.b.c() as a compound operation. Like I say, it's probably psychological baggage, but I tend to unconsciously dismiss/reject that sort of thing without a second thought... or maybe experience learned me my lesson (*cough* STL).
> > I actually feel a lot more productive in D than in C++ with strings. Boost's string algorithms library helps fill the gap (and at least you only have one place to look for documentation when you are using it) but overall I prefer my experience working in D with pseudo-member chains.
>
> I found that what I got out of taking the time to learn std.algorithm and std.range was worth far more than the effort invested.
Perhaps you're right. But I think there's ***HUGE*** room for improvement. The key in your sentence is, it shouldn't require 'effort'; if it's not intuitive to programmers with decades of experience, then there are probably some fundamental design (or documentation/accessibility) deficiencies that need to be prioritised. How is any junior programmer meant to take to D?
|
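The split / filter / join composition described in the quoted text can be sketched as below. `keepFields` and `evens` are hypothetical helpers written for this illustration; the library calls (`splitter`, `filter`, `startsWith`, `join`, `array`) are the usual Phobos ones, and nothing here says anything about their performance.

```d
import std.algorithm.iteration : filter, splitter;
import std.algorithm.searching : startsWith;
import std.array : array, join;
import std.stdio : writeln;

// Hypothetical helper: keep the comma-separated fields that start with
// `prefix` and re-join them with `delim`.
string keepFields(string csv, string prefix, string delim)
{
    return csv.splitter(",")                              // lazily split on commas
              .filter!(field => field.startsWith(prefix)) // keep matching fields
              .join(delim);                               // build the result string
}

// The same `filter` works unchanged on a non-string range.
int[] evens(int[] numbers)
{
    return numbers.filter!(n => n % 2 == 0).array;
}

void main()
{
    writeln(keepFields("apple,banana,avocado,cherry", "a", ";")); // apple;avocado
    writeln(evens([1, 2, 3, 4, 5, 6]));                           // [2, 4, 6]
}
```

The only eager step in `keepFields` is the final `join`, which allocates the resulting string; `splitter` and `filter` produce lazy ranges.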
January 10, 2014 Re: Should this work? | ||||
---|---|---|---|---|
| ||||
Posted in reply to H. S. Teoh | On Friday, 10 January 2014 at 01:26:50 UTC, H. S. Teoh wrote:
> I thought it still is?
Yeah, mostly, though sometimes the disambiguation leaks the other details (for example replace() sometimes has a name conflict, so you need to explicitly import it or use a full name to disambiguate).
But this is primarily a documentation problem rather than a code one.
Some code differences from the old days:
* before: converting to and from string was in std.string. Functions like toInt, toString, etc. Nowadays, this is all done with std.conv.to. The new way is way cool, but a newbie's first place to look might be for std.string.toString rather than std.conv.to!string.
* before: some char type stuff was in std.string (and the rest in std.ctype IIRC). Now, it is in std.ascii and std.uni.
* before: the signatures were char[] foo(char[]). Nowadays, it is S foo(S)(S s) if(isSomeString!S)... so much wordier! Better functionality, but omg it can be a pain to read and surely intimidating for newbs.
I think things are generally improved as for functionality and consistency, but the docs are more debatable.
|
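A small sketch of the "nowadays" side of each bullet above: conversions through `std.conv.to`, character classification split between `std.ascii` and `std.uni`, and a function in the `S foo(S)(S s) if(isSomeString!S)` style. `firstWord` is a hypothetical helper invented for the example, not a Phobos function.

```d
import std.ascii : isDigit;
import std.conv : to;
import std.string : indexOf;
import std.traits : isSomeString;
import std.uni : isAlpha;

// Hypothetical helper in the modern signature style:
// one template covers string, wstring and dstring.
S firstWord(S)(S s)
    if (isSomeString!S)
{
    auto idx = s.indexOf(' ');
    return idx == -1 ? s : s[0 .. idx];
}

void main()
{
    // Conversions go through std.conv.to rather than std.string.
    int n = to!int("42");
    string t = to!string(n + 1);
    assert(n == 42 && t == "43");

    // Character classification: ASCII-only versus Unicode-aware.
    assert(isDigit('7'));      // std.ascii
    assert(isAlpha('\u00E9')); // std.uni: U+00E9 is a letter

    // The same template instantiates for different string types.
    assert(firstWord("hello world") == "hello");
    assert(firstWord("hello world"w) == "hello"w);
}
```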
January 10, 2014 Re: Should this work? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Craig Dillabaugh
| On 10 January 2014 06:53, Craig Dillabaugh <cdillaba@cg.scs.carleton.ca> wrote:
> On Thursday, 9 January 2014 at 19:05:19 UTC, Adam D. Ruppe wrote:
>> On Thursday, 9 January 2014 at 18:57:26 UTC, Craig Dillabaugh wrote:
>>> A while ago I was trying to do something with splitter on a string and I ended up asking a question on D.learn. [...]
>>>
>>> It would be nice if std.string in D provided a nice, easy, string manipulation that swept most of the difficulties under the table
>>
>> http://dlang.org/phobos/std_array.html#split
>>
>> Note that std.array is publicly imported from std.string so this works:
>>
>>     void main() {
>>         import std.string;
>>         auto parts = "hello".split("l");
>>
>>         import std.stdio;
>>         writeln(parts);
>>     }
>>
>>> provided links in the documentation to the functions they wrap for when people want to do more complex things.
>>
>> Actually, when writing my D book, I decided to spend more time on the unicode stuff in strings than these basic operations, since I thought these were pretty straightforward.
>>
>> But maybe the docs suck more than I thought. I learned most of D string stuff from Phobos1 which kept it all simple...
>
> That's the thing. In most cases the correct way to do something in D does end up being rather nice. However, it's often a bit of a challenge finding that correct way!
>
> When I had my troubles I expected to find the library solutions in std.string (remember I rarely use D's string processing utilities). It never really occurred to me that I might want to check std.array for the function I wanted. So what if std.array is imported when I import std.string, as a programmer I still had no idea 'split()' was there!
>
> At the very least the documentation for std.string should say something along the lines of:
>
> "The libraries std.uni and std.array also include a number of functions that operate on strings, so if what you are looking for isn't here, try looking there."
Or just alias the functions useful for string processing...
|
January 10, 2014 Re: Should this work? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Manu | On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
> [snip]
Using std.algorithm or std.range requires learning about ranges. You shouldn't be surprised that string handling with ranges works differently from specialized string handling functions, which is the norm in most languages. For anyone with even a cursory knowledge of ranges and range algorithms, it's no surprise when the result of a range composition is not of string type even when the input is a string.
If you don't want to learn about ranges, use std.string. If std.string is not sufficient, then you should consider learning about ranges, which means accepting that yes, things will be different. Learning about ranges and how to use them for string manipulation is not the easiest thing right now due to a dearth of learning material, but that's not a problem with ranges. Compiler error messages are indeed part of the problem, but they are a WIP. 2.065 contains an incremental improvement to error messages on failure of overload resolution (Thanks Kenji).
About Unicode, the unit that the language promotes and the standard library embraces is `dchar`, the Unicode code point. The choice of not using graphemes is a compromise between correctness and performance. That means that the onus is still on the user to cover the last mile of correctness, so the user is not exempt from having to learn at least the basics of Unicode in order to write Unicode-correct code in D. However, this is a surprisingly reasonable compromise: as long as all inputs are normalized to the same format (which may require std.uni.normalize if the source of the input does not guarantee a particular format), then outside of contrived examples it's very hard to break grapheme clusters by using range-based code, even though they are ranges of code points. Explicit handling of graphemes is typically only needed for very specific domains, like if you're writing a text rendering library or a text input box etc. Thus typical range-based string manipulation tends to be correct even for multi-code-point graphemes, without the author having to consciously handle it.
2.065 has std.uni.byGrapheme/byCodePoint for range-based grapheme manipulation. However, there is a performance cost involved so I recommend against using it dogmatically. The result of `byGrapheme` is not bidirectional yet - someone needs to take the time to implement `decodeGraphemeBack` and/or `graphemeStrideBack` first.
|
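The difference between code units, code points and graphemes, and the effect of normalization, can be sketched as follows. The counting helpers are hypothetical names introduced for the illustration; `byGrapheme`, `normalize` and `walkLength` are the Phobos functions mentioned or implied above.

```d
import std.range : walkLength;
import std.uni : NFC, byGrapheme, normalize;

// Hypothetical helpers, named for this example only.
size_t countCodePoints(string s) { return s.walkLength; }            // decodes to dchar
size_t countGraphemes(string s)  { return s.byGrapheme.walkLength; } // grapheme clusters

void main()
{
    // "e" followed by U+0301 COMBINING ACUTE ACCENT: one visible character.
    string decomposed = "e\u0301";

    assert(decomposed.length == 3);              // UTF-8 code units
    assert(countCodePoints(decomposed) == 2);    // code points
    assert(countGraphemes(decomposed) == 1);     // graphemes

    // Normalizing to NFC composes the pair into the single code point U+00E9,
    // after which code-point-based algorithms can no longer split it.
    string composed = normalize!NFC(decomposed);
    assert(composed == "\u00E9");
    assert(countCodePoints(composed) == 1);
}
```

This is the sense in which normalizing inputs up front lets ordinary code-point ranges behave correctly for most text, with `byGrapheme` kept for the cases that need it.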
January 10, 2014 Re: Should this work? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Adam D. Ruppe | On Fri, Jan 10, 2014 at 01:34:46AM +0000, Adam D. Ruppe wrote:
[...]
> Some code differences from the old days:
>
> * before: converting to and from string was in std.string. Functions like toInt, toString, etc. Nowadays, this is all done with std.conv.to. The new way is way cool, but a newbie's first place to look might be for std.string.toString rather than std.conv.to!string.
Right, so it should be mentioned in std.string. But probably your idea of more concept-oriented overview pages is better. It doesn't seem like the right solution to just insert hyperlinks to std.conv in every other Phobos module.
> * before: some char type stuff was in std.string (and the rest in std.ctype IIRC). Now, it is in std.ascii and std.uni.
Yeah, this is one of the things I found annoying. Sure I understand why std.ascii needs to be different from std.uni, but then you have stuff split across std.string, std.ascii, std.uni, and std.utf -- what's the diff between std.utf and std.uni?! (Yes I know what the diff is, the point is that it looks silly to a newcomer.)
> * before: the signatures were char[] foo(char[]). Nowadays, it is S foo(S)(S s) if(isSomeString!S)... so much wordier! Better functionality, but omg it can be a pain to read and surely intimidating for newbs.
Sig constraints seriously need to be formatted differently from the way they are right now, which is an unreadable blob of obtuse text. Take std.algorithm.makeIndex, for example. How do you even *read* that mess??! It's 6 lines of dense, *bolded* text (on my browser anyway, YMMV), and it's not even clear that it's actually two overloads. I have trouble telling what exactly it returns, and where its parameter lists start and end. Nor what the sig constraints actually mean.
Actually, this particular case seems to be a prime example of the sig constraint vs. static if idea I had in another post (i.e., sig constraints should only define the scope of the overload, and type requirements on arguments within that scope should be inside static ifs in the body of the function / template). From what I can see, makeIndex really should be in a *single* template, probably with no sig constraints (or only very simple ones), and everything else should be inside the template body as static if blocks. Whatever is unclear from the outer sig constraints should be explained in the text of the ddoc. Users shouldn't be expected to be able to parse sig constraints that are really Phobos internal implementation details.
> I think things are generally improved as for functionality and consistency, but the docs are more debatable.
I agree, functionality is more unified and consistent, but the docs are very newbie-unfriendly.
T
--
Why can't you just be a nonconformist like everyone else? -- YHL
|
January 10, 2014 Re: Should this work? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Manu | On Friday, 10 January 2014 at 00:56:36 UTC, Manu wrote:
> The D docs are pretty terrible, they don't do much to help you find what you're looking for.
> You have a massive block of function names at the top of the page, you have to carefully scan through one by one, hoping that it's named something obvious that will stand out to you, and in the event it doesn't have a helper function, you need to work out the proper sequence of algorithm/range/whatever operations to do what you want (and then repeat the process finding the small parts you need across a bunch of modules).
DDox improves on this a bit by giving a table with brief descriptions right up top:
http://vibed.org/temp/dlang.org/library/std/string.html
Still plenty left to do though.
>
> Blah! </endrant>
|