On Monday, 8 November 2021 at 08:11:12 UTC, FeepingCreature wrote:
>On Sunday, 7 November 2021 at 04:18:25 UTC, Walter Bright wrote:
>It's much better than 0.0. 0.0 is indistinguishable from valid data, and is a very common valid value.
>NaN and ReplacementChar are not valid and are easily distinguished.
No, that's exactly the problem. ReplacementChar is not easily distinguished, because it's a valid Unicode character - that's the whole point of it. So just like NaN, it can propagate arbitrarily far through your processing pipeline before some downstream process decides that it actually doesn't like it.
Sorry, let me expand on this because I think it's the very core of the disagreement.
I feel you have two options with NaN/ReplacementChar. You can either just accept that this is what you get, and let it propagate throughout your entire pipeline. In that case it's no better than 0.0 - actually, NaN would be worse, because your process would be completely broken with no way to fix it, whereas at least with 0.0 you can maybe get some reasonably usable data out.
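To make the first option concrete, here is a rough sketch, in Python rather than D just so it's easy to run anywhere. The `mean` helper and the readings are made up for illustration; the only point is what one bad value does to the final result.

```python
import math

def mean(values):
    # Hypothetical pipeline stage: a plain arithmetic mean.
    return sum(values) / len(values)

readings = [2.0, 4.0, 6.0]              # made-up valid inputs

with_nan = readings + [float("nan")]    # bad value represented as NaN
with_zero = readings + [0.0]            # bad value represented as 0.0

nan_result = mean(with_nan)     # NaN poisons the whole aggregate
zero_result = mean(with_zero)   # skewed (3.0 instead of 4.0), but still a number

print(math.isnan(nan_result), zero_result)
```

Neither result tells you where the bad value came from; the NaN one just makes the whole output unusable on top of that.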
Or you can say that "we don't want to be generating NaN/ReplacementChar." Then where do you draw the line? At the process input/output boundary? But then the process needs to be fixed if it generates NaNs/U+FFFDs. So you want to move your signaling as close to the production site as possible. Preferably, you want to fail at the exact line where the problematic data was produced. So we're back at exceptions in foreach. (Actually, an exception in cast(string) would be the best.)
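The same contrast, again sketched in Python as a stand-in for the D decoding behaviour under discussion. Python's `errors="replace"` plays the role of silently producing ReplacementChar, and the default strict mode plays the role of throwing at the decode site. The input bytes are made up.

```python
raw = b"a\xffb"   # made-up input; 0xFF can never appear in valid UTF-8

# Option 1: substitute U+FFFD and carry on.
replaced = raw.decode("utf-8", errors="replace")
# U+FFFD is an ordinary code point, so every later stage accepts it,
# including re-encoding to well-formed UTF-8.
round_tripped = replaced.encode("utf-8")

# Option 2: fail where the bad data is first decoded.
try:
    raw.decode("utf-8")          # strict is the default
    failed_at_source = False
except UnicodeDecodeError as e:
    failed_at_source = True
    bad_offset = e.start         # byte offset of the offending input
```

With the first option the output is perfectly valid UTF-8 that happens to contain U+FFFD, and nothing downstream has any reason to complain. With the second you get the error, and the offset, at the line that did the decoding.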
And that's why I think ReplacementChar/NaN are no better than 0.0. You either embrace them fully as "valid" data, or you handle them at the site of origin; any compromise just makes you worse off than either extreme.