Is this a bug or a VERY sneaky case? (page 2)

On Thursday, 30 December 2021 at 08:26:17 UTC, rempas wrote:

[...]

Use D's data types, not your own (if you want others to read or work on your code too, it's better to use the common names for things), but of course staying consistent across your code is more important.

I suppose you mean "str", right? In this case, I would love to use D's "string" if it weren't immutable by default. It pisses me off A LOT when a language tries to "protect" me from myself. There are a lot of other things I would like "string" to have, but I wouldn't mind them so much if string were mutable by default (or, even better, if string literals were "char*" like in C and could be cast automatically so I could use char[] without needing ".dup"). Another thing is that I'm making a library, so people will probably read the definition of "str" out of interest anyway and learn it if they want to use the library. And even for people who only want to contribute to a specific place in the code and not use the library (why would you do that anyway?), "str" is used in the code similarly to how "string" is used (for example, both have a ".ptr" property to get the actual pointer), so I don't really think there is a problem with that.
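To illustrate the behaviour being discussed: in D, `string` is an alias for `immutable(char)[]`, so getting a mutable `char[]` from a literal means making a copy with `.dup`, while both types expose `.ptr` and `.length`. This is a minimal sketch using only built-in types; the function name `capitalizedCopy` is made up for the example.

```d
// string is an alias for immutable(char)[], so its characters cannot be modified.
static assert(is(string == immutable(char)[]));

// Hypothetical helper: returns a mutable copy of `s` with its first
// character upper-cased (ASCII only).
char[] capitalizedCopy(string s)
{
    char[] m = s.dup; // .dup allocates a mutable copy on the GC heap
    if (m.length > 0 && m[0] >= 'a' && m[0] <= 'z')
        m[0] = cast(char)(m[0] - ('a' - 'A'));
    return m;
}

unittest
{
    string lit = "hello";
    char[] copy = capitalizedCopy(lit);
    assert(copy == "Hello");
    assert(lit == "hello");      // the literal itself is untouched
    assert(copy.ptr != lit.ptr); // .dup returned a different buffer
    assert(copy.length == lit.length);
}
```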

No, actually I meant u8, u16, etc. If you stay consistent it's fine, but most of the D ecosystem just uses the native D types (ubyte, ushort, etc.), which have guaranteed bit widths as well. Once you work with other code, it may have its own custom definitions too, and then there are three or more different aliases for the same thing.


Use is(T == ubyte) etc. instead of your custom is_same!(val, ubyte) (same reason as above: people need to read the definition of is_same first).

This is a simple definition actually, so why is it such a big deal? Also, we shouldn't use "is(T == ubyte)" but "is(typeof(val) == ubyte)", just like I'm doing. This is because in variadic functions each argument can have a different type, so checking "T" will not work (I made this mistake and people told me, so that's how I know). So why type this much when we can automate it with a simple definition? There is also one for checking whether a type is a number (integer), a floating point type, a string (including my "str"), etc. I don't find these hard to learn and memorize, so I see no reason not to make my (our) life easier and just use "macros". This is the main reason I use D and not Vox in the first place (that, and the fact that in Vox you cannot fully work with variadic functions yet).
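A minimal sketch of the variadic case, assuming a template of the form `f(Args...)(Args args)`: `Args` is a whole sequence of types, so the per-argument check is written either on `typeof(arg)` inside a `foreach` over the arguments, or on each element of `Args`. The names `countUbytes` and `countUbyteTypes` are made up for illustration.

```d
// Counts how many of the arguments are exactly of type ubyte.
size_t countUbytes(Args...)(Args args)
{
    size_t n = 0;
    // foreach over a sequence is unrolled at compile time, so `arg`
    // has a different static type in each iteration.
    foreach (arg; args)
    {
        static if (is(typeof(arg) == ubyte))
            ++n;
    }
    return n;
}

// The same check written against the type sequence instead of the values.
size_t countUbyteTypes(Args...)()
{
    size_t n = 0;
    foreach (T; Args)
    {
        static if (is(T == ubyte))
            ++n;
    }
    return n;
}

unittest
{
    ubyte a = 1;
    int b = 2;
    ubyte c = 3;
    assert(countUbytes(a, b, c) == 2);
    assert(countUbyteTypes!(ubyte, int, ubyte)() == 2);
}
```

Note that an integer literal such as `1` is an `int`, so it would not be counted unless it is first stored in a `ubyte`.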

If you stay consistent it's fine, but once you work with other code that also has a template like this, you may end up with more than one definition doing the same simple thing.

Also, the name is_same is a little confusing because there is also __traits(isSame, a, b), which returns true if both arguments are the same symbol. It does not do the typeof(a) == b check you are doing (which, by the way, is not what I would think of when I read "is_same").
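To make the difference concrete: `__traits(isSame, a, b)` asks whether two arguments are the same symbol, while the `is_same` check compares a value's type against a type. A small sketch with two made-up module-level variables and a hypothetical helper `hasType` written in the spirit of `is_same`:

```d
int x;
int y;

// Same type, but two distinct symbols.
static assert(is(typeof(x) == typeof(y)));
static assert(!__traits(isSame, x, y));
static assert(__traits(isSame, x, x));

// Hypothetical helper: true if val's type is exactly T.
enum bool hasType(alias val, T) = is(typeof(val) == T);

static assert(hasType!(x, int));
static assert(!hasType!(x, long));
```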


Work with slices, not with pointers (if you plan to use your code from D, it's much cleaner and avoids bugs! It does not need a trailing null terminator and works with @safe code).

Slices are objects though, and this means paying a runtime cost compared to just using a variable. Also, the only places I used pointers are C-style strings (char* or u8* in my custom "str") and one member (_count) of my custom "str" struct. All of these cases were checked very carefully and they are very specific. People who use the library should rarely need to use pointers; this is what I'm aiming for with my library and why I don't use "libc". However! When it comes to the library itself, I want to go as low-level as possible and have a library that is as performant as possible.

No, slices are basically struct T[] { T* ptr; size_t length; }: a pointer plus a length in a 16-byte struct (on 64-bit, possibly returned via two registers), and they do not introduce any indirection. I think they're the best way to handle pointers to more (or fewer) than one element anywhere in D.
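A small sketch to check this with built-in arrays: a slice occupies two machine words, and slicing an existing array only produces a new pointer/length pair into the same memory, without copying elements. (For the record, the D ABI stores the length before the pointer, which doesn't change the argument.) The function `middle` is a made-up example.

```d
// A slice is a (length, pointer) pair: two machine words, no hidden indirection.
alias IntSlice = int[];
static assert(IntSlice.sizeof == 2 * size_t.sizeof);

// Returns the middle part of `a` without copying: only ptr and length change.
inout(int)[] middle(inout(int)[] a)
{
    return a.length >= 2 ? a[1 .. $ - 1] : a[0 .. 0];
}

unittest
{
    int[] a = [1, 2, 3, 4];
    int[] b = middle(a);
    assert(b == [2, 3]);
    assert(b.ptr == a.ptr + 1); // points into the same memory
    b[0] = 20;
    assert(a[1] == 20);         // so writes are visible through `a`
}
```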

Slices do bounds checking, which adds more safety to your program. You can disable it globally (except in @safe code), but I would recommend not doing so. It can be a performance issue for algorithms on a very hot code path, like in big loops. In those cases I would recommend something like assert(maxValue < slice.length); before your loop, and, to skip the individual bounds checks inside the loop, using slice.ptr[i] instead of slice[i].
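A minimal sketch of that pattern with a hypothetical function `sumFirst`. It uses `count <= values.length` for an exclusive loop bound, which is the same check as `maxValue < slice.length` for the largest index. Keep in mind that `assert` is compiled out with `-release`, so this up-front check only protects non-release builds.

```d
// Sums the first `count` elements of `values`.
// The bound is checked once up front; inside the loop `.ptr[i]` indexes the raw
// pointer, which skips the per-element bounds check (and is therefore @system).
long sumFirst(const(int)[] values, size_t count) @system
{
    assert(count <= values.length, "count exceeds slice length");
    long total = 0;
    foreach (i; 0 .. count)
        total += values.ptr[i];
    return total;
}

unittest
{
    int[] data = [1, 2, 3, 4, 5];
    assert(sumFirst(data, 3) == 6);
    assert(sumFirst(data, 0) == 0);
}
```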

Usually it is not necessary to do this unless you are working on some low-level algorithms like sorting, Unicode processing, parsing, etc. Additionally, having the length with your array can often improve performance: you can use a simple counted loop instead of checking every item to see whether it is the null terminator, and x86 processors can greatly speed up and parallelize such loops!
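To make the cost difference visible: `strlen` has to walk the bytes until it finds the `'\0'`, while a slice carries its length with it. A sketch of wrapping a C string in a slice once, so later code can just use `.length`; the name `fromCString` is hypothetical, and Phobos's `std.string.fromStringz` does the same job.

```d
import core.stdc.string : strlen;

// Wraps a null-terminated C string in a slice. strlen walks the bytes until
// it finds '\0' (O(n)); after that, .length is just a stored field (O(1)).
const(char)[] fromCString(const(char)* cstr) @system
{
    return cstr is null ? null : cstr[0 .. strlen(cstr)];
}

unittest
{
    // D string literals are null-terminated in memory, so .ptr is a valid C string.
    const(char)* c = "hello".ptr;
    const(char)[] s = fromCString(c);
    assert(s.length == 5);
    assert(s == "hello");
    assert(s.ptr == c); // no copy was made
}
```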

December 30, 2021

Re: Is this a bug or a VERY sneaky case?

Posted by rempas
in reply to WebFreak001


On Thursday, 30 December 2021 at 16:17:28 UTC, WebFreak001 wrote:

No, actually I meant u8, u16, etc. If you stay consistent it's fine, but most of the D ecosystem just uses the native D types (ubyte, ushort, etc.), which have guaranteed bit widths as well.

Oh, these types are actually aliases, so they interact nicely with the language: "alias u8 = ubyte", "alias i8 = byte", "alias u16 = ushort", etc. A lot of other programming languages use these names for their types because:

They are shorter
They look nicer (once you get used to them)
For beginners (and actually for everyone, as you notice it instantly), it is easier to know the size of each type (u8 is an unsigned 8-bit (1-byte) integer, i32 is a signed 32-bit (4-byte) integer, etc.)
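A sketch of what such aliases look like. The post names `u8`, `i8` and `u16`; the rest of the set here is assumed by analogy. Because an alias is not a new type, the compiler sees exactly the D built-in types, whose widths are fixed on every platform.

```d
// The short names are plain aliases for the D built-in integer types.
alias u8  = ubyte;
alias i8  = byte;
alias u16 = ushort;
alias i16 = short;
alias u32 = uint;
alias i32 = int;
alias u64 = ulong;
alias i64 = long;

// D's integer types have fixed widths on every platform, and so do the aliases.
static assert(u8.sizeof == 1 && i8.sizeof == 1);
static assert(u16.sizeof == 2 && i16.sizeof == 2);
static assert(u32.sizeof == 4 && i32.sizeof == 4);
static assert(u64.sizeof == 8 && i64.sizeof == 8);

// An alias is not a new type, so both spellings are interchangeable.
static assert(is(u8 == ubyte));
static assert(is(i32 == int));
```

Since `is(u8 == ubyte)` holds, code that takes a `ubyte` accepts a `u8` with no conversion, which is why users of the library can ignore the aliases entirely.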

Once you work with other code, it may have its own custom definitions too, and then there are three or more different aliases for the same thing.

My library will have no dependencies, so it will not have to work with other code. These types will be the "official names" used for library development. People who use the library don't have to use them, of course (and that's the awesome thing).

Again, we will not work with other code. HOWEVER, these definitions are intended (but of course not forced) to be used by library users. So yeah, "is_same" is indeed not a good name. Do you suggest any other name (that is not too long)? I'm thinking about "same_type" and "is_type". The first one makes a lot of sense, and the second one doesn't make as much sense, but it is short and cool ;)

Yeah, this is exactly what I'm saying! You get back a struct (which I suppose is heavier than a simple variable), and sometimes you don't need it (which is the case in the places where I'm using pointers). Slices are amazing in cases like the ones I mentioned, where you need to take a specific part of a string, so you do two things in one place. In those cases, sure, slices just make things easier (with less chance of bugs)!

Yeah, I know. Tbh, to me (and adding to what we already said), slices just seem like a more "beginner-friendly" way to do the same thing you would do with a pointer plus a variable holding its length. And bounds checking can actually be unnecessary, since we will probably do it ourselves anyway, because you go out of bounds for one of two reasons:

You don't know what you are doing. This means you don't pay attention to what you are writing (don't code drunk, please) or you don't understand exactly what your code does (which may also be the case when you copy-paste code from online). In this case, there is a general problem that you should fix, and having bounds checked automatically for you is (probably) not going to fix it.
A user input value (I'm talking about reading from standard input) was out of bounds. In that case, you would probably want to tell the user that they gave a wrong input rather than stop the program. In the same way, to!byte(val) throws an exception if the conversion fails, rather than giving you a value that you can check against your original value to see whether (and why) the conversion failed. So yeah....
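A sketch of both cases, under the assumption that the caller wants a success flag rather than an exception: `std.conv.to` throws `ConvOverflowException` when the value doesn't fit, and an out-of-range index is easiest to report by checking it before indexing. `tryToByte` and `tryGet` are hypothetical helpers, not part of Phobos.

```d
import std.conv : to, ConvOverflowException;

// Hypothetical helper: converts a user-supplied int to byte, returning false
// instead of throwing when the value does not fit.
bool tryToByte(int value, out byte result)
{
    try
    {
        result = value.to!byte; // throws ConvOverflowException if out of range
        return true;
    }
    catch (ConvOverflowException)
    {
        return false;
    }
}

// Hypothetical helper: looks up a user-supplied index. Checking it explicitly
// lets the caller report bad input; a failed bounds check would instead throw
// a RangeError, which is an Error and not meant to be caught.
bool tryGet(const(int)[] data, size_t index, out int result)
{
    if (index >= data.length)
        return false;
    result = data[index];
    return true;
}
```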

Usually it is not necessary to do this unless you are working on some low-level algorithms like sorting, Unicode processing, parsing, etc.

Yeah, makes sense.

Additionally, having the length with your array can often improve performance: you can use a simple counted loop instead of checking every item to see whether it is the null terminator, and x86 processors can greatly speed up and parallelize such loops!

Yeah, I thought the same. Well, don't worry though: I decided (even before starting this discussion) to replace every place in my code that returns "char*" with my custom "str", so we will eliminate the pointers anyway, and the string will have a ".length" property (just like D's strings do).
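For readers who haven't seen the custom `str`, here is a hypothetical sketch of a pointer-plus-length string type that exposes `.ptr` and `.length` the way D's `string` does. The type name `Str`, its fields and its constructor are all assumptions for illustration, not the actual library code.

```d
// Hypothetical pointer + length string view (not the library's actual "str" type).
struct Str
{
    private const(char)* data;
    private size_t len;

    this(const(char)[] s)
    {
        data = s.ptr;
        len = s.length;
    }

    @property const(char)* ptr() const { return data; }
    @property size_t length() const { return len; } // O(1), no '\0' scan

    // Convert back to a built-in slice when calling code that expects one.
    const(char)[] opSlice() const @system { return data[0 .. len]; }
}

unittest
{
    auto s = Str("hello");
    assert(s.length == 5);
    assert(s[] == "hello");
}
```

In other words, such a type is essentially a hand-written slice, which is why converting to and from `const(char)[]` is cheap.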

Anyway, the thing is that I haven't written a lot of code (in general), and mostly I think in terms of C ways of doing things, except when I feel limited and think about something that I need. I will probably upload the code in a few weeks (maybe days, if I'm not lazy???), so it would be nice if you guys actually read it and gave me a review. Thanks a lot for your time replying to me, and I wish you a happy new year!!! Of course I'm not implying that we should stop talking; reply to me if you need anything more ;)
