Streams and encoding (page 3) - D Programming Language Discussion Forum


August 04, 2004

Re: Streams and encoding

Posted by parabolis
in reply to antiAlias

antiAlias wrote:

> The primes.d thing is now a distant and foggy memory :-)
> 
> Can I hook you up with a copy of the latest (much better, with annotated
> source) documentation? You'll see Primes.d is gone, along with some other
> warts:
> http://svn.dsource.org/svn/projects/mango/downloads/mango_beta_9-2_doc.zip

lol - The Mango Tree... just got it from the docs :) I am now without question under the belief that the mango docs are great. I was going to suggest in my last post that I would like to see some docs that cover more of the concept area than just doxygen stuff. I decided that it would probably be too much to expect :)

Quote:
================================================================
Note that these Tokenizers do not maintain any state of their own. Thus they are all thread-safe.
================================================================
This is always good to know from documentation. :)

However I am curious about IPickle's design. Would it not be possible to serialize objects based on the data in ClassInfo?

August 04, 2004

Re: Streams and encoding

Posted by parabolis
in reply to Regan Heath

Regan Heath wrote:

> I didn't/don't use slicing. I think you may be confusing two different points I made.
> 
> My first point was that off and len were not required because you can slice into a ubyte[]. So _if_ you use ubyte[] you don't _need_ off and len.
> 
> My second point was that instead of ubyte[] you should use void* for convenience. If you use void* you definitely need len.

I see now. I was confused. Sorry.

>>> Sure, and when/where you provide it, what will it look like if the underlying write operation takes a ubyte[] and not a void*? is it possible? is it worse than simply using a void*?
>>
>>
>> I am more concerned with the fact that a ubyte[] should help guard against the char* buffer overruns that created a huge security industry. In fact I suspect that you might be somebody from NAV or McAfee and are here only to ensure security holes remain rampant... :P
>>
>> One of the biggest breakthroughs Java made was in the area of security. Part of this breakthrough was a result of their eliminating that nasty char* and using arrays with length info built in. Having said that... Of course it is possible to read an int/long/real/whatever from a byte buffer. Moreover you can test to see if something went wrong in the buffer because you know how long it is...
>> ================================================================
>> int readInt( ubyte buf, uint off = 0 ) {
> 
> 
> Typo, you missed the [], I have added them below.
> 
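>> int readInt( ubyte[] buf, uint off = 0 ) {
>>      // Need 4 readable bytes starting at off (check written to avoid overflow).
>>      if( buf.length < 4 || off > buf.length - 4 )
>>          throw new Exception( "Buffer overrun" );
>>      // Assemble the value least significant byte first (little-endian).
>>      uint result = buf[off+0];
>>      result |= (cast(uint)(buf[off+1])) << 8;
>>      result |= (cast(uint)(buf[off+2])) << 16;
>>      result |= (cast(uint)(buf[off+3])) << 24;
>>      return cast(int) result;
>> }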
>> ================================================================
> 
> 
> And this is supposed to be nicer/easier/more efficient than..
> 
> bool readInt(out int x) {
>   if (read(&x,x.sizeof) != x.sizeof)
>     throw new Exception("Out of data");
>   return true;
> }
> 
> As you can see using void* allows very convenient and totally buffer overrun safe code.

Show me a safe function that takes void* as a parameter. That was really more the point I was making. There is no way to guarantee in read(void*,uint len) that len is not actually longer than the array someone passes in. When that happens your read function will write past the end of the array and eventually write over executable code. Somebody will find that bug and send a specially formatted overly long string that has machine code in it and hijack the program.
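
To make the contrast concrete, here is a minimal sketch in present-day D. It assumes the `@safe` attribute, which did not exist when this thread was written, and `readInto` is a hypothetical name used only for illustration. The point it tries to show is that a callee receiving a slice can clamp the copy to the destination's real length, which a callee receiving a bare pointer and a caller-supplied length cannot verify.

```d
// Hypothetical helper, not part of Phobos or Mango.
// @safe forbids pointer arithmetic and pointer indexing, so a
// read(void*, uint len) style function could not do useful work under it.
@safe size_t readInto(ubyte[] dest, const(ubyte)[] source)
{
    // The slice carries its own length, so the callee can clamp the copy.
    immutable n = dest.length < source.length ? dest.length : source.length;
    dest[0 .. n] = source[0 .. n];
    return n;
}

void main()
{
    ubyte[] input = [1, 2, 3, 4, 5, 6, 7, 8];
    auto small = new ubyte[4];

    // Only four bytes fit, so only four are copied.
    assert(readInto(small, input) == 4);
    assert(small == [1, 2, 3, 4]);
}
```

The clamp is a design choice for the sketch; a real stream might prefer to throw when the destination is too small.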

> 
> <snip>
> 
>> I am glad to hear you decided to split them. I think you will find it makes life simpler.
>>
>> I am not much of a generic programmer. So I am waiting to see how you deal with the combinatorial problem before I am sold on the idea. If you can pull it off then you might be onto something. :)
> 
> 
> You mean the problem you see with threads and shared buffers?
> 

Sorry I meant the problem with threads and shared buffers should be easier now.

The bit about the combinatorial problem goes back to the other thread in which I wanted to see how you combine multiple streams...

August 04, 2004

Re: Streams and encoding

Posted by antiAlias
in reply to parabolis

"parabolis" wrote..

> Quote: ================================================================ Note that these Tokenizers do not maintain any state of their own. Thus they are all thread-safe. ================================================================ This is always good to know from documentation. :)
>
>
> However I am curious about IPickle's design. Would it not be possible to serialize objects based on the data in ClassInfo?

Doing it the introspection way (ala Java) has a bunch of issues all of its own, and D doesn't have the power to expose all the requisite data as yet (I could be wrong on the latter though).

IPickle was a nice and simple way to approach it; there's no monkey business anywhere (like Java has), it's explicit, and it's very fast. While not an overriding design factor, throughput is one of the main things all the Mango branches/packages keep a watchful eye upon. Frankly, I'd like to see a decent introspection approach emerge along the way; perhaps as a complement rather than a replacement: within Mango there's no obvious reason why the two approaches could not produce an equivalent serialized stream, and therefore be interchangeable at the endpoints.

This is one area where I think getting other people involved in the project would help tremendously.

August 04, 2004

Re: Streams and encoding

Posted by antiAlias
in reply to Sean Kelly

You are absolutely right. But not many people seem to know about Mango, so the opportunity for "spreading the news" was too great to pass up :-)

"Sean Kelly" <sean@f4.ca> wrote in message news:cep64d$1o0t$1@digitaldaemon.com...
> In article <cep4dd$1nde$1@digitaldaemon.com>, antiAlias says...
> >
> >What you good folks seem to be describing is pretty much how mango.io operates. All the questions raised so far are quietly handled by that library (even the separate input & output buffers, if you want that), so it might be worthwhile checking it out. It's also house-trained, documented, and has a raft of additional features that you selectively apply where appropriate (it's not all tragically intertwined).
>
> Yup.  I've played around with Mango and kind of like it.  One of the reasons I started these stream mods was to have an alternate design to compare to Mango for the sake of discussion.  ie. I don't want folks to settle on Mango simply because the other choices are missing features.
>
> >I think it's great to have "competing" libraries under way, but at some point is it worth considering funneling efforts instead? Perhaps not?
>
> Definitely.
>
> Sean

August 04, 2004

Re: Streams and encoding

Posted by Regan Heath
in reply to parabolis

On Tue, 03 Aug 2004 23:30:03 -0400, parabolis <parabolis@softhome.net> wrote:

<snip>

> Show me a safe function that takes void* as a parameter. That was really more the point I was making. There is no way to guarantee in read(void*,uint len) that len is not actually longer than the array someone passes in. When that happens your read function will overwrite the end of the array and eventually write over executable code. Somebody will find that bug and send a specially formatted overly long string that has machine code in it and hijack the program.

I agree this is a problem, I have been dealing with it for years at work (we work with C only).

The solution in this case is that nobody outside the Stream template class actually calls the read/write functions that take void*; instead they call the ones provided for int, float, ubyte[], and so on.

However, someone might want the void* ones in order to read/write a struct..

..

I have just discovered you can use ubyte[] and get the same sort of function as my void* one, check out...

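import core.stdc.stdio : printf;

class Stream
{
	// Stand-in for a real read: fills buffer[offset .. offset+length] with 'A' (65).
	// A length of 0 means "up to the end of the buffer".
	size_t read(ubyte[] buffer, size_t length = 0, size_t offset = 0)
	{
		if (length == 0) length = buffer.length - offset;
		buffer[offset .. offset + length] = 65;
		return length;
	}

	bool read(out char x)
	{
		// View the char's own storage as a ubyte[] of exactly x.sizeof bytes.
		return read(cast(ubyte[])(&x)[0 .. x.sizeof]) == x.sizeof;
	}
}

void main()
{
	Stream st = new Stream();
	char c;

	st.read(c);
	printf("%c\n", c);
}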

As you can see, using a cast, a slice and the address of the char, we can do the same thing as with a void*.

So now the read function takes a ubyte[] and is itself buffer safe.. however this does not mean buffer overruns are not possible, consider...

void badBuggyRead(out char x)
{
	read(cast(ubyte[])(&x)[0..1000]);
}

So even though read takes a ubyte[], it can still overrun.
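
One way to narrow that hole, sketched here under assumptions, is to keep the pointer-to-slice cast in a single helper that takes the length from the type instead of from the caller. `bytesOf` is a hypothetical name and appears in neither Mango nor Phobos; the sketch does not stop someone from writing the cast by hand elsewhere.

```d
// Hypothetical helper: view the storage of any value as bytes.
// The slice length comes from T.sizeof, so a caller cannot state a wrong one.
ubyte[] bytesOf(T)(ref T value)
{
    return (cast(ubyte*) &value)[0 .. T.sizeof];
}

void main()
{
    char c;
    auto view = bytesOf(c);
    assert(view.length == 1);   // exactly char.sizeof bytes

    view[] = 65;                // same fill value as the stub read() above
    assert(c == 'A');

    int i;
    bytesOf(i)[] = 0x41;        // all four bytes of the int are set
    assert(i == 0x41414141);
}
```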

>> <snip>
>>
>>> I am glad to hear you decided to split them. I think you will find it makes life simpler.
>>>
>>> I am not much of a generic programmer. So I am waiting to see how you deal with the combinatorial problem before I am sold on the idea. If you can pull it off then you might be onto something. :)
>>
>>
>> You mean the problem you see with threads and shared buffers?
>>
>
> Sorry I meant the problem with threads and shared buffers should be easier now.

:)

> The bit about the combinatorial problem goes back to the other thread in which I wanted to see how you combine multiple streams...

Ahh yes.. I am waiting for an idea to come to me.. my first idea is that I combine them in the same way as I combine the ones I currently have eg.

alias OutputStream!(InputStream!(RawFile)) File;

or something, I have not tried splitting them yet, then..

alias CRCReader!(File) CRCFileReader;
alias CRCWriter!(File) CRCFileWriter;

alias ZIPReader!(File) ZIPFileReader;
alias ZIPWriter!(File) ZIPFileWriter;

now, this is fine for types we know about at compile time, however we may need to choose at runtime, so some sort of factory approach will have to be used...
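
A rough sketch of how the two styles might sit side by side, assuming a common interface that every layer implements. All names here (`Reader`, `MemSource`, `CountingReader`, `makeReader`) are hypothetical stand-ins, not the classes from this thread or from Mango, and the syntax is present-day D.

```d
// All names are hypothetical stand-ins for the stream classes discussed above.
interface Reader
{
    size_t read(ubyte[] dest);
}

// A source that serves bytes from memory.
class MemSource : Reader
{
    private const(ubyte)[] data;

    this(const(ubyte)[] data) { this.data = data; }

    size_t read(ubyte[] dest)
    {
        immutable n = dest.length < data.length ? dest.length : data.length;
        dest[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// A filter layered over any Reader type at compile time.
class CountingReader(Inner : Reader) : Reader
{
    private Inner inner;
    size_t total;   // bytes seen so far

    this(Inner inner) { this.inner = inner; }

    size_t read(ubyte[] dest)
    {
        immutable n = inner.read(dest);
        total += n;
        return n;
    }
}

// Compile-time combination, in the style of the aliases above.
alias CountingMemReader = CountingReader!MemSource;

// Run-time combination: the factory hides the concrete type behind the interface.
Reader makeReader(string kind, const(ubyte)[] data)
{
    switch (kind)
    {
        case "counting": return new CountingReader!Reader(new MemSource(data));
        default:         return new MemSource(data);
    }
}

void main()
{
    ubyte[] bytes = [1, 2, 3];
    auto buf = new ubyte[8];

    auto fixed = new CountingMemReader(new MemSource(bytes));
    assert(fixed.read(buf) == 3 && fixed.total == 3);

    Reader chosen = makeReader("counting", bytes);
    assert(chosen.read(buf) == 3);
}
```

Instantiating the filter over the interface itself (`CountingReader!Reader`) trades the compile-time knowledge of the inner type for the freedom to pick the layers at run time.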

Regan


August 04, 2004

Re: Streams and encoding

Posted by Andy Friesen
in reply to parabolis

parabolis wrote:

> Regan Heath wrote:
> 
>> On Tue, 03 Aug 2004 16:21:05 -0400, parabolis <parabolis@softhome.net> wrote:
>>
>> <snip>
>>
>>> Here is the foundation of the stream library I imagine:
>>> ================================================================
>>> interface DataSink {
>>>      uint write( ubyte[] data, uint off = 0, uint len = 0);
>>> }
>>>
>>> interface DataSource {
>>>      uint read( inout ubyte[] data, uint off = 0, uint len = 0);
>>>      ulong seek( ulong size );
>>> }
>>> ================================================================
>>
>>
>>
>> I think you need functions in the form:
>>
>>   ulong write(void* data, ulong len = 0, ulong off = 0);
>>
>> notice I have changed ubyte[] to void*, changed the order of the last two parameters and changed uint into ulong.
>>
>> If you use ubyte[] you don't need len or off as you can call with:
>>   ubyte[] big = "regan was here";
>>   write(big[6..9]);
>> to achieve both.
> 
> 
> I will concede the order was wrong. However I believe the slicing will need to create another array wrapper in memory which is then going to have to be GCed. The len and off parameters allow a caller to take either approach.

Slicing does not create garbage.  Arrays really are value types that get copied when you pass them to a function.  You can generally treat them as reference types because the data they refer to is not copied along with them.

An array is quite literally little more than this:

    struct Array(T) {
        T* data;
        int length;
    }

Might I suggest that DataSources and DataSinks use void[]?

void[] knows how many bytes it points to and is sliceable.  Whether or not void[] was created for this exact scenario is uncertain, but it is exceptionally well suited to the task regardless.

(incidentally, slicing void* is legal as well)
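
A small sketch of what this means in practice, written for present-day D. The variable names are made up for the example, and the buffer contents are borrowed from the quoted snippet above.

```d
void main()
{
    // .dup gives a mutable copy; the cast reinterprets the chars as bytes.
    ubyte[] big = cast(ubyte[]) "regan was here".dup;

    // A slice is just a (length, pointer) pair into the same memory.
    ubyte[] part = big[6 .. 9];
    assert(part.length == 3);
    assert(part.ptr == big.ptr + 6);

    // Writing through the slice changes the original data: nothing was copied.
    part[0] = 0x57;
    assert(big[6] == 0x57);

    // The array value itself is two machine words, passed around by value.
    static assert((ubyte[]).sizeof == 2 * size_t.sizeof);

    // Any array converts implicitly to void[], whose length is counted in bytes.
    void[] raw = big;
    assert(raw.length == big.length);

    // Slicing a void* yields a void[] of the requested length.
    void* p = big.ptr;
    void[] viaPointer = p[0 .. 4];
    assert(viaPointer.length == 4);
}
```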

> The void* is a pointer with no associated type. The arrays in D are infinitely better than void* pointers because arrays have extra information. As I said earlier in my post the behavior of providing data in a particular non-byte format should be done elsewhere in a single DataXXStream.

The whole idea behind DataSources and DataSinks is that they just pull bytes in and out of some other place without ever having any concern for their meaning.

This is a textbook case of the right place to use void*. :)  (or void[])

 -- andy

August 04, 2004

Re: Streams and encoding

Posted by Regan Heath
in reply to Andy Friesen

On Tue, 03 Aug 2004 21:30:29 -0700, Andy Friesen <andy@ikagames.com> wrote:
> On Tue, 03 Aug 2004 16:21:05 -0400, parabolis <parabolis@softhome.net> wrote:
>> I will concede the order was wrong. However I believe the slicing will need to create another array wrapper in memory which is then going to have to be GCed. The len and off parameters allow a caller to take either approach.
>
> Slicing does not create garbage.

Really? doesn't slicing create another array structure (the one you have described below) exactly the same as if/when you pass one to a function, so..

void foo(char[] a)
{
}
void main()
{
  char[] a = "12345".dup;  // .dup because string literals are immutable in current D
  foo(a[1..3]);
}

the above code creates 3 arrays:
 1- 'a' at the start of main
 2- one for the slice
 3- one for the function call.

leaving out the slice creates one less copy of the array (not the data)

I think that is what parabolis meant.

> Arrays really are value types that get copied when you pass them to a function.  You can generally treat them as reference types because the data they refer to is not copied along with them.
>
> An array is quite literally little more than this:
>
>      struct Array(T) {
>          T* data;
>          int length;
>      }
>
> Might I suggest that DataSources and DataSinks use void[]?
>
> void[] knows how many bytes it points to and is sliceable.  Whether or not void[] was created for this exact scenario is uncertain, but it is exceptionally well suited to the task regardless.
>
> (incidentally, slicing void* is legal as well)
>
>> The void* is a pointer with no associated type. The arrays in D are infinitely better than void* pointers because arrays have extra information. As I said earlier in my post the behavior of providing data in a particular non-byte format should be done elsewhere in a single DataXXStream.
>
> The whole idea behind DataSources and DataSinks is that they just pull bytes in and out of some other place without ever having any concern for their meaning.
>
> This is a textbook case of the right place to use void*. :)  (or void[])

I agree void* or void[] should be used.

Parabolis's other concern was a buffer overrun, but as I see it none of void[], void* or ubyte[] is any more buffer safe than the others (see my other post for a detailed explanation).

Regan


August 04, 2004

Re: Streams and encoding

Posted by parabolis
in reply to antiAlias

antiAlias wrote:

> "parabolis" wrote..
> 
> 
>>Quote:
>>================================================================
>>Note that these Tokenizers do not maintain any state of their
>>own. Thus they are all thread-safe.
>>================================================================
>>This is always good to know from documentation. :)
>>
>>
>>However I am curious about IPickle's design. Would it not be
>>possible to serialize objects based on the data in ClassInfo?
> 
> 
> Doing it the introspection way (ala Java) has a bunch of issues all of its
> own, and D doesn't have the power to expose all the requisite data as yet (I
> could be wrong on the latter though).

I think I was premature to suppose D could do that. I just gave the issue some thought and there is just enough introspection to make a shallow copy which is obviously not sufficient.

> IPickle was a nice and simple way to approach it; there's no monkey business
> anywhere (like Java has), it's explicit, and it's very fast. While not an
> overriding design factor, throughput is one of the main things all the Mango
> branches/packages keep a watchful eye upon. Frankly, I'd like to see a
> decent introspection approach emerge along the way; perhaps as a complement
> rather than a replacement: within Mango there's no obvious reason why the
> two approaches could not produce an equivalent serialized stream, and
> therefore be interchangeable at the endpoints.

Any automated serializing algorithm would have to either allow IPickles to [de-]serialize themselves or ignore read/write. However given one of those holds then the serialization ought to be compatible.

> This is one area where I think getting other people involved in the project
> would help tremendously.

I think I am probably sold on being willing to help. It is more an issue of whether I can provide anything that will further mango. :)

August 04, 2004

Re: Streams and encoding

Posted by parabolis
in reply to Andy Friesen

Andy Friesen wrote:
>> I will concede the order was wrong. However I believe the slicing will need to create another array wrapper in memory which is then going to have to be GCed. The len and off parameters allow a caller to take either approach.
> 
> 
> Slicing does not create garbage.  Arrays really are value types that get copied when you pass them to a function.  You can generally treat them as reference types because the data they refer to is not copied along with them.
> 
> An array is quite literally little more than this:
> 
>     struct Array(T) {
>         T* data;
>         int length;
>     }
> 

That is what I meant by a wrapper. It is actually defined in
    phobos\internal\adi.d
Given that it is a struct it will be created on the stack and thus not GCed. However I still like to have the option to decide between the two. :)


> Might I suggest that DataSources and DataSinks use void[]?
> 
> void[] knows how many bytes it points to and is sliceable.  Whether or not void[] was created for this exact scenario is uncertain, but it is exceptionally well suited to the task regardless.
> 
> (incidentally, slicing void* is legal as well)
> 
>> The void* is a pointer with no associated type. The arrays in D are infinitely better than void* pointers because arrays have extra information. As I said earlier in my post the behavior of providing data in a particular non-byte format should be done elsewhere in a single DataXXStream.
> 
> 
> The whole idea behind DataSources and DataSinks is that they just pull bytes in and out of some other place without ever having any concern for their meaning.
> 
> This is a textbook case of the right place to use void*. :)  (or void[])

I had no idea there is a void[] in D and will have to consider it. As I explained in another post this is a textbook example of when *not* to use void*. If void[] exists then its use might be justified but honestly it warps my mind even trying to consider it.

August 04, 2004

Re: Streams and encoding

Posted by parabolis
in reply to Regan Heath

Regan Heath wrote:

> On Tue, 03 Aug 2004 23:30:03 -0400, parabolis <parabolis@softhome.net> wrote:
> 
>> Show me a safe function that takes void* as a parameter. That was really more the point I was making. There is no way to guarantee in read(void*,uint len) that len is not actually longer than the array someone passes in. When that happens your read function will overwrite the end of the array and eventually write over executable code. Somebody will find that bug and send a specially formatted overly long string that has machine code in it and hijack the program.
> 
> 
> I agree this is a problem, I have been dealing with it for years at work (we work with C only).
> 
> The solution in this case is that nobody outside the Stream template class actually calls the read/write functions that take void* instead they call the ones provided for int, float, ubyte[], and so on.
> 
> However, someone might want the void* ones in order to read/write a struct..

That is a good point.

> 
> ...
> 
> I have just discovered you can use ubyte[] and get the same sort of function as my void* one, check out...
> 
> class Stream
> {
>     ulong read(ubyte[] buffer, ulong length = 0, ulong offset = 0)
>     {
>         if (length == 0) length = buffer.length;
>         buffer[offset..length] = 65;

Now that is pretty neat.

> 
> So now the read function takes a ubyte[] and is itself buffer safe.. however this does not mean buffer overruns are not possible, consider...
> 
> void badBuggyRead(out char x)
> {
>     read(cast(ubyte[])(&x)[0..1000]);
> }
> 
> so even tho read uses a ubyte[] it can still overrun.

You can always circumvent a security measure. The point is that with the measure there you *have* to go out of your way to get around it.

> 
>>> <snip>
>>>
>>>> I am glad to hear you decided to split them. I think you will find it makes life simpler.
>>>>
>>>> I am not much of a generic programmer. So I am waiting to see how you deal with the combinatorial problem before I am sold on the idea. If you can pull it off then you might be onto something. :)
>>>
>>>
>>>
>>> You mean the problem you see with threads and shared buffers?
>>>
>>
>> Sorry I meant the problem with threads and shared buffers should be easier now.
> 
> 
> :)
> 
>> The bit about the combinatorial problem goes back to the other thread in which I wanted to see how you combine multiple streams...
> 
> 
> Ahh yes.. I am waiting for an idea to come to me.. my first idea is that I combine them in the same way as I combine the ones I currently have eg.
> 
> alias OutputStream!(InputStream!(RawFile)) File;
> 
> or something, I have not tried splitting them yet, then..
> 
> alias CRCReader!(File) CRCFileReader;
> alias CRCWriter!(File) CRCFileWriter;
> 
> alias ZIPReader!(File) ZIPFileReader;
> alias ZIPWriter!(File) ZIPFileWriter;
> 
> now, this is fine for types we know about at compile time, however we may need to choose at runtime, so some sort of factory approach will have to be used...
> 
> Regan
> 

Consider the number of combinations of just Readers that are possible:

   File,Net,Mem - choose 1 of 3

   Compression
   CRC           } - choose any number and in any order
   Buffering

   Image,Audio,Video - choose 1 of 3

If I am not too sleepy to be thinking straight then there are roughly 100 combinations of readers with just these 9 classes.
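
The estimate can be checked with a short count. This sketch assumes one reading of "any number and in any order": each of the three middle layers is used at most once and the order of the chosen layers matters. Under that assumption there are 16 ordered selections of the middle layers, and 3 x 16 x 3 = 144 reader combinations. The function name is made up for the example.

```d
// Counts ordered selections of 0..n distinct items out of n,
// i.e. the sum over k of n! / (n - k)!.
ulong orderedSelections(uint n)
{
    ulong total = 0;
    ulong perms = 1;            // number of ways to pick 0 items in order
    foreach (k; 0 .. n + 1)
    {
        total += perms;
        perms *= (n - k);       // extend every arrangement by one more item
    }
    return total;
}

void main()
{
    immutable sources = 3;      // File, Net, Mem: choose 1
    immutable formats = 3;      // Image, Audio, Video: choose 1
    immutable filters = orderedSelections(3);   // Compression, CRC, Buffering

    assert(filters == 16);
    assert(sources * filters * formats == 144);
}
```

If a layer may be repeated, or if order is ignored, the count changes, so the figure depends on the reading chosen.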


Copyright © 1999-2021 by the D Language Foundation