December 17, 2012
On 12/17/2012 3:02 AM, mist wrote:
> AFAIK those are more like Windows API & ABI reverse engineered and reimplemented
> and that is a huge difference.

Yup. I'd be very surprised if they were based on decompiled Windows executables.

Not only that, I didn't say decompiling by hand was impossible. I repeatedly said that it can be done by an expert with a lot of patience.

But not automatically. Java .class files can be done automatically with free tools.

December 17, 2012
On 2012-12-17 16:01, Walter Bright wrote:

> Yup. I'd be very surprised if they were based on decompiled Windows
> executables.
>
> Not only that, I didn't say decompiling by hand was impossible. I
> repeatedly said that it can be done by an expert with a lot of patience.
>
> But not automatically. Java .class files can be done automatically with
> free tools.

Fair enough.

-- 
/Jacob Carlborg
December 17, 2012
On 16.12.2012 23:32, Andrej Mitrovic wrote:
> On 12/16/12, Paulo Pinto <pjmlp@progtools.org> wrote:
>> If modules are used correctly, a .di should be created with the public
>> interface and everything else is already in binary format, thus the
>> compiler is not really parsing everything all the time.
>
> A lot of D code tends to be templated code, .di files don't help you
> in that case.
>

Why not?

Ada, Modula-3, Eiffel, and the ML languages are just a few examples of languages that support both modules and genericity.

So clearly there are some ways of doing it.

Granted, in Ada's and Modula-3's case you actually have to define the types
when importing a module, so there is already a difference.

A second issue is that their generic systems are not as powerful as D's.

I think the main issue is that the majority seems to be OK with
having template code in .di files, and that is fine.

But if there were interest, I am sure there could be a way to store the
template information in the compiled module while exposing the template's required type parameters in the .di file, à la Ada.
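To make the idea concrete, here is a minimal sketch of what Paulo describes, written in Python purely for illustration: every name here is hypothetical, and a dictionary stands in for a real binary module format. The compiled module stores template bodies verbatim (since an open template cannot be pre-compiled), while the generated .di exposes only the declarations.

```python
# Hypothetical sketch: a compiled-module container that carries template
# source verbatim, while the .di file exposes only the declarations.

compiled_module = {
    "symbols": {"sumInts": b"...object code..."},  # ordinary compiled code
    "templates": {
        # An open template cannot be pre-compiled, so its body is stored
        # verbatim, keyed by the declaration it was parsed from.
        "max(T)(T a, T b)": "T max(T)(T a, T b) { return a > b ? a : b; }",
    },
}

def generate_di(module):
    """Emit an interface file: signatures only, template bodies hidden."""
    lines = []
    for decl in module["templates"]:
        lines.append(decl + ";  // body stored in the compiled module")
    return "\n".join(lines)

print(generate_di(compiled_module))
```

The point of the sketch is only the separation of concerns: the .di carries what a caller needs to instantiate the template, and the body travels in the compiled artifact.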

--
Paulo
December 17, 2012
On Monday, 17 December 2012 at 09:37:48 UTC, Walter Bright wrote:
> On 12/17/2012 12:55 AM, Paulo Pinto wrote:
>> Assembly is no different than reversing any other type of bytecode:
>
> This is simply not true for Java bytecode.
>
> About the only thing you lose with Java bytecode are local variable names. Full type information and the variables themselves are intact.
>

It depends on the compiler switches you use. You can strip names, but that obviously impacts reflection capabilities.

Also, Java is quite easy to decompile due to the very simple structure of the language, even if in some cases optimization can confuse the decompiler.

Try it on other JVM languages like Clojure, Scala, or Groovy: the produced code can hardly be understood except by a specialist.

Granted, this is still easier than assembly, but you neglect the fact that Java is rather simple, where D isn't. It is unlikely that optimized D bytecode could ever be decompiled in a satisfying way.
December 17, 2012
On Monday, 17 December 2012 at 09:40:22 UTC, Walter Bright wrote:
> On 12/17/2012 12:54 AM, deadalnix wrote:
>> More seriously, I understand that in some cases, di are interesting. Mostly if
>> you want to provide a closed source library to be used by 3rd party devs.
>
> You're missing another major use - encapsulation and isolation, reducing the dependencies between parts of your system.
>

For such a case, bytecode is a superior solution as it would allow CTFE.

> Do you really want to be recompiling the garbage collector for every module you compile? It's not because the gc is closed source that .di files are useful.

December 17, 2012
On Monday, 17 December 2012 at 20:31:08 UTC, Paulo Pinto wrote:
> But if there was interest, I am sure there could be a way to store the
> template information in the compiled module, while exposing the required type parameters for the template in the .di file, a la Ada.
>
> --
> Paulo

Maybe the focus should not be on obfuscation directly, but on making the .di packaging system perform better. If library content can be packaged in a more efficient way through the use of D interface files, then at least there's some practical use for it that may one day get implemented.

--rt
December 17, 2012
On 12/17/2012 12:49 PM, deadalnix wrote:
> Granted, this is still easier than assembly, but you neglected the fact that
> java is rather simple, where D isn't. It is unlikely that an optimized D
> bytecode can ever be decompiled in a satisfying way.

Please listen to me.

You have FULL TYPE INFORMATION in the Java bytecode.

You have ZERO, ZERO, ZERO type information in object code. (Well, you might be able to extract some from mangled global symbol names, for C++ and D (not C), if they haven't been stripped.) Do not underestimate what the loss of ALL the type information means to be able to do meaningful decompilation.

Please understand that I actually do know what I'm talking about with this stuff. I have written a Java compiler. I know what it emits. I know what's in Java bytecode, and how it is TRIVIALLY reversed back into Java source.

The only difference between Java source code and Java bytecode is the latter has local symbol names and comments stripped out. There's a 1:1 correspondence.

This is not at all true with object code.

(Because .class files have full type information, a Java compiler can "import" either a .java file or a .class file with equal facility.)
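The same property can be demonstrated by analogy with Python, whose compiled code objects, like Java .class files, retain names and full structure. This is an illustration of the principle, not of the Java class-file format itself:

```python
# Compile a tiny module and inspect the resulting code object: the
# symbolic information survives compilation, which is exactly what
# makes bytecode trivially reversible back to source.

source = "def area(width, height):\n    return width * height\n"
code = compile(source, "<module>", "exec")

# The nested function's code object is stored among the module's constants.
func_code = next(c for c in code.co_consts if hasattr(c, "co_varnames"))

# Function name and parameter names all survive compilation.
print(func_code.co_name)       # -> area
print(func_code.co_varnames)   # -> ('width', 'height')
```

Object code carries none of this; the decompiler has to guess at types and boundaries that the bytecode simply hands over.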
December 17, 2012
On 12/17/2012 12:51 PM, deadalnix wrote:
> On Monday, 17 December 2012 at 09:40:22 UTC, Walter Bright wrote:
>> On 12/17/2012 12:54 AM, deadalnix wrote:
>>> More seriously, I understand that in some cases, di are interesting. Mostly if
>>> you want to provide a closed source library to be used by 3rd party devs.
>>
>> You're missing another major use - encapsulation and isolation, reducing the
>> dependencies between parts of your system.
>>
>
> For such a case, bytecode is a superior solution as it would allow CTFE.

There is no substantive difference between bytecode and source code, as I've been trying to explain. It is not superior in any way (other than being shorter, and hence less costly to transmit over the internet).

I've also done precompiled headers for C and C++, which are more or less a binary module importation format.

So, I have extensive personal experience with:

1. bytecode modules
2. binary symbolic modules
3. modules as source code

I picked (3) for D, based on real experience with other methods of doing it. (3) really is the best solution.

I've often thought Java bytecode was a complete joke. It doesn't deliver any of its promises. You could tokenize Java source code, run the result through an LZW compressor, and get the equivalent functionality in every way.

And yes, you can do the same with D modules. Tokenize, run through an LZW compressor, and voila! a "binary" module import format that is small, loads fast, and "obfuscated", for whatever little that is worth.
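As a sketch of that claim, here Python's own lexer and zlib stand in for a D tokenizer and an LZW compressor; the details differ, but the argument only needs a lossless token stream plus a general-purpose compressor:

```python
import io
import tokenize
import zlib

src = "def max2(a, b):\n    return a if a > b else b\n"

# Python's lexer stands in for a D lexer: flatten the source into a
# whitespace-normalized token stream (dropping pure-layout tokens).
toks = [t.string
        for t in tokenize.generate_tokens(io.StringIO(src).readline)
        if t.string.strip()]
stream = " ".join(toks)

# zlib stands in for LZW; any lossless compressor makes the same point.
packed = zlib.compress(stream.encode())
assert zlib.decompress(packed).decode() == stream  # round-trips exactly

print(f"{len(src)} chars of source -> {len(packed)} compressed bytes")
```

The compressed token stream is exactly as "binary" and exactly as recoverable as bytecode, which is the point being made.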


December 17, 2012
On Monday, 17 December 2012 at 12:54:46 UTC, jerro wrote:
>> If we want to allow D to fit into various niche markets overlooked by C++, for added security, encryption could be added, where the person compiling encrypted .di files would have to supply a key. That would work only for certain situations, not for mass distribution, but it may be useful to enough people.
>
> I can't imagine a situation where encrypting .di files would make any sense. Such files would be completely useless without the key, so you would have to either distribute the key along with the files or the compiler would need to contain the key. The former obviously makes encryption pointless and you could only make the latter work by attempting to hide the key inside the compiler. The fact that the compiler is open source would make that harder and someone would eventually manage to extract the key in any case. This whole DRM business would also prevent D from ever being added to GCC.

Of course open source code would never be encrypted; I was suggesting an entirely optional convenience feature for users of the compiler, not a general method of storing library files or a foolproof method for the mass distribution of hidden content.

Having such a feature would allow a company or individual to package up their source code in a way that no one could look at without a specific key. It does not matter whether the compiler is open source or not; only a user with the correct key could decrypt the contents, even for unintended purposes.

Obviously anyone with enough skill and the correct key for a specific encrypted package could decrypt its contents (and then post them on Usenet, BitTorrent, or a million+1 other ways), but you would still need access to the key and to a tool that decrypts the contents (such as the compiler itself). That's what security is all about: a set of barriers that make it difficult, but not impossible, to break through. All security systems are breakable, end of story, no debate there; just take a look around you and see for yourself.

The difference from packaging in an encrypted archive, which is later decrypted and installed for use, is that here the source code is never left lying around in decrypted form. It is also more secure because the source is decrypted only while it is being compiled, and the decrypted content is immediately discarded (in a secure way) afterwards.
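A toy sketch of that workflow, with Python for illustration. The XOR keystream below is NOT real cryptography (a production tool would use an authenticated cipher such as AES-GCM); the sketch only shows the shape of the scheme: the encrypted file is shipped, decrypted in memory at compile time, and the plaintext never touches disk.

```python
import hashlib
from itertools import count

def keystream(key: bytes):
    """Toy keystream from SHA-256(key || counter).
    NOT real cryptography; for illustration only."""
    for i in count():
        yield from hashlib.sha256(key + i.to_bytes(8, "big")).digest()

def xor_crypt(data: bytes, key: bytes) -> bytes:
    # XOR with the keystream; applying it twice restores the input.
    return bytes(b ^ k for b, k in zip(data, keystream(key)))

di_source = b"int secretAlgorithm(int x);  // declaration shipped in the .di\n"
key = b"vendor-supplied key"

encrypted = xor_crypt(di_source, key)   # this is what gets distributed
plaintext = xor_crypt(encrypted, key)   # decrypted in memory by the compiler
assert plaintext == di_source
# ...the plaintext is fed straight to the parser, then discarded.
```

As the surrounding discussion notes, this is a barrier rather than absolute protection: anyone holding the key can recover the source.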

BTW, for the record I'm no fan of DRM in the general sense, but many companies think they need to lock out prying eyes, and it's not my place to tell them that they should not worry about it and should fully open their doors to whatever content they want to distribute.

--rt
December 17, 2012
On Monday, 17 December 2012 at 21:47:36 UTC, Walter Bright wrote:
>
> There is no substantive difference between bytecode and source code, as I've been trying to explain. It is not superior in any way, (other than being shorter, and hence less costly to transmit over the internet).
>

I mentioned in a previous post that we should perhaps focus on making the .di concept more efficient rather than focus on obfuscation.

Shorter file size is a potential use case, and you could even allow a distributor of bytecode to optionally supply it in compressed form that is automatically decompressed when compiling, although as a trade-off that would add a small compilation performance hit.

--rt