December 17, 2012
On 12/17/2012 1:15 AM, Paulo Pinto wrote:
> http://www.hopperapp.com/
>
> I really like the way it generates pseudo-code and basic block graphs out of
> instruction sequences.

I looked at their examples. Sorry, that's just step one of reverse engineering an object file. It's a loooooong way from turning it into source code.

For example, consider an optimizer that puts variables int x, class c, and pointer p all in register EBX. Figure that one out programmatically. Or the result of a CTFE calculation. Or a template after it's been expanded and inlined.
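A minimal C sketch of the register-sharing case, assuming a typical optimizing compiler. The struct `widget` and the function `combine` are hypothetical illustrations. The three locals have disjoint live ranges, so a register allocator may give all three the same machine register (EBX in the example above). In the object code only that register remains, with nothing saying which variable, or which type, it holds at a given instruction.

```c
/* Hypothetical struct standing in for a D class reference. */
struct widget { int id; };

/* x, c and p never overlap in lifetime: each is dead before the next one
   is created. An optimizer is therefore free to keep all three in one
   register, erasing the distinction between an int, an object reference
   and a char pointer from the generated code. */
int combine(int a, const struct widget *w, const char *s)
{
    int total;

    {
        int x = a * 3;                 /* live range 1: an int */
        total = x + 1;
    }                                  /* x is dead from here on */
    {
        const struct widget *c = w;    /* live range 2: an object reference */
        total += c->id;
    }                                  /* c is dead from here on */
    {
        const char *p = s;             /* live range 3: a char pointer */
        total += p[0];
    }
    return total;
}
```

Whether the three really share a register depends on the compiler and flags; the source only makes it legal, it does not force it.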
December 17, 2012
On Monday, 17 December 2012 at 09:29:28 UTC, Walter Bright wrote:
> On 12/17/2012 1:15 AM, Paulo Pinto wrote:
>> http://www.hopperapp.com/
>>
>> I really like the way it generates pseudo-code and basic block graphs out of
>> instruction sequences.
>
> I looked at their examples. Sorry, that's just step one of reverse engineering an object file. It's a loooooong way from turning it into source code.
>
> For example, consider an optimizer that puts variables int x, class c, and pointer p all in register EBX. Figure that one out programmatically. Or the result of a CTFE calculation. Or a template after it's been expanded and inlined.

I didn't say it was easy, but it is possible.

You don't need to fully get the source code of CTFE or templates.

It suffices to get the general algorithm behind the code, and that is impossible to hide, unless the developer resorts to cryptography.

Then again, I heard that universities no longer teach assembly.

--
Paulo
December 17, 2012
On 12/17/2012 12:55 AM, Paulo Pinto wrote:
> Assembly is no different than reversing any other type of bytecode:

This is simply not true for Java bytecode.

About the only things you lose with Java bytecode are local variable names. Full type information and the variables themselves are intact.

With assembler, you lose all type information for starters. ALL of it. And that's just for starters. You have no idea what 2[EAX] represents.
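A minimal C sketch of why 2[EAX] is ambiguous, assuming a mainstream ABI where two 16-bit fields are laid out with no padding. The names `pair16`, `pair_hi`, `second_elem` and `third_byte` are hypothetical. Each function reads whatever sits at byte offset 2 from a pointer, and an x86 compiler typically emits a single register-plus-2 load for each (2[EAX], or [EAX+2] in Intel syntax), differing at most in operand size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: hi lands at byte offset 2 on mainstream ABIs. */
struct pair16 { uint16_t lo; uint16_t hi; };
_Static_assert(offsetof(struct pair16, hi) == 2, "expected hi at offset 2");

/* Three different source-level meanings, one kind of machine load.
   The instruction records neither the type of the base object
   (struct, array, raw buffer) nor the type stored at offset 2. */
uint16_t pair_hi(const struct pair16 *s) { return s->hi; }
uint16_t second_elem(const uint16_t *a)  { return a[1]; }  /* 1 * sizeof(uint16_t) bytes in */
uint8_t  third_byte(const uint8_t *buf)  { return buf[2]; }
```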
December 17, 2012
On 12/17/2012 12:54 AM, deadalnix wrote:
> More seriously, I understand that in some cases .di files are interesting, mostly if
> you want to provide a closed-source library to be used by third-party devs.

You're missing another major use - encapsulation and isolation, reducing the dependencies between parts of your system.

Do you really want to be recompiling the garbage collector for every module you compile? The gc doesn't have to be closed source for .di files to be useful.
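C has no .di files, but its header-plus-opaque-type idiom is a rough analog of this kind of isolation. A minimal single-file sketch with a hypothetical `Counter` module: clients compiled against the interface half see only an incomplete type and prototypes, so changing the implementation half (for example, adding fields to `struct Counter`) means recompiling only that half, not every client.

```c
#include <stdlib.h>

/* ---- Interface part: what clients see (in D, roughly the .di file) ---- */
typedef struct Counter Counter;          /* incomplete type: layout hidden */
Counter *counter_new(void);
void     counter_add(Counter *c, int n);
int      counter_value(const Counter *c);
void     counter_free(Counter *c);

/* ---- Implementation part: normally a separately compiled file ---- */
struct Counter { int value; };

Counter *counter_new(void)
{
    Counter *c = malloc(sizeof *c);
    if (c != NULL)
        c->value = 0;                    /* start from zero */
    return c;                            /* NULL if allocation failed */
}

void counter_add(Counter *c, int n) { c->value += n; }

int counter_value(const Counter *c) { return c->value; }

void counter_free(Counter *c) { free(c); }
```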

December 17, 2012
On Monday, 17 December 2012 at 09:37:48 UTC, Walter Bright wrote:
> On 12/17/2012 12:55 AM, Paulo Pinto wrote:
>> Assembly is no different than reversing any other type of bytecode:
>
> This is simply not true for Java bytecode.
>
> About the only things you lose with Java bytecode are local variable names. Full type information and the variables themselves are intact.
>
> With assembler, you lose all type information for starters. ALL of it. And that's just for starters. You have no idea what 2[EAX] represents.

Pencil and paper?
December 17, 2012
On 12/17/2012 1:45 AM, Paulo Pinto wrote:
> Pencil and paper?

Yes, as I wrote, an expert with pencil and paper can reverse engineer object files, instruction by instruction.

You can't make a tool to do it automatically.

You *can* make such a tool for Java bytecode files, and such free tools appeared right after Java was initially released.
December 17, 2012
On 2012-12-17 09:19, deadalnix wrote:

> I can't stop myself from laughing at people who may think any business can
> be based on Java, PHP or C#. That is a mere dream! Such technology will
> simply never get used in companies, because bytecode can be decoded!

Yet there are a lot of businesses that are based on these languages.

-- 
/Jacob Carlborg
December 17, 2012
On 12/17/2012 1:35 AM, Paulo Pinto wrote:
> It suffices to get the general algorithm behind the code, and that is impossible
> to hide, unless the developer resorts to cryptography.

I'll say it again: with enough effort, an expert *can* decompile object files by hand. You can't make a tool to do that for you, though.

It can also be pretty damned challenging to figure out the algorithm used in a bit of non-trivial assembler after it's gone through a modern compiler optimizer.

I know nobody here wants to believe me, but it is trivial to automatically turn Java bytecode back into source code.

Google "convert .class file to .java":

    http://java.decompiler.free.fr/

Now try:

Google "convert object file to C"

If you don't believe me, consider that I've been working on C compilers for 30 years and also wrote a Java compiler. That should be a helpful data point.
December 17, 2012
On 2012-12-17 09:21, Walter Bright wrote:

> I know what I'm talking about with this. The only time they get reverse
> engineered is when somebody really really REALLY wants to do it, an
> expert is necessary to do the job, and it's completely impractical for
> larger sets of files. You cannot build a tool to do it, it must be done
> by hand, line by line. It's the proverbial turning of hamburger back
> into a cow.

Ever heard of Wine or ReactOS? They're basically Windows, reverse engineered.

-- 
/Jacob Carlborg
December 17, 2012
On 17 December 2012 09:29, Walter Bright <newshound2@digitalmars.com> wrote:

> On 12/17/2012 1:15 AM, Paulo Pinto wrote:
>
>> http://www.hopperapp.com/
>>
>> I really like the way it generates pseudo-code and basic block graphs out of
>> instruction sequences.
>>
>
> I looked at their examples. Sorry, that's just step one of reverse engineering an object file. It's a loooooong way from turning it into source code.
>
> For example, consider an optimizer that puts variables int x, class c, and pointer p all in register EBX. Figure that one out programmatically. Or the result of a CTFE calculation. Or a template after it's been expanded and inlined.
>


Right, there is practically zero chance of being able to come up with 100% identical D code from an object dump / assembly code, possibly with the exception of a few *very* simple cases (hello world!). However, it looks like you might just be able to decode it into a bastardised C version. I can't see hopperapp being very practical beyond small stuff, though...


Regards,
-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';