Almost every week requests for decompilation programs are made in newsgroups (like comp.lang.c), and these are usually replied with: It is not possible! People claim that decompilation is similar to converting a hamburger back into a cow, or unscrambling an omelette back to an egg. Here is a typical FAQ entry from C++-FAQ-Lite, and here is my refutation of it. People even write tech reports on the subject. They are far from the truth.
Certainly, fully automated decompilation of arbitrary machine-code programs is not possible -- this problem is theoretically equivalent to the Halting Problem, an undecidable problem in Computer Science. What this means is that automatic (no expert intervention) decompilation cannot be achieved for all possible programs that are ever written. Further, even if a certain degree of success is achieved, the automatically generated program will probably lack meaningful variable and function names as these are not normally stored in an executable file (except when stored for debugging purposes).
However, many useful algorithms have this sort of theoretical limitation. It is not possible to write a computer program that can perfectly route any arbitrary printed circuit board automatically. However, the best ones do a good job that can be improved with some expert intervention. They save an enormous amount of work. Similarly, it is not possible in general to solve problems as simple as the travelling salesman problem. Despite this, salesmen continue to travel! The interesting question is not "can we decompile arbitrary machine code programs?", because the answer is no. The interesting question is rather "what proportion of real-world programs can be decompiled to source code that is useful for various purposes?". The answer to this question is very much open at this point.
It may even be possible for a decompiler to automatically (i.e. with no human input) decompile a large fraction of real-world machine code programs. MikeVanEmmerik has shown in his confirmation report that a machine code disassembler (recover assembly language) has three fundamental problems to solve, while a machine code decompiler (recover high-level language) has seven such problems. Both require the separation of code and data, and of pointers and constants. In effect, decompilation is "merely" an extension of disassembly. If you can produce assembly language source for a program, then you can produce high-level language source for that program with more effort.
Note that producing assembly language source for a program is a lot harder than just running a dumb disassembler over the machine code. Compare saving the result from running the tool objdump -d
(a dumb disassembler) to running a sophisticated disassembler sich as IDA Pro
(an interactive disassembler), taking weeks to manually separate the code and data, adding meaningful names, comments, etc.
It is interesting to note that the authors of IDA Pro do not guarantee reassembly of the output from IDA Pro, or even believe that this is a good goal for their tool. (Source: mainly J. C. Roberts' comments in this General IDA Pro Forum titled "How do I translate EXE->ASM->EXE with IDA Pro?"). There are some technical issues like "what do you do with Delphi forms" and other resources; I simply dismiss these other objects as not being part of the source code. (You still need them to generate an equivalent program, but there are resource rippers and other programs to perform that task.) However, they seem to have resigned themselves to the fact that reassembling programs disassembled with IDA Pro is just too difficult for most people, and it would be too much work to support this activity, so they officially discourage it.