Boomerang is an attempt at a complete, retargetable decompiler for native executable programs, released under a BSD style (open source) license.
In July 2004, there was a preview binary distribution (version alpha 0.1). Source code is available via CVS. Boomerang is now at the stage where it can decompile correctly several small Pentium and SPARC programs. The dataflow design is finally complete; it uses a combination of fairly standard SSA (Static Single Assignment) and a small theorem prover. It recovers parameters and return values fairly well.
Boomerang was used in a small, successful commercial decompilation project.
An experience report will be published at WCRE 2004 (November), titled
"Using a Decompiler for Real-World Source Recovery".
During this project, several significant enhancements were made to Boomerang, including the ability to
specify names and types in a symbol file, support for the thiscall
calling convention, and so on.
The next major step is that of type analysis; work has started.
The SourceForge page is http://boomerang.sourceforge.net.
#include < stdio.h > int fib (int x) { if (x > 1) return (fib(x - 1) + fib(x - 2)); else return (x); } int main (void) { int number, value; printf ("Input number: "); scanf ("%d", &number); value = fib(number); printf("fibonacci(%d) = %d\n", number, value); return (0); }As of August 2003, this is the output from Boomerang (
test/pentium/switch_gcc
):
int main() { int local0; ... int local9; printf("Input number: "); scanf("%d", &local7); if (local7 <= 1) { local9 = local7; } else { local9 = fib(local7 - 1); local8 = fib(local7 - 2); local9 = local8 + local9; } printf("fibonacci(%d) = %d\n", local7, local9); local9 = 0; return local9; } int fib(int param8) { int local0; ... int local7; if (param8 <= 1) { local7 = param8; } else { local7 = fib(param8 - 1); local6 = fib(param8 - 2); local7 = local6 + local7; } return local7; }Boomerang is finally able to process Fibo. The outout compiles and runs correctly with no modifications. There are a few more local variables that I would like, but this is a minor point, and could be fixed soon anyway. The Pentium and Sparc versions both decompile to very similar code, despit the very different code at the machine language level.
int proc1(int a, int b) { return a + b; } int main() { printf("%i\n", proc1(3, 4)); }The Boomerang output is
int main() { int local0; ... int local5; local5 = proc1(4, 3); local5 = printf("%i\n", local5); return local5; } int proc1(int param4, int param7) { int local0; int local1; local1 = param4 + param7; return local1; }Again, the code recompiles and runs correctly with no modifications. Again, there are too many local variables. At one stage, with global dataflow analysis, main transformed to essentially "printf("%i\n", 7)", but this no longer happens (dataflow is local to a procedure again).
int main(int argc) { switch(argc) { case 2: printf("Two!\n"); break; case 3: printf("Three!\n"); break; case 4: printf("Four!\n"); break; case 5: printf( "Five!\n"); break; case 6: printf( "Six!\n"); break; case 7: printf( "Seven!\n"); break; default: printf( "Other!\n"); break; } return 0; }
Here is the Boomerang output (July 2004), using the -dt (use and debug type analysis):
int global0[1] = { 134515020 }; char* global1 = "Other!\n"; int main(int argc, char** argv, char** envp) { int local1; // m[r28{0} - 8] int local3; // r29 int local4; // r28 if ((unsigned)(argc - 2) > 5) { local1 = "Other!\n"; } else { switch(argc) { case 2: local1 = "Two!\n"; break; case 3: local1 = "Three!\n"; break; case 4: local1 = "Four!\n"; break; case 5: local1 = "Five!\n"; break; case 6: local1 = "Six!\n"; break; case 7: local1 = "Seven!\n"; break; } } printf(local1, local3, *(int*)local4, argc, argv); return 0;/* local4 + 4, 0*/ }
This compiles and runs correctly. It would be nice to convert the if statement near the start into a "default" case. Again, there are too many locals; this should be fixed soon.