Introduction
One of the most common ways malware hides strings is to “stack” them: the string is built piece by piece into a buffer at runtime and used later. This technique has been discussed time and time again, but it is still common to find new malware that uses it.
Understanding string stacking is important because many malware analysis tools still rely on (and display) extracted strings alone. In some cases, a tool shows the ASCII strings and omits the Unicode strings, which is even worse.
Another recurring theme in security blog posts is the high barrier to entry in malware analysis. IDA Pro is expensive, and while the tool is very nice, it’s not always available. Binary Ninja attempts to solve this problem but still costs money. Cutter, on the other hand, is free to download and use.
In this blog post we’ll be writing a simple script to rebuild stack strings and automatically add comments to a binary.
This script should give a fairly gentle introduction to writing a plugin that can interface with Cutter and radare2. Before writing a plugin, let’s look at string stacking techniques in more depth.
Stacking Strings 1, 4, and 8 Bytes at a Time
Consider the following elementary C code:
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>
int main(int argc, char **argv) {
    char one[14] = "goodbye ";
    time_t t;
    srand((unsigned) time(&t));
    if ((rand() % 10) >= 5) {
        strcat(one, "world");
    }
    else {
        strcat(one, "moon");
    }
    printf("%s\n", one);
    return 0;
}
Running strings against the compiled binary (here using rabin2 -zzz for the strings output) shows the following:
019 0x0000079a 0x0000079a 9 10 (.text) ascii goodbye H
020 0x000007cc 0x000007cc 5 7 (.text) utf8 zgfff
021 0x0000081d 0x0000081d 5 6 (.text) ascii worlf
022 0x00000854 0x00000854 4 5 (.text) ascii moon
These results are expected – the ascii strings show up in the .text (executable code) section of the binary. If the intent is to hide these strings from a simple strings-like utility, the characters of the string can be broken up and placed individually into a char array.
The following code is a similar program, now with strings broken up into individual characters.
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>
int main(int argc, char **argv) {
    char one[14];
    time_t t;
    srand((unsigned) time(&t));
    // byte MOVs
    one[0] = 'g';
    one[1] = 'o';
    one[2] = 'o';
    one[3] = 'd';
    one[4] = 'b';
    one[5] = 'y';
    one[6] = 'e';
    one[7] = ' ';
    // same branch condition as the first program
    if ((rand() % 10) >= 5) {
        one[8] = 'w';
        one[9] = 'o';
        one[10] = 'r';
        one[11] = 'l';
        one[12] = 'd';
        one[13] = 0;
        printf("%s\n", one);
    } else {
        one[8] = 'm';
        one[9] = 'o';
        one[10] = 'o';
        one[11] = 'n';
        one[12] = 0;
        one[13] = 0;
        printf("%s\n", one);
    }
    return 0;
}
When running strings against the compiled output, the strings “goodbye”, “world”, and “moon” no longer appear.
The characters are still visible in a hexdump, but because each one is separated from the next by the opcode bytes that place it into the array, the strings utility no longer recognizes them as a contiguous string.
Below is a hexdump of the string being constructed:
000007a0 9c fe ff ff 89 c7 e8 85 fe ff ff c6 45 ea 67 c6 |............E.g.|
000007b0 45 eb 6f c6 45 ec 6f c6 45 ed 64 c6 45 ee 62 c6 |E.o.E.o.E.d.E.b.|
000007c0 45 ef 79 c6 45 f0 65 c6 45 f1 20 e8 80 fe ff ff |E.y.E.e.E. .....|
000007d0 89 c1 ba 67 66 66 66 89 c8 f7 ea c1 fa 02 89 c8 |...gfff.........|
000007e0 c1 f8 1f 29 c2 89 d0 c1 e0 02 01 d0 01 c0 29 c1 |...)..........).|
000007f0 89 ca 83 fa 04 7e 26 c6 45 f2 77 c6 45 f3 6f c6 |.....~&.E.w.E.o.|
00000800 45 f4 72 c6 45 f5 6c c6 45 f6 64 c6 45 f7 00 48 |E.r.E.l.E.d.E..H|
00000810 8d 45 ea 48 89 c7 e8 f5 fd ff ff eb 24 c6 45 f2 |.E.H........$.E.|
00000820 6d c6 45 f3 6f c6 45 f4 6f c6 45 f5 6e c6 45 f6 |m.E.o.E.o.E.n.E.|
00000830 00 c6 45 f7 00 48 8d 45 ea 48 89 c7 e8 cf fd ff |..E..H.E.H......|
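To see why the strings utility misses the stacked characters, here is a minimal sketch of a strings-style scan in Python. It is not how rabin2 or GNU strings is implemented; the function name ascii_runs, the chosen set of printable bytes, and the minimum length of 4 are illustrative assumptions. The sample bytes are the eight byte stores at 0x7ab through 0x7ca in the hexdump above.

```python
import string

# Printable ASCII bytes, minus vertical tab and form feed (an illustrative choice).
PRINTABLE = set(string.printable.encode()) - set(b"\x0b\x0c")


def ascii_runs(data: bytes, min_len: int = 4):
    """Yield printable runs of at least min_len bytes, like a basic strings pass."""
    run = bytearray()
    for b in data:
        if b in PRINTABLE:
            run.append(b)
            continue
        if len(run) >= min_len:
            yield run.decode("ascii")
        run.clear()
    if len(run) >= min_len:
        yield run.decode("ascii")


# Bytes 0x7ab-0x7ca from the hexdump: eight "c6 45 <disp> <char>" stores.
stacked = bytes.fromhex(
    "c645ea67 c645eb6f c645ec6f c645ed64"
    "c645ee62 c645ef79 c645f065 c645f120"
)

print(list(ascii_runs(stacked)))                 # []
print(list(ascii_runs(b"\x00goodbye \x00")))     # ['goodbye ']
```

Every printable byte in the stacked version is separated from the next by an opcode or displacement byte, so no run reaches the minimum length. The same characters stored contiguously are reported immediately.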
When opening this with a disassembler it becomes obvious how these bytes are treated.
| 0x000007a6 e885feffff call sym.imp.srand ; void srand(int seed)
| 0x000007ab c645ea67 mov byte [local_16h], 0x67 ; 'g'
| 0x000007af c645eb6f mov byte [local_15h], 0x6f ; 'o'
| 0x000007b3 c645ec6f mov byte [local_14h], 0x6f ; 'o'
| 0x000007b7 c645ed64 mov byte [local_13h], 0x64 ; 'd'
| 0x000007bb c645ee62 mov byte [local_12h], 0x62 ; 'b'
| 0x000007bf c645ef79 mov byte [local_11h], 0x79 ; 'y'
| 0x000007c3 c645f065 mov byte [local_10h], 0x65 ; 'e'
| 0x000007c7 c645f120 mov byte [local_fh], 0x20 ; "@"
| 0x000007cb e880feffff call sym.imp.rand ; int rand(void)
As in the C source, the bytes are stored one char at a time at consecutive offsets in a local array.
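The listing above suggests a simple recovery idea: find each byte store, note its stack offset, and reassemble the characters in offset order. The sketch below does this directly on raw bytes by matching the encoding c6 45 <disp8> <imm8>, which is mov byte [rbp+disp8], imm8. It is a naive illustration, not the script from this post: the function name rebuild_byte_stores is hypothetical, it handles only this one encoding, and a pattern match over raw bytes can misfire on unrelated data, which is why a real tool should work from the disassembler's output instead.

```python
def rebuild_byte_stores(code: bytes) -> str:
    """Collect 'mov byte [rbp+disp8], imm8' stores and join them by stack offset."""
    stores = {}
    i = 0
    while i + 4 <= len(code):
        if code[i] == 0xC6 and code[i + 1] == 0x45:
            # disp8 is a signed displacement from rbp
            disp = int.from_bytes(code[i + 2:i + 3], "little", signed=True)
            stores[disp] = code[i + 3]  # a later store to the same slot wins
            i += 4
        else:
            i += 1
    # Lowest offset first: that is the start of the array in memory.
    return bytes(stores[d] for d in sorted(stores)).decode("latin-1")


# Bytes 0x7ab-0x7ca from the hexdump earlier in this post.
stacked = bytes.fromhex(
    "c645ea67 c645eb6f c645ec6f c645ed64"
    "c645ee62 c645ef79 c645f065 c645f120"
)

print(repr(rebuild_byte_stores(stacked)))  # 'goodbye '
```

Sorting by displacement rather than by instruction order matters because a compiler is free to emit the stores in any order. Note that code with two branches writing to the same offsets, as in the sample program, needs per-branch handling that this sketch does not attempt.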
This code can be rewritten to use 4-byte MOVs (the hex-encoded string is “Error Command.” followed by a carriage return and line feed).
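#include <stdio.h>
#include <stdint.h>
int main(int argc, char **argv) {
    unsigned char stack_string[20];
    // each DWORD is stored little-endian, so the bytes land in string order
    ((uint32_t*)stack_string)[0] = 0x6f727245; // "Erro"
    ((uint32_t*)stack_string)[1] = 0x6f432072; // "r Co"
    ((uint32_t*)stack_string)[2] = 0x6e616d6d; // "mman"
    ((uint32_t*)stack_string)[3] = 0x0a0d2e64; // "d.\r\n"
    // no NUL terminator is written here, so printing relies on the next stack byte being 0
    printf("%s\n", (char *)stack_string);
    return 0;
}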
Running strings against this binary reveals fragments of the full string.
007 0x0000045c 0x0040045c 5 6 (.text) ascii $Erro
008 0x00000465 0x00400465 4 5 (.text) ascii r Co
009 0x0000046a 0x0040046a 7 8 (.text) ascii D$\bmman
010 0x00000472 0x00400472 7 8 (.text) ascii D$\fd.\r\n
In a disassembler, the DWORDs can be seen being stored into a local buffer.
| 0x0040045a c70424457272. mov dword [rsp], 0x6f727245 ; [0x6f727245:4]=-1
| 0x00400461 c74424047220. mov dword [local_4h], 0x6f432072 ; [0x6f432072:4]=-1
| 0x00400469 c74424086d6d. mov dword [local_8h], 0x6e616d6d ; [0x6e616d6d:4]=-1
| 0x00400471 c744240c642e. mov dword [local_ch], 0xa0d2e64 ; [0xa0d2e64:4]=-1
| 0x00400479 e8a2ffffff call sym.imp.puts ; int puts(const char *s)
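The immediates in the listing above look scrambled because x86 stores multi-byte values little-endian: the lowest byte of each DWORD is written first. A short Python sketch shows the conversion; the helper name immediates_to_text is hypothetical, and the values are copied from the disassembly.

```python
def immediates_to_text(values, width):
    """Concatenate immediates as little-endian byte strings of `width` bytes each."""
    raw = b"".join(v.to_bytes(width, "little") for v in values)
    return raw.decode("ascii", errors="replace")


# The four DWORD immediates from the disassembly, in store order.
dwords = [0x6F727245, 0x6F432072, 0x6E616D6D, 0x0A0D2E64]

print(repr(immediates_to_text(dwords, 4)))   # 'Error Command.\r\n'
print((0x6F727245).to_bytes(4, "big"))       # b'orrE', the order seen in the hex literal
```

The same helper works for QWORD immediates by passing a width of 8.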
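We can go even further and write the string as a series of 8-byte MOVs with the following code.
#include <stdio.h>
#include <stdint.h>
int main(int argc, char **argv) {
    unsigned char stack_string[20];
    // each QWORD is stored little-endian, so the bytes land in string order
    ((uint64_t*)stack_string)[0] = 0x6f4320726f727245; // "Error Co"
    ((uint64_t*)stack_string)[1] = 0x0a0d2e646e616d6d; // "mmand.\r\n"
    printf("%s\n", (char *)stack_string);
    return 0;
}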
This results in the following strings:
010 0x000004f1 0x004004f1 4 5 (.text) ascii =A\v
011 0x00000546 0x00400546 9 10 (.text) ascii Error CoH
012 0x00000554 0x00400554 9 10 (.text) ascii mmand.\r\nH
013 0x0000055e 0x0040055e 4 5 (.text) ascii D$\bH
With wider MOVs, more of the string is visible in the strings output. Depending on how the string is pushed, the chunks may appear in reverse order, which is easy to overlook when skimming strings output. In the disassembly, each QWORD is loaded into a register and then stored into the buffer.
| 0x00400544 48b84572726f. movabs rax, 0x6f4320726f727245
| 0x0040054e 48890424 mov qword [rsp], rax
| 0x00400552 48b86d6d616e. movabs rax, 0xa0d2e646e616d6d
| 0x0040055c 4889442408 mov qword [local_8h], rax
Note: Changing the compiler optimization level can change how the strings are constructed in the assembly.
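In the 8-byte listing above, the immediate never goes straight to memory: it is loaded into a register with movabs and then stored. Recovering the string therefore means remembering the last constant each register held. The following Python sketch does this over the text of the listing. The regular expressions, the function name rebuild_from_listing, and the assumption that stores happen in string order are all simplifications made for illustration; this is not the script from this post.

```python
import re

# movabs <reg>, <imm64>
MOVABS = re.compile(r"movabs\s+(\w+),\s*(0x[0-9a-fA-F]+)")
# mov qword [<dest>], <reg>
STORE = re.compile(r"mov\s+qword\s+\[([^\]]+)\],\s*(\w+)")


def rebuild_from_listing(lines):
    """Track constants loaded with movabs and emit them when stored to memory."""
    regs = {}
    chunks = []
    for line in lines:
        m = MOVABS.search(line)
        if m:
            regs[m.group(1)] = int(m.group(2), 16)
            continue
        m = STORE.search(line)
        if m and m.group(2) in regs:
            # Little-endian: the low byte of the register lands first in memory.
            chunks.append(regs[m.group(2)].to_bytes(8, "little"))
    return b"".join(chunks).decode("ascii", errors="replace")


listing = [
    "0x00400544 48b84572726f. movabs rax, 0x6f4320726f727245",
    "0x0040054e 48890424 mov qword [rsp], rax",
    "0x00400552 48b86d6d616e. movabs rax, 0xa0d2e646e616d6d",
    "0x0040055c 4889442408 mov qword [local_8h], rax",
]

print(repr(rebuild_from_listing(listing)))   # 'Error Command.\r\n'
```

A more careful version would order the chunks by destination offset instead of instruction order and would invalidate a register when it is overwritten by something other than a constant.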
Enter Cutter
Cutter is the growing and maturing GUI for the radare2 reverse engineering framework. Cutter provides an integrated IPython shell and a scripting interface.
By leveraging both Cutter and radare2, a script can automatically rebuild and comment the strings, a task that is tedious to do by hand in large binaries.
The script simply crawls each function within the binary and builds a list of candidate stack strings, doing some simple filtering at the end. Once a string is found it’s commented at the appropriate offset.
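The filtering and commenting steps described above can be sketched as follows. This is a hedged outline rather than the actual script: the names looks_like_string, comment_command, and annotate are hypothetical, the filter rules (minimum length 4, printable ASCII plus CR, LF, and tab) are assumptions, and the comment is written with radare2's CCu command using a base64-encoded body, which is assumed here to be accepted by the radare2 build bundled with Cutter. The command runner is passed in as a parameter so the logic can be exercised without Cutter; inside Cutter, cutter.cmd would be the natural choice.

```python
import base64


def looks_like_string(candidate: str, min_len: int = 4) -> bool:
    """Simple filter: long enough, and only printable ASCII, CR, LF, tab, or NUL."""
    if len(candidate.rstrip("\x00")) < min_len:
        return False
    return all(32 <= ord(c) < 127 or c in "\r\n\t\x00" for c in candidate)


def comment_command(addr: int, text: str) -> str:
    """Build a comment command; base64 keeps CR/LF and ';' from breaking the command."""
    encoded = base64.b64encode(text.encode()).decode()
    return f"CCu base64:{encoded} @ {addr:#x}"


def annotate(candidates, run_cmd):
    """candidates: iterable of (address, string) pairs; run_cmd: command runner."""
    kept = []
    for addr, text in candidates:
        if looks_like_string(text):
            clean = text.rstrip("\x00")
            run_cmd(comment_command(addr, clean))
            kept.append((addr, clean))
    return kept


# Dry run: collect the commands instead of sending them to radare2.
issued = []
found = annotate(
    [(0x400544, "Error Command.\r\n"), (0x4004F1, "=A\x0b"), (0x400600, "ab")],
    issued.append,
)
print(found)
print(issued)
```

Passing the runner in as an argument also makes it easy to log each recovered string to the console alongside the comment.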
To open IPython within Cutter, select Jupyter from the menu and open the Python script.
Opening the 8-byte MOV binary with Cutter and running the script will produce the following output:
Looking at an APT backdoor like IXESHE, analyzing the backdoor becomes much easier when a script is rebuilding the strings. By viewing the comments section within Cutter, one is able to cross-reference the hidden strings from where they existed within the binary.
Double clicking on the comment will automatically browse to the location of the comment within the code. By cross-referencing the string at “UNKNOWN COMMAND” we can discern how it was built character by character.
In addition to adding a comment, the script will log the output to the Jupyter console.
Conclusion
A full copy of this script can be found HERE on my GitHub. I welcome any improvements.
This method is well documented and not new or unique. With that said, it’s still in use today, and many malware zoos do not automatically look for stacked strings, which can, in some cases, mislead an analyst. Understanding this method is important to a malware researcher, and automating it can save valuable time.
There are plenty of ways to bypass this type of script, and we’ll slowly release more tools that deal with these one-off cases. Until then, looking at tools like FLOSS and reading Megabeets’ excellent writeup of DROPSHOT can provide insight into other methods of decoding and finding strings in malware.
By extending the capabilities of open source software, the gap with expensive reversing tools is slowly closing.
Appendix
Some Notes on Rust
While C is easy enough to profile and study, languages like Rust make it a little more difficult. As with C, there are a variety of ways to hide strings within a Rust application. The hope is that compiled output from rustc resembles the source code closely enough for the approach detailed in this post to work. Since Rust doesn’t rely on an intermediate language, there’s a good chance the logic above will still work.
Zignatures
When building an application with Rust, one of the first things noticed in a disassembler is the sheer number of functions that even a simple program contains.
This has been blogged about in the past, but if the application is stripped, generating zignatures from a simple Rust application can save loads of time when analyzing a payload. Zignatures will profile each function and apply the symbol name to any matched functions.
To generate a zignature, use the following commands.
aaa; zg; zos rust_signature_names.zig
To load the zignatures, use the following command.
zo rust_signature_names.zig
Regular Strings in Rust
Consider the following application:
fn main() {
    let foo = String::from("On the moon!");
    println!("{}", foo);
}
When running strings against the compiled application, the ASCII strings exist in clear text. Note that the string appears in the .rodata section.
4806 0x00057f00 0x00057f00 13 14 (.rodata) ascii On the moon!\n
If we change the string to mutable and append information onto it, the string will still be written to .rodata.
5110 0x00059e70 0x00059e70 126 127 (.rodata) ascii attempt to divide by zerodestination and source slices have different lengthslibcore/slice/mod.rsOn the moon! And back again!\n
Hidden Strings
A sample program is written using the string push technique provided in Rust’s standard library:
fn main() {
    let mut hidden_string = String::from("O");
    hidden_string.push('n');
    hidden_string.push(' ');
    hidden_string.push('t');
    hidden_string.push('h');
    hidden_string.push('e');
    hidden_string.push(' ');
    hidden_string.push('M');
    hidden_string.push('o');
    hidden_string.push('o');
    hidden_string.push('n');
    hidden_string.push('!');
    println!("{}", hidden_string);
}
After compiling and opening in Cutter, the following code can be observed.
The script from this post won’t cover this case, but the generated code is repetitive enough that the script could be modified to rebuild these strings.
Vector to String
Consider the following code:
fn main() {
    let bytes = vec![0x66, 0x69, 0x6f, 0x6e, 0x61, 0x20, 0x74, 0x68, 0x65, 0x20, 0x68, 0x69, 0x70, 0x70, 0x6f, 0];
    let s = String::from_utf8(bytes).expect("Found invalid UTF-8");
    println!("{}", s);
}
If we store the bytes of a string in a vector, the compiler conveniently lays them out in a C-style manner, as seen below. The script posted in this writeup won’t need any modifications to be able to read and reassemble the following string:
The script provided in this writeup gives a good initial base to build upon and cover the fringe cases. One of the best methods of learning is to build small applications that hide strings in a variety of ways and attempt to recover them.