Malware Analysis Series (MAS): Part 2

Chamindu Pushpika
Feb 14, 2023

Malware analysis goals

No doubt, it’s an interesting question: what are we looking for when we analyze a binary?

The question is relevant because there are many possible objectives and aspects to consider when analyzing malware. Nonetheless, during a real-world investigation there are other areas just as important as malware analysis and, of course, we should keep them in mind at every stage of our analysis:

  • Memory Analysis: an extremely powerful technique that has proved its value over the last 10 years. It is often used as a first approach during investigations to understand the infection events, their consequences and side effects, and it makes it possible to acquire a great deal of evidence that might be hard to collect from disk or any other source.
  • Network Analysis: a quite useful resource (pcap files, for example) for detecting and understanding unauthorized communication, such as a C2 (command and control) channel, through traffic analysis. It also makes it possible to gather artifacts (for example, binary files, malicious documents and Cobalt Strike beacons).
  • Filesystem/Disk Analysis: the last frontier of any investigation, where we can detect and analyze the side effects of an adversary’s intrusion, breaches, fraud, leaks and, of course, malware infections.

Once again, all of them are very important and should be used in every real-world investigation. However, let’s return to the key point: why should you learn about reverse engineering and, in particular, malware analysis? Simple: through malware analysis you have the opportunity to learn, from the source of the evil itself, about the intentions and objectives of the adversary and not only about the effects. In other words, you can learn techniques, tricks and evasion strategies and, if you’re lucky, you can collect important artifacts to make the correct attribution (most of the time, a hard task) and, who knows, help to arrest the bad guys.

Therefore, before starting any reversing task, we should remember that there are many questions we should consider and ask ourselves:

  • Is the binary packed? If it is, is the malware using a well-known packer or a custom one?
  • What networking technique/API set is the malware using? Among the available options, such as Winsock2, Wininet, COM (Component Object Model), something at a lower level such as WSK (Winsock Kernel), or even a custom implementation, which one is being used?
  • Is there any code injection or hooking technique being used? Which one?
  • What anti-forensic techniques are used? Is there any anti-debugging technique? Anti-disassembly? Anti-VM/Sandbox?
  • Is there any API/DLL encoding?
  • Are strings encrypted?
  • What synchronization primitives are being used by the malware? Sometimes they hide important anti-debugging techniques.
  • Which cryptographic algorithms are being used by the malware?
  • What persistence methods are being used by the threat: Registry, services, tasks or kernel drivers?
  • Is there any shellcode being injected into an operating system process?
  • Is there any file system mini-filter driver being installed by the malware?
  • If a kernel driver is being installed, is there any callback (a kind of modern hook) or timer being installed?

In this first article, we’ll focus on only two short objectives:

  1. unpacking the malware
  2. extracting and decrypting its C2 configuration data

We’ll review some well-known techniques for unpacking malware as well as different methods to extract C2 configuration data. Furthermore, I’m going to provide minimal background on some basic topics so that readers can continue their own research on them.

Gathering initial information

The first sample has the following hash:

(SHA256) 8ff43b6ddf6243bd5ee073f9987920fa223809f589d151d7e438fd8cc08ce292
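Before going any further, it can be worth confirming that the file you downloaded really is the sample above. The snippet below is only a minimal sketch: the function names and the path used in the usage note are hypothetical and not part of the original analysis.

```python
import hashlib

# SHA256 of the sample, copied from the article.
EXPECTED_SHA256 = "8ff43b6ddf6243bd5ee073f9987920fa223809f589d151d7e438fd8cc08ce292"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that large samples are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file hash with the expected one, ignoring letter case."""
    return sha256_of_file(path) == expected.strip().lower()
```

For example, matches_expected("sample.bin") would return True only if the hypothetical file sample.bin is byte-for-byte the sample discussed here.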

We can collect a lot of information from sources such as Malware Bazaar, as shown in the figure below:

According to Figure 1, we have some important information:

  • The target malware seems to belong to the Hancitor family.
  • It uses the EnumerateProcesses( ) function, so it could be interesting to understand whether there is any special reason for that (code injection, for example).
  • WriteProcessMemory( ) is triggered, as we usually see in unpacking procedures and code injection, so no news is good news here.

Unpacking Concepts Review

Every time I hear someone talking about unpacking, the impressions seem to converge on the same conclusion: it might not be so easy. Of course, as mentioned previously, unpacking a sample is likely the first step before possible string decryption and API/DLL resolving, for example, but we need to start somewhere and with a goal.

There’s a long list of reasons and motivations for packing malicious code:

  • It makes the malicious code “hidden” from AV. Of course, it isn’t that hidden, but it’s a soft evasion technique that makes the analyst’s life a bit harder and, eventually, causes some problems for defenses.
  • A packed sample doesn’t reveal the actual goals of the malware.
  • It can be difficult to unpack it dynamically because of the many anti-analysis techniques (anti-debugger and anti-VM tricks) that must be circumvented.
  • Malware usually packs valuable code in several layers using customized routines.
  • Eventually, the whole malware or only the unpacking code might be polymorphic.

There are many old, well-known packers for which we have procedures to unpack the code they hide, but most malware authors have been using customized packers to make their code undetectable by security defenses. Additionally, there are some special packers (also known as protectors) such as Themida, Arxan, VMProtect, Agile .NET and many others that usually virtualize instructions and implement all kinds of anti-forensic and obfuscation techniques. A few of their characteristics are presented below:

  • They have been used on 64-bit binaries.
  • The IAT (Import Address Table) might have been removed or, at most, there could be only one imported function.
  • As usual, most strings are encrypted.
  • Memory integrity is checked and protected, so it isn’t possible to dump a clean executable from memory because the original instructions are never completely decoded there.
  • Instructions are virtualized and, surprisingly, translated to RISC instructions.
  • These virtualized instructions are encrypted in memory.
  • The obfuscation is stack-based, so it is quite difficult to handle virtualized code using a static approach.
  • Most of the virtualized code is polymorphic, so there are many virtual instructions referring to the same original instruction.
  • There are thousands of lines of fake “push” instructions and, of course, many of them contain dead and useless code.
  • These protectors implement code reordering using unconditional jumps.
  • All these modern packers use code flattening and many anti-debugging and anti-VM techniques.
  • Not all x64 instructions are virtualized, so you will find binary code containing a mix of virtualized and non-virtualized (native) instructions.
  • Most of the time, function prologues and epilogues are not virtualized.
  • The original code section could be split and/or scattered around the program, so instructions and data end up mixed.
  • Instructions referring to imported functions might be zeroed or even replaced by NOPs, in which case these “references” are restored dynamically. Sometimes these references aren’t zeroed but replaced by jump instructions using an RVA to the same import address, a technique also known as “IAT obfuscation”.
  • As in shellcode and common malware, API names are hashed.
  • The translation from native registers to virtualized registers is usually one-to-one, but not always. Furthermore, there is a context switch component that is responsible for transferring registers and flag information into the virtual machine context.
  • Virtual machine handlers come from data blocks.
  • Many native APIs are redirected to stub code that forwards the call.
  • Obfuscation techniques such as constant unfolding, pattern-based obfuscation, control indirection, inline functions, code duplication and, mainly, opaque predicates are used.

Before and during the unpacking task, there are many observations and questions that we could think about:

  • Is the malware really packed?
  • What is the evidence that the code is packed?
  • Does the malware perform self-injection or remote injection?
  • Does the malware perform self-overwriting?
  • Where is the payload being written?
  • How is the payload going to be executed?
  • What is the evidence that the code is really unpacked after the unpacking procedure?
  • Are there additional packed layers?

The first point of the list above raises a key question: how do we know whether a piece of malware is really packed?

There isn’t an easy and definitive answer to this question, but a set of two or more pieces of evidence could indicate that the sample is packed:

  • The binary sample has few imported DLLs and functions.
  • There are many obfuscated strings.
  • Existence of specific system calls.
  • Non-standard section names.
  • Uncommon executable sections (only the .text/.code section should be executable).
  • Unexpected writable sections.
  • High-entropy sections (usually above 7.0, but not always; this is a weak indicator).
  • Substantial difference between the raw size and the virtual size of a section.
  • Zero-sized sections.
  • Missing APIs related to network communication.
  • Lack of APIs essential to the malware’s functionality (Crypt* functions in ransomware, for example).
  • Unusual file format and headers.
  • Entry point pointing to a section other than the .text/.code section.
  • Large resource section (.rsrc section) combined with the use of the LoadResource( ) function in the code.
  • Presence of an overlay.
  • Opening it in IDA Pro and observing a large amount of data or unexplored code on the colored navigation bar.
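Some of the section-related indicators above (non-standard names, zero-sized sections, raw versus virtual size, entropy, sections that are both writable and executable) can be checked automatically. The following is only a sketch using the Python standard library: the thresholds, the list of “common” section names and the function names are my own assumptions, and a real triage would normally rely on a tool such as Pestudio or PE-bear.

```python
import math
import struct
from collections import Counter

IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_WRITE = 0x80000000
# Assumption: names considered "standard" for this sketch only.
COMMON_NAMES = {".text", ".code", ".data", ".rdata", ".idata", ".edata",
                ".rsrc", ".reloc", ".pdata", ".bss", ".tls"}
ENTROPY_LIMIT = 7.0   # weak indicator, as noted in the list above
SIZE_RATIO = 4        # assumption: virtual size > 4x raw size is "substantial"


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def parse_sections(pe: bytes):
    """Yield (name, virtual_size, raw_size, raw_pointer, flags) per section."""
    if len(pe) < 0x40 or pe[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    e_lfanew = struct.unpack_from("<I", pe, 0x3C)[0]
    if pe[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    _, count, _, _, _, opt_size, _ = struct.unpack_from("<HHIIIHH", pe, e_lfanew + 4)
    table = e_lfanew + 24 + opt_size  # signature + COFF header + optional header
    for i in range(count):
        name, vsize, _va, rsize, rptr, _, _, _, _, flags = struct.unpack_from(
            "<8sIIIIIIHHI", pe, table + 40 * i)
        yield name.rstrip(b"\0").decode("ascii", "replace"), vsize, rsize, rptr, flags


def packing_indicators(pe: bytes) -> list:
    """Return human-readable hits; two or more suggest a closer look."""
    hits = []
    for name, vsize, rsize, rptr, flags in parse_sections(pe):
        if name not in COMMON_NAMES:
            hits.append(f"{name}: non-standard section name")
        if rsize == 0:
            hits.append(f"{name}: zero raw size")
        elif vsize > SIZE_RATIO * rsize:
            hits.append(f"{name}: virtual size much larger than raw size")
        if shannon_entropy(pe[rptr:rptr + rsize]) > ENTROPY_LIMIT:
            hits.append(f"{name}: high entropy")
        if flags & IMAGE_SCN_MEM_EXECUTE and flags & IMAGE_SCN_MEM_WRITE:
            hits.append(f"{name}: writable and executable")
    return hits
```

Calling packing_indicators(open(path, "rb").read()) on a sample returns a list of hits, which should be read as hints to combine with the other indicators and never as a verdict.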

It’s worth highlighting one point: the occurrence of only one characteristic from the list above doesn’t determine that the malware is packed. Thus, it’s quite important to consider two or more of them. Furthermore, there are additional observations to consider:

  • Most samples resolve their APIs dynamically using LoadLibrary( ) followed by GetProcAddress( ), for example (except in reflective code injection cases).
  • Network APIs could also be dynamically resolved.
  • Malformed headers might be a bit difficult to detect on first analysis.
  • A big resource section might not be relevant because it might contain only GUI artifacts and digital certificates.
  • There might be a mix of encrypted/obfuscated strings and plain text strings, making it a bit harder to decide whether or not the binary is packed.

Unpacking with a debugger might bring a list of challenges to be understood and bypassed:

  • Anti-debugging techniques (time checking, CPUID, heap checking, debugging flag checking, NtSetInformationThread( ), and so on), so it’s recommended to use an anti-anti-debugging plugin such as ScyllaHide on x64dbg/x32dbg or even StrongOD on OllyDbg (there are some repositories containing OllyDbg with all the useful associated plugins already built in; use Google to find them).
  • Anti-VM tricks checking for VMware, VirtualBox, Hyper-V and QEMU artifacts, for example.
  • Filename, hostname and account checking (avoid using the hash as the filename).
  • Available disk size on the virtual machine (at least 100 GB is recommended).
  • Number of processors on the testing virtual machine (two or more would be suitable).
  • Uptime (try to keep a virtual machine snapshot with uptime above 20 minutes).
  • Many nonsense calls (whose results are never used) and non-existent APIs (fake APIs).
  • Exception handlers being used as an anti-debugging technique.
  • Software breakpoints being cleared and debug registers (DR#) being manipulated (anti-breakpoint techniques).
  • Hash functions using typical algorithms (for example, crc32, conti, add_ror13, ...) being used.
  • Malicious code checking for well-known tools such as Process Hacker, Process Explorer, Process Monitor and so on (it’s recommended to rename these executables before using them).
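The hash functions mentioned in the list above are usually applied to API names, so that the names never appear as plain strings. As an illustration, here is a hedged sketch of one common rotate-right-by-13 plus add variant and of a CRC32 variant. Real samples frequently tweak the algorithm (hashing the terminating null byte, upper-casing the name, mixing in the module name), so treat the exact routine and all function names here as assumptions rather than the algorithm of any specific family.

```python
import zlib


def ror32(value: int, bits: int) -> int:
    """Rotate a 32-bit value right by the given number of bits."""
    bits %= 32
    return ((value >> bits) | (value << (32 - bits))) & 0xFFFFFFFF


def ror13_add_hash(name: str) -> int:
    """Rotate the accumulator right by 13, then add the next character."""
    h = 0
    for byte in name.encode("ascii"):
        h = (ror32(h, 13) + byte) & 0xFFFFFFFF
    return h


def crc32_hash(name: str) -> int:
    """CRC32 of the API name, another algorithm often seen in the wild."""
    return zlib.crc32(name.encode("ascii")) & 0xFFFFFFFF


def build_lookup(names, hash_fn=ror13_add_hash) -> dict:
    """Map hash value to API name so constants found in code can be resolved."""
    return {hash_fn(name): name for name in names}
```

In practice, you would feed build_lookup( ) with the export names of the DLLs the sample is likely to use and then look up the 32-bit constants found in the disassembly.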

Unfortunately, anti-VM tricks and anti-debugger techniques cannot always be handled by plugins, and we will have to bypass them manually in the debugger. In this case, an interesting possibility is to use a different debugger such as WinDbg to handle malware that expects only ring 3 debuggers and not kernel debuggers (a recent case is the GuLoader malware).

Even during or after unpacking, we might need to fix the resulting binary because of one or more of the following issues:

  • The DOS/PE header could have been destroyed in memory or modified by a compression library.
  • In many cases, when you extract a binary from memory, you need to clean it up because there’s some garbage before its DOS header (MZ signature) and PE header.
  • The entry point (EP) could have been zeroed or be wrong.
  • The unpacked binary might have its Import Table broken because it was dumped from memory and its addresses refer to virtual addresses (mapped version instead of unmapped version), so it shows unaligned sections or no sections at all.
  • The base address is wrong.
  • Fields of the PE format present some inconsistency.
  • It could be hard to determine the OEP (Original Entry Point), which usually appears after a transition from the unpacker code through an indirect call or jump (call [eax] or jmp [eax], for example). Additionally, the existence of unresolved APIs could be evidence that the malicious code hasn’t reached the OEP yet. As a reminder: the OEP is the entry point (EP) of an executable before it was packed. After packing, a new EP is associated with the packer itself.
  • Mutexes being used as a kind of “unlock key” between two unpacking layers. In this case, the second stage of unpacking doesn’t happen unless the first stage has happened, and the existence of the mutex is what confirms that it has.
  • The code might be overwriting itself.
  • The first stage of unpacked code doesn’t run from just any directory, but only from a specific one.
  • You might have extracted a decoy binary. In many real cases, malware authors pack one or more useless executables as decoys to waste the analyst’s time. Thus, it would be wise not to assume you’ve unpacked the correct binary from memory at the first attempt.
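The clean-up of garbage before the MZ signature, mentioned in the list above, is normally done by hand in a hex editor, but the idea can be sketched in a few lines. This is an illustrative example only: the function names are hypothetical, and the check is deliberately simple (an MZ signature whose e_lfanew field points to a PE signature), so it can still be fooled by a crafted or damaged header.

```python
import struct


def find_pe_offset(dump: bytes) -> int:
    """Return the offset of the first plausible MZ/PE header pair, or -1."""
    pos = dump.find(b"MZ")
    while pos != -1:
        # The DOS header is 0x40 bytes; e_lfanew is its last field (offset 0x3C).
        if pos + 0x40 <= len(dump):
            e_lfanew = struct.unpack_from("<I", dump, pos + 0x3C)[0]
            pe_at = pos + e_lfanew
            if dump[pe_at:pe_at + 4] == b"PE\0\0":
                return pos
        pos = dump.find(b"MZ", pos + 1)
    return -1


def carve_pe(dump: bytes) -> bytes:
    """Drop everything before the first plausible DOS header."""
    offset = find_pe_offset(dump)
    if offset < 0:
        raise ValueError("no MZ/PE header pair found in the dump")
    return dump[offset:]
```

Note that this sketch only removes leading garbage. Trailing data after the image is left untouched, and a header that was destroyed in memory will simply not be found.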

This list of issues is very limited and there are countless other possible side effects on unpacked binaries. Of course, distinct solutions exist for each one of these issues, and they will be explained with examples in the next articles of this series. Anyway, a few approaches for handling some of them are:

  • Copy a good PE header from another executable (or from the malware sample itself) and align sections considering whether the unpacked binary is unmapped (.text section usually starts at 0x400) or mapped (.text section usually starts at 0x1000).
  • Align the sections of an unpacked binary (mapped addressing) by fixing their respective Raw Address and Raw Size. This usually fixes the Import Table and makes it possible to view the imported functions without any issues. Pay attention to possible “traps”: some unpacked binaries don’t show their Import Table until you’ve aligned their sections. However, other malware doesn’t have any function in the Import Table even after you have unpacked the binary. That doesn’t mean you made a mistake; it means the malware resolves all its APIs dynamically.
  • Reconstruct the IAT and force the OEP (Original Entry Point).
  • If you’re having trouble finding the OEP, remember that the OEP likely comes after the IAT has been resolved. In this case, one possible approach is to check whether the IAT is already resolved (check for Intermodular Calls in x64dbg or OllyDbg) or to set a breakpoint on a critical API that would be executed during a key operation of the malware (CryptAcquireContext( ) in ransomware, for example), because the IAT will certainly be resolved by the time execution reaches these critical APIs. Afterwards, the suggestion is to look for unconditional jumps to specific memory addresses or even indirect calls (call [eax], for example). Another interesting approach is to use the graph view of a debugger (“g” in x64dbg) and check for these transition points (indirect calls or unconditional jumps to memory addresses) in the last “code blocks”. Finally, a specialized tool might help you find the OEP. As you’ve noticed, there isn’t a single approach to do this.
  • Adjust the base address to match the base address of the segment dumped from memory.
  • To detect malware overwriting itself, we could set a breakpoint on the .text/.code section. In this case, we could choose to trigger this breakpoint on write or on execution.
  • In two-stage unpacking cases, the first unpacked binary might be a DLL. Therefore, depending on the context, it might be useful to convert the DLL into an executable. There are many ways to accomplish this task, but my favorite method is editing the PE header to alter the Characteristics field and set the address of the exported function as the entry point.
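The section alignment described above (fixing Raw Address and Raw Size of a binary dumped in its mapped layout) is usually done manually in a PE editor or with pe_unmapper. To make the idea concrete, here is a minimal sketch that rewrites each section header so that its raw pointer and raw size mirror the virtual address and virtual size. The function name is hypothetical, and this is not how pe_unmapper works internally (that tool converts the image back to its unmapped layout instead); it only mimics the manual fix.

```python
import struct

SECTION_FMT = "<8sIIIIIIHHI"   # one 40-byte IMAGE_SECTION_HEADER
SECTION_SIZE = struct.calcsize(SECTION_FMT)


def realign_mapped_dump(dump: bytes) -> bytes:
    """Return a copy whose section headers match the in-memory (mapped) layout."""
    image = bytearray(dump)
    if len(image) < 0x40 or image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    e_lfanew = struct.unpack_from("<I", image, 0x3C)[0]
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    count = struct.unpack_from("<H", image, e_lfanew + 6)[0]      # NumberOfSections
    opt_size = struct.unpack_from("<H", image, e_lfanew + 20)[0]  # SizeOfOptionalHeader
    table = e_lfanew + 24 + opt_size
    for index in range(count):
        offset = table + SECTION_SIZE * index
        fields = list(struct.unpack_from(SECTION_FMT, image, offset))
        virtual_size, virtual_address = fields[1], fields[2]
        fields[3] = virtual_size      # SizeOfRawData    := VirtualSize
        fields[4] = virtual_address   # PointerToRawData := VirtualAddress
        struct.pack_into(SECTION_FMT, image, offset, *fields)
    return bytes(image)
```

After this change, a PE viewer reads each section from the offset where it really sits in the dump, which is often enough for the Import Table to show up. Other fields, such as the image base or the file alignment, may still need manual attention.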

To visualize, handle and fix most issues after unpacking a binary, you can use the following well-known tools:

  • PE-bear is an excellent tool written by Aleksandra Doniec (a.k.a. Hasherezade) that’s used to visualize details of a PE header and fix many binary issues. You can download this tool from: https://github.com/hasherezade/pe-bear-releases
  • Pestudio is a great tool written by Marc Ochsenmeier, mainly used to triage and collect different information about potential malware. The tool (free and paid versions) is available here: https://www.winitor.com/features
  • CFF Explorer, which is part of the Explorer Suite, is a well-known PE editor used to visualize and fix PE headers. The Explorer Suite can be downloaded from: https://ntcore.com/?page_id=388
  • pe_unmapper is another tool written by Aleksandra Doniec (a.k.a. Hasherezade) that can be used for converting a PE binary from its mapped version to its unmapped version, thus fixing PE alignment issues. This tool can be downloaded from: https://github.com/hasherezade/libpeconv/tree/master/pe_unmapper
  • Scylla is an amazing x86/x64 import reconstructor that is already embedded in x64dbg. If you need the standalone version, you can download it from: https://github.com/NtQuery/Scylla
  • HxD is an excellent hex editor that can be used, for example, to check and fix PE headers manually. It can be downloaded from: https://mh-nexus.de/en/hxd/
  • XVI32 Hex Editor is another interesting hex editor that is great for cleaning up dumped memory regions to isolate the unpacked binary. XVI32 Hex Editor can be downloaded from: http://www.chmaas.handshake.de/delphi/freeware/xvi32/xvi32.htm

Once again, remember that unpacking is only the first obstacle in malware analysis, and many other hard challenges such as string de-obfuscation, API resolving, C2 configuration extraction, C2 emulation and other topics are also on our list. This series will cover several unpacking situations and all of these tasks in the next articles.

Now that we have a minimal knowledge of the unpacking process, its issues and solutions, it’s time to review different code injection techniques, which could help you better understand unpacking.

We will meet in the next article. 👋

Chamindu Pushpika

Network/WebApp Pentester | CTF Player | Security Analyst