Hello! In anticipation of the start of a new cohort of the reverse engineering course, we are sharing a translation of a very interesting article. Enjoy reading!
The last two years can be called the years of ransomware. Ransomware was, without a doubt, the most popular type of malware. However, at the end of last year we began to observe a decline in its popularity in favor of miners. It is possible that in 2018 this trend will only grow.
From the point of view of the victim, this is a relief, since miners are not as dangerous as ransomware. Yes, they slow down the system, but as soon as you get rid of them, you can continue to use your computer as before. Your data is not stolen or lost, as is the case with ransomware.
From the perspective of a malware researcher, miners are disappointing. They do not provide enough new material for deeper analysis, mainly because they are based on well-known open source components with little or no obfuscation or stealth.
However, from time to time we find miners using interesting tricks. We recently observed a technique called “Heaven's Gate,” which allows injection into 64-bit processes from 32-bit loaders. The idea is not new (its first implementation dates back to 2009), but it is interesting to see how it was implemented in a new sample obtained directly "from the wild."
Beginners in malware analysis can read on for a guide to what Heaven's Gate is and how to approach its analysis.
Materials for analysis
This sample was found in the continuation of the Ngay campaign (more on this here). Background research on similar samples led me to an article by @_qaz_qaz, which describes an earlier campaign with a similar sample. However, his analysis did not cover the Heaven's Gate technique.
Behavior analysis
To see the injection mentioned, we must run the sample on a 64-bit system. We see that it launches an instance of Notepad, with parameters typical of cryptocurrency mining:
Looking at the strings in memory in ProcessExplorer, we see that this is not the real Notepad, but the XMRig Monero miner.
So at this point we can be fairly confident that the Notepad image in memory has been replaced, most likely using the RunPE (Process Hollowing) method.
The main dropper is 32-bit, but it injects the payload into a 64-bit Notepad:
Most interestingly, this type of injection is not supported by the official Windows API. We can read and write the memory of 32-bit processes from a 64-bit application (using the WoW64 API), but not the other way around.
However, there are some unofficial solutions, such as a technique called “Heaven's Gate.”
Heaven's Gate overview
The Heaven's Gate technique was first described in 2009 by a hacker with the nickname Roy G. Biv. Later, many implementations were created, for example the Wow64ext library or, based on it, W64oWoW64. In a 2015 blog post, Alex Ionescu described measures to combat this technique.
Let's look at how it works.
Running 32-bit processes on 64-bit Windows
Each 32-bit process running on a 64-bit version of Windows runs in a special WoW64 subsystem that emulates a 32-bit environment. You can think of it as a 32-bit sandbox created inside a 64-bit process. First, a 64-bit process environment is created, and then a 32-bit environment is created inside it. The application runs in this 32-bit environment and does not have access to the 64-bit part.
If we scan a 32-bit process from the outside using a 64-bit scanner, we will see that inside it has both 32 and 64-bit DLLs. Most importantly, it has two versions of NTDLL: 32-bit (loaded from the SysWow64 directory) and 64-bit (loaded from the System32 directory):
However, the 32-bit process itself does not see the 64-bit part and is limited to using 32-bit DLLs. To inject into a 64-bit process, you need to use the 64-bit versions of the corresponding functions.
Code segments
To gain access to the restricted part of the environment, we need to understand how the isolation is done. It turns out that everything is quite simple. 32-bit and 64-bit code execution is selected through different code segment selectors: 0x23 for 32-bit and 0x33 for 64-bit.
If we call an address in the usual way, the code there is interpreted in the default mode. However, we can explicitly request a change using assembly code.
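To make the two values more concrete, here is a minimal Python sketch that splits a segment selector into its architectural fields: the descriptor index, the table indicator, and the requested privilege level. The helper name decode_selector is hypothetical and exists only to illustrate the bit layout; the code does not switch modes.

```python
def decode_selector(selector):
    """Split a 16-bit x86 segment selector into its three fields."""
    return {
        "index": selector >> 3,                        # descriptor table entry number
        "table": "LDT" if selector & 0b100 else "GDT",  # table indicator bit
        "rpl": selector & 0b11,                        # requested privilege level (3 = user mode)
    }


for value in (0x23, 0x33):
    print(hex(value), decode_selector(value))
```

Both selectors refer to the GDT with user-mode privilege; they differ only in the descriptor they select (entry 4 versus entry 6).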
Inside the miner: Heaven's Gate implementation
I will not conduct a full analysis of this miner, as it has already been described here. Let's go straight to the place where the fun begins. The malware checks its environment, and if it detects that it is running on a 64-bit system, it takes a different path to inject into a 64-bit process:
After some anti-analysis checks, it creates a new, suspended 64-bit process (in this case, Notepad):
This is the target into which the malicious payload will be injected.
As we learned earlier, in order to inject the payload into a 64-bit process, we need to use the appropriate 64-bit functions.
First, the loader retrieves a handle to the 64-bit NTDLL:
What happens inside the get_ntdll function requires a more detailed explanation. For reference, we can also take a look at similar code in the ReWolf library.
To access the 64-bit part of the process environment, we need to work with segment selectors. Let's see how the malware enters 64-bit mode.
It seems this code was copied directly from the open library:
https://github.com/rwfpl/rewolf-wow64ext/blob/master/src/internal.h#L26
The 0x33 segment selector is pushed onto the stack. Then the malware calls the next line (this way, the address of the next line is also pushed onto the stack):
The address that was pushed onto the stack is then adjusted by adding 5 bytes, so that it points just past the retf:
At the end, the RETF instruction is executed. RETF is a “far return”, and unlike the regular RET, it allows you to specify not only the address from which execution should continue, but also the segment. As arguments, it takes two DWORDs from the stack. Thus, when RETF is executed, the return address is:
0x33: 0x402A50
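The stack arithmetic behind this address can be modeled in a few lines of Python. This is only a sketch: it assumes the instruction lengths of the usual encoding (2 bytes for push 0x33, 5 for the call, 4 for the add and 1 for retf), and the start address 0x402A44 is not taken from the sample but derived from the target by subtracting those lengths. The function name simulate_gate is hypothetical.

```python
def simulate_gate(push_address, selector=0x33):
    """Model the stack effects of: push 0x33 / call next / add [esp], 5 / retf."""
    stack = []
    stack.append(selector)               # push 0x33 (assumed 2-byte instruction)
    call_address = push_address + 2
    return_address = call_address + 5    # the 5-byte call pushes the next address
    stack.append(return_address)
    stack[-1] += 5                       # skip the add (4 bytes) and the retf (1 byte)
    offset = stack.pop()                 # retf pops the offset first...
    segment = stack.pop()                # ...and then the code segment selector
    return segment, offset


segment, offset = simulate_gate(0x402A44)
print(f"{segment:#x}:{offset:#x}")
```

The model shows why exactly 5 is added: the adjusted address lands on the first byte after the retf, which is where the 64-bit code begins.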
Thanks to the changed segment, code starting at the specified address is interpreted as 64-bit. So the code that the debugger sees is 32-bit ...
... actually 64-bit.
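One reason the same bytes can disassemble so differently is that opcodes 0x40 to 0x4F are one-byte INC/DEC instructions in 32-bit mode but REX prefixes in 64-bit mode. The Python sketch below illustrates only this single decoding rule. The helper name describe_opcode is hypothetical, and the byte 0x49 is an illustrative example rather than a byte taken from the sample.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def describe_opcode(opcode, mode):
    """Describe a byte in the range 0x40-0x4F for 32-bit or 64-bit decoding."""
    if not 0x40 <= opcode <= 0x4F:
        raise ValueError("only opcodes 0x40-0x4F are handled in this sketch")
    if mode == 32:
        operation = "inc" if opcode < 0x48 else "dec"
        return f"{operation} {REG32[opcode & 7]}"
    if mode == 64:
        # The low four bits of a REX prefix are the W, R, X and B flags.
        flags = "".join(name for name, bit in zip("WRXB", (8, 4, 2, 1)) if opcode & bit)
        return f"REX.{flags}" if flags else "REX"
    raise ValueError("mode must be 32 or 64")


print(describe_opcode(0x49, 32))
print(describe_opcode(0x49, 64))
```

In 32-bit mode the byte is a complete instruction, while in 64-bit mode it only modifies the instruction that follows, so a 32-bit disassembly of 64-bit code quickly goes out of sync.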
To quickly switch between the two views, I use a PE-bear feature:
And here is what this piece of code looks like if it is interpreted as 64-bit:
Thus, the code executed here moves the contents of register R12 to a variable on the stack and then switches back to 32-bit mode. This is done in order to get the 64-bit Thread Environment Block (TEB), from which we then get the 64-bit Process Environment Block (PEB) (compare with similar code).
The 64-bit PEB is used as the starting point for finding the 64-bit version of NTDLL. This part is implemented in a fairly straightforward way (a “vanilla” implementation of this method can be found here), using the pointer to the loaded libraries, which is one of the fields of the PEB structure. So, from the PEB we get a field called Ldr:
Ldr is a structure of type _PEB_LDR_DATA. It contains an entry called InMemoryOrderModuleList:
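The pointer chain from the TEB to this list can be sketched as follows. The offsets are the commonly documented ones for the 64-bit structures (TEB.ProcessEnvironmentBlock at 0x60, PEB.Ldr at 0x18, InMemoryOrderModuleList at 0x20); they are stated here as an assumption, not read from the sample. The memory contents and the helper name find_module_list are made up for the example.

```python
# Offsets in the 64-bit structures (assumed from public symbol information).
TEB64_PEB = 0x60         # TEB.ProcessEnvironmentBlock
PEB64_LDR = 0x18         # PEB.Ldr
LDR64_IN_MEMORY = 0x20   # PEB_LDR_DATA.InMemoryOrderModuleList


def find_module_list(read_qword, teb):
    """Follow TEB -> PEB -> Ldr and return the address of the list head."""
    peb = read_qword(teb + TEB64_PEB)
    ldr = read_qword(peb + PEB64_LDR)
    return ldr + LDR64_IN_MEMORY   # the list head is embedded in the Ldr structure


# Hypothetical memory contents: address -> 64-bit value stored there.
fake_memory = {0x1000 + TEB64_PEB: 0x2000, 0x2000 + PEB64_LDR: 0x3000}
print(hex(find_module_list(fake_memory.__getitem__, teb=0x1000)))
```

Passing the memory reader in as a function keeps the sketch independent of how the memory is actually accessed, for example from a dump or from a debugger.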
This list contains all loaded libraries that are present in the memory of the process under study. We walk through the list until we find the library that interests us, in our case NTDLL. This is exactly what the get_ntdll function mentioned above does. To find the matching name, it calls a function, labeled is_ntdll_lib, which compares the library name with ntdll.dll character by character. It turns out to be the equivalent of this code.
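The search described here can be modeled in Python as a walk over a circular linked list. This is a simplified sketch and not the malware's code: the Module class stands in for the loader's list entries, the comparison is assumed to ignore case, and the module names and base addresses are hypothetical. The function names are borrowed from the labels used above.

```python
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    base: int
    next: "Module" = None   # stands in for the Flink pointer of the list entry


def is_ntdll_lib(name):
    """Compare the name with 'ntdll.dll' character by character, ignoring case."""
    target = "ntdll.dll"
    if len(name) != len(target):
        return False
    return all(a.lower() == b for a, b in zip(name, target))


def get_ntdll(head):
    """Walk the circular list once and return the base of ntdll.dll, or None."""
    node = head.next
    while node is not head:          # the list head itself is not a module
        if is_ntdll_lib(node.name):
            return node.base
        node = node.next
    return None


# Hypothetical list: head -> sample.exe -> NTDLL.DLL -> back to head.
head = Module("<list head>", 0)
exe = Module("sample.exe", 0x400000)
ntdll = Module("NTDLL.DLL", 0x7FFA12340000)
head.next, exe.next, ntdll.next = exe, ntdll, head
print(hex(get_ntdll(head)))
```

The walk stops when it returns to the list head, so a missing module yields None instead of an endless loop.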
If the names match, the library address is returned in a pair of registers:
Once we have found NTDLL, we just need to get the addresses of the corresponding functions. We can do this by looking at the library's export table:
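As a rough illustration of such a lookup, the Python sketch below models the three parallel arrays of a PE export directory (AddressOfNames, AddressOfNameOrdinals and AddressOfFunctions) as plain lists. The export data, the base address and the helper name resolve_export are hypothetical; a real parser would read these arrays from the mapped image.

```python
def resolve_export(base, names, name_ordinals, functions, wanted):
    """Resolve an exported name using the three export directory arrays."""
    for i, name in enumerate(names):           # AddressOfNames
        if name == wanted:
            ordinal = name_ordinals[i]         # AddressOfNameOrdinals
            return base + functions[ordinal]   # AddressOfFunctions holds RVAs
    return None


# Hypothetical export data, made up for the example.
names = ["NtAllocateVirtualMemory", "NtReadVirtualMemory", "NtWriteVirtualMemory"]
name_ordinals = [2, 0, 1]
functions = [0x1200, 0x1300, 0x1100]
address = resolve_export(0x7FFA12340000, names, name_ordinals, functions,
                         "NtWriteVirtualMemory")
print(hex(address))
```

The extra step through the ordinal array matters because the name array and the function array are not required to be in the same order.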
The following functions are retrieved:
- NtUnmapViewOfSection
- NtGetContextThread
- NtAllocateVirtualMemory
- NtReadVirtualMemory
- NtWriteVirtualMemory
- NtSetContextThread
As we know, these functions are typical of the RunPE technique. First, NtUnmapViewOfSection is used to unmap the original PE file. Then memory is allocated in the remote process and the new PE is written there. Finally, the thread context of the remote process is changed so that execution starts from the injected module.
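From the analyst's side, this ordering is itself a useful indicator. The Python sketch below checks whether the calls appear in that order in an API trace, for example one exported from a sandbox. The trace format (a plain list of function names) and the helper name looks_like_runpe are assumptions made for this example.

```python
RUNPE_SEQUENCE = [
    "NtUnmapViewOfSection",
    "NtAllocateVirtualMemory",
    "NtWriteVirtualMemory",
    "NtSetContextThread",
]


def looks_like_runpe(trace):
    """Return True if the RunPE calls appear in the trace in the expected order."""
    calls = iter(trace)
    # Each membership test consumes the iterator up to the match,
    # so later steps can only match later calls.
    return all(step in calls for step in RUNPE_SEQUENCE)


# Hypothetical trace, made up for the example.
trace = [
    "NtGetContextThread",
    "NtReadVirtualMemory",
    "NtUnmapViewOfSection",
    "NtAllocateVirtualMemory",
    "NtWriteVirtualMemory",
    "NtWriteVirtualMemory",
    "NtSetContextThread",
]
print(looks_like_runpe(trace))
```

A match is only a hint, not proof: other calls may be interleaved, and legitimate software can use the same functions.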
The function addresses are stored and later called (similar to this code) to manipulate the remote process.
Conclusion
Until now, the authors of miners have not shown much creativity. They achieve their goals by relying on open source components. The described case reflects this trend well, since a ready-made implementation was used.
Heaven's Gate has been around for several years. Some malicious programs use it to increase stealth. But in the case of this miner, the authors were probably aiming to maximize performance by using the payload that best fits the target architecture.
That's all. You can find out more about our course here.