I used this technique once to very useful effect in an automated dependency-checking tool.
Essentially given an idempotent data-transformation process, you can inject a dll into it and hook its filesystem calls, maintaining a list of all files it touches. Upon process termination, serialize out the filesystem state (size, timestamp, MD5) of each of those files, saving the data in a dependency file whose filename is the hash of the command line.
When you run the same command line again, the dll can first look up the dependency file from the previous run, and if all files on the filesystem are in the same state as previously, you can short-circuit the process execution entirely.
This is orders of magnitude faster in many cases for highly parallel data build jobs where only some small percentage of the source data changes each time you run the build job. It also has the advantage that you don't need to manually maintain a list of dependencies for each process type (no new dependency can be added without changing one of the existing dependencies).
I wrote a hooking library that implements the methods talked about here, plus a bunch more that aren't, and it handles a bunch of edge cases not mentioned: https://github.com/stevemk14ebr/PolyHook_2_0
Thanks for linking, this is really cool. Also nice to see AMD64 support, I never figured out how it can be done. (Is there an absolute jmp without destroying registers in x64?)
Yes, there is. \xff\x25\x00\x00\x00\x00, where the zeroes are a 32-bit displacement to a memory location containing the 64-bit target address, is what I use: jmp [disp]. But you need to be able to place that constant within +-2 GB of the jump site. This is really hard on x64 Windows, since VirtualAlloc gives you no placement guarantees; I used to walk pages manually, now I just new/delete in a loop and hope for the best.
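For concreteness, here's a little encoder for that jump (my own sketch, not from any particular library). One common layout puts the 64-bit address immediately after the instruction, so the displacement is simply 0:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Emits: jmp qword ptr [rip+0], followed immediately by the 64-bit
   target address, so the RIP-relative displacement is just 0.
   14 bytes total, and no registers are clobbered. */
size_t emit_abs_jmp(uint8_t *buf, uint64_t target) {
    buf[0] = 0xFF;               /* opcode: JMP r/m64            */
    buf[1] = 0x25;               /* ModRM: [RIP + disp32]        */
    memset(buf + 2, 0, 4);       /* disp32 = 0 -> qword follows  */
    memcpy(buf + 6, &target, 8); /* absolute 64-bit destination  */
    return 14;
}
```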
There's also:
push rax
mov rax, 0xDEADBEEFDEADBEEF
xchg qword ptr ss:[rsp], rax
ret
but I don't like it as much since it touches the stack (technically more detectable, since you overwrite what's at rsp - 8). RSP and RAX hold their original values after that gadget, though.
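For reference, that gadget encodes to 16 bytes; a sketch of an encoder (my own illustration, using the standard encodings as I understand them):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Encodes the push/mov/xchg/ret gadget above into buf: 16 bytes that
   jump to `target` while leaving RAX and RSP at their original values. */
size_t emit_push_ret_jmp(uint8_t *buf, uint64_t target) {
    buf[0] = 0x50;                  /* push rax             */
    buf[1] = 0x48; buf[2] = 0xB8;   /* mov rax, imm64       */
    memcpy(buf + 3, &target, 8);
    buf[11] = 0x48; buf[12] = 0x87; /* xchg [rsp], rax      */
    buf[13] = 0x04; buf[14] = 0x24; /*   (ModRM + SIB: rsp) */
    buf[15] = 0xC3;                 /* ret                  */
    return 16;
}
```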
Also, fun fact: my library supports JIT-ing, so you can create the stub that the hook jmps to at runtime, and it will JIT the translation logic for the calling convention and pack the args + return value into a structure that can be modified. So you can hook unknown functions at runtime.
I am guessing the length of the jump itself is probably important for some reason. But if you could afford to overwrite ~16 bytes, maybe you can store the address inside the imm64 of another instruction. Length issues aside, it shouldn’t break nested hooking at least.
Mixing code and data, right? I think the downside here is if you tried to install another hook it’d fail because the LDE wouldn’t be able to make heads or tails of the address. I suggest embedding the value into something with an imm64 argument (mov?) so that LDEs can handle it.
I guess, though, that at ~16 bytes it's probably deep enough into the function that the code may no longer be position independent, or hell, maybe the function isn't even that long to begin with.
If you're writing a hooking library / a hook you should be keeping track of where they are. It's a big hook, that is true but it's also one that doesn't spoil a register and is pretty straightforward to add. It's a tradeoff.
Well the bigger problem imo is other hook engines that might also be roaming around the process space. I think all you need is two extra bytes to make it valid instructions, and in theory then nested hooking should work fine. Though it only exacerbates the length issue.
If you place the data at the end of the trampoline, it avoids these issues of mixing data and code; it's like a little custom data segment you make, since you have to allocate the trampoline anyway. This is what I do in my lib: the displacement's target sits right after the jmp the trampoline uses to jmp back to the original. The original function contains only the jmp [disp], and no data is mixed in.
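A sketch of the displacement arithmetic under that layout (hypothetical addresses; the disp32 is relative to the end of the 6-byte jmp at the hooked function):

```c
#include <assert.h>
#include <stdint.h>

/* The jmp [rip+disp32] placed at the hooked function must reach the
   8-byte address slot stored at the end of the trampoline.  disp32 is
   relative to the address of the *next* instruction (jmp site + 6). */
int32_t rip_disp32(uint64_t jmp_site, uint64_t data_slot) {
    int64_t d = (int64_t)(data_slot - (jmp_site + 6));
    assert(d >= INT32_MIN && d <= INT32_MAX); /* must be within +-2 GB */
    return (int32_t)d;
}
```

This is also why the trampoline allocation has to land within +-2 GB of the hooked code in the first place.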
We used hooking to "isolate" the audio output from a certain application on Windows:
The application would ask the operating system for the default audio device. By intercepting this request, we were able to re-route it to our own virtual audio device. Our program would then fetch the audio data from the virtual device and replay it to the "real" audio device. At the same time, the audio was saved to RAM, and finally to disk.
The benefit of this method was that we were able to actually isolate the audio from all other sources on the computer. So you could, in theory, mute the playback, while still being able to let the recording run.
Ultimately, we abandoned this method, as it proved quite unreliable. But it was fun to come up with, and finally implement.
Oh hey! This is a favorite topic of mine. I wrote myself a hooking library for a project where I didn’t want to use libc (and it was Win32 by design so I could just use the equivalent Win32 calls). The hardest part was definitely the LDE (length disassembly engine) and in the end I found a small header-only open source library that did it perfectly. The rest was very easy, especially on x86.
API hooking sometimes even works in hostile environments, like on software that tries to guard against patches and modification, simply because it can be challenging to detect and you can do it early on (like inside a DLL entry point). So if you can do all of your work at API boundaries, you can get away with a whole lot, even if an app is packed with a strong VM packer.
Worth noting that for many less difficult use cases on Linux, you can use LD_PRELOAD to somewhat similar effect.
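A minimal example of the LD_PRELOAD pattern (a hypothetical shim; the usual trick is to shadow a libc function and fetch the real one with dlsym(RTLD_NEXT, ...)):

```c
/* shim.c -- a minimal LD_PRELOAD interposer (illustrative sketch).
   Build: cc -shared -fPIC shim.c -o shim.so
   Use:   LD_PRELOAD=./shim.so some_program                         */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

/* Our fopen shadows libc's; RTLD_NEXT resolves to the real one. */
FILE *fopen(const char *path, const char *mode) {
    FILE *(*real_fopen)(const char *, const char *) =
        (FILE *(*)(const char *, const char *))dlsym(RTLD_NEXT, "fopen");
    fprintf(stderr, "fopen(%s, %s)\n", path, mode); /* log, then forward */
    return real_fopen(path, mode);
}
```

Any dynamically linked program run under this preload will have its fopen calls logged to stderr, with no code patching at all.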
On Linux (and probably other Unices), another way to hook is via an interesting default behaviour of the dynamic linker: it resolves imports by symbol name alone, without qualifying them with the filename of the library they were meant to come from, and it prefers symbols from already-loaded libraries. I suspect that more often than not, this interposition happens by accident rather than by deliberation.
Renderdoc uses hooking to instrument your 3D API calls to provide excellent graphics debugging. Good example to dive into since it's also cross platform.
I used graphics API hooking in a Source Engine game where I didn't have the source for the engine but needed to display the 2D Flash based GUI (Iggy) at the correct time. It was fun to get it working.
Valve's Steam also does something similar, since it has the ability to superimpose its GUI over a running game.
Conceptually yes, but technically RenderDoc works differently. On Windows it patches the IAT (Import Address Table), then for D3D it wraps complete D3D COM interfaces. It doesn't use the tricks described here to patch code with a jmp instruction; that's not too reliable, IMO.
Similar on Linux, it doesn’t have IAT but it has PLT (Procedure Linkage Table) which is basically the same thing as IAT.
API hooking used to be super easy and very common under DOS. All the API calls worked via traps (software interrupts), so you'd just have to 1) store the original interrupt vector, 2) change it to point at your own interrupt service routine, and 3) do whatever you wanted when the trap was invoked.
Point 3 might include logging, passing the request through to the original ISR (possibly with changed parameter values), changing the return values, anything really. Easy & fun. =)
I've used MS Detours to hook my own C++ interceptor functions into the dispatch from Excel to XLL extensions via the internal Excel4V interface. It worked very well, showing me the XLOPER values. But that was for 32-bit Excel. I didn't appreciate how much trickier it is with the AMD64 instruction set.