MocoCompositor: GPU-Accelerated Compositing

I’m a little giddy and excited about this new feature. Last semester I took a class at the main campus titled “Technical Animation” where we learned about all sorts of computer graphics and animation techniques/algorithms used in the game and movie industries. It was a pretty cool, project-focused class. My final project (teamed up with Federico Perazzi and Grace Lin, both from the ETC) was to create a target-driven smoke simulation accelerated on the GPU. I knew absolutely nothing about smoke simulations, or GPUs for that matter. Long story short, we taught ourselves how GPU-accelerated computation worked and how to write shaders in GLSL… and eventually wrote a regular smoke simulation in GLSL (we ran out of time for the target-driven part). It turns out it doesn’t matter, speed-wise, that the software is written in Python, since all the heavy computation gets handed off to the GPU on the video card anyway. So we ended up using pyglet (an OpenGL interface for Python) and a tiny shader class to string together several custom shaders for our smoke simulation… it ran in real time pretty well.

Skip to the present: we at Mocotila talked a lot about compositing image plates together, because that is one of the biggest uses for motion-controlled cameras. Compositing a live-action shot or model with a matte painting or a 3D model, for instance. But we also realized that compositing is usually done in post-production and takes a lot of time to set up, render, and review. Then, if any shots are screwed up in framing or anything else, you’d either have to reshoot or try to fudge the effects until they were acceptable.

But here we are with an awesome camera whose viewfinder is being streamed to anything capable of reading HTTP-streamed images… and not just the expensive camera either: our Bioloid has a little webcam that is streamed the same way, and when we get Maya/Blender integration we might have live 3D renderings being streamed as well. Wouldn’t it be grand if the cameraman/director could see a live end-result composite preview so he could direct actors or reframe things appropriately? And what if we could just kinda composite several of these streams into a new viewfinder? This is where the MocoCompositor comes in… it runs on any computer on the network with a good video card capable of running shaders. It pulls in (subscribes to) images from multiple image streams coming from the server, and it publishes a new composited image stream back to the server that anyone else can subscribe to.

The actual compositing is accomplished on the GPU via GLSL (the OpenGL Shading Language). This is where my Technical Animation story comes in… I went back through my GPU smoke simulator and reimplemented it, stripping out all the simulation stuff and adding layers of images that get processed like Photoshop layers. Your view is from the top of the layer stack looking down (the very bottom being the background image plate). The algorithm runs from the bottom plate up, applying each plate’s associated shader and saving the result into an output plate. The output plate is what gets packaged as the JPEG image and published back out to the server.
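To make the layer idea concrete, here’s a stripped-down sketch of that loop (this isn’t the actual compositor code; the plate list and the shader_fn callables are just stand-ins for the real GPU plumbing):

    def composite(plates):
        # plates: list of (plate_image, shader_fn) pairs, ordered bottom
        # (background) to top. Each shader_fn takes the plate and the output
        # so far and returns the new output, like one shader pass would.
        output = None
        for plate_image, shader_fn in plates:
            output = shader_fn(plate_image, output)
        return output   # this is what gets packaged as the JPEG and published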

So far we’ve implemented a green-screen shader that replaces all the green in the foreground plate with the pixels from the background plate… which means live green-screen replacement compositing. We experimented with background subtraction (taking a reference shot of the background and subtracting it from the live shot so we wouldn’t need a green screen), but it just wasn’t reliable or clean. We’re hoping to add some more filters to this system, all implemented as GLSL shaders… especially some gradient blur filters (if you know what I’m getting at :).
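For flavor, here’s roughly what a keying shader like that looks like in GLSL (a simplified sketch, not our exact shader; the uniform names and the 0.3 threshold are made up):

    # Hypothetical fragment shader: key out pixels where green clearly
    # dominates and show the background plate through the hole.
    GREENSCREEN_FRAG = """
    uniform sampler2D foreground;
    uniform sampler2D background;

    void main()
    {
        vec2 uv = gl_TexCoord[0].st;
        vec4 fg = texture2D(foreground, uv);
        vec4 bg = texture2D(background, uv);
        // keyed = 1.0 when green beats the other channels by a wide margin
        float keyed = step(0.3, fg.g - max(fg.r, fg.b));
        gl_FragColor = mix(fg, bg, keyed);
    }
    """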

Under the Hood

The main thread is a pyglet app running its event loop. Word of advice: NEVER mix OpenGL calls (or anything that touches hardware directly without locks and state preservation) across threads… bad things happen (one of these days I’ll get around to putting locks on the camera class too). Anyway, in this main thread we have pyglet’s on_draw event, where we run through each image plate and execute the appropriate shader on the input image (and working/output image). After we’ve gone through all the plates, we package the output image texture back into a JPEG and send it out to the server via an HTTP stream publisher.
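The skeleton of that main thread looks something like this (a bare-bones sketch, not the real thing; plates and the apply() call stand in for the actual shader/FBO plumbing, and the JPEG/publish step is only noted in comments):

    import pyglet

    window = pyglet.window.Window(width=640, height=480,
                                  caption='MocoCompositor sketch')
    plates = []   # filled in elsewhere: plate objects ordered bottom to top

    @window.event
    def on_draw():
        window.clear()
        for plate in plates:
            plate.apply()   # hypothetical: bind shader + textures, draw a quad
        # then grab the result, e.g. via
        # pyglet.image.get_buffer_manager().get_color_buffer(),
        # encode it as a JPEG and hand it to the HTTP stream publisher

    # keep the event loop ticking so on_draw keeps firing
    pyglet.clock.schedule_interval(lambda dt: None, 1 / 30.0)
    pyglet.app.run()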

Now, since pyglet controls our main thread and we want to pull images from at least two streams from the server concurrently, we need to do that in threads. So for each incoming (subscribed) image stream we have a thread that grabs the image and updates a mutex-protected data structure (our image plates) with the raw data. The next time through the main pyglet loop it’ll reload the image from that raw data… we can’t do the decoding out in the thread because lord knows what pyglet is doing behind the scenes (or whether the libraries it uses are thread-safe) to load these images.
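In sketch form, the hand-off looks something like this (again, not the actual code; PlateBuffer and read_next_jpeg() are made-up names, the latter standing in for whatever reads a frame off the HTTP image stream):

    import threading

    class PlateBuffer(object):
        """Mutex-protected holder for the latest raw JPEG bytes of one stream."""
        def __init__(self):
            self.lock = threading.Lock()
            self.raw = None      # latest undecoded JPEG from the stream
            self.dirty = False   # True while the main loop still needs to reload it

        def push(self, data):    # called from the subscriber thread
            with self.lock:
                self.raw = data
                self.dirty = True

        def pop(self):           # called from the pyglet main loop
            with self.lock:
                if not self.dirty:
                    return None
                self.dirty = False
                return self.raw

    def subscribe(stream_url, buf):
        # runs in its own thread; only raw bytes cross over, never pyglet calls
        while True:
            buf.push(read_next_jpeg(stream_url))   # hypothetical stream reader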

Speaking of the libraries used by pyglet… we were having this nasty SegFault in the MocoCompositor after a few seconds of working perfectly. Just out of nowhere it would SegFault (and never after the same amount of time). After digging through pyglet and stepping through the execution with a Python debugger, I tracked the problem down to the codec being used for JPEG decompression. Pyglet uses third-party libraries to decompress JPEG images (not sure if it has to do with patents or whatnot), but on the Ubuntu system I’d been using it was defaulting to gdkpixbuf for the decompression… I assume because it’s the fastest implementation available on the system (it probably uses the C libjpeg or something). But I noticed in the pyglet documentation that you can specify which decoder to use, and the Python Imaging Library (PIL) was installed… so I forced pyglet to use that decoder instead (it seems a teensy bit slower), but it has worked without any SegFaults [so far]… huzzah!
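Forcing the decoder is basically a one-liner; something along these lines (a sketch from memory against the pyglet 1.x codec layout, and decode_jpeg() is just a name I’m using here):

    from StringIO import StringIO               # Python 2-era, like the rest of this
    import pyglet
    from pyglet.image.codecs.pil import PILImageDecoder

    def decode_jpeg(raw_bytes):
        # hand pyglet the PIL codec explicitly instead of letting it pick
        # gdkpixbuf (the default on this Ubuntu box, and our SegFault source)
        return pyglet.image.load('frame.jpg', file=StringIO(raw_bytes),
                                 decoder=PILImageDecoder())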

Also note that textures used in OpenGL really need to have power-of-2 dimensions… otherwise crazy things start happening (the image starts striping and staggering diagonally). This was an ongoing struggle for quite a while, until on a whim I remembered that old rule, allocated the texture as a power of 2, and it worked. So now the code looks at the input image size and rounds it up to the next power of 2 when allocating the texture. This leaves a big black unused area, but it works… when we’re done calculating the output image, we simply blit the original-sized region out of the big texture and send it to the output stream as the JPEG image.
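The rounding itself is trivial; here’s the gist (next_pow2 is just my name for it here, and the 720x480 frame size is only an example):

    def next_pow2(n):
        # round an image dimension up to the next power of two for the texture
        p = 1
        while p < n:
            p <<= 1
        return p

    # e.g. a 720x480 frame gets a 1024x512 texture; only the original 720x480
    # region is blitted back out when packaging the output JPEG
    tex_w, tex_h = next_pow2(720), next_pow2(480)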
