Mocotila: Epilogue

May 10th, 2010

The Mocotila Project was a success!

We met all the goals we had originally set and then some. We built an eight-foot-tall robot arm with five degrees of freedom: Pan, Tilt, Jib, Turret and Dolly. All its motions were as accurate and repeatable as we could get them, given the budget, the tight time frame and the fact that none of us knew what we were doing going in. We also wrote some cool software to control and animate the robot through a flexible web interface, and did a lot of research and side projects like GPU-accelerated live compositing, Maya integration, face tracking and some other fun stuff (read the previous blog posts for the details).

Our Final Presentation went well… after 10 minutes of stressing out beforehand, when the set screw on our turret rod pulley came loose and we couldn’t control the turret anymore. We ran around the building finding new screws and fixed it with only seconds to spare. The actual presentation went off without a hitch and the bot worked beautifully throughout the demo… whew.

All in all I think it was a pretty successful project. Team Mocotila would like to thank all the people who helped us make this project a success… Ralph (our advisor), Krishna, Steve, John, Janice, Caitlin, Ben Carter, Team I3, the MechE machine shop and so many more I’m forgetting.

Finally we’d like to give a tremendous amount of thanks to the two people who helped us the most in this project: Kyle Richard Gee, our Mechanical Engineering Intern, who designed our initial jib arm and is an all-around cool kid. And last but certainly not least, Lauren Etta, our experienced design/build consultant and very talented Mechanical Engineer (and also an ETC alumna), who pretty much physically built this robot for us (yes, we helped a bit too). Thank you so much, you two; this robot is your baby just as much as it is ours.

Here are some videos of the work we did:

Turret Construction

Dolly Construction

Jib-Arm Mounting

Demo Reel

Case Studies / Tests

MocoBot: The End Result

May 10th, 2010

So now that our semester has come to an end and our Final Presentation is over, we can finally reveal our end product to the Internet community.
Team Mocotila presents:

The MocoBot Software

The Main Menu

The Motion Builder Interface

The Motion Builder Interface (in mid-dragging of pan/tilt via the mouse on the viewfinder, notice the red border and dot)

Time Estimates Calculator

Job Execution Dialog Window

The Current Running Job Status

The Video Player Interface that lets you view the captured frames in a flip-book-like player


Drag-and-Reorder Layered Compositor Node configurator

The Manage Processes Interface that lets you launch/terminate currently running subsystems

---

The MocoBot Hardware

Front 3/4 View

Back 3/4 View

Camera Mount + Pan/Tilt

Tilt Servo

Pan Servo

Jib Arm

Jib Linear Actuator

Turret Top

Turret Side (turntable)

Turret Servo + Encoder

Turret Rod + Timing Belt + Tensioner

Dolly

Standard Free spinning Track Wheels

Dolly Drive Motor + Encoder

Dolly Drive Motor + Axle + Timing Belt + Tensioner + Encoder

Dolly Drive wheel

Power Supply

UPS Battery

LynxMotion Servo Controller Board

LynxMotion Servo Controller Board

MocoBot Brain – Mac Mini (shown without anything plugged in). It usually has the following USB connections: LynxMotion, 2 Phidget Encoders and DSLR Camera

Standard Movie Tracks (rented)

I3 Build-out taping

April 12th, 2010

So Mark and I spent the last 3 days (Fri, Sat, Sun) baking in the sun filming the I3 Team’s Carnival booth being built. I set up the timelapse capture to do 10 sec and 15 sec interval shots with some simple panning, which gave us sessions of around 5-6 hours at a time, averaging around 1500 frames each. Meanwhile Mark took the smaller camera and ran around doing shots of everything else, including other exhibits.

I learned some lessons doing this shoot:

  • Time lapse is excruciatingly slow and tedious… we got bored after 15mins with the camera and made ourselves useful by helping I3 where we could while the camera snapped away.
  • There’s a tremendous amount of planning you’d have to do before you can even get a shot started (this isn’t a point and shoot type of deal).
  • With that comes an amazing amount of unpredictability… for example as much as I tried to get the pan/tilt motion such that it would have people moving about in the frames and also get some shots in the background with the other booths being built, it never seemed to work out right when the time came. Things would happen out of order (the real world isn’t predictable at all), lighting would change like crazy (exposures become way wrong towards the night), things NEVER get into the frame that you were expecting them to at the time, focus gets lost (autofocus can stall the camera so we’ve got to use manual)… etc.
  • If you catch an error mid-shoot (exposure screwing up, focus drifting, anything really), you can’t really fix it… because making the change would make the rest of the shoot inconsistent with all the footage you already have. In reality these 1500 frames compress to about 50 secs or so; any change made midway would pop out at you drastically, and in the long run that’s worse than just keeping the shot consistent.
  • Don’t even try to do anything close-up, more than likely it won’t get framed right with the people. Your best bet is to go for the widest shot you can do. It really isn’t practical to be by the camera 24/7 constantly adjusting things for each shot.
  • In the end you can sum up the use cases where this thing works as shooting long time-lapses of environments with very little worry about framing, or shooting a wide, zoomed-out shot from far away.
  • The Dell Mini9 that we’ve been using to be the robot’s brain sucks to use directly out in the field (the glare of the screen, the tiny unusable keyboard, sucky trackpad, way tiny screen resolution for the kinds of megapixel images we were taking). Our software is designed to be networked and the main user interface was meant to run from a browser on a separate computer on the network letting the Mini9 sit on the robot and simply control it untethered. The Mini9 I still think is the best option for the robot’s brain… we just need to add some sort of wireless network onto it… or run an ethernet cable off of it I suppose.
  • Hard drive space becomes very precious. Each of the shoots was easily burning through 4-5gigs (these are megapixel images). I had to offload the captures to my home computer (which was also rapidly losing space) every night.
  • Carrying around and setting up all this equipment is very tiring… that UPS easily weighs around 50-60lbs… and let’s not even talk about all the loose wires and cables. It’s more than one person could handle in one trip from the car.
  • That AC adapter we got for the Canon camera is the worst… the little connector doesn’t plug in all the way and the tiniest gust of wind or kick could unseat it, killing the power to the camera as well as screwing up the computer’s connection to it and the shoot.
  • While the UPS was an incredible lifesaver for us, it still can’t save us when we need to do very long shoots (5+ hours)… we had to tap into I3’s power, when we could, to recharge… I did recharge the UPS every night though (apparently it takes a very long time to fully recharge), but it still runs out of juice eventually.

The I3 Booth, when I last left it at around 9pm on Sunday, had all its exterior walls up and interior rooms built out, which is kind of amazing considering that it’s a two-story structure. In fact it’s so big and fancy that quite a number of people, as well as other teams, came by asking what the heck they were building… it could very well be a livable house. They’ll be doing the internal walls, painting and adding all the electronics by Carnival on Thursday. While we’d love to keep filming them working their butts off, we unfortunately have our own project to attend to this week.

So with that I’d like to thank the I3 Team for letting us film them, letting us tap into their power, making us honorary team members and putting up with us in general 🙂. Best of luck to you guys on Thursday… can’t wait to see it when it’s fully done… also get some sleep! You’ll need it for the teardown 😉.

MocoCompositor: The Nasty Hack + Videos

April 9th, 2010

So remember a while back I mentioned that I wanted to be able to expose the image plates data structure (ie the image layers + shaders for each layer) used by the MocoCompositor to the main user interface (the web)? Yeah well, turns out it was waaaaaay harder than I’d expected it to be. The exposing of the data structure and UI wasn’t too bad; the hard part was rebuilding the image plates within the compositor while it was still running (quite multithreaded no less).

Let me explain how the Compositor Configurator (compositor_config.html file) works… this is a simple web page that loads a json file from the server upon onLoad using the /loadjson/ service that the Server already provides (the same one used in saving and loading curve sets). This json file, named “compositor_layers”, contains the current state of the compositor image plates. The compositor_config.html GUI lets the user add a new layer, reorder the layers, change the shader used as well as any parameters necessary for that shader. When the user is satisfied they simply click the “Update Compositor” button and it sends the new image plates as JSON to /publish/compositor_config . Meanwhile the Compositor is sitting there with a subscription to /subscribe/compositor_config (in a thread)… the minute it receives the new plate data structure it saves it to the “compositor_layers” json file and then it “theoretically” rebuilds the image plates within the compositor and starts executing those.

Now I said “theoretically” for a reason… turns out pyglet + shaders + OpenGL + PIL + whatever else really don’t like having things being shuffled from underneath them. I was getting this incredibly annoying Segmentation Fault every time I tried to rebuild the plates after the initial build. I tried everything I could think of including putting a giant COMPOSITOR_LOCK around both the on_draw and rebuild_plates functions hoping that maybe it was just a race condition of the data structures getting used while in an invalid state. No dice, I was still getting SegFaults… there wasn’t going to be a nice clean way for me to rebuild these image plates with all these threads running as well as OpenGL doing shader stuff concurrently.

So I used my nuclear option… suicidal processes… sometimes the only way to cleanly do something is to destroy the process (thereby giving the OS a chance to clear memory, close file handles etc) and start it up again… Now we have the MocoCompositor at start up do nothing but spawn another process of itself (using the subprocess.call function). The child then can create the pyglet window + shader etc… when the child receives the new image plate data structure from the GUI it saves it to the json file and promptly unsubscribes all the subscriptions and exits pyglet as cleanly as it can and exits with an exit value of 1. When the child dies the parent process gets back the exit value (remember he’s blocked on the subprocess.call() call), if the exit value is equal to 1 it knows the child was requesting a restart so he goes ahead and spawns a new child process of MocoCompositor which in turn loads the configuration from the json and bam we’re back in business with the new configurations. If the exit value was anything else the parent assumes a normal exit and doesn’t try to restart the child and just exits himself.

The observant reader would wonder how the MocoCompositor keeps from recursively spawning child processes of itself. We prevent this by passing an additional command line flag called ‘--no-subprocess’ in the subprocess.call() (it is picked up in the child’s sys.argv)… when MocoCompositor sees that flag he assumes he’s the one that should be doing the pyglet stuff rather than spawning. Additionally this allows us to run a non-respawning version of MocoCompositor (it will still kill itself on a configuration change) by passing the ‘--no-subprocess’ flag to it on the command line.
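The respawn trick described above can be sketched roughly as follows. This is only an illustration, not the actual MocoCompositor code: the function names (supervise, main, run_compositor) and the injectable spawn argument are hypothetical, while the exit value of 1, the ‘--no-subprocess’ flag and the blocking subprocess.call() come from the description.

```python
import subprocess
import sys

RESTART_EXIT_CODE = 1                   # exit value the child uses to ask for a restart
NO_SUBPROCESS_FLAG = "--no-subprocess"  # tells a process it is the child


def supervise(command, spawn=subprocess.call):
    """Re-run `command` for as long as it exits with RESTART_EXIT_CODE.

    `spawn` is injectable only so the loop can be exercised without real
    processes. Returns how many times the child was launched.
    """
    launches = 0
    while True:
        launches += 1
        exit_value = spawn(command)     # blocks until the child exits
        if exit_value != RESTART_EXIT_CODE:
            return launches             # anything else counts as a normal exit


def main(argv, run_compositor, spawn=subprocess.call):
    """Act as the child if the flag is present, otherwise as the parent."""
    if NO_SUBPROCESS_FLAG in argv[1:]:
        # Child: do the real work (window, shaders, subscriptions, ...).
        return run_compositor()
    # Parent: launch ourselves with the flag so the child never recurses.
    child_command = [sys.executable, argv[0], NO_SUBPROCESS_FLAG]
    supervise(child_command, spawn=spawn)
    return 0
```

In this sketch the child would request a restart with sys.exit(1) after saving the new configuration, and the parent would simply loop around and start a fresh child that reloads it.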

All this works without the clients that are subscribed to /viewfinder/compositor being any the wiser because of the way the Server implements the Publisher/Subscriber model… it never tells recipients of a disconnection from senders and still keeps live sockets connected. This is what gives me the freedom to kill a publisher/subscriber process at will without hurting others in the network (theoretically).

Our MocoCompositor now has the following shaders:

  • normal – Simply loads an image from a file or an HTTP stream (MJPEG) which becomes the input to the next layer.
  • blackandwhite – Grayscales the previous output image using the luminosity method of grayscaling. http://www.johndcook.com/blog/2009/08/24/algorithms-convert-color-grayscale/
  • greenscreen – Accepts a new file/HTTP stream to use as a foreground plate on top of the previous output layer’s image (background plate). Green pixels in the foreground image are replaced with the pixels from the background image. It also has an option to set a small “feather” value to fine tune the matching for green removal.
  • horizontal_blur, vertical_blur – These are 2 versions of the same blur shader (one blurs horizontally and the other vertically). They accept a mask image where the alpha values in the mask are used to determine the “blurriness” of the background plate. So one can open up photoshop and do a simple gradient on the alpha channel and save it as a PNG and set it as the mask for this filter to make gradient blurs (ie. what’s used in faking tilt-shift shots). These filters also accept a blur_size value (512 seems to be ok, the higher the smoother it seems). I stole the algorithm for these blur shaders from http://www.gamerendering.com/2008/10/11/gaussian-blur-filter-shader/ .
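For reference, the luminosity method behind the blackandwhite shader boils down to a weighted sum of the color channels. The sketch below does it on plain RGB tuples in Python rather than in a shader, and the weights (0.21, 0.72, 0.07) are the ones commonly quoted for this method; both are assumptions for illustration, not the shader’s actual source.

```python
def luminosity_gray(r, g, b):
    """Weighted sum of the channels; green counts most, blue least."""
    return 0.21 * r + 0.72 * g + 0.07 * b


def grayscale_pixels(pixels):
    """Convert a list of (r, g, b) tuples (0-255) to gray (v, v, v) tuples."""
    result = []
    for r, g, b in pixels:
        v = int(round(luminosity_gray(r, g, b)))
        result.append((v, v, v))
    return result
```

The uneven weights are the whole point of the method: a pure green pixel comes out much brighter than a pure blue one, which matches how bright the two colors look to the eye.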

Ok enough talk, let’s see some results… the following are all live MocoCompositor outputs from /viewfinder/compositor that I opened up in a different computer’s browser and screen captured using xvidcap (on Linux)… that also explains the slowish frame-rate.

Here’s a Tilt/Shift of the street outside using a gradient blur mask with the horizontal_blur and a blur_size of 1024.0:

Here’s the same Tilt/Shift scheme with a blur_size of 512.0:

Here’s the cool and addictive greenscreen shader… you’ll notice the background plate is coming from a live Maya virtual camera courtesy of MocoMaya (/viewfinder/maya), and the foreground plate is the /viewfinder/rig… also a special thanks to the YouMedia group for unknowingly giving us those little origami thingies… it was a nice square green sheet of paper :).
Note: The reason I keep looking off to the right is because that’s where our big LCD TV is, with a full screen Firefox displaying the /viewfinder/compositor and also doing the xvidcap 😛 (terrible weatherman impersonation as well):

Finally we have the blackandwhite shader (for that Film Noir look):

MocoMaya: Update + Screenshots

April 7th, 2010

Ok I finally implemented the MocoMaya curve export feature (it took so long because I had to dig through the Maya Python API looking for the right functions for traversing the graph curves and then testing them out, such a pain because of the odd style of MEL they used). Anyways we’ve got curves exporting to the MocoServer (normalized, as are all the curvesets stored in the server).

Just like the Import, the Export assumes the begin and end frames are the begin and end frames set by the user in Maya and the frames are normalized between them (Note that the begin and end frames are preserved in the curve set file so they show up in the GUI’s Begin/End Frame settings… I’m still contemplating if that is a good idea or not since the Import ignores the Begin/End settings coming in… it doesn’t seem “well-balanced” if the Import and Export don’t follow identical rules… meh).

To test it all out, I created a curveset in the MocoBuilder, imported it into Maya, updated some actuator rotations by moving the joints in Maya, then exported the curveset back to the MocoServer. Then I created a second MocoMaya rig by running MocoMaya again (before that I quit the previous one’s window which still keeps the rig in the scene with the last curves it has). I moved the old one over a little bit, turned on live control from the MocoServer, opened up the new curveset in the MocoMotionBuilder GUI and started scrubbing through the frames and playing it flat out. You’ll remember that when the frames are scrubbed in the GUI it’ll also be updating the Maya’s current frame pointer which in turn animates whatever is already animated in Maya (ie. the old rig w/ curves).

So I hit play and keep a close eye on the Maya window to make sure the two rigs are moving identically… and for the most part they were pretty much in sync. This tested the correctness of the export feature and being able to read the resulting curveset in a different program (the MotionBuilder GUI). Now I did say “for the most part” above because apparently my interpolation of the curves doesn’t seem to be perfectly in sync with the way Maya is interpolating its animated curves (I believe Maya uses Bezier curves while I’m doing Catmull-Rom). However I believe this is in enough of a correctly functional state to be acceptable for the prototype that it is, and I don’t intend to dig too much more into the finer details of spline math or the inner workings of Maya (neither of which I’m any good at).
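For the curious, a uniform Catmull-Rom segment looks something like the sketch below. It assumes scalar key values and evenly spaced keys, which may not match exactly how the curves are stored here; the function name is just for illustration.

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Uniform Catmull-Rom interpolation between p1 and p2 for t in [0, 1].

    p0 and p3 are the neighboring keys; they only shape the tangents.
    """
    t2 = t * t
    t3 = t2 * t
    return 0.5 * (
        2.0 * p1
        + (-p0 + p2) * t
        + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t2
        + (-p0 + 3.0 * p1 - 3.0 * p2 + p3) * t3
    )
```

The curve passes exactly through p1 at t = 0 and p2 at t = 1, so the two programs agree at the keypoints themselves; a Bezier segment gets its shape from separate tangent handles, so the in-between frames are where a small drift like the one described would show up.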

Here are some screenshots of the MocoMaya client in action from within Maya 2010.

This is the start-up window and tab where you can enter in the Server, Port and optionally an existing Maya camera’s name.

Here’s the Controller tab where you can toggle on/off the control of the virtual camera by the MocoServer. You’ll also see that you can turn off the outgoing “viewfinder” (ie. a rendering from the camera’s POV) and also even turn down the resolution of the rendering to speed up the viewfinder/rendering. The viewfinder is viewable in the MotionBuilder GUI as a viewfinder.

This is the Import/Export curves tab which allows you to load and save curvesets to the MocoServer.

Here are some screenshots of the virtual camera rig that is created on the fly by the script when it connects successfully to the MocoServer. You’ll notice it’s nothing more than 3 joints with a camera stuck on the end… remember joints don’t get rendered in Maya so it should prevent the rig from rendering itself by accident 🙂

Here’s that last one annotated (using my awesome Gimp skillz) with the actual rotation movements

Moco MotionBuilder GUI: Updates

April 5th, 2010

I’ve made some updates to the MotionBuilder GUI (pay no attention to the top-right frame, it’s being worked on):

  • Added buttons for changing the Begin and End Frames (remember that internally all the curves and frames are normalized values so this Begin/End Frame thing is really just for the human and also determines the actual number of frames to iterate over when executing).
  • Added 3 tabs for grouping the various curves by role: Actuators, Delay and Camera. The Actuators tab contains all the arm motors/servos as curves. The Camera tab shows the number of images to capture at each frame and this can be animated over a curve as well (though it’s a discrete stepped value). The Delay curve essentially tells the rig how long it should wait before taking the image for the frame and moving to the next frame. This delay curve is what gives our system timelapse abilities and since it’s animated over a curve we can do “time-ramping” of time-lapse (think speeding up and slowing down of time). The delay tab also has a button where you can set the delay curve’s upper bound (in seconds)… I’m fixing the minimum to 5 seconds because there needs to be some sorta sanity preventing the user from killing the hardware (the camera takes at least a second to get an image to the computer depending on the resolution).
  • Each curve now has an “Arm/Disarm” button, a “Lock” button and a “Curve type” pulldown. The Arm/Disarm button toggles whether the curve has any effect during frame scrubbing/playback or saving of curves… it’s like a Mute button for curves. The Lock button, when engaged, prevents any changes to the curve. The Curve type pulldown allows you to change the style of interpolation used (catmullrom, linear, step) for the curve.
  • Disabled the ability to click around in the curve canvas to set key points because mouse coordinates within heavily CSSed pages don’t seem to be accurate or reliable in browsers (getting weird offsets). Until I overhaul the layout to be barebones (which I doubt will happen) or I figure out a more reliable mouse event coordinate calculation, mouse clicking in canvases will have no effect. This means you must scrub the frame play head to the location you want to change and use the actuator slider on the left to set the keypoint (or alternatively use the keypoint toggle button above the canvas).
  • Added the ability to click on the value label of the curve (the underlined value next to the curve’s name) and set it by typing in the value.
  • Implemented the job_executor publication so that the MocoBot can /subscribe/job_executor and the GUI can /publish/job_executor when the user clicks the new “Execute” button on the interface. When clicked a new window pops up to verify the intention of the user (see the following image). Once verified it sends the job to the MocoBot to execute the timelapse run. Note this will lock up the rig from being available to the GUI again until it is finished with the run (that means no viewfinder either).
  • Added a “Calculator” button which pops up a new window with some very useful taping-time and running-time calculations using the current state of the delay curve. It was always annoying (if not impractical with delay ramping) to figure out how long your taping session was going to take… but now the math is automagically a click away. Note that this calculator actually iterates through each frame to figure out the delay to add, it just only displays the results at each keypoint found in the delay curve.
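The per-frame iteration the calculator does can be sketched like this. It is a rough illustration under stated assumptions: the delay curve is treated as a function from a normalized frame position (0 to 1) to a normalized value (0 to 1), that value is mapped linearly onto the range between the 5 second minimum and the user’s upper bound, and the names, the 60 second default and the 30 fps playback rate are all hypothetical.

```python
def total_run_seconds(begin_frame, end_frame, delay_curve,
                      min_delay=5.0, max_delay=60.0):
    """Sum the delay of every frame between begin_frame and end_frame."""
    frame_count = end_frame - begin_frame + 1
    total = 0.0
    for i in range(frame_count):
        # Normalized position of this frame along the curve.
        x = i / float(frame_count - 1) if frame_count > 1 else 0.0
        # Clamp the curve value, then map it onto [min_delay, max_delay].
        value = min(1.0, max(0.0, delay_curve(x)))
        total += min_delay + value * (max_delay - min_delay)
    return total


def playback_seconds(frame_count, fps=30.0):
    """Length of the finished clip when played back at `fps`."""
    return frame_count / float(fps)
```

With a flat curve at the minimum, 10 frames take 10 x 5 = 50 seconds to shoot; with a ramp the delays differ per frame, which is why the sum has to walk every frame rather than just the keypoints.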

MocoMaya: Controlling a Virtual Camera Rig in Maya

April 4th, 2010

Being able to control a real-life robotic armature/camera rig wasn’t enough… so I implemented the MocoMaya client Python module. Autodesk Maya 2010 has the ability to be scripted using its embedded Python interpreter. Python is pretty much a second class citizen in Maya since their original, native and well supported scripting language is MEL. Fortunately they have it set up such that the Python API somewhat closely mirrors the MEL API… albeit quite nastily.

The MocoMaya implementation has the ability to connect to a MocoServer and get live actuator movements, publish its own “viewfinder” (which is really just a live rendering of the scene as seen by the camera) and it can even import the CurveSet that is saved from the MocoMotionBuilder interface.

To run MocoMaya you have to start up Maya, go to the Scripting Window and, under the Python tab, enter about 4 lines of Python. Really only 2 lines count (import MocoMaya; MocoMaya.MocoMaya()); the 2 lines preceding them append to sys.path so it can find the MocoMaya module from within Maya… remember this is an embedded and quite stripped-down Python. When you execute it you get a nice GUI.

When you run MocoMaya a small window pops up asking you for the Servername and Port of the MocoServer. It also lets you specify an existing virtual Maya camera if you want to use it as the camera being controlled by the MocoServer; otherwise it’ll just create its own camera. After you enter the proper server and port and hit Connect, a skeletal rig (using joints) is created and the camera is parented to its neck automatically… this rig represents all the joints that are movable in the actual real-life rig (I still need to add constraints/limits to the joints). From there you can go over to one of the other two tabs: Controller and Import/Export.

In the Controller tab you have several configurations dealing with the live control that you can set. You can enable/disable the MocoServer from controlling your camera live (ie. as the user is doing things in the MocoMotionBuilder interface things start moving around in Maya). You also have the ability to disable the outgoing viewfinder, this can drastically speed up the reaction time for Maya as it isn’t rendering the scene at every single move/frame. Finally you can turn down the resolution of the viewfinder output so it can render faster.

Note that when you have live control turned on the following things are constantly updated live in Maya as they are changed on the GUI: The armature positions and the “current frame”… the “current frame” is a relative concept. No matter what you set the Begin/End Frames on the GUI it still represents the frames as a normalized value from 0 to 1 internally as well as when it exports things. This means that as you’re creating some animation in Maya for a totally different frame rate/range the live playhead moves relative to the begin/end frame range in Maya… this is a feature. If you want the actual frame numbers to be identical then simply set both the GUI’s and Maya’s frame Begin/End values to be the same.
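The relative playhead idea amounts to two small conversions, sketched below with hypothetical function names. The only thing taken from the description above is that frames are stored as a normalized 0 to 1 value and stretched over whatever Begin/End range each program has set.

```python
def to_normalized(frame, begin, end):
    """Map a frame number in [begin, end] to a 0..1 position."""
    if end == begin:
        return 0.0
    return (frame - begin) / float(end - begin)


def from_normalized(position, begin, end):
    """Map a 0..1 position back onto a (possibly different) frame range."""
    return begin + position * (end - begin)
```

So frame 50 of a 0-100 range in the GUI is position 0.5, which lands on frame 121 if Maya’s range is 1-241; set both ranges the same and the frame numbers match exactly.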

Additionally at every actuator command MocoMaya receives it also renders the image at the end of the motion from the rig camera and publishes it to the MocoServer as “viewfinder/maya”… this viewfinder, like all the other viewfinders, can be viewed in the browser or pulled in by anything else that can pull viewfinder images (ie. MocoCompositor).

In the Import/Export tab (only the Import works for now)… you can specify the filename that you saved the CurveSet to in the MocoMotionBuilder interface. It will pull in the CurveSet from the MocoServer and apply all the keyframes and rotations to the virtual rig… you can see the curves using the Graph Editor and clicking on the rig/camera.

It’s worth mentioning that when you’re doing a live control from the MocoServer, nothing is actually being keyframed or saved in your Maya file… it’s only being used for previewing the framing and such. If you actually want the rig motion curves applied in Maya you’ll have to save them in the MocoMotionBuilder and Import the CurveSet as mentioned above. This is also a feature (not screwing with your Maya scene too much).

Also Note that if you have your rig already keyframed and animated it’s probably a good idea to turn off the live control otherwise both Maya and the MocoServer are fighting over the rig as the frames are being scrubbed. The easiest fix here is to delete your keyframes for the rig so the MocoServer can get total control.

So what’s next? I’ll need to get around to implementing the Export feature (to export the CurveSet to the MocoServer so others can use it). Right now the live control is a one-way thing (MocoServer->MocoMaya) I was contemplating the possibility of bi-directional live control so any rig changes made in Maya would send actuator commands back to the MocoServer etc… but that’s just another can of worms just waiting to happen (ie. trapping events in Maya and that whole re-entrant issue of receiving actuator commands you yourself published (the latter can probably be fixed with adding a ‘_initiator’ signature of some sort and filtering actuator commands containing your own signature))… but there’s more important things to implement/fix before I get time to think about this, so I’ll just add it as a TODO here.
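The ‘_initiator’ idea from the TODO above could look roughly like the sketch below. Nothing here exists in MocoMaya yet: the class, the message layout and the use of a random id as the signature are all hypothetical, and the publish callable stands in for whatever actually sends a command to the MocoServer.

```python
import uuid


class ActuatorClient(object):
    """Tags outgoing actuator commands and ignores its own echoes."""

    def __init__(self, publish):
        self.client_id = uuid.uuid4().hex  # hypothetical per-client signature
        self._publish = publish            # callable that sends a message out
        self.applied = []                  # commands we actually acted on

    def send(self, actuator, value):
        self._publish({
            "actuator": actuator,
            "value": value,
            "_initiator": self.client_id,
        })

    def on_message(self, message):
        """Apply a received command unless we published it ourselves."""
        if message.get("_initiator") == self.client_id:
            return False                   # our own command came back; drop it
        self.applied.append((message["actuator"], message["value"]))
        return True
```

Filtering on the receiving side keeps the server dumb: it can keep broadcasting every command to every subscriber, and each client just drops the ones carrying its own signature.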

“Oh controlling Maya is great and all, but how about us poor people that can’t afford the expenses of Maya?” you ask… yeah I feel your pain. I originally wanted to implement a MocoBlender (for the awesome open source Blender 3D program)… unfortunately after much research I found out that they are in the middle of a major overhaul for their 2.50 release. Their Python API is being reworked and isn’t quite ready for prime-time. This gave me two options, either implement it in the old API with a limited lifespan or just wait it out til 2.50 comes out and try to implement it then… yeah I’m lazy, I’ll wait til it comes out… well after graduation :P.

Here’s a quick little video that I made using MocoMaya to plan out the motion of the virtual rig, then importing the CurveSet and finally doing the Maya batch render out to frames. I used ImageMagick’s convert command to make an mpeg out of it. Note: I found the house model and the background image somewhere online (no ownership claim, but fair use). Obviously I’m not a professional animator 🙂

MocoServer: Integrating the Cheetah Template Engine

April 1st, 2010

The GUI/html interface was becoming a bit complex and I could no longer simply do straight up HTML/Javascript for some features that we’d need (ie. getting publication lists etc)… and I wasn’t about to wrap every little server side data structure in json to send it out to the client to parse out. The obvious solution was to implement server-side scripting… yeah yeah Python is already the de-facto server-side scripting language since it’s already running the Server process itself, but I didn’t want to bloat the Server code with GUI-specific functionality. Also if I’ve learned anything from my years of web development, it’s that you REALLY don’t want to mix HTML (and inevitably Javascript) inside a scripting language… it’s just nastiness. It’s much nicer to flip it inside out and use an HTML templating technique with embedded Python instead (ie. similar to mod_python’s PSP). Unfortunately the PSP interpreter is stuck to mod_python (the Apache module) and I wasn’t about to emulate the Apache module API just to include mod_python into my webserver… Althoooough that’d be pretty sweet and it’d give me all those mod_* for free from Apache 😛 Nah!

I could just reimplement my old “Embedded-Perl-in-HTML-via-EntityTags-and-<perl>tags-aka-PHTML” code as Python… it WAS pretty sweet and just 2 functions. But alas, I passed on it: I lacked the time, Perl was ideal for the heavy regular expressions I used in it while Python’s regex I’d have to figure out all over again, and it was totally not a standard templating language but my own cheap albeit beautiful solution ;).

So with a little more Googling I came across the Cheetah Templating Engine (a Python based Templating engine for text). The template syntax seemed a bit esoteric at first but after reading the documentation, experimenting and realizing they were creating a generic templating language (not just HTML) it started making some sense. On the programmer/integrator side of things the Cheetah API is super simple and clean to use. You simply create an instance (or derive your own class from their class) of Template passing in a filename or string containing the template syntax, then you can set arbitrary properties (which become accessible variables from within the template syntax) to the Template instance. Finally whenever the template gets the __str__ call (ie. in print or str() calls) it returns the generated final output.

Within the MocoServer I integrated it such that if the file extension ends in .tmpl it assumes it’s a Cheetah template file (this is Cheetah’s own idea, I just went with it) and calls the Template instance to generate a string and then we send the string out to the client. Additionally I use a sub-file extension to determine the Content-Type to send to the client (ie. .html.tmpl sends text/html, .js.tmpl sends text/javascript etc). All these tmpl files exist in the same htdocs directory as the rest of the web content.
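The sub-extension lookup described above can be sketched like this. The function name and the lookup table are hypothetical; only the .tmpl convention and the .html/.js mappings come from the post, while the .css entry and the text/plain fallback are additions for illustration.

```python
import os

# Sub-extension -> Content-Type. The .css row is an assumed extra.
CONTENT_TYPES = {
    ".html": "text/html",
    ".js": "text/javascript",
    ".css": "text/css",
}


def template_content_type(path, default="text/plain"):
    """Return the Content-Type for a .tmpl file, or None if not a template."""
    base, ext = os.path.splitext(path)
    if ext != ".tmpl":
        return None                       # not a template file at all
    _, sub_ext = os.path.splitext(base)   # e.g. ".html" from "index.html.tmpl"
    return CONTENT_TYPES.get(sub_ext.lower(), default)
```

A request handler built on this would render the template to a string only when the function returns something other than None, and serve every other file in htdocs as plain static content.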

To be of any actual use the template syntax will need access to data from the Server itself, so I went ahead and bound several Server data structures to the template’s attributes. Namely, CGI_Params goes to Template.CGI, clientSocket goes to Template.clientSocket and the server instance itself goes to Template.server (security considerations be damned :P). So now any templated file can essentially look up whatever data structures I’m using in the server, send random junk out to the client or just look at the pre-parsed CGI parameters. Heck it can even invoke methods within the server (ie. stop() the server from inside, such a bad idea :P)

This should certainly make GUI design a bit easier when things need to be tightly integrated with the Server.

MocoFaceTracker: Facial tracking via OpenCV

March 26th, 2010

While I was in Bat-teK last semester someone had mentioned that there was a package called OpenCV that could do facial tracking. So on a whim I downloaded it and looked through the API and example code. In particular there was one example of using the Python binding with facial tracking.

Side Note: Installing OpenCV was a pain in the butt since Ubuntu already came with the 1.0.* package in the repository and I couldn’t for the life of me get OpenCV 2.0’s Python modules working correctly. So I just used the old module interface, since OpenCV 2.0 still supports it anyway (for now). I have a thing or ten to say about the OpenCV API but I’ll bite my tongue since children may be reading this blog.

Anyways, MocoFacialTracking is a separate standalone program that connects to the MocoServer to do passive facial tracking and actuator movements. I ripped out the important bits from the facial tracking example and wrote a viewfinder Subscriber that pulls in the live video stream from the camera. Then it applies the OpenCV facial tracking algorithm (which probabilistically returns a bounding square for each human face it finds in the image, using the training set it comes with). I then pick the biggest of these bounding squares, find its midpoint and then the vector from the middle of the frame to that midpoint (this determines the speed at which to move to bring the face to the center of the frame). After that I publish new actuator movements to the server, which in turn moves the pan/tilt head to bring the face midpoint to the frame midpoint.

There isn’t a surefire way to know how far to move the servos to center the face (you’d have to figure out the distance from the face to the camera and the field of view at that distance to find the frame edges and interpolate the pan/tilt absolutely). So instead I just move the pan/tilt servos a very tiny amount (multiplied by the distance-to-center vector I calculated earlier, for extra oomph when the face is far off and to slow down when it’s close). But remember that the image is being streamed continuously and the facial tracking is done on every frame it can get. This means the facial tracking is continuously running and adjusting the servos incrementally. In the long run it comes off as smooth (albeit slow) movement to center the face in the frame.

The biggest issue was the facial tracking bottleneck, which can back up the image stream (and also the server on the other end). To solve this I made the viewfinder subscriber a thread that rips through the incoming stream as fast as possible, updating the latest image internally, while another thread does the facial tracking and actuator publications. So whenever the algorithm finishes an iteration it’ll have the latest image to process for the next iteration. Quick, somebody implement this in GLSL 😛
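
To make the two ideas above concrete, here is a minimal Python sketch of the per-frame nudge calculation and of a "latest image only" slot shared between two threads. It is an illustration under assumptions, not the actual tracker code: the names, the (x, y, w, h) box format, the gain value and the sign convention of the pan/tilt nudge are all hypothetical.

```python
import threading


def biggest_face(faces):
    """Return the (x, y, w, h) box with the largest area, or None if there are none."""
    return max(faces, key=lambda f: f[2] * f[3]) if faces else None


def tracking_step(faces, frame_w, frame_h, gain=0.02):
    """Return a small (pan, tilt) nudge that moves the biggest face toward the center.

    `gain` is an assumed tuning constant; the sign convention depends on the rig.
    """
    face = biggest_face(faces)
    if face is None:
        return (0.0, 0.0)  # nothing to follow, so hold still
    x, y, w, h = face
    # Offset of the face midpoint from the frame midpoint, normalized to [-1, 1].
    dx = ((x + w / 2.0) - frame_w / 2.0) / (frame_w / 2.0)
    dy = ((y + h / 2.0) - frame_h / 2.0) / (frame_h / 2.0)
    # Far from center gives a bigger nudge, close to center gives a smaller one.
    return (gain * dx, gain * dy)


class LatestFrame:
    """Single-slot holder: the stream thread overwrites, the tracker reads the newest."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None

    def put(self, frame):
        with self._lock:
            self._frame = frame  # older frames are simply dropped

    def get(self):
        with self._lock:
            return self._frame
```

Because the slot only ever holds one image, a slow detection pass never builds up a backlog: the tracker just skips whatever frames arrived while it was busy.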

So now you ask, “Great, you’ve got primitive facial tracking which moves the pan/tilt to keep a person’s face in the center of the frame. WHY?” Well I’m glad you asked… just imagine, if you will, a use case where a news reporter is in the field without a camera guy. Another good example would be tracking a certain feature in the world over a very long timeframe when you don’t know for certain which direction it will travel in (e.g. a plant growing). The last thing you want is to come back a few days later and find out the plant grew out of the frame before it bloomed, ruining your whole shoot. Even though we’re doing facial tracking, the OpenCV algorithm is really “feature tracking”, so given a good enough training set it should theoretically be able to track other features within a frame and realign the camera to follow them.

TODO: I’ll never get to it this semester, but if we were to flesh out the TV news reporter scenario I’d love to add some heuristics to the MocoFacialTracker so that it uses some simple composition rules (like the rule of thirds, horizon line etc). Also, if a second face shows up in the frame maybe it can adjust to a two-shot automatically, especially if the faces are pointed at each other as in conversation (e.g. a news reporter interviewing a witness)… I think PittPatt might have the facial tracking we’d need for face direction. I’d also love to track the handheld microphone the news reporter uses, as a cue to the algorithm as to where it should be focusing its attention. For example, if the news reporter says something into the mic and then shifts it to the witness to speak, the algorithm would know to follow and focus on the witness until the mic is brought back. Just little things like that, which a human camera guy would instinctively do and which aren’t too computationally heavy or AI-y, would be great things to add.

MocoCompositor: First Real Test

March 24th, 2010

So I finally got a hold of a green screen cloth from Krishna (Thanks Krishna!) and draped it over our office door. I set up the camera in front of it, found a cool sci-fi desktop wallpaper to use as my background plate, used the live camera viewfinder as the foreground plate and ran the MocoCompositor with the greenscreen/chroma-key filter GLSL shader. The output from the compositor streams to a new viewfinder called compositor, so any browser should be able to view the final composite too.

The computers were set up such that the Dell Mini9 netbook was running the server, the background plate stream and the MocoBot (the live camera feed) programs. Meanwhile my desktop (which is a bit more heavy duty, with an NVIDIA GPU) was running the MocoCompositor, which connected to the server on the Mini9 to pull the camera viewfinder feed as well as the background plate feed (oh ya, I didn’t mention that I also wrote a quick static image streamer publisher that will later become a video/frame player in the UI). At each pyglet on_draw event the MocoCompositor blits the current images to the appropriate plates/textures and composites them using the shader. The resulting texture is then pulled from the GPU framebuffer, compressed to JPEG (via pyglet via PIL) and streamed out to the server via a publisher.

It works surprisingly well and fast given the amount of network traffic it produces. And these are the results (I print-screen captured these while viewing the compositor viewfinder in a browser on yet another computer :). Note that the background plate is using a sci-fi wallpaper of the Earth I found on Google images… (I claim no copyright to it, but do claim fair-use for educational purposes):

Here’s me smiling cheesily (Note: my left shoulder isn’t abnormally low, I was reaching for the printscreen button on the keyboard below 🙂 )

And here’s me looking into the Universe contemplating existence or looking for Dr. Who.

Obviously the lighting was crappy and uneven, hence the green pixels still present from the foreground plate. I need to come up with a way of passing the shader parameters to the server and to the UI so the user can fine-tune the settings, rather than hardcoding them into the shader like I do now. Playing with green screening is getting to be more fun than I’d expected… need to get back to work.
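
As a rough illustration of why exposing the key settings matters, here is a minimal CPU-side sketch in Python of a chroma-key test with a tunable threshold. It is not the GLSL shader the MocoCompositor uses: the function names, the green-dominance rule and the threshold parameter are all assumptions made for the example.

```python
def chroma_key_pixel(fg, bg, threshold=1.3):
    """Return bg where the fg pixel looks like green screen, otherwise fg.

    Pixels are (r, g, b) tuples in 0..255. `threshold` is a hypothetical knob:
    how strongly green must dominate red and blue before the pixel is keyed out.
    """
    r, g, b = fg
    is_green = g > threshold * max(r, b, 1)
    return bg if is_green else fg


def composite(fg_pixels, bg_pixels, threshold=1.3):
    """Key a flat list of foreground pixels over same-sized background pixels."""
    return [chroma_key_pixel(f, b, threshold) for f, b in zip(fg_pixels, bg_pixels)]


# A dimly lit patch of green cloth, keyed with two different settings.
dim_green = (100, 140, 90)
earth_blue = (0, 0, 255)
print(composite([dim_green], [earth_blue], threshold=1.3))  # keyed out
print(composite([dim_green], [earth_blue], threshold=1.5))  # green pixel survives
```

With uneven lighting the same cloth lands on both sides of a fixed threshold, which is the kind of leftover green a user-adjustable setting would let you tune away.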

Edit: I wonder if the JPEG compression/decompression could be done as a shader… Looks like NVIDIA’s site has an example of the DCT algorithm as a Cg shader (we use GLSL)… but I’m not sure of what the rest of the JPEG compression algorithm/format would need. But I think a future extension to this software could be a JPEG compression shader to be able to get rid of PIL doing the jpeg compression and speed things up even more… just a thought, but outside of the scope of this project obviously.