Wednesday, December 24, 2008

Computer Vision on OS X with Python and OpenCV

After my MacBook's sudden demise and spontaneous, inexplicable regeneration, I've decided to try to port some things I've worked on in GNU/Linux to OS X. Also, having a built-in microphone and camera on a portable computer is pretty amazing when you work primarily with audio and video and need to test things.

I was really happy with how my previous forays into pygame had been going, so over the holidays I thought I'd try to make a version of my camLoops prototype for OS X.
Also, at Alexandre Quessy's invitation, I've added cam_loop.py to the ToonLoop project. More on that in the coming weeks, so stay "tooned" (if not put off by that terrible pun).

*UPDATE: The initial OpenCV version of camLoops is now available. I've only tested it on a MacBook running OS X so far, but it should also work in GNU/Linux.

I did some research online and there did not seem to be much in the way of camera frame-grabbing modules for OS X with Python bindings (granted, it is a pretty niche area). The most complete options are Apple's QTKit, the Cocoa framework for QuickTime (which has Python bindings), and OpenCV (Open Source Computer Vision), a cross-platform, BSD-licensed library written in C with Python bindings. Since I have no interest in wasting time on a Mac-only, proprietary framework like QTKit, the choice was pretty obvious. The installation, however, was not.

Given my heavy use of Fink packages (when I should really just give in and roll GNU/Linux on my laptop), I am used to having to fight with cross-platform libraries and frameworks. I was pleased to find an OS X-specific build instructions page on the OpenCV wiki. It's quite likely that my approach is not the best way to build, but it's what worked for me.

I decided, for better or worse, to get the current production version of Python, which is 2.6.1 at the time of this writing. I grabbed the disk image and ran the installer (and breathed a sigh of relief). For pygame to work, I had to get PyObjC, which is used to build Cocoa apps for OS X in Python. This is not required for OpenCV, so skip ahead if you're not using pygame. Getting PyObjC required an svn checkout of the 1.4 branch, which works on Tiger, as no slick disk images with installers are available from the website at present. To grab the branch, do:

svn co http://svn.red-bean.com/pyobjc/branches/pyobjc-1.4-branch pyobjc

Fortunately, the PyObjC people know their target audience, and once in the pyobjc directory, all it took to make a nice installer was:

python setup.py bdist_mpkg --open


*UPDATE: OpenCV previously used CVS for version control; the project has since migrated to Subversion. Check out a clean copy with:

svn co https://opencvlibrary.svn.sourceforge.net/svnroot/opencvlibrary opencvlibrary
In the INSTALL file, it is suggested that you run autoreconf -i --force
This failed for me (even though my autotools are up to date), so I used the pre-existing configuration files and left autoconf alone.


In the opencv directory, create a build directory and enter it with mkdir build; cd build


From the build directory, run configure with a few flags set (replace /sw with /opt/local if you are using DarwinPorts instead of Fink):

../configure CPPFLAGS="-I/sw/include" LDFLAGS="-L/sw/lib" --with-python

Now compile and install with make && sudo make install. You will be prompted for your password.

Edit your ~/.profile to include the following lines, which will be different if you set the --prefix option on your configure script to something other than the default /usr/local:

LD_LIBRARY_PATH=/usr/local/lib:${LD_LIBRARY_PATH}
export LD_LIBRARY_PATH
PYTHONPATH=/usr/local/lib/python2.6/site-packages:${PYTHONPATH}
export PYTHONPATH

To test that everything worked, run your Python interpreter and try import opencv. Provided there are no errors such as 'No module named opencv', you can start trying the Python examples in the samples directory.


I ran python facedetect.py 0, where 0 is the index of the camera you want the program to use. The program then draws a red square around what it thinks is your face. Extra points for being beard proof, and hats off to Mark Asbach for his work on OS X support for OpenCV.

Monday, December 15, 2008

Live animation with pygame and video4linux

 EDIT: I've uploaded the code (and merged it into one file) on gitorious.

Last Friday, the SAT saw the premiere of a new work by Alexandre Quessy and Isabelle Caron called Motifs Urbains. The work was part of the experimentation phase of the propulse[art] project. The demo involved Alexandre generating live, multichannel audio and looping it while, in a separate room, Isabelle did live stop-motion animation and looped it using Alexandre's Processing-based software, ToonLoop. Both rooms featured simultaneous playback of the audio and animation.

Inspired by the demo and my recent tinkering with pygame, I decided to try to write a prototype for a simple stop-motion animation looper like ToonLoop. The result is cam_loop.py. Thanks to Nirav Patel's 2008 Google Summer of Code project, pygame (Subversion revision 1744 or later) now supports video4linux camera input. The camLoop program shows the live camera feed in the left-hand region of the window and the animation in the right-hand region. The user can grab frames from the live feed, and the accumulated frames are played back as a loop on the right-hand side of the window.

If you run the program during the winter months, you'll be subjected to a nauseating holiday theme overlaid on the incoming video. This example is simply meant to demonstrate how easy it is to draw arbitrary objects on top of live video. From Nirav's blog, it would appear that CoreVideo support is also forthcoming.

Monday, October 27, 2008

OpenGL and Gtk+ with GtkGlExt

UPDATE: source files now on gitorious

Lately I've been really impressed with the performance and simplicity of Gtk+. After using it to get fullscreen video with Gstreamer, I thought I would check out using Gtk+ in place of GLUT to write OpenGL programs.

The program teapot.c is a heavily commented example that creates a window and draws a mesh teapot. The accompanying Makefile will compile it under GNU/Linux provided the following packages (in Ubuntu) are installed:
gtk+-2.0 gtkglext-1.0 gtkglext-x11-1.0

GtkGlExt is an extension to the Gtk+ API that allows developers to use OpenGL calls on standard Gtk+ widgets or on new custom widgets. There is also a C++ API, gtkglextmm, for Gtk+'s C++ counterpart, gtkmm.

The program is broken down into initialization functions that set up the window and OpenGL context, as well as callbacks that do the heavy lifting. The callbacks are registered in the initialization functions, which means they are attached to specific events (or signals) and are triggered when these events are fired off asynchronously.

The expose callback (expose_cb) is where we do our OpenGL drawing. It will be called in response to "expose-event" signals, fired whenever the window needs to be redrawn, i.e. if it is resized, exposed (where regions become visible that previously were not), or moved.
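
For illustration, here is roughly what such an expose callback can look like with GtkGlExt (the names are mine and the drawing is left as a placeholder, so this is a sketch rather than a copy of teapot.c):

/* requires <gtk/gtkgl.h> and <GL/gl.h> */
static gboolean expose_cb(GtkWidget *widget, GdkEventExpose *event, gpointer data)
{
    /* fetch the OpenGL context and drawable attached to this widget
     * during initialization (via gtk_widget_set_gl_capability) */
    GdkGLContext *glcontext = gtk_widget_get_gl_context(widget);
    GdkGLDrawable *gldrawable = gtk_widget_get_gl_drawable(widget);

    if (!gdk_gl_drawable_gl_begin(gldrawable, glcontext))
        return FALSE;

    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    /* ... OpenGL drawing calls for the teapot go here ... */

    if (gdk_gl_drawable_is_double_buffered(gldrawable))
        gdk_gl_drawable_swap_buffers(gldrawable);
    else
        glFlush();

    gdk_gl_drawable_gl_end(gldrawable);
    return TRUE;
}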

The idle callback (idle_cb) is the only callback that is not attached to an event. It is registered using g_timeout_add, meaning that it will be called at a fixed interval. Its job is to flag the drawable area of our GtkWindow as needing to be redrawn. It is also where we would update any control or data parameters that the expose callback needs to take into account. For example, if we were stretching our teapot, we could increment the scaling factor in the idle callback. It's important to note, however, that g_timeout_add does not guarantee that the timeout interval will be respected, as explained in the GLib documentation:

Note that timeout functions may be delayed, due to the processing of other event sources. Thus they should not be relied on for precise timing. After each call to the timeout function, the time of the next timeout is recalculated based on the current time and the given interval (it does not try to 'catch up' time lost in delays).
The constraints of real-time computing are beyond the scope of this entry, but the animation subsection in Chapter 1 of the OpenGL Red Book offers a good overview.
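
As a rough sketch (the interval and widget name are mine), the timeout callback can be as simple as:

/* flag the drawing area as needing a redraw, which causes an
 * "expose-event" to be emitted; animation state (e.g. a scaling
 * factor) would also be updated here */
static gboolean idle_cb(gpointer data)
{
    GtkWidget *drawing_area = GTK_WIDGET(data);
    gtk_widget_queue_draw(drawing_area);
    return TRUE; /* returning FALSE would remove the timeout */
}

/* in the initialization code: ask for roughly 60 calls per second */
g_timeout_add(1000 / 60, idle_cb, drawing_area);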

But lo and behold, the wonder of the Utah Teapot:

Friday, September 26, 2008

Fullscreen video in gstreamer with gtk+

UPDATE: Source files are now up on gitorious.

In a similar vein to my previous entry, I'd like to explain how to get full screen video with gstreamer and gtk. Here again I'm using the latest CVS head of gstreamer, as well as the gtk+ and gdk development files.

I was very impressed with gtk's well documented API and in-depth tutorial. Also, since both gstreamer and gtk depend on GObject, combining functionality from both frameworks is quite seamless.

The source file fullscreen.c and accompanying Makefile are used for this example.

To test that your build environment is properly set up, I would recommend making a simple C file with the following:

#include <gst/gst.h>
#include <gtk/gtk.h>
#include <gst/interfaces/xoverlay.h>
#include <gdk/gdk.h>
#include <gdk/gdkx.h>

gint main(gint argc, gchar *argv[])
{
    gst_init(&argc, &argv);
    gtk_init(&argc, &argv);
    return 0;
}


and try compiling. Make sure that you have all the necessary development files installed and that your environment can find them. gst_init and gtk_init must each be called once by any program that uses gstreamer and gtk. Note that if for some reason they are called more than once, the extra calls have no effect.

To be able to process key events and to keep the pipeline rolling, we need to use glib's mainloop. It may be possible to achieve the same results with some other event loop mechanism; this is just the one most often used in gtk and gstreamer applications.

loop = g_main_loop_new (NULL, FALSE);

The simple pipeline here consists of a videotestsrc going to an xvimagesink. We set the "force-aspect-ratio" property of the xvimagesink to TRUE so that when the size is changed, the image's proportions are not distorted.
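
Building that pipeline by hand in C looks roughly like this (a sketch; the element names are arbitrary and error checking is omitted):

GstElement *pipeline = gst_pipeline_new("player");
GstElement *src = gst_element_factory_make("videotestsrc", "src");
GstElement *sink = gst_element_factory_make("xvimagesink", "sink");

/* keep the image's proportions when the window is resized */
g_object_set(G_OBJECT(sink), "force-aspect-ratio", TRUE, NULL);

gst_bin_add_many(GST_BIN(pipeline), src, sink, NULL);
gst_element_link(src, sink);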

We build our gtk window:
window = gtk_window_new(GTK_WINDOW_TOPLEVEL);
g_signal_connect(G_OBJECT(window), "expose-event", G_CALLBACK(expose_cb), sink);

and attach the "expose-event" to the expose callback function. This function will be called when our window is brought to the foreground. The expose callback overlays xvimagesink's video on our gtk window.

A common feature in video-players is to assign a hot key to switch from windowed to full screen viewing. This is possible using:

gtk_widget_set_events(window, GDK_KEY_PRESS_MASK);
g_signal_connect(G_OBJECT(window), "key-press-event", G_CALLBACK(key_press_event_cb), sink);

which connects our key_press_event_cb function to the "key-press-event" signal emitted by the gtk window. The call to gtk_widget_set_events makes sure that the key-press event is propagated all the way up to our top-level window. This is important for determining which level of a window hierarchy is intended to handle such an event.
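
Inside the callback, switching between windowed and full screen then comes down to gtk_window_fullscreen and gtk_window_unfullscreen. A sketch (the 'f' key binding and the state flag are my own choices):

#include <gdk/gdkkeysyms.h>

static gboolean key_press_event_cb(GtkWidget *widget, GdkEventKey *event, gpointer data)
{
    static gboolean is_fullscreen = FALSE;

    if (event->keyval == GDK_f) { /* toggle on the 'f' key */
        if (is_fullscreen)
            gtk_window_unfullscreen(GTK_WINDOW(widget));
        else
            gtk_window_fullscreen(GTK_WINDOW(widget));
        is_fullscreen = !is_fullscreen;
    }
    return FALSE;
}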

Lastly, we need to make the window black initially, as otherwise the background will sometimes be white when switching from fullscreen to windowed mode.
We call gtk_widget_show_all() on our window to "recursively show a widget, and any child widgets (if the widget is a container)". The pipeline is then set to playing, and the mainloop is run.
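
Roughly, those last steps look like this (a sketch; I'm assuming gtk_widget_modify_bg for the black background, which may differ from what fullscreen.c does):

GdkColor black = {0, 0, 0, 0};
gtk_widget_modify_bg(window, GTK_STATE_NORMAL, &black);

gtk_widget_show_all(window);

gst_element_set_state(pipeline, GST_STATE_PLAYING);
g_main_loop_run(loop);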


[Screenshot: the gstreamer test video pattern displayed in the gtk window]

Thursday, August 21, 2008

Multichannel audio with gstreamer

Over the past few months, I have spent a lot of time working with the gstreamer multimedia framework. From their site:
GStreamer is a library for constructing [...] graphs of media-handling components. The use cases it covers range from simple Ogg/Vorbis playback, audio/video streaming to complex audio (mixing) and video (non-linear editing) processing.
I've found gstreamer to be remarkably flexible and useful for a variety of audio-video applications. One of the trickier things I had to figure out was how to have 8 channels of audio playing in one gstreamer pipeline.

These examples require the following packages:
  1. JACK Audio Connection Kit libraries
  2. gst-plugins-base-0.10.20
  3. gst-plugins-good-0.10.9
  4. gst-plugins-bad-0.10.8

These modules should be installed in the above order. Personally I use the CVS head for all of the above, which you can get by doing:
$ cvs -d:pserver:anoncvs@anoncvs.freedesktop.org:/cvs/gstreamer co modulename
where modulename is gstreamer, gst-plugins-base, gst-plugins-good, and gst-plugins-bad respectively.

Gstreamer includes a command-line utility, gst-launch, that allows a user to quickly build a gstreamer pipeline with a simple text description. For example:


$ gst-launch audiotestsrc ! jackaudiosink

will play a sine wave, provided you are already rolling a jack server. I generally run a jack server using the qjackctl application.

Users should be aware of this warning from the gst-launch man page:

gst-launch is primarily a debugging tool for developers and users. You should not build applications on top of it. For applications, use the gst_parse_launch() function of the GStreamer API as an easy way to construct pipelines from pipeline descriptions.
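
As a minimal sketch of what that looks like in C (error handling kept to a bare minimum):

#include <gst/gst.h>

int main(int argc, char *argv[])
{
    GError *error = NULL;
    GstElement *pipeline;

    gst_init(&argc, &argv);

    /* the description is the same text you would pass to gst-launch */
    pipeline = gst_parse_launch("audiotestsrc ! jackaudiosink", &error);
    if (pipeline == NULL) {
        g_printerr("Could not build pipeline: %s\n", error->message);
        return 1;
    }

    gst_element_set_state(pipeline, GST_STATE_PLAYING);
    g_main_loop_run(g_main_loop_new(NULL, FALSE));
    return 0;
}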

It is possible to run a simple multichannel audio example with the following launch line:

gst-launch-0.10 -v interleave name=i ! audioconvert ! audioresample ! queue ! jackaudiosink \
audiotestsrc volume=0.125 freq=200 ! audioconvert ! queue ! i. \
audiotestsrc volume=0.125 freq=300 ! audioconvert ! queue ! i. \
audiotestsrc volume=0.125 freq=400 ! audioconvert ! queue ! i. \
audiotestsrc volume=0.125 freq=500 ! audioconvert ! queue ! i. \
audiotestsrc volume=0.125 freq=600 ! audioconvert ! queue ! i. \
audiotestsrc volume=0.125 freq=700 ! audioconvert ! queue ! i. \
audiotestsrc volume=0.125 freq=800 ! audioconvert ! queue ! i. \
audiotestsrc volume=0.125 freq=900 ! audioconvert ! queue ! i.

This pipeline consists of 8 audiotestsrc elements, which generate sine tones of increasing frequency. The audioconvert element converts a given audiotestsrc's output to the numeric data type expected downstream. The queue element is a simple data buffer, to which the audioconvert element writes and from which the interleave element reads. The interleave element combines multiple channels of audio into one interleaved "frame" of audio. For example, if we had 2 independent channels of audio like so:

Channel1: 00000...
Channel2: 11111...

where channel 1 outputs only 0's, and channel 2 only 1's, the interleaved frame would look like:

0101010101...

The interleaved audio again needs to go through an audioconvert and an audioresample element in case the audio from our pipeline differs in datatype or sample rate from the jack server. Finally the audio is output by the jackaudiosink element, which writes audio from our pipeline into corresponding jack ports.

Many plugins require that the interleave element explicitly specify each channel's spatial position. Unfortunately, this cannot be done with gst-launch. I've created an example C program, multiChannel.c, which initializes interleave appropriately. It can be compiled with this Makefile. The relevant section in multiChannel.c is the function set_channel_layout(GstElement *interleave). This function is passed our interleave element, and sets its channel-positions property to an array of valid spatial positions.
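
A function like that can be written along the lines of the following sketch (my own reconstruction, not copied from multiChannel.c; only two of the eight positions are shown):

#include <gst/audio/multichannel.h>

static void set_channel_layout(GstElement *interleave)
{
    GValue val = { 0, };
    GValueArray *positions = g_value_array_new(8);

    g_value_init(&val, GST_TYPE_AUDIO_CHANNEL_POSITION);

    /* one position per channel, in the same order as interleave's sink pads */
    g_value_set_enum(&val, GST_AUDIO_CHANNEL_POSITION_FRONT_LEFT);
    g_value_array_append(positions, &val);

    g_value_set_enum(&val, GST_AUDIO_CHANNEL_POSITION_FRONT_RIGHT);
    g_value_array_append(positions, &val);

    /* ... and so on for the remaining six channels ... */

    g_object_set(interleave, "channel-positions", positions, NULL);

    g_value_array_free(positions);
    g_value_unset(&val);
}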

Gstreamer uses GLib's object system (GObject) heavily, and as a result the above example program might be a little tricky to follow for programmers used to straight C. Check out gstreamer's application development manual for further examples of gstreamer usage in C.

Monday, February 25, 2008

Metronome project

The ViMic software I worked on will be used in a piece on March 4th. I will post some notes about the current state of the project and how it was used after this concert.

Besides ViMic, I have been working for a few months on a customizable polymetric metronome built with the STK, as a personal project. After looking at several GUI toolkits, I have decided to develop the interface for this program using wxWidgets. Since I have only used Qt before (and only a little at that), I was unsure for quite a while about which toolkit would work for my project. I was mainly looking for something that was easy to develop cross-platform, in "straightforward" (i.e. does not feel like I'm learning another language) C/C++. I was persuaded by a friend to try wxWidgets, and was more or less sold by Jeff Cogswell's wxWidgets tutorial. The free wxWidgets ebook didn't hurt either.

I won't say too much more about the metronome at this point, since I want to keep the feature list short for the time being. A few important points:

  • It will be licensed under the GPL version 3.
  • It will have customizable ticking sounds (pitch, resonance, loudness, etc.).
  • It will allow for multiple metres at the same time.
  • It will be easy to use.

The last feature is really the most important, but out of habit the GPL notice always comes first. Besides, mentioning ease of use last makes it more memorable (for me at least) than if I had listed it second.

Wednesday, January 9, 2008

Source directivity

Sound source directivity patterns are the newest (and possibly last, for a while) feature to be added to the ViMic project. The user will be able to load or draw a directivity pattern in a Max table object. The ViMicMax~ external will use this table to obtain a directivity gain value. It remains to be seen whether or not the source object should have its own copy of the table, or alternately use a built-in function to read individual values from the table.

Monday, January 7, 2008

Room model filtering

So I'm now working on a spatial audio project called ViMic, in C/C++ and Max/MSP. I will post a more comprehensive overview at some point, but I think I should write about the current issue I am working on. The problem is as follows:

  • To avoid the Doppler effect when moving a sound source rapidly (i.e. sudden change in delay time resulting in a pitch shift), the program crossfades between the previous delay time and the new delay time.
  • This approach is good for avoiding the Doppler effect, but has led to discontinuities in the signal. These result from reading from the wrong part of the delay line.
  • Adding filtering to simulate the walls' absorption of the sound seems to worsen the discontinuities apparent in the crossfading case.

A potential solution we are working on right now is to have one set of filters for the signal that is faded out and another for the signal that is faded in, swapping the filter sets to avoid discontinuities. Moving the filter swap to the start of the crossfade, rather than the end, seems to have resolved the issue.