Origin of idea
A couple of months ago, Sally (5) was bouncing around on the trampoline at my in-laws’ house, and said something very close to: ‘Dad, you should take lots of pictures of me jumping in the air, and put them together into a video, then it will look like I’m flying’.
I thought this was a cool idea, so we did it.
Instead of lots of individual stills, though, I thought I’d take some video footage and pull the frames out of that. With a digital camera kindly lent to me by my in-laws, I took about 1′20″ of Sally leaping around aimlessly. At 25fps, that is just over 2000 frames.
Locate her head in each frame
I briefly considered trying to do this automatically, but settled for a manual approach. I wrote a short
PyGame program to show me each frame in turn, and allow me to click on her nose if she was high enough in the air to count as flying. This was boring but didn’t take too long, giving me a list of candidate frames and their matching nose-coordinates.
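An annotation loop of this kind might look roughly like the following. This is a minimal sketch, not the original program: the frame filenames, the key bindings, and the CSV output format are all invented here.

```python
import glob

def save_noses(noses, path="noses.csv"):
    """Write (frame_index, x, y) click records out as CSV lines."""
    with open(path, "w") as f:
        for frame, x, y in noses:
            f.write(f"{frame},{x},{y}\n")

def annotate(frame_glob="frames/*.png"):
    """Show each frame; a click records the nose, any key skips the frame."""
    import pygame  # imported here so save_noses() works without a display
    pygame.init()
    files = sorted(glob.glob(frame_glob))
    noses = []
    screen = None
    for idx, name in enumerate(files):
        img = pygame.image.load(name)
        if screen is None:
            screen = pygame.display.set_mode(img.get_size())
        screen.blit(img, (0, 0))
        pygame.display.flip()
        waiting = True
        while waiting:
            for event in pygame.event.get():
                if event.type == pygame.MOUSEBUTTONDOWN:
                    noses.append((idx, *event.pos))  # clicked on the nose
                    waiting = False
                elif event.type == pygame.KEYDOWN:
                    waiting = False  # not flying in this frame; skip it
    pygame.quit()
    save_noses(noses)
```

The output is exactly the list the next steps need: candidate frame indices paired with nose coordinates.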
Discover I had not held camera steady
Unfortunately, on comparing the last frame to the first, it turned out I had not held the camera steadily enough, and the image had drifted over the course of the sequence. The drift was enough that stitching together frames from different parts of the clip would have produced unacceptable vertical jitter.
In contrast to the nose-locating problem, fixing this did seem suitable for automation, and so I followed the guidelines in a ‘Learn OpenCV’ tutorial to find and undo the camera movement. An initial trial with Euclidean warping suggested that the rotation was negligible, so I re-did the processing with pure translation. Smoothing the motion slightly, and cropping to a common view-port, gave me a stabilised image sequence.
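The tutorial's approach tracks feature points with OpenCV; as a self-contained illustration of the underlying idea, here is a NumPy-only sketch of pure-translation stabilisation. It is my own simplification, not the tutorial's code: it estimates whole-pixel shifts between consecutive frames by FFT cross-correlation, accumulates them into a camera trajectory, smooths that trajectory with a moving average, and returns the per-frame offsets that cancel the jitter.

```python
import numpy as np

def estimate_shift(prev, curr):
    """Whole-pixel (dy, dx) translation of `curr` relative to `prev`,
    found as the peak of the FFT cross-correlation."""
    corr = np.fft.ifft2(np.fft.fft2(curr) * np.conj(np.fft.fft2(prev))).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = prev.shape
    # Wrap shifts into the range [-size/2, size/2).
    return (dy - h if dy > h // 2 else dy,
            dx - w if dx > w // 2 else dx)

def smooth(traj, radius=5):
    """Moving-average smoothing of a 1-D trajectory, edge-padded so the
    output has the same length as the input."""
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    padded = np.pad(traj, radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

def stabilise(frames, radius=5):
    """Return per-frame (dy, dx) corrections that cancel camera drift."""
    shifts = [(0, 0)] + [estimate_shift(a, b) for a, b in zip(frames, frames[1:])]
    traj = np.cumsum(np.array(shifts, dtype=float), axis=0)  # camera path
    smoothed = np.stack([smooth(traj[:, 0], radius),
                         smooth(traj[:, 1], radius)], axis=1)
    return smoothed - traj  # add these offsets to each frame to undo jitter
```

Applying the corrections (with `cv2.warpAffine` or even `np.roll`) and then cropping all frames to a common view-port gives the stabilised sequence.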
I decided the nose coordinates wouldn’t have moved very much in this process, as far as the next step was concerned, so didn’t transform them.
Identify and assemble snippets of motion
From the nose-location data, it was straightforward to identify short fragments of the video where Sally is ‘in the air’ for a few consecutive frames. The question then is how to choose the best fragments to stitch together into a complete flight from one side of the trampoline to the other, say from left to right.
I framed the situation as a (directed) graph whose nodes were the video fragments in which Sally’s nose travelled from left to right, supplemented with special ‘start’ and ‘end’ nodes. Skipping the details, the shortest path through this graph then gave the sequence of fragments which covered as much of the travel distance as possible with non-overlapping flying. Repeating this gave a sequence of fragments for the right-to-left flight too.
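In outline, the search looks like the sketch below. The fragment format and edge weights are my own invented simplification (real fragments would also carry frame ranges): each fragment is a pair of nose x-coordinates, an edge links a fragment to any fragment beginning at or beyond where it ends, edge weights are the horizontal gap left uncovered, and Dijkstra's algorithm finds the sequence leaving the least distance uncovered.

```python
import heapq

def best_sequence(fragments, width):
    """Choose fragments covering as much left-to-right travel as possible.

    `fragments` is a list of (x_start, x_end) nose positions for flights
    travelling left to right; `width` is the frame width. Nodes are
    fragment indices plus virtual start/end nodes; minimising total gap
    along a start-to-end path maximises the covered distance.
    """
    START, END = -1, len(fragments)

    def neighbours(u):
        x_end = 0 if u == START else fragments[u][1]
        for v, (xs, _) in enumerate(fragments):
            if xs >= x_end:
                yield v, xs - x_end           # uncovered gap before fragment v
        yield END, max(0, width - x_end)      # uncovered gap to the right edge

    dist, prev = {START: 0}, {}
    queue = [(0, START)]
    while queue:
        d, u = heapq.heappop(queue)
        if u == END:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in neighbours(u):
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(queue, (d + w, v))

    # Walk back from END to recover the chosen fragment order.
    path, node = [], END
    while prev.get(node, START) != START:
        node = prev[node]
        path.append(node)
    return list(reversed(path))
```

For example, with fragments `[(0, 30), (25, 60), (35, 80), (70, 100)]` in a 100-pixel-wide frame, the path picks fragments 0 and 2, leaving only a 5-pixel gap between them and 20 pixels at the right edge.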
Re-assemble into flying video
To make the final result last a bit longer, I used
ffmpeg to put the frames back together at only 10fps, and to loop the result back and forth a few times.
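The assembly might look something like the following; the filenames and loop count are assumptions, not the exact commands used. (Note that the `reverse` filter buffers the whole clip in memory, which is fine for a short sequence like this.)

```shell
# Stitch the chosen frames into a 10fps clip (frame filenames assumed).
ffmpeg -framerate 10 -i frame_%04d.png -c:v libx264 -pix_fmt yuv420p forward.mp4
# Make a reversed copy of the clip.
ffmpeg -i forward.mp4 -vf reverse backward.mp4
# Loop the forward/backward pair a few times via the concat demuxer.
printf "file 'forward.mp4'\nfile 'backward.mp4'\n" > pair.txt
ffmpeg -stream_loop 3 -f concat -i pair.txt -c copy flying.mp4
```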
The result is quite fun, and demonstrates that her idea was a good one!