Too Busy For Words - the PaulWay Blog

Recording video at LCA

A couple of people have asked me about the process of recording the talks at Linux Conference Australia, and it's worth publishing something about it so more people get a better idea of what goes on.

The basic process of recording each talk involves recording a video camera, a number of microphones, the video (and possibly audio) of the speaker's laptop, and possibly other video and audio sources. For keynotes we recorded three different cameras plus the speaker's laptop video. In 2013 in the Manning Clark theatres we were able to tie into ANU's own video projection system, which mixed together the audio from the speaker's lapel microphone, the wireless microphone and the lectern microphone, and the video from the speaker's laptop and the document scanner. Llewellyn Hall provided a mixed feed of the audio in the room.

Immediately the problems are: how do you digitise all these things, how do you get them together into one recording system, and how do you produce a final recording of all of these things together? The answer to this at present is DVswitch, a program which takes one or more audio and video feeds and acts as a live mixing console. The sources can be local to the machine or available on other machines on the network, and the DVswitch program itself acts as a source that can then be saved to disk or mixed elsewhere. DVswitch also allows some effects such as picture-in-picture and fades between sources. The aim is for the room editor to start the recording before the start of the talk and cut each recording after the talk finishes so that each file ends up containing an entire talk. It's always better to record too much and cut it out later rather than stop recording just before the applause or questions. The file path gives the room and time and date of recording.

The current system then feeds these final per-room recordings into a system called Veyepar. It uses the programme of the conference to match the time, date and room of each recording with the talk being given in the room at that time. A fairly simple editing system then allows multiple people to 'mark up' the video - choosing which recorded files form part of the talk, and optionally setting the start and/or end times of each segment (so that the video starts at the speaker's introduction, not at the minute of setup beforehand).

When ready, the talk is marked for encoding in Veyepar and a script then runs the necessar programs to assemble the talk title and credits and the files that form the entire video into one single entity and produce the desired output files. These are stored on the main server and uploaded via rsync to mirror.linux.org.au and are then mirrored or downloaded from there. Veyepar can also email the speakers, tweet the completion of video files, and do other things to announce their existence to the world.

There are a couple of hurdles in this process. Firstly, DVswitch only deals with raw DV files recorded via Firewire. These consume about a gigabyte per hour of video, per room - the whole of LCA's raw recorded video for a week comes to about 2.2 terabytes. These are recorded to the hard drive of the master machine in each room; from there they have to be rsync'ed to the main video server before any actual mark-up and processing in Veyepar can begin. It also means that previews must be generated of each raw file before it can be watched normally in Veyepar, a further slow-down to the process of speedily delivering raw video. We tried using a file sink on the main video server that talked to the master laptop's DVswitch program and saved its recordings directly onto the disk in real time, but despite having tested this process in November 2012 and it working perfectly, during the conference it tended to produce a new file each second or three even when the master laptop was recording single, hour-long files.

Most people these days are wary of "yak shaving" - starting a series of dependent side-tasks that become increasingly irrelevant to solving the main problem. We're also wary of spending a lot of time doing something by hand that can or should be automated. In any large endeavour it is important to strike a balance between these two behaviours - one must work out when to stop work and improve the system as a whole, and when to keep using the system as is because improving it would take too long or risk breaking things irrevocably. I fear in running the AV system at LCA I have tended toward the latter too much - partly because of the desire within the team (and myself) to make sure we got video from the conference at all, and partly because I sometimes prefer a known irritation to the unknown.

The other major hurdle is that Veyepar is not inherently set up for distributed processing. In order to have a second Veyepar machine processing video, one must duplicate the entire Veyepar environment (which is written in Django) and point both at the same database on the main server. Due to a variety of complications, this was not possible without stopping Veyepar and possibly having to rebuild its database from scratch, and I and the team lacked the experience with Veyepar to know how to easily set it up in this configuration. I didn't want to start to set up Veyepar on other machines and finding myself shaving a yak and looking for a piece of glass to mount a piece of 1000-grit wet and dry sandpaper on to sharpen the razor correctly.

Instead, I wrote a separate system that produced batch files in a 'todo' directory. A script running on each 'slave' encoding machine periodically checked this directory for new scripts; when it found one it would move it to a 'wip' directory, run it, and move it and its dependent file into a 'done' directory when finished. If the processes in the script failed it would be moved into a 'failed' directory and could be resumed manually without having to be regenerated. A separate script (already supplied in Veyepar and modified by me) periodically checked Veyepar for talks that were set to "encode", wrote their encode script and set them to "review". Thus, as each talk was marked up and saved as ready to encode, it would automatically be fed into the pipeline. If a slave saw multiple scripts it would try to execute them all, but would check that each script file existed before trying to execute it in case another encoding machine had got to it first.

That system took me about a week of gradual improvements to refine. It also took me giving a talk at the CLUG programming SIG on parallelising work (and the tricks thereof) to realise that instead of each machine trying to allocate work to itself in parallel, it was much more efficient to make each slave script do one thing at a time and then run multiple slave scripts on each encoder to get more parallel processing, thus avoiding the explicit communication of a single work queue per machine. It relies on NFS correctly handling the timing of a file move so that one slave script cannot execute the script another has already moved into work in progress, but that at this granularity of work is a very small time of overlap.

I admit that, really, I was unprepared for just how much could go wrong with the gear during the conference. I had actually prepared; I had used the same system to record a number of CLUG talks in months leading up to the conference; I'd used the system by myself at home; I'd set it up with others in the team and tested it out for a weekend; I've used similar recording equipment for many years. What I wasn't prepared for was that things that I'd previously tested and had found to work perfectly would break in unexpected ways:

In testing, the slave system capturing the output from the speaker's laptop would continue capturing when it was unplugged; during the conference, the capture process would die completely and had to be restarted manually.
Noticeable audio hum would come from a couple of rooms when the speaker left the audio cable unplugged.
The main video server's RAID system threw a hard disk on Wednesday and had to be taken offline to be backed up in full before I could start using it again.
The machines we used for recording Llewellyn Hall were newer than the kernel in the image we were using and it didn't recognise the firewire cards or the video display, and had to be upgraded.
The 60 minute DV tapes only recorded about 40 minutes of video.
The wireless microphone batteries would occasionally die, often producing lots of pops beforehand.
In a couple of recordings, the video just speeds up for no readily apparent reason and then goes back to normal.
Speakers laptops are a very mixed bunch and, even when the speaker is convinced that they've set their external output to the right resolution, can stop producing video or refuse to work with the input system.
It was difficult for the person in charge of the mixing in each room to be sure that what they were seeing on their screen, and what they were hearing, was actually what was being recorded to disk.
DVswitch does not handle partial failure well - when we started, the only way for a room to easily fix a system that wasn't working was to restart all the software and hope it worked that time. By the end of the conference, Jason Nicholls and Luke John in the AV team had written a system that handled this process much better.
The naming scheme of the files was roomname/date/time, which meant that it was possible to inadvertently copy the video from one room's master laptop into the directory for another room. This happened because the rsync process was being run manually for fear of overloading the network and/or the main server's hard disk transfer speed.
It turns out that using a souped-up games machine with motherboard RAID controller isn't very good at handling one disk failing.
Running all the VMs for running the conference including the video processing and the main router for the network on one souped-up games machine can be a problem if there are hardware issues with that machine, and is going to be a problem if one or more of those machines is going to run CPU-intensive tasks like video processing.
Top-end, brand new machines running a standard version of Ubuntu lock up hard and have to be manually reset on an intermittent basis. Some locked up when running ffmpeg or ffmpeg2theora, and some locked up literally as I was typing a command into them when they were otherwise completely idle. It's more difficult to reset these when they're sitting in an office in ANU and you aren't.

The other main problem that galls me is that there are inconsistencies in the recordings that I could have fixed if I'd been aware of them at the time. Some rooms are very loud, others quite soft. Some rooms cut the recording at the start of the applause, so I had to join the next segment of recording on and cut it early to include the applause that the speaker deserved. There were a few recordings that we missed entirely for reasons I don't know. I was busy trying to sort out all the problems with the main server and I was immensely proud of and thankful for the team of Matt Franklin, Tomas Miljenovic, Leon Wright, Euan De Koch, Luke John and Jason Nicholls who got there early, left late, worked tirelessly, and leapt - literally - up to fix a problem when it was reported. Even with a time machine some of those problems would never be fixed - I consider it both rude and amateur to interrupt a speaker to tell them that we them to start again due to some glitch in the recording process.

But the main lesson to me is that you can only practice setting it up, using it, packing it up and trying again with something different in order to find out all the problems and know how to avoid them. The 2014 team were there in the AV room and they'll know all of what we faced, but they may still find their own unique problems that arise as a result of their location and technology.

There's a lot of interest and effort being put in to improve what we have. Tim Ansell has started producing gstswitch, a Gstreamer-based program similar to DVswitch which can cope with modern, high-definition, compressed media. There's a lot of interest in the LCA 2014 team and in other people to produce a better video system that is better suited to distributed processing, distributed storage and cloud computing. I'm hoping to be involved in this process but my time is already split between many different priorities and I don't have the raw knowledge of the technologies to be able to easily lead or contribute greatly such a process. All I can do is to contribute my knowledge of how this particular LCA worked, and what I would improve.

Last updated: | path: tech / lca | permanent link to this entry

Too Busy For Words - the PaulWay Blog

Sat 16th Mar, 2013

Recording video at LCA