Tutorial: Stereoscopic 3D with Sony Vegas

3D is back as the next big thing (until holograms arrive, hah!) and many forces are heavily pushing for it on all fronts. Soon enough, we will be enjoying 3D without the need for annoying glasses too.

YouTube has supported 3D videos since July 2009. It offers various viewing styles to fit all kinds of tastes and… glasses. So, here’s how to shoot, edit and export 3D footage in stereoscopic mode (a mode that allows YouTube to offer more than one viewing style) with Sony Vegas Pro 9 or earlier, or Vegas Platinum 10 or earlier. Vegas Pro 10+ and Platinum 11+ have their own, different way of editing 3D.

The shoot

1. Buy this and this. Here’s a cheaper twin-head model if you’re short on money.

2. Place two identical cameras on the twin-head tripod. If not identical, they should at least be similar models (e.g. the HF10 and the HF100). Leave less than an inch/2cm of space between the two cameras when you are not zoomed in at all. If you zoom in, let’s say 3x, widen the gap to ~2-3 inches/4-6cm. Of course, both cameras need to be at exactly the same zoom level each time (and genlock the cameras if they have that feature).

3. Set up both cameras in exactly the same way: frame rate, resolution, zoom level, exposure compensation, shutter speed, etc.

4. Make sure the cameras are level with each other (you can enable the “Grey markers” feature on your camera to test if the horizon is tilted in one of the two cameras). Try to shoot an object in a non-static way, always making sure there’s some background visible, so we can fake “depth” in the image.

5. Press “Record” on both cameras (maybe even by using a remote control, if your cameras came with one). I suggest you record in plain 50i/60i because 3D requires more frames to look natural (although PF25/PF30/24p/25p are workable, PF24 can be very problematic depending on the pulldown removal algorithm used, so stick with the default frame rates).

6. Use the clapper board to clap. The sound it makes will be used later to line up the footage from the two cameras.
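If you would rather not hunt for the clap by eye, here is a minimal sketch of the idea in Python. It assumes you have already extracted each camera's audio as a plain list of samples (how you do that is up to you), and it simply treats the loudest sample as the clap. The function names are made up for this illustration; this is not something Vegas does for you.

```python
def clap_index(samples):
    """Return the index of the loudest sample, taken here to be the clap."""
    return max(range(len(samples)), key=lambda i: abs(samples[i]))


def clap_offset_frames(left_audio, right_audio, sample_rate, fps):
    """Frames by which the right clip's clap trails the left clip's clap.

    A positive result means the right clip should be slid earlier
    (to the left on the timeline) by that many frames.
    """
    delta_samples = clap_index(right_audio) - clap_index(left_audio)
    return round(delta_samples / sample_rate * fps)


# Toy data: the clap is 1600 samples later in the right clip,
# which is exactly one frame at 48 kHz audio and 30 fps video.
left = [0.0] * 1000 + [1.0] + [0.0] * 999
right = [0.0] * 2600 + [0.9] + [0.0] * 399
print(clap_offset_frames(left, right, sample_rate=48000, fps=30))
```

This only works if the clap really is the loudest thing in both recordings, so trim the audio to a second or two around the clap before trying it.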

The editing

1. Load Vegas, and set up a 1920×540 project if your cameras were full HD, or a 1280×360 project if your cameras were 720p (notice how the vertical resolution is half of 1080p/720p). Make sure that the rest of the project properties are correct (e.g. frame rate, field order, aspect ratio). Select “Best” for quality, and “interpolation” for the de-interlacing algorithm. Here’s how it would look if you shot in NTSC 1080/60i HD:

2. Place the two nearly identical clips from the two cameras in the timeline (one clip on the video track on top of the other clip). Zoom in on the timeline, and find the place where the clapper makes the clapping sound. Based on this, line up the two videos. Trim off the ends of the clips where they don’t overlap.

3. Load the “Track Motion” dialog for the video track on top. Click the “Lock Aspect Ratio” icon in its toolbar. Then, under the “Position” section, set the following: X:-480 Y:0 Width:1,920 Height:540. It should look like this:

4. Load the “Track Motion” dialog for the video track on the bottom. Do the same as above, but for X use the 480 value (instead of -480). Close it down. Now, you should have something that looks like this in the (ultra-wide) preview screen:

5. That’s it. Your video is now stereoscopic. Save the project.
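The X values in steps 3 and 4 are just a quarter of the project width, one negative and one positive. Here is a small sketch of that arithmetic. It rests on the assumption that a 16:9 clip gets fitted to the project height (so it fills half of the double-wide frame), which is consistent with the ±480 used above for a 1920×540 project. The function name is hypothetical, and the values for other project sizes are derived by the same reasoning rather than tested in Vegas.

```python
def side_by_side_offsets(project_width):
    """Return the Track Motion X offsets as (top track, bottom track).

    Assumption: each 16:9 clip sits centered at half the project width,
    so moving it a quarter of the width left or right fills one half.
    """
    quarter = project_width // 4
    return -quarter, quarter


for width in (1920, 1280, 3840):
    top_x, bottom_x = side_by_side_offsets(width)
    print(f"{width}px wide project: top X = {top_x}, bottom X = {bottom_x}")
```

For the 1920-wide project this gives -480 and 480, matching the values typed into the “Track Motion” dialog.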

The exporting

1. Export as described here, but with two modifications: first, skip the “project properties” setup in step 1 (we already did that above), and second, change the resolution. If you are exporting at 720p, export at 1280×360. If you are exporting at 1080p, export at 1920×540 and give it a bit more bitrate (e.g. 8-9 Mbps). Everything else is the same as in that exporting tutorial.

2. When the video is exported, upload it to YouTube. Make sure you add the following tags to your video, otherwise YouTube won’t apply the 3D menu options: HD, 3D, yt3d:enable=true, yt3d:aspect=16:9 (eventually it will be possible to tell YouTube your videos are 3D, so the 3D tags won’t be needed, but for now, use them).

3. After a while, YouTube will have converted your video to HD (it transcodes the low-resolution versions first and HD becomes available a few hours later). Put on your glasses, select the right YouTube 3D menu option on your video page (depending on what kind of glasses you have), and enjoy!
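To keep the export numbers and tags straight, here is a tiny sketch that restates them in code. The tag list is copied from step 2 and the half-height rule from step 1. The helper names are invented for this example, and joining the tags with commas is only an assumption about how you would paste them into the upload form.

```python
# Tags from step 2; YouTube uses them to enable its 3D menu.
YT3D_TAGS = ["HD", "3D", "yt3d:enable=true", "yt3d:aspect=16:9"]


def export_size(source_width, source_height):
    """Export frame for this tutorial: same width, half the height."""
    return source_width, source_height // 2


def tag_string(tags=YT3D_TAGS):
    """Join the tags into one comma-separated line for pasting."""
    return ", ".join(tags)


print(export_size(1920, 1080))  # frame size for a 1080p shoot
print(export_size(1280, 720))   # frame size for a 720p shoot
print(tag_string())
```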


1. The kind of export we did here is called stereoscopic (two clips next to each other in one wide frame). You can also export a hard-coded anaglyph red-cyan image directly, but that is the old way of doing things. YouTube can now dynamically adapt the stereoscopic image to various methods and viewing styles, so the stereoscopic method described in this article is the one you should choose.

2. You can edit & export at full HD rather than at just 540 pixels of height, but you will have to create a 3840×1080 project to do that (two 1920×1080 frames side by side). Unfortunately, only Sony Vegas Pro 9+ supports such high project resolutions.

3. Playing back 3D videos on YouTube takes roughly 2x the CPU speed. So an older PC that barely plays back an HD YouTube video smoothly won’t be able to play back a 3D version of that HD video smoothly.

4. If you don’t have two cameras, you can “fake” it by using the exact same clip twice, offsetting it by 4-5 frames in its video track compared to the other clip. This is how I did it on my 3D test here, since I don’t have two HV20s (although I am seriously thinking of getting an HV30 now to use it in 3D mode). This hack of course doesn’t produce realistic 3D, but it’s good enough to try things out and learn the workflow.
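For a feel of how big the 4-5 frame offset from the last tip is in time, here is a quick sketch of the conversion. The frame rates below are only common examples, not a recommendation, and the function name is made up.

```python
def offset_seconds(frames, fps):
    """Convert a timeline offset in frames to seconds."""
    return frames / fps


# NTSC (30000/1001 fps) and PAL (25 fps) as example frame rates.
for fps in (30000 / 1001, 25.0):
    for frames in (4, 5):
        ms = offset_seconds(frames, fps) * 1000
        print(f"{frames} frames at {fps:.2f} fps is about {ms:.0f} ms")
```

So the second copy of the clip lags the first by somewhere between an eighth and a fifth of a second.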


bobk wrote on July 23rd, 2009 at 6:36 AM PST:

Thanks for the write-up on the processing.

There are cheaper platforms for twin cameras, starting around $35. Here are two sources: one, two.

Instead of a clapper, I fire off a photo flash once both cameras are started, and it’s pretty easy to match frames from that.

This is the admin speaking...
Eugenia wrote on July 23rd, 2009 at 11:42 AM PST:

Thanks for the links. The one I link above is actually a better model, but this particular model should be able to do the job.

Regarding the photo-flash, it’s better to have both a visual and an audible cue to line up footage.

