The importance of accelerated drivers

Geeks.com sent over one of their popular computer parts, a video card: the nVidia GeForce 8800GT (512 MB DDR3, PCIe, PureVideo 2 HD support, OpenGL 2.0, DirectX 10, HDCP). This is a test of video playback performance with the Vista-default drivers (Vista 64-bit, SP2) versus nVidia’s accelerated drivers (latest stable, v190).

I tested a Canon 5D Mark II file, since it’s a heavy format: MOV h.264 High Profile, 40 Mbps, no audio. The file was played back full screen at 1:1 on a 1920×1080 display at 32-bit color, using various decoders and media players. Then the frame rate and CPU usage were measured. I used a video file with a lot of movement to visually figure out whether VLC was playing the file in real time or not (of the players I tried, it was the only one that didn’t have a way to show fps performance). The rest of the players had a way to get actual, concrete numbers.
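For those who want to reproduce a similar measurement from the command line, mplayer has a benchmark mode that prints CPU usage and dropped-frame statistics at the end of playback:

mplayer -benchmark -nosound myclip.mov  # add -vo null to time only the decoder

Results below: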

The CoreAVC Pro CUDA-accelerated version had, of course, the best result, with just 3% CPU utilization (the Vista default drivers had no CUDA support). Even with CUDA turned off, there was still a small speed-up with the newer, non-Vista drivers. The rest of the decoders also had it easier, with either a better frame rate or less CPU utilization. When they didn’t do better in terms of frame rate, it was mostly because of multi-threading issues, as these decoders are written in legacy-style code (JBQ and I still joke sometimes about how even today’s programmers can’t get multi-threading right). The only decoder of the ones I tested that was actually multi-threaded was CoreAVC’s. These guys rock.

Please note that I used a speed-up option for VLC to get real-time decoding with it. By default, VLC doesn’t do real time on the 5D files, not even on the quad-core 2.4 GHz DELL PC I used for the test.
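A commonly used option of this kind is skipping the H.264 in-loop deblocking filter, which trades a little image accuracy for decoding speed. The exact flag name varies between VLC versions, so treat this invocation as a sketch:

vlc --ffmpeg-skiploopfilter=4 myclip.mov  # 4 = skip the loop filter for all frames; newer VLC builds rename this flag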

The moral of the story is:
– Use graphics cards that have a fast memory bus. Since the 2D acceleration tapped by generic, non-PureVideo decoders is mostly bandwidth-bound, get cards that don’t cut costs by using slow memory or buses.
– Don’t leave your PC with the default XP/Vista/Win7 drivers. Upgrade to the latest stable version from your graphics card manufacturer’s web site.
– When possible, use CoreAVC Pro as your default decoder in media players/editors (unfortunately Vegas won’t support it, since Vegas doesn’t support DirectShow decoders, but Premiere might).
– Prefer nVidia over ATi. nVidia’s PureVideo architecture is better supported by decoders, be it CoreAVC or Adobe’s CS4.
– Don’t ever opt for an Intel integrated card, unless you are really short on money.

9 Comments

Luis wrote on July 27th, 2009 at 7:11 AM PST:

>- Don’t ever opt for an Intel integrated card, unless you are really short on money.

Regarding integrated graphics, are you sure these results would have been any different if you used an integrated nvidia 9400 card? Personally I don’t think so, but I haven’t tested it.

To get an nvidia 8800 card to play video seems like a crazy idea to me. Not only are those high-end cards very expensive, they also use a lot of power and produce a lot of heat and, therefore, noise. They are just designed for violent teenagers to play stupid games.

Regarding Intel, the problem is driver and decoder support. Hardware-wise, even the very underpowered GMA 500, which can hardly move windows around, can decode 1080p h.264 video on an Atom processor with low CPU usage, given the right drivers. And that platform (processor + graphics chip + I/O controller) has a combined TDP of 4.3 watts, while the card you used might alone be around 150 watts.

IOW, video decoding does not require lots of power. On the contrary, it just requires some very low-powered hardware with specific capabilities, plus working drivers and decoders that take advantage of them.

I agree that NVIDIA with updated drivers and the latest CoreAVC is a winning option right now. But I disagree with the high-end, non-integrated solution.


Ben wrote on July 27th, 2009 at 10:58 AM PST:

A few comments:
I bought an NVIDIA GeForce 9400, the cheapest model (something like ~40 euros in Europe). It has PureVideo 3, which means full hardware support for H.264.
Under Linux, I can compile mplayer (a recent version, like something from SVN) with VDPAU support. You need to install the proprietary NVIDIA drivers to make it work. Then it uses approximately 5-10% CPU on 1080p h.264 clips (I have a Core 2 Duo 1.6 GHz).
Use something like this:
mplayer -vc ffh264vdpau -vo vdpau my_file.ts
Let’s not talk about the cryptic command-line interface; that’s another topic 😀
Under Windows, there’s the DXVA architecture for using hardware decoders, but I think DirectShow is a mess and quite difficult to configure properly, so I didn’t try it under this OS. It’s supposed to work with Media Player Classic, though. It may be more difficult for VLC, as they have a custom architecture.

So what’s interesting is that today you can purchase a cheap video card that requires little power, is well supported under both Linux and Windows, and is able to decode just about any video in hardware (WMV/VC-1 is not fully done in hardware on NVIDIA, but enough of it is that real-time decoding is always achieved).
IMHO, software h.264 decoding speed is not relevant anymore; everybody does it better using hardware. ATI has full hardware support, under Windows at least (I’m not sure about Linux, due to drivers). And Intel will soon have full hardware support, using DXVA for Windows and VA-API under Linux (which is also implemented in mplayer).
As for Vegas, I believe (or wish) they’ll have hardware support in their decoders sooner rather than later, considering how important real-time decoding is during editing.


clei wrote on July 27th, 2009 at 2:02 PM PST:

>To get an nvidia 8800 card to play video seems like a crazy idea to me. Not only are those high-end cards very expensive, they also use a lot of power and produce a lot of heat and, therefore, noise. They are just designed for violent teenagers to play stupid games.

Nvidia cards are not only designed for stupid teenagers, but for people like Eugenia, who are too stupid to realize that the vast majority of Linux users and other non-gamers don’t take people like her and their pre-cooked benchmark tests seriously, and have not for years.

There’s absolutely *NOTHING* wrong with Intel video chipsets for playing video files.

The Toshiba A205-S500 laptop with an Intel Celeron 1.86 GHz processor and i965 graphics chipset that I bought for $350 US plays pretty much anything I throw at it under Totem, VLC, mplayer and GNOME MPlayer on Fedora 10 and 11.


This is the admin speaking...
Eugenia wrote on July 27th, 2009 at 2:29 PM PST:

>To get an nvidia 8800 card to play video seems like a crazy idea to me.

No, it is not. Faster memory bandwidth and transfer rates between the gfx chipset and the mainboard result in FASTER playback because, SIMPLY, the data arrive at their destination *faster*. It is NOT a matter of getting the card with the fastest 3D chip; it’s a matter of getting a card with a FAST BUS. Please understand the distinction.

I have seen this time and again when upgrading from ATi to nVidia cards with fast buses, let alone when it comes to much slower (in terms of bandwidth/bus speeds, not 3D) Intel cards. My kernel engineer husband even agrees with this, so it’s not just “people like Eugenia”.

So keep your bitter and INSULTING comment to yourself. You obviously have no freaking idea about video or common sense, and you have not tested this yourself. You are just talking out of your ass, just because Intel is the only gfx manufacturer that plays nice with Linux.

And who the fuck cares about Linux anyway? These results are about Windows. On Linux, the X architecture is the real limitation, so if you don’t see any advantage there, you know whom to blame. But on Windows, a faster gfx bus plus manufacturer drivers (as opposed to Microsoft’s generic drivers) means **faster playback**. And that’s a freaking FACT, observed time and again.

Now, if you upgrade from a generic 2D nvidia X driver to nVidia’s official drivers, you might also see a big speed boost, although I fear that X11 itself will limit how much gain you get in apps that use XVideo. On Windows there are no such limitations, so by upgrading to official drivers *and* using a fast-bus card, you can get up to 20% faster playback!

>and have not for years.

So, you are one of these Linux weenie haters of mine, from my OSNews days, huh? Explains your stupidity.


Luis wrote on July 27th, 2009 at 3:25 PM PST:

First, sorry if my own comment triggered the insulting ones.

>Faster memory bandwidth and transfer rates between the gfx chipset and mainboard result in FASTER playback.

I do not dispute that. I’m sure that if instead of a 1080p video we had a 5320p one, the Intel GMA 500 would fail and the nVidia 8800 GT might survive. But when watching videos we don’t need FASTER playback, we need FAST ENOUGH playback. You certainly don’t want to play a video shot at 30fps at a higher frame rate.

So sure, a 20-meter truck would take your kids to school all right. You just don’t need it. If you have 400 kids, then yes, but if you have 2, it is overkill.

If you can play a 1080p h.264 video on a 4.3-watt platform (and the test I linked to proves that, on the lowest-end Intel card and under Linux no less, using 2.2% CPU), you don’t need a 400-watt one.


This is the admin speaking...
Eugenia wrote on July 27th, 2009 at 3:31 PM PST:

This is only true if your current PC can already play back 1080p high-bitrate h.264 files in real time. My 2.4 GHz QuadCore would not always play back 5D files in real time with VLC (it really depended on the scene and the calculations it had to do). And if you own a much less powerful computer (as in my case in the past, a P4 3 GHz), getting up to a 20% playback speed increase is not a bad idea (that’s how much I got when I upgraded from ATi to a GeForce 8600GTS on my P4).

Besides, why overwork a PC? If a PC can barely do 30fps with a specific player (as I said, in my case the 2.4 GHz QuadCore with VLC), there will be scenes/formats that will eventually struggle (depending on the complexity of the h.264 encoding used, e.g. whether CABAC/High Profile was used). So there’s always a reason to get better performance without upgrading the whole PC.

The problem with your comment is that you ASSUME that people own PCs that can already do real-time h.264 playback without a problem. I don’t assume that at all. I am living proof that not all geeks have the latest and greatest PCs (my daily machine is my P4 3 GHz, a 5-year-old DELL PC, and my video *editing-only* station is the 2.4 GHz QuadCore PC). Therefore, I NEED every bit of juice I can squeeze out of them when it comes to playback. And I can tell you right now that most people are in the same boat: they don’t buy the latest and greatest every year. Therefore, if this article helps them get up to 20% faster h.264 playback, then it’s a win-win situation for them. It was for me.


Luis wrote on July 27th, 2009 at 6:24 PM PST:

I guess I’m not explaining myself correctly. I do not assume people own computers that can play back 1080p videos. That’s not my point. My point is that decoding a 1080p video does not require powerful hardware. What it does require is drivers and decoders that can use the hardware’s built-in decoding engine (and, obviously, hardware that has this engine built in). You don’t need the card with the fastest bus speed/bandwidth. Even the lowest-end GMA 500 has the power to decode 1080p h.264 video in real time, as long as it’s using the right software (drivers and decoders). An nVidia 9400 (integrated or discrete) is easily 30 times faster, so that’s all you’ll ever need for video decoding.

That is, I agree with your article on the software side, but I disagree on the hardware side. I’d rather see you recommend the lowest-end nVidia, since the results will be exactly the same at a fraction of the cost.


This is the admin speaking...
Eugenia wrote on July 27th, 2009 at 6:48 PM PST:

Oh, but I agree with you on that. 95% of the software decoders out there just SUCK. CoreAVC is making a name for itself because it employs hackers who are willing to spend night and day optimizing and optimizing… the reality is that most developers don’t do that.

And yes, a cheaper nvidia will do too, but NOT one with a slower bus. What matters here is bus speed and raw throughput, not the latest 3D chipset.


Matt wrote on July 28th, 2009 at 2:08 PM PST:

I guess I don’t follow why the video needs such a high amount of bandwidth; can you go through that?

It seems to me like a 1080p video would require less than 250MB/s of bandwidth, which I think is comfortably less than even a cheap Intel card would provide (correct me if I’m wrong).
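(As a rough sanity check, assuming the decoder pushes uncompressed 32-bit frames to the card: 1920 × 1080 pixels × 4 bytes × 30 fps ≈ 249 MB/s, so the estimate holds under that assumption.)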

Also, for those who don’t want to buy CoreAVC, ffmpeg does have a multi-threaded option, which shows speed-ups of around 2.5x on quad-cores.
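For example, an mplayer invocation would look something like this (assuming a build against a threading-capable ffmpeg, such as the ffmpeg-mt branch, since H.264 threading isn’t in mainline yet; option syntax may differ between builds):

mplayer -lavdopts threads=4 myclip.mov  # threads=4 to match a quad-core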

