ATI makes use of the DRI. Their OpenGL libraries thus work with the existing Mesa GL stuff -- they are just drivers containing the hardware-specific commands and optimizations that are executed by the Mesa GL API. The kernel module includes the kernel-component DRI and framebuffer bits that all DRI-based drivers need. In short, the secret sauce is really just the hardware-specific components and the internal accelerated GL implementation, plus some extras like power management. As a result, they are limited by the Xorg/kernel/DRI/Mesa interfaces, and cannot offer features or performance fixes that aren't also possible in the Open Source drivers (which are pushing the new interfaces like DRI2 to fix performance problems).
The NVIDIA drivers do not use Mesa, nor do they use DRI or XAA/EXA. Their libGL.so.1 completely replaces Mesa and talks directly to a proprietary interface exported by the kernel blob. The X driver blob likewise plugs in as an X protocol handler and talks over the proprietary interface (and NVIDIA's libGL) to the kernel. Their secret sauce is pretty much _everything_ in the stack above the X protocol and the GL API. Since they define all of their own internal protocols, they have never been limited by the design flaws of the Open Source stack and hence have been capable of getting better performance and more features (e.g., they had accelerated indirect rendering long before AIGLX or Xglx ever came into being). NVIDIA did what they did partly because it meant reusing more of their existing code instead of having to write Linux/UNIX-specific code for DRI/Mesa/XAA integration, and also because they knew that those interfaces kinda sucked compared to what they already had and would impose ugly limitations.
The reason these binary blobs break so often is that both the kernel and Xorg have internal APIs for basic things (memory management in the kernel) and domain-specific things (hooking into the X protocol handlers) which can and do change to incorporate performance improvements or bug fixes, and the binary blobs cannot be updated by the kernel/Xorg developers at the same time as they change the interface. You thus end up waiting for ATI/NVIDIA to release new blobs that incorporate the interface changes.
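To make the breakage concrete, here is a rough sketch of what an internal interface change looks like from the blob's side. It uses Python's `ctypes` only to model struct layout. `MemOpsV1`, `MemOpsV2` and the field names `pin`, `alloc` and `release` are made up for illustration and are not real kernel or Xorg structures.

```python
import ctypes

# Hypothetical internal interface as it looked when the binary blob was built.
class MemOpsV1(ctypes.Structure):
    _fields_ = [
        ("alloc", ctypes.c_void_p),    # slot for a function pointer
        ("release", ctypes.c_void_p),  # slot for a function pointer
    ]

# The same hypothetical interface after a later change: a new hook
# was inserted at the front, which shifts every slot after it.
class MemOpsV2(ctypes.Structure):
    _fields_ = [
        ("pin", ctypes.c_void_p),
        ("alloc", ctypes.c_void_p),
        ("release", ctypes.c_void_p),
    ]

def field_offsets(struct_type):
    """Map each field name to its byte offset inside the structure."""
    return {name: getattr(struct_type, name).offset
            for name, _ in struct_type._fields_}

def moved_fields(old, new):
    """Fields that code built against `old` would find at a different offset in `new`."""
    old_off = field_offsets(old)
    new_off = field_offsets(new)
    return sorted(name for name, off in old_off.items()
                  if new_off.get(name) != off)

def slot_at(struct_type, offset):
    """Name of the field that actually lives at `offset`, or None."""
    for name, off in field_offsets(struct_type).items():
        if off == offset:
            return name
    return None

if __name__ == "__main__":
    print("fields that moved:", moved_fields(MemOpsV1, MemOpsV2))
    stale = field_offsets(MemOpsV1)["alloc"]
    print("blob expects 'alloc' at offset", stale,
          "but the new layout has", slot_at(MemOpsV2, stale), "there")
```

Source-available drivers simply get recompiled against the new layout in the same commit that changes it. A blob compiled against the old layout keeps using the old offsets until the vendor ships a rebuild, which is the waiting period described above.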