The problems I have found come from AMD and Nvidia handling implicit conversions differently. AMD converts implicitly (which the spec allows), e.g. cos(2) works, while Nvidia does not, so there you have to do the conversion yourself.
Furthermore, AMD also converts vectors implicitly, e.g.
uint4 u; ...
float4 f; ...
I guess this is intended behaviour, though I would consider it a bug, since the spec specifically forbids implicit conversions between vector types.
Just these two examples result in code that works on AMD devices or Intel CPUs but not on Nvidia GPUs, and that is bad.
Admittedly, once one knows that Nvidia does no implicit conversion and that AMD ignores the spec in the example above, one can easily avoid these features. Still, it leaves a bad taste in the mouth and makes me wonder what else in the spec is unsupported or ignored by one implementation or another.
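
For what it's worth, here is a rough sketch of how I would write both cases so that they do not depend on either vendor's behaviour. The kernel name and its arguments are made up for illustration, and I have not run this on every implementation, so treat it as a sketch rather than a guarantee. It only uses cos with a float literal and the built-in convert_float4.

```c
// OpenCL C kernel sketch. The kernel name and the in/out buffers are
// hypothetical; only cos() and convert_float4() come from OpenCL itself.
__kernel void portable_conversions(__global const uint4 *in,
                                   __global float4 *out)
{
    const size_t i = get_global_id(0);

    // Scalar case: pass a float literal instead of relying on an
    // implicit int -> float conversion as in cos(2).
    const float c = cos(2.0f);

    // Vector case: implicit conversions between vector types are
    // disallowed by the spec, so "float4 f = u;" is not portable.
    // Use the explicit conversion built-in instead.
    const uint4 u = in[i];
    const float4 f = convert_float4(u);

    // Multiplying a vector by a scalar of the same element type is fine.
    out[i] = f * c;
}
```

Written this way the code should not care whether the compiler is lenient or strict about implicit conversions.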