On my Skylake notebook I get an improvement for matmul when the vectorizer is disabled (i.e. with -fno-tree-vectorize).
Clang 5 does not vectorize the benchmark....
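
For anyone who wants to try the same comparison, here is a minimal sketch of a kernel to build with and without the vectorizer. It is not the benchmark mentioned above: the function name `matmul`, the row-major layout, the i-k-j loop order and the `restrict` qualifiers are all my assumptions. Whether GCC actually vectorizes the inner loop depends on the compiler version and target flags, so it is worth checking the -fopt-info-vec output instead of assuming.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in kernel (NOT the benchmark discussed above).
 * Computes C = A * B for n x n row-major matrices of doubles.
 *
 * Possible builds to compare (the file name matmul.c is an assumption):
 *   gcc -O3 -march=native -fopt-info-vec -c matmul.c
 *   gcc -O3 -march=native -fno-tree-vectorize -c matmul.c
 */
void matmul(size_t n, const double *restrict a, const double *restrict b,
            double *restrict c)
{
    assert(a && b && c);

    for (size_t i = 0; i < n; i++) {
        /* Clear row i of C before accumulating into it. */
        for (size_t j = 0; j < n; j++)
            c[i * n + j] = 0.0;

        /* i-k-j order: the innermost loop walks rows of B and C with
           unit stride, which is the loop a vectorizer would target. */
        for (size_t k = 0; k < n; k++) {
            double aik = a[i * n + k];
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}
```

Timing both object files linked into the same driver, on the same machine, should show whether the slowdown with vectorization reproduces for a kernel of this shape.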