The loops are ALREADY vectorized. This is about vectorizing code that is NOT in loops.
The essential ideas behind this have been working for a while now. The main bottleneck has been a good cost model. The problem is that aggregating random, more-or-less uncorrelated operations into a vector operation requires a fair bit of marshaling to load the data into vectors, then to unpack it at the end. Frequently the cost of these marshaling operations is higher than the time saved by doing a whole bunch of adds or multiplies or whatever as a single vector op. Hence the need for an accurate cost model which is neither too optimistic about costs (so that you frequently vectorize when you shouldn't) nor too pessimistic (so you miss out on vectorizing when it would be a win).
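To make the trade-off concrete, here is a toy sketch of that kind of profitability check. It is not how any real compiler computes costs: the `Costs` numbers, the function names, and the per-lane pack/unpack accounting are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Costs:
    # Hypothetical unit costs, chosen only for illustration.
    scalar_op: int = 1  # one scalar add/multiply
    vector_op: int = 1  # one vector add/multiply covering all lanes
    pack: int = 1       # inserting one scalar into a vector lane
    unpack: int = 1     # extracting one lane back out to a scalar


def scalar_cost(n_ops: int, lanes: int, c: Costs) -> int:
    """Cost of leaving the code alone: every op is done once per lane."""
    return n_ops * lanes * c.scalar_op


def vector_cost(n_ops: int, lanes: int, packed_inputs: int,
                unpacked_outputs: int, c: Costs) -> int:
    """Cost after vectorizing: one vector op per original op, plus marshaling.

    packed_inputs / unpacked_outputs count whole vectors that must be
    assembled from, or broken back into, individual scalars.
    """
    marshaling = (packed_inputs * c.pack + unpacked_outputs * c.unpack) * lanes
    return n_ops * c.vector_op + marshaling


def should_vectorize(n_ops: int, lanes: int, packed_inputs: int,
                     unpacked_outputs: int, c: Costs) -> bool:
    """Vectorize only when the estimate says it is strictly cheaper."""
    return (vector_cost(n_ops, lanes, packed_inputs, unpacked_outputs, c)
            < scalar_cost(n_ops, lanes, c))


c = Costs()
# A single add across 4 lanes: marshaling swamps the saving.
print(should_vectorize(1, 4, packed_inputs=2, unpacked_outputs=1, c=c))
# A chain of 6 ops sharing the same packed inputs: the saving wins.
print(should_vectorize(6, 4, packed_inputs=2, unpacked_outputs=1, c=c))
```

With these made-up numbers the single add costs 4 as scalars versus 13 vectorized, while the six-op chain costs 24 versus 18. Set the `pack`/`unpack` estimates too low and the first case gets vectorized when it shouldn't; set them too high and the second case gets missed.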