IIRC, the framebuffer that drives your displays (the scanout buffer) must have a linear memory layout, because the display output reads it line by line, in address order. With any other layout the output would jump between scattered memory locations on every scanline and waste a lot of memory bandwidth.
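An offscreen buffer can use whatever memory layout the driver prefers, e.g. a swizzled (tiled) one. Rendering to a swizzled buffer is faster, because pixels that are close together on screen also end up close together in memory, which suits the GPU's caches.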
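To illustrate why a swizzled layout can help, here is a rough sketch that compares the memory footprint of a small pixel block in both layouts. It uses Morton (Z-order) interleaving as a stand-in for "swizzled"; that is an assumption for illustration only, since real drivers use vendor-specific tiling schemes. The names `linear_offset`, `morton_offset` and `footprint` are made up for this example.

```python
def linear_offset(x, y, width):
    """Row-major offset: what a scanout engine wants to walk through."""
    return y * width + x


def morton_offset(x, y, bits=16):
    """Z-order offset: interleave the bits of x (even positions) and y (odd).

    Illustrative stand-in for a swizzled layout, not any real GPU's format.
    """
    offset = 0
    for i in range(bits):
        offset |= ((x >> i) & 1) << (2 * i)
        offset |= ((y >> i) & 1) << (2 * i + 1)
    return offset


def footprint(offsets):
    """Distance in pixels between the lowest and highest offset touched."""
    offsets = list(offsets)
    return max(offsets) - min(offsets) + 1


# A 4x4 block of pixels at the origin of a 1024-pixel-wide surface.
block = [(x, y) for y in range(4) for x in range(4)]

linear_span = footprint(linear_offset(x, y, 1024) for x, y in block)
swizzled_span = footprint(morton_offset(x, y) for x, y in block)
```

In the linear layout the 16 pixels are spread over four separate rows, each a full surface width apart, while in the Morton layout they occupy 16 consecutive offsets. A small triangle or texture fetch therefore touches far fewer cache lines in the swizzled case.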
For applications that do a lot of rendering per frame, it may well be more efficient to do all of it in a swizzled buffer and then pay for a single additional copy into the linear buffer at the end.
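The final copy can be pictured roughly like this. Again this is only a sketch under stated assumptions: Morton order stands in for whatever swizzle the driver really uses, the surface is assumed to be square with a power-of-two size, and `morton_offset` and `deswizzle` are hypothetical names. In practice the driver or the GPU's copy engine does this step, not a Python loop.

```python
def morton_offset(x, y, bits=16):
    """Z-order offset: interleave the bits of x (even positions) and y (odd)."""
    offset = 0
    for i in range(bits):
        offset |= ((x >> i) & 1) << (2 * i)
        offset |= ((y >> i) & 1) << (2 * i + 1)
    return offset


def deswizzle(swizzled, size):
    """Copy a Morton-ordered size x size surface into a row-major one.

    Assumes size is a power of two, so every Morton offset is in range.
    """
    linear = [0] * (size * size)
    for y in range(size):
        for x in range(size):
            linear[y * size + x] = swizzled[morton_offset(x, y)]
    return linear


# "Render" into the swizzled offscreen buffer: each pixel stores its own
# row-major index, so a correct copy yields 0, 1, 2, ... in order.
size = 4
offscreen = [0] * (size * size)
for y in range(size):
    for x in range(size):
        offscreen[morton_offset(x, y)] = y * size + x

# One extra pass at the end of the frame produces the linear scanout buffer.
scanout = deswizzle(offscreen, size)
```

The point is that the cost of `deswizzle` is paid once per frame regardless of how many draw calls went into the offscreen buffer, so it gets cheaper relative to the rest of the frame the more rendering the application does.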