Besides, while the CPU cores are twice as large, they still account for a fairly paltry percentage of the whole chip. The GPU portion is several times larger than all CPU cores combined in any version.
More likely, they simply want the two versions to be drop-in compatible (save for software, of course), so they needed the same or a very similar physical footprint and the same external memory bus / memory controller. Each of the two Denver cores can issue up to 7 instructions per cycle, two of which can be loads/stores, while each of the four 32-bit cores can issue only 3 instructions, one of which can be a load/store. That puts the two designs pretty evenly matched in potential off-chip memory operations. It's likely that simply adding more Denver cores would leave them memory-starved without a significantly beefier memory interface.
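A quick back-of-envelope check of that load/store math, as a sketch that takes the per-core figures above at face value. It only counts peak per-cycle issue slots and ignores clock speed, cache hit rates, and sustained vs. peak behavior. The quad-Denver configuration is hypothetical, included only to show why more Denver cores would double the pressure on the memory interface.

```python
from dataclasses import dataclass

# Peak per-cycle figures as quoted in the comment (assumptions, not vendor-verified).
@dataclass(frozen=True)
class CpuCluster:
    name: str
    cores: int
    issue_width: int       # max instructions issued per core per cycle
    mem_ops_per_core: int  # max load/store instructions per core per cycle

    def peak_issue(self) -> int:
        return self.cores * self.issue_width

    def peak_mem_ops(self) -> int:
        return self.cores * self.mem_ops_per_core

denver = CpuCluster("2x Denver (64-bit)", cores=2, issue_width=7, mem_ops_per_core=2)
a15 = CpuCluster("4x Cortex-A15 (32-bit)", cores=4, issue_width=3, mem_ops_per_core=1)
# Hypothetical configuration, not a shipping part.
quad_denver = CpuCluster("4x Denver (hypothetical)", cores=4, issue_width=7, mem_ops_per_core=2)

for c in (denver, a15, quad_denver):
    print(f"{c.name}: {c.peak_issue()} instr/cycle, {c.peak_mem_ops()} load/stores/cycle")
```

Both shipping configurations come out at 4 load/stores per cycle, while the hypothetical quad-Denver doubles that to 8 on the same memory bus.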
There is a bit of strangeness in this architecture. It will be interesting to see how code executes before it hits the microcode cache.
Oh, and one reason Denver is so powerful is that its instruction decoder/scheduler is 7 operations wide, compared to Cortex-A15's 3-wide (and Apple's Cyclone at 6-wide). This is a really powerful core, at least as far as ARM processors go. It will be interesting to see how much electrical power it draws, though.