Nevertheless, the integration of multiple cores into an Intel multiprocessor was very inefficient before Nehalem: the cores competed for a shared bus, which prevented them from ever reaching their maximum aggregate throughput. The AMD multiprocessors, by contrast, had inherited the DEC Alpha structure, with separate memory links and peripheral interfaces and with an interconnection network between cores, as all CPUs use now.
However, at that time this was noticeable mostly in the server CPUs and much less in the consumer CPUs, as there were few multithreaded applications.
Core 2 still lagged behind AMD's cores in various less mainstream applications, such as computations with big integers.
Only two generations later, after Core 2 and Penryn, did Intel become able, with Nehalem (the first SKU at the end of 2008, but the important SKUs in 2009), to match or exceed AMD's cores in all applications.
On the other hand, AMD's 90 nm CMOS process was excellent.
With its 65 nm process, Intel recovered its technological leadership, but that was not the most important factor in its success, because AMD's 65 nm process was also OK and became available within a few months of Intel's process.
AMD lost because they executed the design of their new "Barcelona" generation of CPUs poorly (it was also made in 65 nm, like Core 2). While Intel managed to deliver Core 2 even earlier than its normal cadence for new CPU generations, AMD launched Barcelona only after several months of delays, and even then it was buggy. The bugs required microcode workarounds that made Barcelona slow in comparison with Core 2, and that started the decline of AMD, after a few years of huge superiority over Intel.
AMD was struggling to release CPUs that were competitive against year-old Intel Core 2 Duos, and that remained the status quo through their Bulldozer architecture. Things started turning around with Ryzen, when a combination of architecture improvements and typical workloads taking more advantage of multicore flipped the script.
The bits about "true" multicore are also sketchy, considering Bulldozer shared the L2, fetch/decode, and floating point hardware within each module and called a module two "cores" for marketing purposes.
https://www.anandtech.com/show/4955/the-bulldozer-review-amd...
So perhaps a bit more than a couple of years, but my impression is also that they fell behind on (single-thread) performance for a long time after that.
My understanding is also that, in more ancient history, AMD CPUs sometimes beat contemporary Intel parts in performance, although AMD released its parts later than Intel. I'm not sure that's relevant to any remotely recent developments anymore, though.