laurencerowe1d ago

i686 was the microarchitecture introduced with the Pentium Pro and then Pentium II.

chasil1d ago

If I am correct, the Pentium Pro was the first "out of order" design. It specialized in 32-bit code, and did not handle 16-bit code very well.

The original Pentium I believe introduced a second pipeline that required a compiler to optimize for it to achieve maximum performance.

AMD actually made successful CPUs based on Berkeley RISC, similar to SPARC (they used register windows). The AMD K5 had this RISC CPU at its core. AMD then bought NexGen and improved NexGen's RISC design for the K6 and then the Athlon.

twoodfin1d ago

Because of the branding change, history will remember the Pentium (P5). It was really the Pentium Pro (P6) that put Intel leaps ahead on x86 microarchitecture, a lead they’d hold with only a few minor stumbles for two decades.

Bob Colwell (mentioned elsewhere ITT) wrote a fascinating technical history of the P6: The Pentium Chronicles.

consp1d ago

The major stumble being having to cross-license with AMD for the x64 opcode design, thus ensuring at least two players in the field (and, given how it's going, only two).

squater1d ago

Small correction: the Pentium Pro was the first OoO microprocessor from Intel. Others, like the IBM POWER1, came earlier.

phire11h ago

I'm really not sure if POWER1 and PowerPC 603 should be counted as OoO or not.

It's certainly not the same kind of OoO. They had register renaming¹, but only enough storage for a few renamed registers, and they didn't have any kind of scheduler.

The lack of a scheduler meant execution units still executed all instructions in program order. The only way you could get out-of-order execution was when instructions went down different pipelines. A floating point instruction could finish execution before a previous integer instruction even started, but you could never execute two floating point instructions out of order, nor two memory instructions, nor two integer instructions.

The Pentium Pro, by contrast, had a full scheduler. Any instruction within the 40 μop reorder buffer could theoretically execute in any order, depending on when its dependencies were available.
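To make the difference concrete, here is a toy Python model, not a description of any real core. It assumes perfect register renaming, fully pipelined units, one issue per unit per cycle, and an unlimited window; the `Op` type, the unit names, and the latencies are all made up for illustration. With `reorder_within_unit=False` each unit only ever looks at its oldest waiting instruction (the POWER1-style case described above); with `True` it may pick any ready instruction (a very loose stand-in for a full scheduler).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    unit: str          # hypothetical execution unit name: "int", "fp", "mem"
    dest: str          # register written
    srcs: tuple = ()   # registers read
    latency: int = 1   # cycles until the result is available


def producers(program):
    """For each op, list the indices of the earlier ops that produce its sources."""
    last_writer = {}
    deps = []
    for i, op in enumerate(program):
        deps.append([last_writer[s] for s in op.srcs if s in last_writer])
        last_writer[op.dest] = i
    return deps


def cycles_to_finish(program, reorder_within_unit):
    deps = producers(program)
    done_at = [None] * len(program)  # None means "not started yet"
    cycle = 0
    while any(d is None for d in done_at):
        for unit in sorted({op.unit for op in program}):
            pending = [i for i, op in enumerate(program)
                       if op.unit == unit and done_at[i] is None]
            if not reorder_within_unit:
                pending = pending[:1]  # only the oldest op may issue
            for i in pending:
                ready = all(done_at[p] is not None and done_at[p] <= cycle
                            for p in deps[i])
                if ready:
                    done_at[i] = cycle + program[i].latency
                    break  # one issue per unit per cycle
        cycle += 1
    return max(done_at)


program = [
    Op("mem", "r1", (), latency=5),  # slow load
    Op("int", "r2", ("r1",)),        # blocked on the load
    Op("int", "r3"),                 # independent integer work
    Op("int", "r4", ("r3",)),        # depends only on r3
    Op("fp", "f1"),                  # independent floating point work
]

in_order_per_unit = cycles_to_finish(program, reorder_within_unit=False)  # 8
full_reordering = cycles_to_finish(program, reorder_within_unit=True)     # 6
```

In both modes the floating point op starts in cycle 0, ahead of the older integer ops, because it sits in a different pipeline. Only the second mode lets the two independent integer ops slip past the one stuck waiting on the load.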

Even on the later PowerPCs (like the 604) that could reorder instructions within an execution unit, the scheduling was still very limited. There was only a two-entry reservation station in front of each execution unit, and it would pick whichever entry was ready (and oldest). One entry could hold a blocked instruction for quite a while as many later instructions passed it through the second entry.

And this two-entry reservation station scheme didn't even seem to work. The later PowerPC 750 (aka G3) and 7400 (aka G4) went back to single-entry reservation stations on every execution unit except for the load-store units (which stuck with two entries).

It's not until the PowerPC 970 (aka G5) that we see a PowerPC design with substantial reordering capabilities.

¹ Well, on the PowerPC 603 only the FPU had register renaming, but the POWER1 and all later PowerPCs had integer register renaming.

p_l1d ago

It was Intel's (at least) second OoO processor, after the i960, from which it pulled important team members.

jabl1d ago

OoO is a surprisingly old idea, first used in the IBM System/360 Model 91 released all the way back in 1966.

https://en.wikipedia.org/wiki/Tomasulo's_algorithm

Took a while until transistor budgets allowed it to be implemented in consumer microprocessors.

chasil1d ago

Very true, Bob Colwell was hired with past experience in this, I think from "Cyndrome" (edit: Multiflow).

https://news.ycombinator.com/item?id=38459128

dspillett1d ago

> The original Pentium I believe introduced a second pipeline that required a compiler to optimize for it to achieve maximum performance.

It wasn't a full pipeline, but large parts of the integer ALU and related circuitry were duplicated so that complex (time-consuming) instructions like multiply could directly follow each other without causing a pipeline bubble. Things were still essentially executed entirely in-order but the second MUL (or similar) could start before the first was complete, if it didn't depend upon the result of the first, and the Pentium line had a deeper pipeline than previous Intel chips to take most advantage of this.

The compiler optimisations, and similar manual code changes where the compiler wasn't bright enough, were to reduce the occurrence of instructions depending on the results of the instructions before, which would make the pipeline bubble come back as the subsequent instructions couldn't be started until the current one was complete. This was also a time when branch prediction became a major concern, and further compiler optimisations (and manual coding tricks) were used to help here too, because aborting a deep pipeline because of a branch (or just stalling the pipeline at the conditional branch point until the decision is made) carries quite a performance cost.

Tuna-Fish16h ago

The Pentium was not just pipelined but also superscalar; it had two pipelines (U and V). U implemented all instructions, V only implemented a subset of simpler ones, and only when using simple (prefix-less) encodings.

As the CPU was not out of order, to execute two instructions per clock you had to pair them so that the second one was simple and did not use the output of the first one. Existing code and most compilers around at the time were generally bad at this, but things like inner render loops in games could make a lot of use of it if you wrote them in assembly.
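A rough Python sketch of why instruction order mattered, modelling only the two conditions stated above (the second instruction is simple and does not read the first one's result). The real pairing rules had more cases than this; the `Instr` type, the `simple` flag, and the register names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str            # register written
    srcs: tuple          # registers read
    simple: bool = True  # hypothetical flag: eligible for the V pipe


def can_pair(first, second):
    """Toy rule: second must be simple and must not read first's result."""
    return second.simple and first.dest not in second.srcs


def count_cycles(program):
    """In-order issue: each cycle takes one instruction, or two if they pair."""
    cycles = 0
    i = 0
    while i < len(program):
        if i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            i += 2  # U and V pipes both used this cycle
        else:
            i += 1  # V pipe sits idle
        cycles += 1
    return cycles


a1 = Instr("a", ("a", "b"))  # a = a + b
a2 = Instr("c", ("a",))      # c = f(a), needs a1's result
b1 = Instr("d", ("d", "e"))  # d = d + e
b2 = Instr("f", ("d",))      # f = g(d), needs b1's result

chained = [a1, a2, b1, b2]      # dependent instructions back to back
interleaved = [a1, b1, a2, b2]  # same work, reordered by hand

chained_cycles = count_cycles(chained)          # 3
interleaved_cycles = count_cycles(interleaved)  # 2
```

The work is identical in both orderings. Only the hand-interleaved one keeps the second pipe busy every cycle, which is the kind of scheduling an assembly programmer would do in a hot loop.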

NetMageSCW23h ago

That reminds me of using the pencil method on an Athlon to overclock.

antod17h ago

Didn't the Celeron 333 (easily overclockable to 450) also have a similar pencil short hack to enable SMP in a dual slot mb?
