So what's really going on here is that the emulator must emulate not only the SNES hardware, but also the television. Video game emulators have had to deal with this for a long time, to varying and increasing levels of accuracy. Televisions (especially analog CRTs) have quite a bit of emergent behavior in processing the display input that is not easily captured and replicated by your typical frame buffer. Interlacing is a major such phenomenon; most emulators still simply treat the 60 fields per second as 60 distinct frames rather than interlacing them. (And younger players are used to seeing the games that way, never having played on original console and TV hardware.)
The ultimate example of this effect occurs in emulating games that originally used a vector CRT. An emulator writing to a raster frame buffer simply can't replicate the bright, sharp display of a real Asteroids or Star Castle or Battlezone machine.
TV behavior even goes beyond electronics. Consider the characteristics of the phosphor coating and the persistence time between refreshes. Some games made use of effects where that characteristic mattered, so if you want to emulate that with high fidelity, then yes, that will take a lot of CPU cycles.
If the console allowed sneaky things to be done on each raster line (like changing the colours) then constructing that frame buffer becomes considerably more resource intensive, as it must now probably be done line by line with the correct timing wrt. the rest of the emulation.
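To make the line-by-line point concrete, here is a minimal sketch in Python. Everything in it (render_frame, hblank_hook, the tiny 8x4 "screen", the sky_gradient routine) is made up for illustration and does not model any particular console; it only shows why the frame can no longer be built in one pass once the program may touch the palette between lines.

```python
WIDTH, HEIGHT = 8, 4  # toy screen size, not a real console resolution

def render_frame(tilemap, palette, hblank_hook):
    """Build a frame one scanline at a time, letting emulated code run between lines."""
    frame = []
    for y in range(HEIGHT):
        # Resolve this line's colour indices against the palette as it is right now.
        frame.append([palette[index] for index in tilemap[y]])
        # The emulated program runs before the next line starts and may alter the palette.
        hblank_hook(y, palette)
    return frame

def sky_gradient(y, palette):
    # Hypothetical "game code": darken background colour 0 a little on every line.
    r, g, b = palette[0]
    palette[0] = (r, g, max(0, b - 64))

tilemap = [[0] * WIDTH for _ in range(HEIGHT)]
palette = {0: (0, 0, 255)}
frame = render_frame(tilemap, palette, sky_gradient)
for row in frame:
    print(row[0])
```

Rendering the whole frame after the emulated code had finished would paint every line with the final palette value, so the gradient would be lost.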
If you could pull tricks mid-scanline (presumably through careful timing after an interrupt) then the problem will be a whole lot worse. I'd guess it can be reduced by recording changes to the relevant hardware registers, along with timestamps, in the emulation, so that the timing of your scanlines' construction becomes less of an issue.
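A rough sketch of that timestamp idea, in Python. The names (RegisterWrite, render_line) and the single background-colour register are assumptions for illustration only; a real video chip has many registers and far messier timing.

```python
from dataclasses import dataclass

LINE_WIDTH = 16  # pixels per scanline in this toy model

@dataclass
class RegisterWrite:
    x: int      # horizontal position (the timestamp) at which the write landed
    value: int  # new value of the hypothetical background-colour register

def render_line(initial_colour, writes):
    """Draw one scanline left to right, replaying logged writes at their timestamps."""
    line = []
    colour = initial_colour
    pending = sorted(writes, key=lambda w: w.x)
    i = 0
    for x in range(LINE_WIDTH):
        # Apply every write whose timestamp has been reached by the beam.
        while i < len(pending) and pending[i].x <= x:
            colour = pending[i].value
            i += 1
        line.append(colour)
    # Return the final colour too, so the next line can start from it.
    return line, colour

# The CPU side only appends to the log; the line is built later, all at once.
log = [RegisterWrite(x=4, value=7), RegisterWrite(x=12, value=2)]
line, final_colour = render_line(0, log)
print(line)
```

The point of the design is that the CPU emulation never has to stop and wait for the renderer; it only has to record when each write happened.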
I'll give you another example. On the Atari 2600 game console, the vertical sync is software controlled. The software is responsible for enabling the vertical sync pulse. This can be done 60 times per second as standard -- or you could play tricks with it. Suppose you strobe it at a different or even irregular rate. On an analog TV, the picture starts rolling vertically. That breaks way outside the sandbox of a framebuffer, with signal being displayed in overscan areas and during the normally-blank retrace interval, resulting in ghosting effects. (No commercial game did that, but it's been done in tech demos, and conceivably a horror game could do it intentionally for mood.) To produce that same behavior on framebuffer-based hardware, you need to emulate or at least approximate the workings of a TV's vertical sync logic, none of which appears in the console itself.
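As a deliberately crude illustration, here is a toy model in Python of a TV deciding where each incoming scanline lands. The names display_rows and FREE_RUN_LINES are invented, and the assumption that the set simply retraces on its own after a fixed number of lines is a big simplification: a real TV's vertical oscillator is pulled gradually, which is what produces the roll.

```python
FREE_RUN_LINES = 262  # toy value: lines after which the set retraces without a pulse

def display_rows(vsync_lines, total_lines):
    """Return the screen row on which each emitted scanline lands.

    vsync_lines: scanline numbers at which the console strobes vertical sync.
    """
    rows = []
    beam = 0
    for n in range(total_lines):
        if n in vsync_lines:
            beam = 0  # the sync pulse pulls the beam back to the top
        rows.append(beam)
        beam += 1
        if beam >= FREE_RUN_LINES:
            beam = 0  # no pulse arrived in time: the TV retraces by itself
    return rows

# Well-behaved software: one pulse every 262 lines.
steady = display_rows({0, 262}, 524)
# Misbehaving software: the second pulse arrives 38 lines late.
late = display_rows({0, 300}, 400)
print(steady[262], late[262], late[299], late[300])
```

In the late case, lines 262 to 299 get drawn at the top of the screen and then the real pulse restarts the picture on top of them, which no fixed-size framebuffer indexed by the console's own line counter would reproduce.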
(I know this from experience, I wrote an Atari 2600 game: http://www.dos486.com/atari/ )
This would be possible in most cases, but the SNES throws another problem at you: the video renderer can set flags that can affect the operation of the CPU. Range/tile over sprite flags, H/Vblank signals, etc.
In my model, I chose to forgo timestamps because they are very tricky to get right with subtle details. Instead, I render one pixel at a time, but I use a cooperative threading model. Whenever the CPU reads something from the PPU, it checks whether the PPU has caught up with the CPU. If not, it will switch and run the PPU until it has. The PPU does the same with respect to the CPU.
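For anyone who wants to see the shape of that, here is a small sketch using Python generators as stand-ins for cooperative threads. It is not bsnes code: the class names, the per-chip clock fields and the switches counter are assumptions for illustration, and only the CPU-to-PPU direction is shown.

```python
class PPU:
    def __init__(self):
        self.clock = 0
        self.pixel = 0  # pixels drawn so far; stands in for all PPU state

    def run(self):
        """Cooperative thread body: draw one pixel per step, then yield control."""
        while True:
            self.pixel += 1
            self.clock += 1
            yield

class CPU:
    def __init__(self, ppu):
        self.clock = 0
        self.ppu = ppu
        self.ppu_thread = ppu.run()
        self.switches = 0  # how many times we actually had to switch threads

    def sync_ppu(self):
        # Switch to the PPU only if it is behind the CPU's clock.
        if self.ppu.clock < self.clock:
            self.switches += 1
            while self.ppu.clock < self.clock:
                next(self.ppu_thread)

    def step(self, cycles, reads_ppu=False):
        """Run one instruction; synchronize only if it touches the PPU."""
        self.clock += cycles
        if reads_ppu:
            self.sync_ppu()
            return self.ppu.pixel
        return None

ppu = PPU()
cpu = CPU(ppu)
cpu.step(6)                         # no PPU access, so the PPU is left alone
cpu.step(6)
seen = cpu.step(4, reads_ppu=True)  # the PPU is run up to clock 16 before the read
print(seen, cpu.switches)
```

The saving comes from the first two instructions: the chips only meet when one of them needs to observe the other, rather than after every cycle.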
Even with that, all the extra overhead of being -able- to process one pixel at a time knocks the framerate from ~240fps to ~100fps. And it fixes maybe a half-dozen minor issues in games for all that overhead.
This is because scanline-based renderers are notoriously good at working around these issues. There are lots of games that do a mid-scanline write in error, but only a few that do more than one on the same scanline. So all you have to do is make sure you render your line on the correct 'side' of the write. We actually took every game we could find with this issue, and averaged out the best possible position within a line to run the highest number of games correctly. Other emulators take that further and can make changes to that timing on a per-game basis to fix even more issues.
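A toy version of that search, in Python. The game list, the write positions and the 100-unit line length are all invented for illustration; the real survey was done against actual games. The model assumes each game makes a single stray write per line, and that the write was meant either for the line in progress or for the next one.

```python
LINE_LENGTH = 100  # toy number of time steps per scanline

# (name, position of the stray write, True if it was meant for the line in progress)
GAMES = [
    ("game_a", 10, True),
    ("game_b", 22, True),
    ("game_c", 60, False),
    ("game_d", 85, False),
    ("game_e", 15, False),
]

def renders_correctly(render_at, write_at, meant_for_this_line):
    # A scanline renderer draws the whole line at one instant, so a write is
    # visible on this line only if it happened before that instant.
    seen_by_this_line = write_at < render_at
    return seen_by_this_line == meant_for_this_line

def score(render_at, games):
    return sum(renders_correctly(render_at, w, m) for _, w, m in games)

def best_render_point(games):
    # Try every possible render instant and keep the earliest one with the top score.
    return max(range(LINE_LENGTH + 1), key=lambda p: score(p, games))

best = best_render_point(GAMES)
fixed = [name for name, w, m in GAMES if renders_correctly(best, w, m)]
print(best, fixed)
```

With this made-up data no single position satisfies all five games, which is where per-game timing overrides come in.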
Air Strike Patrol's shadow is actually the only known effect where two writes occur on the same line, and there is no one point that would render the line correctly.
In fact, every Atari 2600 game is a copper effect. The 2600's graphics chip is one-dimensional, working with only one scanline at a time. To display a picture, the software must run in lockstep as the electron beam traces down the screen, changing sprite bitmaps and colors and positions each scanline as appropriate. In other words, the 2600 literally uses the phosphor on the physical TV screen as the frame buffer. No surprise that this was tricky to emulate, or that 2600 emulators took longer to reach usable compatibility levels than emulators for the later, more powerful Nintendo systems.
However there was another very useful trick to changing colours wrt sync. Basically you wanted to have all the gfx drawing done before the vertical retrace (which is quite a bit longer than the horizontal one), then flip the buffer (during) so you'd get a flickerless display at full framerate. Now if you'd change palette colour 0 (background, including screen edges) to red right after the flip, and then back to black again after your drawing routines are done and you begin waiting for the vsync again, you got to see the top of your screen's background red, up until some percentage of the screen height.
This was basically your performance meter. Code more complex routines and the red area becomes bigger. Add even more calculations, it gets to the bottom of the screen, and when it gets too far you won't be done calculating before the next vsync and your framerate drops to half.
Sometimes I even micro-optimized bits of assembly code by marking the position with pencil on a post-it on the side of the monitor to see if switching around some instructions would make it go up or down a few millimeters :) It really was that stable (given you did exactly the same calculations every frame--which is often the case for demos, but probably not for games). That is, until Windows came along: multitasking meant you were going to miss that vsync every once in a while, and the red bar would jump up and down like crazy.
Performance is far easier to achieve there.
The devices aren't exactly expensive either.
Multi-core seems promising, but unfortunately even 4-8 threads aren't going to cut it here. Each emulated chip can have several logic units (e.g. a four-stage pipeline, an ALU, a DMA unit, etc.). And even then, CPUs aren't meant for this level of synchronization. You can only lock and unlock a mutex between two threads about 100,000 times a second. And even if that were faster, which is going to be more of a burden: requiring a 3GHz single-core CPU, or a 1GHz octa-core CPU?
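A back-of-the-envelope check of that mutex figure, in Python. The 100,000 hand-offs per second comes from the comment above; the 256x224 visible area at 60 frames per second, and the idea that every pixel step is a potential synchronization point, are assumptions made here to get a rough number. A real emulator would not need to synchronize on every pixel, so treat this as a worst case.

```python
MUTEX_HANDOFFS_PER_SECOND = 100_000  # figure quoted in the comment above

# Assumed workload: one potential sync point per visible pixel.
WIDTH, HEIGHT, FPS = 256, 224, 60

pixel_steps_per_second = WIDTH * HEIGHT * FPS  # 3,440,640
shortfall = pixel_steps_per_second / MUTEX_HANDOFFS_PER_SECOND

print(f"pixel steps per second: {pixel_steps_per_second:,}")
print(f"hand-offs would need to be about {shortfall:.1f}x faster")
```

Even counting only visible pixels and only two chips, the hand-off rate is more than an order of magnitude too low under these assumptions.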
FPGAs are great for writing emulators (although I wouldn't say as easy), but the problem with this is even worse than the octa core CPU. Until more people have the hardware than a 3GHz single core CPU, it will continue to be a worse solution for the number of people your software can reach.
It's great to see you on here. I've read a lot of the writeups on your site and they're very, very fascinating stuff. BSNES/Accuracy has become my favorite emulator as well, when I can spare the clock cycles.
So thanks for being awesome, and doubly thanks for your attention to detail when nobody else seems to think it's important.
Plus, many cartridges have supplementary chips, so the FPGA would also have to include all the different chips used, and all of these are proprietary as well.
See http://www.fpgaarcade.com/ for many examples of this.
It's no longer the 90s and people shouldn't even have to mention NESticle existed in an article unless they're that out of touch with trends of the last decade.
I never intended to convey that this is a new idea, sorry. I do believe my cooperative threading model is a new concept in emulation, but it's still a ridiculously old one in computer science.
> The NES scene in particular ...
... is not as rosy as it seems. Having recently written an NES emulator, I can tell you that they're far from completion. For just one example, all of those mapper chips are basically a big unknown. Those chips have ways of detecting scanline edges to simulate IRQs and split-screen effects. This is done by monitoring the bus for certain patterns from the cart-side. And the details of this stuff? Completely unknown. Neither Nestopia nor Nintendulator attempts to simulate this: they just have the PPU -tell- the mapper when a scanline edge is hit. I could be wrong, but I believe I'm the first to even attempt to have the mapper detect scanlines by monitoring the bus.
And we're talking chips that are dozens of times less complex in the worst case than some of the SNES coprocessors.
> It's no longer the 90s and people shouldn't even have to mention NESticle existed in an article unless they're that out of touch with trends of the last decade.
The important part of that article is that the SNES was (and largely still is) in the NESticle phase of development, and moving past that was the purpose of bsnes. Unfortunately, just as going from NESticle (25-50MHz) to Nestopia (800MHz required) took a huge jump, so does going from ZSNES (200MHz) to bsnes (2-3GHz). But this time, that jump is hitting the wall of where most computer users are at. People didn't notice Nestopia's requirements because everyone has at least a 1GHz processor these days; bsnes was not so fortunate.
So the article was more about explaining what that level of overhead is required for.
This is also awesome, posted a while back: http://byuu.org/articles/emulation/snes-coprocessors
The Dolphin Wii emulator isn't perfect - it has the obscure bugs mentioned in the article - but unlike SNES emulators, it doesn't have a lot of game-specific hacks.
Though DLC is locked up as future consoles' content is likely to be.
Yes, this is probably a recipe for disaster, and I have little idea what mechanism could be used to ensure time accuracy, but just a thought. (Perhaps an RTOS?) I also wonder what would be possible with FPGAs, whether programmable logic might provide a better approach to emulating these chips in synchrony.
It doesn't take a 3000 MHz machine to emulate a Ricoh 5A22.
Look at how Nintendo has the Virtual Console working on a 730 MHz PowerPC: the Wii.
The Wii isn't going to be around forever: those will start to fail in a number of years. In addition, not every single NES game is available on the Wii virtual console, anyway.