Adventures of Lolo 2, Ms. Pac-Man (Tengen), and Spelunker rely on the 1-cycle NMI delay after $2002 bit 7 is set in vblank (if $2002 has not been read yet), a window in which $2002 bit 7 can still be read as true.
Balloon Fight relies on reading the nametables through $2007 to twinkle the stars in the background. (The code is at $D603.)
Bases Loaded II glitches after a pitch is thrown if writing $00 then $80 to $2000 during vertical blank does not cause an additional NMI.
Battletoads requires fairly precise CPU and PPU timing and a fairly robust sprite zero implementation. It leaves rendering disabled for a number of scanlines into the visible frame to gain extra VRAM manipulation time and then enables it. If the timing is off so that the background image appears too high or too low at this point, a sprite zero hit will fail to trigger, hanging the game. This usually occurs immediately upon entering the first stage if the timing is off by enough, and might cause random hangs at other points otherwise.
Bee 52 needs accurate DMC timing and also relies on $2002 bit 5 (sprite overflow).
Bill & Ted's Excellent Adventure and a few other MMC1 games depend on the mapper ignoring successive writes; see iNES Mapper 001 (the talk page for that page might be informative too). Bill & Ted… also turns off and re-enables rendering midframe to switch CHR banks (e.g. in the black border above dialog boxes).
Cobra Triangle and Ironsword rely on the dummy read for the sta $4000,X instruction to acknowledge pending APU IRQs.
Crystalis, Fantastic Adventures of Dizzy, Fire Hawk, and Super Off Road do mid-frame palette changes.
Fire Hawk, Mig 29 Soviet Fighter, and Time Lord need accurate DMC timing because they abuse the APU DMC IRQ to split the screen.
Galaxian requires proper handling of bit 4 of the P register for /IRQ.
Huge Insect depends on obscure OAMADDR ($2003) behavior; see PPU registers.
Marble Madness switches CHR banks mid-scanline to draw text boxes (e.g. at the beginning of each level). Getting these to render correctly requires fairly precise timing.
Micro Machines requires correct values when reading PPU $2004 (OAMDATA) during rendering, and also relies on proper background color selection when rendering is disabled and the VRAM address points to the palette (see the "background palette hack" on the PPU palettes page).
Paperboy relies on the open bus behavior of controller reads and expects them to return exactly 0x40 or 0x41; see Standard controller.
Punch-Out!! requires fetching the 34th tile; otherwise, the ring will be glitched.
Puzznic and Reflect World (FDS) use unofficial opcode $89, which is a two-byte NOP on the 6502 and BIT #imm on the 65C02. (Puzznic tasvideos discussion) The instruction in Puzznic is 89 00; emulating $89 as a single-byte NOP causes the operand byte $00 to be executed as a BRK, which makes the screen shake.
Slalom does a JSR while the stack pointer is 0, so that half of the return address ends up at $0100 and the other half at $01FF.
Super Mario Bros. is probably the hardest game to emulate among the most popular NROM games, which are generally the first targets against which an emulator author tests his or her work. It relies on JMP indirect, correct palette mirroring (otherwise the sky will be black; see PPU palettes), sprite 0 detection (otherwise the game will freeze on the title screen), the 1-byte delay when reading from CHR ROM through $2007 (see The PPUDATA read buffer), and proper behavior of the nametable selection bits of $2000 and $2006.[1] In addition, there are several bad dumps floating around, some of which were ripped from pirate multicarts whose cheat menus leave several key parameters in RAM.
Super Mario Bros. 3 relies on an interaction between the sprite priority bit and the OAM index to put power-ups behind blocks.
The Young Indiana Jones Chronicles accesses PPUDATA ($2007) during rendering to perform a glitchy Y scroll (to make the screen shake when cannonballs hit the ground). See the notes on accessing $2007 during rendering on the skinny on NES scrolling page.
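Two of the Super Mario Bros. requirements listed above, the delayed $2007 read and palette mirroring, can be modeled in isolation. The following Python sketch is only an illustration: the class name PPUBus and all of its fields are hypothetical and not taken from any emulator. It assumes a flat 16 KiB address space (cartridge-dependent nametable mirroring and CHR ROM write protection are omitted), an address increment of 1 ($2000 bit 2 clear), and accesses made outside rendering.

```python
class PPUBus:
    """Hypothetical, simplified model of PPU memory as seen through $2007."""

    def __init__(self):
        self.vram = bytearray(0x4000)  # flat space; real nametable mirroring omitted
        self.palette = bytearray(0x20)
        self.addr = 0                  # current VRAM address (set via $2006 on hardware)
        self.read_buffer = 0           # internal PPUDATA read buffer
        self.increment = 1             # assumption: $2000 bit 2 is clear

    @staticmethod
    def palette_index(addr):
        """Map $3F00-$3FFF to a palette RAM index, applying the mirrors."""
        i = addr & 0x1F
        # $3F10/$3F14/$3F18/$3F1C mirror $3F00/$3F04/$3F08/$3F0C
        if (i & 0x13) == 0x10:
            i &= 0x0F
        return i

    def write_data(self, value):
        """Model a CPU write to $2007."""
        a = self.addr & 0x3FFF
        if a >= 0x3F00:
            self.palette[self.palette_index(a)] = value & 0x3F
        else:
            self.vram[a] = value & 0xFF
        self.addr = (self.addr + self.increment) & 0x7FFF

    def read_data(self):
        """Model a CPU read from $2007."""
        a = self.addr & 0x3FFF
        if a >= 0x3F00:
            # Palette reads are not delayed; the buffer is refilled with
            # the nametable byte "underneath" the palette address.
            value = self.palette[self.palette_index(a)]
            self.read_buffer = self.vram[a - 0x1000]
        else:
            # Everything else returns the stale buffer, then refills it.
            value = self.read_buffer
            self.read_buffer = self.vram[a]
        self.addr = (self.addr + self.increment) & 0x7FFF
        return value
```

With this model, the first read after pointing the address at pattern or nametable data returns whatever was left in the buffer, and each later read returns the byte from the previous address. A write to $3F10 is visible when reading $3F00, which is the mirroring that keeps the sky color from turning black.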