Your thoughts are similar to what I have been pondering...
The rewrite limit: the 10k rewrites are unlikely to be hit within a reasonable time. If you played 3 different games each day, it would take somewhere between 4 and 10 years to hit the guaranteed minimum number of rewrites. Since I think this is a highly unlikely scenario, I came to the conclusion that it doesn't matter much (to me). Even if a game consisted of multiple programs, it would still take plenty of play time to trash the chip. (If it were that much fun to play, paying $15 once per year would be sort of OK, I'd think.)
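To make that estimate a bit more concrete, here is a small back-of-the-envelope helper. The function name and the `rewritesPerSwitch` parameter are my own assumptions for illustration: 1 means a game is flashed directly over the previous one, 2 means the loader gets flashed back in between.

```cpp
#include <stdint.h>

// Years until the guaranteed flash endurance is used up.
// rewritesPerSwitch is an assumption: 1 if a game is flashed directly over
// the previous one, 2 if the loader is flashed back in between.
constexpr double yearsUntilWornOut(uint32_t guaranteedRewrites,
                                   uint32_t gameSwitchesPerDay,
                                   uint32_t rewritesPerSwitch) {
    return static_cast<double>(guaranteedRewrites) /
           (static_cast<double>(gameSwitchesPerDay) * rewritesPerSwitch) /
           365.0;
}
```

With 10,000 guaranteed rewrites and 3 game switches per day this gives about 9.1 years at one rewrite per switch and about 4.6 years at two.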
About the boot loader: I have no idea how to work with it, but I have some ideas. I was thinking of having some code that takes the address of a file on the SD card and starts loading its content, without putting (much) file system code into the boot loader. One way to do it would be to have a "Loader" program that has the FAT32 code to look through the SD card, show ROM thumbnails and let the user pick a game to load.
When a game is selected, the game's properties (its first cluster entry in the FAT32 table) as well as the loader program's address are stored in EEPROM. The bootloader then loads the code from the SD card into flash; it only needs to follow the file through the FAT cluster table, which should be reasonably simple to read. So to load the loader program again from within the game, very little FAT32 code is required, because all the required cluster information is stored in EEPROM. Note: if the ROM format of the files on the SD card has a size limit such as 128 KB and the FAT32 cluster size is 4 KB, the file's 32 cluster addresses could simply be stored in EEPROM, so loading content would be as simple as reading the EEPROM. Especially if the file structure is aligned with the FAT cluster size.
I researched the topic of resetting a bit, and it's possible by using the watchdog timeout. So I would think that the life cycle would be:
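A rough sketch of what such an EEPROM record could look like, and how the bootloader would turn a byte offset in the ROM into an SD sector without touching the FAT again. All names here (`RomRecord`, `sectorForOffset`, the constants) are made up for illustration, and I am assuming 512-byte sectors. The only FAT32 rule used is that data clusters are numbered starting at 2.

```cpp
#include <assert.h>
#include <stdint.h>

constexpr uint32_t kMaxRomBytes       = 128UL * 1024UL; // assumed ROM size limit
constexpr uint32_t kClusterBytes      = 4UL * 1024UL;   // assumed FAT32 cluster size
constexpr uint32_t kSectorBytes       = 512;            // assumed SD sector size
constexpr uint32_t kSectorsPerCluster = kClusterBytes / kSectorBytes;
constexpr uint32_t kMaxClusters       = kMaxRomBytes / kClusterBytes;

static_assert(kMaxClusters == 32, "128 KB / 4 KB should give 32 clusters");

// What the loader would write to EEPROM before requesting a reset.
struct RomRecord {
    uint8_t  clusterCount;            // number of valid entries below
    uint32_t clusters[kMaxClusters];  // FAT32 cluster numbers, in file order
};

// Absolute SD sector that holds the given byte offset of the ROM file.
// dataStartSector is the first sector of the FAT32 data region.
uint32_t sectorForOffset(const RomRecord& rec,
                         uint32_t dataStartSector,
                         uint32_t byteOffset) {
    const uint32_t index = byteOffset / kClusterBytes;
    assert(index < rec.clusterCount);
    const uint32_t sectorInCluster = (byteOffset % kClusterBytes) / kSectorBytes;
    // FAT32 data clusters are numbered from 2, hence the "- 2".
    return dataStartSector
         + (rec.clusters[index] - 2) * kSectorsPerCluster
         + sectorInCluster;
}
```

The cluster list alone is 128 bytes, so a record for the game plus one for the loader would only take a few hundred bytes of EEPROM.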
:: Loader => user selects a game => store cluster file data in eeprom => flash the ROM => restart. Game.
:: Game => user selects option to quit => load loader cluster from eeprom => flash the ROM => restart. Loader.
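The two life cycle lines above can be sketched as a tiny simulation. This is not real bootloader code: `Device`, `requestSwitch` and `bootloaderStep` are hypothetical names, the struct fields stand in for EEPROM and flash, and the actual SD-to-flash copy is reduced to one assignment. On a real AVR the reset itself would typically be done with `wdt_enable(WDTO_15MS)` from `<avr/wdt.h>` followed by an endless loop.

```cpp
#include <stdint.h>

enum class Program : uint8_t { Loader, Game };

// Stand-in for the bytes kept in EEPROM across a reset.
struct PersistentState {
    Program pending;        // which program to flash at the next boot
    bool    flashRequested; // set by the running program, cleared by the bootloader
};

struct Device {
    PersistentState eeprom;
    Program         inFlash;     // program currently in application flash
    uint32_t        flashWrites; // counts against the rewrite limit
};

// Called by the loader ("user selects a game") or by a game ("quit").
void requestSwitch(Device& dev, Program next) {
    dev.eeprom.pending = next;
    dev.eeprom.flashRequested = true;
    // On real hardware: trigger the watchdog reset here.
}

// What the bootloader would do once after every reset.
void bootloaderStep(Device& dev) {
    if (dev.eeprom.flashRequested) {
        dev.inFlash = dev.eeprom.pending; // stands in for copying clusters from SD to flash
        dev.flashWrites++;
        dev.eeprom.flashRequested = false; // an ordinary power cycle must not reflash
    }
    // Then jump to the application in flash.
}
```

Clearing the request flag matters for the rewrite limit, since otherwise every power-up would cost a flash write. As far as I know, on AVRs the watchdog also stays enabled after a watchdog reset, so the bootloader would have to disable it early on.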
I am curious if that could work out this way...
Regarding an interpreter: I've considered this as well for quite some time. It would be great! After all, loading code dynamically from SD and executing it would remove lots of limits. However, I am worried about two things: RAM and performance. The interpreter would need some RAM that would then not be available to the program. And interpreting code takes quite a few extra cycles, even more so since data needs to be loaded from SD continuously. Having played a bit with the TinyDuino, I think using an interpreter would be fine for plenty of applications, but games that need every CPU cycle to achieve 20+ fps in native code would probably not run at a reasonable speed when executed partially via an interpreter.
I still think that an interpreter would be cool for various things and I am intrigued by it. But I am afraid that it might turn out to be unusable for most games. With the render API I wrote, where you can issue draw commands to draw textures, rectangles or circles, you could certainly make a simple game such as Pong and maybe Asteroids with interpreted code... but anything more complex would be quite difficult if not impossible.
Another reason why I'd prefer flashing is that loading textures from SD and blitting them to the TinyScreen is probably quite slow in itself as well.
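To illustrate where the RAM and the extra cycles go, here is a minimal stack-based bytecode interpreter. It is purely a sketch: the opcodes, the `Vm` struct and the `drawn` array (a stand-in for a draw command queue) are invented for this example and have nothing to do with my actual render API.

```cpp
#include <stddef.h>
#include <stdint.h>

enum Op : uint8_t { OP_PUSH, OP_ADD, OP_RECT, OP_HALT };

struct Rect { int16_t x, y, w, h; };

struct Vm {
    int16_t  stack[16];  // RAM the game itself can no longer use
    uint8_t  sp;
    Rect     drawn[8];   // stand-in for queued draw commands
    uint8_t  drawCount;
    uint32_t steps;      // dispatched instructions, i.e. pure overhead
};

// Runs the bytecode; returns true only if it ends cleanly with OP_HALT.
bool run(Vm& vm, const uint8_t* code, size_t len) {
    vm.sp = 0;
    vm.drawCount = 0;
    vm.steps = 0;
    size_t pc = 0;
    while (pc < len) {
        const uint8_t op = code[pc++];
        vm.steps++;
        switch (op) {
        case OP_PUSH: // one operand byte follows the opcode
            if (pc >= len || vm.sp >= 16) return false;
            vm.stack[vm.sp++] = code[pc++];
            break;
        case OP_ADD:
            if (vm.sp < 2) return false;
            vm.stack[vm.sp - 2] = static_cast<int16_t>(
                vm.stack[vm.sp - 2] + vm.stack[vm.sp - 1]);
            vm.sp--;
            break;
        case OP_RECT: { // expects x, y, w, h pushed in that order
            if (vm.sp < 4 || vm.drawCount >= 8) return false;
            const Rect r = { vm.stack[vm.sp - 4], vm.stack[vm.sp - 3],
                             vm.stack[vm.sp - 2], vm.stack[vm.sp - 1] };
            vm.drawn[vm.drawCount++] = r;
            vm.sp -= 4;
            break;
        }
        case OP_HALT:
            return true;
        default:
            return false;
        }
    }
    return false; // ran off the end without OP_HALT
}
```

Even a single rectangle at x = 10 + 5 takes eight dispatched instructions here (five pushes, one add, the draw and the halt), each with a fetch, a bounds check and a switch, where native code would make one function call. That is the kind of overhead I am worried about.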
I'd be interested to know if you see any problems with my considerations!