HOW THE EXPLOIT WORKS
Overlong or corrupted strings in the DS user settings stored in NVRAM (01Ah, 2 bytes: nickname length in characters; 050h, 2 bytes: message length in characters) cause a crash when going to System Settings->Other Settings->Profile->Nintendo DS Profile. The 3DS crashes in this case because the corrupted string overflows onto the stack. The same string is then used to control the stack smash and start a ROP chain loaded from the NVRAM. There is a function already present in memory which loads a file from SYS:\Launcher.dat. This can be manipulated to instead load from YS:\Launcher.dat, mounting the SD card as "YS:". Loading from the SD card allows the ROP chain to continue, but from the data inside "Launcher.dat".
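As a rough illustration of the two length fields mentioned above, the sketch below checks a user settings blob for oversized lengths. The offsets 0x1A and 0x50 come from the text; the maximum lengths (10 and 26 characters), the little-endian encoding, and the function name are assumptions made for this example and are not taken from the exploit itself.

```python
import struct

NICKNAME_LEN_OFFSET = 0x1A  # from the text: nickname length in characters
MESSAGE_LEN_OFFSET = 0x50   # from the text: message length in characters
MAX_NICKNAME_CHARS = 10     # assumption: normal upper bound for the nickname
MAX_MESSAGE_CHARS = 26      # assumption: normal upper bound for the message


def oversized_fields(settings):
    """Return the names of length fields that exceed their assumed maximum."""
    # Both fields are read as 16-bit little-endian values (assumption).
    nickname_len, = struct.unpack_from("<H", settings, NICKNAME_LEN_OFFSET)
    message_len, = struct.unpack_from("<H", settings, MESSAGE_LEN_OFFSET)
    problems = []
    if nickname_len > MAX_NICKNAME_CHARS:
        problems.append("nickname")
    if message_len > MAX_MESSAGE_CHARS:
        problems.append("message")
    return problems
```

A blob for which this returns a non-empty list is the kind of input the profile screen does not expect.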
GETTING RAM DUMPS
In order to continue development of this exploit, we had to find a kernel exploit; we already had the userland exploit (the DS Profile strings). This meant we would need a RAM dump or some way of dumping RAM. Unfortunately, members of the 3DS hacking community who had already achieved this refused to give up RAM dumps for various reasons. We had to discover or implement a way to dump our memory to the SD card.
After a bit of brainstorming, we decided to change the permissions passed to the IFile_Open function to write/create (0x06) instead of the original read permissions (0x01). When set to write/create, the 3DS would create the file we were attempting to open if it didn't exist already. This meant we had a way of writing data to the SD card: the dumped memory became the filename. Unfortunately this was also inefficient due to the restrictions of the _this (the variable that holds info on the file and is used in later functions such as read and write) for the IFile_Open call, as well as the restrictions of the FAT32 filesystem. Because of these limitations, we were only capable of dumping 0x160 bytes at a time, 22 times, before a reformat of the SD card was needed. Using the Unix dd command and a custom Python script that analyzes FAT tables and looks for names not preassigned, we were able to view the raw data written to these filenames, which gave us very small pieces of RAM. Each of these dumps could then be stitched together to form a larger area of memory. The only problem was that strings are null terminated, so any recurring 0's in these dumps could result in a faulty dump. To fix this we XOR-ed the data against 0x11 (based on the assumption that 0x11 wouldn't occur often) and would re-XOR it in the Python script.
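The XOR masking and the stitching step can be sketched as follows. This is not the original script: the function names and the (address, data) chunk format are hypothetical, and the FAT table parsing is left out entirely. Only the 0x11 key comes from the text.

```python
XOR_KEY = 0x11  # from the text: chosen on the assumption that 0x11 is rare in RAM


def xor_bytes(data, key=XOR_KEY):
    """XOR every byte with the key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


def stitch(chunks):
    """Combine (address, masked_bytes) pairs into one contiguous image.

    Returns (base_address, image). Gaps between chunks are zero-filled.
    The chunk format is hypothetical, chosen only for this sketch.
    """
    ordered = sorted(chunks)
    if not ordered:
        raise ValueError("no chunks to stitch")
    base = ordered[0][0]
    end = max(addr + len(data) for addr, data in ordered)
    image = bytearray(end - base)
    for addr, data in ordered:
        start = addr - base
        # Undo the masking that was applied before the data hit the SD card.
        image[start:start + len(data)] = xor_bytes(data)
    return base, bytes(image)
```

Note that masking only moves the problem: a 0x00 in RAM becomes 0x11, but a real 0x11 in RAM becomes 0x00 and would still cut the string short, which is why the choice of key rests on 0x11 being uncommon.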
GETTING "FULL" RAM DUMPS
While this was useful for finding various ROP gadgets, we had to find a more efficient manner of writing data. And that is where the search for IFile_Write began. With this function, instead of simply creating filenames with dumped memory, we could write all of memory to a single file with no limitations.
Seeing as how IFile_Open was located at 0x001b82ac and IFile_Read at 0x001b3958, it can be reasonably assumed that IFile_Write is potentially located in 0x001bXXXX. After having exhausted just about every other option, we decided the best course of action for the time being was to brute force this area. With 0x10000 bytes (64 KB) in this section, and dumping approx. 4 KB per pass of 15 iterations, it would take 16 passes to dump all memory in 0x001bXXXX. Seeing as how one pass takes approximately 14 minutes, this would potentially take up to 4 hours to brute force (assuming only 1 tester). Lucky for us, IFile_Write was found within the first 4 passes.
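The time estimate above works out as follows. The figures are the ones given in the text; the only assumption is that the 0x001bXXXX range is treated as 0x001b0000 through 0x001bffff, i.e. 0x10000 bytes.

```python
REGION_BYTES = 0x10000     # 0x001b0000..0x001bffff (assumed bounds of 0x001bXXXX)
BYTES_PER_PASS = 4 * 1024  # approx. 4 KB per pass of 15 iterations (from the text)
MINUTES_PER_PASS = 14      # approximate duration of one pass (from the text)

# Ceiling division: a partially filled final pass still costs a full pass.
passes = -(-REGION_BYTES // BYTES_PER_PASS)
total_minutes = passes * MINUTES_PER_PASS

print(passes, "passes,", total_minutes, "minutes,",
      round(total_minutes / 60, 1), "hours")
```

This gives 16 passes and 224 minutes, a little under the 4 hour worst case quoted above.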
ARM9 CODE EXECUTION
With full RAM dumps we could now figure out how Gateway and others achieved custom code execution. With these "full" RAM dumps (we could technically only dump RAM from userland accessible areas) we could modify them and load them in QEMU for analysis. Seeing as how our Launcher.dat is being loaded into memory at 0x002B0000, we can simply replace that with the publicly released "decrypted" (only the first stage was decrypted) version of the Gateway Launcher and run that in a controlled environment where we can view each register. Another user, previously known for his work in the PS3 scene, provided us with further decrypted versions of the Gateway Launcher. Once we could decipher where the Gateway custom code payload was stored in the Launcher (and removed decryption from the ROP payload) we had custom code execution!
After having ARM9 code exec we had to find the addresses of the displays' framebuffers in memory. This was achieved relatively quickly and allowed us to output information to the user. We tried various ways of dumping RAM (now that we had access to more memory than in userland). We tried flashing binary to two photoresistors taped to the LCD, giving us a VERY slow form of communication. We also tried simply dumping memory as characters on screen and going to the next address on the press of the "A" button (which worked as well, just slowly).
Now that custom code could be executed on the ARM9 processor, we had access to virtually all of the hardware of the 3DS (other than direct access to the GPU and various other things that were unimportant at the time). Lucky for us, 3DBrew had already documented various I2C devices and registers. After a bit of testing and help from another user we had working serial communication from the infrared IC on the 3DS to my computer. This allowed dumping of memory over serial at 115200 baud as well as sending binaries to the 3DS over serial. This new form of communication between us and the 3DS allowed for extremely fast prototyping as well as the RAM dumps we desired.
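To give a feel for the speed of the serial link, here is a back-of-the-envelope estimate. The 115200 baud rate is from the text; the framing of 10 bits per byte (8 data bits plus start and stop bits) is an assumption, as is the function name, and protocol overhead is ignored.

```python
BAUD_RATE = 115200  # from the text


def transfer_seconds(num_bytes, baud=BAUD_RATE, bits_per_byte=10):
    """Estimate the raw transfer time for num_bytes over a serial link.

    bits_per_byte=10 assumes 8 data bits, 1 start bit and 1 stop bit.
    """
    return num_bytes * bits_per_byte / baud


# Example: time to move 1 MiB of memory under these assumptions.
one_mib = transfer_seconds(1024 * 1024)
```

Under these assumptions the link moves 11520 bytes per second, so 1 MiB takes roughly a minute and a half.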
ARM11 CODE EXECUTION
Because of our desire for proper floating point calculations, and honestly just because, we decided to try and get ARM11 code execution (I believe we also tried to get this before the coming weekend at the time, for the potential to make a homebrew 3DS game for a local game jam). We ended up actually achieving ARM11 code exec before that deadline, though it was not too pretty. This was achieved by overwriting the IRQ vector table, which is a list of addresses that determines which routine gets executed when the corresponding interrupt is recognized, and writing code to handle IRQs and jump to our code. Essentially, this is how our ARM11 code exec works:
1. Set up our ARM11 IRQ stub
2. Flush cache
3. Set up our ARM11 IRQ handler
4. Flush
5. Wait for an ARM11 response
6. Success!!
Our ARM11 stub has a generic IRQ handler initially. We load our magic into memory, and if our magic isn't found then we handle the IRQ normally. Then the cache is cleaned and we check which ARM11 core we're on. We check for our magic at a certain address depending on which core we're on and then check if the magic word matches for that core. If so, we wait for the sync between ARM9 and ARM11 and get the address to jump from ARM9 to ARM11. Currently we only jump if we are on core 0 because we don't have a good RTOS setup for multi-core processing at the moment.
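The decision logic of the stub described above can be modelled in a few lines. This is only a high-level model in Python, not the actual ARM11 stub: the magic value, the per-core addresses, the offset of the entry address and all names are hypothetical, and the cache cleaning and the ARM9/ARM11 sync wait are not modelled.

```python
MAGIC = 0x4D414749                     # hypothetical magic word
MAILBOX_ADDR = {0: 0x1000, 1: 0x1010}  # hypothetical per-core magic addresses
ENTRY_OFFSET = 4                       # hypothetical: entry address follows the magic


def irq_stub(core_id, read_word):
    """Decide what the stub does for one IRQ on the given core.

    read_word(address) returns the 32-bit word at that address.
    Returns ("jump", entry_address) or ("default", None).
    """
    mailbox = MAILBOX_ADDR[core_id]
    if read_word(mailbox) != MAGIC:
        # Magic not found for this core: handle the IRQ normally.
        return ("default", None)
    if core_id != 0:
        # Per the text, only core 0 jumps for now.
        return ("default", None)
    # Real code would wait for the ARM9/ARM11 sync here (not modelled).
    return ("jump", read_word(mailbox + ENTRY_OFFSET))
```

Passing a dictionary's `get` method as `read_word` is enough to try the model out with a fake memory map.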