Alexey Shalpegin
Embedded Security Expert
Hi there, it’s Positive Labs! Here we perform different kinds of embedded R&D. Regretfully, most of the cool stuff can’t be shared because of NDAs, but I still want to introduce you to the adventurous spirit of our lab. As it happens, my current job is pretty similar to my long-time hobby, the security research of game consoles. So it’s high time to continue my Xbox 360 trilogy[1][2][3] (Rodrigo Copetti has written a really good summary of it). Yup, today’s topic is how the most used Xbox 360 modding method, RGH3, was born.
The origins
A long time ago, in the year 2021, I was sure that the Xbox 360 era was over. After almost 13 years in the scene, it was time for me to let it go. So I sold the rest of my console parts, abandoned the related projects and stopped visiting forums. But Providence had other plans for me, involving several months of sleepless nights.
It all started with a simple question: my friend asked me to help him find something that could play video games. His PC was really weak and he had no money for an upgrade. My first thought was: “An Xbox 360 would be a solution! It’s pretty cheap but can play even AAA games like GTA 5.” The next day, we bought a used Xbox 360 for around $30.

The quest is done, the console is prepped, flashed, and ready to go. That could have been the end of the story, but nostalgia got the better of me. And eBay ads offered very tempting deals.
This is what happened:

A guy was selling two broken Xbox 360 motherboards for peanuts! They were easily fixable, so I offered my help, showed him how to fix them, taught him the basics, and (without even realizing it) came back to the modding scene.
Further events came like a thunderstorm. On a modding Discord channel, someone uploaded detailed Xbox 360 schematics for modern Slim console revisions, with the CPU_PLL_BYPASS pad highlighted by Josh Davidson (Octal450), which was a big surprise.

Despite the lack of real accessible pads for CPU_PLL_BYPASS, it was still possible to solder to the via hole:

To understand the importance of this discovery, I need to explain the basics of the Reset Glitch Hack (RGH) for Xbox 360. It’s described in detail in the third part of my trilogy, but I’ll give a brief overview here.
A brief introduction to RGH
RGH (as well as other glitch attacks) takes advantage of unintentionally incorrect CPU behavior under certain conditions. By such conditions, I mean anything outside the recommended safe ranges, like too low/too high voltage, or pulses on the clock line.
In this particular case, it is a very short pulse on the CPU_RESET line. In normal situations, asserting this line results in a full reset of all internal subsystems of the CPU. But as a researcher named GliGli found out, if the line is asserted for a very short time, the CPU doesn’t restart, but instead may execute some of its instructions in a wrong way.
To take advantage of this unintended behavior, it is necessary to send the reset pulse at a very specific moment: when the signature comparison is taking place. The better you calculate the required moment, the better the chances for a successful glitch. To increase the precision and success rate, the CPU is slowed down, and that’s where CPU_PLL_BYPASS comes in. Asserting this line disables the internal clock multiplication and effectively reduces the CPU speed by a factor of 128. This was used in the very first RGH implementation to bypass the hash comparison (though only for the Fat revisions of the console):

Until recently, it was assumed that Slim revisions didn’t have CPU_PLL_BYPASS, and researchers used the so-called I2C slowdown (RGH2 method) instead. With this method, the CPU reference clock was adjusted by controlling the clock generator inside the HANA chip. Unlike CPU_PLL_BYPASS, this type of slowdown was weak (with an approximate factor of 3.14) and thus not as reliable. Unfortunately, Fat revisions joined the RGH2 team after yet another system update that introduced the “Slim-like” split bootloader scheme:

The RGH2 method was the only Xbox 360 modding solution for almost 6 years. It was unreliable and slow: you had to wait several minutes for a successful boot! There were some attempts to increase the precision by using a faster FPGA clock (the 150 MHz one), but usually that wasn’t enough for a flawless experience:

However, the Xbox 360 scene wasn’t dead: Team Xecuter kept creating new modchips and new ideas. The R-JTAG idea used a glitch pulse to bypass the fuse check and thus load the blacklisted bootloader vulnerable to the JTAG SMC hack. The CR4 XL project used the famous CPU_PLL_BYPASS, which brought reliability back to the Fat modding. Sadly, they couldn’t compete with the Chinese X360ace modchips due to the high price:

Finally, two things happened almost at the same time: I had plenty of free time to work on Fats, and Dr.Schottky started his own CR4-like project. A few months later, two reliable modding methods were born: RGH 1.2 and R-JTOP. Dr.Schottky’s R-JTOP method reimplemented the R-JTAG idea of bypassing the fuse check using CPU_PLL_BYPASS, while my RGH 1.2 used a combination of RGH2 software along with CPU_PLL_BYPASS slowdown.
“How did you get there?” you might ask. Well, the main question bothering me was “Why did they stop using CPU_PLL_BYPASS? Was it completely blocked?” The lack of any published details made me test real hardware. The tests confirmed that the slowdown worked as expected, with the same factor of 128. The only issue was the long waiting time: the slowdown is applied long before the check, which increases the waiting time enormously. But hey, we don’t really need the slowdown that early. Let’s postpone it for a dozen or two milliseconds so it kicks in just before the check!
The final RGH 1.2 algorithm was:
- Wait for the POST 0xD8
- Wait for a certain amount of time before the slowdown
- Assert CPU_PLL_BYPASS and slow down the CPU
- Wait for the POST 0xDA
- Wait for another specific time before the glitch pulse (“timing”)
- Apply a short pulse to the RST line
It took me a few days to find the right values, but it finally worked:

I handed the project over, with all the sources, to Josh (Octal450), who maintains it now. He is also responsible for:
- the updated RGH 1.2 for all kinds of modchips
- the CPU_EXT_CLK version (using a different line for slowdown) for Xenon and Zephyr revisions where CPU_PLL_BYPASS fails
- J-Runner-With-Extras
And then he got the schematics with CPU_PLL_BYPASS for Slim revisions! Of course, he immediately started porting the RGH 1.2 to them. He soon found out that the slowdown factor of Slim boards was 640 instead of 128, which promised great reliability, but also made the initial timing search much worse. To help him out, I created a special bootloader that repeated the same signature check in a cycle to avoid waiting for console reboots and bruteforce the timing faster. As expected, the results were amazing – instant boot on Slim revisions:

Then it struck me: since we have such a perfect slowdown, can we get rid of the modchip completely? Xbox 360 has a separate reprogrammable MCU inside the southbridge called SMC, which even has its own access to the CPU_RST line—let’s use it!
At that point, I bought all kinds of Xbox 360 motherboards, took my logic analyzers and other stuff, and went on another console research adventure.
What the … SMC?
Like many other complex systems, Xbox 360 has a separate chip that handles various tasks such as:
- Processing buttons (power, disk tray, gamepad sync, and more)
- Power control (CPU, GPU)
- Video interface communication & configuration (AV, HDMI)
- Temperature & fan control
SMC is always on, consumes very little power, and it has its own firmware!

Of course, the firmware is encrypted, but the algorithm is already well known. There is no write protection or signature check, so it’s easily modifiable:
def decrypt_smc(data):
    key = [0x42, 0x75, 0x4e, 0x79]
    res = bytearray()
    for i in range(len(data)):
        j = data[i]
        mod = j * 0xFB
        res.append(j ^ (key[i & 3] & 0xFF))
        key[(i + 1) & 3] += mod
        key[(i + 2) & 3] += mod >> 8
    return bytes(res)
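To convince yourself that the scheme is reversible, here is a small round-trip sketch. The encryption direction mirrors the decryption: in both cases the key schedule is driven by the ciphertext byte. The sample buffer is made up for the demonstration and is not real firmware.

```python
def encrypt_smc(data: bytes) -> bytes:
    """Encrypt plaintext SMC firmware (mirror image of decrypt_smc)."""
    key = [0x42, 0x75, 0x4E, 0x79]
    res = bytearray()
    for i, plain in enumerate(data):
        enc = plain ^ (key[i & 3] & 0xFF)
        mod = enc * 0xFB              # key update uses the ciphertext byte
        res.append(enc)
        key[(i + 1) & 3] += mod
        key[(i + 2) & 3] += mod >> 8
    return bytes(res)


def decrypt_smc(data: bytes) -> bytes:
    """Decrypt SMC firmware; same key schedule, fed by the ciphertext."""
    key = [0x42, 0x75, 0x4E, 0x79]
    res = bytearray()
    for i, enc in enumerate(data):
        mod = enc * 0xFB
        res.append(enc ^ (key[i & 3] & 0xFF))
        key[(i + 1) & 3] += mod
        key[(i + 2) & 3] += mod >> 8
    return bytes(res)


# Made-up test buffer, only used to check the round trip.
sample = bytes(range(256)) * 4
assert decrypt_smc(encrypt_smc(sample)) == sample
```

Only the low 8 bits of each key entry are ever used, so letting the Python integers grow does not change the result.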
It’s worth noting that this algorithm can be found in the Xbox 360 hypervisor:

The firmware itself is stored in the flash memory accessible via the SPI interface:

Two additional signals (“SMC reset” and “SMC DBG_EN”) are required to enter the special NAND reflash mode:


In the XDK version, the headers used to reflash the NAND are connected directly to the “Sidecar”. This allows you to restore the console even if the system is completely corrupted:

We’ll come back to this at the very end, but for now, let’s dive into the secrets of SMC firmware!
Hello, world!
Once loaded into any disassembler, the decrypted SMC firmware doesn’t look good. The very beginning contains weird data interpreted as instructions:

Looks like the first 4 bytes are intentionally filled with random data to make the encrypted firmware of each device unique. The “real” first 4 bytes are moved to the very end of the firmware code:

With the real data placed at the beginning, you can finally see a proper reset vector table for the 8051 architecture:

One mystery solved, many more to come. How to add custom functionality to the SMC? How to control something there? How to build code for this microcontroller at all?
Some information can be retrieved from two other projects: SMC JTAG Hack and CR4 XL (both ready-to-use SMC binaries are available in the J-Runner tool). It is known that the first project uses GPIOs to control the GPU via the JTAG interface:

The SMC for the CR4 XL project uses a single GPIO to read the required slowdown state and sends corresponding I2C commands in response:

If you compare it with the leaked schematics, it’s easy to see that the 8051’s special function register bits perfectly align with GPIO ports receiving the corresponding signals:

Summarizing all detected GPIO-related registers:
- SFR_80,90,A0,C0,C8 = GPIO_P0..P4 – GPIO values
- SFR_A2..A6 = GPIO_P0..P4_DIR – GPIO direction
- SFR_9D..9F,A1,A7 = GPIO_P0..P4_OD – GPIO open-drain mode configuration
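As a quick reference, the register list above can be written down as a lookup table. This is only a sketch: it assumes the addresses in each range map to ports P0..P4 in the order they are listed, and the helper name `bit_address` is mine. It relies on the standard 8051 rule that an SFR whose address is a multiple of 8 is bit-addressable, with bit address = SFR address + bit number.

```python
# port -> SFR addresses, assuming the listed ranges are in P0..P4 order
GPIO_SFRS = {
    0: {"value": 0x80, "dir": 0xA2, "od": 0x9D},
    1: {"value": 0x90, "dir": 0xA3, "od": 0x9E},
    2: {"value": 0xA0, "dir": 0xA4, "od": 0x9F},
    3: {"value": 0xC0, "dir": 0xA5, "od": 0xA1},
    4: {"value": 0xC8, "dir": 0xA6, "od": 0xA7},
}


def bit_address(port: int, bit: int) -> int:
    """8051 bit address of a GPIO pin (e.g. port 0, bit 5 is 080h.5)."""
    if not 0 <= bit <= 7:
        raise ValueError("bit must be in 0..7")
    base = GPIO_SFRS[port]["value"]
    # Only SFRs at addresses divisible by 8 are bit-addressable.
    assert base % 8 == 0
    return base + bit


print(hex(bit_address(0, 5)))  # port 0, GPIO 5
```

Note that only the value registers are bit-addressable; the direction and open-drain registers have to be written as whole bytes.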
With all this information, it’s already possible to bit-bang something to the external pins of the SMC. But before we can do it, we need to compile the code. It’s hard to integrate the standalone C project into the existing firmware binary, so let’s use raw assembly!
Here is a very simple snippet that sends a short pulse to SMC Port 0, GPIO 5 (031E0h is the beginning of the free space in the SMC firmware for the Corona revision):
DBG_PIN equ 080h.5
org 031E0h
    setb DBG_PIN
    clr DBG_PIN
    ret
end
Looks pretty minimalistic? I know! Fits into just 5 bytes when compiled.
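If you want to double-check those 5 bytes without an assembler, the snippet can be hand-assembled from the standard 8051 opcodes (SETB bit = D2h, CLR bit = C2h, RET = 22h). This is just an illustrative sketch; the function name is mine.

```python
SETB_BIT = 0xD2  # SETB bit   (2 bytes: opcode, bit address)
CLR_BIT = 0xC2   # CLR bit    (2 bytes: opcode, bit address)
RET = 0x22       # RET        (1 byte)

DBG_PIN = 0x80 + 5  # 080h.5 as an 8051 bit address


def assemble_pulse(bit_addr: int) -> bytes:
    """Machine code for: setb bit / clr bit / ret."""
    return bytes([SETB_BIT, bit_addr, CLR_BIT, bit_addr, RET])


code = assemble_pulse(DBG_PIN)
print(code.hex(), len(code))
```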
Now it’s necessary to insert this code into the firmware somehow. It’s better to call the custom code from the main execution cycle. Typically, in such MCUs, the logic is split into parts implemented as FSMs (finite state machines) that are executed sequentially within the same main cycle:

The small cycle contains tasks that require immediate processing (like I2C stuff). Conversely, the big cycle contains periodic tasks with some kind of timeouts (like button press counters, LED blinkers, or various power events).
I hope nothing breaks if the debug LED blinking task is replaced with our custom procedure:
import struct

def patch(data, offset, new):
    return data[:offset] + new + data[offset + len(new):]

def lcall(offset):
    return b"\x12" + struct.pack(">H", offset)

CODE_START = 0x31E0

smc = open("smc_corona.bin", "rb").read()
code = open("build.bin", "rb").read()[CODE_START:]

smc = patch(smc, CODE_START, code)
smc = patch(smc, 0x829, lcall(CODE_START))
smc = patch(smc, 0x256B, b'\x90')  # P0.5 OUT dir
open("new_smc.bin", "wb").write(smc)
The result of the modification is easily observable with any logic analyzer: the unused DB3R4 pad (corresponding to the GPIO port 80h.5) now produces pulses every 20 ms:

Okay, such debugging is better than nothing, but some kind of UART would be much easier to use. Wait, there are SMC UART pins:

To use UART, you usually have to configure it, then write an 8-bit value into some hardware register. Okay, let’s write values into all 8051 SFRs and see what happens on these UART pads. (This was a pretty dangerous operation: during that bruteforcing process, I accidentally triggered power lines, a watchdog, and other funky animals). It seems that writing into the 0E7h register causes these weird pulses on the UART Tx line:

Looks suspicious, but the data pattern was always the same no matter what I wrote there. However, everything changed when I tried to bruteforce the neighboring registers. To output the data correctly, I had to enable the UART (SFR 0E8h = 0C0h) and configure its speed (SFR 0E9h = 0FFh, maximum speed 1.5 Mbps):

Registers 0E8h and 0E9h have status bits where you can find out whether the byte was sent or not, but for simplicity I just used a fine-tuned delay to output multiple symbols in a row:
put_uart:
    mov 0E7h, A
    mov A, #09h
loop_uart:
    djnz 0E0h, loop_uart
    ret
Now, it’s possible to print any text:

I2C
According to my calculations, the CPU_PLL_BYPASS slowdown alone was not enough for a reliable glitch, since the SMC is pretty slow (it is clocked at 48 MHz, but it takes at least 4 cycles to produce the reset pulse). This means we’re going for a combination of CPU_PLL_BYPASS and an RGH2-like I2C slowdown of the reference CPU clock coming from KSB/HANA:

As always, there is no public information on exactly how the I2C slowdown works. I was able to capture several values used in different modchips like CR3 and CR4 XL:
- [CF] = 0x40’44’AC’C0 – Corona, 100 MHz, default
- [CF] = 0x08’44’40’40 – Corona, 33.3 MHz, slowdown (CR3 Pro / CR4)
- [CF] = 0x08’54’54’30 – Corona, 25 MHz, slowdown (RGH2)
- [CD] = 0x02’0C’80’4E – Falcon/Jasper/Trinity, 100 MHz, default
- [CD] = 0x03’80’08’4E – Falcon/Jasper/Trinity, 31.5 MHz, slowdown
What do those values mean? What other frequencies can be set? I found some discussion regarding unstable low frequencies—is it even true? Let’s find out!
For such experiments, it’s necessary to measure the real reference frequency between HANA and the CPU. At the time of my research, I only had DSCope U2P20, which is not really suitable for such a task:

The default reference clock is a low-voltage 100 MHz differential pair, but DSCope can only handle up to 50 MHz at best. (Now, at Positive Labs, I have a much better scope, but years ago this was the best I could find). To properly measure the required clock, I had to either find better equipment, or somehow divide the clock to make it measurable.
Well, the quite common Xbox 360 modchip “Matrix Glitcher” is based on a pretty fast programmable CPLD (Xilinx Coolrunner II), and I had plenty of them:

By the way, notice the missing chip mark there. It wasn’t erased as you might expect; there’s a whole story behind it. The original design by GliGli used the PFN-44 version of these CPLDs, just like in the picture. But as soon as everyone started buying these modchips, they were out of stock. Sure, there were XC2C64A in other packages:



But they were not as popular as the original PFN-44 one (due to compatibility issues, in particular). You know what the Chinese manufacturers did? Yup, they just hid the real BGA-56 chip under the double-sided tape, then put a fake plastic PFN-44 thingy on top!


This is what my modchip-based clock divider looks like without the fake placeholder:

Despite the lack of support for differential pairs, I was able to reliably capture the reference clock with this CPLD. It looks like the HANA clock is not a real differential pair and is made of two push-pull lines. This VHDL code divides the reference clock perfectly by a factor of 10:
process(PIN_B) is
begin
    if PIN_B'event then
        divcnt <= divcnt + 1;
        if divcnt = 9 then
            temp <= not temp;
            divcnt <= 0;
        end if;
        PIN_D <= temp;
    end if;
end process;
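Why exactly 10? The process fires on every change of PIN_B (both edges), and the output toggles once per 10 such events, so one full output period takes 20 edges, i.e. 10 input cycles. Here is a small Python model of that behavior. It is a sketch that assumes `divcnt`, `temp`, and `PIN_D` are VHDL signals (so every assignment sees the old values); the function name is mine.

```python
def count_output_cycles(input_cycles: int) -> int:
    """Model the divider and count rising edges on PIN_D."""
    divcnt, temp, pin_d = 0, 0, 0
    rising_edges = 0
    for _ in range(input_cycles * 2):      # two PIN_B events per input cycle
        # Signal semantics: compute all new values from the old ones.
        new_pin_d = temp
        if divcnt == 9:
            new_temp, new_divcnt = temp ^ 1, 0
        else:
            new_temp, new_divcnt = temp, divcnt + 1
        if pin_d == 0 and new_pin_d == 1:
            rising_edges += 1
        divcnt, temp, pin_d = new_divcnt, new_temp, new_pin_d
    return rising_edges


cycles_in = 1000
cycles_out = count_output_cycles(cycles_in)
print(cycles_in / cycles_out)  # division ratio
```

So a frequency measured on PIN_D has to be multiplied by 10 to get the real reference clock.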

Well, at the top 100 MHz, I got a jagged output of about 9.31 MHz. But at lower frequencies, it’s all fine:

Zephyr, Falcon, Jasper, Trinity
In most Xbox 360 revisions, the HANA clock generator works the same way, so it makes sense to start the analysis there. The RGH2 code uses register 0xCD for the slowdown, so let’s start with some value, then change each bit while measuring the resulting clock output with the scope. The I2C interface can be easily accessed using the pyFTDI project:

from pyftdi.i2c import I2cController
import struct

i2c = I2cController()
i2c.configure('ftdi://ftdi:2232h/1')
slave = i2c.get_port(0x70)

def read_cd():
    (_, result) = struct.unpack("<BI", slave.exchange([0xCD], 5))
    return result

def write_cd(val):
    slave.write_to(0xCD, struct.pack("<BI", 4, val))
And here’s a table with various register value tests:
# xxxxxx 010 000110 00000100001001110 - 12.0 MHz
# xxxxxx 100 000110 00000100001001110 - 7.17 MHz
# xxxxxx 111 000110 00000100001001110 - 4.48 MHz
# xxxxxx 110 100110 00000100001001110 - 0.92 MHz
# xxxxxx 110 010110 00000100001001110 - 1.56 MHz
# xxxxxx 110 001110 00000100001001110 - 2.40 MHz
# xxxxxx 110 000010 00000100001001110 - 12.0 MHz
# xxxxxx 110 000100 00000100001001110 - 7.17 MHz
# xxxxxx 110 000111 00000100001001110 - 4.48 MHz
# xxxxxx 110 000110 10000100001001110 - ??? MHz
# xxxxxx 110 000110 01000100001001110 - 75.6 MHz
# xxxxxx 110 000110 001001000 01001110 - 40.4 MHz
# xxxxxx 110 000110 000101000 01001110 - 28.8 MHz
# xxxxxx 110 000110 000011000 01001110 - 13.9 MHz
# xxxxxx 110 000110 000000000 01001110 - ??? MHz
# xxxxxx 110 000110 000001100 01001110 - 7.33 MHz
# xxxxxx 110 000110 000001010 01001110 - 6.22 MHz
# xxxxxx 110 000110 000001001 01001110 - 5.67 MHz
# xxxxxx 110 000110 000001000 11001110 - 5.40 MHz
# xxxxxx 110 000110 000001000 00001110 - 4.98 MHz
# xxxxxx 110 000110 000001000 01101110 - 5.19 MHz
# xxxxxx 110 000110 000001000 01011110 - 5.16 MHz
# xxxxxx 110 000110 000001000 01011010 - 5.10 MHz
# xxxxxx 110 000110 000001000 01001100 - 5.12 MHz
# xxxxxx 110 000110 000001000 01001111 - 5.12 MHz
Summarizing the table:
- [31..26] doesn’t affect the output clock
- [25..23] is a divider #1 (D1 – 1)
- [22..17] is a divider #2 (D2 – 1)
- [16..8] is a base multiplier (B – 1)
- [7..0] is some kind of a correction offset (O)
For example, the default configuration (0x20C804E) might be decoded as:
- 0x20C804E = 100 000110 010000000 01001110
- D1 = 4 + 1
- D2 = 6 + 1
- B = 128 + 1
- O = 78
The resulting output clock should be N * 129 / 5 / 7 + 78 * C, where
- N – main onboard oscillator frequency (27 MHz)
- C – correction multiplier (~0.00214 MHz)
A fun fact: by calculation, the reference clock should be 99.68 MHz.
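The field layout and the formula above fit into a few lines of Python. This is a sketch of my reading of the table, not a datasheet-grade decoder; the function names are mine, and N and C are the approximate values given above.

```python
N_MHZ = 27.0      # main onboard oscillator
C_MHZ = 0.00214   # approximate correction multiplier


def decode_cd(value: int):
    """Split a 0xCD register value into (D1, D2, B, O)."""
    d1 = ((value >> 23) & 0x7) + 1     # bits [25..23], stored as D1 - 1
    d2 = ((value >> 17) & 0x3F) + 1    # bits [22..17], stored as D2 - 1
    b = ((value >> 8) & 0x1FF) + 1     # bits [16..8],  stored as B - 1
    o = value & 0xFF                   # bits [7..0],   correction offset
    return d1, d2, b, o


def estimate_mhz(value: int) -> float:
    """Estimated reference clock: N * B / D1 / D2 + O * C."""
    d1, d2, b, o = decode_cd(value)
    return N_MHZ * b / d1 / d2 + o * C_MHZ


print(decode_cd(0x020C804E))
print(round(estimate_mhz(0x020C804E), 2))
```

For the default value this gives D1 = 5, D2 = 7, B = 129, O = 78 and about 99.68 MHz, matching the numbers above. Keep in mind that the measured clocks in the table deviate from the formula by a few percent, so treat the result as an estimate.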
Further experiments showed that the on-the-fly frequency swap was a bad idea. At very low values, the clock might not change, while at very high values there was no output clock at all. But the worst thing was the instability: the resulting clock would fluctuate slightly even though the settings and conditions were the same. This does not affect the system workload, but it spoils the glitch countdown. Such results explain the poor RGH2 success rate.
To fix the clock precision issues, I tried to perform a PLL reset (it’s usually required to restart the clock generator to properly change the clock).
What I found by bruteforcing the neighboring register was even better than I expected. It looks like the top bits of register 0xCE control the source of the reference clock:
- 0x08E84014 – default mode (PLL)
- 0x28E84014 – main onboard oscillator bypass (27 MHz)
- 0x48E84014 – some weird mode; the 0xCD register works in a different way, with other dividers and multipliers (could be PLL’ed internal clock)
- 0x88E84014 – could be the internal HANA oscillator bypass (3-10 MHz)
- 0xC8E84014 – no clock output (disabled)
The 27 MHz bypass mode is perfect for the CPU glitch: it can be easily turned on and off, while being extremely stable.
Corona
The Xbox 360 revision Corona doesn’t have a separate HANA chip; it is combined with the southbridge instead. Although the I2C address of the clock generator module is the same (0xE0), the register set and the functions are completely different.

The first line shows the hardware southbridge revision: XSB (Xenon), PSB (Panda), or KSB (Koala).
By repeating the bit analysis for the register 0xCF used in the RGH2 method, the following table was obtained:
# 0x100 - 132 MHz
# 0x0C0 - 100 MHz
# 0x040 - 33 MHz
# 0x030 - 25 MHz
# 0x020 - 16.6 MHz
# 0x021 - 8.3 MHz
# 0x022 - ?
# 0x023 - 5.55 MHz
# 0x024 - 18.7 MHz
# 0x025 - 9.3 MHz
# 0x026 - 4.68 MHz
# 0x027 - ?
# 0x028 - 20.8 MHz
# 0x029 - 10.4 MHz
# 0x02A - 5.2 MHz
# 0x02B - 6.9 MHz
# 0x02C - 22.9 MHz
# 0x02D - 11.4 MHz
# 0x02E - 5.7 MHz
# 0x02F - 7.6 MHz
# 0x116 - 36 MHz
# 0x319 - 180 MHz
# 0x219 - 140 MHz
Further analysis reveals that the last 2-bit field determines the divider:
- 00 – x1
- 01 – x2
- 10 – x4
- 11 – x3
The following 8-bit field is a multiplier. The final equation is quite simple:
Clock = M / D * 100/48 MHz, where
- M = multiplier
- D = divider
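The same equation as a small Python sketch, so you can check it against the table. It assumes the argument is the field value exactly as listed in the table (divider code in the two lowest bits, multiplier in the next eight); the names are mine.

```python
# 2-bit divider code -> actual divider
DIVIDERS = {0b00: 1, 0b01: 2, 0b10: 4, 0b11: 3}


def corona_clock_mhz(field: int) -> float:
    """Clock = M / D * 100/48 MHz for a 0xCF field value."""
    d = DIVIDERS[field & 0x3]
    m = (field >> 2) & 0xFF
    return m / d * 100 / 48


# A few rows from the table: (field value, measured MHz)
for field, measured in [(0x0C0, 100), (0x040, 33), (0x021, 8.3),
                        (0x026, 4.68), (0x02F, 7.6), (0x116, 36)]:
    print(f"0x{field:03X}: {corona_clock_mhz(field):7.2f} MHz "
          f"(measured {measured})")
```

The formula matches the low and middle part of the table nicely. At the top end (for example, 0x319 is listed as 180 MHz) the measured clock is lower than the formula predicts, so don’t rely on it there.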
In contrast to the Fat revisions (where I could set the clock down to hundreds of kHz), the lowest clock I was able to set here was only 3.6 MHz. Unfortunately, the clock applies to all system parts: CPU, GPU, and even the SMC are affected by the slowdown (easily noticeable by the different UART frequency). Since the SMC is slowed down along with the CPU, this slowdown method can’t be used for the glitch (the glitch countdown code inside the SMC would also be slowed down).
A fun fact: in a slowed mode, the system fan starts to make a sound, the tone of which might be controlled by the slowdown factor. A good idea for a Device orchestra!
Fortunately, there is a bypass mode for the onboard 25 MHz crystal, like the one on Fat revisions. Experiments have shown that the fourth bit of the 0xDB register switches the bypass mode for the CPU:
- 0x01F001F8 – bypass the onboard 25 MHz
- 0x01F001F0 – the default PLL mode
Here we go again: a reliable and easily switchable clock directly from the onboard oscillator.
The same bypass is also available for the GPU reference clock (just set the register 0xDB to 0x01F801F0) as well as for the video pixel clock (register 0xDC value 0x000001F8).
Fat appears to be slow
As mentioned earlier, the CPU_PLL_BYPASS assertion on Fats gives a slowdown ratio of 128, while Slim revisions have a five times higher slowdown ratio of 640. This means that the glitch on Fats may still not be reliable enough because the reset pulse is way too long. Since the HANA PLL for the CPU reference clock is unreliable, the only way I could find was… southbridge overclocking!
Its original clock is 48 MHz, but it is also generated by the same clock generator inside the HANA chip. Brute-force analysis of the registers revealed that the register 0xD4 could be easily used to adjust the SMC clock:
- 0x990e00e – default value (48 MHz)
- 0x990e001e – overclock x2 (96 MHz) – quite stable
- 0x990e026 – overclock x2.5 (120 MHz) – max value my Jasper could handle
You can try to find the full description of the fields (with multipliers, dividers, etc.). Consider it a home assignment. 🙂
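To get a feeling for why the overclock matters, here is a rough back-of-the-envelope sketch. It rests on several assumptions of mine: the nominal CPU clock is 3.2 GHz, the CPU_PLL_BYPASS factor applies directly to that clock, and the reset pulse lasts exactly 4 SMC clock cycles. It ignores any additional reference clock slowdown, so the absolute numbers are only indicative.

```python
CPU_NOMINAL_HZ = 3.2e9  # assumption: nominal Xbox 360 CPU clock


def pulse_in_cpu_cycles(smc_hz: float, slowdown: float,
                        pulse_smc_cycles: int = 4) -> float:
    """How many (slowed-down) CPU cycles one reset pulse spans."""
    pulse_seconds = pulse_smc_cycles / smc_hz
    slowed_cpu_hz = CPU_NOMINAL_HZ / slowdown
    return pulse_seconds * slowed_cpu_hz


for name, smc_hz, factor in [("Fat, SMC 48 MHz", 48e6, 128),
                             ("Fat, SMC 96 MHz", 96e6, 128),
                             ("Fat, SMC 120 MHz", 120e6, 128),
                             ("Slim, SMC 48 MHz", 48e6, 640)]:
    print(f"{name}: {pulse_in_cpu_cycles(smc_hz, factor):.2f} CPU cycles")
```

Under these assumptions, a stock Fat SMC produces a pulse roughly two slowed CPU cycles long, while the x2 overclock brings it down to about one. The Slim slowdown already keeps the pulse well under a single cycle.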
To glitch, or not to glitch?
Okay, the slowdown is fine, now it’s time to find the best boot stage for glitching. The boot process of the Xbox 360 (the latest system version) can be divided into the following steps:
- 1BL: reading, decryption, and RSA-2048 signature verification of the CB_A
- CB_A: reading, decryption, and HMAC-SHA1 check of the CB_B
- CB_B: DRAM init, reading, decryption, and HMAC-SHA1 check of the CD
- CD: main hardware init (GPU, video, and so on), further system startup
As you can see, there is a chain of trust with multiple bootloaders decrypting each other. It’s hard to attack the later stages, because you need all the keys used in the previous steps. So the earliest possible attack point is preferred. The RGH2 method targets the CB_B SHA-1 hash comparison inside the CB_A loader, which happens right after the 0xDA POST code:

Surprisingly, the research showed that the real place affected by the glitch pulse was not the compare itself! According to the experiments, the cmpwi and beq opcodes can’t be altered by the reset glitch hack, but the various arithmetic commands like addi (mr) can:

After the successful glitch, the real source register value is set to zero, resulting in a successful comparison in this particular case. This knowledge hints at other good glitch positions around POST code 0xD5:

With the help of the glitch pulse, it’s possible here to spoil the reading size argument, which results in an overwriting of the currently executing code with controllable NAND data (as well as an immediate interception of the code execution). Theoretically, it’s possible to get rid of the POST_OUT as a reference signal by using the CE pin of the NAND. I tried this in the very first RGH3 implementation, but abandoned it due to a low success rate.
Fun fact: balika011 was really interested in this way of POSTless glitching the Xbox 360 and made his own proof of concept based on the shared details (unfortunately, without any reference to the original source).
Can you hurry up? My dog is waiting!
The final algorithm of breaching the SMC by glitching the hash check within the CB_A is:
- Wait for POST 0xD6
- Trigger the HANA slowdown
- Wait for POST 0xD8
- Wait for X ms (before the slowdown)
- Trigger the CPU_PLL_BYPASS slowdown
- Wait for POST 0xDA
- Wait for Y ms (the glitch timing)
- Send the reset glitch pulse
- Turn off the CPU_PLL_BYPASS slowdown
- Turn off the HANA slowdown
- Check if the glitch was successful
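The steps above can be sketched as plain sequential code. This is not the real SMC firmware (that one is 8051 assembly); it is only a Python model of the order of operations. The `RecordingHardware` class and all of its method names are hypothetical stand-ins that merely log what would be done, and X and Y stay symbolic because the real values are board-specific.

```python
class RecordingHardware:
    """Hypothetical stand-in for the SMC I/O: it only records requests."""

    def __init__(self):
        self.log = []

    def wait_post(self, code):
        self.log.append(f"wait POST 0x{code:02X}")

    def delay(self, label):
        self.log.append(f"delay {label}")

    def hana_slowdown(self, enable):
        self.log.append(f"HANA slowdown {'on' if enable else 'off'}")

    def pll_bypass(self, enable):
        self.log.append(f"CPU_PLL_BYPASS {'on' if enable else 'off'}")

    def pulse_reset(self):
        self.log.append("reset pulse")

    def glitch_succeeded(self):
        self.log.append("check result")
        return True  # the model always "succeeds"


def glitch_attempt(hw) -> bool:
    """One glitch attempt, in the order described in the list above."""
    hw.wait_post(0xD6)
    hw.hana_slowdown(True)
    hw.wait_post(0xD8)
    hw.delay("X")            # pause before the strong slowdown
    hw.pll_bypass(True)
    hw.wait_post(0xDA)
    hw.delay("Y")            # the glitch timing
    hw.pulse_reset()
    hw.pll_bypass(False)
    hw.hana_slowdown(False)
    return hw.glitch_succeeded()


hw = RecordingHardware()
ok = glitch_attempt(hw)
print("\n".join(hw.log))
```

In the real implementation, a failed check would simply restart the console and repeat the attempt.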
Why not follow the D8, D9, DA codes? Originally, the first POST bit was used due to the lack of the intermediate POST code in the RGH1 CB boot sequence (there was code 0x39 right after the 0x37). Additionally, in the RGH2 method, the use of the first POST bit saved some amount of CPLD resources (fewer ticks to count). Finally, in the Waitsburg and Stingray revisions (also known as Corona v3-v6), the POST_OUT debug pads are missing, making it impossible to access the 0th bit (the first bit happens to be in the last BGA row and is still accessible using various POSTfix spring-loaded adapters):
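To see why a single wire is enough, look at what bit 1 does while the POST codes advance. The sketch below assumes, purely for illustration, that the codes 0xD5..0xDB are emitted in ascending order; the helper name is mine.

```python
def bit_transitions(codes, bit=1):
    """Return (code, new_level) for every code where the bit flips."""
    changes = []
    prev = (codes[0] >> bit) & 1
    for code in codes[1:]:
        cur = (code >> bit) & 1
        if cur != prev:
            changes.append((code, cur))
        prev = cur
    return changes


# Illustrative sequence, assumed to be emitted in this order.
POST_SEQUENCE = [0xD5, 0xD6, 0xD7, 0xD8, 0xD9, 0xDA, 0xDB]

for code, level in bit_transitions(POST_SEQUENCE):
    print(f"POST 0x{code:02X}: bit 1 goes to {level}")
```

Bit 1 flips exactly at 0xD6, 0xD8, and 0xDA, which are the three codes the algorithm waits for, so counting edges on that one line is all the SMC has to do.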

The problem is that the SMC is not well suited for extra-precise countdowns, as opposed to FPGAs and CPLDs. The counting itself can be interrupted by hardware interrupts (who would have thought?), while a proper setup of the internal timers is not possible due to the lack of documentation. The easiest way left is to disable interrupts and decrement the counter in a busy loop.
This is where the problem starts: staying in a busy loop for a long time can trigger the watchdog, resulting in a restart of the SMC firmware. It’s possible to periodically kick the watchdog to avoid it, but this decreases the precision. Also, long waits increase the total time of each glitch attempt and affect the user experience. The less we wait, the better the result. So what are we waiting for? For the calculation of the CB_B hash sum! The smaller the CB_B size, the less the SMC must wait for it to be processed. All that’s left is to reduce the CB_B size as much as possible.
A valid CB_B must:
- Be no larger than 0xC000 in size
- Have an entry point at offset 0x3D0 or higher (but within its size range)
- Have a size that is a multiple of 4

Considering that SHA-1 operates on 0x40-byte blocks, the smallest possible CB_B must be 0x400 bytes in size. Although the entry point can’t be lower than 0x3D0, there are no such restrictions for the code as a whole.
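Here is the same reasoning as a tiny Python sketch. The function names are mine, and the "+ 4" is my assumption that at least one 4-byte PowerPC instruction has to fit at the entry point.

```python
MAX_SIZE = 0xC000     # CB_B must not be larger than this
MIN_ENTRY = 0x3D0     # lowest allowed entry point
SHA1_BLOCK = 0x40     # SHA-1 block size in bytes


def is_valid_cbb(size: int, entry: int) -> bool:
    """Check the three CB_B constraints listed above."""
    return (size <= MAX_SIZE
            and size % 4 == 0
            and MIN_ENTRY <= entry < size)


def smallest_cbb_size(entry: int = MIN_ENTRY) -> int:
    """Smallest size covering the entry point, rounded up to a SHA-1 block."""
    needed = entry + 4  # assumption: room for one instruction at the entry
    blocks = -(-needed // SHA1_BLOCK)  # ceiling division
    return blocks * SHA1_BLOCK


print(hex(smallest_cbb_size()))
```

With the entry point at 0x3D0, the result is 0x400: just 16 SHA-1 blocks to hash instead of the hundreds needed for a full-size CB_B.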
Here we have a modified boot chain:
- 1BL loads and decrypts the CB_A
- CB_A loads and decrypts the CB_X (a new intermediate loader instead of the CB_B)
- CB_X loads the plaintext CB_B
- CB_B loads the plaintext CD
So the new loader CB_X is only there to reduce the processing time; its only task is to read the real CB_B and execute it. Now it’s time to flex our software development muscles!
Making a custom bootloader
Before creating a working bootloader, it’s necessary to find the ABI, a set of rules about how parameters are passed between different BLs.
It’s pretty easy since we have the reverse-engineered C code of the 2BL:
- The loader copies itself to the SRAM offset 0xC000
- It uses data from register r31 that contains the NAND offset for the next BL to be loaded
- The loader reads the BL from NAND to the SRAM offset 0x0
- Then the hash calculations and checks are performed
- SRAM and most of the registers are cleared before executing the loaded BL; r31 is adjusted to pass the next BL offset
Pretty similar to the relay race, huh? Each BL checks the next one and passes the pointer to its next neighbor within r31. Since we can completely skip both the encryption and the checks, the few things we need to do are:
- Self-copy to SRAM offset 0xC000
- Load the CB_B from NAND
- Adjust the r31
- Execute the CB_B code
The first part is really simple if you know the required addresses:
sub_3D0:                      # lowest possible entry point
    li r3, 0x200
    oris r3, r3, 0x8000
    sldi r3, r3, 32
    oris r4, r3, 1            # 0x8000020000010000, SRAM address
    addi r5, r4, -4           # will be used in the second part
    ori r6, r4, 0xC000        # 0x800002000001C000, where to copy
    li r2, 0x7F               # 0x80 * 8 bytes
    mtctr r2
copy_second_stage:            # memcpy cycle
    ldu r2, 8(r4)
    stdu r2, 8(r6)
    bdnz copy_second_stage
    b 0xC350                  # jump to the second part
In the second part, it would be good manners to tell the SMC that the CB_X has successfully booted using the POST bus. A small pause is necessary to allow the SMC to catch this code before CB_B changes it. The previous POST code 0xDB has a “1” at position 1, so the new POST code must have a “0” there for the SMC to notice the change. The value 0x54 meets these requirements:
    oris r6, r3, 6            # 0x8000020000060000, I/O
    li r2, 0x54               # bit #1 must be '0' here
    sldi r2, r2, 56
    std r2, 0x1010(r6)        # POST 0x54 to tell SMC we're done
    lis r2, 1                 # small delay for the POST
    mtctr r2
sleep_cycle:
    nop
    bdnz sleep_cycle
Finally, copy and run the CB_B:
    oris r6, r3, 0xC800       # 0x8000020000C80000, mapped NAND
    addi r6, r6, -4
    clrldi r2, r31, 32
    add r6, r6, r2            # + provided r31 offset
    lwz r4, 0x10(r6)          # CB_B size
    lwz r3, 0xC(r6)           # CB_B entry point
    add r31, r31, r4          # update r31 offset
    srdi r4, r4, 2
    mtctr r4
copy_cbb:                     # copy CB_B to the SRAM
    lwzu r2, 4(r6)
    stwu r2, 4(r5)
    bdnz copy_cbb
    clrlwi r3, r3, 16         # prepare jump address
    addis r3, r3, 0x200
    mtlr r3
    blr                       # jump to the CB_B
The resulting code size is only 144 bytes (the available space was around 1,000). Not bad!
Building the NAND image
The final NAND image has its own format, so it’s vital to properly place all the puzzle pieces (SMC firmware, CB_A, CB_X, patched CB_B, open-source CD, and XeLL) together into the final binary. I used the existing open-source Xbox 360 image creation software (GliGli’s build.py, Swizzy’s Xebuild GUI, JRunner, and some others) as a reference.
The very first thing loaded by the console is the header. It’s placed at the very top of the image. The most important parts of it (SMC, CB, and CF addresses) are highlighted on the picture:

It’s easy to place the SMC firmware correctly: just use the encryption algorithm mentioned above, place the result somewhere in the image (it’s usually just before the key vault), then add its offset & size to the header. The CB_A is encrypted using the RC4 algorithm with a secret static 1BL key and random 16 bytes from its header. CB_B uses the same RC4 encryption, but with a different key (either the CPU key in retail images, or the zero key in MFG images). The rest is plaintext just to make things easier. Here we go:
import struct, secrets, hmac, hashlib
import Crypto.Cipher.ARC4 as RC4

key_1BL = b"\xDD\x88\xAD\x0C\x9E\xD6\x69\xE7\xB5\x67\x94\xFB\x68\x56\x3E\xFA"

def encrypt_smc(data):
    key = [0x42, 0x75, 0x4e, 0x79]
    res = bytearray()
    for i in range(len(data)):
        j = data[i] ^ (key[i&3] & 0xFF)
        mod = j * 0xFB
        res += struct.pack("B", j)
        key[(i+1)&3] += mod
        key[(i+2)&3] += mod >> 8
    return bytes(res)

def encrypt_cba(cba):
    # RC4 key is derived from the static 1BL key and 16 random header bytes
    rnd = secrets.token_bytes(16)
    key = hmac.new(key_1BL, rnd, hashlib.sha1).digest()[0:0x10]
    return (key, cba[0:0x10] + rnd + RC4.new(key).encrypt(cba[0x20:]))

def encrypt_cbb(cbb, cba_key, cpu_key=b"\x00"*16):
    # same scheme, but keyed with the CB_A key (plus the CPU key in retail images)
    rnd = secrets.token_bytes(16)
    key = hmac.new(cba_key, rnd + cpu_key, hashlib.sha1).digest()[0:0x10]
    return cbb[0:0x10] + rnd + RC4.new(key).encrypt(cbb[0x20:])

def insert(image, data, offset=None):
    # append by default; pad with 0xFF when writing past the current end
    if offset is None:
        offset = len(image)
    if offset > len(image):
        image += b"\xFF" * (offset - len(image))
    return image[:offset] + data + image[offset + len(data):]

# SMC
smc = open("smc.bin", "rb").read()
smc_ptr = 0x800

# BLs
cba_ptr = 0x8000
cba = open("cba.bin", "rb").read()
cbx = open("cbx.bin", "rb").read()
cbb = open("cbb.bin", "rb").read()
cd = open("cd.bin", "rb").read()

# make header
image = struct.pack(">HHLLL64s40xLL", 0xFF4F, 1888, 0, cba_ptr, 0, b"RGH3", len(smc), smc_ptr)

# add SMC
image = insert(image, encrypt_smc(smc), smc_ptr)

# add BLs
(key, cba_enc) = encrypt_cba(cba)
image = insert(image, cba_enc, cba_ptr)
image = insert(image, encrypt_cbb(cbx, key))  # MFG cba, so zero cpu_key
image = insert(image, cbb)  # decrypted due to load via cbx
image = insert(image, cd)   # decrypted due to patched cbb

# add XeLL
xell = open("xell.bin", "rb").read()
image = insert(image, xell, 0xC0000)
image = insert(image, xell, 0x100000)

# save image
open("image.bin", "wb").write(image)
The final result is ready to be used for loading the homebrew / Linux on the Xbox 360 (well, if you happen to successfully glitch the required check). The only thing left is to write it using some kind of NAND flasher … Wait, I forgot something. Sure, we still don’t have any glitching code inside the SMC. Let’s go back there!
SMC signals
According to the plan, we need to use two GPIOs: one for the POST bus line (the trigger for the glitch countdown) and another one for the CPU_PLL_BYPASS to control the slowdown. Both POST_OUT and CPU_PLL_BYPASS work at the low voltage levels (1.8V for Slim revisions, 1.2V for Phat) so it would be nice to use the SMC port 0 GPIOs (the only port with 1.8V signal levels). Luckily, Corona boards have two unused GPIOs there:

While GPIO5 is completely unused, GPIO4 is used to set the JTM configuration (can be easily forced in the SMC firmware). Let’s reserve the DB3R3 pad (GPIO4) for the POST_OUT and the DB3R4 pad (GPIO5) for the CPU_PLL_BYPASS:

Sadly, other revisions don’t have unused port 0 GPIOs:

Even if we use the 3.3V port pin DBG_LED0 for the PLL_BYPASS (this will overcurrent the SMC port, but according to my tests, it works fine), the POST_OUT line must be wired to a low-voltage input, otherwise it’s impossible to detect the trigger (of course, you could use a level shifter here, but that would be way too complicated for regular users).
The only input low-voltage GPIO is GPU_RST_DONE, which is used to detect the GPU startup state. Well, okay, this can also be forced in the code to free up the GPIO for the POST line. The resulting diagram shows the POST_OUT line going to the GPIO7 input and the CPU_PLL_BYPASS going to the DBG_LED pin through a resistor to prevent overcurrent:

For Fat revisions, where the signal levels still didn’t add up, I found a solution in the form of another 500-Ohm resistor. Paired with the GPU signal and its own 1K resistor, this shifts the level enough for reliable trigger detection. (The other solution was to use a diode, but a resistor looks simpler.)
Fun fact: although I tried to avoid complicated solutions like level shifters as much as possible, I found some really weird pictures of voltage regulators used for this purpose.
SMC firmware patching
Several additional patches are required to make the firmware work as intended. For example, the custom code needs to store its variables somewhere in the RAM (which is used entirely by the firmware).
In the Intel 8051 architecture, there are several types of memory and ways to access them:
- iRAM: regular internal RAM, 256 bytes
  - mov A, XXh, direct access (only for the first 128 bytes)
  - mov A, @R0, indirect access
- CODE: read-only code memory, 65536 bytes
  - movc A, @A+DPTR, indirect read
- xRAM: external memory (might be anything), 65536 bytes
  - movx A, @R0, indirect access (8-bit addressing)
  - movx A, @DPTR, indirect access (16-bit addressing)
For the glitch code, the regular internal RAM is fine. To reserve a few cells, let’s adjust the initial stack pointer in two places:
smc = patch(smc, 0x7E5, b'\xC2') # own bytes BF..C2 at init
smc = patch(smc, 0x804, b'\xC2') # own bytes BF..C2 at main
A tiny bit of magic: for the custom code to be called frequently without disturbing the other firmware functions, it’s possible to intercept each FSM call in the main cycles:
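for pos in range(0x805, 0x84d, 3):
    if pos == 0x829:
        # remove the dbg LED FSM
        smc = patch(smc, pos, b"\x00" * 3)
        continue
    # assumption: every slot is a 3-byte lcall, so bytes 1..2 hold the original target
    orig_addr = struct.unpack(">H", smc[pos + 1:pos + 3])[0]
    my_call = lcall(CODE_START) + ljmp(orig_addr)
    rom_end -= len(my_call)
    smc = patch(smc, rom_end, my_call)
    smc = patch(smc, pos, lcall(rom_end))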
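To make the interception trick easier to follow, here is a self-contained sketch that runs it on a toy ROM. The `patch`, `lcall` and `ljmp` helpers are my assumption of what such helpers look like (the opcodes 0x12 and 0x02 are the standard 8051 LCALL and LJMP encodings), and `hook_calls` is a hypothetical wrapper around the loop above. Each original call is replaced with a call to a small trampoline that first runs the hook and then jumps to the original FSM, whose own `ret` returns to the main cycle.

```python
import struct

LCALL, LJMP = 0x12, 0x02  # standard 8051 opcodes, each followed by a 16-bit address


def patch(image: bytes, offset: int, data: bytes) -> bytes:
    """Overwrite len(data) bytes of image at offset (assumed helper)."""
    return image[:offset] + data + image[offset + len(data):]


def lcall(addr: int) -> bytes:
    return struct.pack(">BH", LCALL, addr)


def ljmp(addr: int) -> bytes:
    return struct.pack(">BH", LJMP, addr)


def hook_calls(rom: bytes, first: int, last: int, hook_addr: int, rom_end: int):
    """Route every 3-byte lcall slot in [first, last) through a trampoline."""
    for pos in range(first, last, 3):
        opcode, orig_addr = struct.unpack(">BH", rom[pos:pos + 3])
        if opcode != LCALL:
            raise ValueError(f"no lcall at {pos:#x}")
        trampoline = lcall(hook_addr) + ljmp(orig_addr)
        rom_end -= len(trampoline)            # grow downwards from the end of the ROM
        rom = patch(rom, rom_end, trampoline)
        rom = patch(rom, pos, lcall(rom_end))
    return rom, rom_end


# Toy 256-byte "ROM" with two FSM calls at 0x10 and 0x13
toy = patch(bytes(0x100), 0x10, lcall(0x0040) + lcall(0x0050))
hooked, free_end = hook_calls(toy, 0x10, 0x16, hook_addr=0x0080, rom_end=0x100)
```

Each hooked call costs 6 bytes of free ROM, which is why `rom_end` has to be tracked across all patches.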
The other important thing is the initialization of the reserved variables, which could be done by intercepting the function call right before the main cycle. The init code can be used instead of the unused debug LED FSM:
smc = patch(smc, INIT_START, init_bin + ljmp(0x293C))
smc = patch(smc, 0x7FD, lcall(INIT_START))
The only thing left here is the GPIO configuration (force the JTM config pin and change the direction of the unused pin):
smc = patch(smc, 0x256B, b'\x90') # P0.5 OUT dir
smc = patch(smc, 0x2539, b'\xC2') # force EXT_JTM to 0
The patching is done, let’s code!
The I2C stuff
For a successful slowdown, SMC must access the I2C interface and ensure that the command was sent. In the CR4 XL code, Team Xecuter uses existing firmware methods to recreate the full communication sequence:
- Fill the command parameters (what and where to send)
- Wait for the existing transactions to complete
- Reset the I2C pins configuration
- Check that the I2C interface is not busy
- Finally, start the transaction

Looks simple, but this is actually a workaround. How should it really work? Let’s find out and create a proper workaround of our own.
The SMC I2C subsystem is divided into 3 parts:
- The FSM, which handles requests from the other firmware subsystems
- The virtual machine, which creates and executes requests
- The interrupt handler that bit-bangs the I2C pins
The first part is obvious: when the firmware needs an I2C communication, one of the ten “I miss you, let’s go to the ICQ I2C chat” flag variables is set. Once the I2C FSM detects such a flag, it sets the starting point of the VM bytecode, kicks the VM, and then just waits until everything is done:



Here is an example of the mentioned bytecode (or rather its part that will be executed in this branch):

The parsed representation is:
- 0x00 – I2C bus init
- 0x0E – write the HANA register (0xD9 = [00 03 80 00])
- 0x0E – write the HANA register (0xDC = [00 00 01 E2])
- 0x0E – write the HANA register (0xDB = [01 E2 01 E2])
- 0x0E – write the HANA register (0xCD = [00 00 00 67])
- 0x0B – zero the HANA register (0xDF = [00 00 00 00])
- 0x03 – I2C bus deinit
The virtual machine fills the transaction arguments from its own bytecode (the data along with the addresses) into the predefined RAM variables, sets the RAM_2Dh.1 flag, and then starts the hardware timer. This timer generates hardware interrupts (at a rate of 100 kHz), and the handler processes the prepared variables to output the data on the I2C lines. At the end of the transmission, RAM_2Dh.1 is cleared so that the VM can proceed to the next step. This is repeated until the end command code of 0x03 is executed by the VM.
As you can see, the whole thing implemented by Team Xecuter is already present in the firmware. Moreover, it conflicts with their implementation if the slowdown is triggered in the middle of the VM execution. What if we use this existing code instead? We have to add our own bytecode! The starting offset is 8 bits (00h … FFh), the original bytecode has a size of E2h, so there’s plenty of space for custom commands. For both speedup and slowdown, we need 1 (Init) + 6 (Write) + 1 (Exit) = 8 bytes, so 16 bytes in total. However, there is some code right where we need to add the bytecode, but it’s easy to move it (plus patch the methods where it used to be called):
# i2c re-arrange
smc = patch(smc, rom_end - 0x10, smc[0x2e49:0x2e59])
rom_end -= 0x10
smc = patch(smc, 0x2E9A, lcall(rom_end))
smc = patch(smc, 0x2EA0, lcall(rom_end + 0xA))
Now there’s enough space for the new bytecode:
# IN W DB=[01 F0 01 Fx] EX
fast_data = b"\x00\x0E\xDB\x01\xF0\x01\xF0\x03"
slow_data = b"\x00\x0E\xDB\x01\xF0\x01\xF8\x03"
smc = patch(smc, 0x2e49, fast_data + slow_data)
Yeah, now it’s possible to set the starting slowdown/speedup bytecode offset and kick the VM (of course, make sure beforehand that the I2C FSM is not busy):
    ...
    mov R0, #SMC_I2C_STATE
    cjne @R0, #0, return        ; check smc i2c FSM, must be idle
    mov 79h, #0E3h              ; fast sequence
    jc skip_slow
    mov 79h, #0EBh              ; slow sequence
skip_slow:
    lcall startup_i2c           ; initiate the transfer
    jb 0D0h.5, return           ; failed to execute the i2c command
    mov R0, #SMC_I2C_STATE
    mov @R0, 4                  ; move i2c FSM to the waiting state
    ...
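As a sanity check of the custom Slim bytecode, here is a minimal decoder sketch. It only knows the three opcodes used by the new sequences and takes their lengths from the byte count given above (1 for init, 6 for a register write, 1 for exit); the `decode` function and the constant names are hypothetical, and any other opcode is rejected rather than guessed.

```python
FAST_DATA = bytes.fromhex("000EDB01F001F003")
SLOW_DATA = bytes.fromhex("000EDB01F001F803")
FAST_OFFSET = 0xE3  # value loaded into 79h for the fast sequence


def decode(bytecode: bytes):
    """Decode init / write / exit commands; raise on anything else."""
    ops, pos = [], 0
    while pos < len(bytecode):
        op = bytecode[pos]
        if op == 0x00:                        # I2C bus init
            ops.append(("init",))
            pos += 1
        elif op == 0x0E:                      # write a HANA register: reg + 4 data bytes
            reg = bytecode[pos + 1]
            ops.append(("write", reg, bytecode[pos + 2:pos + 6].hex()))
            pos += 6
        elif op == 0x03:                      # I2C bus deinit, end of the sequence
            ops.append(("exit",))
            return ops
        else:
            raise ValueError(f"unknown opcode {op:#04x} at {pos}")
    raise ValueError("sequence has no exit command")


fast_ops = decode(FAST_DATA)
slow_ops = decode(SLOW_DATA)
slow_offset = FAST_OFFSET + len(FAST_DATA)    # the slow sequence directly follows the fast one
```

The computed `slow_offset` matches the 0EBh constant used in the assembly, and the two sequences differ only in the last data byte written to register 0xDB.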
For Fat revisions, the bytecode constants are a bit different. Also, we need a little more space for the SMC overclocking bytecode. Anyway, it’s pretty similar to the Slim code:
# i2c re-arrange
CUT_START = 0x2a38
CUT_END = 0x2a62
CUT_LEN = CUT_END - CUT_START
smc = patch(smc, rom_end - CUT_LEN, smc[CUT_START:CUT_END])
rom_end -= CUT_LEN
# delay
for off in [0x2681, 0x2687, 0x26C9, 0x26CE, 0x2A67, 0x2A6C]:
    smc = patch(smc, off, lcall(rom_end))
# line status
smc = patch(smc, 0x26BB, lcall(rom_end + 0x10))
smc = patch(smc, 0x2A6F, lcall(rom_end + 0x10))
# hw trigger
smc = patch(smc, 0x26D1, lcall(rom_end + 0x14))
# IN W CE=[x8 E8 40 14] W D4=[09 90 e0 xx] EX
slow_data = b"\x00\x0B\xCE\x28\xE8\x40\x14\x0B\xD4\x09\x90\xE0\x1e\x03"
fast_data = b"\x00\x0B\xCE\x08\xE8\x40\x14\x0B\xD4\x09\x90\xE0\x0E\x03"
smc = patch(smc, CUT_START, slow_data + fast_data)
Time tracking
The glitching algorithm is completely tied to time measurement. While the high-precision countdown is done with a simple (cnt -= 1) loop, longer periods are better measured in milliseconds. One such example is waiting for the I2C slowdown to stabilize (about 50 ms).
Of course, the SMC has such a feature! Remember the two main cycles, one of which was periodic? Yes, there’s a check for a specific amount of time (20 ms) to pass between cycles. It uses the pseudo RTC in the front module and its hardware interrupts (why pseudo? Because it doesn’t have a backup power supply, so it resets every time you unplug the console):


As you can see, the SMC counts 20 milliseconds and then goes to the periodic FSMs for another cycle. I wrote a special function that analyzes this counter variable and tells me whether the milliseconds have passed. This is done to avoid modifications of the interrupt handler:
msec_passed:
    mov R0, #MSEC_REG
    mov A, @R0                  ; previous saved ms value
    orl INT_CNTRL, #01h         ; disable interrupts
    cjne A, SMC_MSECS, msec_differs
    sjmp msec_same
msec_differs:
    mov @R0, SMC_MSECS
    cjne A, #015h, msec_setc
    cjne @R0, #001h, msec_setc  ; do not track the 15h -> 01h
    sjmp msec_same
msec_setc:
    setb C
msec_same:
    anl INT_CNTRL, #0FEh        ; enable interrupts
    ret
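The logic of `msec_passed` is easier to see in a high-level model. The sketch below mirrors the assembly one to one: it reports a tick whenever the firmware counter differs from the saved copy, except for the 15h -> 01h transition, which is stored but not reported. The `MsecTracker` class and its names are hypothetical, used only for illustration.

```python
class MsecTracker:
    """Model of msec_passed: edge detector on the firmware millisecond counter."""

    def __init__(self, saved: int = 0):
        self.saved = saved                    # plays the role of MSEC_REG

    def passed(self, smc_msecs: int) -> bool:
        previous = self.saved
        if previous == smc_msecs:
            return False                      # counter unchanged, carry stays clear
        self.saved = smc_msecs                # remember the new value
        if previous == 0x15 and smc_msecs == 0x01:
            return False                      # the 15h -> 01h step is not tracked
        return True                           # carry set: a millisecond has passed


tracker = MsecTracker(saved=0x13)
observed = [0x13, 0x14, 0x14, 0x15, 0x01, 0x02]   # counter values seen on consecutive calls
ticks = [tracker.passed(value) for value in observed]
```

A millisecond delay then boils down to decrementing a delay variable every time `passed` returns True, without touching the interrupt handler.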
Looks like most of the prep work is done, so let’s get to the main code.
The RGH3 code
I’ve split the code into two big parts for simplicity. The smaller part does the slowdown arbitration, the larger one does the POST bus tracking and the CPU glitching. Oh, there is also the tiny hardware / variable initialization where the UART speed is configured:
    org 014C0h
start:
    mov R0, #RAM_START
    mov @R0, #000h              ; clear the i2c state
    mov 0E9h, #0FFh             ; init UART speed
    end
A very simple top-level design for the final code:
start:
    lcall i2c_fsm               ; i2c slowdown related stuff
    lcall rgh_fsm               ; glitch & timing related stuff
return:
    ret
The I2C part checks the request bit in the special variable “I2C_ST” and performs either the speedup or the slowdown, depending on the request and the current state:
i2c_fsm:
    mov R0, #I2C_ST_REG
    mov A, @R0
    jnb I2_BIT_WAI, try_send_i2c
    ; waiting for completion processing
    jb SMC_I2C_S, return        ; not completed
    mov C, I2_BIT_SAV           ; move saved request
    mov I2_BIT_NOW, C           ; to the real state
    clr I2_BIT_WAI              ; clear waiting for completion
    mov @R0, A                  ; update rgh i2c state
    ret
try_send_i2c:
    mov C, I2_BIT_NOW           ; compare request bit the real state
    jc check_if_1
    jnb I2_BIT_REQ, return      ; we are fast already, nothing to do
    sjmp check_state
check_if_1:
    jb I2_BIT_REQ, return       ; we are slow already, nothing to do
check_state:
    mov R0, #SMC_I2C_STATE
    cjne @R0, #0, return        ; check smc i2c FSM, must be idle
    mov 79h, #0E3h              ; VM_POS = fast sequence
    jnb I2_BIT_REQ, skip_slow
    mov 79h, #0EBh              ; VM_POS = slow sequence
skip_slow:
    lcall startup_i2c           ; initiate the transfer
    jb 0D0h.5, return           ; failed to execute the i2c command
i2c_success:
    mov R0, #SMC_I2C_STATE      ; set the i2c FSM into waiting state
    mov @R0, 4                  ; use the state 4 as the least problematic one
    mov R0, #I2C_ST_REG
    mov A, @R0
    setb I2_BIT_WAI             ; set 'waiting for completion'
    mov C, I2_BIT_REQ
    mov I2_BIT_SAV, C           ; save the requested speed
    mov @R0, A
    ret
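The arbitration above can be summarized with a small behavioral model. This is only a sketch: the `SpeedArbiter` fields map to the I2_BIT_* flags, while `FakeBus` is a made-up stand-in for the firmware I2C subsystem (idle check, transfer start, busy flag), so its timing is an assumption and not how the real SMC behaves.

```python
from dataclasses import dataclass

FAST_SEQ, SLOW_SEQ = 0xE3, 0xEB               # VM bytecode offsets from the assembly


@dataclass
class SpeedArbiter:
    requested: bool = False                   # I2_BIT_REQ: True means "slow down"
    current: bool = False                     # I2_BIT_NOW: the confirmed state
    saved: bool = False                       # I2_BIT_SAV: request being sent
    waiting: bool = False                     # I2_BIT_WAI: transfer in flight

    def step(self, bus) -> None:
        if self.waiting:
            if bus.transfer_active():         # not completed yet
                return
            self.current = self.saved         # the transfer is done, commit the state
            self.waiting = False
            return
        if self.current == self.requested:
            return                            # already at the requested speed
        if not bus.fsm_idle():
            return                            # firmware I2C FSM is busy, retry later
        if not bus.start(SLOW_SEQ if self.requested else FAST_SEQ):
            return                            # failed to start, retry later
        self.waiting = True
        self.saved = self.requested


class FakeBus:
    """Hypothetical bus: every transfer stays active for two polls."""

    def __init__(self):
        self.busy_polls = 0
        self.started = []

    def fsm_idle(self) -> bool:
        return self.busy_polls == 0

    def transfer_active(self) -> bool:
        if self.busy_polls:
            self.busy_polls -= 1
            return True
        return False

    def start(self, sequence: int) -> bool:
        self.started.append(sequence)
        self.busy_polls = 2
        return True


bus, arbiter = FakeBus(), SpeedArbiter(requested=True)
for _ in range(4):                            # start, two busy polls, commit
    arbiter.step(bus)
```

The point of the saved bit is that the request may change while a transfer is in flight: the arbiter commits what was actually sent and then notices the mismatch on the next pass.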
STATE_IDLE
This is the initial state that is forced when the CPU is powered down. Here, all kinds of slowdowns are disabled, and the POST 0x1C wait delay is configured:
rgh_fsm:
    mov R0, #RGH_ST_REG
    mov R1, #I2C_ST_REG
    mov A, @R1
    jb CPU_RST, check_state_0
    mov @R0, #STATE_IDLE        ; reset the glitch FSM when CPU is off
check_state_0:
    cjne @R0, #STATE_IDLE, check_state_1
    clr CPU_PLL                 ; disable PLL slowdown
    clr I2_BIT_REQ              ; disable I2C slowdown
    mov @R1, A                  ; update rgh i2c state
    jb I2_BIT_NOW, __return     ; wait till I2C done
    jnb CPU_RST, __return       ; don't setup anything when off
    mov R1, #DELAY_REG
    mov @R1, #255               ; 1C waiting delay in ms
    sjmp go_next_step
STATE_WAIT_1C
A very simple state; just waiting for the specified amount of ms:
check_state_1:
    cjne @R0, #STATE_WAIT_1C, check_state_2
    sjmp wait_for_smth
STATE_POST_1C
Another simple state; just waiting for the POST 1C to actually happen:
check_state_2:
    cjne @R0, #STATE_POST_1C, check_state_3
    jb POSTBIT, go_next_step    ; just wait for the POST pin
    ret
STATE_WAIT_NAND
A slightly more difficult state. Here, the POST codes 0x1E, 0xD0-0xD5 are skipped to reach the moment when it’s safe to enable the I2C slowdown. According to the experiments, performing the slowdown when the NAND is still being read can lead to a system halt. Along with the I2C slowdown, CPU_PLL_BYPASS is also asserted to get more time for the I2C to finish:
check_state_3:
    cjne @R0, #STATE_WAIT_NAND, check_state_4
    orl INT_CNTRL, #01h         ; disable interrupts
wait_post_1e:                   ;~6ms
    jb POSTBIT, wait_post_1e
wait_post_d01:                  ;~1.5ms
    jnb POSTBIT, wait_post_d01  ; in case of bad POST signal
wait_post_d23:                  ;~22 us ; we can accidentally cause
    jb POSTBIT, wait_post_d23   ; watchdog SMC reboot here
wait_post_d45:                  ;~962 us
    jnb POSTBIT, wait_post_d45
    ; here all NAND communication is done, so we can do HANA i2c
    setb CPU_PLL                ; use PLL slowdown to get more time for that
    anl INT_CNTRL, #0FEh        ; enable interrupts
    setb I2_BIT_REQ             ; enable I2C slowdown
    mov @R1, A                  ; update rgh i2c state
    mov R1, #DELAY_REG
    mov @R1, #60                ; hana clock switch waiting in ms
    sjmp go_next_step
STATE_WAIT_SLOW
Same as before, just wait for the millisecond countdown (this time, to allow HANA to reconfigure the reference clocks).
check_state_4:
    cjne @R0, #STATE_WAIT_SLOW, check_state_5
    sjmp wait_for_smth
STATE_GLITCH
The heart of the whole project. This is where the glitch actually happens!
- The PLL slowdown is disabled, the I2C sequence should be finished by now.
- The interrupts are disabled to avoid spoiling the glitch timing countdown.
- Both glitch timing and the PLL delay are configured in the registers.
- The code applies the PLL slowdown after the configured delay, then waits for the POST 0xDA trigger.
- Finally, the glitch pulse is sent with the configured delay after the POST 0xDA trigger.
- The interrupts are re-enabled, all the slowdowns are disabled, and the FSM finally goes into the “waiting for the result” state.
check_state_5:
    cjne @R0, #STATE_GLITCH, check_state_6
    clr CPU_PLL                 ; remove PLL slowdown, cuz i2c is done
    orl INT_CNTRL, #01h         ; disable interrupts
wait_post_d67:
    jb POSTBIT, wait_post_d67
    lcall reset_watchdog
    mov R2, #PLL_DELAY_0
    mov R3, #PLL_DELAY_1
    mov R4, #PLL_DELAY_2
    mov R5, #GLI_PULSE_0
    mov R6, #GLI_PULSE_1
    mov R7, #GLI_PULSE_2
wait_for_slowdown:
    djnz R2, wait_for_slowdown
    djnz R3, wait_for_slowdown
    djnz R4, wait_for_slowdown
    setb CPU_PLL
    lcall reset_watchdog
wait_post_d89:
    jnb POSTBIT, wait_post_d89
    ; here is the post DA happened, final route to the glitch!
    lcall reset_watchdog        ; we need really much time here
wait_for_reset:                 ; wait for 145.41 ms
    djnz R5, wait_for_reset
    djnz R6, wait_for_reset
    djnz R7, wait_for_reset
    clr CPU_RST                 ; reset pulse
    setb CPU_RST
    lcall reset_watchdog
    clr CPU_PLL
    anl INT_CNTRL, #0FEh        ; enable interrupts
    clr I2_BIT_REQ              ; disable I2C slowdown
    mov @R1, A                  ; update rgh i2c state
    mov R1, #DELAY_REG
    mov @R1, #10                ; about 10 ms to check the glitch result
    sjmp go_next_step
STATE_WAIT_SUCC
Here, the code waits for the POST signal to be changed by the CB_X bootloader.
check_state_6:
    cjne @R0, #STATE_WAIT_SUCC, check_state_7
    sjmp wait_for_smth
STATE_TEST_SUCC
If the POST signal change is not detected, the code reboots the console to try again. This kind of early failure detection results in a pretty decent speed of about one boot attempt per second.
check_state_7:
    cjne @R0, #STATE_TEST_SUCC, check_state_8
    mov R1, #DELAY_REG
    mov @R1, #00                ; 256 ms to check the hardware init result
    setb SMC_ARG_E              ; enable argon processing
    jnb POSTBIT, go_next_step
go_reset:
    mov @R0, #STATE_FINISH      ; set halt step to avoid multiple resets
    ljmp prepare_reset
STATE_WAIT_HW, STATE_TEST_HW
The last optional test checks for the completion of the CB_B hardware init (quite useful for Fat revisions):
check_state_8:
    cjne @R0, #STATE_WAIT_HW, check_state_9
    sjmp wait_for_smth          ; go wait
check_state_9:
    cjne @R0, #STATE_TEST_HW, _return
    jb POSTBIT, go_next_step
    sjmp go_reset
The delay values were calculated and adjusted using the logic analyzer. The glitch timing values were calculated based on known RGH2 values and further bruteforced with a custom SMC code.
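When tuning such values, it helps to know how many iterations a three-register `djnz` chain actually performs, because a register that reaches zero wraps around and costs 256 iterations on every later pass. The sketch below gives a closed-form count and a brute-force simulation to cross-check it. The function names are hypothetical, and converting the count into time requires the SMC instruction timing, which is not given here and therefore left out.

```python
def _passes(initial: int) -> int:
    """A djnz register loaded with 0 wraps around, giving 256 iterations."""
    return initial if initial else 256


def djnz3_count(r_inner: int, r_mid: int, r_outer: int) -> int:
    """Total djnz executions for: djnz Ra, L / djnz Rb, L / djnz Rc, L."""
    outer = _passes(r_outer)
    mid = _passes(r_mid) + 256 * (outer - 1)      # later passes start from 0
    inner = _passes(r_inner) + 256 * (mid - 1)    # one inner run per mid execution
    return inner + mid + outer


def djnz3_simulate(r_inner: int, r_mid: int, r_outer: int) -> int:
    """Brute-force reference implementation of the same loop."""
    regs = [r_inner, r_mid, r_outer]
    count, level = 0, 0
    while level < 3:
        regs[level] = (regs[level] - 1) & 0xFF
        count += 1
        # non-zero: jump back to the first djnz; zero: fall through to the next one
        level = 0 if regs[level] else level + 1
    return count


example = djnz3_count(1, 2, 1)
```

Here `example` is 260: the inner register runs once, then a full 256 more times after the middle register jumps back. In this layout the first register is the fine adjustment and the third one the coarse adjustment.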
As a special treat
I said at the very beginning that we’d come back to the NAND flashing, so here we are. The XDK Sidecar accesses the flash memory using an SPI interface, specifically by writing into a set of hardware registers inside the southbridge. The whole protocol was reverse engineered way back in 2006. For example, this is how the flash erase is performed:
int xbox_nand_erase_block(uint32_t lba)
{
    xbox_nand_clear_status();
    spiex_write_reg(0x00, spiex_read_reg(0x00) | 0x08);
    spiex_write_reg(0x0C, lba << 9);
    spiex_write_reg(0x08, 0xAA);
    spiex_write_reg(0x08, 0x55);
    spiex_write_reg(0x08, 0x05);
    if (xbox_nand_wait_ready(0x1000))
        return 0x8000 | xbox_nand_get_status();
    return 0;
}
The system software uses the same interface to access the flash, only through a memory-mapped register set instead of the SPI interface. Here is the open-source flash driver:
int sfcx_erase_block(int address)
{
    sfcx_writereg(SFCX_CONFIG, sfcx_readreg(SFCX_CONFIG) | CFG_WP_EN);
    sfcx_writereg(SFCX_ADDRESS, address);
    while (sfcx_readreg(SFCX_STATUS) & STATUS_BUSY);
    sfcx_writereg(SFCX_COMMAND, UNLOCK_CMD_1);
    sfcx_writereg(SFCX_COMMAND, UNLOCK_CMD_0);
    while (sfcx_readreg(SFCX_STATUS) & STATUS_BUSY);
    sfcx_writereg(SFCX_COMMAND, BLOCK_ERASE);
    while (sfcx_readreg(SFCX_STATUS) & STATUS_BUSY);
    int status = sfcx_readreg(SFCX_STATUS);
    sfcx_writereg(SFCX_CONFIG, sfcx_readreg(SFCX_CONFIG) & ~CFG_WP_EN);
    return status;
}
As you can see, both methods look pretty much the same. Thanks to this, the open-source XeLL project (Xenon Linux Loader) is able to reflash the NAND in case of a total system failure (aka brick):

But once upon a time in 2011, Corona motherboards added a new type of flash memory — eMMC:

The well-known SPI access protocol didn’t work there, so a new method of hardware flash access had to be used, based on an SD card reader:

Since the protocol was unknown, XeLL could not unbrick consoles anymore. No one has reverse-engineered this thing in twelve years, so let’s finally fix this!

Luckily, the basic SPI protocol is the same: a set of 64 hardware registers, some of which can be read and some of which can also be rewritten. But their values don’t match the NAND mode values at all:
0x00: 0xC0462002 <= config?
0x04: 0x00000200 <= nand status?
0x08: 0x40FF8000 <= exec?
0x0C: 0x01020000 <= LBA?
0x10: 0x00FF8080 <= data?
0x14: 0x00000000 <= ???
0x18: 0x00000000
0x1C: 0x00000000
0x20: 0x00001921
0x24: 0x00F20001
0x28: 0x00000000
0x2C: 0x000E4007
0x30: 0x00000000
0x34: 0x1400040A
0x38: 0x00C50000
0x3C: 0x10003FFF
To find out what’s going on here, let’s check the Xbox 360 kernel eMMC driver. To make it easier, there’s a huge XDK leak that contains Xbox 360 kernels with symbols (in this case, 21256.18_Xenon_Recovery_with_Symbols).
By searching all MMC* methods, it’s easy to find the method that writes the final south bridge registers:

Along with the huge state machine that, according to its name, should perform the reset of the MMC hardware (MmcxContinueResetStateMachine):

Code analysis and extensive experimentation led to an understanding of some of the hardware registers:
- 0x04 – the size of the transfer data block
- 0x08 – the MMC command argument
- 0x0C – the execution settings including the MMC command
- 0x10..0x1C – the MMC command response
- 0x20 – data buffer FIFO
- 0x24 – could be the hardware status register
- 0x2C – could be the hardware control register
- 0x30 – the interrupt status
- 0x3C – could be the initialization status
From the kernel code, it’s also possible to find a few MMC commands and their execution sequence:
GO_IDLE:
- [0x08] = 0
- [0x0C] = 0x1800
SET_BLOCK_COUNT:
- [0x04] = 0x0
- [0x08] = count
- [0x0C] = 0x171A0010
SELECT_CARD:
- [0x04] = 0x200
- [0x08] = 0xffff0000
- [0x0C] = 0x71a0000
DESELECT_CARD:
- [0x04] = 0x200
- [0x08] = 0x0
- [0x0C] = 0x7000000
Unfortunately, the system uses the READ_MULTIPLE and WRITE_MULTIPLE commands for flash data access, and these use DMA for the data transfer. Neither the DMA configuration registers (0x58 / 0x5C) nor the DRAM where the data is stored is accessible over the SPI interface.
However, using the collected information, it’s easy to understand the purpose of the bits in the main command register 0x0C (the fields are highlighted with a bit mask for simplicity):
- 0xFF000000 – MMC command number
- 0x180000 – “start the command” bits
- 0x000010 – transaction direction (1 – read, 0 – write)
- 0x030000 – command type
- 0x200000 – whether to use the buffer or not
- 0x000001 – DMA mode setting
- 0x000004 – whether to wait for the buffer write before the execution
- 0x000022 – something read/write-related
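The bit fields listed above translate into a small decoder. This is only a sketch built from the masks in the list; the `decode_cmd_reg` function and its field names are hypothetical, and the meaning of the 0x000022 bits stays as unknown as it is in the list.

```python
def decode_cmd_reg(value: int) -> dict:
    """Split a value of the command register 0x0C into the fields listed above."""
    return {
        "cmd": (value & 0xFF000000) >> 24,            # MMC command number
        "start": (value & 0x180000) >> 19,            # "start the command" bits
        "cmd_type": (value & 0x030000) >> 16,         # command type
        "use_buffer": bool(value & 0x200000),         # whether the data buffer is used
        "read": bool(value & 0x000010),               # 1 = read, 0 = write
        "dma": bool(value & 0x000001),                # DMA mode setting
        "wait_buffer_write": bool(value & 0x000004),  # wait for the buffer write first
        "rw_related": value & 0x000022,               # purpose not fully understood
    }


set_block_count = decode_cmd_reg(0x171A0010)          # value from the kernel sequence
select_card = decode_cmd_reg(0x071A0000)
```

Decoding the kernel values gives command numbers 23 and 7, which are CMD23 (SET_BLOCK_COUNT) and CMD7 (SELECT/DESELECT_CARD) in the MMC command set, so the command number field at least is consistent with the names found in the kernel.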
The other source of the information is the southbridge itself! At the startup, it runs the eMMC initialization sequence by itself:

Guess what? All the configuration for these commands could be read over SPI! So I made a simple project to intercept register values. I had to run it hundreds of times for different registers, but in the end I got all I wanted. For example, this is a list of values for the command register 0x0C (which matches the eMMC init log pretty well):
- 0x1 02 0000
- 0x2 01 0000
- 0x3 1a 0000
- 0x7 1a 0000
- 0x6 1b 0000
- 0x8 3a 0010
- 0x6 1b 0000
- 0x7 00 0000
- 0x6 1a 0000
- 0x10 1a 0000
- 0x3 01 0000
Here is the interception log for the register 0x08 (the command argument):
- 0x40ff8000 <= argument of the cmd1, send_op_cond
- 0x0
- 0xffff0000 <= argument of the cmd3, send relative address
- 0x3b90100 <= argument of the cmd6, switch function
- 0x0
- 0x200 <= argument of the cmd16, set block length
- 0x3bb0000
- 0x3b70200
- 0x0
Other registers were also intercepted in the same way, but there was nothing interesting, just weird values. Finally, based on this analysis, I created a library for block-by-block reading and writing of the eMMC. Here are the most important methods:
def read_csd():
    spi_reg_write(0x04, 0x00)
    spi_reg_write(0x08, 0xffff0000)
    spi_reg_write(0x0C, 0xA010000) # 0x9010000 for CID
    wait_simple_int()
    data = b""
    for i in range(0x10, 0x20, 4):
        data += struct.pack("<I", spi_reg_read(i))
    return data

def read_block(block):
    select_card()
    set_blocklen(0x200)
    spi_reg_write(0x04, 0x200 | (1 << 16))
    spi_reg_write(0x08, block << 9)
    spi_reg_write(0x0C, 0x113a0010) #0x83A0010 for EXT_CSD
    wait_int(1)
    wait_int(0x20)
    data = b""
    for i in range(0x80):
        data += struct.pack("<I", spi_reg_read(0x20))
    deselect_card()
    return data

def write_block(block, data):
    select_card()
    set_blocklen()
    spi_reg_write(0x04, 0x200 | (1 << 16))
    spi_reg_write(0x08, block << 9)
    spi_reg_write(0x0C, 0x183a0000)
    wait_int(1)
    for i in range(0, 0x80):
        spi_reg_write(0x20, struct.unpack("<I", data[i*4:i*4+4])[0])
    wait_int(0x10)
    wait_int(2)
    deselect_card()
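One detail worth spelling out is how a 512-byte block maps onto the FIFO register: both methods move 0x80 little-endian 32-bit words through register 0x20, and the command argument is the block number shifted left by 9 (a byte address). The helpers below are a hypothetical illustration of that packing only; they do not touch the hardware.

```python
import struct

BLOCK_SIZE = 0x200                    # bytes per eMMC block
WORDS_PER_BLOCK = BLOCK_SIZE // 4     # 0x80 FIFO accesses per block


def block_to_words(data: bytes) -> list:
    """Split one block into the 32-bit words written to the FIFO register."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("a block must be exactly 0x200 bytes")
    return list(struct.unpack(f"<{WORDS_PER_BLOCK}I", data))


def words_to_block(words) -> bytes:
    """Reassemble the words read from the FIFO register into one block."""
    if len(words) != WORDS_PER_BLOCK:
        raise ValueError("expected 0x80 words")
    return struct.pack(f"<{WORDS_PER_BLOCK}I", *words)


def block_address(block: int) -> int:
    """Command argument used by read_block / write_block above."""
    return block << 9


payload = bytes(range(256)) * 2
words = block_to_words(payload)
```

Short data has to be padded to a full block before packing, otherwise the slicing in `write_block` would hand `struct.unpack` fewer than four bytes.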
This library has also been ported to C for the PicoFlasher programmer project, and of course for the XeLL with the eMMC unbrick feature. Finally!
Conclusion
The chipless and reliable Xbox 360 modding method is real. In most cases, it boots instantly for Slim revisions and within a few seconds for Fat revisions!

On my GitHub you can find:
- sources and ready to use binaries for the RGH3 XeLL (Linux / Homebrew loader)
- glitch2m images (a fully working Xbox 360 system image with fuses spoofing)
- PicoFlasher fork with both NAND and eMMC access support
- extended libxenon and xell-reloaded versions with eMMC unbrick support
Sadly, Zephyr and Xenon revisions mostly fail when the CPU_PLL_BYPASS is deasserted, so there is no RGH3 for them. Then again, they were never very reliable, and most of them are dead by now.
As for the Winchester revision, these boards are immune to RGH, and the retail versions additionally have the POST_OUT, CPU_EXT_CLK and CPU_PLL_BYPASS lines disabled. Such research is a topic for a whole new story 🙂