pair of pared pears | articles in the "hardware" category

Jun 30, 2023

Three pitfalls in I²C

I recently implemented an I²C slave, and came across a few interesting corner cases in the specification.

I²C basics

I²C is a multi-master, low-speed, bidirectional bus specified in NXP UM10204. There are only two signals: SCL (the clock) and SDA (the data). Each of the signals is open-drain, with resistors pulling the signals high. This property is used throughout the protocol. For example, by defining an acknowledgement (ack) as holding SDA low, there is an implicit negative acknowledgement (nack) when no device responds to a transaction.

The general format of transactions is

a start condition
a 7-bit address
a read/write bit
an acknowledgement
any number of number of data bytes, each followed by an acknowledgement
a stop condition

For example, in the following diagram shows a single-byte read:

SDA is valid on the rising edge of SCL, and SDA changes on the falling edge of SCL. To signal the start and end of the transaction, SDA transitions with SCL high. This framing violation makes it easy to re-synchronize the master/slave state machines.

An important aspect of I²C that is not visible in the above diagram is who is sending data. Because the signals are open-drain, both the master and slave can drive the bus at the same time. The following diagram shows what the internal drivers of SDA in the above transaction might look like:

At the beginning of the transaction the master sends data on the bus, while the slave leaves its SDA high. Then the slave acknowledges the request and sends a byte of data. Since this is the last byte the master wants to read, the master doesn’t acknowledge the data and sends a stop condition.

Quick reads

One of the shortest types of I²C transactions is the quick read/write (so named by SMBus). These transfer one bit of data in the read/write bit following the address. Once the master receives an ack, it sends the stop condition to end the transaction. In addition to transfering a bit of data, these transactions can also be used as a heuristic way of detecting available slaves (such as with i2cdetect). The following diagram shows a successful quick read:

From the slave’s point of view, a quick read looks just like a regular read transaction. This can prevent the master from sending the stop condition if the first bit of the byte is a 0, since the slave will hold SDA low. If the read byte is all 0s, the slave won’t release SDA until the ack bit:

When designing a slave, this can be avoided by ensuring that the first bit of any read transaction is 1. If the slave has a “command” or sub-address register which needs to be written as the first byte of a transaction, the default data before the command register is written can be all 1s for the same effect.

From the master’s perspective, all that is needed is to continue reading out the byte until there is a high bit. This is guaranteed to happen when the slave waits for an ack.

SDA hold time ambiguity

While using coding violations for framing is a common technique, it creates a conflict on the falling edge of SCL. If a slave sees SDA fall before SCL, it can detect a spurious start/stop condition.

SMBus versions before 3.0 specified a 300 ns minimum hold time (t_HD;DAT). This ensures that other devices on the bus see SCL transition before SDA.

I²C, on the other hand, has a minimum hold time of 0 seconds. Versions 6 and earlier of UM10204 suggested the following solution:

A device must internally provide a hold time of at least 300 ns for the SDA signal (with respect to the V_IH(min) of the SCL signal) to bridge the undefined region of the falling edge of SCL.

That is, if a device detects a start/stop condition it must wait 300 ns before doing anything. If SCL is still high, it was a real start/stop. Otherwise it was just a data transition. The 300 ns value in both I²C and SMBus is t_f, or the maximum fall time. Waiting this long ensures that SCL has transitioned before we sample SDA.

To allow for greater compatibility between SMBus and I²C devices, SMBus versions 3.0 and later reduce t_HD;DAT to 0 seconds. In a lengthy appendix, they suggest using the same strategy as I²C.

Despite this, version 7 of UM10204 seems to suggest that neither a 300 ns hold time nor an internal delay are necessary to resolve this issue. Looking closely at the timing diagram, t_HD;DAT is defined as the time between when SCL falls to 30% V_DD (logical 0), and when SDA rises above 30% V_DD or falls below 70% V_DD. Therefore, it suggests that devices

Ensure SCL drops below 0.3 V_DD on falling edge before SDA crosses into the indeterminate range of 0.3 V_DD to 0.7 V_DD.

Regarding masters which don’t support clock stretching and don’t have inputs on SCL, UM10204 continues:

For controllers that cannot observe the SCL falling edge then independent measurement of the time for the SCL transition from static high (V_DD) to 0.3 V_DD should be used to insert a delay of the SDA transition with respect to SCL

effectively mandating a 300 ns hold time… which is what SMBus switched away from.

However, even masters supporting clock stretching should still use a delay for two reasons: First, it is difficult to detect when SCL falls below 30% V_DD, since in typical implementations the entire region from 30–70% V_DD is indeterminate. And second, devices with series protection resistors might not see the same value on SDA as the transmitter, since there will be a voltage difference across the resistor.

For maximum compatibility, devices should implement both an output hold time and an internal hold time when detecting start/stop conditions.

Implementation support

Unfortunately, despite much vacillation in SMBus and I²C, this issue does not seem to be known to some implementors. A quick survey of open-source implementations reveals fairly patchy handling:

Wikipedia’s bitbang implementation, doesn’t wait between clear_SCL and set_SDA in i2c_write_bit. That said, it doesn’t seem to support multi-master busses, so it may be assuming slaves with an internal hold time.
Linux doesn’t wait between scllo and setsda in i2c_outb, but it doesn’t seem to support multi-master busses either. Some of the hardware-accelerated drivers seem to be aware of this issue, and support configurable hold times. This allows using the SMBus pre-3.0 solution, as long as all slaves also support it.
Neither the master nor slaves in Alex Forencich’s I²C project seem to delay for a hold time or use an internal hold time.
Freecores’ master doesn’t add an internal hold time or use an internal hold time.

It’s often unclear whether commercial implementations correctly handle this ambiguity. For example, this AT24 EEPROM datasheet specifies a 0 second hold time, but doesn’t mention any internal hold time. Many vendors support configurable hold times, which shows they are aware of the issue. Occasionally, there are errata regarding it.

I suspect that for most hardware this ambiguity becomes an issue when the input threshold voltage is on the low end. This could cause a rising SDA to be detected before a falling SCL. This is exacerbated by high bus capacitance, but many busses have low (a few dozen pF) capacitance. As with many timing violations, mean time between failure can be quite long, and incorrect implementations may not be noticed.

Fast-mode Plus compatibility

The original (Standard-mode) I²C runs at 100 KHz, but UM10204 also includes a backwards-compatible “Fast-mode” which runs at 400 KHz. There are also “High-speed mode” and “Ultra Fast-mode” varients which are not backwards compatible. In 2007, NXP introduced a “Fast-mode Plus” which runs at 1 Mhz and was designed to be backwards-compatible. SMBus also incorporated this mode into version 3.0.

To determine what a Fast-mode Plus slave needs to do to be backwards compatible, let’s first examine Fast-mode backwards-compatibility. For a Fast-mode slave to be backwards compatible with Standard-mode, its input and output timings must be compatible with both Standard-mode and Fast-mode. Generally, output timings are the same as Fast-mode. Standard-mode only requires a longer setup time, which will be met as long as the slave doesn’t stretch the clock. Similarly, input timings are mostly the same as Fast-mode. One issue could be the internal hold time necessary for the SDA ambiguity detailed above. However, both Standard- and Fast-mode specify a 300 ns fall time (t_f), which is less than Fast-mode’s 600 ns start condition setup time (t_SU;STA). Therefore, the same 300 ns hold time can be used for both modes.

Unfortunately, Fast-mode Plus reduced t_SU;STA to 260 ns in order to achieve a higher clock rate. This means that every Fast-mode Plus start condition is within the SDA hold time ambiguity in Fast- and Standard-mode. A slave which implements the 300 ns internal delay required by Fast- and Standard-mode will not be able to detect Fast-mode Plus start conditions with minimum-specified delay.

There are some ways to mitigate this at the system level:

All bust masters could be configured to run at 960 kHz, which (if t_SU;STA is scaled as well) will provide enough of a delay to ensure start times will be detected correctly.
Components with higher slew rates could be selected to ensure t_f remains below 260 ns. Alternatively, bus line capacitance could be reduced below the maximum.

As well as some ways to mitigate this at the device level:

A configuration bit (such as a register or a pin) could configure the device to be either Fast-mode or Fast-mode Plus compatible. This could even be automatically detected, although this would need to be done carefully since masters can switch speed at any time. For example, a master might run at one speed when accessing a certain device, and another speed when accessing a different device.
The input drivers could be engineered to have a lower V_IH and a higher V_IL, reducing the time of ambiguity (assuming monotonic transitions).

But, as-written, the Fast-mode Plus timings are incompatible with Fast- and Standard-mode. Pre-3.0 SMBus and post-v7 I²C are not affected because they do not require an internal hold time.

posted at 05:13 · hardware · i2c smbus pedantry

Dec 24, 2022

The smallest inter-frame gap in Fast Ethernet

Fast Ethernet in the form of 100BASE-TX is a very mature technology, although there were some hiccups in getting there. So it was surprising to me to find a way for the PCS receiver to accept an inter-frame gap shorter than the end-of-stream delimeter itself.

The 100BASE-TX Ethernet physical layer is broken up into several sub-layers:

Working from the bottom up, the PMD converts analog signals from the medium to MLT-3 symbols (correcting for channel loss and baseline wander), decodes those symbols to bits, descrambles the bits, and then encodes the bits with NRZI. The PMA converts from NRZI back to bits^[1] and performs some other optional tasks (such as far-end fault detection when using a medium without autonegotiation). The PCS performs 4B5B^[2] alignment and decoding and generates data on the MII.

The PMA provides the PCS with a raw stream of bits. There is no indication of where one nibble begins and another ends. To recover this information, each nibble (4 bits) is encoded using 5 bits. Half of this encoding is used for data, but the other half is either invalid or used for control code groups. When no data is being transferred, the /I/ control code group (0b11111) is continuously transmitted. The control code groups /J/ and /K/ (0b11000 and 0b10001) form the Start-of-Stream Delimiter (SSD), indicating the start of a new frame:

To ensure that this sequence can always be recognized, all other code groups which (in combination with any other code group) could contain the sequence 0b1100010001 are invalid. The last control code groups to cover are /T/ and /R/ (0b01101 and 0b00111), which form the End-of-Stream Delimiter (ESD), indicating the end of a frame. By convention, /V/ is used for invalid code groups.

IEEE 802.3 specifies the PCS receiver through a state diagram. Each state performs its actions continuously, and has several conditions leading to other states. Conditions are formed of several comparisons or events, with * expressing “and” (conjunction). All conditions are evaluated continuously (more on this later), and sometimes there are uncontrolled transitions (UCTs), immediately transitioning from one state to another.

The variables used in the above diagram are:

link_status: Whether the link is up. For our purposes, we can assume that this is always true.
rx_bits: The last 10 bits we have received. The least significant bit (0) is the most recent bit. These are not aligned, except when gotCodeGroup.indicate occurs. In that case, rx_bits[4:0] is the most recent code group, and rx_bits[9:5] is the next most recent.
gotCodeGroup.indicate: This event occurs whenever there is a complete code group in rx_bits. It is automatically generated every five bits by the receive bits process (not shown). The initial alignment is determined by when RX_DV goes high.
RX_DV: An MII signal indicating whether RXD is valid or not. This also determines the alignment of gotCodeGroup.indicate.
RXD: An MII signal indicating the recieved data.
RX_ER: An MII signal indicating there was an error.
receiving: Whether we are currently receiving a packet.
rx_lpi: This is only used for Energy-Efficient Ethernet (EEE).

Let’s add some of these signals to the SSD diagram from before:

There are a few important things to note here. First, note that there is an instant transition from CARRIER DETECT to IDENTIFY JK. This is because one of the conditions for exiting CARRIER DETECT will always be true, and both will be evaluated immediately. Second, although state and gotCodeGroup.indicate are shown as having transition times, they are really instant from the state machine’s point of view. In particular, gotCodeGroup.indicate is only true for one instant. If a state and its successor both depend on gotCodeGroup.indicate to transition, they can’t both transition off of the same gotCodeGroup.indicate event. If implemented in hardware, states like CARRIER DETECT should be viewed more as “superstates” (a la superclass) rather than states proper. The last thing to note is that there’s a delay of two code groups between when the first bit of a code enters the PCS and when the data gets signalled on the MII.

Now, lets’s look at the end of a packet:

Note that like CARRIER DETECT, END OF STREAM instantly transitions to IDLE, but not before setting rx_bits to all 1s. This is as if we had already been sending /I/I/ instead of /T/R/. We can abuse this to create (through non-standard coding) the shortest possible inter-packet gap. The following diagram shows the contents of rx_bits after END OF STREAM as “virtual” bits. It also shows the actual received bits as “real” bits:

A fully-compliant 100BASE-X PCS will exactly follow the state transitions in the diagram above, allowing an inter-frame gap of just 0.8 octets.

It appears that the designers of the PCS state machine wished to allow back-to-back frames (e.g. /T/R/J/K/)^[3]. If this is indeed the case, we need to be careful when fixing the state machine. We could omit the clearing of rx_bits, but this would cause us to erroneously transition to BAD SSD (as we would transition to CARRIER DETECT when rx_bits[9:0] = /R/J/). Adding a condition on gotCodeGroup.indicate would not work for similar reasons. Instead, we should add an intermediate PRE IDLE state after END OF STREAM. This state would set receiving, RX_ER, and RX_DV to FALSE. It would transition to IDLE on gotCodeGroup.indicate.

In practice, this “bug” has no consequences. This is for three reasons. First, conforming PCSs can never generate this bit pattern. Second, conforming MACs use an inter-frame gap of at least 12 octets. And, third, most PCSs do not implement the specification exactly. This is because an exact implementation imposes difficult-to-meet timing requirements due to the tight coupling between the receive bits process and the recieve process. A typical implementation may register the outputs of the receive bits process, which does not allow for the instant feedback necessary to re-align the receive bits process and trigger the bug.

1. Converting from binary to NRZI and back again is not terribly efficient (and I suspect that almost no integrated implementations of 100BASE-TX include it). So why is it specified in the standard? The 100BASE-TX PMD is largely based on the CDDI (Copper Distributed Data Interface) PMD. That PMD was designed to be a drop-in replacement for the FDDI (Fiber Distributed Data Interface) PMD, so that the PCS and PMA could be reused. The FDDI PMD just performed some filtering and conversion from optical to electrical signalling (much like modern optical SFP modules). The NRZI conversion itself was done in the PMA. To keep compatibility, the PMD decodes MLT-3 symbols and then re-encode them as NRZI. This is also why descrambling happens in the PMD, even though it probably should happen in the PMA.

2. Also inherited from FDDI

3. I don’t know whether back-to-back frame support was inherited from FDDI or was introduced with Fast Ethernet. It’s likely that someone produced PCSs which allowed this, and lobbied to ensure that such behavior remained standard

posted at 05:13 · hardware · ethernet pedantry