
FPGA Process Technology

Basic process technology types

  • SRAM – Based on static memory technology. In-system programmable and re-programmable; requires an external boot device. CMOS.
  • Antifuse – One-time programmable. CMOS.
  • PROM – Programmable Read-Only Memory technology. One-time programmable, typically because the plastic package lacks the window needed for erasure.
  • EPROM – Erasable Programmable Read-Only Memory technology. One-time programmable in windowless packages; versions with a quartz window can be erased with ultraviolet (UV) light and re-programmed. CMOS.
  • EEPROM – Electrically Erasable Programmable Read-Only Memory technology. Can be erased, even in plastic packages. Some but not all EEPROM devices can be in-system programmed. CMOS.
  • Flash – Flash-erase EPROM technology. Can be erased, even in plastic packages. Some but not all flash devices can be in-system programmed. Usually, a flash cell is smaller than an equivalent EEPROM cell and is therefore less expensive to manufacture. CMOS.
  • Fuse – One-time programmable. Bipolar.

 

Manufacturers of FPGA

Major manufacturers

  • Xilinx and Altera are the current FPGA market leaders and long-time industry rivals.
  • Together, they control over 80 percent of the market, with Xilinx alone representing over 50 percent.
  • Both Xilinx and Altera provide free Windows and Linux design software that supports a limited set of devices.
  • Other competitors include Lattice Semiconductor (SRAM-based with integrated configuration flash, instant-on, low power, live reconfiguration), Actel (antifuse, flash-based, mixed-signal), SiliconBlue Technologies (extremely low-power SRAM-based FPGAs with optional integrated nonvolatile configuration memory), Achronix (RAM-based, 1.5 GHz fabric speed), which will build its chips on Intel’s state-of-the-art 22 nm process, and QuickLogic (handheld-focused CSSPs, no general-purpose FPGAs).
  • In March 2010, Tabula announced a new FPGA technology that uses time-multiplexed logic and interconnect for greater potential cost savings in high-density applications.

 

Comparison of FPGA

FPGA comparisons

Historically, FPGAs have been slower, less energy-efficient, and generally less capable than their fixed ASIC counterparts. A study has shown that designs implemented on FPGAs need on average 18 times as much area, draw 7 times as much dynamic power, and are 3 times slower than the corresponding ASIC implementations.

Advantages include the ability to re-program in the field to fix bugs, a potentially shorter time to market, and lower non-recurring engineering costs. Vendors can also take a middle road: develop the hardware on ordinary FPGAs, but manufacture the final version so that it can no longer be modified after the design has been committed.

Xilinx claims that several market and technology dynamics are changing the ASIC/FPGA paradigm:

  • Integrated circuit costs are rising aggressively
  • ASIC complexity has lengthened development time
  • R&D resources and headcount are decreasing
  • Revenue losses from slow time-to-market are increasing
  • Financial constraints in a poor economy are driving low-cost technologies

These trends make FPGAs a better alternative to ASICs for a larger number of higher-volume applications than they have historically been used for, a shift to which the company attributes the growing number of FPGA design starts (see History).

Some FPGAs have the capability of partial re-configuration that lets one portion of the device be re-programmed while other portions continue running.

Versus complex programmable logic devices

The primary differences between CPLDs (Complex Programmable Logic Devices) and FPGAs are architectural. A CPLD has a somewhat restrictive structure consisting of one or more programmable sum-of-products logic arrays feeding a relatively small number of clocked registers. The result is less flexibility, with the advantage of more predictable timing delays and a higher logic-to-interconnect ratio. FPGA architectures, on the other hand, are dominated by interconnect. This makes them far more flexible (in terms of the range of designs that are practical to implement on them) but also far more complex to design for.

In practice, the distinction between FPGAs and CPLDs is often one of size, as FPGAs are usually much larger in terms of resources than CPLDs. Typically only FPGAs contain more advanced embedded functions such as adders, multipliers, memory, SerDes, and other hardened blocks. Another common distinction is that CPLDs contain embedded flash to store their configuration, while FPGAs usually, but not always, require an external flash or other device to store their configuration.

Security considerations

With respect to security, FPGAs have both advantages and disadvantages compared to ASICs or secure microprocessors. FPGAs’ flexibility makes malicious modification during fabrication a lower risk. For many FPGAs, the design is exposed while it is being loaded (typically on every power-on). To address this issue, some FPGAs support bit-stream encryption. In July 2011, however, researchers published papers highlighting vulnerabilities in the bit-stream encryption of some devices, based on analysis of the devices’ power-usage fluctuations. These vulnerabilities apply to the current devices of most FPGA manufacturers, including Altera and Xilinx.

 

FPGA History

Gates

  • 1987: 9,000 gates, Xilinx
  • 1992: 600,000, Naval Surface Warfare Center
  • Early 2000s: Millions

Market size

  • 1985: First commercial FPGA technology invented by Xilinx
  • 1987: $14 million
  • ~1993: >$385 million
  • 2005: $1.9 billion
  • 2010 estimates: $2.75 billion

FPGA design starts

  • 10,000
  • 2005: 80,000
  • 2008: 90,000

 

History of FPGA

The FPGA industry sprouted from programmable read-only memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs both had the option of being programmed in batches in a factory or in the field (field-programmable); however, their programmable logic was hard-wired between logic gates.

1980s

In the late 1980s the Naval Surface Warfare Center funded an experiment proposed by Steve Casselman to develop a computer that would implement 600,000 reprogrammable gates. Casselman was successful, and a patent related to the system was issued in 1992.

Some of the industry’s foundational concepts and technologies for programmable logic arrays, gates, and logic blocks were established in patents awarded to David W. Page and LuVerne R. Peterson in 1985.

Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the first commercially viable field-programmable gate array in 1985: the XC2064. The XC2064 had programmable gates and programmable interconnects between gates, the beginnings of a new technology and market. It boasted a mere 64 configurable logic blocks (CLBs), each with two 3-input lookup tables (LUTs). More than 20 years later, Freeman was inducted into the National Inventors Hall of Fame for his invention.

Xilinx grew quickly and largely unchallenged from 1985 to the mid-1990s, when competitors sprouted up and eroded significant market share. By 1993, Actel was serving about 18 percent of the market.

1990s

The 1990s were an explosive period of time for FPGAs, both in sophistication and the volume of production. In the early 1990s, FPGAs were primarily used in telecommunications and networking. By the end of the decade, FPGAs found their way into consumer, automotive, and industrial applications.

FPGAs got a glimpse of fame in 1997, when Adrian Thompson, a researcher at the University of Sussex, merged genetic algorithm technology and FPGAs to create a sound recognition device. Thompson’s algorithm configured a 10 × 10 array of cells in a Xilinx FPGA to discriminate between two tones, exploiting the analogue behaviour of the digital chip. The application of genetic algorithms to the configuration of devices like FPGAs is now referred to as evolvable hardware.

Modern developments

A recent trend has been to take the coarse-grained architectural approach a step further by combining the logic blocks and interconnects of traditional FPGAs with embedded microprocessors and related peripherals to form a complete “system on a programmable chip”. This work mirrors the 1982 architecture of Ron Perlof and Hana Potash of the Burroughs Advanced Systems Group, which combined a reconfigurable CPU architecture on a single chip called the SB24. Examples of such hybrid technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices, which include one or more PowerPC processors embedded within the FPGA’s logic fabric. The Atmel FPSLIC is another such device, which uses an AVR processor in combination with Atmel’s programmable logic architecture. The Actel SmartFusion devices incorporate an ARM Cortex-M3 hard processor core (with up to 512 KB of flash and 64 KB of RAM) and analog peripherals, such as a multi-channel ADC and DACs, into their flash-based FPGA fabric.

In 2010, an extensible processing platform was introduced for FPGAs that fused features of a high-end ARM microcontroller (hard-core implementations of a 32-bit processor, memory, and I/O) with an FPGA fabric to make FPGAs easier for embedded designers to use. By incorporating the ARM processor-based platform into a 28 nm FPGA family, the extensible processing platform lets system architects and embedded software developers apply a combination of serial and parallel processing to the challenges of today’s embedded systems, which must meet ever-growing demands to perform highly complex functions. By allowing them to design in a familiar ARM environment, it also gives embedded designers the time-to-market advantages of an FPGA platform compared to the more traditional design cycles associated with ASICs.

An alternate approach to using hard-macro processors is to make use of soft processor cores that are implemented within the FPGA logic. MicroBlaze and Nios II are examples of popular softcore processors.

As previously mentioned, many modern FPGAs have the ability to be reprogrammed at “run time,” and this is leading to the idea of reconfigurable computing or reconfigurable systems — CPUs that reconfigure themselves to suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an example of a reconfigurable soft processor, implemented on FPGAs. However, it does not support dynamic reconfiguration at runtime, but instead adapts itself to a specific program.

Additionally, new, non-FPGA architectures are beginning to emerge. Software-configurable microprocessors such as the Stretch S5000 adopt a hybrid approach by providing an array of processor cores and FPGA-like programmable cores on the same chip.

 

FPGA

A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by the customer or designer after manufacturing—hence “field-programmable”. The FPGA configuration is generally specified using a hardware description language (HDL), similar to that used for an application-specific integrated circuit (ASIC); circuit diagrams were previously used to specify the configuration, as they were for ASICs, but this is increasingly rare. FPGAs can be used to implement any logical function that an ASIC could perform. The ability to update the functionality after shipping, partial re-configuration of a portion of the design, and the low non-recurring engineering costs relative to an ASIC design (notwithstanding the generally higher unit cost) offer advantages for many applications.

FPGAs contain programmable logic components called “logic blocks”, and a hierarchy of reconfigurable interconnects that allow the blocks to be “wired together”—somewhat like many (changeable) logic gates that can be inter-wired in (many) different configurations. Logic blocks can be configured to perform complex combinational functions, or merely simple logic gates like AND and XOR. In most FPGAs, the logic blocks also include memory elements, which may be simple flip-flops or more complete blocks of memory.

In addition to digital functions, some FPGAs have analog features. The most common analog feature is programmable slew rate and drive strength on each output pin, allowing the engineer to set slow rates on lightly loaded pins that would otherwise ring unacceptably, and to set stronger, faster rates on heavily loaded pins of high-speed channels that would otherwise run too slowly. Another relatively common analog feature is differential comparators on input pins designed to be connected to differential signaling channels. A few “mixed-signal FPGAs” have integrated peripheral analog-to-digital converters (ADCs) and digital-to-analog converters (DACs) with analog signal conditioning blocks, allowing them to operate as a system-on-a-chip. Such devices blur the line between an FPGA, which carries digital ones and zeros on its internal programmable interconnect fabric, and a field-programmable analog array (FPAA), which carries analog values on its internal programmable interconnect fabric.

Basics

What is an FPGA?

  • Before the advent of programmable logic, custom logic circuits were built at the board level using standard components, or at the gate level in expensive application-specific (custom) integrated circuits.
  • The FPGA is an integrated circuit that contains many (64 to over 10,000) identical logic cells that can be viewed as standard components.
  • Each logic cell can independently take on any one of a limited set of personalities.
  • The individual cells are interconnected by a matrix of wires and programmable switches.
  • A user’s design is implemented by specifying the simple logic function for each cell and selectively closing the switches in the interconnect matrix.
  • The array of logic cells and interconnect form a fabric of basic building blocks for logic circuits.
  • Complex designs are created by combining these basic blocks to create the desired circuit.

What does a logic cell do?

  • The logic cell architecture varies between different device families.
  • Generally, each logic cell combines a few binary inputs (typically between 3 and 10) into one or two outputs according to a Boolean logic function specified in the user’s program.
  • In most families, the user also has the option of registering the combinatorial output of the cell, so that clocked logic can be easily implemented.
  • The cell’s combinatorial logic may be physically implemented as a small look-up table memory (LUT) or as a set of multiplexers and gates.
  • LUT devices tend to be a bit more flexible and provide more inputs per cell than multiplexer cells, at the expense of propagation delay; a minimal software model of a LUT-based cell is sketched after this list.
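
To make the LUT idea concrete, the following minimal software model mimics a logic cell with a 4-input look-up table and an optional output register. The LogicCell class and its interface are invented for illustration and do not correspond to any vendor’s actual architecture.

```python
# Minimal software model of an FPGA logic cell: a 4-input look-up table
# (LUT) whose 16-entry truth table is the cell's "program", plus an
# optional flip-flop on the output. Purely illustrative.

class LogicCell:
    def __init__(self, truth_table, registered=False):
        assert len(truth_table) == 16       # one entry per input combination
        self.truth_table = truth_table      # list of 0/1 values
        self.registered = registered
        self.ff = 0                         # flip-flop state

    def evaluate(self, a, b, c, d):
        """Combinatorial output: index the truth table with the inputs."""
        index = (d << 3) | (c << 2) | (b << 1) | a
        return self.truth_table[index]

    def clock(self, a, b, c, d):
        """Registered output: capture the combinatorial value on a clock edge."""
        self.ff = self.evaluate(a, b, c, d)
        return self.ff

# "Program" the cell as a 4-input AND gate: only input 0b1111 yields 1.
and4 = LogicCell([1 if i == 0b1111 else 0 for i in range(16)])
print(and4.evaluate(1, 1, 1, 1))  # -> 1
print(and4.evaluate(1, 0, 1, 1))  # -> 0
```

Reprogramming the cell is just a matter of loading a different truth table, which is exactly what configuring an SRAM-based FPGA does on a large scale.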

What does ‘Field Programmable’ mean?

  • Field Programmable means that the FPGA’s function is defined by a user’s program rather than by the manufacturer of the device.
  • A typical integrated circuit performs a particular function defined at the time of manufacture.
  • In contrast, the FPGA’s function is defined by a program written by someone other than the device manufacturer.
  • Depending on the particular device, the program is either ‘burned’ in permanently or semi-permanently as part of a board assembly process, or is loaded from an external memory each time the device is powered up.
  • This user programmability gives the user access to complex integrated designs without the high engineering costs associated with application specific integrated circuits.

How are FPGA programs created?

  • Individually defining the many switch connections and cell logic functions would be a daunting task.
  • Fortunately, this task is handled by special software.
  • The software translates a user’s schematic diagrams or textual hardware description language code, then places and routes the translated design.
  • Most of the software packages have hooks to allow the user to influence implementation, placement and routing to obtain better performance and utilization of the device.
  • Libraries of more complex function macros (e.g., adders) further simplify the design process by providing common circuits that are already optimized for speed or area.

Backward Error Correction

Backward Error Correction (BEC) is a type of error correction in which the receiver detects an error and sends a request for retransmission to the sender.

BEC protocols impose less bandwidth overhead than Forward Error Correction (FEC) protocols, but require additional time and bandwidth to recover when errors do occur.

Backward Error Correction is a better solution when errors are rare and bandwidth should be optimized.

BEC algorithms include:

  • Parity bits
  • CRC (Cyclic Redundancy Check) – a detect-and-retransmit sketch using a CRC follows this list
  • LRC (Longitudinal Redundancy Check)
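
As an illustration of CRC-based backward error correction, the sketch below appends a CRC-32 checksum to each frame; the receiver recomputes the checksum and treats any mismatch as grounds for a retransmission request. The frame format and function names are invented for this example.

```python
# Sketch of CRC-based backward error correction: the sender appends a
# CRC-32 checksum; the receiver recomputes it and, on a mismatch,
# requests retransmission instead of trying to repair the data.
import struct
import zlib

def send_frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 to the payload."""
    return payload + struct.pack(">I", zlib.crc32(payload) & 0xFFFFFFFF)

def receive_frame(frame: bytes):
    """Return (ok, payload). ok=False means 'please retransmit'."""
    payload, received_crc = frame[:-4], struct.unpack(">I", frame[-4:])[0]
    ok = (zlib.crc32(payload) & 0xFFFFFFFF) == received_crc
    return ok, payload

frame = send_frame(b"hello")
ok, data = receive_frame(frame)
print(ok, data)                                    # True b'hello'

corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]   # flip one bit in transit
ok, _ = receive_frame(corrupted)
print(ok)                                          # False -> receiver sends a retransmit request
```

Note that the CRC only detects the error; recovery comes entirely from the retransmission, which is what distinguishes BEC from FEC.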

 

List of error-correcting codes

  • AN codes
  • BCH code
  • Constant-weight code
  • Convolutional code
  • Group codes
  • Golay codes, of which the Binary Golay code is of practical interest
  • Goppa code, used in the McEliece cryptosystem
  • Hadamard code
  • Hagelbarger code
  • Hamming code
  • Latin square based code for non-white noise (prevalent for example in broadband over powerlines)
  • Lexicographic code
  • Long code
  • Low-density parity-check code, also known as Gallager code, as the archetype for sparse graph codes
  • LT code, which is a near-optimal rateless erasure correcting code (Fountain code)
  • m of n codes
  • Online code, a near-optimal rateless erasure correcting code
  • Raptor code, a near-optimal rateless erasure correcting code
  • Reed–Solomon error correction
  • Reed–Muller code
  • Repeat-accumulate code
  • Repetition codes, such as Triple modular redundancy
  • Tornado code, a near-optimal erasure correcting code, and the precursor to Fountain codes
  • Turbo code
  • Walsh–Hadamard code

Forward Error Correction

In telecommunication, information theory, and coding theory, forward error correction (FEC) or channel coding is a technique used for controlling errors in data transmission over unreliable or noisy communication channels. The central idea is that the sender encodes the message in a redundant way by using an error-correcting code (ECC). The American mathematician Richard Hamming pioneered this field in the 1940s and invented the first error-correcting code in 1950: the Hamming (7,4) code.

The redundancy allows the receiver to detect a limited number of errors that may occur anywhere in the message, and often to correct these errors without retransmission. FEC gives the receiver the ability to correct errors without needing a reverse channel to request retransmission of data, but at the cost of a fixed, higher forward channel bandwidth. FEC is therefore applied in situations where retransmissions are costly or impossible, such as when broadcasting to multiple receivers in multicast. FEC information is usually added to mass storage devices to enable recovery of corrupted data.

FEC processing in a receiver may be applied to a digital bit stream or in the demodulation of a digitally modulated carrier. For the latter, FEC is an integral part of the initial analog-to-digital conversion in the receiver. The Viterbi decoder implements a soft-decision algorithm to demodulate digital data from an analog signal corrupted by noise. Many FEC coders can also generate a bit-error rate (BER) signal which can be used as feedback to fine-tune the analog receiving electronics.

The maximum fraction of errors or of missing bits that can be corrected is determined by the design of the FEC code, so different forward error correcting codes are suitable for different conditions.

How it works

FEC is accomplished by adding redundancy to the transmitted information using a predetermined algorithm. A redundant bit may be a complex function of many original information bits. The original information may or may not appear literally in the encoded output; codes that include the unmodified input in the output are systematic, while those that do not are non-systematic.

Consider the simplest FEC: transmitting each data bit three times, a (3,1) repetition code. This allows an error in any one of the three samples to be corrected by “majority vote” or “democratic voting”. The correcting ability of this FEC is:

  • Up to 1 bit of a triplet in error, or
  • up to 2 bits of a triplet omitted.

Though simple to implement and widely used, this triple modular redundancy is a relatively inefficient FEC. Better FEC codes typically examine the last several dozen, or even the last several hundred, previously received bits to determine how to decode the current small handful of bits (typically in groups of 2 to 8 bits).
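
A minimal sketch of this triple-redundancy scheme, assuming bits are represented as a Python list:

```python
# (3,1) repetition code: transmit each bit three times and decode each
# received triplet by majority vote. Corrects any single bit error per
# triplet, at the cost of tripling the bandwidth.

def encode(bits):
    return [b for b in bits for _ in range(3)]

def decode(received):
    out = []
    for i in range(0, len(received), 3):
        triplet = received[i:i + 3]
        out.append(1 if sum(triplet) >= 2 else 0)
    return out

message = [1, 0, 1, 1]
codeword = encode(message)          # [1,1,1, 0,0,0, 1,1,1, 1,1,1]
codeword[4] ^= 1                    # corrupt one bit of the second triplet
print(decode(codeword) == message)  # True: the error is voted away
```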

Averaging noise to reduce errors

FEC could be said to work by “averaging noise”; since each data bit affects many transmitted symbols, the corruption of some symbols by noise usually allows the original user data to be extracted from the other, uncorrupted received symbols that also depend on the same user data.

  • Because of this “risk-pooling” effect, digital communication systems that use FEC tend to work well above a certain minimum signal-to-noise ratio and not at all below it.
  • This all-or-nothing tendency — the cliff effect — becomes more pronounced as stronger codes are used that more closely approach the theoretical Shannon limit.
  • Interleaving FEC-coded data can reduce the all-or-nothing properties of transmitted FEC codes when the channel errors tend to occur in bursts (a sketch of a simple block interleaver follows this list). However, this method has limits; it is best used on narrowband data.
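
The following sketch shows the idea behind a simple block interleaver; the 3 × 4 matrix dimensions are arbitrary and chosen only for illustration.

```python
# Block interleaver: write the coded bits row-by-row into a rows x cols
# matrix and read them out column-by-column. A burst of channel errors
# then lands in different rows (different codewords) after deinterleaving.

def interleave(bits, rows, cols):
    assert len(bits) == rows * cols
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(bits, rows, cols):
    assert len(bits) == rows * cols
    return [bits[c * rows + r] for r in range(rows) for c in range(cols)]

data = list(range(12))                   # stand-ins for 12 coded bits
sent = interleave(data, rows=3, cols=4)
received = sent[:]
for i in (4, 5, 6):                      # a 3-symbol burst on the channel
    received[i] = -1                     # -1 marks a corrupted symbol
print(deinterleave(received, rows=3, cols=4))
# [0, 1, -1, 3, 4, -1, 6, 7, 8, -1, 10, 11]  -> the burst is spread apart
```

After deinterleaving, each corrupted symbol falls in a different codeword, so a code that corrects one error per block can now clean up the whole burst.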

Most telecommunication systems use a fixed channel code designed to tolerate the expected worst-case bit error rate, and then fail to work at all if the bit error rate is ever worse. However, some systems adapt to the given channel error conditions: hybrid automatic repeat-request uses a fixed FEC method as long as the FEC can handle the error rate, then switches to ARQ when the error rate gets too high; adaptive modulation and coding uses a variety of FEC rates, adding more error-correction bits per packet when there are higher error rates in the channel, or taking them out when they are not needed.

Types of FEC

The two main categories of FEC codes are block codes and convolutional codes.

  • Block codes work on fixed-size blocks (packets) of bits or symbols. Practical block codes can generally be decoded in polynomial time in their block length.
  • Convolutional codes work on bit or symbol streams of arbitrary length. They are most often decoded with the Viterbi algorithm, though other algorithms are sometimes used. Viterbi decoding allows asymptotically optimal decoding efficiency with increasing constraint length of the convolutional code, but at the expense of exponentially increasing complexity. A convolutional code can be turned into a block code, if desired, by “tail-biting”. A toy Viterbi decoder is sketched after this list.
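
The sketch below illustrates hard-decision Viterbi decoding on a toy rate-1/2, constraint-length-3 convolutional code (the classic generators 7 and 5 in octal). It keeps full survivor paths per state for clarity, which a real decoder would not do.

```python
# Hard-decision Viterbi decoding of a tiny rate-1/2, constraint-length-3
# convolutional code. Illustrative only: real decoders use traceback
# memory and often soft decisions.

G = (0b111, 0b101)   # generator taps (7 and 5 in octal)
K = 3                # constraint length; the state holds K-1 = 2 bits

def parity(x):
    return bin(x).count("1") % 2

def conv_encode(bits):
    """Encode with zero-flushing so the trellis ends in state 0."""
    state = 0
    out = []
    for b in bits + [0] * (K - 1):
        reg = (b << (K - 1)) | state       # newest bit in the high position
        out += [parity(reg & g) for g in G]
        state = reg >> 1
    return out

def viterbi_decode(received):
    n_states = 1 << (K - 1)
    INF = float("inf")
    metric = [0.0] + [INF] * (n_states - 1)    # encoder starts in state 0
    path = [[] for _ in range(n_states)]
    for t in range(0, len(received), 2):
        r = received[t:t + 2]
        new_metric = [INF] * n_states
        new_path = [None] * n_states
        for s in range(n_states):
            if metric[s] == INF:
                continue
            for b in (0, 1):                   # hypothesize the input bit
                reg = (b << (K - 1)) | s
                branch = sum(parity(reg & g) != x for g, x in zip(G, r))
                nxt = reg >> 1
                m = metric[s] + branch
                if m < new_metric[nxt]:        # keep the survivor path
                    new_metric[nxt] = m
                    new_path[nxt] = path[s] + [b]
        metric, path = new_metric, new_path
    best = min(range(n_states), key=lambda s: metric[s])
    return path[best][:-(K - 1)]               # drop the two flush bits

message = [1, 0, 1, 1, 0, 0, 1]
coded = conv_encode(message)
coded[3] ^= 1                                  # flip one channel bit
print(viterbi_decode(coded) == message)        # True: the error is corrected
```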

There are many types of block codes, but among the classical ones the most notable is Reed–Solomon coding, because of its widespread use in compact discs, DVDs, and hard disk drives. Other examples of classical block codes include Golay, BCH, multidimensional parity, and Hamming codes.

Hamming ECC is commonly used to correct NAND flash memory errors. This provides single-bit error correction and 2-bit error detection. Hamming codes are only suitable for more reliable single level cell (SLC) NAND. Denser multi level cell (MLC) NAND requires stronger multi-bit correcting ECC such as BCH or Reed–Solomon.
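
As a sketch of the single-error-correcting, double-error-detecting behaviour described above, here is a toy Hamming(7,4) code extended with one overall parity bit. Real NAND flash controllers use much wider codewords, but the structure is the same.

```python
# Toy Hamming(7,4) code plus one overall parity bit (SECDED): corrects
# any single-bit error and detects (but cannot correct) double errors.

def hamming_encode(d):                   # d = [d1, d2, d3, d4]
    c = [0] * 8                          # positions 1..7 used; 0 unused
    c[3], c[5], c[6], c[7] = d           # data go in non-power-of-2 slots
    c[1] = c[3] ^ c[5] ^ c[7]            # parity over positions with bit 0 set
    c[2] = c[3] ^ c[6] ^ c[7]            # parity over positions with bit 1 set
    c[4] = c[5] ^ c[6] ^ c[7]            # parity over positions with bit 2 set
    overall = 0
    for bit in c[1:]:
        overall ^= bit                   # extra parity bit over the whole word
    return c[1:] + [overall]             # 8 bits on the wire

def hamming_decode(r):                   # r = 8 received bits
    c = [0] + list(r[:7])
    syndrome = 0
    for i in range(1, 8):
        if c[i]:
            syndrome ^= i                # XOR of the indices of all 1-bits
    parity_ok = sum(r) % 2 == 0
    if syndrome and parity_ok:
        return None                      # two errors: detected, uncorrectable
    if syndrome:
        c[syndrome] ^= 1                 # one error: flip the indicated bit
    return [c[3], c[5], c[6], c[7]]

data = [1, 0, 1, 1]
word = hamming_encode(data)
word[2] ^= 1                             # single error is corrected
print(hamming_decode(word) == data)      # True
word[5] ^= 1                             # a second error is detected
print(hamming_decode(word))              # None
```

The trick is that each parity bit sits at a power-of-2 position and covers every position whose index has that bit set, so the XOR of the indices of the failing checks points directly at the erroneous bit.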

Classical block codes are usually implemented using hard-decision algorithms, which means that for every input and output signal a hard decision is made whether it corresponds to a one or a zero bit. In contrast, soft-decision algorithms like the Viterbi decoder process (discretized) analog signals, which allows for much higher error-correction performance than hard-decision decoding.

Nearly all classical block codes apply the algebraic properties of finite fields.

Concatenated FEC codes for improved performance

Classical (algebraic) block codes and convolutional codes are frequently combined in concatenated coding schemes in which a short constraint-length Viterbi-decoded convolutional code does most of the work and a block code (usually Reed-Solomon) with larger symbol size and block length “mops up” any errors made by the convolutional decoder. Single pass decoding with this family of error correction codes can yield very low error rates, but for long range transmission conditions (like deep space) iterative decoding is recommended.

Concatenated codes have been standard practice in satellite and deep space communications since Voyager 2 first used the technique in its 1986 encounter with Uranus. The Galileo craft used iterative concatenated codes to compensate for the very high error rate conditions caused by having a failed antenna.

Low-density parity-check (LDPC)

Low-density parity-check (LDPC) codes are a class of recently re-discovered highly efficient linear block codes. They can provide performance very close to the channel capacity (the theoretical maximum) using an iterated soft-decision decoding approach, at linear time complexity in terms of their block length. Practical implementations can draw heavily from the use of parallelism.
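
To give a feel for decoding against a parity-check matrix, the sketch below runs Gallager’s hard-decision bit-flipping algorithm. Note that the stand-in matrix here (the parity-check matrix of the (7,4) Hamming code) is tiny and dense, whereas a real LDPC matrix is large and sparse, and production decoders use soft-decision message passing rather than bit flipping.

```python
# Toy illustration of decoding on a parity-check matrix H with a
# hard-decision bit-flipping algorithm: repeatedly flip the bit that
# participates in the most unsatisfied parity checks.

H = [  # parity-check matrix of the (7,4) Hamming code, as a stand-in
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def syndrome(word):
    return [sum(h * x for h, x in zip(row, word)) % 2 for row in H]

def bit_flip_decode(word, max_iters=10):
    word = word[:]
    for _ in range(max_iters):
        s = syndrome(word)
        if not any(s):
            return word                   # all parity checks satisfied
        # count, for each bit, how many failing checks involve it
        votes = [sum(s[j] for j in range(len(H)) if H[j][i])
                 for i in range(len(word))]
        worst = max(range(len(word)), key=lambda i: votes[i])
        word[worst] ^= 1                  # flip the most-suspect bit
    return word

codeword = [0, 0, 0, 0, 0, 0, 0]          # the all-zero word is a codeword
received = codeword[:]
received[4] ^= 1                           # one channel error
print(bit_flip_decode(received) == codeword)   # True
```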

LDPC codes were first introduced by Robert G. Gallager in his PhD thesis in 1960, but due to the computational effort of implementing encoders and decoders, and to the introduction of Reed–Solomon codes, they were mostly ignored until recently.

LDPC codes are now used in many recent high-speed communication standards, such as DVB-S2 (Digital video broadcasting), WiMAX (IEEE 802.16e standard for microwave communications), High-Speed Wireless LAN (IEEE 802.11n), 10GBase-T Ethernet (802.3an) and G.hn/G.9960 (ITU-T Standard for networking over power lines, phone lines and coaxial cable). Other LDPC codes are standardized for wireless communication standards within 3GPP MBMS.

Turbo codes

Turbo coding is an iterated soft-decoding scheme that combines two or more relatively simple convolutional codes and an interleaver to produce a block code that can perform to within a fraction of a decibel of the Shannon limit. Predating LDPC codes in terms of practical application, they now provide similar performance.

One of the earliest commercial applications of turbo coding was the CDMA2000 1x (TIA IS-2000) digital cellular technology developed by Qualcomm and sold by Verizon Wireless, Sprint, and other carriers. It is also used for the evolution of CDMA2000 1x specifically for Internet access, 1xEV-DO (TIA IS-856). Like 1x, EV-DO was developed by Qualcomm and is sold by Verizon Wireless, Sprint, and other carriers (Verizon’s marketing name for 1xEV-DO is Broadband Access; Sprint’s consumer and business marketing names for it are Power Vision and Mobile Broadband, respectively).

Local decoding and testing of codes

Sometimes it is only necessary to decode single bits of the message, or to check whether a given signal is a codeword, without looking at the entire signal. This can make sense in a streaming setting, where codewords are too large to be classically decoded fast enough and where only a few bits of the message are currently of interest. Such codes have also become an important tool in computational complexity theory, e.g., for the design of probabilistically checkable proofs.

Locally decodable codes are error-correcting codes for which single bits of the message can be probabilistically recovered by only looking at a small (say constant) number of positions of a codeword, even after the codeword has been corrupted at some constant fraction of positions. Locally testable codes are error-correcting codes for which it can be checked probabilistically whether a signal is close to a codeword by only looking at a small number of positions of the signal.
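
The Hadamard code (listed among the error-correcting codes above) is the classic locally decodable code: each codeword position holds the inner product ⟨m, x⟩ over GF(2) for one index x, so any single message bit can be recovered from just two codeword positions. A minimal sketch, with the trial count chosen arbitrarily:

```python
# Local decoding of the Hadamard code: the codeword lists <m, x> mod 2
# for every x in {0,1}^K. To recover message bit i, query two positions,
# x and x XOR e_i: their XOR equals m_i whenever both are uncorrupted.
import random

K = 4                                   # message length; codeword length 2^K

def inner(m, x):
    return bin(m & x).count("1") % 2    # <m, x> over GF(2)

def hadamard_encode(m):
    return [inner(m, x) for x in range(1 << K)]

def local_decode_bit(codeword, i, trials=15):
    """Recover message bit i with 2 queries per trial, majority over trials."""
    ones = 0
    for _ in range(trials):
        x = random.randrange(1 << K)
        ones += codeword[x] ^ codeword[x ^ (1 << i)]
    return int(ones > trials // 2)

m = 0b1011
cw = hadamard_encode(m)
for pos in random.sample(range(1 << K), 2):    # corrupt a small fraction
    cw[pos] ^= 1
print([local_decode_bit(cw, i) for i in range(K)])
# usually [1, 1, 0, 1], i.e. m read out bit by bit (LSB first)
```

Since each trial touches only two of the sixteen positions, each query pair avoids the corrupted positions with high probability, which is exactly the local-decoding guarantee described above.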