www.ackadia.com


» System Architecture - A look at memory types over the years «

Introduction


Memory

Main memory

The storage device used by a computer to hold the currently executing program and its working data. A modern computer's main memory is built from random access memory integrated circuits. In the old days ferrite core memory was one popular form of main memory, leading to the use of the term "core" for main memory. Computers have several other sorts of memory, distinguished by their access time, storage capacity, cost, and the typical lifetime or rate of change of the data they hold. Registers in the CPU are fast, few, expensive and typically change every few machine instructions. Other kinds are cache, PROM, magnetic disk (which may be used for virtual memory), and magnetic tape.

The system memory is the place where the computer holds current programs and data that are in use. Because of the demands made by increasingly powerful software, system memory requirements have been accelerating at an alarming pace over the last few years. The result is that modern computers have significantly more memory than the first PCs of the early 1980s, and this has had an effect on development of the PC's architecture. Storing and retrieving data from a large block of memory is more time-consuming than from a small block. RAM is an impermanent store of data, but it is the main memory area into which data from the hard disk is loaded. It acts, so to speak, as a staging post between the hard disk and the processor. In general, the more data it is possible to have available in RAM, the faster the PC will run.

Main memory is attached to the processor via its address and data buses. Each bus consists of a number of electrical lines, each carrying one bit. The width of the address bus dictates how many different memory locations can be accessed, and the width of the data bus how much information is transferred in each access. Every time a bit is added to the width of the address bus, the address range doubles. Intel processors from the 386DX to the Pentium have 32-bit address buses, enabling them to access up to 4GB of memory (the 386SX is limited to 24 address lines, and the Pentium Pro and later extend to 36). Processors from the Pentium onwards have 64-bit data buses, so they can access 8 bytes of data at a time.
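The doubling rule can be checked with a few lines of arithmetic. This is only an illustrative sketch: the function names are invented for the example, and the byte totals assume that each address selects one byte.

```python
def addressable_locations(address_bits: int) -> int:
    """Number of distinct locations an address bus of this width can select."""
    return 2 ** address_bits


def bytes_per_transfer(data_bits: int) -> int:
    """Bytes moved in one access on a data bus of this width."""
    return data_bits // 8


if __name__ == "__main__":
    for bits in (20, 24, 32):
        print(bits, "address bits ->", addressable_locations(bits), "locations")
    print("64-bit data bus ->", bytes_per_transfer(64), "bytes per access")
```

With 32 address bits the result is 4,294,967,296 locations, which is the 4GB figure quoted above when each location holds one byte.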

Cache Memory

A small fast memory holding recently accessed data, designed to speed up subsequent access to the same data. Most often applied to processor-memory access but also used for a local copy of data accessible over a network etc.

When data is read from, or written to, main memory a copy is also saved in the cache, along with the associated main memory address. The cache monitors addresses of subsequent reads to see if the required data is already in the cache. If it is (a cache hit) then it is returned immediately and the main memory read is aborted (or not started). If the data is not cached (a cache miss) then it is fetched from main memory and also saved in the cache. The hit rate, meaning the proportion of accesses that are hits, depends on the cache design but mostly on its size relative to the main memory. The size is limited by the cost of fast memory chips.

The hit rate also depends on the access pattern of the particular program being run (the sequence of addresses being read and written). Caches rely on two properties of the access patterns of most programs: temporal locality - if something is accessed once, it is likely to be accessed again soon, and spatial locality - if one memory location is accessed then nearby memory locations are also likely to be accessed. In order to exploit spatial locality, caches often operate on several words at a time, a "cache line" or "cache block". Main memory reads and writes are whole cache lines.
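The effect of access pattern on hit rate can be seen with a toy model. The sketch below assumes a direct-mapped cache (each memory line can live in only one slot), which is just one possible design; the class name, the 64-line size and the 16-byte line length are all made up for illustration, and no data is actually stored.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that tracks hits and misses only (no data)."""

    def __init__(self, num_lines: int, line_size: int) -> None:
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines  # tag currently held in each slot
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Record one access; return True on a hit."""
        line_number = address // self.line_size  # which memory line is wanted
        slot = line_number % self.num_lines      # the only slot it may occupy
        tag = line_number // self.num_lines      # identifies the line in that slot
        if self.tags[slot] == tag:
            self.hits += 1
            return True
        self.tags[slot] = tag                    # miss: the whole line is fetched
        self.misses += 1
        return False

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def run(addresses, num_lines: int = 64, line_size: int = 16) -> float:
    """Replay a sequence of byte addresses and return the resulting hit rate."""
    cache = DirectMappedCache(num_lines, line_size)
    for address in addresses:
        cache.access(address)
    return cache.hit_rate()


if __name__ == "__main__":
    print("sequential bytes :", run(range(1024)))
    print("stride of 16     :", run(range(0, 16384, 16)))
    print("same loop twice  :", run(list(range(256)) * 2))
```

Reading consecutive bytes misses once per line and hits on the other fifteen bytes (spatial locality), a stride equal to the line length never reuses a line, and repeating a small loop hits on everything the second time round (temporal locality).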

When the processor wants to write to main memory, the data is first written to the cache on the assumption that the processor will probably read it again soon. Various different policies are used. In a write-through cache, data is written to main memory at the same time as it is cached. In a write-back cache it is only written to main memory when it is forced out of the cache.

If all accesses were writes then, with a write-through policy, every write to the cache would necessitate a main memory write, thus slowing the system down to main memory speed. However, statistically, most accesses are reads and most of these will be satisfied from the cache. Write-through is simpler than write-back because an entry that is to be replaced can just be overwritten in the cache, as it will already have been copied to main memory. Write-back, by contrast, requires the cache to initiate a main memory write of the flushed entry, followed (for a processor read) by a main memory read. However, write-back is more efficient because an entry may be written many times in the cache without a main memory access.
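The difference between the two policies shows up when the main memory writes are counted. The following is a deliberately tiny sketch with a single-entry cache; the function name and the trace format are invented for the example.

```python
def main_memory_writes(accesses, policy: str) -> int:
    """Count main-memory writes for a one-line cache under a write policy.

    accesses is a sequence of ("r" or "w", line_number) pairs.
    policy is "write-through" or "write-back".
    """
    cached_line = None  # line currently held in the single cache slot
    dirty = False       # write-back only: modified since it was loaded
    writes = 0
    for op, line in accesses:
        if line != cached_line:
            # The entry is being replaced: write-back must flush a dirty line.
            if policy == "write-back" and dirty:
                writes += 1
            cached_line, dirty = line, False
        if op == "w":
            if policy == "write-through":
                writes += 1   # every write also goes to main memory
            else:
                dirty = True  # deferred until the line is forced out
    if policy == "write-back" and dirty:
        writes += 1           # final flush so main memory ends up current
    return writes


if __name__ == "__main__":
    trace = [("w", 0)] * 10 + [("r", 1)]
    for policy in ("write-through", "write-back"):
        print(policy, main_memory_writes(trace, policy))
```

Ten writes to the same line followed by a read elsewhere cost ten main memory writes under write-through but only one under write-back, which is the efficiency gain described above.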

When the cache is full and it is desired to cache another line of data then a cache entry is selected to be written back to main memory or "flushed". The new line is then put in its place. Which entry is chosen to be flushed is determined by a "replacement algorithm".
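The text does not say which replacement algorithm is used. One common choice is "least recently used" (LRU), sketched below as an assumption rather than as the method of any particular processor; the class and attribute names are illustrative only.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative toy cache that flushes the least recently used line."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.lines = OrderedDict()  # line number -> placeholder, oldest first
        self.evicted = []           # flushed lines, kept for inspection

    def access(self, line: int) -> bool:
        """Touch a line; return True on a hit."""
        if line in self.lines:
            self.lines.move_to_end(line)  # now the most recently used
            return True
        if len(self.lines) >= self.capacity:
            victim, _ = self.lines.popitem(last=False)  # oldest entry goes
            self.evicted.append(victim)
        self.lines[line] = True
        return False


if __name__ == "__main__":
    cache = LRUCache(capacity=2)
    print([cache.access(line) for line in (1, 2, 1, 3, 2)])
    print("flushed:", cache.evicted)
```

In the usage example line 2 is flushed first, because line 1 was touched more recently, and line 1 is flushed next when line 2 is requested again.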

Some processors have separate instruction and data caches. Both can be active at the same time, allowing an instruction fetch to overlap with a data read or write. This separation also avoids the possibility of bad cache conflict between, say, the instructions in a loop and some data in an array which is accessed by that loop.

Primary cache (L1 cache, level one cache)
A small, fast cache memory inside or close to the CPU chip. For example, an Intel 80486 has an eight-kilobyte on-chip cache, and most Pentiums have a 16-KB on-chip level one cache that consists of an 8-KB instruction cache and an 8-KB data cache.

Secondary cache ("Second level cache", "level two cache", "L2 cache")
A larger, slower cache between the primary cache and main memory. Whereas the primary cache is often on the same integrated circuit as the central processing unit (CPU), a secondary cache is usually connected to the CPU via its external bus.

Memory (asynchronous)

Asynchronous Operation
An asynchronous interface is one where a minimum period of time is determined to be necessary to ensure an operation is complete. Each of the internal operations of an asynchronous DRAM chip is assigned a minimum time value, so that if a clock cycle occurs any time prior to that minimum time, another cycle must occur before the next operation is allowed to begin.
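In practice this means an operation always costs a whole number of clock cycles, rounded up. A minimal sketch of the arithmetic, with a hypothetical function name and example timings that are not taken from any datasheet:

```python
import math


def cycles_needed(min_time_ns: float, clock_mhz: float) -> int:
    """Whole clock cycles that must elapse to cover an operation's minimum time."""
    # One cycle lasts 1000 / clock_mhz nanoseconds, so the cycle count is
    # min_time_ns / (1000 / clock_mhz), rounded up to the next whole cycle.
    return math.ceil(min_time_ns * clock_mhz / 1000)


if __name__ == "__main__":
    for clock in (33, 66, 100):
        print(f"60ns operation at {clock}MHz:", cycles_needed(60, clock), "cycles")
```

A 60ns operation fits in 2 cycles at 33MHz but needs 4 at 66MHz, because 3 cycles of roughly 15ns fall just short of the minimum.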


Basic DRAM operation (Dynamic Random Access Memory)
16Kbit chips were introduced in 1980, 64Kbit in 1982, 256Kbit in 1984 and 1Mbit in 1988.

A DRAM memory array can be thought of as a table of cells. These cells are made up of capacitors, and contain one or more 'bits' of data, depending upon the chip configuration. This table is addressed via row and column decoders, which in turn receive their signals from the RAS\ (Row Address Strobe) and CAS\ (Column Address Strobe) clock generators. In order to minimise the package size, the row and column addresses are multiplexed into row and column address buffers. For example, if there are 11 address lines, there will be 11 row and 11 column address buffers. Access transistors called 'sense amps' are connected to each column and provide the read and restore operations of the chip. Since the cells are capacitors that discharge for each read operation, the sense amp must restore the data before the end of the access cycle.
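The multiplexing described above amounts to sending one cell address over the same pins in two halves. In the sketch below the function names are invented, and placing the row in the upper half of the address is an assumption made for the example.

```python
def split_address(cell_address: int, address_lines: int = 11):
    """Split a flat cell address into the (row, column) pair sent over shared pins."""
    mask = (1 << address_lines) - 1
    row = (cell_address >> address_lines) & mask  # presented first, latched by RAS
    column = cell_address & mask                  # presented second, latched by CAS
    return row, column


def cells_addressable(address_lines: int = 11) -> int:
    """Cells reachable when the same pins carry the row, then the column."""
    return (1 << address_lines) ** 2


if __name__ == "__main__":
    print(split_address((5 << 11) | 7))
    print(cells_addressable(11), "cells from 11 address pins")
```

With 11 address lines the pins are used twice, giving 22 address bits in total and 4,194,304 addressable cells from a package with only 11 address pins.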

The capacitors used for data cells tend to bleed off their charge, and therefore require a periodic refresh cycle or data will be lost. A refresh controller determines the time between refresh cycles, and a refresh counter ensures that the entire array (all rows) is refreshed. Of course, this means that some cycles are used for refresh operations, which has some impact on performance.

A typical memory access would occur as follows. First, the row address bits are placed onto the address pins. After a period of time the RAS\ signal falls, which activates the sense amps and causes the row address to be latched into the row address buffer. When the RAS\ signal stabilises, the selected row is transferred onto the sense amps. Next, the column address bits are set up, and then latched into the column address buffer when CAS\ falls, at which time the output buffer is also turned on. When CAS\ stabilises, the selected sense amp feeds its data onto the output buffer.

Page Mode Access
By implementing special access modes, it was possible to eliminate some of the internal operations. The RAS\ signal could be held active so that an entire 'page' of data is held on the sense amps. New column addresses can then be repeatedly clocked in just by cycling CAS\. Since the row address setup and hold times are eliminated, much faster random access reads within the page were possible.
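The saving can be estimated with a simple cost model: a full access pays for the row and the column, while a page mode access to the row already held on the sense amps pays for the column only. Everything in this sketch is hypothetical, including the function name, the 40ns and 20ns timings and the choice of 11 column bits.

```python
def total_access_time(addresses, t_row_ns, t_col_ns, page_mode, address_lines=11):
    """Sum the access time for a sequence of cell addresses, in nanoseconds."""
    open_row = None  # row currently held on the sense amps, if any
    total = 0
    for address in addresses:
        row = address >> address_lines
        if page_mode and row == open_row:
            total += t_col_ns             # only CAS is cycled
        else:
            total += t_row_ns + t_col_ns  # full RAS then CAS sequence
            open_row = row
    return total


if __name__ == "__main__":
    same_row = [0, 1, 2, 3]
    print("normal   :", total_access_time(same_row, 40, 20, page_mode=False), "ns")
    print("page mode:", total_access_time(same_row, 40, 20, page_mode=True), "ns")
```

Four reads from the same row take 240ns with a full cycle each but 120ns in page mode under these assumed timings. Reads that jump between rows gain nothing.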

Fast Page Mode
This design eliminated the column address setup time during the page cycle by activating the column address buffers on the falling edge of RAS\ (rather than CAS\). Since RAS\ remains low for the entire page cycle, this acts as a transparent latch when CAS\ is high, and allows address setup to occur as soon as the column address is valid, rather than waiting for CAS\ to fall.

Fast page mode also reduced power consumption, mainly because sense and restore current were not necessary during page mode access. It still had some drawbacks. The output buffers turn off when CAS\ goes high. The minimum cycle time is 5ns before the output buffers turn off, which essentially adds at least 5ns to the cycle time.

EDO (Extended Data Output) or Hyperpage RAM
EDO no longer turned off the output buffers upon the rising edge of CAS\. In essence, this eliminated the column pre-charge time while latching the data out. This allows the minimum time for CAS\ to be low to be reduced, and the rising edge can come earlier.

EDO was typically rated at a 70ns access time, before 60ns became standard (50ns parts were available at a premium). In addition to much better access times, it used the same amount of silicon and the same package size.

Parity Memory
Parity checking, found on some SIMMs, refers to the way a computer ensures that stored data is not corrupted. SIMMs with no parity use eight bits to store each byte of data, while SIMMs with parity checking dedicate an additional ninth bit specifically for error detection. Parity is mainly found on older machines since newer RAM chips are more reliable, making parity checking largely unnecessary.
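The ninth bit is simple to compute. The sketch below assumes even parity (the ninth bit makes the total count of 1s even); real hardware may use odd parity instead, and the function names are illustrative.

```python
def parity_bit(byte: int) -> int:
    """Ninth bit that makes the total number of 1s even (even parity assumed)."""
    return bin(byte & 0xFF).count("1") % 2


def check(byte: int, stored_parity: int) -> bool:
    """Return True if the byte still agrees with the parity bit stored with it."""
    return parity_bit(byte) == stored_parity


if __name__ == "__main__":
    value = 0b10110010
    stored = parity_bit(value)
    print("intact byte      :", check(value, stored))
    print("one bit flipped  :", check(value ^ 0b00000100, stored))
    print("two bits flipped :", check(value ^ 0b00000110, stored))
```

A single flipped bit is caught, but two flipped bits cancel out and go unnoticed. Parity detects errors; it cannot say which bit is wrong or correct it.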

SIP - Single In-line Package
These contain a complete RAM bank and were the earlier type of memory module used with PCs.

30 pin SIMMs - Single In-line Memory Module

A 30 pin SIMM also contains a complete RAM bank. The first SIMMs and SIPs had 30 pads/pins. Available capacities were 256KB, 512KB, 1MB, 2MB and 4MB.

They were generally rated at 80ns or slower, before 70ns became common, and came in two forms:

  • 9-chip SIMM: 9 chips, each 1 bit wide
  • 3-chip SIMM: 2 chips 4 bits wide plus 1 chip 1 bit wide, or 3 chips each 3 bits wide

If the correct refresh is supplied, SIMMs with a different number of chips and different speed can be used together.

72 pin SIMMs were mostly used in 486 class and higher Personal Computers.

Available capacities: 1, 2, 4, 8, 16, 32 and (both rare & hugely expensive) 64 Mbytes.

These were 32 bits and 4 parity bits wide. 4 pins are assigned for speed detection.


Memory (synchronous)


Synchronous Operation
Once it became apparent that bus speeds would need to run faster than 66MHz, DRAM designers needed to find a way to overcome the significant latency issues that still existed. By implementing a synchronous interface, they were able to do this and gain some additional advantages as well.

With an asynchronous interface, the processor must wait idly for the DRAM to complete its internal operations, which typically takes about 60ns. With synchronous control, the DRAM latches information from the processor under control of the system clock. These latches store the addresses, data and control signals, which allows the processor to handle other tasks. After a specific number of clock cycles the data becomes available and the processor can read it from the output lines.

Another advantage is that the system clock is the only timing edge that needs to be provided to the DRAM, eliminating the need for multiple timing strobes to be propagated. The inputs are simplified as well, since the control signals, addresses and data can all be latched in without the processor monitoring setup and hold timings. Similar benefits are realised for output operations. All DRAMs that have a synchronous interface are known generically as SDRAM.

DIMM (Dual In-line Memory Module)
64 bit memory module with 168 pads. When installed in pairs, DIMMs support interleaved memory, with a 128-bit data path.

SDRAM (Synchronous DRAM)
168 pin DIMMs, clock synchronised with the processor busses.

SDRAM chips are officially rated in MHz, rather than nanoseconds (ns), so that there is a common denominator between the bus speed and the chip speed. The MHz figure is obtained by dividing 1 second (1 billion ns) by the chip's cycle time. For example, a 67MHz SDRAM chip (PC66) has a cycle time of 15ns. Note that this nanosecond rating is not measuring the same timing as an asynchronous DRAM chip. Remember, internally all DRAM operates in a very similar manner, and most performance gains are achieved by 'hiding' the internal operations in various ways.
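The conversion between the two ratings is a single division in each direction. A small sketch, with function names chosen for the example:

```python
def clock_mhz(cycle_time_ns: float) -> float:
    """Clock rating in MHz for a chip with the given cycle time."""
    return 1000.0 / cycle_time_ns  # 1 second = 1e9 ns, 1 MHz = 1e6 cycles/s


def cycle_time_ns(mhz: float) -> float:
    """Cycle time in nanoseconds for a chip with the given clock rating."""
    return 1000.0 / mhz


if __name__ == "__main__":
    print("15ns ->", round(clock_mhz(15)), "MHz")
    for mhz in (66, 100, 133):
        print(mhz, "MHz ->", round(cycle_time_ns(mhz), 1), "ns")
```

A 15ns part works out to about 67MHz, while 100MHz and 133MHz parts correspond to cycle times of 10ns and roughly 7.5ns.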

Originally available for a 66MHz memory bus, this was eventually replaced by the 100MHz version, launched around May 1998 for use with the Intel BX chipset, and the latest at 133MHz, used with the (delayed) Camino chipset and VIA Technologies chipsets.

PC100 SDRAM on a 100MHz (or faster) system bus will provide a performance boost for Socket 7 systems of between 10% and 15%, since the L2 cache is running at system bus speed. Pentium II systems will not see as big a boost, because the L2 cache is running at ½ processor speed anyway, with the exception of the cacheless Celeron chips of course.

PC133 (133MHz) SDRAM is capable of transferring data at up to about 1.06GBps (133MHz x 8 bytes). Before CPUs became multiplier locked, 133MHz was seen as The Holy Grail, and many hundreds of dollars were spent on SDRAM as users strove for maximum memory + performance. Due to locking, gone are the days when you could run (burn out) your Pentium II 333's at 3 x 133 (500MHz).

DRDRAM (Direct Rambus DRAM)
This is a totally new RAM architecture, complete with bus mastering (the Rambus Channel Master) and a new pathway (the Rambus Channel) between memory devices (the Rambus Channel Slaves). On the surface, this looks to be a very fast solution for system memory due to its fast operation (up to 800MHz). The reality is, however, that the design is only up to twice as fast as current SDRAM operation due to the smaller bus width (16 bits vs. 64 bits).

A Direct Rambus channel includes a controller and one or more Direct RDRAMs connected together via a common bus, which can also connect to devices such as micro-processors, digital signal processors, graphics processors and ASICs. The controller is located at one end, and the RDRAMs are distributed along the bus, which is parallel terminated at the far end. The two-byte wide channel uses a small number of very high speed signals to carry all address, data and control information at up to 800MHz. The signalling technology is called Rambus Signalling Logic (RSL). Each RSL signal wire has equal loading and fan-out, and the wires are routed parallel to each other on the top layer of a PCB with a ground plane located on the layer underneath. Through continuous incremental improvement, signalling data rates are expected to increase by about 100MHz a year to reach a speed of around 1000MHz by the year 2001.

At current speeds a single channel is capable of data transfer at 1.6GBps and multiple channels can be used in parallel to achieve a throughput of up to 6.4GBps. The new architecture will be capable of operating at a system bus speed of up to 133MHz.
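These throughput figures follow from bus width multiplied by transfer rate. The sketch below uses an invented function name and counts 1MB as one million bytes; the SDRAM rows are included only for comparison.

```python
def peak_bandwidth_mbps(bus_bits: int, transfers_mhz: float, channels: int = 1) -> float:
    """Peak transfer rate in megabytes per second (1 MB = 10**6 bytes)."""
    return bus_bits / 8 * transfers_mhz * channels


if __name__ == "__main__":
    print("Direct Rambus, 1 channel :", peak_bandwidth_mbps(16, 800))
    print("Direct Rambus, 4 channels:", peak_bandwidth_mbps(16, 800, channels=4))
    print("PC100 SDRAM (64-bit)     :", peak_bandwidth_mbps(64, 100))
    print("PC133 SDRAM (64-bit)     :", peak_bandwidth_mbps(64, 133))
```

A 16-bit channel at 800MHz gives 1600MBps (1.6GBps) and four channels give 6.4GBps. A 64-bit SDRAM bus at 100MHz peaks at 800MBps, which is where the "only up to twice as fast" comparison comes from. These are peak figures and say nothing about latency.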

Despite the claims from Intel and Rambus, Inc., there are some potentially serious issues which need to be addressed with this technology. The higher speeds require short wire lengths and additional shielding to prevent problems with EMI. In addition, latency times are actually worse than currently available fast SDRAM. Since most of today's applications do not actually utilise the full bandwidth of the memory bus, simply increasing the bandwidth while ignoring latency issues will likely not provide any real performance improvements. In addition, processors operating with 800MHz bus speeds will certainly require more than double the current memory bandwidth.

It must be noted that on 27th September 1999 Intel aborted the world-wide release of motherboards based on this technology - at the final hour - due to problems actually implementing it!


Memory (ROM and static)


ROM (Read Only Memory)
A type of data storage device which is manufactured with fixed contents. In its most general sense, the term might be used for any storage system whose contents cannot be altered, such as a gramophone record or a printed book; however, the term is most often applied to semiconductor integrated circuit memories, of which there are several types, and CD-ROM.

ROM is inherently non-volatile storage - it retains its contents even when the power is switched off, in contrast to RAM.

ROM is often used to hold programs for embedded systems since these usually have a fixed purpose. ROM is also used for storage of the lowest level bootstrap software (firmware) in a computer.

PROM (Programmable Read Only Memory)
A kind of ROM which can be written using a PROM programmer. The content of each bit is determined by a fuse or antifuse. The memory can be programmed once after manufacturing by "blowing" the fuses, which is an irreversible process. Blowing a fuse opens a connection while blowing an antifuse closes a connection (hence the name). Programming is done by applying high-voltage pulses which are not encountered during normal operation.

EPROM (Erasable Programmable Read Only Memory)
Erased by exposing the EPROM to ultraviolet light.
A type of storage device in which the data is determined by electrical charge stored in an isolated ("floating") MOS transistor gate. The isolation is good enough to retain the charge almost indefinitely (more than ten years) without an external power supply. The EPROM is programmed by "injecting" charge into the floating gate, using a technique based on the tunnel effect. This requires higher voltage than in normal operation (usually 12V - 25V). The floating gate can be discharged by applying ultraviolet light to the chip's surface through a quartz window in the package, erasing the memory contents and allowing the chip to be reprogrammed.

EEPROM (Electrically Erasable Programmable Read Only Memory)

Early 8k EEPROM - just $220 CAN

STATIC RAM
Random access memory in which each bit of storage is a bistable flip-flop, commonly consisting of cross-coupled inverters. It is called "static" because it will retain a value as long as power is supplied, unlike dynamic random access memory (DRAM) which must be regularly refreshed. It is, however, still volatile, i.e. it will lose its contents when the power is switched off, in contrast to ROM. SRAM is usually faster than DRAM, but since each bit requires several transistors (about six) you can get fewer bits of SRAM in the same area. It usually costs more per bit than DRAM and so is used for the most speed-critical parts of a computer (e.g. cache memory) or other circuit.