Parity Memory
Error Correcting Code Memory
Results of Mixing Parity and ECC
Error Correcting Code-Parity (ECC-P) Memory
Requirements for Using ECC-P
Enabling ECC-P
Performance, ECC vs. ECC-P vs. EOS
ECC on SIMMs (EOS) Memory
Mixing EOS and Parity?
Memory Parity Errors: Causes and Suggestions

Preface: This is written for clone systems. Some of it is not exactly true for Micro Channel systems. However, you will recognize some of the reasons for Traps under OS/2 and for the odd memory errors that seem incomprehensible.

From M$, KB Article Q101272
1. Memory not functioning at the specified
access rate as required by system board.
2. Memory meets specs, but speeds are different
between SIMMs.
3. Individual chips on SIMM module run at different
access rates.
4. One of the memory chips is being affected
by "cell leakage."
5. Cache memory is another thing to suspect.
In general, you should first carefully clean the system of dust.
This includes the areas allowing ventilation so that heat does not build
up abnormally. The contacts of all boards and SIMMs should be cleaned.
You can use the eraser of a pencil to do this, thus ensuring good contacts.
From Dr. Jim
Not a good plan if the contacts are gold plated. A pencil eraser will strip away some of the gold. The edge of a dollar bill, or a piece of good quality bond typing paper, folded over and rubbed briskly over the contacts, seems to work very well.
Parity Memory

Parity memory is standard IBM memory with 32 bits of data space and 4 bits of parity information (one check bit per byte of data). The 4 bits of parity information can tell you that an error has occurred, but they do not carry enough information to locate which bit is in error. In the event of a parity error, the system generates a non-maskable interrupt (NMI), which halts the system. Double-bit errors go undetected with parity memory.

Error Correcting Code Memory

The requirements for system memory in PC servers have increased dramatically over the past few years. Reasons include the availability of 32-bit operating systems and the caching of hard disk data on file servers. As system memory is increased, the possibility of memory errors increases. Thus, protection against system memory failures becomes increasingly important.

Traditionally, systems which implement only parity memory halt on single-bit errors and fail to detect double-bit errors entirely. Clearly, as memory is increased, better techniques are required. To combat this problem, the IBM PC Servers employ schemes to detect and correct memory errors. These schemes are called Error Correcting Code (or sometimes Error Checking and Correcting, but more commonly just ECC). ECC can detect and correct single-bit errors, detect double-bit errors, and detect some triple-bit errors.

ECC works like parity by generating extra check bits with the data as it is stored in memory. However, while parity uses only 1 check bit per byte of data, ECC uses 7 check bits for a 32-bit word and 8 check bits for a 64-bit word. These extra check bits, along with a special hardware algorithm, allow single-bit errors to be detected and corrected in real time as the data is read from memory.
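To make the comparison concrete, here is a minimal Python sketch. It assumes even parity for the per-byte check bit (the detection behaviour is the same with odd parity) and assumes the ECC is a Hamming-style single-error-correct, double-error-detect code, which is the usual way to arrive at the 7 and 8 check bits quoted above. The function names are made up for this illustration; this is not IBM's controller logic.

```python
def byte_parity(byte: int) -> int:
    """Return the even-parity check bit for one 8-bit value (assumed scheme)."""
    return bin(byte & 0xFF).count("1") % 2


def parity_detects(stored: int, read_back: int) -> bool:
    """True if the check bit saved at write time disagrees with the data read back."""
    return byte_parity(stored) != byte_parity(read_back)


def sec_ded_check_bits(data_bits: int) -> int:
    """Check bits needed by a Hamming SEC-DED code over data_bits of data."""
    r = 1
    # Single-error correction needs 2**r >= data_bits + r + 1.
    while 2 ** r < data_bits + r + 1:
        r += 1
    # One extra overall parity bit adds double-error detection.
    return r + 1


if __name__ == "__main__":
    original = 0b10110010
    print("one bit flipped, detected:", parity_detects(original, original ^ 0b00000100))
    print("two bits flipped, detected:", parity_detects(original, original ^ 0b00010100))
    for width in (32, 64):
        print(width, "data bits need", sec_ded_check_bits(width), "check bits")
```

Note that four parity bits per 32-bit word cost 12.5% extra storage, and 8 ECC check bits per 64-bit word cost exactly the same 12.5%, which is what makes ECC-P (described further down) possible on ordinary parity SIMMs.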
The data is scanned as it is written to memory. This scan generates a unique
7-bit pattern which represents the data stored. This pattern is then stored
in the 7-bit check space.
If a single-bit error has occurred (the most common form
of error), the scan will always detect it, automatically correct it and
record its occurrence. In this case, system operation will not be affected.
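The write-time scan and read-time correction described above can be sketched with a textbook extended Hamming code. This is an illustrative assumption, not the actual pattern generated by IBM's hardware: the bit layout, the `encode` and `decode` names, and the status strings are all made up for this example. For a 32-bit word it does produce 7 check bits, matching the 7-bit check space mentioned above.

```python
from typing import List, Optional, Tuple


def encode(data: int, data_bits: int) -> List[int]:
    """Return a SEC-DED codeword as a bit list; index 0 is the overall parity bit."""
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    n = data_bits + r
    code = [0] * (n + 1)
    bit = 0
    for pos in range(1, n + 1):
        if pos & (pos - 1):              # not a power of two: a data position
            code[pos] = (data >> bit) & 1
            bit += 1
    for i in range(r):                   # fill check bits at positions 1, 2, 4, ...
        p = 1 << i
        parity = 0
        for pos in range(1, n + 1):
            if pos & p:
                parity ^= code[pos]
        code[p] = parity
    code[0] = sum(code) % 2              # overall parity over everything else
    return code


def decode(code: List[int]) -> Tuple[str, Optional[int]]:
    """Check a codeword; return (status, data), with data None if uncorrectable."""
    code = list(code)
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos              # XOR of set positions points at a bad bit
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        if syndrome > n:                 # odd multi-bit error caught by chance
            return "uncorrectable error", None
        code[syndrome] ^= 1              # syndrome 0 means the overall bit itself
        status = "corrected"
    else:
        return "double-bit error", None
    data, bit = 0, 0
    for pos in range(1, n + 1):
        if pos & (pos - 1):
            data |= code[pos] << bit
            bit += 1
    return status, data


if __name__ == "__main__":
    word = 0xDEADBEEF
    stored = encode(word, 32)
    damaged = list(stored)
    damaged[17] ^= 1                     # simulate one flipped memory cell
    print(decode(damaged))
```

A real controller would do the same work in parallel logic and log the corrected error; the list-of-bits form here is chosen only for readability.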
Results of Mixing Parity and ECC Memory

From Stephan Goll
My box (95A) showed the expected memory error. I didn't know that I had mixed ECC and parity (bankwise), so I ran the memory tests. This procedure told me what I had done and disabled the ECC-equipped banks, and after that the box ran fine with reduced memory. I believe the reason is that the first bank determines the type of RAM the box expects to see. Btw, I realized that the memory in the first bank is tested more intensively than in the other banks: I have memory modules that fail there, but they work very well in one of the other banks, even in the memory tests and under Linux.

Error Correcting Code-Parity (ECC-P) Memory

Previous IBM servers such as the IBM Server 85 were able to use standard memory to implement what is known as ECC-P. ECC-P takes advantage of the fact that a 64-bit word needs 8 bits of parity in order to detect single-bit errors (one bit per byte of data). Since it is also possible to use an ECC algorithm on 64 bits of data with 8 check bits, IBM designed a memory controller which implements the ECC algorithm using standard memory SIMMs. The following shows the implementation of ECC-P.

When ECC-P is enabled via the reference diskette, the controller reads/writes two 32-bit words and 8 bits of check information to standard parity memory. Since 8 check bits are available on a 64-bit word, the system is able to correct single-bit errors and detect double-bit errors just like ECC memory.
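The storage arithmetic behind ECC-P can be sketched in a few lines of Python. Two 36-bit parity units (32 data bits plus 4 parity bits each) offer 64 data bits and 8 spare bits, exactly what a 64-bit ECC word needs. The split shown here (low word and low check nibble on the first SIMM, high word and high nibble on the second) is purely an assumption for illustration; the source does not describe how the Server 85 controller actually distributes the bits, and the function names are hypothetical.

```python
from typing import Tuple

Unit = Tuple[int, int]  # (32 data bits, 4 bits stored where parity would go)


def split_for_parity_simms(data64: int, check8: int) -> Tuple[Unit, Unit]:
    """Split 64 data bits and 8 check bits across a matched pair of 36-bit units."""
    low_word = data64 & 0xFFFFFFFF
    high_word = (data64 >> 32) & 0xFFFFFFFF
    low_nibble = check8 & 0xF
    high_nibble = (check8 >> 4) & 0xF
    return (low_word, low_nibble), (high_word, high_nibble)


def join_from_parity_simms(unit_a: Unit, unit_b: Unit) -> Tuple[int, int]:
    """Reassemble the 64-bit word and its 8 check bits from the matched pair."""
    data64 = unit_a[0] | (unit_b[0] << 32)
    check8 = unit_a[1] | (unit_b[1] << 4)
    return data64, check8


if __name__ == "__main__":
    pair = split_for_parity_simms(0x0123456789ABCDEF, 0xA5)
    print(pair)
    print(join_from_parity_simms(*pair))
```

Because each ECC word spans both SIMMs of a pair, an uncorrectable error cannot be pinned on one module, which fits the pairwise handling of SIMMs in the Server 85 implementation.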
While ECC-P uses standard, inexpensive memory, it needs a specific memory controller that is able to read/write the two memory blocks and to check and generate the check bits. Also, the additional logic necessary to implement the ECC circuitry makes it slightly slower than true ECC memory. With the Server 85 ECC-P implementation, the system views memory as matched pairs of SIMMs and, in case of a double-bit failure, will deallocate both SIMMs in a matched pair. Since the price gap between standard memory and ECC memory has narrowed, IBM no longer implements ECC-P.

NOTE! Parity and ECC-on-SIMM memory cannot be installed within the same system.

Requirements to use ECC-P

Enabling ECC-P

Performance Degradation
As previously discussed, systems which employ ECC memory have slightly longer memory access times depending on where the checking is done. It should be stressed that this affects only the access time of external system memory, not L1 or L2 caches. The following table shows the performance impacts as a percentage of system memory access times of the different ECC memory solutions. Again, these numbers represent only the impact to accessing
external memory. They do not represent the impact to overall system performance
which is harder to measure but will be substantially less.
ECC on SIMMs (EOS) Memory

A server that supports one hundred or more users can justify the additional cost necessary to implement ECC on the system. It is harder to justify this cost for smaller configurations. It would be desirable for a customer to be able to upgrade his system at a reasonable cost to take advantage of ECC memory as his business grows. The problem is that the ECC and ECC-P techniques previously described use special memory controllers embedded on the planar board which contain the ECC circuits. It is impossible to upgrade a system employing parity memory (with a parity memory controller) to ECC, even if we upgrade the parity memory SIMMs to ECC memory SIMMs.

To answer this problem, IBM has introduced a new type of memory SIMM which has the ECC logic integrated on the SIMM. These are called ECC on SIMMs or EOS memory SIMMs. With these SIMMs, the memory error is detected and corrected directly on the SIMM before the data gets to the memory controller. This solution allows a standard memory controller to be used on the planar board and allows the customer to upgrade a server to support error checking memory.

The IBM ECC-on-SIMM Memory Upgrades offer 4, 8, 16 and 32MB of error-correcting-code (ECC) memory on a SIMM. This upgrade family, a plug-compatible, fully retrofittable series of 70ns memory modules, allows you to upgrade a parity system to a fully functional single-error-correct (SEC) ECC system. The ECC function is completely self-contained on the SIMM and provides correction of single-bit errors that occur in each byte of SIMM data. No processor changes are required to receive the enhanced reliability this SIMM family offers.

The ECC-on-SIMM Memory Upgrades are organized as x36 bits, support parity, and are packaged on a 72-pin JEDEC (Joint Electron Device Engineering Council) standard SIMM with gold tabs. The 4MB SIMM has IBM presence detects, while the 8MB (Tall and Wide), 16MB and 32MB SIMMs have industry standard presence detects.
Mixing EOS and Parity?
Ahem ... not that I know of. The EOS is ECC-On-SIMM ...
basically a workaround to get some sort of ECC error detection on a system board
that was originally designed for parity only (like the crappy Micronics
board in the 320 and 520), and the technical basis of it all is still parity
....