New AMD Opteron-Based ProLiant Servers

~written by Jeff Goldenbaum, Dataram Memory Blog Team Member

Dataram recently released “Smart DIMMs” for the newest AMD Opteron-based ProLiant servers from Hewlett-Packard. Dataram Smart DIMMs have undergone rigorous testing and validation in HP Gen8 servers with AMD Opteron 6200 Series CPUs, and are guaranteed to be 100% compatible with HP’s SmartMemory options for Gen8 systems.

Dataram’s product offerings for these new AMD-based ProLiant servers include 4, 8, and 16GB 1333MHz and 1600MHz Registered DIMMs (RDIMMs), and a 32GB 1333MHz Load-Reduced DIMM (LRDIMM).

One of Hewlett-Packard’s new servers is the HP ProLiant BL465c Gen8 blade server, featuring one or two AMD Opteron 6200 processors and supporting up to 512GB of Dataram “Smart DIMMs”. The other new server is the HP ProLiant DL385p Gen8, featuring one or two AMD Opteron 6200 processors and supporting up to 768GB of Dataram memory.

Following is a list of Dataram memory offerings for these two new ProLiant servers:

Dataram PN                           HP PN                      Description
DRHA1333RSL/4GB          647871-B21        4GB 1Rx4 PC3L-10600R DIMM
DRHA1333RL/8GB            647877-B21        8GB 2Rx4 PC3L-10600R DIMM
DRHA1333RL/16GB         647883-B21        16GB 2Rx4 PC3L-10600R DIMM

DRHA1600RS/4GB            647873-B21        4GB 1Rx4 PC3-12800R DIMM
DRHA1600RS/8GB            647879-B21        8GB 1Rx4 PC3-12800R DIMM
DRHA1600R/16GB            672633-B21        16GB 2Rx4 PC3-12800R DIMM

DRHA1333LR/32GB         647885-B21        32GB 4Rx4 PC3L-10600L DIMM

Posted in Memory Posts | Comments Off

40 Years of Change in Main Memory Technology

~written by Jeff Duncan, Dataram Memory Blog Team Member

In 40 years, main memory has gone from less than 1 Kbyte of capacity in a 15” x 15” x 1” package at 2 microsecond (0.000002 second) speeds, using core memory with no standards, to 32 Gbytes in a 5.3” x 1.19” x 0.3” package at 13.75 nanosecond (0.00000001375 second) speeds, using semiconductor memory built to industry-standard designs.

In the late 60s, main memory was composed of core memory. These modules were all custom designs, in all shapes and physical sizes, one for each computer OEM. The base component in core memory was the 20-mil ferrite core. Three to four wires had to be strung through the center of each core: X and Y axis wires to select the core (a bit) to be accessed in a given array, and a third wire to sense change and inhibit change (the sense/inhibit line) and so prevent a core from switching. In earlier core memories, four wires were used (sense and inhibit were separate wires) and the cores were bigger. A good basic “how do core memories work” explanation can be found on many sites.

When I got involved in the mid-1970s, we were using 18-mil (0.018 inch) cores with three 3-mil (0.003 inch) wires. The first photo below is of the cores next to a pencil point. Looks like a fly left something behind. To give you a sense of size, the next photo shows an electronic test probe (as small as a pencil point) against a core memory array.


Under additional magnification, the previous test probe and cores:


By the way, people manually strung the wires through the core fields viewing through a microscope. This was very labor-intensive and tedious work that could take weeks, depending on the array (memory) size. The photos that follow are the previous array photos magnified more.  So, core memory took a long time to manufacture, was big and heavy, and was fairly expensive.



The core array was attached to a Printed Circuit Board for stability, and that core stack assembly was sometimes soldered to an electronics board or attached via a connector system. In the following photo, a Dataram Corp compatible DEC (Digital Equipment Corp) PDP-11 series Unibus core memory module (32Kbytes) has the core stack assembly underneath the controlling electronics board, attached through connectors. Unlike most of today’s memory modules, these boards carried, besides the analog devices (discrete transistors, diodes, transformers, resistors) used to drive the core array, many semiconductor SSI (Small Scale Integration) logic devices to interface to the memory bus. This made these designs much more complex.


The next photo of a core memory module shows our custom design used as a generic module (32Kbytes). We would install multiple units in a chassis, and they would be interfaced to a computer bus. Typically, they were used for main memory, extended memory, and the first Solid State Disks.


In the late 1970s we built a state-of-the-art 256Kbyte module which we used for extended memory and solid state disk emulations for DEC, Data General, Honeywell, Perkin Elmer, Interdata, Varian, etc. The memory system was 128K x 18 (256Kbytes) and was 16.5 inches by 13.5 inches by 1.75 inches. These would be installed in the chassis in the following photo, with an interface for a specific computer type and disk emulation. This would provide a whopping 2Mbytes of storage in a 19-inch rack mount chassis.


Then in the mid 1980s we started using semiconductor memory made from silicon – more specifically, DRAM (Dynamic Random Access Memory). This was very, very small, and much faster than core memory. The following photo gives you a sense of size.  Four 18-mil cores are sitting on a DRAM die in wafer form. You are looking at millions of bits of semiconductor memory vs. four bits of core memory.


We probably converted over to DRAM memory in the 4Kbit days. Although we were using semiconductor memory, the computer OEMs still used the large board sizes and similar bus interfaces, so the control and buffer logic was still in every memory board.

As time went on, semiconductor cells got so small that someone finally came up with the idea of putting the memory controller (all the logic) on the CPU board, so the memory boards would just have DRAM on them. The SIMM (Single In-line Memory Module) came about. These modules plugged right into connectors on the CPU board. SIMMs were typically 1Mx8 or 1Mx9 modules, so most 16-bit systems needed two modules to provide 16 data bits plus two parity bits. The finger edge connectors on these boards were electrically connected top and bottom (hence Single In-line). Next came the Dual In-line Memory Module (DIMM), which simply meant the top and bottom connector fingers were independent, allowing more pin-outs for wider data widths.

In the meantime, DRAM manufacturers and JEDEC (Joint Electron Devices Engineering Council) established standards for DRAM/module pin-outs, DRAM/module package sizes, DRAM/module specs, etc. DRAM capacities increased through 1Mx1, 4Mx1, 1Mx4, 16Mx1, 4Mx4 and on and on, until today we are using 4Gbit technology DDR3 (1Gx4, 512Mx8, 256Mx16, 128Mx32). Memory interfaces have gone from FPM (Fast Page Mode), EDO (Extended Data Out), SDRAM (Synchronous DRAM), DDR1 (Double Data Rate 1 SDRAM), and DDR2 (Double Data Rate 2 SDRAM) to DDR3 (Double Data Rate 3 SDRAM), and heading to the mainstream is DDR4 (Double Data Rate 4 SDRAM). All these interfaces were necessary to handle speed increases. Also, multiple standard module formats such as SODIMM, MICRO-DIMM, MINI-DIMM, VLP DIMM, RDIMM, FBDIMM, etc., have been used. The next photo has examples of these modules, and there are websites that define all these configurations. DRAM package sizes and mounting styles have also changed as more pin-outs and faster speeds became available, such as SOJ (Small Outline J-lead), TSOP (Thin Small Outline Package) and now BGA (Ball Grid Array).


So to recap: from a 256Kbyte core memory that was 16.5 inches by 13.5 inches by 1.75 inches to a 32Gbyte LRDIMM (Load Reduced DIMM) that is 5.3 inches by 1.2 inches by 0.29 inches. Pretty amazing. Examples of these two memories are in the following photo. The 32Gbyte module is at the bottom of the photo, leaning against the 256Kbyte module. It would take 131,072 of the 256Kbyte modules to equal one of the 32Gbyte modules at the bottom of the photo.
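The 131,072:1 figure, and the shrink in volume, can be double-checked with quick arithmetic from the dimensions quoted above:

```python
KB = 1024
GB = 1024 ** 3

core_board_bytes = 256 * KB   # late-1970s 256Kbyte core module
lrdimm_bytes = 32 * GB        # modern 32GB LRDIMM

# How many core boards equal one LRDIMM?
ratio = lrdimm_bytes // core_board_bytes
print(ratio)  # 131072

# Volume shrink, in cubic inches, from the board dimensions quoted above
core_vol = 16.5 * 13.5 * 1.75
dimm_vol = 5.3 * 1.2 * 0.29
print(f"volume: {core_vol:.0f} in^3 -> {dimm_vol:.2f} in^3")
```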

Posted in Memory Posts | Comments Off

RAMDisk: A Gamer’s Perspective

~ written by Francis Symonoski, Dataram Memory Blog Team Member

I have been a casual gamer most of my life and have enjoyed the evolution of games over the years, from coin-ops to consoles to the modern incarnations that we have today in PC gaming.  In today’s modern landscape with online gaming and the fierce competition that arises from it, everyone wants an edge, and under the right circumstances, RAMDisk can provide that edge.

You may want to peruse our blog post about RAMDisk from June 7, 2012 if you are not familiar with this software.  In a nutshell, RAMDisk allows you to create a drive using available RAM, which in turn allows you to cache files for faster access–much faster than a hard disk or SSD could ever achieve.  This is a huge advantage when you are accessing very large files and data, and in the video game world every second can mean the difference between moving forward or restarting from the closest spawn point.

First, let’s talk about the benefits and how they may apply to you.  We have all been there: waiting for a map to load, for new areas to appear, even for buildings and grass to be “painted” as we move forward.  Using RAMDisk to store your various texture files is the perfect way to have these files available for immediate use, instead of your computer having to go look for them on a disk drive. First-person shooters are a perfect vehicle for RAMDisk, since there are so many texture files that need to be refreshed constantly, and the ability to spot your enemies before they spot you is a no-brainer.  Some people install the game directly onto the RAMDisk, which makes the game run extremely fast because all of the files are in memory.  In other words, RAMDisk can be the key to fast refreshes and quick load times.  You might want to search YouTube for your favorite game plus “RAMDisk” to see how others have configured their systems.

Be advised that there are times when RAMDisk may not benefit your gaming needs.  For example, League of Legends and World of Tanks have texture files which need to be loaded, but all of these files are loaded before the game even begins.  Having your files load instantly just to wait for the other nine people to finish loading is not necessarily beneficial.  Does it load faster from the RAMDisk?  Oh, absolutely.  There’s a nice feeling to seeing your screen load when there are 29 seconds left on a 30-second clock, yet there is no “combat” advantage in this situation, only a satisfied feeling that you are ready.

Finally, RAM itself is volatile, so if you have a power outage or corruption on the RAMDisk, the files will be wiped clean and you will have to re-install or restore the files from backup depending on how you have your software configured.  Granted, if you have a power outage in the middle of an important match you probably are not going to win that match, but you certainly do not want to do a restore before resuming the game either.  Games that have the option of moving just the cache location to RAMDisk are ideal, because no restoration is necessary. Just a quick reload and you’re back in action.

So, if you are a gamer and are looking for ways to speed up your load times and gain a competitive advantage, then RAMDisk may be the perfect solution. There is a huge amount of information out there on RAMDisk and its benefits and operation with many games.   With the cost of RAM at all time lows and with the ability to easily put 32GB of Dataram memory in your home PC, the future for gaming and RAMDisk looks very bright indeed!

Posted in Memory Posts | Comments Off

More than the Eyes Can See – The Use of AOI (Automated Optical Inspection)

~ written by Guy Corsey, Dataram Memory Blog Team Member

It has been well documented in surface mount manufacturing that the better your solder pasting operation is, the better the manufacturing of your product. With smaller apertures, it has become very difficult to see every printed image on a circuit board with just your eyes in a high-volume, fast-paced environment. Most manufacturers in these types of environments have put AOI (Automated Optical Inspection) in place.

AOI is a process in which items can be inspected at a very high rate of speed. It uses direct and indirect light, the shape of objects and image recognition to make decisions on whether an item being inspected is acceptable or is to be rejected.

Many high-volume manufacturing operations will use these pieces of equipment in line with their other manufacturing equipment. These machines keep their manufacturing moving at a very fast pace with very little human intervention. The machine can make decisions on solder paste alignment and volume at one station, determine if all the components have been placed at another station, and tell you if the components are soldered to specification at another. Rejected items are removed from the line automatically and the manufacturing continues running.

Here are some examples of what AOI can do for you:

“AOI” will accept solder prints that meet all the criteria for a “Preferred” result of printing. The lands are covered and have the correct deposition of solder paste.






“AOI” will accept solder prints that meet all the criteria for a “Marginal” result of printing. The solder paste is shifted, but not beyond the specifications set per IPC-610 and has the correct deposition of solder paste.






“AOI” will “Reject” solder paste prints that don’t meet IPC-610 requirements. Missing solder paste and extremely shifted solder paste will affect the quality of the production run and the performance of the end product.

Can you find the missing solder paste in this picture?






Imagine yourself on the production line with thousands of solder paste images to inspect.  How are you going to get it done?






AOI allows you to verify the process.

For smaller operations, bench-top units can be used to make the same determinations using the same technology as the inline machines. Both types of units can view and make consistent decisions on thousands of images in seconds, repeatedly, which is more than the eyes can see.
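The accept/marginal/reject logic described above can be sketched as a toy classifier. The thresholds and function name here are invented for illustration only; a real AOI machine applies IPC-610 criteria to measured images, not hand-set numbers:

```python
def classify_print(shift_mils, volume_pct,
                   shift_marginal=2.0, shift_reject=4.0,
                   vol_low=80, vol_high=120):
    """Toy AOI-style decision on one solder paste deposit.

    shift_mils: how far the print is shifted off the land (mils).
    volume_pct: measured paste volume as % of nominal.
    All thresholds are illustrative, not actual IPC-610 values.
    """
    # Missing/excess paste or a gross shift fails outright.
    if volume_pct < vol_low or volume_pct > vol_high or shift_mils >= shift_reject:
        return "Reject"
    # Shifted, but still within spec: flag as marginal, accept.
    if shift_mils >= shift_marginal:
        return "Marginal"
    # Lands covered, correct deposition: preferred result.
    return "Preferred"

for shift, vol in [(0.5, 100), (3.0, 100), (5.0, 100), (0.0, 50)]:
    print(f"shift={shift} mils, volume={vol}% -> {classify_print(shift, vol)}")
```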

Posted in Memory Posts | Comments Off

RAMDisk Software – What is RAMDisk?

~written by Nelson Rodriguez, Dataram Memory Blog Team Member

A RAM disk, or RAM drive, is software that sets aside a block of main memory (volatile memory made up of DRAM chips on DIMM modules) to unlock the full potential of your available system RAM.  The computer uses this block of memory as a disk drive (secondary storage).  It is sometimes referred to as a “virtual RAM drive” or “software RAM drive” to distinguish it from a “hardware RAM drive,” which is a type of solid-state drive.

The performance of a RAM disk is, in general, orders of magnitude faster than other forms of storage media, such as an SSD (up to 100X) or hard drive (up to 200X). This performance gain is due to multiple factors, including access time, maximum throughput, and the type of file system, among others.

File access time is greatly decreased since a RAM disk is solid state (no mechanical parts). A physical hard drive or optical media such as CD-ROM, DVD, and Blu-ray must move a head or optical eye into position, and a tape drive must wind or rewind to a particular position on the media, before reading or writing can occur. A RAM disk can access data with only the memory address of a given file; no movement, alignment or positioning is necessary.
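To get a feel for the gap, here is a rough, self-contained Python sketch that times reading the same bytes from a temporary file on disk versus from a buffer already resident in RAM. It only approximates a RAM disk (on a second read the OS page cache can narrow the disk number considerably), but it shows the idea:

```python
import os
import tempfile
import time

payload = os.urandom(16 * 1024 * 1024)  # 16 MiB of test data

# Write the payload to a real temporary file on disk.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

# Time a read from the disk file.
t0 = time.perf_counter()
with open(path, "rb") as f:
    from_disk = f.read()
disk_ms = (time.perf_counter() - t0) * 1e3

# Stand-in for a RAM disk read: the same bytes already in memory.
t0 = time.perf_counter()
from_ram = bytes(payload)
ram_ms = (time.perf_counter() - t0) * 1e3

os.unlink(path)
print(f"disk read: {disk_ms:.2f} ms   in-memory copy: {ram_ms:.2f} ms")
```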

Because the storage is in RAM, it is volatile memory, which means the data will be lost in the event of power loss, whether intentional (computer reboot or shutdown) or accidental (power failure). This is sometimes desirable: for example, when working with a decrypted copy of an encrypted file, or for storing a web cache (doing this on a RAM disk can also improve the speed of loading pages).

In many cases, the data stored on the RAM disk has been moved for faster access from data permanently stored elsewhere.  Dataram’s RAMDisk has the option of saving the data on system shutdown, and having it re-created on the RAMDisk when the system reboots.

As the price of RAM has dropped significantly over the past few years, RAMDisk software has been gaining in popularity as a high-speed, affordable performance enhancement.  With popular PCs having 4 DIMM slots and 4GB DIMMs widely available, 16GB PCs are common, and RAMDisk sizes ranging from 4GB up to 12GB are becoming mainstream.  For the high-end performance segment, Dataram now offers 8GB DIMMs for PCs, enabling a powerful 32GB PC configuration.  Imagine the performance possibilities of that much memory dedicated to a RAMDisk!

Who will benefit from using RAMDisk software?  To mention a few: hard-core gamers running sophisticated, complex games online; videographers using rendering and movie-creation software such as Adobe Premiere; photographers using RAMDisk as a scratch disk for Photoshop; programmers speeding up their software verification steps; and others.  Anyone who has an application that is I/O intensive and disk bound will benefit.  RAMDisk will essentially remove disk I/O as a factor in your application’s execution, and files will be accessed at the speed of system memory.

RAMDisk software uses the available RAM from your system when creating a virtual drive. Remember that your PC does need RAM to execute system functions; typically at least 2GB should be reserved for the OS, utilities and other applications.  If you have 4GB of RAM total, then creating a 2GB RAMDisk would be optimal.
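That sizing rule (total RAM minus a roughly 2GB reserve for the OS) is easy to codify. A minimal sketch, where the function name and the 2GB default are mine, not Dataram's:

```python
def suggested_ramdisk_gb(total_ram_gb, os_reserve_gb=2):
    """Largest RAMDisk size that still leaves a reserve for the OS and apps.

    The 2GB default reserve follows the rule of thumb above; tune it for
    your own workload.
    """
    return max(0, total_ram_gb - os_reserve_gb)

for total in (4, 8, 16, 32):
    print(f"{total}GB system -> up to {suggested_ramdisk_gb(total)}GB RAMDisk")
```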

Go ahead and take Dataram’s RAMDisk for a test run—RAMDisks up to 4GB are FREE!

To really supercharge your computing experience, purchase a license for a RAMDisk greater than 4GB.

Posted in Memory Posts | Comments Off

Storage – The Devil is in the Details!

~written by Michael Moore, Dataram Storage Blog Team Member

Anyone who has engaged in the storage industry has certainly been challenged in many ways. We have all chased the elusive “perfect” solution; each of us has heard conflicting conclusions as to what that means. Many times all the ideas will work – that makes the decision even harder. But what if it mostly works, or kind of solves the problem? What if you have two or more viable answers to your needs?

More common than not, I/Os are slowly building in your network, slow-down is happening and you do not even know it. Let’s blame it on slow drives, software, processors, old technology, etc. Maybe it is none of these.

Let’s lightly explore the “in-flight” issues that come up when I/O bottlenecks arise. Your network is doing its job and all is well. Does the slow-down just show up? No – it sneaks in. The intangible and elusive I/O bottleneck never announces itself. It is not a bulk arrival, it grows fractionally. There is no alarm or warning – it just happens. We all understand this and can write our own chapter to this story, right?

What next? We all know what’s next: network slow-down complaints come in. Eventually comes the drama of someone thinking they need to reboot the server, and so on. The pressure builds. No one planned for this, there is no budget line item for it, white boards cannot fix it, and traditional solutions are patches. But hope is at the door: a hardware manufacturer shows up. The perfect solution is here, ALL new hardware! Oh yeah, don’t forget your checkbook; it is hundreds of thousands of dollars. Add disk? $$ cha-ching! Make them fast SSDs? BIG $$$ CHA-CHING! Well, maybe we can just do a software optimization? Great idea, but does that really fix the problem? Well, maybe add more cache! That will work!

OK, let’s buy what we need! So we need cache for each host, right? We need it for the new ones and the old ones. You are ahead of me now, why put new money into old. Not to mention you might be maxed out already, maybe additional cache will not be enough. Then you have spent more money and, to top it off, it only works for the server in which it is installed.

Let me add one more bit of indigestion. Now you have servers with different configurations, different maintenance schedules, different build dates, etc. Get the picture? Time tracking shows at least one man-hour a week to manage the boxes, which nets out to more than five man-days per year.

Now let’s ask the question this way. Can we get I/O enhancements which meet the following criteria?

  1. No hardware configuration change
  2. No additional maintenance charges from the server manufacturer
  3. No downtime
  4. Ability to remote- and local-manage
  5. Improve both reads and writes
  6. Do not add more disks

ANSWER – Yes, and with less risk than any other solution with immediate benefit.

Here is how – put one HA pair of XcelaSANs into your fibre SAN. No intrusion on your current network. 256GB of TRUE DRAM cache servicing your reads and writes.

Noted storage commentator David F. Bacon said it best – “As a result, the high-speed cache memory that acts as a buffer between main memory and the CPU is an increasingly significant factor in overall performance. In fact, upgrading your cache may give you more of a performance boost than upgrading your processor.”

Cache Advantage. David F. Bacon, BYTE, 1994.
Posted in Storage Posts | Comments Off

Computer In the Slow Lane? Consider More Memory.

~written by David Sheerr, Dataram Memory Blog Team Member

How do you know if you need more memory?

When your computer starts running slow, one of the first pieces of advice most people will give you is to put more RAM in it. While it’s true that adding RAM to your computer can improve its performance, doing so isn’t always the right solution to a slow computer.

There are a few steps you can take to figure out whether or not you need more memory. The first thing you’ll want to do is figure out how much RAM you already have, and what operating system you’re running.

Whether you have a PC or a Mac, it’s easy to figure out just how much RAM you have, and what version of the operating system your computer is running.

PC users just have to right-click the My Computer icon (on the desktop or in the Start menu) and click Properties. The Properties window will show both the version of Windows you’re running and the amount of RAM installed in your computer.

While Windows XP can, technically speaking, run on as little as 64MB of RAM, you should have at least 1GB of RAM installed to ensure that modern programs will run correctly. Windows Vista and Windows 7 both require at least 2GB of RAM to run correctly.

As for Macs, clicking on About this Mac under the apple menu will yield a window with all the important information about your computer, including the amount of RAM installed, and the version of the Mac OS it’s running.

Older versions of OSX (10.4 and earlier) can run on as little as 128MB of RAM, but again, to run modern software, you’ll want at least 1GB of RAM.
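The post covers Windows and Mac; for completeness, a Linux reader can script the same check. This sketch (not from the post) parses `MemTotal` out of the Linux-specific `/proc/meminfo` file:

```python
def total_ram_gb(meminfo_path="/proc/meminfo"):
    """Return installed RAM in GB by parsing Linux's /proc/meminfo.

    MemTotal is reported in kB. This file exists only on Linux,
    not on Windows or Mac.
    """
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) / (1024 * 1024)
    raise RuntimeError("MemTotal not found in " + meminfo_path)

print(f"Installed RAM: {total_ram_gb():.1f} GB")
```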

If you use your computer to play video games, watch HD videos or run advanced software, you’re probably going to need more than the bare minimum of memory. Each of those tasks makes your computer work hard, and as such, additional RAM will be necessary to improve performance.

If your computer is running slow, try doubling the amount of memory. Your computer uses RAM as a work space. Each time you open a program or a file, the computer loads it into the RAM so that it can be worked with.

By installing more RAM in your computer, you’re giving it a bigger work area to load files into, which allows it to do more things at once, resulting in increased speed. So, if you have the minimum necessary amount of RAM installed in your system and you aren’t getting the speed you need out of it, a RAM upgrade should help. Doubling your RAM should enable your computer to meet your needs.

Posted in Memory Posts | Comments Off

The Next Generation Servers (and Memory) Have Arrived

~written by Jeff Goldenbaum, Dataram Memory Blog Team Member

Last week Intel announced their long anticipated Xeon E5-2600 processors, also known as Sandy Bridge-EP, for dual-socket servers.  In turn, all of the major OEMs released details for their next generation systems that will use these processors.  In addition to the new processors and servers come new memory technologies, specifically DDR3-1600 (PC3-12800) as my associate Paul Henke blogged about earlier this month.

Hewlett-Packard debuted their next generation of HP ProLiant servers (Generation 8), including the server blade BL460c G8 (16 DIMM slots; 512GB max), rack-optimized DL360p G8 and DL380p G8 (24 DIMM slots; 768GB max), expansion-optimized ML350p G8 (24 DIMM slots; 384GB max), and scalable systems SL230s G8 and SL250s G8 (16 DIMM slots; 256GB max).

IBM is touting new x86 servers with their System x3500 M4 tower server (24 DIMM slots; 768GB max), x3550 M4 and x3650 M4 rack servers (24 DIMM slots; 768GB max), iDataPlex dx360 M4 scale-out server (16 DIMM slots; 256GB max), and HS23 blade server (16 DIMM slots; 256GB max).

Dell is keeping pace with their new 12th generation PowerEdge R620 and R720/R720XD rack servers, M620 blade server, and T620 tower server—all with 24 DIMM slots and up to 768GB of memory.

Cisco offers up their 3rd generation UCS servers with the B200 M3 blade server (24 DIMM slots; 384GB max), C220 M3 (16 DIMM slots; 256GB max) and C240 M3 (24 DIMM slots; 384GB max) rack servers.

And Fujitsu announced new Xeon E5-2600-based systems in desktop, rack and blade with their Primergy RX200 S7, RX300 S7, RX350 S7, BX924 S3, and TX300 G7 servers.  All feature 24 DIMM slots and accept up to 768GB of memory.

Dataram is supporting all of the new systems I mentioned with a full complement of DDR3-1333 (PC3-10600) and DDR3-1600 (PC3-12800) memory options.  Additionally, Dataram is providing a new 32GB DDR3-1333 (PC3L-10600) LRDIMM (Load Reduced DIMM) which is required for these servers to achieve a maximum memory capacity of 768GB.
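As a sanity check, the capacity ceilings quoted above follow directly from slot count times the largest supported DIMM: filling 24 slots with the 32GB LRDIMM gives 768GB, and 16 slots give 512GB. A quick sketch:

```python
def max_memory_gb(dimm_slots, dimm_gb=32):
    """Peak capacity = number of DIMM slots x largest supported DIMM size."""
    return dimm_slots * dimm_gb

print(max_memory_gb(24))  # 768 (the 24-slot rack and tower servers above)
print(max_memory_gb(16))  # 512 (the 16-slot blades)
```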

Posted in Memory Posts | Comments Off


~written by Paul Henke, Dataram Memory Blog Team Member

On the eve of Intel’s release of Sandy Bridge-EP, the next generation of Xeon processors for servers, the memory industry has geared up for another uptick in memory speed.  Following Moore’s law, speed and performance are increasing while costs for memory are at historic lows.  Configuring servers for maximum power and performance has never been more affordable.

By selecting the latest DDR3-1600 (PC3-12800) speed DIMMs, which deliver a data rate of 12.8 GB/s, you will realize about 20% more throughput than with existing DDR3-1333 (PC3-10600) speed memory, which delivers a data rate of 10.66 GB/s.  PC3-10600 RDIMMs, the most commonly used RDIMMs in today’s Xeon 5600-based servers, will give way to the faster PC3-12800 speed memory.
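The 20% figure checks out from first principles: a DDR3 DIMM moves 8 bytes per transfer over its 64-bit bus, so peak bandwidth is simply the transfer rate times eight. Quick arithmetic:

```python
# DDR3 peak bandwidth: megatransfers/s x 8 bytes per transfer (64-bit bus)
def ddr3_bandwidth_gbs(mt_per_s):
    return mt_per_s * 8 / 1000  # GB/s

bw_1333 = ddr3_bandwidth_gbs(1333)  # PC3-10600, about 10.66 GB/s
bw_1600 = ddr3_bandwidth_gbs(1600)  # PC3-12800, 12.8 GB/s
print(f"{bw_1333:.2f} GB/s -> {bw_1600:.2f} GB/s "
      f"({(bw_1600 / bw_1333 - 1) * 100:.0f}% more throughput)")
```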

Just an FYI for the novice reader: the terms “DDR3-1333” and “DDR3-1600” are DRAM speeds, and “PC3-10600” and “PC3-12800” are module (DIMM) speeds, as defined by JEDEC.  However, many times the DRAM speed nomenclature is used universally to indicate both DRAM and DIMM speeds.

AMD also has adopted the faster DDR3-1600 speed memory for servers with the introduction of Opteron 6200 “Interlagos” CPUs.   Servers and workstations will see significant performance improvements over prior generation Opteron 6100, which also featured DDR3-1333 as the top memory speed.

This discussion is all about speed and performance, since these DDR3-1600 applications will be offered at standard DDR3 1.5V power only.  Low-voltage 1.35V (PC3L) DIMMs, and future 1.25V lower-voltage DIMMs, will not be enabled at DDR3-1600 speed in this new generation of servers.  They will, however, continue to support DDR3L-1333 (PC3L-10600R) memory speeds at low voltage (1.35V), for those wishing to extract maximum power savings from their server infrastructure in lieu of the highest possible performance.  The trade-off decisions continue between maximum power savings and maximum compute performance.  The “tug-of-war” between the “Going Green” and “My Servers are my Strategic Weapons” groups continues, with the power users configuring for maximum speed and competitive advantage by delivering the fastest response times.

Whatever your compute challenges, Dataram has a full complement of memory options enabling your servers to achieve your IT objectives!  All with superior reliability, a lifetime warranty, guaranteed compatibility, service, and pricing that delivers significant cost savings for your organization.

Posted in Memory Posts | Comments Off

Understanding the Nature of the “Cache Hit Ratio Curve”, Part 1

~written by Jason Caulkins, Dataram Storage Blog Team Member

Starting back in the 1980s (and even prior), work was being done to identify how cache affects application performance in a typical compute environment.  A very good body of work was done at IBM, the University of California, Berkeley, and a host of other institutions.  Much of this work was facilitated by using “storage traces”: once a real-world IO workload could be isolated and recorded as a trace, the storage activity could be played back without actually performing any of the CPU or network activity.  This allowed researchers to isolate just the disk IO workload and measure the performance of disk subsystems, cache, and various tuning algorithms, in order to gain a better understanding of their independent impact on storage IO.

This paper focuses on the work done on cache size and location.  An important contributor in this field is Dr. Alan J. Smith of the University of California, Berkeley.  His paper “Disk Cache – Miss Ratio Analysis and Design Considerations,” written in 1985, is one of the first works that analyzed and demonstrated the effect of cache size and location on standard computing workloads.

He measured the effect of cache size at various locations in the compute hierarchy.  His measurements are graphed in terms of “Cache Miss Ratio” and “Cache Size”.  An example is shown below:


Figure 1

(Source: Smith, Alan J.  Disk Cache – Miss Ratio Analysis and Design Considerations. 1985)


The graph in Figure 1 shows a couple of very interesting behaviors.  First, it shows that regardless of cache location, there is a diminishing benefit for increased cache size (the curves are steep at the beginning then flatten out quickly).  Second, it shows that the location of the cache is most effective at the device, then the controller, and lastly “Global” (system RAM).  This is indicated by the curves that are closest to the left (smallest cache size for given miss ratio).

Fast-forward a few decades and the work has been expanded and tested in the real world.  We now talk about “Cache Hit Ratios” because that seems easier to communicate, and also produces graphs that are more intuitive.  Below is an example of a modern, generic “Cache Hit Ratio” graph:


Figure 2

The graph in Figure 2 expresses the same behavior, but for one location (in this case the controller); the vertical axis is expressed in % cache hits (vs. statistical misses), and the horizontal axis expresses cache size not in absolute terms, but as a % of the dataset size.  This is obviously greatly simplified, but it makes understanding the principles more straightforward.

The exact shape of this curve will vary based on the workload, system performance and many other factors, but those can be expressed as coefficients that flatten or exaggerate this same basic shape.
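The diminishing-returns shape can be reproduced with a tiny simulation: replay a skewed (hot/cold) access trace through an LRU cache at several sizes. This is an illustrative sketch, not one of Dr. Smith's traces; the 80/20-style workload and the dataset size are invented for the demo:

```python
import random
from collections import OrderedDict

def hit_ratio(cache_size, accesses):
    """Replay an access trace through an LRU cache; return the hit fraction."""
    cache = OrderedDict()
    hits = 0
    for key in accesses:
        if key in cache:
            hits += 1
            cache.move_to_end(key)        # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(accesses)

random.seed(42)
dataset = 1000
# Skewed workload: a small "hot" set (50 keys) gets 80% of the accesses,
# the rest are spread uniformly over the whole dataset.
trace = [random.randint(0, 49) if random.random() < 0.8
         else random.randint(0, dataset - 1)
         for _ in range(20000)]

for pct in (1, 5, 10, 25, 50):
    size = dataset * pct // 100
    print(f"cache = {pct:2d}% of dataset -> hit ratio {hit_ratio(size, trace):.2f}")
```

The hit ratio climbs steeply while the cache is smaller than the hot set, then flattens, which is the same steep-then-flat shape as the curves discussed above.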

In examination of these graphs a few questions may come to mind:

  1. Why is the cache more effective closer to the storage device?
  2. Why do data access patterns for compute workloads behave like this?
  3. Why is the cache so effective in relatively small sizes, then has a declining benefit?

The answer to question one really has to do with relative performance and locality.  The CPU has cache which is very fast and very close to the CPU.  It provides benefits to the CPU’s workload similar to those storage cache provides to the storage workload.  However, this cache is too costly (and thus cannot practically grow with the dataset) and too remote from the disks to provide any significant real-world benefit for a storage workload.  In addition, storage cache must be treated differently, as it has to have a means to protect the data in the event of a power loss.  So, in essence, the highest-performance location for cache in the storage hierarchy comes down to proximity to the final “resting place” of the data, and to the vast speed difference between the cache layer and the storage device (disk) layer.

The answer to the second question, “Why do data access patterns for compute workloads behave like this?”, has to do with the nature of real-world, structured data access.  In most compute workloads (desktop, financial, database, scientific), certain data is accessed much, much more frequently than other data.  For example, file system file allocation tables, database index files, re-do logs, and the most current dataset or record are accessed very much more often than “old” or “cold” data.  Since these data types are so frequently accessed, having them in local storage cache greatly improves overall system performance.

The answer to the third question, “Why is the cache so effective in relatively small sizes, then has a declining benefit?”  is that, by design, file allocation tables, index files, re-do logs and individual records are much smaller than the total dataset and therefore tend to fit nicely in the right-sized cache, and since they are so frequently accessed the benefit is great.

Please come back for Part II which will examine the implications of these behaviors and establish some rules of thumb on how to properly size your cache for a given dataset size.



Posted in Storage Posts | Comments Off