Storage – The Devil is in the Details!

~written by Michael Moore, Dataram Storage Blog Team Member

Anyone who has engaged in the storage industry has certainly been challenged in many ways. We have all chased the elusive “perfect” solution; each of us has heard conflicting conclusions as to what that means. Many times all the ideas will work – that makes the decision even harder. But what if it mostly works, or kind of solves the problem? What if you have two or more viable answers to your needs?

More often than not, I/Os are slowly building in your network: a slow-down is happening and you do not even know it. Let’s blame it on slow drives, software, processors, old technology, etc. Maybe it is none of these.

Let’s lightly explore the “in-flight” issues that come up when I/O bottlenecks arise. Your network is doing its job and all is well. Does the slow-down just show up? No – it sneaks in. The intangible and elusive I/O bottleneck never announces itself. It is not a bulk arrival, it grows fractionally. There is no alarm or warning – it just happens. We all understand this and can write our own chapter to this story, right?

What next? We all know what’s next: network slow-down complaints come in. Eventually comes the drama of someone deciding they need to reboot a server, and so on. The pressure builds. No one planned for this, there is no budget line item for it, white boards cannot fix it and traditional solutions are only patches, but – hope is at the door – a hardware manufacturer shows up. The perfect solution is here, ALL new hardware! Oh yeah, don’t forget your checkbook, it is hundreds of thousands of dollars. Add disk: $$ cha-ching! Make them fast SSDs: BIG $$$ CHA-CHING! Well, maybe we can just do a software optimization? Great idea, but does that really fix the problem? Well, maybe add more cache! That will work!

OK, let’s buy what we need! So we need cache for each host, right? We need it for the new ones and the old ones. You are ahead of me now: why put new money into old hardware? Not to mention you might be maxed out already, so additional cache may not be enough. Then you have spent more money and, to top it off, the cache only works for the server in which it is installed.

Let me add one more bit of indigestion. Now you have servers with different configurations, different maintenance schedules, different build dates, etc. Get the picture? A time management estimate shows at least one man-hour a week to manage the boxes, a net of over five man-days per year.

Now let’s ask the question this way. Can we get I/O enhancements which meet the following criteria?

  1. No hardware configuration change
  2. No additional maintenance charges from the server manufacturer
  3. No downtime
  4. Ability to manage remotely and locally
  5. Improve both reads and writes
  6. Do not add more disks

ANSWER – Yes, with immediate benefit and with less risk than the alternatives above.

Here is how – put one HA pair of XcelaSANs into your fibre SAN. No intrusion on your current network. 256GB of TRUE DRAM cache servicing your reads and writes.

Noted storage commentator David F. Bacon said it best – “As a result, the high-speed cache memory that acts as a buffer between main memory and the CPU is an increasingly significant factor in overall performance. In fact, upgrading your cache may give you more of a performance boost than upgrading your processor.”

Source: http://researcher.ibm.com/view_pubs.php?person=us-bacon&t=1
Cache Advantage. David F. Bacon, BYTE, 1994.

Computer In the Slow Lane? Consider More Memory.

~written by David Sheerr, Dataram Memory Blog Team Member

How do you know if you need more memory?

When your computer starts running slow, one of the first pieces of advice most people will give you is to put more RAM in it. While it’s true that adding RAM to your computer can improve its performance, doing so isn’t always the right solution to a slow computer.

There are a few steps you can take to figure out whether or not you need more memory. The first thing you’ll want to do is figure out how much RAM you already have, and what operating system you’re running.

Whether you have a PC or a Mac, it’s easy to figure out just how much RAM you have, and what version of the operating system your computer is running.

PC users just have to right-click the My Computer icon, either on the desktop or in the Start menu, and click Properties. The Properties window will show both the version of Windows you’re running and the amount of RAM installed in your computer.

While Windows XP can, technically speaking, run on as little as 64MB of RAM, you should have at least 1GB of RAM installed to ensure that modern programs will run correctly. For Windows Vista and Windows 7, plan on at least 2GB of RAM for everything to run well.

As for Macs, clicking About This Mac under the Apple menu will yield a window with all the important information about your computer, including the amount of RAM installed and the version of Mac OS it’s running.

Older versions of OS X (10.4 and earlier) can run on as little as 128MB of RAM, but again, to run modern software, you’ll want at least 1GB of RAM.

If you use your computer to play video games, watch HD videos or run advanced software, you’re probably going to need more than the bare minimum of memory. Each of those tasks makes your computer work hard, and as such, additional RAM will be necessary to improve performance.

If your computer is running slow, try doubling the amount of memory. Your computer uses RAM as a work space. Each time you open a program or a file, the computer loads it into the RAM so that it can be worked with.

By installing more RAM in your computer, you’re giving it a bigger work area to load files into, which allows it to do more things at once, resulting in increased speed. So, if you have the minimum necessary amount of RAM installed in your system and you aren’t getting the speed you need out of it, a RAM upgrade should help. Doubling your RAM should enable your computer to meet your needs.


The Next Generation Servers (and Memory) Have Arrived

~written by Jeff Goldenbaum, Dataram Memory Blog Team Member

Last week Intel announced their long-anticipated Xeon E5-2600 processors, also known as Sandy Bridge-EP, for dual-socket servers. In turn, all of the major OEMs released details for their next generation systems that will use these processors. Along with the new processors and servers come new memory technologies, specifically DDR3-1600 (PC3-12800), which my associate Paul Henke blogged about earlier this month.

Hewlett-Packard debuted their next generation of HP ProLiant servers (Generation 8)  including the server blade BL460c G8 (16 DIMM slots; 512GB max), rack-optimized DL360p G8 and DL380p G8 (24 DIMM slots; 768GB max), expansion-optimized ML350p G8 (24 DIMM slots; 384GB max), and scalable systems SL230s G8 and SL250s G8 (16 DIMM slots; 256GB max).

IBM is touting new x86 servers with their System x3500 M4 tower server (24 DIMM slots; 768GB max), x3550 M4 and x3650 M4 rack servers (24 DIMM slots; 768GB max), iDataPlex dx360 M4 scale-out server (16 DIMM slots; 256GB max), and HS23 blade server (16 DIMM slots; 256GB max).

Dell is keeping pace with their new 12th generation PowerEdge R620 and R720/R720XD rack servers, M620 blade server, and T620 tower server—all with 24 DIMM slots and up to 768GB of memory.

Cisco offers up their 3rd generation UCS servers with the B200 M3 blade server (24 DIMM slots; 384GB max), C220 M3 (16 DIMM slots; 256GB max) and C240 M3 (24 DIMM slots; 384GB max) rack servers.

And Fujitsu announced new Xeon E5-2600-based systems in desktop, rack and blade with their Primergy RX200 S7, RX300 S7, RX350 S7, BX924 S3, and TX300 S7 servers. All feature 24 DIMM slots and accept up to 768GB of memory.

Dataram is supporting all of the new systems I mentioned with a full complement of DDR3-1333 (PC3-10600) and DDR3-1600 (PC3-12800) memory options.  Additionally, Dataram is providing a new 32GB DDR3-1333 (PC3L-10600) LRDIMM (Load Reduced DIMM) which is required for these servers to achieve a maximum memory capacity of 768GB.


DDR3-1600 RDIMMs for SERVERS—HEAR—HEAR—HERE NOW!!

~written by Paul Henke, Dataram Memory Blog Team Member

On the eve of Intel’s release of Sandy Bridge-EP, the next generation of Xeon processors for servers, the memory industry has geared up for another uptick in memory speed. In keeping with Moore’s law, speed and performance are increasing while costs for memory are at historic lows. Configuring servers for maximum power and performance has never been more affordable.

By selecting the latest DDR3-1600 (PC3-12800) speed DIMMs, which deliver a peak data rate of 12.8 GB/s, you will realize about 20% more throughput than with existing DDR3-1333 (PC3-10600) speed memory, which delivers a peak data rate of 10.66 GB/s. PC3-10600 RDIMMs, the most commonly used RDIMMs in today’s Xeon 5600-based servers, will give way to the faster PC3-12800 speed memory.

Just an FYI for the novice reader: the terms “DDR3-1333” and “DDR3-1600” are DRAM speeds, and “PC3-10600” and “PC3-12800” are MODULE (DIMM) speeds, as defined by JEDEC. In practice, however, the DRAM speed nomenclature is often used to indicate both DRAM and DIMM speed.
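The relationship between the two naming schemes, and the 20% figure quoted above, can be checked with a few lines of arithmetic. This is only a sketch: it assumes the standard 64-bit (8-byte) DIMM data bus, and the function and variable names are made up for illustration.

```python
# Assumption: a standard DIMM moves 8 bytes (64 bits) per transfer.
BUS_WIDTH_BYTES = 8


def peak_bandwidth_mb_s(mega_transfers_per_s):
    """Peak module bandwidth in MB/s for a given DRAM speed in MT/s."""
    return mega_transfers_per_s * BUS_WIDTH_BYTES


# DDR3-1333 nominally runs at 1333 1/3 MT/s; the module name rounds down to PC3-10600.
ddr3_1333 = peak_bandwidth_mb_s(4000 / 3)
ddr3_1600 = peak_bandwidth_mb_s(1600)
gain_percent = (ddr3_1600 / ddr3_1333 - 1) * 100

print(f"DDR3-1333: {ddr3_1333:.0f} MB/s")
print(f"DDR3-1600: {ddr3_1600:.0f} MB/s")
print(f"Gain: {gain_percent:.0f}%")
```

These are peak figures for a single module; what a server actually delivers depends on the number of channels and on the workload.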

AMD has also adopted the faster DDR3-1600 speed memory for servers with the introduction of Opteron 6200 “Interlagos” CPUs. Servers and workstations will see significant performance improvements over the prior generation Opteron 6100, which featured DDR3-1333 as its top memory speed.

This discussion is all about speed and performance, since these DDR3-1600 configurations will be offered at standard DDR3 1.5V power only. Low voltage 1.35V (PC3L) DIMMs, and future lower voltage 1.25V DIMMs, will not be enabled at DDR3-1600 speed in this new generation of servers. The servers will, however, continue to support DDR3L-1333 (PC3L-10600R) memory speeds at low voltage (1.35V), for those wishing to extract maximum power savings from their server infrastructure in lieu of the highest possible performance. The trade-off between maximum power savings and maximum compute performance continues. So does the “tug-of-war” between the “Going Green” and “My Servers are my Strategic Weapons” groups, with the power users configuring for maximum speed and the competitive advantage of the fastest response times.

Whatever your compute challenges, Dataram has a full complement of memory options enabling your servers to achieve your IT objectives! All come with superior reliability, a lifetime warranty, guaranteed compatibility and service, and are priced to deliver significant cost savings for your organization.


Understanding the Nature of the “Cache Hit Ratio Curve”, Part 1

~written by Jason Caulkins, Dataram Storage Blog Team Member

Starting back in the 1980s (and even prior), work was being done to identify how cache affects application performance in a typical compute environment. A substantial body of work was done at IBM, the University of California, Berkeley, and a host of other institutions. Much of this work was facilitated by using “storage traces”. Once a real-world IO workload could be isolated and recorded to create a “storage trace”, one could then take the storage activity and play it back, without actually performing any of the CPU or network activity. This allowed researchers to isolate just the disk IO workload and measure performance of disk subsystems, cache, and various tuning algorithms in order to gain a better understanding of their independent impact on storage IO.

This paper focuses on the work done on cache size and location. An important contributor in this field is Dr. Alan J. Smith of the University of California, Berkeley. His paper “Disk Cache – Miss Ratio Analysis and Design Considerations” (reference), written in 1985, is one of the first works to analyze and demonstrate the effect of cache size and location on standard computing workloads.

He measured the effect of cache size at various locations in the compute hierarchy.  His measurements are graphed in terms of “Cache Miss Ratio” and “Cache Size”.  An example is shown below:

 

Figure 1

(Source: Smith, Alan J.  Disk Cache – Miss Ratio Analysis and Design Considerations. 1985)

 

The graph in Figure 1 shows a couple of very interesting behaviors.  First, it shows that regardless of cache location, there is a diminishing benefit for increased cache size (the curves are steep at the beginning then flatten out quickly).  Second, it shows that the location of the cache is most effective at the device, then the controller, and lastly “Global” (system RAM).  This is indicated by the curves that are closest to the left (smallest cache size for given miss ratio).

Fast-forward a few decades and the work has been expanded and tested in the real world.  We now talk about “Cache Hit Ratios” because that seems easier to communicate, and also produces graphs that are more intuitive.  Below is an example of a modern, generic “Cache Hit Ratio” graph:

 

Figure 2

The graph in Figure 2 expresses the same behavior, with three differences: it covers one location (in this case the controller), the vertical axis shows % Cache Hits (rather than misses), and the horizontal axis expresses cache size not in absolute terms but as a % of the dataset size. This is obviously greatly simplified, but it makes the principles easier to understand.

The exact shape of this curve will vary based on the workload, system performance and many other factors, but those can be expressed as coefficients that flatten or exaggerate this same basic shape.

In examination of these graphs a few questions may come to mind:

  1. Why is the cache more effective closer to the storage device?
  2. Why do data access patterns for compute workloads behave like this?
  3. Why is the cache so effective in relatively small sizes, then has a declining benefit?

The answer to question one really has to do with relative performance and locality. The CPU has cache which is very fast and very close to the CPU. It provides similar benefits to the CPU’s workload that storage cache provides to the storage workload. However, this cache is too costly (and thus cannot practically be grown with the dataset) and too remote from the disks to provide any significant real-world benefit for a storage workload. In addition, storage cache must be treated differently, as it has to have a means to protect the data in the event of a power loss. So, in essence, the highest-performance location of the cache in the storage hierarchy has to do with proximity to the final “resting place” of the data, and with the vast speed difference between the cache layer and the storage device (disk) layer.

The answer to the second question, “Why do data access patterns for compute workloads behave like this?”, has to do with the nature of real-world, structured data access. In most compute workloads (desktop, financial, database, scientific) certain data is accessed much, much more frequently than other data. For example, file system file allocation tables, database index files, re-do logs and the most current dataset or record are accessed far more often than “old” or “cold” data. Since these data types are so frequently accessed, having them in local storage cache greatly improves overall system performance.

The answer to the third question, “Why is the cache so effective in relatively small sizes, then has a declining benefit?”  is that, by design, file allocation tables, index files, re-do logs and individual records are much smaller than the total dataset and therefore tend to fit nicely in the right-sized cache, and since they are so frequently accessed the benefit is great.
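The answers to questions two and three can be illustrated with a toy simulation. The sketch below is not taken from Smith's paper or from any real storage trace. It assumes a synthetic workload in which block popularity falls off as 1/rank, and an LRU replacement policy, both chosen only for illustration. All names are hypothetical.

```python
import random
from collections import OrderedDict


def lru_hit_ratio(trace, cache_blocks):
    """Replay a list of block numbers through an LRU cache; return the hit ratio."""
    cache = OrderedDict()
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / len(trace)


def skewed_trace(dataset_blocks, length, seed=42):
    """Synthetic trace: a few 'hot' blocks are accessed far more often than the rest."""
    rng = random.Random(seed)
    weights = [1.0 / rank for rank in range(1, dataset_blocks + 1)]
    return rng.choices(range(dataset_blocks), weights=weights, k=length)


DATASET_BLOCKS = 1000  # assumed dataset size, in blocks
trace = skewed_trace(DATASET_BLOCKS, 50_000)

ratios = {}
for pct in (1, 5, 10, 25, 50, 100):
    ratios[pct] = lru_hit_ratio(trace, DATASET_BLOCKS * pct // 100)
    print(f"cache = {pct:3d}% of dataset -> hit ratio {ratios[pct]:.2f}")
```

With a skewed workload like this, the first few percent of cache capture most of the hits, and each further increase buys less, which is the same basic shape as the hit ratio curve in Figure 2. A different skew or replacement policy would stretch or flatten the curve.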

Please come back for Part 2, which will examine the implications of these behaviors and establish some rules of thumb on how to properly size your cache for a given dataset size.

 

 


Dataram’s DRAM Device Qualification Program

~written by Jim Hampshire, Dataram Memory Blog Team Member

At Dataram, we use a “Device Qualification Program” for DRAM (Dynamic Random Access Memory) components, as well as all other active components used on our products. In this process we place only approved, specific manufacturer part numbers into our PLMS (Parts Logistics Management System). Only those defined manufacturer part numbers are permitted to be used.

These devices are qualified by Design Engineering using processes which parallel the original design validation.  This method guards against using unauthorized devices.  Manufacturer part numbers are verified against the Dataram PLMS at the time of receipt and then again at each production lot’s first piece inspection where a component ID is performed.  Although this process requires constant monitoring and verification testing, this system provides protection against any possible approved DRAM part number design variations.

The process is a three-stage program: (1) review of the manufacturer’s technical data sheet, (2) qualification testing on Dataram products, and (3) review/re-qualification of any changes to the DRAM.

The program begins with an engineering review of the manufacturer’s data sheet.  This is followed by the procurement of a sample quantity of the DRAMs for qualification.  Memory module(s) are selected for the qualification based upon their complexity and the availability of the systems to test the DRAM.  Sufficient quantities of these memory modules are assembled to fully evaluate the devices during the testing process.

The qualification test program parallels the original design verification test. The samples are subjected to initial functional testing and verification of timing and waveform integrity on the Memory Tester.  The samples are then inserted into the target system(s) and are operated from a minimum configuration to a maximum server/workstation configuration.  Wherever applicable, Dataram memory modules with DRAMs obtained from other manufacturers are also introduced with the Dataram memory modules under test in various inter-bank and intra-bank combinations from the lowest to a fully loaded capacity in order to verify compatibility.

After acceptance of a manufacturer’s DRAM PN, we continuously review all changes made by the manufacturer to that part number.  Any changes to the part and/or die revision automatically require a re-qualification prior to use.

 


Inventory Control – What is MRP and Why Do We Use It?

~written by Nick Bukaczyk, Dataram Memory Blog Team Member

Often, employees throughout a company hear such words as purchase order, sales order, shortages, expediting, due dates, forecast, demand, kits, material, inventory, data and bill of material.  They all come together through MRP.

MRP (Material Requirements Planning) is the computerized ordering and scheduling system used by manufacturing and fabrication industries.  It uses bills of material, sales orders and forecasts to generate raw material requirements (components/parts).  It also gathers new order requirements as they come in, presents shortages if they exist and suggests ordering/building when necessary based on data gathered.
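The core calculation described above (explode demand through the bill of material, then compare against inventory) can be sketched in a few lines. This is a simplified illustration, not how any particular MRP package works. The product name, component names and quantities are all made up, and lead times and scheduling are left out.

```python
from collections import Counter


def component_shortages(demand, bill_of_materials, on_hand):
    """Return {component: quantity short} for the given finished-goods demand."""
    gross = Counter()
    for product, quantity in demand.items():
        # Explode each finished good into its raw material requirements.
        for component, per_unit in bill_of_materials[product].items():
            gross[component] += quantity * per_unit

    shortages = {}
    for component, needed in gross.items():
        short = needed - on_hand.get(component, 0)
        if short > 0:
            shortages[component] = short
    return shortages


# Hypothetical data: one product, sales orders plus forecast combined into demand.
bill_of_materials = {"MODULE-A": {"DRAM": 18, "PCB": 1, "register": 1}}
demand = {"MODULE-A": 100}
on_hand = {"DRAM": 1500, "PCB": 120, "register": 40}

shortages = component_shortages(demand, bill_of_materials, on_hand)
print(shortages)  # {'DRAM': 300, 'register': 60}
```

Each shortage is what the system would flag for purchasing; a real MRP run would also weigh lead times, open purchase orders and stocking levels before suggesting an order.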

Many people throughout an organization contribute to the MRP process:

  1. Sales – enters orders, which create finished goods requirements
  2. Production Control – reviews inventory levels and sales requirements, then provides manufacturing with work orders to satisfy demand
  3. Purchasing – reviews component stocking levels, forecasts and sales orders, then generates purchase orders for raw material to satisfy demand
  4. Receiving personnel – receive items into inventory as raw goods arrive, to show component availability for manufacturing orders
  5. Stockroom personnel – kit work orders, perform transfers and cycle count to maintain inventory accuracy so purchasing and production control see up-to-date availability
  6. Shipping personnel – relieve finished goods inventory and satisfy the sales order demand when shipping/closing orders

An MRP system is used to simultaneously meet three main goals:

  1. Ensure material is available for production and finished goods are available for delivery to customers while minimizing inventory levels
  2. Maintain certain stocking levels as dictated by company philosophy
  3. Plan manufacturing and purchasing activities around delivery schedules, lead times and sales

MRP is used to guide the company in its daily inventory activity. It helps us maintain our standards to consistently provide customers with on-time deliveries and high-quality products.

 


Why Should You Upgrade Your Memory?

~written by David Sheerr, Dataram Memory Blog Team Member

It’s no secret that upgrading your computer’s memory most often improves performance, but have you ever stopped to wonder why that is? Before you can understand why a RAM upgrade improves your computer’s performance, you have to understand the role memory plays in your computer.

Imagine for a moment that you’re sitting at a small desk in an office somewhere… unless you’re actually sitting at a small desk in an office right now – if that’s the case imagine yourself sitting at a small desk on a tropical island. You deserve it. Now imagine that a man walks up to you and hands you a pile of unsorted papers. Each paper is colored red, blue, green or yellow, and the man demands that you sort the papers by color.

You try to do as he says, but your small desk will only accommodate one pile of paper at a time. In order to complete the task you have to sort one color at a time, hand it off to the man, and then start on the next color.

If you had a larger desk, you could sort all four colors at once, thus cutting the total time the task takes.

Think of your computer’s RAM as its desk. With more RAM, your computer will have a larger “work area”, allowing it to perform more operations simultaneously.

To put it a bit more technically, each time you start an application to work with a file, your computer has to load the application program as well as the data file to be edited into RAM. The more memory the computer has, the more work it can do at one time, which results in increased speed and performance.

So, why should you upgrade your computer’s memory? To give it a bigger workspace, and allow it to run faster. So go ahead, get your computer a bigger “desk” – it’ll thank you for it, and you’ll thank yourself for making your computer experience a whole lot better.


DDR3 Server Memory: LV 1.35V or Standard 1.5V–What Shall I Choose?

~written by Nelson Rodriguez, Dataram Memory Blog Team Member

DDR3 Low voltage 1.35V RDIMMs are becoming mainstream in today’s x86 servers featuring Intel’s Xeon 5600 series CPUs—also known as “Westmere”.    Standard DDR3 DIMMs run at 1.5V.  DDR3 memory technology has evolved since it became mainstream in 2008, with a series of die shrinks each resulting in lower power consumption.

In the spirit of “going green”, let’s compare power consumption in a 96GB capacity server. First generation 50nm DRAMs would draw about 65W of power. A die shrink to 40nm lowered power consumption to under 43W, for about a 34% reduction. Shrinking the die again to 30nm technology AND lowering the voltage to 1.35V yields an additional 21% power reduction, down to under 34W. These comparisons are from our friends at Samsung, the world’s largest maker of DRAMs. As you calculate the power savings over an entire datacenter with large numbers of servers, the savings become substantial. The advantages of LV (PC3L-) RDIMMs are obvious for servers, so what’s the catch?
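To see how those percentages follow from the wattages, and how they scale across a datacenter, here is a rough calculation. The wattages are the approximate figures quoted above. The server count and the assumption of constant draw around the clock are hypothetical, chosen only to show the scale.

```python
def percent_reduction(before_watts, after_watts):
    """Power reduction from before to after, as a percentage of before."""
    return (before_watts - after_watts) / before_watts * 100


# Approximate memory power for a 96GB server, per the figures quoted above.
GEN_50NM_W = 65      # first generation, 1.5V
GEN_40NM_W = 43      # after the die shrink to 40nm
GEN_30NM_LV_W = 34   # 30nm at 1.35V

step_one = percent_reduction(GEN_50NM_W, GEN_40NM_W)
step_two = percent_reduction(GEN_40NM_W, GEN_30NM_LV_W)
print(f"50nm -> 40nm: {step_one:.0f}% less power")
print(f"40nm -> 30nm LV: {step_two:.0f}% less power")

# Hypothetical datacenter: 1,000 servers drawing constant memory power all year.
SERVER_COUNT = 1000
HOURS_PER_YEAR = 24 * 365
saved_kwh = (GEN_50NM_W - GEN_30NM_LV_W) * SERVER_COUNT * HOURS_PER_YEAR / 1000
print(f"Memory power saved per year: {saved_kwh:,.0f} kWh")
```

This counts memory power only. Any reduction in cooling load would come on top of it, and real draw varies with utilization.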

Performance users deploy servers with one goal in mind: extract the maximum application performance possible, and use IT infrastructure as a competitive advantage. This is especially true of our customers in the financial services industry. Securities traders in particular are most concerned about low-latency operation and want the fastest possible CPU and memory speeds. After all, on Wall Street, “TIME IS MONEY”.

Users with high performance needs must be aware that when you populate more than one PC3L low voltage RDIMM per memory channel, a DDR3-1333 speed DIMM will clock down to DDR3-1066 memory speed. The 1.5V standard voltage DDR3 counterpart, however, will run at the full DDR3-1333 memory speed with 2 RDIMMs per channel in Xeon 5650 or higher based servers. Performance users are better off selecting standard 1.5V RDIMMs when populating up to 12 RDIMMs in a 2-way server.

Which is best: maximum power savings or maximum performance? As the customer, you decide based on your needs. Dataram has the full complement of memory options to enable you to go any way you choose. Dataram’s team of memory specialists and our industry-leading customer support group will help you choose!


Wire Speed Storage

~written by Jason Caulkins, Dataram Storage Blog Team Member

In my last blog I discussed how storage would evolve into something that looks a lot like main memory today.  The challenge of storing data inside a server (for use in a scale-out datacenter, for example), is that you need to maintain data coherency, availability, performance and scalability, despite the inevitable hardware, communications and human errors (oh, yeah – and don’t forget cost and usability).

This is a very tall order.

So, while we cannot solve this equation today, we can at least break it down into some byte-sized bits.

Let’s start with coherence and performance. These are nearly mutually exclusive in large, geographically dispersed environments. The issue is really about latency. In addition to the latencies associated with connection speeds between remote sites, there are other, programmatic latencies to deal with. These are associated with ensuring atomicity.

Today, a file or block of data is locked when accessed by a node or application, to avoid changes and data corruption if another node or application is trying to access the same data at the same time.  While this ensures coherency, the performance of the waiting node can suffer greatly, depending on how long the first application holds the lock and the efficiency of the locking mechanism itself.  In the worst cases, you can end up in perpetual locks, which is a bad thing.

A better approach is to implement a journaling system where both nodes or applications are allowed to view/modify the data at the same time, but keep track of the changes so that another process can check for collisions, and roll back the changes as required.  The idea here is that collisions are rare, so why pay the expensive locking penalty for every transaction?  The down side of the journaling system is that it takes up storage space keeping track of redundant data changes.  However, all one needs to do is compare the price of capacity vs. the price/benefit of performance.  For critical transactions (stock trades, financial transactions), the collision detection/repair mechanism must find and fix collisions before a transaction can be committed, adding yet another variable to the equation.
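The “assume collisions are rare, detect them, and repair” idea can be sketched with version numbers. Note that this swaps in a simpler technique than the full journaling system described above: it checks for a collision at commit time and makes the loser retry, rather than rolling back after the fact. The class and key names are hypothetical.

```python
class VersionedStore:
    """Toy in-memory store that detects write collisions without locking."""

    def __init__(self):
        self._data = {}    # key -> (version, value)
        self.journal = []  # (key, old_value, new_value) for every committed change

    def read(self, key):
        """Return (version, value); a key that was never written is at version 0."""
        return self._data.get(key, (0, None))

    def commit(self, key, read_version, new_value):
        """Apply the write only if nobody else committed since read_version."""
        current_version, old_value = self._data.get(key, (0, None))
        if current_version != read_version:
            return False   # collision: the caller worked from stale data
        self._data[key] = (current_version + 1, new_value)
        self.journal.append((key, old_value, new_value))
        return True


store = VersionedStore()
store.commit("balance", 0, 100)

# Two nodes read the same record at the same time; neither takes a lock.
version_a, value_a = store.read("balance")
version_b, value_b = store.read("balance")

a_ok = store.commit("balance", version_a, value_a + 10)
b_first_try = store.commit("balance", version_b, value_b - 30)  # stale, so rejected

if not b_first_try:
    # Repair: re-read the current record and apply the change again.
    version_b, value_b = store.read("balance")
    store.commit("balance", version_b, value_b - 30)

final = store.read("balance")
print(final)  # (3, 80)
```

Node A never waits, and node B only pays a penalty in the rare case of a collision. The journal list is the extra storage the approach costs.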

The fundamental issue here is tradeoff.  Physics and economics are at odds.  There is no free lunch.  In order to be low latency, you have to pay for things like solid state and very fast connections between sites.  These systems have a great price/performance metric, but this makes the price/GB attribute go way up.  If you just need bulk, local storage, slow, cheap, high-capacity mechanical drives provide the best price/GB, but have terrible price/performance metrics.

As long as there are different storage workloads and different fundamental storage technologies, it makes sense to tailor the storage system to the workload.

To further muddy the waters, storage workloads are vastly different in the same compute environment. Even the same applications change their workload characteristics depending on the number of users, the amount of non-storage system resources, and even on the version of the application being used.

So the trick to solving this problem is to create a storage environment that has all the characteristics required to address the various workload, performance and economic challenges presented by the applications.  This means that an advanced storage infrastructure must have elements of high price/performance as well as elements of low price per capacity.  It must be intelligent enough to dynamically assign resources as the workloads demand, and must support the ability to modularly add performance elements as well as capacity elements not only on demand, but in a predictive manner, so that the system always provides just the right price/performance for a dynamic application workload driven by ever-changing business workloads.
