
By Jason Zandri
What does it all mean?
RAID 0, Duplexing, Mirroring, Striping with parity, ECC: the
terminology bog goes on for what seems to be forever. Do you
need to know what all of these mean? If you're considering
a RAID array for your network, the first step is understanding
the different terms before you even think about making a purchase.
So, off we go.
RAID - To start talking about RAID (redundant array
of independent disks; or redundant array of inexpensive disks;
depending on who you ask, both are accepted as "correct"),
you need to know that there are two different types: Software and
Hardware RAID. In a nutshell, hardware RAID is any type of
RAID deployed on a computer system that is controlled at a
hardware level, be it by a controller card or other device,
which is independent of the operating system. Before any type
of operating system is installed on the system (or started
up, in the event an operating system is present), this level
of RAID would already be enabled on the given system. In theory,
the loss of the operating system from errors or configuration
issues in this type of RAID configuration allows easier recovery
of your data, as the disk configuration is held on the hardware
controller. The loss of the hardware device makes data retrieval
much more difficult, as the configuration of the hard drives
is unknown to the operating system (and to any standard data
recovery tools), which sees the combined space of all the drives
as one logical structure.
[NOTES FROM THE FIELD] - Certain server class hardware
RAID solutions not only write the disk configuration to the
memory of the hardware controller but to certain reserved
sectors of the hard drives as well. This means that if the
controller failed and then was replaced with the same make,
model and BIOS revision of controller, the chances of recovering
the data are greatly improved, as the disk configuration would
be written "back" from the disks to the memory of the replacement
controller, and even if the operating system faulted, the
data would most likely be intact for standard disk access
and recovery at that point.
In a software based (software based means "derived from the
operating system," by default) RAID solution, it is the operating
system itself that creates and stores the logical structure
of the drives in the array. Real mode access (such as booting
from a floppy disk or an NTFS boot disk in the NT world) in
most cases is not going to allow any access to the data, as
the operating system was bypassed and never had the chance to
initialize and access the logical drive array it created.
[NOTES FROM THE FIELD] - There is no RAID based
hardware failure (such as a controller card) that you need
to be concerned with in a software based RAID scenario; however,
if you lose the operating system to the point where the repair
function cannot "fix" the issue, all the data of the operating
system created RAID solution is usually lost.
The Different Flavors of RAID
For the most part, the RAID levels themselves have the same
characteristics and properties, whether they are software
or hardware based. There are only a few types, RAID 0, 1 and
5, that can be software based (that I am aware of). The remainder
are all hardware based for the most part.
RAID 0 - RAID 0 is a misnomer of sorts, as it is not
really RAID at all: there is nothing redundant about it.
RAID 0 offers the best read/write performance but no
fault tolerance.
Click here for the graphic
All of the drives are written to sequentially (one after the other,
from A to B to C, etc.), as the data is broken down into blocks
of anywhere from 512 bytes up to 64KB. The block size depends on
the hardware controller and / or the software configuration of that
controller or the operating system, as RAID 0 is supported in both
hardware and software solutions. In the above image, each column is
considered to be a physical drive with the letters representing
the datablocks.
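The A to B to C write order described above can be sketched in a few lines of Python. This is only an illustrative model, not how any particular controller or operating system implements RAID 0. The function names `stripe` and `unstripe` and the tiny 4-byte default block size are assumptions chosen for readability; real block sizes fall in the 512 byte to 64KB range mentioned above.

```python
def stripe(data: bytes, num_drives: int, block_size: int = 4) -> list:
    """Split data into fixed-size blocks and deal them out to the drives in turn."""
    if num_drives < 2:
        raise ValueError("RAID 0 needs at least two drives")
    drives = [[] for _ in range(num_drives)]
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        # Block 0 goes to drive 0 (A), block 1 to drive 1 (B), and so on,
        # wrapping back to drive 0 after the last drive.
        drives[(offset // block_size) % num_drives].append(block)
    return drives


def unstripe(drives: list) -> bytes:
    """Reassemble the data by reading one block from each drive, row by row."""
    out = []
    depth = max(len(drive) for drive in drives)
    for row in range(depth):
        for drive in drives:
            if row < len(drive):
                out.append(drive[row])
    return b"".join(out)
```

Because every drive holds only a slice of the data, `unstripe` needs all of the drive lists to rebuild the original bytes; there is no second copy to fall back on.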
RAID 0 is often referred to as striping or striping without
parity, and requires two physical hard drives at a minimum,
or at least two drives in the same system with the same amount
of RAID dedicated space. For example, it's possible to have
four 20GB hard drives in a system (80GB of total space) and
dedicate a 10GB partition from each of two of those drives,
for a total of 20GB, to the RAID 0 array.
RAID 0 allows for the use of the entire dedicated space committed
to the array. If you have dedicated ten 20GB hard drives to
your RAID 0 array, you will have 200GB of space available
for use, as the overhead for the array is negligible.
The failure of a single drive in a RAID 0 array causes the
entire array to fail and will result in the loss of any data
that is not committed to backup.
RAID 1 - RAID 1 is often referred to as disk mirroring
when two different hard drives are used on the same IDE, SCSI
or RAID controller, and as disk duplexing when two different hard
drives are used on two different IDE, SCSI or RAID controllers.
(Normally the controllers are paired by type - IDE to IDE, SCSI
to SCSI, RAID to RAID. While it is possible to mix them, it is
not recommended.)
There is no striping of any kind in this
configuration; however, all of the data written to one disk
is duplicated to the other, and this is where the fault tolerance
of this configuration comes from. (This duplicate is a full copy
rather than calculated parity, and it is what will be used to
maintain the system in the event of a drive failure.) Mirroring
can allow up to twice the read data rate of a single drive, as
either drive can be accessed at a given time. (There is no write
performance increase, and the system will take a performance hit
due to the need to duplicate every write to both drives.) This
type of RAID implementation has one of the highest overhead
needs of system or hardware resources of any of the RAID types,
and it is almost always deployed (following best practices)
as a HARDWARE implementation. (It can be done via the OS,
but this causes all of the processing load of the RAID overhead
to fall on the system processor, which is inefficient regardless
of the type and size of the processor in use.)
Click here for the graphic
(Data is read from or written to A and A, or B and B or C
and C, etc., depending on where the data is held / going).
In the above image, each column is considered to be a physical
drive with the letters representing the datablocks.
RAID 1 requires a minimum of two physical hard drives of equal
space per array and the total amount of net space is half
of the total committed. Effectively, you will lose 50% of
the disk space committed, as the data is a mirror of itself.
Two 40GB drives (totaling 80GB of space) will yield approximately
40GB of usable space in the RAID 1 array.
A standard RAID 1 implementation of two drives can survive
the loss of a single disk and retain its data structure.
There is the ability in some Hardware RAID configurations
to utilize both hot-swappable (sometimes called hot-pluggable)
hard drives and online spares (sometimes called failover drives)
to extend the probability of data preservation in the event
of a failure.
Hot-swappable drives are usually connected to a server system
via an open faced cage in the front of the system, which allows
for the quick removal of a dead drive while the system is
powered on. Sometimes these drives are enclosed within the
case, but this is the exception, not the rule. The use of hot-swappable
drives means that while the system is up and running, you
can pull the failed hard drive and put a new drive in its
place, and the data will be "rebuilt" to the replacement drive
via the stored redundant information. In the above example
of RAID 1, the mirror would be re-established on the new drive
by taking all of the data on the remaining disk and duplicating
it to the new drive. In a true parity situation (such as
RAID 5, which will be discussed later), the parity information
that is spread out over all of the hard drives is what will
allow the data to be "added" to the replacement drive.
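The RAID 1 rebuild described above (copying the surviving disk to the replacement) can be modeled with a toy class. This is a sketch for illustration only: the class name `Mirror`, its method names, and the use of dictionaries to stand in for drives are all assumptions, and a real controller works at the block-device level rather than on Python objects.

```python
class Mirror:
    """Toy RAID 1 pair: every write goes to both drives."""

    def __init__(self):
        # Each "drive" maps block number -> data; None marks a failed drive.
        self.drives = [{}, {}]

    def write(self, block_no, data):
        # Duplicate the write to every drive that is still working.
        for drive in self.drives:
            if drive is not None:
                drive[block_no] = data

    def read(self, block_no):
        # Either drive can satisfy a read; use the first one still working.
        for drive in self.drives:
            if drive is not None:
                return drive[block_no]
        raise RuntimeError("both drives have failed; the array is lost")

    def fail(self, index):
        self.drives[index] = None

    def replace(self, index):
        """Swap in a blank drive and rebuild it by copying the survivor."""
        survivor = self.drives[1 - index]
        if survivor is None:
            raise RuntimeError("no surviving drive to rebuild from")
        self.drives[index] = dict(survivor)
```

After `fail(0)`, reads are still served from the second drive, and `replace(0)` re-establishes the mirror; failing both drives before a rebuild leaves nothing to copy from.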
In the case of an online spare, an additional drive is committed
to the RAID array, allowing for a secondary failure recovery
point to exist. This additional drive's (or drives') sole
purpose is to sit idle UNTIL a failure occurs. At the point
of a single drive failure, the online spare would initialize
and immediately begin to work with the controller to restore
the RAID configuration to its fault tolerant configuration.
For example, if a RAID 1 configuration were deployed with
a single online spare, you would need three hard drives of
the same size. This would limit your usable logical space
to just 33% of the total amount dedicated. Three 20GB hard
drives total 60GB of space, but due to the fact that the array
is set up as a mirror, (data that is totally duplicated from
one drive to the mirror partner) AND that the third drive
has the singular task of "waiting" for a failure, the total
amount of usable space is 20GB.
This configuration allows for two drive failures, if the online
spare is totally "rebuilt" before the second failure.
Let's say that on Friday evening around 9:00PM one of the
drives in your RAID 1 configuration fails. The RAID controller
takes the failed drive off-line and brings the spare online.
Your online spare becomes active and all of the data is restored
to the newly initialized spare. (The actual rebuild time will
vary depending on how much data needs to be copied and the rebuild
priority given. In this example let's say it's ten hours.)
If another failure occurs, either to the other existing RAID
1 drive or the newly initialized spare, AFTER the rebuild period
(in this example, ten hours), the system WILL stay up and running.
If hot-swappable drives are being used, BOTH failed drives could
be pulled from the system and replaced with new drives. The
controller will take ONE of the two replacement drives and rebuild
the RAID 1 data to it, and will place the other replacement drive
in an off-line status for use in the next drive
failure scenario.
If the second failure were to take place on either drive BEFORE
the rebuild period (in this example, ten hours) has an opportunity
to complete, the system WILL NOT stay up and it WILL FAIL.
RAID 5 - RAID 5 is often referred to as striping with
parity and has some similarities to RAID 0, the main difference
being that here there is fault tolerance and in RAID 0 there
is none. RAID 5 data is broken up into blocks anywhere in
size from 512 bytes up to 64KB (this depends
on the hardware controller and / or the software configuration
of that controller or the operating system, as RAID 5 is supported
in both hardware and software solutions) and distributed
across all of the disks in the array, with parity information
being written to each drive.
Click here for the graphic
In the above image, each column is considered to be a physical
drive with the letters representing the datablocks. It shows
that datablock "0" is written first to drive A and then to
B and then to C and then to D, with its parity information
written to the fifth drive (E). (If there was more data to the "0"
data block it would "wrap" back to the A drive and start the
process over.) The next batch of data (the "1" datablock)
starts on the A drive (although it could have been anywhere,
based on the last write or the controller algorithm) and continues
in the same fashion. You'll notice that this time, however,
the parity information is written to a
different drive (Drive D). It is this "spreading out" of
the parity (rebuild) information that gives this configuration
its fault tolerance.
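The way the parity block moves from drive to drive can be sketched as a small layout function. The article only states where parity lands for the "0" and "1" datablocks (drive E, then drive D); continuing that right-to-left rotation for later stripes is an assumption here, as are the function name `raid5_layout` and the `P0`, `P1` labels for parity blocks. Real controllers may rotate parity in a different order.

```python
import string


def raid5_layout(num_drives: int, num_stripes: int) -> list:
    """Return one row per stripe, marking which drive holds that stripe's parity."""
    if not 3 <= num_drives <= 26:
        raise ValueError("this sketch supports 3 to 26 drives")
    rows = []
    for stripe in range(num_stripes):
        # Assumed rotation: parity starts on the last drive and moves one
        # drive to the left with each new stripe, wrapping around.
        parity_drive = (num_drives - 1 - stripe) % num_drives
        row = []
        for drive in range(num_drives):
            letter = string.ascii_uppercase[drive]
            row.append(f"P{stripe}" if drive == parity_drive else f"{letter}{stripe}")
        rows.append(row)
    return rows
```

For five drives, `raid5_layout(5, 2)` gives `A0 B0 C0 D0 P0` for the first stripe and `A1 B1 C1 P1 E1` for the second, so no single drive ends up holding all of the parity.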
RAID 5 needs to be deployed using a 3 disk minimum in its
standard configuration. (The example above shows 5 drives,
which is the "best practice" minimum suggestion.)
A RAID 5 configuration will allow for use of the total combined
space of all of the drives, minus a single drive. That is,
if five 20GB hard drives totaling 100GB of space are committed
to a RAID 5 array, the total usable space is 80GB. The "lost"
space is allotted for the use of the parity storage.
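The capacity figures quoted for each RAID level so far can be checked with a short calculation. This is a minimal sketch that assumes all drives are the same size and that any online spares contribute no usable space; the function name `usable_gb` and its parameters are made up for this example.

```python
def usable_gb(level: int, drive_gb: float, drives: int, spares: int = 0) -> float:
    """Usable space for RAID 0, 1 or 5 built from equal-size drives."""
    active = drives - spares  # online spares sit idle and hold no data
    if level == 0:
        if active < 2:
            raise ValueError("RAID 0 needs at least two drives")
        return active * drive_gb          # all committed space is usable
    if level == 1:
        if active != 2:
            raise ValueError("this sketch models a two-drive mirror")
        return drive_gb                   # half is lost to the mirror copy
    if level == 5:
        if active < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return (active - 1) * drive_gb    # one drive's worth goes to parity
    raise ValueError("only RAID 0, 1 and 5 are covered here")
```

With the article's numbers: `usable_gb(0, 20, 10)` returns 200, `usable_gb(1, 40, 2)` returns 40, `usable_gb(1, 20, 3, spares=1)` returns 20, and `usable_gb(5, 20, 5)` returns 80.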
One key point to remember with RAID 5 is that all of the parity
information is not committed to a single drive. It is spread
out among all of the drives and while it does TOTAL the space
of a single drive, it is not stored ON a single drive.
RAID 5 can sustain the loss of a single drive (or more, in
the case of online spare usage, which I'll cover in a minute).
In the above image there are five drives. Let's assume that
the D drive fails and no online spare is in use. The remaining
four drives can continue to allow the system to operate, as
the "lost" data (D0, D2, D3 and D4) is "restored" for "direct"
access from the 0 parity, 2 parity, 3 parity and 4 parity
information, spread out on the other hard drives. The 1 parity
information is lost with the drive failure and it is partly
why this RAID array cannot sustain a second failure. Because
all of the other parity information is being used to re-create
(think "in the place of the D drive") the data lost in the
D drive failure and parity 1 is lost, ANOTHER failure would
not allow the array to remain intact and a failure would occur.
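How the "lost" data is restored from parity can be shown with a small worked example. The article does not say how the parity is calculated; the sketch below assumes the bytewise XOR scheme that RAID 5 implementations normally use, and the block contents and the helper name `xor_blocks` are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Stripe 0 from the image: data on drives A to D, parity on drive E.
stripe0 = {"A": b"\x10\x20", "B": b"\x0f\x01", "C": b"\xaa\x55", "D": b"\x33\x44"}
stripe0["E"] = xor_blocks(list(stripe0.values()))  # the "0 parity" block

# Drive D fails: XOR the surviving data blocks with the parity block
# to recreate D0 without ever reading drive D.
survivors = [block for drive, block in stripe0.items() if drive != "D"]
rebuilt_d0 = xor_blocks(survivors)
```

Here `rebuilt_d0` equals the original `stripe0["D"]`. If a second drive were also missing, the XOR of the remaining blocks would only yield the two lost blocks combined, which is why the array cannot ride out a second failure.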
The case of a RAID 5 array with an online spare is a little
different.
Take the above example again, this time with an online spare.
Drive D fails. The controller would mark D off-line and
activate the online spare. The online spare would be populated
(rebuilt) with all of the data that was originally on drive
D.
D0, D2, D3 and D4 data would be restored to the newly initialized
drive from 0 parity, 2 parity, 3 parity and 4 parity from
drives A, B, C and E, and 1 parity would be rebuilt to the
newly initialized drive using A1, B1, C1, and E1. (The actual
rebuild time will vary depending on how much data needs to be
copied and the rebuild priority given. In this example let's say
it's ten hours.)
If another drive failure occurs AFTER ten hours have passed,
the system will continue to remain operational, as it would
function "directly" from the parity information, as shown
in the "no online spare" example. If the second drive failure
occurs BEFORE the ten hour threshold, the system will halt,
as the rebuild of all of the information would not have completed.
(The use of the online spare in RAID 5 is the same in terms of
utilization as it was in the RAID 1 section: an additional
drive is committed to the RAID array, allowing for a secondary
failure recovery point to exist. This additional drive's (or
drives') sole purpose is to sit idle UNTIL a failure occurs.
At the point of a single drive failure, the online spare would
initialize and immediately begin to work with the controller
to restore the RAID configuration to its fault tolerant configuration.)
About the Author
Jason Zandri has worked as a consultant, systems engineer
and technical trainer for a variety of corporate clients in
Connecticut over the past five years and currently holds the
position of Technical Account Manager for Microsoft Corporation.
He has also written a number of COMPTIA and MICROSOFT prep
tests for Boson Software and holds a number of certifications
from both companies. Currently, he writes part time for a number of
freelance projects, including numerous "HOW TO" and best practices
articles for 2000Trainers.com and MCMCSE.com.