New NFS to bring parallel storage to the masses
Sometime around the end of January or early February, the Internet Engineering Task Force will give its final blessing to the latest version of the venerable Network File System (NFS), version 4.1. While the authors of the standard have stressed that this is a minor revision of NFS, it does have at least one seemingly radical new option, called Parallel NFS (pNFS).
The "parallel" tag of pNFS means NFS clients can access large pools of storage directly, rather than going through the storage server. Unbeknown to the clients, what they store is striped across multiple disks, so when that data is needed it can be called back in parallel, cutting retrieval time even more. If you run a cluster computer system, you may immediately recognize the appeal of this approach.
"We're starting the process of feeding all these patches up to the Linux NFS maintainers," said Brent Welch, the director of software architecture for Panasas, who is also one of that storage company's contributors of the pNFS code. He noted that the work of prototyping and implementing pNFS in Linux, as part of NFS, has been going on for about two years. Ongoing work has included updating both the NFS client and NFS server software.
The code will be proposed for the Linux kernel in two sets, according to Welch. The first set will have the basic procedures for setting up and tearing down pNFS sessions, using Remote Procedure Call (RPC) operations for exchanging IDs and initiating and ending sessions. The development teams are gunning to have this basic outline of pNFS included in the 2.6.30 version of the kernel. The second set, ready for the 2.6.31 version of the kernel, will be a larger patch, including the I/O commands for accessing and changing file layouts as well as reading and writing data. Given that it will take a few more months after the 2.6.31 kernel is released for the major distributions to pick it up, pNFS probably won't start to be deployed by even the most ambitious IT shops until at least the early part of 2010.
We all know NFS. It allows client machines to mount Unix drives that reside across the network as if they were local disks. Many Network Attached Storage (NAS)-based storage arrays use NFS. With NAS, a lot of hard drives all lie behind a single IP address, and the drives are all managed by the NAS box. NAS allows organizations to pool storage, so storage administrators can more fluidly (and hence efficiently) allocate that storage across all users.
In a 2004 problem statement, two of the developers responsible for getting pNFS in motion, Panasas chief technology officer Garth Gibson and Network Appliance (NetApp) engineer Peter Corbett, explained the limitations of this approach, especially in high performance computing environments:
In a nutshell, the potential roadblock with NAS, or any type of NFS-based network storage, is the NAS head, or server, they explained. If too many of your clients hit the NAS server at the same time, then the I/O slows for everyone. You could go back to direct access, but you lose the efficiencies of pooled storage. For cluster computer systems, in which dozens of nodes can be working on the same data set, such partitioned storage just isn't feasible. Nor are multiple storage servers: An NFS-based system cannot support multiple servers writing to the same file system.
Gibson and Corbett were early champions of developing pNFS, along with Los Alamos National Laboratory's Gary Grider. Additional work was carried out by engineers at EMC, Panasas, NetApp and other companies. The University of Michigan's Center for Information Technology Integration (CITI), along with members of the IBM Almaden Research Center, is developing a pNFS implementation for Linux, both for clients and storage servers.
pNFS will allow clients to connect directly to the storage devices they need, rather than going through a storage gateway of some sort. The folks behind pNFS like to say that their approach separates the control traffic from the data traffic. When a client requests a particular file or block of storage, it sends a request to a server called the Metadata Server (MDS), which returns a map of where all the data resides within the storage network. The client can then access that data directly, according to permissions set by the file system. Once that storage is altered, the client notifies the MDS of the changes, and the MDS updates the file layout.
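The request, access and notify cycle described above can be sketched as a toy, in-memory model in Python. This is only an illustration of the split between control traffic and data traffic, not the NFSv4.1 wire protocol: the class and function names (`StorageDevice`, `MetadataServer`, `get_layout`, `commit`, `client_write`, `client_read`) are hypothetical, and the round-robin placement of stripes is an assumption made for simplicity.

```python
class StorageDevice:
    """Toy stand-in for a storage device that clients reach directly."""

    def __init__(self):
        self.stripes = {}

    def write(self, path, stripe_no, data):
        self.stripes[(path, stripe_no)] = data

    def read(self, path, stripe_no):
        return self.stripes[(path, stripe_no)]


class MetadataServer:
    """Toy metadata server: hands out layouts and records sizes, never file data."""

    def __init__(self, num_devices):
        self.num_devices = num_devices
        self.sizes = {}

    def get_layout(self, path, num_stripes):
        # The "map": stripe number -> device index (round-robin is an assumption).
        return {s: s % self.num_devices for s in range(num_stripes)}

    def commit(self, path, new_size):
        # The client reports its changes; only metadata is updated here.
        self.sizes[path] = new_size


def client_write(mds, devices, path, stripes):
    layout = mds.get_layout(path, len(stripes))  # control traffic
    for stripe_no, data in enumerate(stripes):
        # Data traffic goes straight to the device and bypasses the MDS.
        devices[layout[stripe_no]].write(path, stripe_no, data)
    mds.commit(path, sum(len(d) for d in stripes))  # control traffic
    return layout


def client_read(mds, devices, path, num_stripes):
    layout = mds.get_layout(path, num_stripes)
    return b"".join(devices[layout[s]].read(path, s) for s in range(num_stripes))


devices = [StorageDevice() for _ in range(3)]
mds = MetadataServer(num_devices=3)
client_write(mds, devices, "/data/results", [b"aa", b"bb", b"cc", b"dd"])
print(client_read(mds, devices, "/data/results", 4))  # b'aabbccdd'
```

Note that the metadata server object never sees the bytes being written; in this sketch it only answers "where?" and records "what changed?", which is the division of labor the pNFS developers describe.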
Since pNFS allows clients to talk directly to the storage devices, as well as permitting client data to be striped across multiple storage devices, the client can enjoy a higher I/O rate than would be had simply by going through a single NAS head, or by communicating with a single storage server. In 2007, three developers from the IBM Almaden Research Center, Dean Hildebrand, Marc Eshel and Roger Haskin, demonstrated at the Supercomputing 2007 conference (SC07) how three clients could saturate a 10 gigabit link by drawing data from 336 Linux-based storage devices. Such throughput "would be hard to achieve using standard NFS in terms of accessing a single file," Hildebrand said. "We wanted to show that pNFS could scale to the network hardware available."
pNFS is largely made up of three sets of protocols. The first is the layout protocol, which handles the mapping, or layout, of resources and resides on the client; it interprets and uses the data map returned from the metadata server. The second is the transport protocol, which also resides on the client; it coordinates data transfer between the clients and storage devices and handles the actual I/O with those devices. The third, a control protocol, synchronizes the metadata server with the storage devices. This last protocol is the only one not specified by NFS; it will be left to the storage vendors, though much of the work that this protocol will do can be codified in NFS commands.
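Interpreting a data map on the client largely comes down to arithmetic: given a striped layout, work out which device holds each byte range of a file. The following is a minimal sketch in Python, assuming plain round-robin striping with a fixed stripe unit. The function names `locate` and `split_read` and the parameters `stripe_unit` and `num_devices` are illustrative and are not taken from the NFSv4.1 specification, whose layouts carry more detail than this.

```python
def locate(offset, stripe_unit, num_devices):
    """Map a file byte offset to (device index, offset on that device)
    under plain round-robin striping (an assumed, simplified scheme)."""
    stripe_index = offset // stripe_unit  # which stripe unit of the file
    device = stripe_index % num_devices   # devices are used in rotation
    # Each full rotation places one stripe unit on every device.
    device_offset = (stripe_index // num_devices) * stripe_unit + offset % stripe_unit
    return device, device_offset


def split_read(offset, length, stripe_unit, num_devices):
    """Break one logical read into per-device chunks that could be fetched in parallel."""
    chunks = []
    end = offset + length
    while offset < end:
        device, device_offset = locate(offset, stripe_unit, num_devices)
        # Never cross a stripe-unit boundary within a single chunk.
        size = min(stripe_unit - offset % stripe_unit, end - offset)
        chunks.append((device, device_offset, size))
        offset += size
    return chunks


# An 8-byte read starting at offset 2, with 4-byte stripe units over 3 devices,
# touches all three devices.
print(split_read(2, 8, 4, 3))  # [(0, 2, 2), (1, 0, 4), (2, 0, 2)]
```

Because each chunk lands on a different device, a transport layer could issue the three requests at once, which is where the higher aggregate I/O rate comes from.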
pNFS can work with three types of storage: file-based storage, object-based storage and block-based storage. The NFSv4.1 protocol itself contains the file-based storage protocol. Additional RFCs are being developed for object and block protocols. File-based storage is what most system administrators think of as storage; it is the standard approach of nesting files within a hierarchical set of directories. Block-based storage is used in Storage Area Networks (SANs), in which the applications access disk space directly, by sending Small Computer System Interface (SCSI) commands over Fibre Channel, or, increasingly of late, over TCP/IP via the Internet SCSI (iSCSI) protocol. Object-based storage is somewhat of a newer beast, a parallel approach that involves embedding the data itself with self-describing metadata.
A word on semantics: Keep in mind that just as NFS is not a file system itself, neither is pNFS. NFS provides the protocols to work with remote files as if they were local. Likewise, pNFS offers the ability to work with files managed by a parallel file system as if they were on a local drive, handling such tasks as setting permissions and ensuring data integrity. Fortunately, a number of parallel file systems have been spawned over the past few years that should work easily with pNFS. On the open source front, there is the Parallel Virtual File System (PVFS). Perhaps the most widely used open-source parallel file system is Lustre, now overseen by Sun Microsystems. On the commercial front, Panasas' PanFS file system has been successfully deployed in high performance computer clusters, as has IBM's General Parallel File System (GPFS). All of these approaches use a similar idea: let the clients talk to the storage server's devices directly, while having some form of metadata server keep track of the storage layout. But most other options rely on using a single vendor's gear.
"The main advantage [to using pNFS] is expected to be on the client side," noted CITI programmer J. Bruce Fields, who does the NFS 4.1 testing on Linux servers. With most parallel file systems you have to do some kernel reconfigurations on the clients so that they can work with the file systems. With the prototype Linux client, you can run a standard mount command and get the files you need. "The client will automatically negotiate pNFS and find the data servers. By the time we're done that should work on any out-of-the-box Linux client from the distribution of your choice," he said.
The advantages that pNFS will bring are familiarity and the fact that it will come built in as part of NFS. Since NFS is a standard component in almost all Linux kernel builds, that will greatly reduce the amount of work administrators need to do to set up a parallel file system for Linux servers. Most administrators are far more familiar with the general operating procedures of NFS than with those of, say, Lustre, which requires numerous kernel patches and a different mindset when it comes to understanding commands.
pNFS should help storage vendors as well, as they will not have to port client software to numerous Linux distributions. Welch, for instance, noted that Panasas has to maintain code for dozens of different Linux distributions. With pNFS, vendors can instead rely on NFS and focus on storage devices. Already, Panasas, NetApp, EMC and IBM have all promised to support pNFS in at least some of their storage products, according to a joint talk some of the developers gave last month at the SC08 conference. Sun Microsystems also plans to support pNFS in Solaris.
And while much of the early focus of pNFS has been on large-scale cluster operations, it may one day be feasible for even workstations and desktops to use pNFS in some form. LANL's Gary Grider pointed out that, "at some point, having several teraflops may even be possible in your office, in which case you may need something more than just NFS for data access for such a powerful personal system. pNFS may end up being handy in this environment as well."
Indeed. Once upon a time we were limited to working on files on our own machines, FTP'ing in anything that was located elsewhere. But NFS allowed us to mount drives across the network with a relatively simple command. Now, pNFS may simplify things a step further, by allowing us to pull in and write large files or myriad files with a speed that we can now only dream about. At least that is the promise of pNFS.
Index entries for this article | |
---|---|
Kernel | Clusters/Filesystems |
Kernel | Network filesystems |
GuestArticles | Jackson, Joab |
Posted Jan 22, 2009 4:21 UTC (Thu) by jwb (guest, #15467) [Link] (7 responses)
Posted Jan 22, 2009 12:50 UTC (Thu) by epa (subscriber, #39769) [Link] (1 responses)
Posted Jan 22, 2009 16:18 UTC (Thu) by eli (guest, #11265) [Link]
http://www.codemonkey.org.uk/projects/fsx/
Posted Jan 22, 2009 16:21 UTC (Thu) by snitm (guest, #4031) [Link] (4 responses)
Posted Jan 22, 2009 16:56 UTC (Thu) by jwb (guest, #15467) [Link] (3 responses)
Posted Jan 22, 2009 20:32 UTC (Thu) by felixfix (subscriber, #242) [Link]
Posted Jan 24, 2009 14:14 UTC (Sat) by xav (guest, #18536) [Link] (1 responses)
Posted Jan 24, 2009 17:19 UTC (Sat) by jwb (guest, #15467) [Link]
Posted Jan 25, 2009 22:04 UTC (Sun) by job (guest, #670) [Link]
Posted Jan 29, 2009 16:26 UTC (Thu) by malcolmparsons (guest, #46787) [Link] (1 responses)
s/disc/disk/
Posted Jan 29, 2009 16:48 UTC (Thu) by jake (editor, #205) [Link]
fixed, thanks!
jake