GridFTP

GridFTP is an extension of the File Transfer Protocol (FTP) for grid computing.[1] The protocol was defined within the GridFTP working group of the Open Grid Forum.[2][3][4] There are multiple implementations of the protocol; the most widely used is the one provided by the Globus Toolkit.[citation needed]

The aim of GridFTP is to provide more reliable, higher-performance file transfer, for example to enable the transmission of very large files. It is used extensively within large science projects such as the Large Hadron Collider, and by many supercomputer centers and other scientific facilities.

GridFTP also addresses the problem of incompatibility between storage and access systems. Previously, each data provider made its data available in its own way, supplying its own library of access functions. Obtaining data from multiple sources therefore required a different access method for each source, which in effect divided the total available data into partitions. GridFTP provides a uniform way of accessing the data, encompassing functions from all the different modes of access, by building on and extending the universally accepted FTP standard. FTP was chosen as the basis because of its widespread use and because it has a well-defined architecture for protocol extensions, which may be dynamically discovered.

Numerous GridFTP clients have been developed. The Globus Online software-as-a-service system is particularly popular.[citation needed]
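The dynamic discovery of protocol extensions mentioned above can be illustrated with FTP's standard FEAT command, whose multi-line 211 reply lists one supported feature per line. The following is a minimal sketch in Python, not part of any GridFTP implementation: the function name `parse_feat_reply` and the sample reply are made up for illustration, and the sample lists only ordinary FTP features rather than the features any particular GridFTP server advertises.

```python
from typing import Dict


def parse_feat_reply(reply: str) -> Dict[str, str]:
    """Parse a multi-line FEAT reply into {feature name: parameters}.

    Assumes the usual layout: a "211-..." opening line, one feature per
    line indented by a single space, and a closing "211 ..." line.
    """
    features: Dict[str, str] = {}
    lines = reply.splitlines()
    # Skip the opening and closing status lines; keep indented feature lines.
    for line in lines[1:-1]:
        if not line.startswith(" "):
            continue
        name, _, params = line.strip().partition(" ")
        features[name.upper()] = params
    return features


# Hypothetical reply from a server, for illustration only.
sample_reply = "211-Features:\n MDTM\n REST STREAM\n SIZE\n211 End"
supported = parse_feat_reply(sample_reply)
```

A client could consult the resulting dictionary before deciding whether to use an extended command or fall back to plain FTP behaviour.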

Features of GridFTP

GridFTP integrates with the Grid Security Infrastructure, which provides authentication and encryption for file transfers with user-specified levels of confidentiality and data integrity. This also applies to cross-server transfers (what FTP calls the File eXchange Protocol, FXP).

GridFTP achieves much greater use of bandwidth than conventional data stream technology by using multiple simultaneous TCP streams.[5] A file can be downloaded in pieces simultaneously from multiple sources, or in separate parallel streams from a single source, which still makes better use of the available bandwidth than a single stream. Striped and interleaved transfers, again from either multiple sources or a single source, allow further speed increases.

Although FTP can resume an interrupted file transfer from a specific point in a file, it does not support transmitting only a certain portion of a file. GridFTP allows a subset of a file to be sent. This is useful in applications where only small sections of a very large data file are needed for processing; a motivating example is the processing of data from a high-energy physics experiment, a traditional use of grid technology.

GridFTP provides a fault-tolerant implementation of FTP that handles network unavailability and server problems. Transfers can also be restarted automatically if a problem occurs.

The TCP connection underlying FTP has numerous settings, such as window size and buffer size. GridFTP allows automatic or manual negotiation of these settings to provide optimal transfer speed and reliability; the optimal settings are likely to differ between large files and large groups of files.
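Two of the ideas above lend themselves to a short worked example: dividing a file into byte ranges so that each parallel stream carries one piece, and choosing a TCP buffer size for a given link. The Python sketch below is illustrative only and does not reproduce GridFTP's actual block layout or negotiation. The function names are hypothetical, and the buffer calculation uses the common bandwidth-delay product rule of thumb, which is an assumption here rather than something the protocol prescribes.

```python
from typing import List, Tuple


def split_into_ranges(file_size: int, streams: int) -> List[Tuple[int, int]]:
    """Divide a file of `file_size` bytes into contiguous (offset, length)
    pieces, one per stream. Pieces differ in length by at most one byte;
    empty pieces are omitted when there are more streams than bytes.
    """
    if file_size < 0 or streams < 1:
        raise ValueError("file_size must be >= 0 and streams must be >= 1")
    base, extra = divmod(file_size, streams)
    ranges: List[Tuple[int, int]] = []
    offset = 0
    for i in range(streams):
        # The first `extra` streams each take one additional byte.
        length = base + (1 if i < extra else 0)
        if length == 0:
            continue
        ranges.append((offset, length))
        offset += length
    return ranges


def bdp_buffer_bytes(bandwidth_bits_per_s: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes, rounded up: the amount of data
    that can be in flight on the link at once.
    """
    bits_in_flight_times_1000 = bandwidth_bits_per_s * rtt_ms
    # Divide by 8 (bits to bytes) and by 1000 (ms to s), rounding up.
    return -(-bits_in_flight_times_1000 // 8000)


pieces = split_into_ranges(10, 3)
buffer_size = bdp_buffer_bytes(1_000_000_000, 100)
```

For a 10-byte file over three streams this yields the pieces (0, 4), (4, 3) and (7, 3), and for a 1 Gbit/s link with a 100 ms round-trip time the suggested buffer is 12,500,000 bytes. A single `(offset, length)` pair on its own also describes the kind of partial-file request discussed above.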

References

  1. Allcock, W.; Bresnahan, J.; Kettimuthu, R.; Link, M. (2005). "The Globus Striped GridFTP Framework and Server". ACM/IEEE SC 2005 Conference (SC'05). p. 54. doi:10.1109/SC.2005.72. ISBN 1-59593-061-2. S2CID 1039563.
  2. "Research data management simplified. | globus". www.globus.org. Retrieved 2020-06-09.
  3. Allcock, W. (April 2003). "GridFTP: Protocol Extensions to FTP for the Grid" (PDF).
  4. Mandrichenko, Igor (July 11, 2003). "GridFTP Protocol Improvements" (PDF).
  5. Sarro, Luis Manuel; Eyer, Laurent; O'Mullane, William (2012). Astrostatistics and Data Mining. Dordrecht: Springer. ISBN 978-1-4614-3323-1. OCLC 809767631.