What is FTP?
FTP (“File Transfer Protocol”) is the granddaddy of all modern TCP-based file transfer protocols.
Regardless of your personal feelings or experiences with this venerable and expansive protocol, you must be fluent in FTP to be effective in any modern file transfer situation because all other protocols are compared against it. You should also know what FTP/S is and how it works, as the combination of FTP and SSL/TLS will power many secure file transfers for years to come.
First Generation (1971-1980)
The original specification for FTP (RFC 114) was published in 1971 by Abhay Bhushan of MIT. This standard introduced down many concepts and conventions that survive to this day including:
- ASCII vs. “binary” transfers
- Username authentication (passwords were “elaborate” and “not suggested” at this stage)
- “Retrieve”, “Store”, “Append”, “Delete” and “Rename” commands
- Partial and resumable file transfer
- A protocol “designed to be extendable”
- Two separate channels: one for “control information”, the other for “data”
- Unresolved character translation and blocking factor issues
Second Generation (1980-1997)
The second generation of FTP (RFC 765) was rolled out in 1980 by Jon Postel of ITI. This standard retired RFC 114 and introduced more concepts and conventions that survive to this day, including:
- A formal architecture for separate client/server functions and two separate channels
- A minimum implementation required for all servers
- Site-to-site transfers (a.k.a. FXP)
- Using “TYPE” command arguments like “A” or “E” to ask the server to perform its best character and blocking translation effort
- Password authentication and the convention of client-side masking of typed passwords
- The new “ACCT” command to indicate alternate user context during authentication
- New “LIST”, “NLST” and “CWD” commands that allowed end users to experience a familiar directory-based interface
- New “SITE” and “HELP” commands that allowed end users to ask the server what it was capable of doing and how to access that functionality
- Revised “RETR”, “STOR”, “APPE”, “DELE”, “RNFR/RNTO” and “REST” commands to replace the even shorter commands from the previous version.
- A new “NOOP” command to simply get a response from the remote server.
- Using “I” to indicate binary data (“I” for “Image”)
- Passive (a.k.a. “firewall friendly”) transfer mode
- The 3-digits-followed-by-text command response convention. The meaning of the first digit of the 3-digit command responses was:
- 1xx: Preliminary positive reply; user should expect another Final reply.
- 2xx: Final positive reply: whatever was just tried worked. The user may try another command.
- 3xx: Intermediate positive reply: the last command the user entered in a multi-command chain worked and the user should enter the next expected command in the chain. (Very common during authentication, e.g., “331 User name okay, need password”)
- 4xx: Intermediate positive reply: the last command the user entered in a multi-command chain did not work but the user is encouraged to try it again or start the command sequence over again. (Very common during data transfer blocked by closed ports, e.g., “425 Can’t open data connection”)
- 5xx: Final negative reply: whatever was just tried did not work.
A new, second-generation specification (RFC 959) was rolled out in 1985 by Jon Postel and Joyce Reynolds of ISI. This standard retired RFC 765 but introduced just a handful of new concepts and conventions. The most significant addition was:
- New “CDUP”, “RMD/MKD” and “PWD” commands to further enhance the end user’s remote directory experience
At this point the FTP protocol had been battle-tested on almost every IP-addressable operating system ever written and was ready for the Internet to expand beyond academia. Vendor-specific or non-RFC conventions filled out any niches not covered by RFC 959 but the core protocol remained unchanged for over ten years.
From 1991 through 1994 , FTP arguably wore the crown as the “coolest protocol on campus” as the Gopher system permitted users to find and share files and content over high-speed TCP links that were previously isolated on scattered dial-up systems.
FTP under RFC 959 even survived the invention of port-filtering and proxy-based firewalls. In part, this was due to the widely-implemented PASV command sequence which permitted data connections to become inbound connections. In part this was also because firewall and even router vendors took special care to work around the multi-port, session-aware characteristics of this most popular of protocols, even when implementing NAT.
Third Generation (1997-current)
The third and current generation of FTP was a reaction to two technologies that RFC 959 did not address: SSL/TLS and IPv6. However, RFC 959 was not replaced in this generation: it has instead been augmented by three key RFCs that address these modern technology challenges.
Third Generation (1997-current): FTPS
The first technology RFC 959 did not address was the use of SSL (later TLS) to secure TCP streams. When SSL was combined with FTP, it created the new FTPS protocol. Several years earlier Netscape had created “HTTPS” by marrying SSL to HTTP; the “trailing S” on “FTPS” preserved this convention.
The effort to marry SSL with FTP led to RFC 2228 by Bellcore and a three-way schism in FTPS implementation broke out almost immediately. This schism remains in the industry today but has consolidated down to just two main implementations: “implicit” and “explicit” FTPS. (There were originally two competing implementations of “explicit” FTPS, but one won.)
The difference between the surviving “implicit” and “explicit” FTPS modes is this:
Implicit FTPS requires the FTP control channel to negotiate and complete an SSL/TLS session before any commands are passed. Thus, any FTP commands that pass over this kind of command channel are implicitly protected with SSL/TLS.
Many practitioners regard this as the more secure and reliable version of FTPS because no FTP commands are ever sent in the clear and intervening routers thus have no chance of misinterpreting the cleartext commands used in explicit mode.
Explicit FTPS requires the FTP control channel to connect using a cleartext TCP channel and then optionally encrypts the command channel using explicit FTP commands such as “AUTH”, “PROT” and “PBSZ”.
Explicit FTPS is the “official” version of FTPS because it is the only one specified in RFC 4217 by Paul Ford-Hutchinson of IBM. There is no RFC covering “implicit mode” and there will likely never be an “implicit mode” RFC. However explicit mode’s cleartext commands are prone to munging by intermediate systems or credential leakage through the occasional “USER” command sent over the initial cleartext channel.
Almost all commercial FTP vendors and many viable noncommercial or open source FTP packages have already implemented RFC 2228; most of these same vendors and packages have also implemented RFC 4217.
Third Generation (1997-current): IPv6
The second technology RFC 959 did not address was IPv6. There are many structures in the 1971-1985 RFCs that absolutely require IPv4 addresses. RFC 2428 by (SEVERAL FOLKS) answered this challenge. This specification also took early and awful experiences with FTPS through firewalls and NAT’ed networks into account by introducing the clever little EPSV command .
However, the EPSV command did not, by itself, address the massive problems the file transfer community was having getting legacy clients and servers (i.e., those that didn’t implement RFC 2428) to work through a still immature collection of firewalls in the early 2000s. To address this need, a community convention around server-defined ranges of ports grew up around 2003. With this convention in place, a server administrator could open port 21 and a “small range of data ports” in his firewall to limit the exposure he/she would otherwise have taken opening up all high ports to the FTP server.
Despite the fact that RFC 2428 is now (as of 2011) thirteen years old, support for IPv6 and the EPSV command in modern FTP is still uncommon. The slow adoption of IPv6 and FTP’s identity as a protocol “designed to be extendable” have allowed server administrators and FTP developers to avoid the remaining issue by clinging to IPv4 and implementing firewall and NAT workarounds for over a decade. Nonetheless, support for IPv6 and EPSV remains a formidable defense against the inevitable future; it would be wise to treat file transfer vendors and open source projects who fail to implement or show no interest in RFC 2428 with suspicion.
BEST PRACTICES: A minimally interoperable FTP implementation will completely conform to RFC 959 (FTP’s 1985 revision). All credible file transfer vendors and projects have already implemented RFC 2228 (adding SSL/TLS), and should offer client certificate authentication support as part of their current implementation. Any vendor or project serious about long term support for their FTP software has already implemented RFC 2428 (IPv6 and EPSV) or has it on their development schedule. Forward-looking vendors and projects are already considering various FTP IETF drafts that standardize integrity checks and more directory commands.