SimpleSSH: A Basic Interoperability Profile for the SSH Protocol
================================================================

Peter Gutmann, circa 2009.

Abstract
--------

The widespread adoption of SSH has seen the emergence of numerous SSH implementations, but also numerous interoperability problems among many of the non-mainstream versions. This problem arises because the complexity and in places ambiguity of the specification makes it possible to create specification-compliant but non-interoperable implementations, and is exacerbated by the fact that in many cases where SSH is used, for example for the control interface of an embedded device or a Windows file transfer facility, the developers are required to implement a specification designed to provide a full-blown Unix VPN solution even though in their case they'll never use the majority of its facilities.

This document describes a simplified profile of SSH that provides a standard minimal feature set for use in applications that just require a basic no-frills secure channel from A to B, building on a decade of SSH implementation experience to avoid known problem areas in the SSH protocol. As a side-effect this minimal profile reduces the large attack surface of SSH to a more manageable level by eliminating much of the complexity in the protocol.

Introduction
------------

The adoption of SSH has spawned numerous implementations for a variety of platforms, many of which are some way removed from the original Unix design target. Because of the complexity of the SSH protocol it's become somewhat standard practice to use the most common SSH server (let's call it O) and the most common SSH client (let's call it P) as a form of compliance-test suite. Once a new SSH implementation can successfully connect to O or receive a connection from P, it's declared complete and shipped to users.

Unfortunately when two implementations that aren't O or P meet, all manner of problems can arise.
Since O and P communicate using a stereotyped subset of SSH's capabilities, any deviation from this stereotyped exchange can result in anything from the remote system crashing or hanging through to security breaches via carefully-chosen (but fully specification-compliant) message exchanges. Each section of this document contains a rationale explaining the reasons for any changes made, along with specific examples of how the original version has caused interoperability or security problems.

Part of this problem is due to the fact that SSH started life as a general-purpose Unix VPN solution (whether it was deliberately intended as such or not), with the result that it contains many facilities that only make sense in a Unix environment (ptys, shells, a stderr facility, PAM authentication, and so on). Implementers creating SSH servers or clients for other systems typically resort to going through the motions sufficiently convincingly to satisfy the other side (where the other side is either O or P as appropriate) but no further.

In order to provide for a simplified secure-pipe implementation, SimpleSSH cuts down the number of options and capabilities to the minimum required for this purpose, creating a profile that's tailored for a single channel of either encrypted terminal traffic (for example to remotely administer an embedded network device like a router) or file transfer (for example to securely copy a file from one Windows PC to another). The intent is to increase interoperability among implementations that don't require a full Unix VPN solution by removing much of the protocol's complexity and ambiguity. A side effect of this removal of complexity and locking down of details is that the SSH attack surface is greatly reduced, since only a small number of (hopefully) well-defined operations are now possible, and in particular ones that have proven particularly troublesome in the past are removed and left to implementations of the full SSH protocol.
[NB: The above is intended purely to provide a problem statement, it's not meant as a criticism of any SSH implementation].

Design Goals
------------

The overall design goal for SimpleSSH is to provide a basic profile of the SSH protocol to simplify implementation and increase interoperability among implementations that don't require the full-scale VPN capabilities provided by the SSH protocol, at the same time reducing the attack surface by constraining the protocol flow to a subset of basic, well-defined operations. This overall design goal is achieved through the following sub-goals:

- Ambiguous protocol operations are locked down. For example the SSH specification leaves the order of some messages undefined, leading to interoperability problems among implementations that don't follow certain stereotyped patterns.

- Known problem areas are eliminated if possible. For example rehandshakes have been an ongoing source of trouble for implementations and are only really useful for long-running VPN sessions (if that), so they're left to implementations of the full SSH protocol to support.

- Complex and error-prone aspects of the protocol are reduced to basic operations sufficient for most tasks. For example the SSH authentication process allows for arbitrarily iterated exchanges with partial results triggering further exchanges, when a typical encrypted-telnet session to a router requires no more than a user name and password.

- Encompassing all of the above, a side-effect of the simplification of the protocol is a significant reduction in its attack surface, which is of particular importance in embedded systems which often contain very cut-down implementations due to resource constraints, or Windows implementations that merely contain SSH functionality as a feature-checkbox add-on to an established application rather than being the entire application as O and P are.
Signalling SimpleSSH Use
------------------------

The SSH protocol contains a (currently unused) flags field in the header of the SSH_MSG_KEXINIT message for use with future extensions. This profile uses two bits in the field to indicate an implementation's compliance with SimpleSSH. The two flags are SSH_FLAG_MAY_SIMPLESSH to indicate that the peer may, at their discretion, choose to employ the SimpleSSH profile to communicate, and SSH_FLAG_MUST_SIMPLESSH, to indicate that the peer must use the SimpleSSH profile to communicate:

  Symbolic name              Value
  -------------              ------
  SSH_FLAG_MAY_SIMPLESSH     0x01
  SSH_FLAG_MUST_SIMPLESSH    0x02

If SSH_FLAG_MUST_SIMPLESSH is asserted by either side then only SimpleSSH is permitted. If one side is unwilling or unable to communicate using SimpleSSH then it must discontinue the handshake by disconnecting with an SSH_MSG_DISCONNECT.

If SSH_FLAG_MAY_SIMPLESSH is asserted by both sides then the handshake continues using the SimpleSSH profile. If only one side asserts SSH_FLAG_MAY_SIMPLESSH then the handshake continues with standard SSH.

In the event that an implementation (erroneously) asserts both SSH_FLAG_MAY_SIMPLESSH and SSH_FLAG_MUST_SIMPLESSH, the behaviour for SSH_FLAG_MUST_SIMPLESSH takes precedence.

Rationale
---------

The SimpleSSH flags are intended to be used as a straightforward means for a client or server to announce their intentions (or lack thereof) in regard to using SimpleSSH. If either side definitely must use SimpleSSH (either for implementation or security reasons) then the semantics of SSH_FLAG_MUST_SIMPLESSH ensure that both sides use SimpleSSH, or the handshake negotiations end. If either side can optionally use SimpleSSH and the other side decides not to then the handshake continues with standard SSH semantics. In other words in the very first message exchange the two sides decide whether they wish to continue using the full SSH protocol or the SimpleSSH profile.
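The flag rules above can be sketched as a small decision function. This is only an illustration, not part of the profile: the names Profile, NegotiationError and negotiate_profile are hypothetical, and treating a side that asserts neither flag as "unwilling or unable" to use SimpleSSH is an assumption made for the sake of the example.

```python
from enum import Enum

SSH_FLAG_MAY_SIMPLESSH = 0x01
SSH_FLAG_MUST_SIMPLESSH = 0x02
_ANY_SIMPLESSH = SSH_FLAG_MAY_SIMPLESSH | SSH_FLAG_MUST_SIMPLESSH


class Profile(Enum):
    STANDARD_SSH = "standard SSH"
    SIMPLESSH = "SimpleSSH"


class NegotiationError(Exception):
    """The handshake has to end with SSH_MSG_DISCONNECT."""


def negotiate_profile(client_flags: int, server_flags: int) -> Profile:
    """Decide which profile the handshake continues with."""
    # MUST takes precedence, including when MAY is (erroneously) also set.
    if (client_flags | server_flags) & SSH_FLAG_MUST_SIMPLESSH:
        for flags in (client_flags, server_flags):
            # Assumption: a side asserting neither flag cannot do SimpleSSH.
            if not flags & _ANY_SIMPLESSH:
                raise NegotiationError("SimpleSSH required but not available")
        return Profile.SIMPLESSH
    # Without MUST, SimpleSSH is used only if both sides assert MAY.
    if client_flags & server_flags & SSH_FLAG_MAY_SIMPLESSH:
        return Profile.SIMPLESSH
    return Profile.STANDARD_SSH
```

Keeping the decision in one function that both client and server call with the two flag words means that both ends arrive at the same answer from the first message exchange alone.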
If SimpleSSH is being used for security reasons then there is a potential risk of a rollback attack in which an attacker clears the SSH_FLAG_MUST_SIMPLESSH / SSH_FLAG_MAY_SIMPLESSH flags in the first message to cause a fallback from the restricted SimpleSSH to the full SSH protocol and then employs an attack that takes advantage of the SSH protocol's complexity before the final authentication exchange can detect the change to the first message. If SSH_FLAG_MUST_SIMPLESSH was asserted then this attack isn't possible because any attempt to perform protocol steps outside the SimpleSSH framework should cause the handshake to fail. However if SSH_FLAG_MAY_SIMPLESSH was asserted then this rollback can't be easily detected until the final authentication step. If the motivation for the use of SimpleSSH is primarily security then SSH_FLAG_MUST_SIMPLESSH should be used in preference to SSH_FLAG_MAY_SIMPLESSH in order to force the use of the restricted SimpleSSH profile.

Neither RFC 4250 nor RFC 4253 defines any process for assigning values to the flags field. In the absence of any other information, this document assumes that the process specified in RFC 4250 for SSH assigned numbers will be followed for any future extensions to this field.

SSH Transport
-------------

When performing the protocol version exchange both sides send a single line of version information with no additional preceding lines of data. The version ID is formatted as "SSH-vvv-name-yyy", where 'vvv' is the SSH version (currently '2.0'), 'name' is the SSH implementation name, for example 'FooSSH', and 'yyy' is an optional software version, for example '1.95'. For a hypothetical FooSSH the version string would be "SSH-2.0-FooSSH-1.95", or "SSH-2.0-FooSSH" if the software version is omitted. Ref: TRANS 4.2.

When sending the SSH_MSG_KEXINIT the server always speaks first. The client waits for the server's SSH_MSG_KEXINIT and then responds in kind.
Since the client is responding to the server's options its own SSH_MSG_KEXINIT contains only the single algorithm choice that it prefers for each option, not the full range of available algorithm options. The complex algorithm-matching process outlined in the SSH specification is not required. Ref: TRANS 6.1.

The algorithm pairs specified in the SSH_MSG_KEXINIT (client-to-server and server-to-client) are the same in both directions. In other words if the client-to-server algorithm list is "a,b,c" then the server-to-client list must also be "a,b,c". Ref: TRANS 6.1.

Key exchange guessing is not used, i.e. first_kex_packet_follows in SSH_MSG_KEXINIT is always false.

The only key and signature formats used are "ssh-rsa" and "ssh-dss", not "spki-sign-rsa" or "pgp-sign-dss" or "x509v3-sign-rsa" or ... Ref: TRANS 5.6.

As for SSH_MSG_KEXINIT, the server speaks first for SSH_MSG_NEWKEYS. Ref: TRANS 6.3.

DH_GEX_REQ implements SSH_MSG_KEX_DH_GEX_REQUEST, not SSH_MSG_KEX_DH_GEX_REQUEST_OLD. [NB: Could this be a problem? At one point a number of implementations would fail if sent a SSH_MSG_KEX_DH_GEX_REQUEST rather than a SSH_MSG_KEX_DH_GEX_REQUEST_OLD].

Rehandshake is never performed. Ref: TRANS 8.

Rationale
---------

The SSH specification allows the version string to be preceded by arbitrary amounts of free-form text. This comes as a complete surprise to many implementations, which simply disconnect. The version string itself has no defined format (apart from the "SSH-2.0" at the start) which causes considerable problems for implementations that parse the version string to work around bugs (or at least specification-compliant but unexpected features), with the result that developers resort to complex regular-expression parsing front-ends to sort out the different versions.
This is further complicated by the fact that the lack of a clear format means that some SSH implementations use ID formats that result in them being mis-identified as older (buggy) versions of other implementations, with the inevitable effect on interoperability. Requiring a basic ID in a predictable format should remove these problems. Since the software version reveals details about potentially vulnerable implementations, it's left optional.

The SSH_MSG_KEXINIT is one of the locations where the SSH specification leaves the order of messages undefined so that if two implementations that disagree over the order meet, the result is deadlock. Alternatively, both sides can speak at the same time, with the specification containing an awkward conflict-resolution mechanism that complicates implementation and leads to interoperability problems. This profile simplifies the process by requiring that the server speak first, offering its available algorithms and mechanisms, and the client respond, specifying in its response the one algorithm that it has chosen for each purpose. In this manner the complexity and guesswork of the initial negotiations vanish.

The SSH specification (theoretically) allows for different algorithms to be used in different directions. A straw poll on the SSH list indicated that nothing actually does this; this profile makes the behaviour explicit.

A number of SSH key and signature formats are underspecified or ambiguous, with some actually being dropped as the RFC draft progressed when no-one could figure out what the format required. This profile simplifies implementation requirements by requiring only the two universally-supported formats "ssh-rsa" and "ssh-dss", and by extension any future successors to these well-defined formats, for example one that replaces the SHA-1 used in "ssh-rsa" with SHA-2 while retaining the same general format.
The intent is not to prohibit any future use of new key and signature formats but to select the subset of currently-defined formats that are universally supported and interoperable with other implementations.

SSH_MSG_NEWKEYS is another undefined-order message in the pattern of SSH_MSG_KEXINIT, leading to similar problems.

The SSH rehandshake process is probably the single biggest problem area in the specification, with things becoming so bad at one point that some major implementations would detect when they were talking to a different implementation and disable rehandshake completely in order to ensure interoperability. Many of the non-mainstream versions either don't support rehandshake at all or if sent an SSH_MSG_KEXINIT packet in the middle of an ongoing exchange eventually become confused, leading to communications breaking down. This leads to erratic, hard-to-diagnose errors (the exact details depend on what the SSH layers are doing at the time of the problem), typically a bad-packet error when the other side tries to interpret a connection-layer packet as part of the rehandshake, or when the two sides disagree on when to switch keys and one of the two decrypts with the wrong keys and gets a garbled packet type. To make things even messier, neither side can avoid the problem by ignoring the SSH_MSG_KEXINIT because the lack of SSH_MSG_CHANNEL_WINDOW_ADJUST messages will mess up flow control and lead to deadlock.

Rehandshake is something that's really only necessary for long-running VPN sessions (if that), the sort of thing that's left to the full SSH protocol. Removing it in SimpleSSH acknowledges the fact that many implementations don't support it anyway, and eliminates a significant source of interoperability problems.
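Returning to two of the transport rules given earlier, the fixed version-ID format and the client's single-choice SSH_MSG_KEXINIT response, the following is a minimal sketch of how a client might apply them. The function names are hypothetical, and the assumption that neither the implementation name nor the software version contains a hyphen or whitespace is made purely so that the two fields can be told apart; this profile doesn't state it.

```python
import re
from typing import Optional, Sequence, Tuple

# "SSH-vvv-name" with an optional "-yyy" software version.
# Assumption: 'name' and 'yyy' contain no '-' or whitespace.
_VERSION_ID = re.compile(r"SSH-(\d+\.\d+)-([^-\s]+)(?:-([^-\s]+))?")


def parse_version_id(line: str) -> Tuple[str, str, Optional[str]]:
    """Split a version ID into (ssh_version, name, software_version)."""
    match = _VERSION_ID.fullmatch(line.rstrip("\r\n"))
    if match is None:
        raise ValueError("not a SimpleSSH-style version ID")
    return match.group(1), match.group(2), match.group(3)


def choose_algorithm(server_list: str, client_preferences: Sequence[str]) -> str:
    """Pick the one algorithm name the client sends back for an option.

    server_list is the comma-separated name-list from the server's
    SSH_MSG_KEXINIT; client_preferences is ordered most-preferred first.
    """
    offered = server_list.split(",")
    for name in client_preferences:
        if name in offered:
            return name
    raise ValueError("no algorithm in common with the server")
```

For example, parse_version_id("SSH-2.0-FooSSH-1.95") yields the three fields separately, and choose_algorithm() would be called once per algorithm option with the result placed alone in the client's SSH_MSG_KEXINIT.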
SSH Authentication
------------------

This profile simplifies the SSH authentication process by requiring a straightforward request/response exchange beginning with an optional request for available authentication methods followed by one or more authentication attempts, with retries only being permitted if the authentication method is "password". The exact message flow is:

  Step 0 (optional): Client sends SSH_MSG_USERAUTH_REQUEST with method "none" to query available authentication method types. Server responds with SSH_MSG_USERAUTH_FAILURE listing available methods.

  Step 1: Client sends SSH_MSG_USERAUTH_REQUEST with method "password" or "publickey" and password data or a digital signature as appropriate.

  Step 2, one of:

    a. Server responds with SSH_MSG_USERAUTH_SUCCESS and the authentication exchange terminates.

    b. Server responds to method "password" with SSH_MSG_USERAUTH_FAILURE, the client may retry step 1 if permitted by the server as described in the SSH specification.

    c. Server responds to method "publickey" with SSH_MSG_USERAUTH_FAILURE and the authentication exchange terminates.

The server or client may only send one message at each step in the above exchange, after which they must stop and wait for the other side's response.

Step 0 is optional, a server must be able to process an authentication request with method "password" or "publickey" without first requiring a wakeup call with method "none".

Because the authentication is atomic and cannot be performed in parts the partial_success flag is always false. If the authentication is iterated to allow a user to retype their password then only the authenticator (i.e. the password) but not any other portions of the message such as the user name, service name, or method name, can change.
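The message flow above can be sketched as a small server-side state machine. This is an illustration under stated assumptions rather than a definitive implementation: the class SimpleAuthServer, the exception AuthTerminated and the verify callback are hypothetical names, the default limit of three password attempts is a configuration choice, and ending the exchange on an unsupported method is an assumption.

```python
from typing import Callable, List, Optional, Tuple

METHODS = ["password", "publickey"]
# (message name, available_auth_types, partial_success)
Reply = Tuple[str, List[str], bool]


class AuthTerminated(Exception):
    """The authentication exchange is over; the caller should disconnect."""


class SimpleAuthServer:
    def __init__(self, verify: Callable[[str, str, bytes], bool],
                 max_password_attempts: int = 3) -> None:
        self._verify = verify            # hypothetical (user, method, authenticator) check
        self._max_attempts = max_password_attempts
        self._first = True               # step 0 is only valid as the first request
        self._identity: Optional[Tuple[str, str, str]] = None
        self._attempts = 0
        self._done = False

    def handle(self, user: str, service: str, method: str,
               authenticator: bytes = b"") -> Reply:
        """Process one SSH_MSG_USERAUTH_REQUEST and return the single reply."""
        if self._done:
            raise AuthTerminated("exchange has already terminated")
        first, self._first = self._first, False
        if method == "none":             # step 0: query the available methods
            if not first:
                self._done = True
                raise AuthTerminated("'none' is only valid as the optional step 0")
            return ("SSH_MSG_USERAUTH_FAILURE", list(METHODS), False)
        if method not in METHODS:        # assumption: treated as fatal in this sketch
            self._done = True
            raise AuthTerminated("unsupported method: " + method)
        identity = (user, service, method)
        if self._identity is not None and identity != self._identity:
            self._done = True
            raise AuthTerminated("only the password may change between retries")
        self._identity = identity
        if self._verify(user, method, authenticator):
            self._done = True            # step 2a
            return ("SSH_MSG_USERAUTH_SUCCESS", [], False)
        if method == "password":         # step 2b: retry allowed up to the limit
            self._attempts += 1
            self._done = self._attempts >= self._max_attempts
        else:                            # step 2c: publickey failure ends the exchange
            self._done = True
        return ("SSH_MSG_USERAUTH_FAILURE", [method], False)
```

Because handle() returns exactly one reply per request and partial_success is hard-wired to false, the one-message-per-step and atomic-authentication requirements are enforced by construction rather than by convention.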
The server may only advertise authentication methods that it supports, and the client may only send authentication requests that it knows that the server supports (this may sound redundant but the SSH specification specifically allows the server to request that the client use an authentication method that the server knows it can't support, and vice versa). For any SSH_MSG_USERAUTH_REQUEST other than the optional initial one with method name "none", if the server responds with SSH_MSG_USERAUTH_FAILURE it must return an available_auth_types that matches the one used by the client if the authentication failed, or any available_auth_types containing the available authentication methods if the wrong authentication method was used. In other words if the client supplied the wrong password then the SSH_MSG_USERAUTH_FAILURE available_auth_types will be "password" and if the client used password authentication when public-key authentication was required then the SSH_MSG_USERAUTH_FAILURE available_auth_types will be "publickey". If public-key authentication is used then the client must send a signature packet directly, without any of the additional message exchanges described in the specification. In other words the only public key sub-message permitted is one that corresponds to the (unnamed) flag parameter in the SSH_MSG_USERAUTH_REQUEST message being set to true. In addition to the standard list of available authentication methods the server can also respond with the method "no-auth" to indicate that no authentication is required, for example if the authentication is performed via external means or via the protocol that's being tunnelled over the SSH link. [How can this be handled properly? No-auth currently relies on side-effects of implementations to work, see the rationale]. The SSH specification recommends that clients be given up to 10 minutes and 20 retries to get their password right. 
Although this profile leaves issues such as setting bounds for authentication attempts as a server configuration issue, in the light of DoS and SSH port-scanning attacks a significantly lower timeout and a retry limit set to the de facto industry standard of three attempts is recommended.

Rationale
---------

As currently specified the SSH authentication exchange can be iterated arbitrarily, the authentication is carried out in bits and pieces with malicious clients able to change the details at each iteration, the client is allowed to send requests that it knows the server can't handle (the server is supposed to ignore them and wait for other requests and in compensation is allowed to instruct the client to use authentication methods that the server knows it can't handle (this is explicitly stated in the specification!)), and the client can spray requests at a server without having to wait for responses, with complicated races possible if a request at position n results in a further exchange of messages but gets overtaken by the request already sent at position n+1, and so on. Ref: AUTH 3.1.

From experimentation with sending permitted but slightly unexpected sequences of authentication requests to servers it seems to be mostly coincidence that some servers can handle authentication, since any deviation from the stereotyped pattern set by a few widespread clients, typically P, often results in the server becoming profoundly confused and either disconnecting suddenly, hanging, or (in a few notable cases) allowing access when it shouldn't.

In order to simplify the authentication exchange and reduce its considerable attack surface this profile reduces the arbitrarily complex process to a straightforward request/response exchange with well-defined semantics. The result is a standard password- or public-key based authentication without the large amount of leeway provided by the original specification.
It should be noted that much of the behaviour specified here is already implemented as a de facto standard in many clients, with for example a failed password authentication result prompting the user to retype their password and then sending a new request with the same user name and method but with the new password. The SSH specification defines a somewhat schizophrenic way of indicating authentication failures which requires complex decoding in the client to sort out whether the failure occurred because the wrong method was used or the wrong authenticator was supplied, which in turn leads to confusing error messages being displayed to users. This profile makes it easier for the client implementation to distinguish "wrong password" (or key) from "wrong authentication method used". The no-authentication situation is currently handled in a rather ad-hoc manner with the server returning a somewhat unexpected SSH_MSG_USERAUTH_SUCCESS in response to a query for available authentication methods (although there are creative-interpretation implementations that return an SSH_MSG_USERAUTH_FAILURE with available_auth_types set to an empty string to indicate that no-authentication is allowed to continue because there's no way defined in the specification to distinguish "no authentication is allowed to continue" from "'no-authentication' is allowed to continue"). This results in the protocol stalling at this point if the client doesn't perform the initial method query since there's no other way beyond this side-effect of the method query to indicate that no authentication is required, or at least that authentication is performed via out-of-band means (the server is explicitly prohibited from advertising "none" as a permitted authentication method). This profile makes the no-authentication behaviour explicit by treating it as a standard authentication method type. 
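The simplified failure reporting discussed above (a failed attempt echoes back the method that was used, while a wrong-method attempt lists the methods that are available) makes the client-side decoding almost trivial. The following is a minimal sketch; classify_failure and the two result strings are hypothetical names chosen for the example.

```python
def classify_failure(method_used: str, available_auth_types: str) -> str:
    """Interpret the name-list in an SSH_MSG_USERAUTH_FAILURE.

    available_auth_types is the comma-separated list sent by the server.
    """
    offered = available_auth_types.split(",") if available_auth_types else []
    if offered == [method_used]:
        # The server echoed the client's own method: the password or key
        # was wrong, so a password prompt can simply be repeated.
        return "wrong authenticator"
    # Anything else names the method(s) the client should have used.
    return "wrong method"
```

A client can then map the two results directly onto the two user-visible messages instead of guessing from the full method list.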
The SSH specification allows the public-key authentication process to be broken down into further sub-exchanges in which the client can send queries to the server and perform assorted other operations. As with the overall authentication process, this profile limits the messages exchanged to a basic authentication request followed by a response.

A downside to immediately sending the signature without an additional exchange of messages is that if for some reason the wrong key is used it may cause extra overhead on low-powered devices that could be avoided if further steps in the handshake were used. However this overhead is likely to be much smaller than the overhead of the additional message exchange (even on low-powered ARM CPUs such a signature can be generated in a fraction of a second), and re-adding this protocol complexity defeats the purpose of SimpleSSH being a minimal interoperability profile. In addition it's not clear in which actual real-world scenario (as opposed to hypothetical situation) this would be an issue. Another way of looking at this is that TLS authentication has relied on doing it this way for 15-odd years without any real problems, and TLS implementations, because of how the protocol is used, run on much lower-powered devices than SSH does.

SSH Connection
--------------

The client immediately follows the user authentication with all messages related to a session/channel open. It cannot wait an arbitrary amount of time after the user authentication has completed before proceeding or perform the open in parts, for example by sending the SSH_MSG_CHANNEL_OPEN on connect and the SSH_MSG_CHANNEL_REQUEST that renders it usable at some future date.

The client only ever opens one channel, with the sender and recipient channel having channel number 0. Since there's only one channel, no window adjusts are used and the initial window size is set to the special-case value 0xFFFFFFFF.
All data exchanged is standard channel data SSH_MSG_CHANNEL_DATA, not SSH_MSG_CHANNEL_EXTENDED_DATA. Ref: CONN 5.2.

Rationale
---------

The SSH specification doesn't place any limits on when session-control messages such as channel opens can be sent, and exacerbates the problem by breaking many session control operations into multiple bits and pieces, with implementations free to send the various messages that go into an operation at any time they want. This means that implementations either need to maintain a background thread that awaits the arrival of control data or perform continuous polling to check whether anything new has arrived. This is particularly problematic for SSH libraries and wrapper implementations that provide standard BSD-sockets style semantics because alongside the standard send() and recv() there's now also a requirement for a check_for_and_process_control_message() which can in turn affect the semantics of subsequent send() and recv() operations. In order to remove this problem, this profile requires that session-control messages that initiate sessions or channels be sent as part of the initial SSH handshake and not an arbitrary amount of time later. In this manner the full session setup is processed at the initial handshake stage, leaving the session ready for normal data exchange once the handshake has completed.

Since there's only one channel there's no need for flow control as there aren't multiple channels multiplexed onto the SSH link, and any higher-level flow control is handled by TCP/IP's flow-control mechanisms.

SSH_MSG_CHANNEL_EXTENDED_DATA is a Unix-ism used to handle stderr output; other systems can't do much with this information and at the programmers' whim either drop it or treat it as standard channel data. This profile only allows a single, unambiguous data type in order to eliminate this confusion.
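The single-channel rules from the SSH Connection section (sender channel 0, initial window size 0xFFFFFFFF) translate into a fixed SSH_MSG_CHANNEL_OPEN payload. The sketch below encodes one using the field layout from the standard SSH connection protocol; the helper names are hypothetical, and both the "session" channel type and the 32768-byte maximum packet size are assumptions for the example since this profile doesn't specify them.

```python
import struct

SSH_MSG_CHANNEL_OPEN = 90            # message number from the SSH connection protocol
SIMPLESSH_CHANNEL = 0                # the only channel number ever used
SIMPLESSH_WINDOW = 0xFFFFFFFF        # special-case "no window adjusts" value


def ssh_string(data: bytes) -> bytes:
    """Encode an SSH 'string': uint32 length followed by the bytes."""
    return struct.pack(">I", len(data)) + data


def build_channel_open(max_packet_size: int = 32768) -> bytes:
    """Build the payload of the one SSH_MSG_CHANNEL_OPEN a client sends.

    Assumption: channel type "session" and the default max_packet_size
    are illustrative values, not requirements of this profile.
    """
    return (
        bytes([SSH_MSG_CHANNEL_OPEN])
        + ssh_string(b"session")
        + struct.pack(">III", SIMPLESSH_CHANNEL, SIMPLESSH_WINDOW, max_packet_size)
    )
```

Since every field except the maximum packet size is constant, a receiving implementation can validate the open request by comparing it against the expected bytes rather than maintaining a channel table.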
General
-------

Neither client nor server has the HMAC keysize bug or the DSA signature format bug or the RSA signature padding bug or the ... (this is mostly attack surface reduction since the client or server can now disable special-case handling for all of these bugs if it's talking to a SimpleSSH peer).