Why XML Security is Broken
==========================

Peter Gutmann, pgut001@cs.auckland.ac.nz
October 2004

Introduction
------------

This writeup was motivated by the following exchange on a mailing list:

  >>I have some questions related to XML-Dsig:
  >
  >Argghh!! Run away!

A near-universal reaction.

So why is "Run away!" a near-universal reaction to XML-Dsig (and XML security in general)? Because it doesn't work, that's why. The problem with XML security can be traced back to two fundamental causes:

1. XML is an inherently unstable and therefore unsignable data format. XML-Dsig attempts to fix this via canonicalisation rules, but they don't really work.

2. The use of an "If it isn't XML, it's crap" design approach that led to the rejection of conventional, proven designs in an attempt to prove that XML was more flexible than existing stuff.

These problems are covered in more detail below, along with a simple solution to the problem that's already in use by some XML users.

XML is an inherently unsignable data format
-------------------------------------------

XML signatures are an attempt to hammer signatures onto inherently unsignable data. Even at the most basic syntax level (ignoring for now the equally problematic XML semantic features), you need to handle text canonicalisation for whitespace, line endings, character-set encoding, word wrapping, escape sequences, and so on. Even this relatively straightforward process was so incredibly difficult that the X.509 world abandoned it years ago by mutual unspoken agreement because it was just Too Hard to do. So X.509 spent about the first ten of its twenty-odd years trying to get this right and failed, and yet the X.509 canonicalisation rules are vastly simpler than the XML ones.

Much more worrying though is the fact that at the semantic level XML, like MS Word, consists of highly dynamic content, but about two orders of magnitude more complex than Word.
With XML you have to deal with XSLT (transformations that handle tree construction, format control, pattern selection, and other issues), XPath selection, the fact that the data can be affected (often drastically) by external forces such as style sheets, schemas, and DTDs, XML namespace declarations and namespace attributes, and about a million other things, none of which anyone can quite agree on how to handle, mostly because there is no way to handle them. "Secure XML", the definitive reference on the topic, spends fully half of its 500-odd pages trying to come to grips with XML and its canonicalisation problems, without really ever resolving things. In fact it reads more like a 250-page essay on how not to do things than a solution. The PGP and S/MIME canonicalisation rules in contrast mostly fit into a single sentence: "Grab the input blob and sign it as is".

Because of these problems with XML and XML canonicalisation, you have an inherently unstable medium that you're supposed to base your business transactions on. Imagine how this would end up in court: "Your honour, although the plaintiff claims we signed this, we have 39 differently-canonicalised forms that show we didn't, 18 different namespace types that prove the plaintiff is in fact at fault and not us, 7 applications of DTDs that show beyond a doubt that they owe us the amount they're claiming, and four schemas whose use will clearly show that we have rights to their house and car as well". The plaintiff then gets to explain XML DTDs, and why their particular one should be accepted and the 17 the defendant is presenting shouldn't, to a 60-year-old judge with an arts degree and a jury of people whose VCRs blink 12:00.
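
As a small illustration of the instability described above, the following Python sketch (standard library only) builds two byte strings that an XML parser treats as the same document and shows that their hashes nonetheless differ. The sample documents and the tree_shape() helper are made up for this illustration and don't come from any XML security specification. Real canonicalisation has to cope with far more than the attribute-order, quoting, and empty-element differences shown here.

```python
import hashlib
import xml.etree.ElementTree as ET

# Two serialisations of what an XML processor regards as one document.
# They differ only in attribute order, quote style, whitespace inside a
# start tag, and the form used for the empty element.
doc_a = b'<order id="42" currency="USD"><item qty="1">pencils</item><note/></order>'
doc_b = b"<order currency='USD' id='42' ><item qty='1'>pencils</item><note></note></order>"


def tree_shape(elem):
    """Reduce an element to (tag, sorted attributes, text, children).

    Hypothetical helper for this sketch: just enough structure to compare
    the two parsed documents, nowhere near a real canonical form.
    """
    return (
        elem.tag,
        sorted(elem.attrib.items()),
        elem.text or "",
        [tree_shape(child) for child in elem],
    )


# The parser sees the same tree in both cases...
same_tree = tree_shape(ET.fromstring(doc_a)) == tree_shape(ET.fromstring(doc_b))

# ...but anything that hashes (and therefore signs) the bytes sees two
# unrelated messages.
same_hash = hashlib.sha256(doc_a).digest() == hashlib.sha256(doc_b).digest()

print("same tree:", same_tree)
print("same hash:", same_hash)
```

Any intermediate system that re-serialises the document is free to turn one form into the other, which is why a signature over the raw bytes breaks and why XML-Dsig needs canonicalisation in the first place.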

(For an especially fun abuse of the inability to canonicalise XML, make your product the first one to market in a given area, advertise it widely in the appropriate trade journals as being fully standards-compliant, get the canonicalisation wrong, and threaten all of your competitors with prosecution under the DMCA if they as much as download your software to figure out what on earth you're doing with your XML. With a little effort this can be even more lucrative than a USPTO-assisted patent shakedown).

(Incidentally, both S/MIME and PGP can be coerced into a mode of operation where they also have a subset of the problems of XML-Dsig. If you use a detached signature rather than combining signature and data into a single PGP or S/MIME entity then external applications are free to mangle the data in any way they want when it's in transit, with the result that the signature check fails. Trying to canonicalise the content into an unmangled (or mangle-resistant) form has been an ongoing battle, particularly with PGP signed email using detached signatures, where intermediate MTAs and (more usually) MUAs can do things like stripping trailing whitespace which have no user-visible effect on the data but cause the signature check to fail. The simplest solution to this is "don't do that, then" - send your signed data as a single S/MIME or PGP entity rather than breaking it up into two parts, one of which can be modified in transit).

Welcome to all things XML, where if it's not XML, it's crap
-----------------------------------------------------------

There are a number of well-established existing ways for securing content, the two best-known being S/MIME and PGP. If you go beyond the bit-bagging methods used, PGP and S/MIME are practically identical at the structural level. In fact if it wasn't for the fact that some of the bit-bagging fields were cryptographically secured, you could rewrite PGP into S/MIME and vice versa just by changing the packaging format.
More details on this, and on the following message-format discussion, can be found in "Performance Characteristics of Application-level Security Protocols", available from my home page.

The similarity between the two isn't because they tried to copy each other. In fact precisely the opposite: there was some animosity between the two camps at the time the standards were being created. The reason why they're structurally identical is that there's really only one (sensible) way to encapsulate data cryptographically. For signed data this is:

  signature hash algorithm indicator;
  data;
  signature;

and for encrypted data it's:

  recipient/key-exchange information;
  encrypted data;

This is necessary to allow straightforward one-pass processing. Consider for example signed data arranged as:

  data;
  signature;

In this case it's not known which hash algorithm is required to compute the signature, so it's necessary to process and buffer all of the (arbitrarily large) data to locate the signature, extract the hash algorithm identifier from that, and go back and re-process all of the data. Only then is it possible to actually check the signature. Similarly, the encrypted data has the recipient/key-exchange information at the start so that decryption keys can be set up before processing the encrypted data. Putting it anywhere else produces the same problem as rearranging the signed data fields.

Since there's really only one logical way to do these things, and since there are a large number of well-established, field-proven toolkits out there to do PGP and S/MIME, the obvious approach to XML security would be to define <pgp> and <smime> tags, and break for tea and biscuits. Anyone who needed to secure XML could grab their favourite security toolkit (including all manner of open source/free ones if they felt the need) and be done with it. Unfortunately, this approach was heresy to the XML security folks because, well, PGP and S/MIME aren't XML. So they had to reinvent the wheel in XML.
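
The one-pass argument above can be sketched in a few lines of Python. This is a toy under stated assumptions, not the PGP or S/MIME wire format: the record framing, the function names, and the use of a bare hash digest as a stand-in for a real signature are all invented for the illustration. What it shows is that with the algorithm indicator up front the verifier only ever holds one record, while with the indicator at the end it has to buffer everything.

```python
import hashlib
import hmac
from typing import Iterable, Iterator


def header_first(alg: str, chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Hypothetical layout: algorithm indicator; data...; digest."""
    h = hashlib.new(alg)
    yield alg.encode("ascii")
    for chunk in chunks:
        h.update(chunk)
        yield chunk
    yield h.digest()  # stand-in for a signature over the hash


def trailer_last(alg: str, chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Hypothetical layout: data...; algorithm indicator; digest."""
    h = hashlib.new(alg)
    for chunk in chunks:
        h.update(chunk)
        yield chunk
    yield alg.encode("ascii")
    yield h.digest()


def verify_one_pass(stream: Iterable[bytes]) -> bool:
    """Check a header-first stream while holding a single record."""
    it = iter(stream)
    h = hashlib.new(next(it).decode("ascii"))  # algorithm known before any data
    held = next(it)  # hold back one record: the final one is the digest
    for record in it:
        h.update(held)  # every record before the last is data
        held = record
    return hmac.compare_digest(h.digest(), held)


def verify_buffered(stream: Iterable[bytes]) -> bool:
    """Check a trailer-last stream: nothing can be hashed until the end."""
    *data, alg, digest = list(stream)  # the entire message sits in memory
    h = hashlib.new(alg.decode("ascii"))
    for chunk in data:
        h.update(chunk)
    return hmac.compare_digest(h.digest(), digest)


chunks = [b"a" * 1024, b"b" * 1024, b"c" * 1024]
print(verify_one_pass(header_first("sha256", chunks)))
print(verify_buffered(trailer_last("sha256", chunks)))
```

With three 1 kB chunks the difference is invisible, but verify_buffered() needs memory proportional to the message size, whereas verify_one_pass() would work unchanged on a stream of any length.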
This led to a second problem: since there's only one logical way to structure secured data, it'd be obvious to anyone that all they'd done was reinvent the wheel in XML. To avoid this problem as well, they reinvented the wheel in XML, but made it square to avoid accusations that they'd just reinvented the wheel. So with XML security it is indeed possible to do things like:

  data;
  signature hash algorithm indicator;
  signature;

and:

  encrypted data;
  recipient/key-exchange information;

and all manner of other horrible things.

Consider a case where 'data' is a 4GB message being streamed through a system. Without the one-pass processing capability, you have to buffer the entire message somewhere until you get to the trailer which tells you what to do with it (since XML applications tend to consume CPU and memory like Homer Simpson consumes doughnuts, this frequently isn't noticed beyond the general complaints that XML is very slow to work with). Even worse, if you need to process these messages on devices without the storage to buffer the entire message in memory, there's no way to do it. A real-world example of this is medical equipment that secures/checks large medical images on the fly as they're streamed over a hospital network.

In addition to the processing problem, XML security gives you the flexibility to shoot yourself in the foot in a dozen different ways without even knowing it. For example there are applications that sign the document header (rather than the document itself), because XML gives you the flexibility to do that. There's at least one application that signs an empty string, because XML gives you the flexibility to do that. I don't even want to count the number of homebrew (and broken) key exchange mechanisms I've seen where messages contain embedded keys before or after the secured payload, because it's so much more convenient to do it that way. The PGP and S/MIME approach is "Take a blob, sign it/encrypt it".
The XML security approach is to hand the user a large pile of toothpicks and a tube of glue and hope they'll get it right, while loudly proclaiming how much more flexible and powerful XML is than other approaches. The crypto operations are performed and the signatures verify, but nothing's actually being secured. Brad Hill (see the reference further down) has a great example of this where he takes a signed XML purchase order and, using XML tricks, swaps a $1.50 box of pencils for a $2,500 laptop without invalidating the signature, something that the sign-everything-as-a-blob approach of S/MIME and PGP would never allow.

What's even worse is that the XML-ueber-alles approach makes it impossible to separate the security component from XML. That is, in order to implement a security toolkit for XML, you need to implement a complete XML processing system. This is akin to requiring anyone creating a (non-XML) security toolkit to implement a complete MTA/MUA and web server capable of handling SMTP, MIME, HTTP, and HTML, as part of the toolkit. There are reasons why no standard security toolkit does things this way. These are the same reasons why no standard security toolkit can support XML security, requiring expensive, usually proprietary XML security solutions that force users into whatever XML-processing system the toolkit vendor has chosen.

As an example of this inflexibility, if you want to use a standard security toolkit (and I'll use my own cryptlib as an example because I'm most familiar with that, insert your favourite alternative here), you can use it as a static library, a shared library, a Windows DLL, a COM object, from scripting languages like Python, to implement a web server, a raw SSL or SSH tunnel, S/MIME, raw encrypted data, encrypted files via uucp or FTP, and so on ad infinitum.
In contrast, with XML, if you don't like the fact that the toolkit vendor has chosen to use (say) the SAX way of looking at the world when you're working with Xerces or DOM XML or LibXML or XML .NET or AElfred or Electric XML or Xparse or MSXML..., well... tough. It's impossible to create something that's simply a security component that you can plug in wherever you need it, because XML security is inseparable from the underlying XML processing system. This breaks the basic principle of modularity, and ensures that XML security toolkits will be created either by XML vendors with little knowledge of security or security vendors with little knowledge of XML, a recipe for disaster.

In contrast a non-XML security toolkit can be plugged in wherever you need it to do whatever you want to do with it. This is why a large number of protocols that need security simply defer to an existing mechanism/toolkit: SCEP uses S/MIME, SFTP uses SSH, SIP again uses S/MIME, and so on. Since the application isn't tied to the underlying security mechanism, it's possible to use any standard security toolkit from any established security vendor to do the job.

(There exists a subset of XML folks who appear to agree with this view. For example RFC 3923, "End-to-End Signing and Object Encryption for the Extensible Messaging and Presence Protocol (XMPP)", uses S/MIME rather than XML security to provide its security).

The solution
------------

The solution to the problem is to do what was rejected by the XML security folks at the beginning: define <pgp> and <smime> tags, tell anyone who wants to do XML security to grab any existing, well-established, field-proven security toolkit, and leave it at that. This avoids all of the problems mentioned above, at the (rather slight) cost of having to admit that XML may not actually be the solution to all the world's problems.
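
A minimal sketch of the wrapper-tag idea, in Python with only the standard library. Everything specific here is an assumption made for illustration: the signed-blob, data, and sig element names are hypothetical, and the HMAC is only a stand-in for the call into a real PGP or S/MIME toolkit (it is a shared-key MAC, not a public-key signature). The point is that the signed bytes travel as an opaque base64 blob, so the signing code never needs to parse, canonicalise, or otherwise understand XML.

```python
import base64
import hashlib
import hmac
import xml.etree.ElementTree as ET


def sign_blob(blob: bytes, key: bytes) -> bytes:
    """Stand-in for a real toolkit call; NOT a public-key signature."""
    return hmac.new(key, blob, hashlib.sha256).digest()


def wrap(blob: bytes, key: bytes) -> bytes:
    """Put an opaque blob and its signature inside hypothetical wrapper tags."""
    data = base64.b64encode(blob)
    sig = base64.b64encode(sign_blob(blob, key))
    return (b"<signed-blob><data>" + data + b"</data>"
            b"<sig>" + sig + b"</sig></signed-blob>")


def unwrap(wrapped: bytes, key: bytes) -> bytes:
    """Return the original blob, or raise ValueError if the check fails."""
    root = ET.fromstring(wrapped)
    blob = base64.b64decode(root.findtext("data"))
    sig = base64.b64decode(root.findtext("sig"))
    if not hmac.compare_digest(sig, sign_blob(blob, key)):
        raise ValueError("signature check failed")
    return blob


order = b'<order><item price="1.50">pencils</item></order>'
wrapped = wrap(order, b"demo key")

# Pushing the wrapper through an XML parser and serialiser doesn't matter,
# because the signed bytes are the decoded blob, not the XML around it.
reserialised = ET.tostring(ET.fromstring(wrapped))
print(unwrap(reserialised, b"demo key") == order)
```

Swapping sign_blob() for a call into an existing toolkit changes nothing else in the sketch, which is the modularity argument made earlier: the XML layer only carries the blob, and the security layer only sees bytes.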
This was exactly the approach taken by the Jabber folks in the Jabber security mechanisms "End-to-End Signing and Object Encryption for the Extensible Messaging and Presence Protocol (XMPP)", RFC 3923. This mechanism relies on MIME body-parts to handle the signing and encryption, which has the advantage that it's easily implementable using existing off-the-shelf software, and works with anything that talks MIME.

The Oasis folks have come to a similar conclusion, using a basic "sign-the-blob" approach in their SAML signing. The reference for this is still subject to change, but a Google search for the title "SAMLv2.0 HTTP POST 'SimpleSign' Binding" should find the current version of the document.

A slightly different approach has been proposed by Johannes Ernst in his XML-RSig design, where RSig stands for Really simple Signatures (a bit of a tautology really, since *anything* is simpler than XML-DSig). You can read about it at http://netmesh.info/jernst/Technical/really-simple-xml-signatures.html. In XML-RSig you pick a node to sign, take everything from the first character of the start tag to the last character of the end tag, and sign this as a blob using your favourite technique (OpenPGP, S/MIME, whatever). Finally, you insert a new node as a child of the node whose signature it is. The same thing applies for encryption. Simple, easy to implement, and exactly what XML-DSig should have been in the first place.

Other comments on this issue
----------------------------

James Clark has some interesting thoughts on the same thing from an XML-centric point of view (rather than the security-centric one presented here) at http://blog.jclark.com/2007/10/bytes-not-infosets.html.

The W3C is also aware of some of these issues and is working to address them, although the approach seems to be to apply a series of patches rather than a reconsideration of the overall approach: http://www.w3.org/2007/xmlsec/ws/report.

Brad Hill from iSEC Partners has done a lot of work in the area of XML (in-)security and found all sorts of problems. You can get slides and associated writeups for some of his talks at http://www.isecpartners.com/files/iSEC_HILL_AttackingXMLSecurity_bh07.pdf, http://www.isecpartners.com/files/XMLDSIG_Command_Injection.pdf, and http://www.isecpartners.com/files/iSEC_HILL_AttackingXMLSecurity_Handout.pdf.

A footnote from a non-XML security toolkit author
-------------------------------------------------

I'm the author of a security toolkit (cryptlib, mentioned above) that implements pretty much every Internet security protocol there is except XML security. I've tried to support XML security, I really have, but after repeated attempts to figure out how to do this I just can't do it without incorporating a complete XML processing interface into cryptlib. I can do PGP standalone, I can do S/MIME standalone, I can do SSH standalone, I can do SSL/TLS standalone, I can do <whatever> standalone, but there's simply no way to support XML security in a general-purpose toolkit. Even if there was, as a security person I don't know whether I could ship a toolkit that would allow developers to shoot themselves in the foot a dozen different ways while thinking that they're securing their data.

(If anyone has further comments or other war stories that I can add here (there were some that I couldn't add because they would have identified the original source), please get in touch).