Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More secure parsing #17

Closed
adriaanm opened this issue Nov 20, 2013 · 22 comments
Closed

More secure parsing #17

adriaanm opened this issue Nov 20, 2013 · 22 comments
Milestone

Comments

@adriaanm
Copy link
Contributor

@jroper says to add the following to XMLLoader.parser:

See http://blog.csnc.ch/2012/08/secure-xml-parser-configuration/

try { 
  f.setFeature("http://xml.org/sax/features/external-general-entities", false);
  f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
} catch {
  case e: ParserConfigurationException => // warn that the SAXParserFactory supplied by the JDK doesn't support this feature, and that the application may therefore be vulnerable to external entity attacks, encourage to define your own parser instead
  case e: SAXNotRecognizedExcetpion => // as above
  case e: SaxNotSupportedException => // as above
}
@milessabin
Copy link

This IETF BCP is worth a read: https://www.ietf.org/rfc/rfc3470.txt

@retronym
Copy link
Member

Thanks, @milessabin!

Quoting the juicy parts:

XML mechanisms that follow external references (Section 4.14) may
also expose an implementation to various threats by causing the
implementation to access external resources automatically. It is
important to disallow arbitrary access to such external references
within XML data from untrusted sources. Many XML grammars define
constructs using URIs for external references; in such cases, the
same precautions must be taken.

.

Acknowledgements
The authors would like to thank the following people who have
provided significant contributions to the development of this
document:

Mark Baker, Tim Berners-Lee, Tim Bray, James Clark, ... , Miles Sabin ...

A blast from the past :)

@jroper
Copy link

jroper commented Mar 23, 2015

Something not mentioned in the RFC is the memory issues that doctypes introduce, including the billion laughs vulnerability (recursive entity references leading to exponential memory usage) and the quadratic blowup vulnerability (many references to a single large entity leading to quadratic memory usage). Both of these vulnerabilities allow an attacker, with a small payload (as small as 100B for billion laughs, as small as 200KB for quadratic blowup) to cause a JVM to OOME. While the JDKs XML parser does have protection against billion laughs (recursion limits, but off by default), it doesn't have any protection against quadratic blowup. So the only safe way to handle untrusted XML on the JVM is to disallow doctypes altogether. It would be nice if the JDK XML parsers allowed you to simple ignore doctypes, but unfortunately, they don't, either it accepts the doctype, or fails if the doctype is present.

Disallowing doctypes is likely to cause some problems for users - many systems sending XML will automatically add a doctype, and most users will be frustrated that Scala doesn't accept this. So, we need to consider that if we decide to introduce this as a default. On Play, we do have users every so often report this issue, but we find that after explaining to them that if doctypes were allowed, the attacker could take down their webapp with a single request, they're happy to change the sending system to not include a doctype.

@milessabin
Copy link

Indeed ... I think Billion Laughs came a little later.

@dnvriend
Copy link

Last year I got a warning from a friendly hacker that warned me about this issue in JBoss Fuse 6.0.0 that we used then. We also use spray and accept XML. Other components also use the SecureXML. The xxeFilter is also used by other routes. The code I created then:

object SecureXml {
  def loadString(xml: String): NodeSeq = {
      val spf = SAXParserFactory.newInstance()
      spf.setFeature("http://xml.org/sax/features/external-general-entities", false)
      spf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)
      val saxParser = spf.newSAXParser()
      XML.withSAXParser(saxParser).loadString(xml)
  }
}

The 'xxeFilter' is:

trait XxeFilter {
  def log: LoggingAdapter

  implicit def executionContext: ExecutionContext

  def xxeFilter(xxe: String): Future[String] = {
    log.info("Validating xml: [{}]", xxe)
    Future(xmlMustStartWith(xmlMustNotContain(xxe)))
  }

  def throwError(xxe: String, message: String): String = {
    log.error("{}:\n [{}]", message, xxe)
    throw new Error(message)
  }

  private def xmlMustNotContain(xxe: String): String = xxe match {
    case _ if xxe.toUpperCase.contains("<!DOCTYPE") => throwError(xxe, "XML contains DOCTYPE declaration")
    case _ if xxe.toUpperCase.contains("<!ENTITY") => throwError(xxe, "XML contains ENTITY declaration")
    case _ if xxe.toUpperCase.contains("<!ELEMENT") => throwError(xxe, "XML contains ELEMENT declaration")
    case _ if xxe.toUpperCase.contains("SYSTEM") => throwError(xxe, "XML contains SYSTEM declaration")
    case _ => xxe
  }

  private def checkForXXE(xxe: String): String = {
    SecureXml.loadString(xxe).toString()
    xxe
  }

  private def xmlDataMustStartWith(xxe: String): String = xxe match {
    case _ if xxe.startsWith("//removed - non open source//") && xxe.contains("""xmlns="//removed non open-source"""") => checkForXXE(xxe)
    // snip
    case _ => throwError(xxe, "received XML does not start with xml declaration or //snip // element or namespace is not correct.")
  }

  private def xmlMustStartWith(xxe: String): String = xxe match {
    case _ if xxe.startsWith("<?xml") => xmlDataMustStartWith(xxe.substring(xxe.indexOf("?>") + 2, xxe.size).trim)
    case _ => xmlDataMustStartWith(xxe)
  }
}

BasicRoute:

trait BasicRoute extends Directives {
  implicit def executionContext: ExecutionContext
  implicit def timeout: Timeout
  implicit def system: ActorSystem

  def badRequestXml = "//snip//"

  def internalServiceErrorXml(description: String, t: Option[Throwable]) = "//snip//"

  def validationError(code: String, description: String) = "//snip//"

  def completeWithBadRequest = complete(BadRequest, badRequestXml.toString())

  def completeWithError(throwable: Throwable) = complete(InternalServerError, internalServiceErrorXml(s"An error occurred: ${throwable.getMessage}", Option(throwable)))

  def completeWithError = complete(InternalServerError, internalServiceErrorXml(s"An error occurred", None))

  def completeWithValidationError(code: String, description: String) = complete(BadRequest, validationError(code, description))
}

use in route:

trait ARoute extends BasicRoute with XxeFilter {

  lazy val ARoute: Route =
    path("//snip//") {
      post {
        entity(as[String]) { xxe =>
          onComplete(xxeFilter(xxe)) {
            case Success(xml) =>
                camel(xml).map { _ =>
                  complete(StatusCodes.NoContent)
                }.recover { case t: Throwable =>
                  completeWithError(t)
                }.getOrElse {
                  complete(StatusCodes.InternalServerError)
                }
            case Failure(ex) => completeWithBadRequest
          }
        }
      }
    }

  def camel(xml: String): Try[Unit]
}

@adriaanm
Copy link
Contributor Author

Thanks, @dnvriend! Maybe we could include this so that others can just call 'xml.beSafe'?

@dnvriend
Copy link

Or just be safe... implicitly :)

@jawshooah
Copy link

Any plans to implement this? I'd love it if scala-xml had XXE prevention bundled in.

@biswanaths
Copy link
Contributor

Let me have a look at this.

@EdgeCaseBerg
Copy link
Contributor

curious, has any work been done on this?

@biswanaths
Copy link
Contributor

https://github.com/scala/scala-xml/blob/master/src/main/scala/scala/xml/factory/XMLLoader.scala#L28

Changing the parser features as suggest while creating the parser should be simple enough.

What should be the route to upgrade ? We can change the code but not the data. Anybody who upgrades to latest version there is a possibility that some data path might fail.

  • Option A:
    Provide a non-default secure parser. With warning generated, for using the default non-safe parser.
  • Option B:
    Make default parser to safe and allow a non-default, unsafe parser if anybody wants to fall back on.

Please let me know, how we should proceed with this.

@EdgeCaseBerg
Copy link
Contributor

If Option A: Then people who upgrade their code will see the warnings right away and know they should put some time into implementing the necessary changes.

If Option B: Then people who upgrade their code will be happily unaware of issues with any of their other software that might be at risk if using non-updated versions.

Option A seems to increase awareness of these type of issues in general which I'd count as a +1 to that approach, but I can see why many would like to have safety by default and would advocate for that change instead.

@v6ak
Copy link

v6ak commented Mar 15, 2016

I am against option B for some other major reason: It might increase fear of security updates (as it might break things), which is bad for security. What is worse, everything will compile after the change, unit tests will likely also pass, but the app still might break. (I know, integration tests should theoretically catch it, but still… Even in the case that integration tests catch the issue, it does not encourage that updates are painless.)

Maybe the only advantage of Option B is forcing old code to secure mode. Maybe this goal might be achieved with some tradeoff. For example, we would have three objects:

  • scala.xml.SecureXml – newly added object with secure behavior
  • scala.xml.InsecureXml – newly added object with compatible (potentially insecure) behavior
  • scala.xml.Xml – old deprecated object; By default, it behaves as scala.xml.InsecureXml for compatibility reasons. However, one might globally change the behavior to scala.xml.SecureXml by switching some system property. While I am generally not an advocate of global config, I consider this to be the least evil there.

We might call this tradeoff as Option C.

Finally, there is an Option D. It is like Option C, but with scala.xml.Xml object completely removed. As a result, all programmers would be forced to rewrite the code using the legacy scala.xml.Xml. Or maybe all the methods of scala.xml.Xml object would be marked as synthetic and start throwing an exception.

Scenarios of upgrade:

  1. Programmer upgrades to a new version of the library. A: Programmer is warned and will hopefully take the action. B: Code will compile without warning, but might silently break. C: Programmer is warned and will hopefully take the action. D: Programmer is forced to take the action.
  2. Programmer upgrades another library that evicts the old version. The code is, however, compiled against the old version, maybe because it is part of a 3rd party library. A: Programmer is not warned. B: Code will compile without warning, but might silently break. C: Programmer is not warned, the code will not break (but might be insecure). D: The code will suddenly start throwing weird errors at runtime. (Grrrr!)
  3. Someone wants to force the secure behavior for all the application (with taking the risks of potential incompatibility). I suppose that there is the new XML library version in the project. A: No way. B: Automatically done. C: Can be done by setting a global system property. D: Forced manual action.

The Option A covers only one of these three scenarios.

The Option B covers all these three scenarios. The cost is, however, potential silent BC breaks and discouraging users from security updates.

Option C covers scenarios 1 and 3 and performs definitely better than Option A. While Option C seems to fail at the scenario 2, it is not as bad as it might sound. If the programmer also uses the XML library directly, the scenario 3 might be triggered.

The Option D might look like the safest option, because it forces taking some action, but it is not so simple. At the scenario 3, the programmer can't simply evict the old version of scala-xml. All the relevant libraries are required to be upgraded before, which can actually delay the fix. In such case, the Option D performs significantly worse than others.

In short term: I am in favour of Option C. The Option A looks the second best for me. Options B and D look wrong for me in the short term.

In long term: After the old object or methods being deprecated for some period of time, I'd use the Option D and remove the deprecated scala.xml.Xml object.

@biswanaths
Copy link
Contributor

Anybody else has any opinion please let us know. I think we should try to get a satisfactory solution to this.

@adrianbn
Copy link

Just curious since this issue seems old and still open: what is the recommended way to disable insecure features with scala-xml? Maybe override XMLLoader.parser with an implementation that includes the following?

spf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
spf.setFeature("http://xml.org/sax/features/external-parameter-entities", false)
spf.setFeature("http://xml.org/sax/features/external-general-entities", false)
spf.setXIncludeAware(false)
spf.setExpandEntityReferences(false)

Thank you!

@tyohDeveloper
Copy link

The name of the "secure" option should be something like "no XSS, mostly secure*." We don't want to imply security that we can't implement. Hackers will holes in every implementation. Even if there isn't a hole in the library, some naïve developers will use unsafe practices. We shouldn't imply more than what can be delivered.

  • I have no ide what it should be called. LimitedXssSchema seems too bulky. KindaSecure and KnownSecured can be interpreted to be the opposite of what is intended. MoreSecure? I think the more secure version should be the default (option 2).

Good programming practices always put parsing in its own thread. Futures make that somewhat easy. Unsafe usages or unintended Xss should throw run time exceptions that will kill the process. Only that thread will die, the main thread won't be affected. It can help control DSS attacks as well.

A number of strategies need to be employed to control resource usage, naturally. If possible, well known public key, elliptical (or other) methods that are somewhat difficult to crack.

AES 256 or a version that doesn't have well known bugs. The usual, but not always available methods. All of which take more runtime resources. So some resource control options should be available.

Most of this would require a lot of work to implement. Perhaps even a complete rewrite. Hooks to allow the consumer to add those features might be possible with considerably less work. A library of monads could be made available incrementally. That would help with implementation time. The community could add those over time. It would require extensive testing to ensure they don't actually weaken security. The NSA has injected weaknesses in the Linux (and others) distributions a number of times. The known ones have been caught and addressed. Monads would make them far less obvious and harder to spot.

Most of the committers in the scala know to each other. The use base is small enough to be ignored. As a paranoid system architect, I have high standards. They are probably unreasonable.

LessSecure, XML, MoreSecure?

@v6ak
Copy link

v6ak commented Jun 9, 2019 via email

@tyohDeveloper
Copy link

tyohDeveloper commented Jun 10, 2019 via email

@v6ak
Copy link

v6ak commented Jun 10, 2019 via email

@SethTisue
Copy link
Member

"safer"?

@tyohDeveloper
Copy link

tyohDeveloper commented Jun 10, 2019 via email

@SethTisue
Copy link
Member

fixed by #177

SethTisue added a commit to SethTisue/scala-xml that referenced this issue Jan 26, 2023
since scala#17 is considered fixed, we have more secure defaults now
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests