Issue when XML contains Processing Instruction, but no Declaration #292

Open
ashwinss3 opened this issue Apr 11, 2019 · 10 comments

Comments

@ashwinss3

When an XML file starts like this:

<?xml-stylesheet type='text/xsl' href='https://zzz.xsl' alternate='no' title='Default'?>
<rootTag>
      <tag1>
      <tag2>

Since there is no XML declaration, it gives these warnings:

:1:6: whitespace expected
stylesheet type='text/xsl' href='https://zzz.xsl' alternate='no' title='Default'?>
     ^
:1:6: '?' expected instead of '-'
<rootTag xmlns="something" xmlns:xsi="http://www.w3.org/something">
     ^
:1:6: '>' expected instead of '-'
	<tag1 code="QQ" />
     ^

It then continues processing as if tag2 were the first tag. It does not raise any error, but it parses the file incorrectly. If an XML declaration is added, it works fine:

<?xml version = "1.0" ?>
<?xml-stylesheet type='text/xsl' href='https://zzz.xsl' alternate='no' title='Default'?>
<rootTag>
      <tag1>
      <tag2>

If the missing XML declaration is the problem, shouldn't it raise an error?

@ashawley
Member

Are you talking about the ConstructingParser? I'm guessing since you didn't include any Scala code.

Also, what version of scala-xml are you using?

@ashwinss3
Author

Yes, I'm using scala.xml.pull.XMLEventReader, with scala-xml version 1.1.0.
I have also tried the latest version, and the issue is still not resolved.

@ashwinss3
Author

Here is an overview of my code:

import com.google.cloud.storage.StorageOptions
import scala.io.Source
import scala.xml.pull._ // deprecated in scala-xml 1.1.1, see below

val storage = StorageOptions.getDefaultInstance.getService
val file = storage.get("bucket_name", "filepath").getContent()

val myData = Source.fromBytes(file)
val xml = new XMLEventReader(myData)

// Stack of currently open element labels, innermost first
var currNode: List[String] = List()

while (xml.hasNext) {
  xml.next match {
    case EvElemStart(_, label, _, _) =>
      println("Start : " + label)
      currNode = label :: currNode

    case EvElemEnd(_, label) =>
      println("End : " + label)
      currNode = currNode.tail

    case EvText(text) =>
      val value = text.trim
      if (value.nonEmpty) println(text)

    case EvEntityRef(x) =>
      println("Entity:" + x)

    case _ => () // ignore comments, processing instructions, etc.
  }
}

I'm reading the file from GCP.

@SethTisue
Member

SethTisue commented Apr 11, 2019

Isn't scala.xml.pull deprecated in its entirety, as per #193? As a result, we closed all the bug tickets on it. See my comments there about its overall low quality. There is also discussion on that ticket about available alternatives.

@ashawley
Member

Yes, that's true about XMLEventReader. Thanks for pointing that out.

It should also be pointed out that the deprecation warning was added in 1.1.1, released in September 2018, so the original poster wouldn't have seen it with version 1.1.0.

@ashwinss3
Author

ashwinss3 commented Apr 12, 2019

Could you give me some suggestions regarding this? Moving away from XMLEventReader would be difficult, as I have built a fairly large codebase around it. Anything in Scala that works in a similar way would be great.

@SethTisue
Member

SethTisue commented Apr 12, 2019

As an aside, I see that the declaration is merely recommended in XML 1.0 but required in XML 1.1 (according to https://stackoverflow.com/a/7007781/86485).

Is there no possibility of altering the files themselves to add the declaration?

If that isn't feasible, then I expect you would need to construct a scala.io.Source that begins with the needed declaration, followed by the contents of the file.

Looking at the constructors listed at https://www.scala-lang.org/api/2.12.8/scala/io/Source$.html, I think fromIterable could be a good solution path?

scala> new Iterable[Char] { override def iterator = "foo\n".iterator ++ io.Source.fromFile("/etc/passwd") }
res11: Iterable[Char] =
Iterable(f, o, o,
, #, #,
, #,  , U, s, e, r,  , D, a, t, a, b, a, s, e,
, #,  ,
...
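Wrapping the REPL idea above in a small reusable helper could look like the sketch below. It assumes Scala 2.12 or 2.13; the object name `WithDeclaration` is hypothetical and not part of scala-xml or the standard library. The `body` parameter is by-name so that each call to `iterator` (for example, when the Source is reset) opens a fresh underlying Source.

```scala
import scala.io.Source

// Hypothetical helper: yields a minimal XML declaration, then the original characters.
object WithDeclaration {
  val Declaration: String = "<?xml version=\"1.0\"?>\n"

  def apply(body: => Source): Source =
    Source.fromIterable(new Iterable[Char] {
      // A new iterator re-evaluates `body`, so a reset does not reuse a consumed Source
      override def iterator: Iterator[Char] = Declaration.iterator ++ body
    })
}
```

Something like `new XMLEventReader(WithDeclaration(Source.fromBytes(file)))` would then hand the declaration to the parser first. This prepends unconditionally, so if some files already start with a declaration, check for it before wrapping: a second `<?xml ...?>` after the first is not well-formed.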

@SethTisue
Member

Another possibility would be to paste the XMLEventReader source code into your project in its entirety, change the package name, and then tweak it as needed.

What's wrong with using https://docs.oracle.com/javase/6/docs/api/javax/xml/stream/XMLEventReader.html instead, is it really so different that your code would have to be completely reworked? (Genuine question, I don't know the answer.)
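For comparison, here is a rough sketch of how the event loop from earlier in this thread might be ported to the JDK's StAX `javax.xml.stream.XMLEventReader`, which ships with the JDK and needs no extra dependency. The object name `StaxWalk` and the choice to collect output lines into a list instead of printing them are assumptions made for illustration. In XML 1.0 a processing instruction may appear in the prolog without a declaration, so a conforming parser should accept a file shaped like the one in the original report; the test input mirrors that shape.

```scala
import java.io.ByteArrayInputStream
import javax.xml.stream.XMLInputFactory
import javax.xml.stream.events.ProcessingInstruction

// Hypothetical port of the scala.xml.pull loop to JDK StAX.
object StaxWalk {
  /** Walks the document and returns the lines the original loop would print. */
  def walk(bytes: Array[Byte]): List[String] = {
    val factory = XMLInputFactory.newInstance()
    // Merge adjacent text chunks into a single Characters event
    factory.setProperty(XMLInputFactory.IS_COALESCING, java.lang.Boolean.TRUE)
    val reader = factory.createXMLEventReader(new ByteArrayInputStream(bytes))

    val out = List.newBuilder[String]
    var currNode: List[String] = Nil // open element labels, innermost first
    try {
      while (reader.hasNext) {
        val ev = reader.nextEvent()
        if (ev.isStartElement) {
          val label = ev.asStartElement.getName.getLocalPart
          out += "Start : " + label
          currNode = label :: currNode
        } else if (ev.isEndElement) {
          val label = ev.asEndElement.getName.getLocalPart
          out += "End : " + label
          currNode = currNode.tail
        } else if (ev.isCharacters) {
          // Whitespace-only text (including prolog whitespace) is skipped
          val value = ev.asCharacters.getData.trim
          if (value.nonEmpty) out += value
        } else if (ev.isProcessingInstruction) {
          out += "PI : " + ev.asInstanceOf[ProcessingInstruction].getTarget
        }
        // StartDocument, EndDocument, comments, etc. are ignored
      }
    } finally reader.close()
    out.result()
  }
}
```

StAX replaces entity references with their text by default, so the `EvEntityRef` case has no direct counterpart here. Unlike the original loop, this sketch records trimmed text.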

@ashwinss3
Author

Thank you for the suggestions.
https://docs.oracle.com/javase/6/docs/api/javax/xml/stream/XMLEventReader.html looks good, but I will have to make some changes.
For the time being, I altered the code as follows:

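import com.google.cloud.storage.StorageOptions
import scala.io.Source
import scala.xml.pull.XMLEventReader

val storage = StorageOptions.getDefaultInstance.getService
var arrayBytes = storage.get("bucket_name", "filepath").getContent()

// Peek at the first 20 bytes to see whether the file starts with an XML declaration.
// "<?xml-stylesheet" does not match "<?xml ", so it counts as "no declaration".
val string = new String(arrayBytes.take(20))

if (!string.startsWith("<?xml ")) {
  arrayBytes = "<?xml version = \"1.0\" ?>\n".getBytes() ++ arrayBytes
}

val myData = Source.fromBytes(arrayBytes)

val xml = new XMLEventReader(myData)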

As I'm reading the file from GCP, I get it as an Array[Byte]. The file can also be very large at times, so I didn't want to load the entire file as a String.
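A variation on this workaround, sketched below, checks the prefix at the byte level and streams the declaration in front of the existing array with `java.io.SequenceInputStream` instead of building a new array with `++`. Since `getContent()` already returns the whole file as an Array[Byte], this only avoids the extra copy, not the in-memory array itself. The object and method names (`DeclarationPrefix`, `hasDeclaration`, `withDeclaration`) are hypothetical. Unlike `startsWith("<?xml ")`, the check accepts any XML whitespace after `<?xml`, so a declaration such as `<?xml\nversion="1.0"?>` is not duplicated; a leading byte-order mark is still not handled.

```scala
import java.io.{ByteArrayInputStream, InputStream, SequenceInputStream}
import java.nio.charset.StandardCharsets

// Hypothetical helper for prepending a declaration without copying the input bytes.
object DeclarationPrefix {
  private val Declaration: Array[Byte] = "<?xml version=\"1.0\"?>\n".getBytes(StandardCharsets.UTF_8)
  private val Marker: Array[Byte] = "<?xml".getBytes(StandardCharsets.US_ASCII)

  /** True if the bytes begin with "<?xml" followed by XML whitespace, i.e. a
    * declaration rather than a processing instruction such as "<?xml-stylesheet". */
  def hasDeclaration(bytes: Array[Byte]): Boolean =
    bytes.length > Marker.length &&
      Marker.indices.forall(i => bytes(i) == Marker(i)) && {
        val next = bytes(Marker.length)
        next == ' ' || next == '\t' || next == '\r' || next == '\n'
      }

  /** Streams the declaration (only if missing) followed by the original bytes. */
  def withDeclaration(bytes: Array[Byte]): InputStream = {
    val body = new ByteArrayInputStream(bytes)
    if (hasDeclaration(bytes)) body
    else new SequenceInputStream(new ByteArrayInputStream(Declaration), body)
  }
}
```

With scala.xml.pull this would be used roughly as `new XMLEventReader(Source.fromInputStream(DeclarationPrefix.withDeclaration(arrayBytes)))`.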

@SethTisue
Member

SethTisue commented Apr 12, 2019

Thanks for documenting your workaround here. It might help others.


3 participants