XML syntax error in stream writer #1519
Comments
Simple test function:

    func TestSetLongCellValuesXML(t *testing.T) {
        // Every "<" and ">" expands to a 4-byte entity once XML-escaped.
        value := strings.Repeat("<>", TotalCellChars/2)
        f := NewFile()
        streamWriter, err := f.NewStreamWriter("Sheet1")
        assert.NoError(t, err)
        err = streamWriter.SetRow("A1", []interface{}{value})
        assert.NoError(t, err)
        err = streamWriter.Flush()
        assert.NoError(t, err)
        v, err := f.GetCellValue("Sheet1", "A1")
        assert.NoError(t, err)
        assert.Equal(t, value, v)
    }
As a temporary workaround for anyone who hits this bug:

    // utils.Utf8Substr is my own rune-aware substring helper, not part of excelize.
    if len(val) > excelize.TotalCellChars/5 {
        for {
            buff := &bytes.Buffer{}
            _ = xml.EscapeText(buff, []byte(val))
            if buff.Len() <= excelize.TotalCellChars {
                break
            }
            // Drop as many trailing runes as the escaped form overflows by, then re-check.
            val = utils.Utf8Substr(val, 0, utf8.RuneCountInString(val)-(buff.Len()-excelize.TotalCellChars))
        }
    }

Apply it to every value you add to the stream writer.
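
For readers without the `utils.Utf8Substr` helper, here is a minimal single-pass sketch of the same idea using only the standard library. The function name `fitEscaped` and the `max` parameter are hypothetical, and, like the workaround above, it assumes the limit should be compared against the escaped byte length.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

// fitEscaped (hypothetical helper) returns the longest rune-aligned prefix of s
// whose XML-escaped form is at most max bytes long.
func fitEscaped(s string, max int) string {
	total := 0
	buf := &bytes.Buffer{}
	for i, r := range s {
		// Measure how many bytes this single rune occupies once escaped.
		buf.Reset()
		_ = xml.EscapeText(buf, []byte(string(r)))
		if total+buf.Len() > max {
			return s[:i] // i is the byte offset where this rune starts
		}
		total += buf.Len()
	}
	return s
}

func main() {
	// A small limit stands in for excelize.TotalCellChars to keep the demo short.
	fmt.Printf("%q\n", fitEscaped("<><><>", 10))
}
```

Because the cut always falls between runes of the unescaped input, the value that is later escaped can never end in the middle of an entity.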
Any thoughts?
Thanks for your issue. We need to change the order of trimming the cell value and escaping XML characters. Combine these two steps, as in the following diff:
diff --git a/cell.go b/cell.go
index 1f01ce3..edfcb3c 100644
--- a/cell.go
+++ b/cell.go
@@ -451,17 +451,22 @@ func (f *File) setSharedString(val string) (int, error) {
sst.Count++
sst.UniqueCount++
t := xlsxT{Val: val}
- val, t.Space = trimCellValue(val)
+ val, t.Space = trimCellValue(val, false)
sst.SI = append(sst.SI, xlsxSI{T: &t})
f.sharedStringsMap[val] = sst.UniqueCount - 1
return sst.UniqueCount - 1, nil
}
// trimCellValue provides a function to set string type to cell.
-func trimCellValue(value string) (v string, ns xml.Attr) {
+func trimCellValue(value string, escape bool) (v string, ns xml.Attr) {
if utf8.RuneCountInString(value) > TotalCellChars {
value = string([]rune(value)[:TotalCellChars])
}
+ buf := &bytes.Buffer{}
+ if escape {
+ _ = xml.EscapeText(buf, []byte(value))
+ value = buf.String()
+ }
if len(value) > 0 {
prefix, suffix := value[0], value[len(value)-1]
for _, ascii := range []byte{9, 10, 13, 32} {
@@ -492,15 +497,13 @@ func (c *xlsxC) setCellValue(val string) {
// string.
func (c *xlsxC) setInlineStr(val string) {
c.T, c.V, c.IS = "inlineStr", "", &xlsxSI{T: &xlsxT{}}
- buf := &bytes.Buffer{}
- _ = xml.EscapeText(buf, []byte(val))
- c.IS.T.Val, c.IS.T.Space = trimCellValue(buf.String())
+ c.IS.T.Val, c.IS.T.Space = trimCellValue(val, true)
}
// setStr set cell data type and value which containing a formula string.
func (c *xlsxC) setStr(val string) {
c.T, c.IS = "str", nil
- c.V, c.XMLSpace = trimCellValue(val)
+ c.V, c.XMLSpace = trimCellValue(val, false)
}
// getCellDate parse cell value which containing a boolean.
@@ -1031,7 +1034,7 @@ func setRichText(runs []RichTextRun) ([]xlsxR, error) {
return textRuns, ErrCellCharsLength
}
run := xlsxR{T: &xlsxT{}}
- run.T.Val, run.T.Space = trimCellValue(textRun.Text)
+ run.T.Val, run.T.Space = trimCellValue(textRun.Text, false)
fnt := textRun.Font
if fnt != nil {
run.RPr = newRpr(fnt)
What do you think about it, and would you like to make a PR for it?
I'll definitely take a look. I'm just a bit busy these days. I was thinking about an edge case. What should we do here? I tried to create an example xlsx file (attached), which contains 32767 > symbols in one cell, and it works.
Thanks for your feedback. I have fixed this issue. Please upgrade to the master branch code. This patch will be released in the next version.
Great, thank you! Just a little fix:
Thanks for your advice. I've fixed it in the commit
@xuri Hi, what about the next release with the fix? It's been a long time since April.
This library usually releases a stable version every 3 or 4 months. If you need to use the code on the master branch, please use Go modules to lock the version.
Description
We have this line: https://github.com/qax-os/excelize/blob/master/cell.go#L495 which escapes text for XML via xml.EscapeText. Then buff.String() goes to the trimCellValue function, which has this: https://github.com/qax-os/excelize/blob/master/cell.go#L462

Because of how xml.EscapeText works, it sometimes makes the string much longer (for example, if we put HTML into it). Then value[:TotalCellChars] cuts this escaped XML without regard for entity boundaries.

So, if the final escaped XML ends with something like "&gt;" and overflows TotalCellChars by 1 or 2, our XML becomes invalid. It will end in a truncated, unterminated entity.

Steps to reproduce the issue:
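
The failure mode can be shown with the standard library alone. This is a minimal sketch rather than the excelize code path: `escapeThenTrim` and `parseText` are hypothetical names, the constant `limit` stands in for `TotalCellChars`, and the cut is made on bytes, which matches a rune cut here because the input is ASCII.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// limit stands in for excelize's TotalCellChars; a small value keeps the demo readable.
const limit = 10

// escapeThenTrim (hypothetical) mimics the problematic order: escape first,
// then cut the escaped text by length.
func escapeThenTrim(s string, max int) string {
	buf := &bytes.Buffer{}
	_ = xml.EscapeText(buf, []byte(s))
	escaped := buf.String()
	if len(escaped) > max {
		escaped = escaped[:max] // may cut through an entity such as "&lt;"
	}
	return escaped
}

// parseText wraps the fragment in a <t> element and reads every token,
// returning the first syntax error, or nil if the input is well-formed.
func parseText(fragment string) error {
	dec := xml.NewDecoder(strings.NewReader("<t>" + fragment + "</t>"))
	for {
		if _, err := dec.Token(); err != nil {
			if err == io.EOF {
				return nil
			}
			return err
		}
	}
}

func main() {
	cut := escapeThenTrim("<><><>", limit)
	fmt.Printf("%q\n", cut)
	fmt.Println(parseText(cut))
}
```

The escaped form of the input is 24 bytes long, so cutting it at 10 bytes leaves a dangling entity prefix and the decoder reports a syntax error.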
Describe the results you received:
invalid character entity &gt (no semicolon)
Describe the results you expected:
It works.
Output of go version:
Excelize version or commit ID:
Environment details (OS, Microsoft Excel™ version, physical, etc.):
MacOS Ventura