Fix bug when missing workbook relationship
Add a check to see whether the list is empty before trying to access its
contents. If an Excel file overrides its relationships so that no part
name contains the word "book", the workbook relationship lookup tries to
take the first item of an empty list and fails with:

    IndexError: list index out of range

There may be a better fix for this issue; I am not well enough versed in
the xlsx specification to say. The following xlsx file caused the issue.

    $ unzip -l some_file.xlsx
    Archive:  some_file.xlsx
      Length      Date    Time    Name
    ---------  ---------- -----   ----
          142  02-06-2024 13:28   xl/worksheets/_rels/sheet1.xml.rels
     65968555  02-06-2024 13:28   xl/worksheets/sheet1.xml
      2078037  02-06-2024 13:28   xl/sharedStrings.xml
         9867  02-06-2024 13:28   xl/styles.xml
          566  02-06-2024 13:28   xl/_rels/workbook.xml.rels
          388  02-06-2024 13:28   xl/workbook.xml
          297  02-06-2024 13:28   _rels/.rels
         1122  02-06-2024 13:28   [Content_Types].xml
    ---------                     -------
     68058974                     8 files

`[Content_Types].xml` overrides the relationships to point at
`_rels/.rels` rather than `xl/_rels/workbook.xml.rels`. This leaves the
`workbook_relationships` list empty, which causes the error mentioned
above. The archive does contain a workbook relationship part; however,
it is being overridden.
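
The failure can be reproduced in isolation. The sketch below assumes that the relationship part names collected for this file amount to just `/_rels/.rels`; the list literal is an illustrative stand-in, not output captured from xlsx2csv.

```python
# Hypothetical stand-in for self.content_types.types["relationships"]
# for the file above: only the package-level part is listed.
relationships = ["/_rels/.rels"]

# Same filter as the original code: keep part names containing "book".
workbook_relationships = list(filter(lambda r: "book" in r, relationships))

try:
    first = workbook_relationships[0]
except IndexError as exc:
    error_message = str(exc)
    print("IndexError:", error_message)  # IndexError: list index out of range
```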

`[Content_Types].xml`:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
      <Default Extension="png" ContentType="image/png"/>
      <Default Extension="jpeg" ContentType="image/jpeg"/>
      <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
      <Default Extension="xml" ContentType="application/xml"/>
      <Default Extension="vml" ContentType="application/vnd.openxmlformats-officedocument.vmlDrawing"/>
      <Override PartName="/xl/worksheets/sheet1.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/>
      <Override PartName="/xl/sharedStrings.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml"/>
      <Override PartName="/xl/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml"/>
      <Override PartName="/xl/workbook.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/>
      <Override PartName="/_rels/.rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
    </Types>
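
A minimal sketch of why the lookup comes up empty, assuming the relationship list is built only from `Override` entries whose content type is the package relationships type. This is an assumption about how the part names are collected, not a copy of xlsx2csv's parser, and the XML is an abbreviated copy of the file above.

```python
import xml.etree.ElementTree as ET

CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"
RELS_TYPE = "application/vnd.openxmlformats-package.relationships+xml"

# Abbreviated copy of the [Content_Types].xml shown above.
content_types_xml = """
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/xl/workbook.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/>
  <Override PartName="/_rels/.rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
</Types>
""".strip()

root = ET.fromstring(content_types_xml)

# Collect the part names explicitly overridden as relationship parts.
relationship_overrides = [
    override.get("PartName")
    for override in root.iter(CT_NS + "Override")
    if override.get("ContentType") == RELS_TYPE
]

# None of them contains "book", so the workbook lookup finds nothing.
book_parts = [part for part in relationship_overrides if "book" in part]

print(relationship_overrides)  # ['/_rels/.rels']
print(book_parts)              # []
```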

`xl/_rels/workbook.xml.rels`:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
      <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet1.xml"/>
      <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/sharedStrings" Target="sharedStrings.xml"/>
      <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="styles.xml"/>
    </Relationships>
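
For comparison, the workbook relationship part itself parses cleanly. The sketch below reads it with `xml.etree.ElementTree` and maps each relationship `Id` to its `Target`; it is an illustration only, and the variable names are not taken from xlsx2csv.

```python
import xml.etree.ElementTree as ET

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
OD = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"

# Copy of xl/_rels/workbook.xml.rels as shown above (XML declaration omitted).
workbook_rels_xml = f"""
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="{OD}/worksheet" Target="worksheets/sheet1.xml"/>
  <Relationship Id="rId2" Type="{OD}/sharedStrings" Target="sharedStrings.xml"/>
  <Relationship Id="rId3" Type="{OD}/styles" Target="styles.xml"/>
</Relationships>
""".strip()

root = ET.fromstring(workbook_rels_xml)

# Map relationship Id -> Target, relative to the xl/ directory.
targets = {
    rel.get("Id"): rel.get("Target")
    for rel in root.iter(REL_NS + "Relationship")
}

print(targets)
```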
tmiller committed Mar 17, 2024
1 parent f2a429a commit ef3ff50
Showing 1 changed file with 5 additions and 2 deletions.
7 changes: 5 additions & 2 deletions xlsx2csv.py
@@ -222,8 +222,11 @@ def __init__(self, xlsxfile, **options):
         self.shared_strings = self._parse(SharedStrings, self.content_types.types["shared_strings"])
         self.styles = self._parse(Styles, self.content_types.types["styles"])
         self.workbook = self._parse(Workbook, self.content_types.types["workbook"])
-        workbook_relationships = list(filter(lambda r: "book" in r, self.content_types.types["relationships"]))[0]
-        self.workbook.relationships = self._parse(Relationships, workbook_relationships)
+        workbook_relationships = list(filter(lambda r: "book" in r, self.content_types.types["relationships"]))
+        if len(workbook_relationships) > 0:
+            self.workbook.relationships = self._parse(Relationships, workbook_relationships[0])
+        else:
+            self.workbook.relationships = Relationships()
         if self.options['no_line_breaks']:
             self.shared_strings.replace_line_breaks()
         elif self.options['escape_strings']:
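
The guard in this change can also be written with `next()` and a default, which avoids building the intermediate list. This is a sketch of an equivalent alternative, not the code in the commit; `Relationships`, `parse_relationships`, and `load_workbook_relationships` here are hypothetical stand-ins so that the example runs on its own.

```python
class Relationships:
    """Hypothetical stand-in for xlsx2csv's Relationships class."""

    def __init__(self):
        self.relationships = {}


def parse_relationships(part_name):
    """Hypothetical stand-in for self._parse(Relationships, part_name)."""
    rels = Relationships()
    rels.relationships["source"] = part_name
    return rels


def load_workbook_relationships(relationship_parts):
    # next() with a default returns None instead of raising when
    # no part name contains "book".
    part = next((r for r in relationship_parts if "book" in r), None)
    if part is None:
        # Fall back to an empty Relationships object, as the commit does.
        return Relationships()
    return parse_relationships(part)


empty = load_workbook_relationships(["/_rels/.rels"])
found = load_workbook_relationships(["/_rels/.rels", "/xl/_rels/workbook.xml.rels"])

print(empty.relationships)  # {}
print(found.relationships)  # {'source': '/xl/_rels/workbook.xml.rels'}
```

Either form keeps the existing behaviour when a workbook relationship part is found and only changes the empty case.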
