Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AssertionError item_type == subclass.ITEM_TYPE #17

Closed
Azeirah opened this issue Aug 27, 2023 · 11 comments
Closed

AssertionError item_type == subclass.ITEM_TYPE #17

Azeirah opened this issue Aug 27, 2023 · 11 comments

Comments

@Azeirah
Copy link
Contributor

Azeirah commented Aug 27, 2023

Back again with another error!

This one is a bit more difficult to figure out, but I feel like I got a decent idea about what's approximately going on.

Error

Traceback (most recent call last):
  File "/home/lb/miniconda3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/lb/miniconda3/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/lb/PhpstormProjects/rm-notesync-laravel/binaries/remarks/remarks/__main__.py", line 151, in <module>
    main()
  File "/home/lb/PhpstormProjects/rm-notesync-laravel/binaries/remarks/remarks/__main__.py", line 147, in main
    run_remarks(input_dir, output_dir, **args_dict)
  File "/home/lb/PhpstormProjects/rm-notesync-laravel/binaries/remarks/remarks/remarks.py", line 90, in run_remarks
    process_document(metadata_path, out_path, **kwargs)
  File "/home/lb/PhpstormProjects/rm-notesync-laravel/binaries/remarks/remarks/remarks.py", line 157, in process_document
    dims = determine_document_dimensions(rm_annotation_file)
  File "/home/lb/PhpstormProjects/rm-notesync-laravel/binaries/remarks/remarks/conversion/parsing.py", line 200, in determine_document_dimensions
    build_tree(tree, blocks)
  File "/home/lb/PhpstormProjects/rm-notesync-laravel/binaries/remarks/.venv/lib/python3.10/site-packages/rmscene/scene_stream.py", line 756, in build_tree
    for b in blocks:
  File "/home/lb/PhpstormProjects/rm-notesync-laravel/binaries/remarks/.venv/lib/python3.10/site-packages/rmscene/scene_stream.py", line 731, in read_blocks
    yield from _read_blocks(stream)
  File "/home/lb/PhpstormProjects/rm-notesync-laravel/binaries/remarks/.venv/lib/python3.10/site-packages/rmscene/scene_stream.py", line 713, in _read_blocks
    yield block_type.from_stream(stream)
  File "/home/lb/PhpstormProjects/rm-notesync-laravel/binaries/remarks/.venv/lib/python3.10/site-packages/rmscene/scene_stream.py", line 375, in from_stream
    assert item_type == subclass.ITEM_TYPE
AssertionError

Alright, so we have a SceneItemBlock where the read block_info item_type doesn't match up with the subclass.ITEM_TYPE

IE, this:

def from_stream(cls, stream: TaggedBlockReader) -> SceneItemBlock:
    "Group item block?"
    _logger.debug("Reading %s", cls.__name__
    
    assert stream.current_block
    block_type = stream.current_block.block_type
    if block_type == SceneGlyphItemBlock.BLOCK_TYPE:
        subclass = SceneGlyphItemBlock
    elif block_type == SceneGroupItemBlock.BLOCK_TYPE:
        subclass = SceneGroupItemBlock
    elif block_type == SceneLineItemBlock.BLOCK_TYPE:
        subclass = SceneLineItemBlock
    elif block_type == SceneTextItemBlock.BLOCK_TYPE:
        subclass = SceneTextItemBlock
    else:
        raise ValueError(
            "unknown scene type %d in %s" % (block_type, stream.current_block)
        )

    parent_id = stream.read_id(1)
    item_id = stream.read_id(2)
    left_id = stream.read_id(3)
    right_id = stream.read_id(4)
    deleted_length = stream.read_int(5)

    if stream.has_subblock(6):
        with stream.read_subblock(6) as block_info:
            item_type = stream.data.read_uint8()
            # ---------------- HERE ---------------- #
            assert item_type == subclass.ITEM_TYPE
            value = subclass.value_from_stream(stream)
        # Keep known extra data
        extra_data = block_info.extra_data
    else:
        value = None
        extra_data = b""

The read item_type in this case was 1, whereas the subclass.ITEM_TYPE is 3. So we have a subclass of SceneLineItemBlock, but the item_type of the subblock corresponds to a SceneGlyphItemBlock.

What I found in the .rm file is that there's a singular byte immediately following the read item_type which contains the expected item type (at least, in this scenario? Maybe it doesn't always match up?), so I modified the code to read that byte when the assertion fails.

if stream.has_subblock(6):
    with stream.read_subblock(6) as block_info:
        item_type = stream.data.read_uint8()
        if not item_type == subclass.ITEM_TYPE:
            override_item_type = stream.read_byte(0)
            assert override_item_type == subclass.ITEM_TYPE

Next, I found that the override_item_type was followed up by four IDs and an int, so clearly that has to be a block.

My guess is that somehow a block ends up becoming another block. What exactly's going on, I'm not super sure yet.

So I recur to read the block again, starting after the override_item_type.

if stream.has_subblock(6):
    with stream.read_subblock(6) as block_info:
        item_type = stream.data.read_uint8()
        if not item_type == subclass.ITEM_TYPE:
            override_item_type = stream.read_byte(0)
            assert override_item_type == subclass.ITEM_TYPE
            # recur here
            return SceneItemBlock.from_stream(stream)

This works fine up until the point of trying to read the subclass' value in value_from_stream. This makes sense, since our subblocks' item_types weren't matching up. So my guess was that we ended up here with a SceneLineItemBlock with a GlyphRange subblock. Again, I'm not sure how that makes sense but it does look like it's what's happening here.

So I modify the code a bit more again, to override what reader class is getting used. Instead of determining the block subclass by block_type in SceneItemBlock, I use the item_type as read in the second (recur) call to SceneItemBlock subblock to determine the subclass. The code is a bit ugly but as far as I can see this does work as you would expect, no more parsing errors after this, making the code look like this:

@classmethod
    def from_stream(
        cls, stream: TaggedBlockReader, override_item_type: int = -1
    ) -> SceneItemBlock:
        "Group item block?"
        _logger.debug("Reading %s", cls.__name__)

        assert stream.current_block
        block_type = stream.current_block.block_type
        if block_type == SceneGlyphItemBlock.BLOCK_TYPE:
            subclass = SceneGlyphItemBlock
        elif block_type == SceneGroupItemBlock.BLOCK_TYPE:
            subclass = SceneGroupItemBlock
        elif block_type == SceneLineItemBlock.BLOCK_TYPE:
            subclass = SceneLineItemBlock
        elif block_type == SceneTextItemBlock.BLOCK_TYPE:
            subclass = SceneTextItemBlock
        else:
            raise ValueError(
                "unknown scene type %d in %s" % (block_type, stream.current_block)
            )

        parent_id = stream.read_id(1)
        item_id = stream.read_id(2)
        left_id = stream.read_id(3)
        right_id = stream.read_id(4)
        deleted_length = stream.read_int(5)

        if stream.has_subblock(6):
            with stream.read_subblock(6) as block_info:
                item_type = stream.data.read_uint8()
                if not override_item_type == -1:
                    if override_item_type == SceneGlyphItemBlock.ITEM_TYPE:
                        subclass = SceneGlyphItemBlock
                    elif override_item_type == SceneGroupItemBlock.ITEM_TYPE:
                        subclass = SceneGroupItemBlock
                    elif override_item_type == SceneLineItemBlock.ITEM_TYPE:
                        subclass = SceneLineItemBlock
                    elif override_item_type == SceneTextItemBlock.ITEM_TYPE:
                        subclass = SceneTextItemBlock
                    elif override_item_type == SceneGlyphItemBlock.ITEM_TYPE:
                        subclass = SceneGlyphItemBlock
                    else:
                        raise ValueError("unknown scene type %d in %s" % (override_item_type, stream.current_block))
                elif item_type != subclass.ITEM_TYPE:
                    override_item_type = stream.read_byte(0)
                    assert override_item_type == subclass.ITEM_TYPE
                    return subclass.from_stream(stream, item_type)
                value = subclass.value_from_stream(stream)
            # Keep known extra data
            extra_data = block_info.extra_data
        else:
            value = None
            extra_data = b""

        return subclass(
            parent_id,
            CrdtSequenceItem(item_id, left_id, right_id, deleted_length, value),
            extra_data=extra_data,
        )

I do have one exception left:

Traceback (most recent call last):
  File "/home/lb/miniconda3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/lb/miniconda3/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/lb/PhpstormProjects/rm-notesync-laravel/binaries/remarks/remarks/__main__.py", line 151, in <module>
    main()
  File "/home/lb/PhpstormProjects/rm-notesync-laravel/binaries/remarks/remarks/__main__.py", line 147, in main
    run_remarks(input_dir, output_dir, **args_dict)
  File "/home/lb/PhpstormProjects/rm-notesync-laravel/binaries/remarks/remarks/remarks.py", line 90, in run_remarks
    process_document(metadata_path, out_path, **kwargs)
  File "/home/lb/PhpstormProjects/rm-notesync-laravel/binaries/remarks/remarks/remarks.py", line 157, in process_document
    dims = determine_document_dimensions(rm_annotation_file)
  File "/home/lb/PhpstormProjects/rm-notesync-laravel/binaries/remarks/remarks/conversion/parsing.py", line 200, in determine_document_dimensions
    build_tree(tree, blocks)
  File "/home/lb/PhpstormProjects/rm-notesync-laravel/binaries/remarks/.venv/lib/python3.10/site-packages/rmscene/scene_stream.py", line 781, in build_tree
    for b in blocks:
  File "/home/lb/PhpstormProjects/rm-notesync-laravel/binaries/remarks/.venv/lib/python3.10/site-packages/rmscene/scene_stream.py", line 756, in read_blocks
    yield from _read_blocks(stream)
  File "/home/lb/PhpstormProjects/rm-notesync-laravel/binaries/remarks/.venv/lib/python3.10/site-packages/rmscene/scene_stream.py", line 738, in _read_blocks
    yield block_type.from_stream(stream)
  File "/home/lb/PhpstormProjects/rm-notesync-laravel/binaries/remarks/.venv/lib/python3.10/site-packages/rmscene/scene_stream.py", line 375, in from_stream
    with stream.read_subblock(6) as block_info:
  File "/home/lb/miniconda3/lib/python3.10/contextlib.py", line 142, in __exit__
    next(self.gen)
  File "/home/lb/PhpstormProjects/rm-notesync-laravel/binaries/remarks/.venv/lib/python3.10/site-packages/rmscene/tagged_block_reader.py", line 174, in read_subblock
    self._check_position(subblock)
  File "/home/lb/PhpstormProjects/rm-notesync-laravel/binaries/remarks/.venv/lib/python3.10/site-packages/rmscene/tagged_block_reader.py", line 185, in _check_position
    raise BlockOverflowError(
rmscene.tagged_block_reader.BlockOverflowError: <class 'rmscene.tagged_block_reader.SubBlockInfo'> starting at 698, length 0, read up to 809 (overflow by 111)

Which makes sense since I read two blocks worth of data but I report only one block's length.

@Azeirah
Copy link
Contributor Author

Azeirah commented Aug 27, 2023

The relevant binary data

000002b8:  0000 0101 031f 000b 2f01 123f 0111 4f00 0054 0000 0000 6c56  :......../..?..O..T....lV
000002d0:  0000 0001 2483 0400 0034 1800 0000 4403 0000 005c 1b00 0000  :....$....4....D....\....
000002e8:  1901 6469 6520 4c69 7465 7261 7475 7220 c3bc 6265 7220 6469  :..redacted text...
00000300:  6573 656c 2100 0000 01c6 a340 4a92 6474 c080 0dcd 74de 718d  :text!.......@J.dt....t.q.

The initial assertion error occurs at position 2bb, which is the second 01 in the first row.

As you can see, that 01 is immediately followed up by an 03, which is the override_item_type I was referring to earlier.

After the override_item_type, we get 4 ids and a length of 86 (0x56), which is for the the "overridden" (?)/new block, the SceneGlyphItemBlock. The rest after that is read by glyph_range_from_stream which read the redacted text as expected.

@Azeirah
Copy link
Contributor Author

Azeirah commented Aug 27, 2023

What I'm not too sure about is the semantics I guessed about the byte following the assertion error. In this case its value is 3, which (coincidentally?) corresponds with the expected item_type. However, what I noticed is that 3 is also the block_type for SceneGlyphItem. So instead it might be the case that it's not the overridden item_type, but just signifying the block type of the following block.

I think that would make a little bit more sense as well, but it's hard to tell given I only have one example. It simplifies the code by quite a lot too.

@Azeirah
Copy link
Contributor Author

Azeirah commented Aug 27, 2023

Also, I don't have a solution yet for the block length error, but I wanted to get the important parts sorted out first.

@ricklupton
Copy link
Owner

Thanks for reporting!

The initial assertion error occurs at position 2bb, which is the second 01 in the first row.

Are you sure about where the assertion is happening? The data you posted looks ok to me:

000002b8:  0000 0101 031f 000b 2f01 123f 0111 4f00 0054 0000 0000 6c56  :......../..?..O..T....lV
             |       | block type = 0x03
             | start of block header: `0x00`, then two version bytes (both `0x01`)
000002d0:  0000 0001 2483 0400 0034 1800 0000 4403 0000 005c 1b00 0000  :....$....4....D....\....
                  | item type = 0x01
000002e8:  1901 6469 6520 4c69 7465 7261 7475 7220 c3bc 6265 7220 6469  :..redacted text...
00000300:  6573 656c 2100 0000 01c6 a340 4a92 6474 c080 0dcd 74de 718d  :text!.......@J.dt....t.q.

Which seems to match the SceneGlyphItemBlock values of 0x03 and 0x01 respectively?

Anyway it would be good to improve error handling during parsing, so that an error is reported for the block where the problem occurs, but the rest of the file can continue to be parsed.

@Azeirah
Copy link
Contributor Author

Azeirah commented Aug 29, 2023

If I misread the binary data I'll have to take a closer look again. The code does think the block is a SceneLineItemBlock, not a SceneGlyphRangeBlock.

The binary data I posted doesn't show that there are two block headers immediately following one another, the first is of course relevant. The second one is what is shown in the post I made, the second is indeed correct as that is what I end up parsing with the recursive call to SceneItemBlock.

I'll post more context of the binary data since I obviously missed something.

@Azeirah
Copy link
Contributor Author

Azeirah commented Aug 30, 2023

Here's the same binary data with more context

As you can see, this starts with a block of type 0x05

00000288:  ecd2 8440 001d 3013 901d 7740 00f0 eea6 9789 4940 1100 0000  :...@..0...w@......I@....
                                                             | block length = 17
000002a0:  0002 0205 1f00 0b2f 0111 3f01 104f 0000 5401 0000 006c 0000  :......./..?..O..T....
                  | block type = 0x05                          |
                  											   | start of subblock
                  											   | index = 6, tagtype = 0xC
000002b8:  0000 0101 031f 000b 2f01 123f 0111 4f00 0054 0000 0000 6c56  :......../..?..O..T....lV
		   |	| subblock is another block?
		   | subblock length = 0
000002d0:  0000 0001 2483 0400 0034 1800 0000 4403 0000 005c 1b00 0000  :....$....4....D....\....
000002e8:  1901 ____ ____ ____ ____ ____ ____ ____ ____ ____ ____ ____  :..redacted text
00000300:  ____ ____ ____ 2100 0000 01c6 a340 4a92 6474 c080 0dcd 74de 718d  :etc!......@J.dt....t.q.

Looking at it again, it seems more reasonable that this block simply doesn't have any data. How that is supposed to work, I'm not sure. Given the reported subblock length = 0, the code assumes a subblock definition is followed by a byte item_type. However, as you said those bytes are version bytes (0x01)

Given the subblock has a length of 0, it could be the case that it is a similar scenario as the the previous issue/PR I worked on, that when data is removed, the binary data gets rid of some bytes (presumably to save some space?) assuming the block's presence is not a bug.

Following this hypothesis, I reverted back my changes and use this simpler code

        ...
        deleted_length = stream.read_int(5)

        if stream.has_subblock(6):
            with stream.read_subblock(6) as block_info:
                # checking block size
                if block_info.size > 0:
                    item_type = stream.data.read_uint8()
                    assert item_type == subclass.ITEM_TYPE
                    value = subclass.value_from_stream(stream)
                else:
                    value = None
            # Keep known extra data
            extra_data = block_info.extra_data
        else:
            value = None
            extra_data = b""

I get a BlockOverflowError where the current problematic block is reported to have overflowed.

  File "/home/lb/PhpstormProjects/rm-notesync-laravel/binaries/remarks/.venv/lib/python3.10/site-packages/rmscene/tagged_block_reader.py", line 150, in read_block
    self._check_position(self.current_block)
  File "/home/lb/PhpstormProjects/rm-notesync-laravel/binaries/remarks/.venv/lib/python3.10/site-packages/rmscene/tagged_block_reader.py", line 185, in _check_position
    raise BlockOverflowError(
rmscene.tagged_block_reader.BlockOverflowError: <class 'rmscene.tagged_block_reader.MainBlockInfo'> starting at 676, length 17, read up to 698 (overflow by 5)

This leads me to believe this block is simply kaput. The subblock header is five bytes if I understand it correctly, a tag which is one byte, 0x6c followed by 4 byte integer specifying the size of the subblock.

It is not unlikely this is a bug on ReMarkable's side if I'm correct, this block simply shouldn't exist. It is a SceneLineItemBlock containing a subblock header lacking any further data, also reporting the incorrect length. The block does end at 698, as a regular well-behaving block starts at 699 (0x2bb) which you pointed out as well.

@ricklupton
Copy link
Owner

Ok, I think the problem is that the 6c is actually the start of the length of the next block, but it confuses the has_subblock(6) test because it looks like the start of the next subblock.

I think probably the solution is to make has_subblock immediately return False if the current block has already all been read, and only if it hasn't then check if the next data is a subblock with the given index (so the has_subblock call would return False in this case)

@Azeirah
Copy link
Contributor Author

Azeirah commented Aug 31, 2023

That makes a lot of sense, although I'm still a little bit confused about what the purpose is of an empty block.

@ricklupton
Copy link
Owner

The empty blocks occur when something has been deleted. The deleted item id information is kept so that editing conflicts can be resolved, but the deleted data is not kept. You can tell that it's a deleted item because the deleted_length (tagged 0x54) is not zero.

@Azeirah
Copy link
Contributor Author

Azeirah commented Sep 3, 2023

I made a PR where I abort checking the subblock if the SceneItemBlock is fully read. It's working as you'd expect now.

@ricklupton
Copy link
Owner

I think this should be fixed by 9bf093d -- if you have a chance can you confirm that it works with this document with that commit (without your PR #19)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants