Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read .chd files internally to avoid chdman dependency and need for temporary files #8

Open
j68k opened this issue May 22, 2022 · 30 comments
Labels
enhancement New feature or request

Comments

@j68k
Copy link
Owner

j68k commented May 22, 2022

There are at least a couple of libraries for reading .chd files now, which would hopefully make this fairly easy.

@j68k j68k added the enhancement New feature or request label May 22, 2022
@i30817
Copy link

i30817 commented Jun 9, 2022

If you ever create a cross platform library to do this, if you'd publish it on pypi, i'd start using it right away to start renaming chds in my rhdndat project (the rhdndat-rn part in particular).

I'd love if there was a way to write new chds with a project like this too, because it would allow to create delta child chds from romhacking style xdelta patches without a whole complicated mess of temporary files and cutting bins into tracks just to create yet another image to delta encode with the original chd.

MAME is sometimes so disappointing, they create the only cd image format that has delta encoding, then they do absolutely nothing to make it easy to use with the current state of the world (which for romhacks in particular is xdelta patches on the binary track). They're even considering 'reworking' chd to a 'more accurate' format - to be clear, this probably would require redumps of absolutely everything, and i can think of no finer way to kill the format.

@chyyran
Copy link

chyyran commented Aug 7, 2022

Thought I would drop by from rtissera/libchdr#79 since I think it's unlikely libchdr will ever provide Python bindings.

chd-rs-py provides Python bindings to chd-rs which is all in pure Rust (without the undocumented fast_zlib feature which switches to using zlib-ng for Deflate decompression) and so will work on Windows, Linux and macOS.

I used maturin to publish it hastily to pypi so I'm not sure if the current version on pypi is compiled for Linux. Python bindings isn't something I use so I probably won't go further than this personally but I'll be happy to accept PRs for anything you need as long as chd-rs exposes it, which should pretty much be everything.

@j68k
Copy link
Owner Author

j68k commented Aug 7, 2022

Thank you very much for the work and for letting me know. I'll definitely check that out next time I'm working on this program. Very much appreciated!

@i30817
Copy link

i30817 commented Aug 7, 2022

Doesn't have a linux wheel yet, i think it needs it specified in cargo.toml for the pypa then build the wheels somehow.

As in this article: https://blog.yossarian.net/2020/08/02/Writing-and-publishing-a-python-module-in-rust

[package.metadata.maturin]
classifier = [
  "Programming Language :: Rust",
  "Operating System :: POSIX :: Linux",
]

same for the rest, windows, macos, bsd etc.

I'm also curious if the metadata is enough to split/know the extents of the tracks in memory.

@chyyran
Copy link

chyyran commented Aug 7, 2022

0.1.2 should have a wheel for linux + sdist if you have a Rust compiler installed

@i30817
Copy link

i30817 commented Aug 7, 2022

Thanks. Can this support reading delta chds?

I suppose the chd_open call either would have to take a list (of ancestors) or internally it would have to search the filesystem itself.

It's rare the emulator that supports the delta chd format, but i'd like that to change, preferably by scanning/enabling a option to scan the current directory.

@chyyran
Copy link

chyyran commented Aug 7, 2022

chd-rs supports opening a parent but because of lifetime issues (unlike Python you can't have multiple references to the inner stream) you can't really pass in a Chd object directly in from Python easily.

It's an API problem that should be easy to solve but I don't have time to do much more than this. Feel free to explore it and send a PR. You basically need to hack in an impl Clone for Chd.

@i30817
Copy link

i30817 commented Aug 7, 2022

I see, python doesn't have the equivalent from 'passing ownership' so to prevent errors, the constructor should call clone to prevent python objects being reused after being passed to the rust constructor.

@i30817
Copy link

i30817 commented Aug 7, 2022

Oh hey, i checked the code and this appears to already exist?

#[pyfunction]
fn chd_open(path: String, parent: Option<String>) -> PyResult<Chd> {
    let file = File::open(path)?;
    let parent = parent
        .map(|p| File::open(p).map(BufReader::new))
        .map_or(Ok(None), |v| v.map(Some))?
        .map(|n| ChdFile::open(n, None))
        .map_or(Ok(None), |v| v.map(|v| Some(Box::new(v))))
        .map_err(ChdPyError)?;

    let chd = ChdFile::open(BufReader::new(file), parent).map_err(ChdPyError)?;
    Ok(Chd {
        inner: chd,
        cmp_vec: Vec::new(),
    })
}

Were you mentioning the 'parent of a parent' situation, not just one level of indirection of reads?

I mean, yes, it's a bit awkward that you have to instantiate ChdFiles just to figure out if the file is a delta chd and to find the parent delta (from the checksum of the parent, iirc) before finally making the final result and discarding the rest, but it should work no?

@chyyran
Copy link

chyyran commented Aug 7, 2022

Yeah, I already considered single-parent which is pretty easy to solve.

Theoretically any level of indirection is easy as long as the lifetimes of the parent CHDs are fully resolved within chd_open but you'll just need to come up with an API that does that.

It would be more ergonomic to not have to do that ad-hoc. Something like Chd::open_child(&mut self, String) would probably work and you just build the tree bottom up. nvm, forgot that opening a child takes ownership of self

@chyyran
Copy link

chyyran commented Aug 7, 2022

I mean, yes, it's a bit awkward that you have to instantiate ChdFiles just to figure out if the file is a delta chd and to find the parent delta (from the checksum of the parent, iirc) before finally making the final result and discarding the rest, but it should work no?

Yes, that would work but you can't open the ChdFile directly because it needs the parent to already be valid so the ChdFile is always valid. Instead use ChdHeader::try_read_header to get the header and check the parent SHA if you want to do it that way.

@chyyran
Copy link

chyyran commented Aug 7, 2022

As far as I know though, support for multiple parents is iffy even with chdman so I don't think you're going to run into that situation a lot

@i30817
Copy link

i30817 commented Aug 7, 2022

I suspect it will always be awkward unless a alternate constructor gives a option to scan the filesystem/current dir (at least, maybe even check parent dir, for a obvious tree directory structure for hacks, or sibling directories, or all of these, without recursion naturally), because in most usecases (not mame i suppose, which has its software lists), you don't know which file is the parent.

Although, people tend to say it's bad form for libraries to do file io, for obvious reasons.

I'm interested in delta encoding as a 'softpatch for isos'. I'm interested in (eventually) creating a utility that takes a chd and a xdelta, and makes a child delta chd with the binary track patched, so all the iso hacks in romhacking.net finally have a softpatch format.

Also patching chd or updating patches of chds is a nightmare, you have to uncompress, divide into tracks again, patch and then recompress. Not good usability. Like this it would be a single command and the original game would still be available and to 'update' just means deleting the children and patching the parent again.

@i30817
Copy link

i30817 commented Aug 7, 2022

The header definition in chd doesn't have access to the sha1 of the parent, or doesn't show it. It only shows 'has_parent' not the parent checksum.

This makes it kind of impossible to find the parents by scanning

edit: oh, it's public. I shouldn't complain without building at least once...

@i30817
Copy link

i30817 commented Aug 7, 2022

What about exposing the try_read_header?

I wonder if something like this is enough:

#[pyclass]
struct Header {
    inner: ChdHeader,
}

#[pyfunction]
fn try_read_headers(paths: Vec<Option<String>>) -> Vec<PyResult<Header>> {
    let r = Vec::new()
    for path in paths {
        let h = path
            .map(|p| File::open(p).map(BufReader::new))
            .map_or(Ok(None), |v| v.map(Some))
            .map(|n| ChdHeader::try_read_header(n))
            .map_or(Ok(None), |v| v.map(|v| Some(Box::new(v))))
            .map_err(ChdPyError);
    
        r.push(
            Ok(Header {
                inner: h
            }))
    }
    r
}

/// A Python module implemented in Rust. The name of this function must match
/// the `lib.name` setting in the `Cargo.toml`, else Python will not be able to
/// import the module.
#[pymodule]
fn chd(_py: Python<'_>, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(chd_open, m)?)?;
    m.add_function(wrap_pyfunction!(try_read_headers, m)?)?;
    Ok(())
}

ie: exposing two functions for python code to 'try' to read the headers of all chd files it asks, then if a file it wants to load has a parent, do:

r = try_read_headers([file])[0]
if r not None and r.has_parent():
    l = try_read_headers([....every other file, found with some application specific strategy...])
    for h in l:
         if h.sha1 == r.parent_sha1:
              #create file

#etc

I don't know if i should be testing for 'None' or the exception too. I'm not too sure how this is handled, any exception should interrupt or just be ignored. Not sure if it's needed to painstackingly expose all needed function/properties to python or #[pyclass] takes care of that.

@chyyran
Copy link

chyyran commented Aug 7, 2022

Any error that bubbles up will automatically be turned into an Exception after crossing the Python FFI barrier. You don't need to test for None unless Option<T> is returned in Rust, otherwise it's guaranteed to either throw an Exception (in the case of Err), or return something. You will need #[pyclass] to expose the functions you need from Header, but there's not that many.

@i30817
Copy link

i30817 commented Aug 7, 2022

and the public properties?

@chyyran
Copy link

chyyran commented Aug 7, 2022

Yeah you'll need to expose the public properties as methods in #[pyclass] since to CPython it's just a builtin.

@chyyran
Copy link

chyyran commented Aug 7, 2022

HeaderV* implement Clone, so you can possibly use #[pyo3(get, set)] instead if you only care about CHD v5.

let header = ...;
let header: HeaderV5 = match header {
  V5Header(h) => h,
  _ => unimplemented!(); // will crash python, but you can return `Err` instead which will throw an exception
}

@i30817

This comment was marked as outdated.

@chyyran
Copy link

chyyran commented Aug 8, 2022

Instead of storing CHDHeader as inner just store HeaderV5 which should have everything you need. I think PyO3 should handle checking if None gets passed into the method.

I would set up a Rust toolchain and fiddle with it until it compiles. Just follow the guide at https://pyo3.rs/v0.16.4/

Docs for chd-rs are at https://docs.rs/chd/latest/chd/index.html

@i30817

This comment was marked as outdated.

@chyyran
Copy link

chyyran commented Aug 8, 2022

Sure I'd be happy to take PRs

@i30817
Copy link

i30817 commented Aug 8, 2022

After the pr was done this is what i have to build the chd @j68k , hope you can find a way to 'draw the rest of the owl' because i'm drawing blanks into turning the fourcc tags into actual track sha1 sums. I know we have to calculate the extents of the tracks into bytes elapsed from the tags...

code is untested so don't be surprised if it has some dumb bugs.

Spoiler warning
from chd import chd_read_header, chd_open
def chd_tracks_sha1(chdfile: Path, possible_parents: List[Path]):
    '''
    Try to extract a list of track sha1 sum strings from a chd
    This method supports cd type chds
    If passed a non-cd chd, or a delta chd without the ancestors in the list of possible parents, returns None
    '''
    parents_headers = [ chd_read_header(str(parent)) for parent in possible_parents ]
    headersmap = { str(path.resolve(strict=True)) : header for path, header in zip(possible_parents, parents_headers) }
    
    chdfile = str(chdfile.resolve(strict=True))
    headersmap[chdfile] = chd_read_header(chdfile)
    
    chd = bottom_up_chd_build(chdfile, headersmap)
    if not chd:
        return None
    
    tags = [ m.tag().to_bytes(4, byteorder='big') for m in chd.metadata() ]
    
    def is_cdrom(tag):
        return tag == b'CHCD' or tag == b'CHTR' or tag == b'CHT2' or tag == b'CHGT' or tag == b'CHGD'
    
    if not any(map(is_cdrom, tags)):
        return None
    
    #do sha1sum of tracks here, draw the rest of the owl

def bottom_up_chd_build(chdfile, headers):
    header = headers[chdfile]
    del headers[chdfile]
    if header.has_parent():
        try:
            parentfile, _ = next(filter(lambda kv: kv[1].sha1() == header.parent_sha1(), headers.items()))
        except StopIteration:
            return None
        parentchd = bottom_up_chd_build(parentfile, headers)
        if parentchd:
            return chd_open(chdfile, parentchd)
        else:
            return None
    else:
        return chd_open(chdfile)

@j68k
Copy link
Owner Author

j68k commented Aug 8, 2022

Thank you very much for getting that done. And that's definitely helpful to have an outline of how to use the .chd library.

Just to set expectations appropriately, I'm afraid I'm painfully busy with another project for the rest of this month at least, so it will be a while before I can do substantial work on verifydump. So, sorry about the wait, but thank you again!

@chyyran
Copy link

chyyran commented Aug 8, 2022

https://github.com/Manorhos/imageparse/blob/master/src/chd.rs this might be of interest, but this is using chd-rs in Rust directly and not with Python

@i30817
Copy link

i30817 commented Aug 9, 2022

That's complex. I guess i'm going to bug that person to add the delta-chd constructor so that a eventual python binding of that can work, because i have no confidence in myself on translating that code.

edit: then again, that project is a abstraction over both chd and cue/bin, so i'm not sure they'd add a chd specific factory method.

@chyyran
Copy link

chyyran commented Aug 9, 2022

That code just uses chd-rs and as per SnowflakePowered/chd-rs#2 there are no plans until MAME adds support for it.

@i30817
Copy link

i30817 commented Aug 9, 2022

I was writing about children chd, I just call them 'delta chds'. Xdelta subchild support is a feature apart.

The code in that image project is using a slightly older version of chd-rs without the two arguments chd_open.

@i30817
Copy link

i30817 commented Aug 9, 2022

BTW i found a github action that builds multiplatform wheels of rust maturin projects: https://github.com/messense/maturin-action

Might be useful if we make the binding to that library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants