ALMA test xmls are not valid MARCXML #1348

Closed
TobiasNx opened this issue Jun 15, 2022 · 10 comments

Comments

@TobiasNx
Contributor

See here

The problem is that the namespace is not declared in the MARC XML files:
https://github.com/hbz/lobid-resources/blob/b1b8a24d6958026a2adebf1ff607a6ec9dd664aa/src/test/resources/alma

It is: <record>

Should be: <record xmlns="http://www.loc.gov/MARC21/slim">

If there is a <collection> element, as in https://github.com/hbz/lobid-resources/blob/4fbc7cfa76792bc5730f19a17f2292d24ab515ac/src/test/resources/alma/almaMarcXmlTestFiles.xml.tar.bz2, then the namespace declaration should go on the <collection> element, not on the <record> element.
It is: <collection><record>

Should be: <collection xmlns="http://www.loc.gov/MARC21/slim"><record>

Because of this, the playground as well as other Flux scripts do not recognize the files without the namespace as MARC XML.
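
To illustrate the difference, here is a minimal sketch using only the JDK's XML parser. It is not metafacture code; the class and method names (RootNamespaceDemo, namespaceOfFirst) are made up for this example. With a namespace-aware parser, a <record> without the xmlns declaration has no namespace URI at all, while a <record> nested in a <collection> that declares the default namespace inherits it:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class RootNamespaceDemo {
    static final String MARC21_SLIM = "http://www.loc.gov/MARC21/slim";

    /**
     * Returns the namespace URI of the first element with the given local
     * name, or null if that element is not in any namespace.
     * (Hypothetical helper, for illustration only.)
     */
    static String namespaceOfFirst(String xml, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP parsers are not namespace-aware by default
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // "*" matches elements in any namespace, including none
            return doc.getElementsByTagNameNS("*", localName).item(0).getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalStateException("could not inspect XML", e);
        }
    }

    public static void main(String[] args) {
        String bare = "<record><leader>x</leader></record>";
        String onRecord = "<record xmlns=\"" + MARC21_SLIM + "\"><leader>x</leader></record>";
        String onCollection = "<collection xmlns=\"" + MARC21_SLIM + "\"><record/></collection>";

        System.out.println(namespaceOfFirst(bare, "record"));         // null
        System.out.println(namespaceOfFirst(onRecord, "record"));     // http://www.loc.gov/MARC21/slim
        System.out.println(namespaceOfFirst(onCollection, "record")); // http://www.loc.gov/MARC21/slim
    }
}
```

Any consumer that compares the element's namespace against the MARC21 slim URI will therefore skip the un-namespaced records.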

@dr0i
Member

dr0i commented Jun 20, 2022

But our ETL is working with it.
What does your Flux look like?
In Java it's: FileOpener->XmlDecoder->MarcXmlHandler ...

@dr0i dr0i assigned TobiasNx and unassigned dr0i Jun 20, 2022
@dr0i
Member

dr0i commented Jun 20, 2022

Ah, and use marcXmlHandler.setNamespace(null); this disables the namespace check. In Flux, use | handle-marcxml(namespace=null).
Try that. It may not work, because the null has to be an actual null and not the string "null", in which case this workaround won't work in Flux.
Then you could also try | handle-marcxml(namespace="").
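
For readers unfamiliar with the idea of an optional namespace check, here is a rough sketch of how such a check can behave in a plain SAX handler. This is a hypothetical handler written for this example with JDK classes only; it is not metafacture's MarcXmlHandler and makes no claim about how that class is implemented. The assumption is simply that a null namespace means "accept records from any namespace":

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class OptionalNamespaceCheck {

    /** Hypothetical handler: counts record elements, optionally checking the namespace. */
    static class RecordCounter extends DefaultHandler {
        private final String namespace; // null disables the check (assumption of this sketch)
        int records = 0;

        RecordCounter(String namespace) {
            this.namespace = namespace;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            // A namespace-aware SAX parser reports "" as uri for un-namespaced elements.
            boolean namespaceOk = namespace == null || namespace.equals(uri);
            if (namespaceOk && "record".equals(localName)) {
                records++;
            }
        }
    }

    /** Parses the XML and returns how many records pass the (optional) namespace check. */
    static int countRecords(String xml, String namespace) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            RecordCounter handler = new RecordCounter(namespace);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.records;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String bare = "<collection><record/><record/></collection>";
        System.out.println(countRecords(bare, "http://www.loc.gov/MARC21/slim")); // 0
        System.out.println(countRecords(bare, null));                             // 2
    }
}
```

In this sketch an empty string would match only un-namespaced elements, because that is what SAX reports for them; whether the same holds for the Flux option is something to test rather than assume.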

@TobiasNx
Contributor Author

Thanks. But the option behaviour seems odd and is not documented:
https://github.com/metafacture/metafacture-documentation/blob/master/flux-commands.md#handle-marcxml

This should somehow be documented since normally this would be "tru"/"false"

@blackwinter
Member

> But the option behaviour [...] is not documented

Of course it's documented:

options: namespace (String), attributemarker (String)

You can either set it to null as @dr0i suggested (if that works in Flux) in order to disable the namespace check, or set it to the required namespace value ("http://www.loc.gov/MARC21/slim") in order to make the check pass (see metafacture/metafacture-core#331).

> normally this would be "tru[e]"/"false"

This only applies to boolean options.

@blackwinter
Member

> But the option behaviour [...] is not documented

Oh, did you mean the part about disabling the namespace check? Then you're right, that's currently not documented. Sorry if I misunderstood.

@dr0i
Member

dr0i commented Oct 13, 2023

@TobiasNx can you document it?

TobiasNx added a commit to metafacture/metafacture-core that referenced this issue Oct 13, 2023
Update documentation for namespace handling in decode-marcxml.

See: hbz/lobid-resources#1348 (comment)
@TobiasNx
Contributor Author

@dr0i
Member

dr0i commented Nov 21, 2024

This issue is related to metafacture/metafacture-core#569.
Btw: is it solved? What is missing here?

@dr0i dr0i moved this from Backlog to Review in lobid-resources Nov 21, 2024
@TobiasNx
Contributor Author

While we can now work with the MARC data from lobid in metafacture, thanks to the adjustments made in metafacture/metafacture-core#569, the MARC data is still not valid.

But since the Alma publishing data is part of our records, one could argue the data is not "pure" MARC XML.

So we could still add the namespace to the MARC XML records OR just mark this as won't fix until somebody asks for it.

@dr0i
Member

dr0i commented Nov 29, 2024

Won't fix atm as discussed in our meeting. Closing.

@dr0i dr0i closed this as completed Nov 29, 2024
@github-project-automation github-project-automation bot moved this from Review to Done in lobid-resources Nov 29, 2024