Add category support to OPDS #318

Closed
rgaudin opened this issue Aug 15, 2019 · 12 comments · Fixed by kiwix/libkiwix#459

Comments

@rgaudin
Member

rgaudin commented Aug 15, 2019

Following #317, OPDS entries should include a category for ZIM files which matches one.

The Category is fetched from the source catalog and not the ZIM files.

OPDS feed should be query-able by category as well.

@rgaudin
Member Author

rgaudin commented Aug 15, 2019

It is worth noting that, because this information is fetched from the catalog, it won't be available to people running kiwix-serve without a catalog (for instance, kiwix-serve *.zim).

@veloman-yunkan
Collaborator

> OPDS entries should include a category for ZIM files which matches one.

@rgaudin Currently, category information for a catalog entry is included via the _category tag. Is that enough, or do you want a separate <category></category> XML node?

@veloman-yunkan
Collaborator

> OPDS feed should be query-able by category as well.

Similarly, the OPDS feed can be filtered using the _category tag. Do you want a separate category parameter in the query?

@kelson42
Contributor

kelson42 commented Mar 3, 2021

@rgaudin @mgautierfr The last comments from @veloman-yunkan make sense. Is this still a valid request? Maybe we just need a proper software primitive at the libkiwix level? It seems to me that we definitely need a dedicated ability to filter by category in the OPDS API.

@rgaudin
Member Author

rgaudin commented Mar 3, 2021

It depends on what we want to do with categories.
When this request was made, we envisioned having a CMS that would allow customization of that metadata, independently of what's in the ZIM, to build the library that itself feeds the OPDS.

I can query by category with https://library.kiwix.org/catalog/search?tag=_category:ted or read an entry's tags for category info at the moment, but if we agree that Category is an important concept that we want in all readers, it makes sense to expose it properly and avoid duplicating this special handling in each of them.
I understand @kelson42's suggestion about libkiwix, but the point of OPDS is to be outside libkiwix, right?
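
To make the two query styles concrete, here is a minimal Python sketch that only builds the URLs. The tag-based form mirrors the URL quoted above; the `category` parameter is hypothetical at this point in the discussion, and its name is an assumption taken from the question asked earlier in the thread.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL quoted above.
SEARCH_URL = "https://library.kiwix.org/catalog/search"


def search_by_tag(category: str) -> str:
    """Existing approach: filter on the hidden _category tag."""
    # safe=":" keeps the colon readable instead of percent-encoding it.
    query = urlencode({"tag": "_category:" + category}, safe=":")
    return SEARCH_URL + "?" + query


def search_by_category(category: str) -> str:
    """Hypothetical dedicated parameter; the name `category` is an assumption."""
    return SEARCH_URL + "?" + urlencode({"category": category})


print(search_by_tag("ted"))
print(search_by_category("ted"))
```

With a dedicated parameter, each reader would no longer need to know that categories happen to be stored in a `_category:` tag.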

What happens if a ZIM has _category:wikipedia;_category:ted? This is allowed, as those are two distinct tags, but what is its category then? The first one? The last one? None? If we query using tags, it would probably be returned for queries on each of those two categories. If we choose to have a node/attribute for it, then the library creator (libkiwix for now?) would make a choice and that would settle it.
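
The ambiguity can be shown with a small Python sketch. It assumes tags are a single `;`-separated string, as in the example above; the helper names are made up for illustration and are not libkiwix functions.

```python
from __future__ import annotations

PREFIX = "_category:"


def categories_in(tags: str) -> list[str]:
    """Return every _category value found in a ';'-separated tag string."""
    return [t[len(PREFIX):] for t in tags.split(";") if t.startswith(PREFIX)]


def matches_tag_query(tags: str, wanted: str) -> bool:
    """Tag-based filtering: true if the exact tag is present."""
    return PREFIX + wanted in tags.split(";")


tags = "_category:wikipedia;_category:ted"
found = categories_in(tags)

# Two equally defensible answers to "what is its category?"
first_wins = found[0]
last_wins = found[-1]

# A tag query matches the ZIM for both categories.
in_wikipedia = matches_tag_query(tags, "wikipedia")
in_ted = matches_tag_query(tags, "ted")
```

A dedicated node or attribute would force whoever builds the library to pick one of these policies once, instead of leaving it to each reader.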

So, IMO it's more of a design decision than a technical need.

@kelson42
Contributor

kelson42 commented Mar 3, 2021

> When this request was made, we envisioned having a CMS that would allow customization of that metadata, independently of what's in the ZIM, to build the library that itself feeds the OPDS.

I confirm this is the plan, but this should not play a role here. The CMS generates the library.xml, which then feeds Kiwix Serve. If there is no library, the information is extracted from the ZIM metadata itself. So either the CMS copies the ZIM metadata, with or without transformation, or it comes from the ZIM file directly.

@rgaudin
Member Author

rgaudin commented Mar 3, 2021

Yep, category is stored as a tag, but its exposure can be different if we'd prefer.

@mgautierfr
Member

There are several things here. I will write about all of them. We will see how to split that :)

@veloman-yunkan I recommend reading the thread kiwix/libkiwix#131 and its associated issues for a historical point of view on the tag/category situation.

Underscore tags (https://wiki.openzim.org/wiki/Tags)

These are tags added to ZIM files at creation time to describe them.
They are useful, but a library system or feed provider may decide not to use them and to classify ZIMs another way.

The general plan on the kiwix-serve side is to trust the information provided in library.xml and use the ZIM tags as a fallback.
This allows other tools (CMS) to generate a catalog/library with a specific categorization, independently of what is in the ZIM files.
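
A minimal Python sketch of that precedence rule, for illustration only. It assumes both sources expose tags as a `;`-separated string and that the first `_category:` tag wins, which is an arbitrary choice here; the function names are hypothetical.

```python
from typing import Optional

PREFIX = "_category:"


def category_from(tags: Optional[str]) -> Optional[str]:
    """Return the first _category value in a tag string, or None."""
    for tag in (tags or "").split(";"):
        if tag.startswith(PREFIX):
            return tag[len(PREFIX):]
    return None


def resolve_category(library_tags: Optional[str], zim_tags: Optional[str]) -> Optional[str]:
    """Trust the library entry first, then fall back to the ZIM's own tags."""
    from_library = category_from(library_tags)
    if from_library is not None:
        return from_library
    return category_from(zim_tags)


# The library (e.g. generated by a CMS) overrides what the ZIM says.
overridden = resolve_category("_category:ted", "_category:wikipedia")
# No category in the library entry: the ZIM tag is used.
fallback = resolve_category("nopic", "_category:wikipedia;_pictures:no")
# Neither source has a category.
missing = resolve_category(None, "nopic")
```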

In #318 (comment), one must understand "catalog" to mean library.xml (or an sqlite or xapian db).

> What happens if a ZIM has _category:wikipedia;_category:ted?

For me it is a "bug" in the ZIM file. The behavior is undefined (the same as for _pictures:yes;_pictures:no).
For me, each underscore tag should be unique.
If we want to allow several categories for a ZIM (I'm not sure we should), I would argue for something like this: _category:wikipedia|ted;_pictures:yes
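
As a sketch of what that proposal could mean for a parser (this is not existing libkiwix behavior, and the function name is made up): each underscore tag may appear only once, and `|` separates multiple values.

```python
from __future__ import annotations


def parse_underscore_tags(tags: str) -> dict[str, list[str]]:
    """Parse ';'-separated tags, keeping only underscore tags.

    Raises ValueError when the same underscore tag appears twice,
    which the proposal treats as a bug in the ZIM file.
    """
    parsed: dict[str, list[str]] = {}
    for tag in filter(None, tags.split(";")):
        if not tag.startswith("_"):
            continue  # regular tags are out of scope here
        name, _, value = tag.partition(":")
        if name in parsed:
            raise ValueError(f"duplicate underscore tag: {name}")
        parsed[name] = value.split("|")
    return parsed


ok = parse_underscore_tags("_category:wikipedia|ted;_pictures:yes")

try:
    parse_underscore_tags("_pictures:yes;_pictures:no")
    rejected = False
except ValueError:
    rejected = True
```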

Getting the list of categories.

What is missing is a way to get the list of categories present in a library.
It is possible to filter the library by category, but we cannot know which categories exist.
Because of that, we have a static list of categories in our clients (kiwix-desktop, ...).
It would be nice to have the OPDS server provide this list (and possibly translations). This is the purpose of https://github.com/kiwix/kiwix-tools/issues/317
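
For illustration, the server-side computation could be as simple as the following Python sketch. It assumes each book's tags are available as a `;`-separated string; how the list would be exposed over OPDS is left open, and the function name is hypothetical.

```python
from __future__ import annotations

PREFIX = "_category:"


def list_categories(tag_strings: list[str]) -> list[str]:
    """Collect the distinct categories used across a library, sorted."""
    categories = set()
    for tags in tag_strings:
        for tag in tags.split(";"):
            if tag.startswith(PREFIX):
                categories.add(tag[len(PREFIX):])
    return sorted(categories)


# One tag string per book in the library (made-up sample data).
library = [
    "wikipedia;_category:wikipedia;_pictures:yes",
    "_category:ted;_videos:yes",
    "_category:wikipedia;_pictures:no",
    "nopic",
]
available = list_categories(library)
```

Clients could then build their category menu from this list instead of shipping a static one.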

Exposure of categories.

If we provide a list of categories and if we allow library.xml to have a different category than the ZIM tags, I would agree that we should have a separate entry
(so a <category> node and a category parameter).

But I would not modify the tag feature at all (content and query).
Even if we add extra support for categories on top of tags, users may still want to search for _category:foo tags (and disregard what the catalog thinks about the ZIM file).

@kelson42
Contributor

kelson42 commented Mar 4, 2021

@veloman-yunkan It seems we (@mgautierfr, @rgaudin and I) have easily reached an agreement:

  • We keep using the _category: tag entry to store the categories
  • We need dedicated OPDS category management, in the search (a new possible option category) and in the OPDS output (a new DOM node).

I hope this makes sense to you?
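
The two points above can be sketched in Python as follows. This is only an illustration under assumptions: the element names `tags` and `category`, the plain-text content of the node, and the `category` filter argument are guesses based on this thread, not the actual libkiwix output.

```python
import xml.etree.ElementTree as ET

PREFIX = "_category:"


def category_of(tags):
    """Category is still stored as a hidden tag; return the first one or None."""
    for tag in tags.split(";"):
        if tag.startswith(PREFIX):
            return tag[len(PREFIX):]
    return None


def opds_entry(title, tags):
    """Build a simplified entry that exposes the category as its own node."""
    entry = ET.Element("entry")
    ET.SubElement(entry, "title").text = title
    ET.SubElement(entry, "tags").text = tags  # tags are left untouched
    category = category_of(tags)
    if category is not None:
        ET.SubElement(entry, "category").text = category  # assumed node name
    return entry


def filter_books(books, category=None):
    """Hypothetical `category` search option, applied on top of the tag storage."""
    return [b for b in books if category is None or category_of(b["tags"]) == category]


books = [
    {"title": "TED talks", "tags": "_category:ted;_videos:yes"},
    {"title": "Wikipedia", "tags": "_category:wikipedia;_pictures:yes"},
]
xml_text = ET.tostring(opds_entry(**books[0]), encoding="unicode")
ted_only = filter_books(books, category="ted")
```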

@veloman-yunkan
Collaborator

> @veloman-yunkan It seems we (@mgautierfr, @rgaudin and I) have easily reached an agreement:
>
> * We keep using the `_category:` tag entry to store the categories
> * We need dedicated OPDS category management, in the search (a new possible option `category`) and in the OPDS output (a new DOM node).
>
> I hope this makes sense to you?

@kelson42 I have only one question. In the library XML, will the category be specified via the _category tag, or should a new category attribute be added to the book element?

@kelson42
Contributor

kelson42 commented Mar 4, 2021

@veloman-yunkan no change foreseen in library.xml. As it is now, the category is defined as a hidden tag.

@mgautierfr
Member

> no change foreseen in library.xml. Like for now, it is defined as a hidden tag.

Why? Just moving the category into a new OPDS node is pretty useless.

The purpose of making the category a first-level attribute of a book is to potentially not rely on the tag to get the category.
Taking what is in the tag and putting it in a node doesn't provide any improvement. It just adds new code to get content from somewhere else (a new node) when the content we read can be (and is) already read elsewhere (in the tag).

If we don't change library.xml, we will have a new book attribute that we can lose when we dump it to library.xml, and the OPDS may not display the same thing as the library.xml.
