Rename a few tags #485

Closed
kelson42 opened this issue Jan 3, 2019 · 20 comments · Fixed by #949

@kelson42
Collaborator

kelson42 commented Jan 3, 2019

DISCLAIMER: This discussion is only about tags and not about file names or mwoffliner formatting options.

Our content tags suffer from a few weaknesses:

  • They are negative (not easy to understand)
  • They are overloaded; for example, nopic also means novid (implicit logic)
  • Readers rely on them, but they are not protected, starting with _

I propose another approach that says the same things differently:

  • All negative tags will be removed and we will have positive ones.
  • _nopic -> ! _with_pictures
  • _novid -> ! _with_videos + _with_audios
  • _nodet -> ! _introduction_only

Remark: This does not solve all the problems around the tags, and even opens a new one: how specific about the content should we be?
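
The mapping listed above can be sketched as a small translation function. This is only an illustration, not code from any Kiwix or openZIM tool: the function name `to_positive_tags` is hypothetical, and it rests on two readings of the proposal that are assumptions: `!` is read as "absence of" for the media tags, and `_nodet` ("no details") is treated as equivalent to `_introduction_only`. It also makes the implicit rule explicit that nopic implies novid.

```python
# Illustrative sketch only; the tag names come from the proposal, the function is hypothetical.
LEGACY_NEGATIVES = {"_nopic", "_novid", "_nodet"}


def to_positive_tags(legacy_tags):
    """Translate legacy negative tags into the proposed positive ones."""
    legacy = set(legacy_tags)
    positive = set()

    # "_nopic" implicitly also means "_novid" (the overloading described above).
    no_pictures = "_nopic" in legacy
    no_video_audio = no_pictures or "_novid" in legacy

    if not no_pictures:
        positive.add("_with_pictures")
    if not no_video_audio:
        positive.update({"_with_videos", "_with_audios"})
    # Assumption: "_nodet" (no details) corresponds to introduction-only content.
    if "_nodet" in legacy:
        positive.add("_introduction_only")

    # Any other tag passes through unchanged.
    positive.update(legacy - LEGACY_NEGATIVES)
    return positive
```

Under these assumptions, a ZIM without any legacy negative tag ends up with all three `_with_*` tags, while one tagged `_nopic` ends up with none of them.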

@ISNIT0
Contributor

ISNIT0 commented Jan 3, 2019

So an example might be wikipedia_en_all_with_pictures_with_videos_with_audios_introduction_only?
That seems a little excessive don't you think?

@kelson42
Collaborator Author

kelson42 commented Jan 3, 2019

@ISNIT0 I guess you haven't read the first sentence of the ticket?!

@tim-moody
Contributor

@kelson42 I think this goes in the right direction. It handles the media aspects, but there are also content scope aspects such as all [articles] vs medical vs math, etc.

Instead of introduction_only, I would tag full_articles (T/F), but maybe full / synopsis (introduction_only) / title_only also makes sense.

I would also consider has_pictures (T/F) to reduce the need to deduce state from silence.

@ISNIT0
Contributor

ISNIT0 commented Jan 3, 2019

@kelson42 My apologies, I don't think I understand the difference here.

@Jaifroid
Collaborator

Jaifroid commented Jan 3, 2019

Are these tags in the catalogue? In the ZIM file metadata?

@kelson42
Collaborator Author

kelson42 commented Jan 3, 2019

@Jaifroid they are in the library_zim.xml and will be implemented soon in the OPDS stream (kiwix/kiwix-tools#252).

@mgautierfr

I've already commented on tags and proposed a solution here: kiwix/libkiwix#131

@kelson42
Collaborator Author

It seems that @mgautierfr @automactic @tim-moody are all in favour of a tagging system with the ability to append a value like =true or =false. As I have said many times in the past, I still doubt whether this is a good idea, but I trust your agreement that it is a good move. So here is an alternative proposal based on your feedback:

  • All of the following will be standardised in the openzim spec (probably as a recommendation), so we can build software on this.
  • All negative tags will be removed and we will have positive ones.
  • The following tags (on the right) can have an optional boolean value, =true or =false; a tag written without a value should be assumed to be =true
  • _nopic -> ! _has_pictures
  • _novid -> ! _has_videos + _has_audios
  • _nodet -> ! _full_articles=false
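
The optional-value rule above ("nothing written should be assumed like =true") can be sketched as a small parser. This is a minimal illustration, not the openZIM reference behaviour: the function name `parse_tags` is hypothetical, and the `;` delimiter is an assumption, since the thread does not say how tags are separated.

```python
# Illustrative sketch; parse_tags and the ";" delimiter are assumptions, not part of the proposal.
def parse_tags(raw, delimiter=";"):
    """Parse a delimited tag string into a {tag_name: bool} mapping."""
    flags = {}
    for item in raw.split(delimiter):
        item = item.strip()
        if not item:
            continue  # tolerate empty entries such as a trailing delimiter
        name, sep, value = item.partition("=")
        if not sep:
            # No "=value" part: the proposal says to assume true.
            flags[name] = True
        elif value.lower() in ("true", "false"):
            flags[name] = value.lower() == "true"
        else:
            raise ValueError(f"unsupported value for {name!r}: {value!r}")
    return flags
```

For example, `parse_tags("_has_pictures;_full_articles=false")` would yield `_has_pictures` as true and `_full_articles` as false. Rejecting values other than true/false is a design choice of this sketch; a reader could equally choose to ignore such tags.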

@kelson42
Collaborator Author

> I would also consider has_pictures (T/F) to reduce the need to deduce state from silence.

@tim-moody This is a topic too, I agree. But I want to treat this separately.

@tim-moody
Contributor

@kelson42 I wonder if the initial underscore is still necessary, since all tags have it. Are you proposing a delimited list of tags as at the moment or that these tags become individual attributes?

@tim-moody
Contributor

There is also the problem of a transition plan. Will there be a period when some ZIMs have one set of tags and others have another? How is backwards compatibility managed? If you plan to make these tags attributes, you could retain the old list as a tags attribute for a transition period.

@kelson42
Collaborator Author

> There is also the problem of a transition plan. Will there be a period when some ZIMs have one set of tags and others have another? How is backwards compatibility managed? If you plan to make these tags attributes, you could retain the old list as a tags attribute for a transition period.

Yes, maybe. I need to think about that.

@kelson42
Collaborator Author

> @kelson42 I wonder if the initial underscore is still necessary, since all tags have it. Are you proposing a delimited list of tags as at the moment or that these tags become individual attributes?

If we want to build software on this, these tags need to be reserved somehow, as opposed to tags without underscores, which are free. Don't assume it will stay like this (all tags with an underscore); that is not the goal.

@mgautierfr

Can we say that:

  • All tags starting with an underscore are boolean tags? They are true or false, and if no value is provided, the default is true? (And why allow omitting the value, especially if the tags are filled in by scripts?)
  • Any tag not starting with an underscore must be displayed to the user, and the "software" doesn't have to try to understand it?
  • Any (new) tag starting with an underscore but not known by the (old) software must be ignored? Or displayed to the user as a plain tag?
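
One possible reading of the three questions above, sketched in Python. Nothing here was agreed in the thread: the `classify_tags` function, the `KNOWN_RESERVED` set and the `show_unknown` switch are all hypothetical, and the switch exists only to show both options raised in the last bullet (ignore unknown underscore tags, or display them as plain tags).

```python
# Hypothetical set of reserved tags that a given reader version understands.
KNOWN_RESERVED = {"_has_pictures", "_has_videos", "_has_audios", "_full_articles"}


def classify_tags(tags, known=KNOWN_RESERVED, show_unknown=False):
    """Split tags into (boolean flags for the software, tags to display to the user)."""
    flags, display = {}, []
    for tag in tags:
        if not tag.startswith("_"):
            # Free tag: shown to the user, never interpreted.
            display.append(tag)
            continue
        name, _, value = tag.partition("=")
        if name in known:
            # A missing value defaults to true; only an explicit "false" turns the flag off.
            flags[name] = value.lower() != "false"
        elif show_unknown:
            # Option B: an unknown reserved tag is displayed as a plain tag.
            display.append(tag)
        # Option A (default): an unknown reserved tag is silently ignored.
    return flags, display
```

With the default policy, a tag such as `_future_tag` produced by a newer scraper would simply be dropped by an older reader, which is the forward-compatibility behaviour the last bullet asks about.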

Why do some tags have a has_ prefix (_has_pictures, _has_videos) and some not (_full_articles)?

> @Jaifroid they are in the library_zim.xml and will be implemented soon in the OPDS stream (kiwix/kiwix-tools#252).

And in the ZIM's metadata as well, no? This is already the case for the current zim/tags.

> How is backwards compatibility managed?

I suppose the software would have to guess a correct default if a _feature is missing, depending on how the old ZIMs were built.

How will we distinguish categories (wikipedia, wiktionary, ted, gutenberg, ...) from "selections" (all, medical, physics, wp1, tunisie, 1000, 100, 10, ...)? And ZIM extensions (once they are implemented)?

I have read the first sentence, but I still wonder how the zim files will be named :)

@automactic
Member

automactic commented Jan 18, 2019

Maybe we can do this?

category="wikipedia"
media=set("picture", "image", "video")
topic="medical" or set("medical", "physics") if we plan to introduce multi topic / selection zim

IMO, the problem with tags is that they are too broad, too generic. Anything could be in there, which makes them messy. It opens the door for tags to grow and grow as we offer more and more ZIM files and have more ideas.

A better approach than our current tag-only system could be to group together the tag values that describe the same aspect of a ZIM.
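
The grouping idea above could look roughly like this in Python. It is a sketch under assumptions, not a proposed schema: the class name `ZimDescription`, the field name `topics` (plural, to allow multi-topic ZIMs) and the helper `matching` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZimDescription:
    """Hypothetical structured replacement for a flat tag list."""
    category: str                    # e.g. "wikipedia"
    media: frozenset = frozenset()   # e.g. {"picture", "video"}
    topics: frozenset = frozenset()  # e.g. {"medical"} or {"medical", "physics"}


def matching(entries, category=None, media=(), topic=None):
    """Return entries that satisfy every criterion that was given."""
    required_media = frozenset(media)
    return [
        e for e in entries
        if (category is None or e.category == category)
        and required_media <= e.media          # all requested media kinds are present
        and (topic is None or topic in e.topics)
    ]
```

Because each aspect has its own field, a reader can filter on media or topic without parsing names or guessing which free-form tag belongs to which aspect.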

@stale

stale bot commented Jul 11, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

@kelson42 kelson42 assigned kelson42 and unassigned automactic Aug 16, 2019
@kelson42
Collaborator Author

This was discussed at length at the Wikimedia hackathon. Here is the result: https://wiki.openzim.org/wiki/Tags. This should be implemented quickly.

@tim-moody
Contributor

tim-moody commented Aug 17, 2019 via email

@kelson42
Collaborator Author

@tim-moody We will start generating ZIM files with these new tags within 2 weeks.

@tim-moody
Contributor

tim-moody commented Aug 17, 2019 via email

@kelson42 kelson42 mentioned this issue Aug 23, 2019
6 participants