Book cover extracting enhancements #48

Closed
vers-one opened this issue Jun 25, 2022 · 1 comment

@vers-one
Owner

Description

The EPUB 2 specification doesn't contain explicit requirements on how the book cover should be represented in the OPF package file. Instead, it only vaguely recommends using a <guide>/<reference type="cover"> element, citing the Chicago Manual of Style as the source of the list of applicable <reference> types.
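As a rough illustration of that EPUB 2 recommendation, the sketch below reads the <guide> section of an OPF file with System.Xml.Linq. It assumes the OPF has already been loaded into an `XDocument` and that its elements use the OPF namespace `http://www.idpf.org/2007/opf`; the `OpfGuide.FindCoverReferenceHref` helper is hypothetical and not part of EpubReader. Keep in mind that a `type="cover"` reference usually points to an XHTML cover page rather than directly to the image file.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper (not part of EpubReader) for the EPUB 2 <guide> convention.
public static class OpfGuide
{
    // OPF package namespace; <guide> and <reference> live in it.
    private static readonly XNamespace Opf = "http://www.idpf.org/2007/opf";

    // Returns the href of the first <guide>/<reference type="cover">, or null if there is none.
    // The href usually points to an XHTML cover page, not directly to the image.
    public static string FindCoverReferenceHref(XDocument opf)
    {
        return opf.Root
            ?.Element(Opf + "guide")
            ?.Elements(Opf + "reference")
            .FirstOrDefault(reference => string.Equals(
                (string)reference.Attribute("type"), "cover", StringComparison.OrdinalIgnoreCase))
            ?.Attribute("href")
            ?.Value;
    }
}
```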

Most EPUB 2 books use a <meta name="cover" content="..." /> element to define the cover, where the value of the content attribute refers to the <manifest>/<item> element of the actual cover image. However, some books don't follow this pattern, which is the reason for all the hacks and heuristics currently present in the BookCoverReader:

```csharp
// For non-standard ebooks, we try several other ways...
if (null != coverManifestItem) // we have found the item but there was no corresponding image ...
{
    // some ebooks seem to contain more than one item with Id="cover"
    // thus we test if there is a second item, and whether that is an image....
    coverManifestItem = epubSchema.Package.Manifest.Where(manifestItem => manifestItem.Id.CompareOrdinalIgnoreCase(coverMetaItem.Content)).Skip(1).FirstOrDefault();
    if (null != coverManifestItem?.Href && imageContentRefs.TryGetValue(coverManifestItem.Href, out coverImageContentFileRef))
    {
        return coverImageContentFileRef;
    }
}
// we have still not found the item
// 2019-08-20 Hotfix: if coverManifestItem is not found by its Id, then try it with its Href - some ebooks refer to the image directly!
coverManifestItem = epubSchema.Package.Manifest.FirstOrDefault(manifestItem => manifestItem.Href.CompareOrdinalIgnoreCase(coverMetaItem.Content));
if (null != coverManifestItem?.Href && imageContentRefs.TryGetValue(coverManifestItem.Href, out coverImageContentFileRef))
{
    return coverImageContentFileRef;
}
// 2019-08-24 if it is still not found, then try to find an Id named cover
coverManifestItem = epubSchema.Package.Manifest.FirstOrDefault(manifestItem => manifestItem.Id.CompareOrdinalIgnoreCase(coverMetaItem.Name));
if (null != coverManifestItem?.Href && imageContentRefs.TryGetValue(coverManifestItem.Href, out coverImageContentFileRef))
{
    return coverImageContentFileRef;
}
// 2019-08-24 if it is still not found, then try to find it in the guide
var guideItem = epubSchema.Package.Guide.FirstOrDefault(reference => reference.Title.CompareOrdinalIgnoreCase(coverMetaItem.Name));
if (null != guideItem?.Href && imageContentRefs.TryGetValue(guideItem.Href, out coverImageContentFileRef))
{
    return coverImageContentFileRef;
}
```
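For comparison, when a book does follow the conventional EPUB 2 pattern (from <meta name="cover"> to a <manifest>/<item>), no guessing is needed. A minimal sketch, assuming the OPF is available as an `XDocument` in the OPF namespace; `Epub2Cover.FindCoverImageHref` is a hypothetical helper, not EpubReader's implementation, and it deliberately omits the fallbacks shown in the excerpt above.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper (not part of EpubReader): the conventional EPUB 2 cover lookup, no fallbacks.
public static class Epub2Cover
{
    private static readonly XNamespace Opf = "http://www.idpf.org/2007/opf";

    // Follows <meta name="cover" content="ID"/> to <manifest>/<item id="ID"> and returns
    // that item's href, but only if its media-type is an image type. Otherwise returns null.
    public static string FindCoverImageHref(XDocument opf)
    {
        XElement package = opf.Root;
        if (package == null) return null;

        string coverId = package.Element(Opf + "metadata")
            ?.Elements(Opf + "meta")
            .Where(meta => (string)meta.Attribute("name") == "cover")
            .Select(meta => (string)meta.Attribute("content"))
            .FirstOrDefault();
        if (string.IsNullOrEmpty(coverId)) return null;

        // XML id values are case-sensitive, so compare them ordinally.
        XElement item = package.Element(Opf + "manifest")
            ?.Elements(Opf + "item")
            .FirstOrDefault(i => (string)i.Attribute("id") == coverId);

        // Reject items that are not images (e.g. an XHTML cover page).
        string mediaType = (string)item?.Attribute("media-type");
        if (mediaType == null || !mediaType.StartsWith("image/", StringComparison.OrdinalIgnoreCase))
            return null;

        return (string)item.Attribute("href");
    }
}
```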

EPUB 3, on the other hand, does define an explicit requirement for cover images: they must be identified via a <manifest>/<item> element whose properties attribute includes cover-image (<item properties="cover-image">). EpubReader correctly parses these <manifest>/<item> elements, including their properties attributes, but does not currently use this information to obtain the cover image of an EPUB 3 book.
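A minimal sketch of the EPUB 3 lookup follows. Because the properties attribute may contain several whitespace-separated tokens, the check matches whole tokens rather than comparing or substring-searching the raw attribute value. The `Epub3Cover` class is hypothetical and works on the raw OPF `XDocument`, not on EpubReader's parsed schema objects.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper (not part of EpubReader): the EPUB 3 cover lookup.
public static class Epub3Cover
{
    private static readonly XNamespace Opf = "http://www.idpf.org/2007/opf";
    private static readonly char[] Whitespace = { ' ', '\t', '\r', '\n' };

    // Returns the href of the first <manifest>/<item> whose properties attribute
    // contains the "cover-image" token, or null if no such item exists.
    public static string FindCoverImageHref(XDocument opf)
    {
        return opf.Root
            ?.Element(Opf + "manifest")
            ?.Elements(Opf + "item")
            .FirstOrDefault(item => HasProperty(item, "cover-image"))
            ?.Attribute("href")
            ?.Value;
    }

    // properties may hold several tokens, so match whole tokens only
    // (a value like "cover-images" must not count as "cover-image").
    private static bool HasProperty(XElement item, string property)
    {
        string properties = (string)item.Attribute("properties");
        return properties != null
            && properties.Split(Whitespace, StringSplitOptions.RemoveEmptyEntries).Contains(property);
    }
}
```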

Proposed solution

  1. Remove all hacks from BookCoverReader.
  2. Replace heuristics with more robust algorithms to search for cover images in EPUB 2 books.
  3. Add EPUB 3 cover image support, falling back to the EPUB 2 cover extraction logic when no EPUB 3 cover is available (to support EPUB 3 books that provide only an EPUB 2-style cover).
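Step 3 could look roughly like the following sketch, which checks for an EPUB 3 cover-image item first and only then falls back to the EPUB 2 <meta name="cover"> convention. As with any illustration here, `CoverResolver.FindCoverHref` is a hypothetical name working on the raw OPF `XDocument`; it does not reproduce EpubReader's actual implementation.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sketch of step 3 (not EpubReader's implementation): prefer the EPUB 3
// cover-image item and fall back to the EPUB 2 <meta name="cover"> convention.
public static class CoverResolver
{
    private static readonly XNamespace Opf = "http://www.idpf.org/2007/opf";
    private static readonly char[] Whitespace = { ' ', '\t', '\r', '\n' };

    public static string FindCoverHref(XDocument opf)
    {
        XElement[] items = opf.Root?.Element(Opf + "manifest")?.Elements(Opf + "item").ToArray()
            ?? Array.Empty<XElement>();

        // 1. EPUB 3: an item whose properties attribute contains the "cover-image" token.
        XElement epub3Cover = items.FirstOrDefault(item =>
            ((string)item.Attribute("properties") ?? string.Empty)
                .Split(Whitespace, StringSplitOptions.RemoveEmptyEntries)
                .Contains("cover-image"));
        if (epub3Cover != null) return (string)epub3Cover.Attribute("href");

        // 2. EPUB 2 fallback: <meta name="cover" content="ID"/> pointing at an image item.
        string coverId = opf.Root?.Element(Opf + "metadata")
            ?.Elements(Opf + "meta")
            .Where(meta => (string)meta.Attribute("name") == "cover")
            .Select(meta => (string)meta.Attribute("content"))
            .FirstOrDefault();
        if (string.IsNullOrEmpty(coverId)) return null;

        XElement epub2Cover = items.FirstOrDefault(item =>
            (string)item.Attribute("id") == coverId &&
            ((string)item.Attribute("media-type") ?? string.Empty)
                .StartsWith("image/", StringComparison.OrdinalIgnoreCase));
        return (string)epub2Cover?.Attribute("href");
    }
}
```

Trying EPUB 3 first means a well-formed EPUB 3 book never reaches the EPUB 2 path, while EPUB 3 books that carry only a <meta name="cover"> element still get a cover.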
@vers-one
Owner Author

Implemented in PR #49.

If someone finds a book with a cover that doesn't get extracted by EpubReader, please post a sample file in this issue.
