Standardize the image_url field of the Search API so that it uses regular URLs instead of base64 for all result types #10831

Closed
GPortas opened this issue Sep 10, 2024 · 5 comments · Fixed by #10855
Labels: FY25 Sprint 6; GREI Re-arch (Issues related to the GREI Dataverse rearchitecture); Original size: 30; Size: 30 (A percentage of a sprint. 21 hours. Formerly size:33); SPA.Q3.1 (Collection page results of all types); SPA (These changes are required for the Dataverse SPA); Type: Feature (a feature request); User Role: API User (Makes use of APIs)

@GPortas
Contributor

GPortas commented Sep 10, 2024

Overview of the Feature Request

Currently, image_url returns base64 URLs for files and dataverses, while it returns a regular URL for datasets. The goal is to standardize all URLs to use the same format. We have chosen to use regular URLs instead of base64 for all cases.
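
To make the inconsistency concrete, here is a minimal client-side sketch that tells the two formats apart. It assumes the base64 values arrive as `data:` URLs (for example `data:image/png;base64,...`), which this issue does not spell out. The function names and the sample items are hypothetical and are not part of the Dataverse API.

```python
from urllib.parse import urlsplit


def classify_image_url(value: str) -> str:
    """Return 'data', 'regular', or 'unknown' for an image_url value.

    Assumption: base64 thumbnails are delivered as data: URLs.
    """
    scheme = urlsplit(value).scheme.lower()
    if scheme == "data":
        return "data"
    if scheme in ("http", "https"):
        return "regular"
    return "unknown"


def results_without_regular_url(items):
    """Names of search results whose image_url is present but not http(s)."""
    return [
        item.get("name")
        for item in items
        if "image_url" in item
        and classify_image_url(item["image_url"]) != "regular"
    ]


# Hypothetical search result items, shaped like the examples in this thread.
sample_items = [
    {"name": "bird.jpg", "type": "file",
     "image_url": "data:image/png;base64,iVBORw0KGgo="},
    {"name": "test1", "type": "dataset",
     "image_url": "http://localhost:8080/api/datasets/2/logo"},
]

if __name__ == "__main__":
    print(results_without_regular_url(sample_items))
```

Once all result types return regular URLs, a check like this should report an empty list for any page of results.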

What kind of user is the feature intended for?
API User

What inspired the request?

#10811 (comment)

What existing behavior do you want changed?

Use regular URLs in image_url fields for all result types of the Search API.

Any brand new behavior do you want to add to Dataverse?

None

Any open or closed issues related to this feature request?

#10811

Are you thinking about creating a pull request for this feature?

Yes

@GPortas added the Type: Feature, User Role: API User, Size: 30, SPA, and GREI Re-arch labels Sep 10, 2024
@g-saracca added the FY25 Sprint 6 label Sep 11, 2024
@stevenwinship self-assigned this Sep 13, 2024
@stevenwinship
Contributor

@GPortas @pdurbin
Dataverses and datasets have thumbnail images, designated by /logo at the end of the URL. Files do not. If image_url is changed from base64 to the URL of the file, the response will contain "url" and "image_url" with the same value.
Are you asking for a new API for files to get a "logo" (i.e., /api/files/{id}/logo), or is the current /api/access/datafile endpoint all you want to see?
Changing image_url breaks backward compatibility. And since the access URL is already there, I'm wondering why this is needed.

Here is an example of what the change would look like:
{
"name": "bird.jpg",
"type": "file",
"url": "http://localhost:8080/api/access/datafile/3",
"image_url": "http://localhost:8080/api/access/datafile/3",
"file_id": "3",

{
"name": "test1",
"type": "dataset",
"url": "https://doi.org/10.5072/FK2/ARNIUJ",
"image_url": "http://localhost:8080/api/datasets/2/logo",
"global_id": "doi:10.5072/FK2/ARNIUJ",

@qqmyers
Member

qqmyers commented Sep 13, 2024

Not sure I'm following the whole discussion but FWIW: files have thumbnail URLs like https://demo.dataverse.org/api/access/datafile/2378029?imageThumb=true which is what is used within the dataset page file table.

@stevenwinship
Contributor

So, are you saying the url and image_url should look like this:

"url": "http://localhost:8080/api/access/datafile/3",
"image_url": "http://localhost:8080/api/access/datafile/3?imageThumb=true",

@qqmyers
Member

qqmyers commented Sep 13, 2024

I think that would work. Whether it makes sense to provide a new /logo URL with no parameters, which might let browsers cache the result, is perhaps a separate question. (FWIW: if S3 direct storage is used, any URL is going to be a redirect to the S3 object, possibly a signed URL if the file is a draft. I don't know how caching works in a case like that.) Perhaps just using the existing URL is enough, and we can see whether fetching these small images is really a performance issue these days. (We're somewhat guessing that replacing the base64 images doesn't slow the existing UI noticeably, if at all, so just doing a small tweak to be able to test that might be a good start.)
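
For illustration only, here is a rough sketch of how the url / image_url pairing discussed in this thread could be derived. It is not the Dataverse implementation. The helper names are hypothetical, and the only paths used are the ones quoted in this thread: `/api/access/datafile/{id}` with `imageThumb=true` for files, and `/api/datasets/{id}/logo` for datasets. Dataverse (collection) logos are left out because the exact path is not given here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_thumb_param(access_url: str) -> str:
    """Return access_url with imageThumb=true set, keeping other params."""
    parts = urlsplit(access_url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["imageThumb"] = "true"
    return urlunsplit(parts._replace(query=urlencode(query)))


def image_url_for(result_type: str, base_url: str, entity_id: int) -> str:
    """Build a regular (non-base64) image URL for a search result.

    Only 'file' and 'dataset' are covered, since those are the only
    URL shapes quoted in this thread.
    """
    base = base_url.rstrip("/")
    if result_type == "file":
        return with_thumb_param(f"{base}/api/access/datafile/{entity_id}")
    if result_type == "dataset":
        return f"{base}/api/datasets/{entity_id}/logo"
    raise ValueError(f"no known image URL pattern for type {result_type!r}")


if __name__ == "__main__":
    print(image_url_for("file", "http://localhost:8080", 3))
    print(image_url_for("dataset", "http://localhost:8080", 2))
```

Going through the query-string parser rather than appending `?imageThumb=true` by hand keeps the helper safe to call on a URL that already carries parameters.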

@GPortas
Contributor Author

GPortas commented Sep 16, 2024

@stevenwinship

This is a backward-incompatible change, but only in comparison to the updates introduced in PR #10811, as image URLs were not being returned via the API before that PR, as mentioned in the PR description:

"The image_url field was already included in the SolrSearchResult JSON, but it wasn’t returned by the API because it was appended only after the Solr query was processed in the SearchIncludeFragment of JSF. Now, the field is set in SearchServiceBean, ensuring it is always returned by the API when an image is available."

So, if no version of Dataverse has been released that returns image_url in the Search API results, there shouldn't be anything to break.

A separate question is whether, after switching to regular URLs, we should also remove the image_url values set in SearchIncludeFragment so that JSF stops using base64 as well. Regardless of what SearchServiceBean sets, JSF currently sets base64 images after the Solr search: https://github.com/IQSS/dataverse/blob/develop/src/main/java/edu/harvard/iq/dataverse/search/SearchIncludeFragment.java#L1425

@stevenwinship removed their assignment Sep 17, 2024
@GPortas added the SPA.Q3.1 label Sep 23, 2024
@pdurbin added this to the 6.4 milestone Sep 25, 2024