DataFrames for image analysis #220
Comments
I will work on this.
@JWZ2018 Great. BTW - one end game would be to create something like this in a single e2e pipeline: http://ruebot.net/elxn42.html Fork the repo, start working on a branch - send an initial PR when you're ready and we can discuss iteratively.
Here's the background on how I make those.
@lintool
Yes, that's a good start!
* Extract Image Links DF API
* Add extract image links text
* Remove unnecessary comment from test
* Add doc comments
* Addresses #220
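For readers following along, here is a rough sketch of the kind of rows an image-links extraction could produce: one (page URL, image URL) pair per `<img>` tag. It is written in plain Python with the standard library only and is not the toolkit's API; the names `ImageLinkParser` and `extract_image_links` are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collects the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def extract_image_links(page_url, html):
    """Return (page_url, absolute_image_url) rows for one page."""
    parser = ImageLinkParser()
    parser.feed(html)
    parser.close()
    # Resolve relative src values against the URL of the page itself.
    return [(page_url, urljoin(page_url, src)) for src in parser.sources]
```

Each returned tuple maps naturally onto a two-column DataFrame row; `<img>` tags without a `src` attribute are skipped.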
@lintool
However, I'm not opposed to poking around for other options... I found this, for example: http://imglib2.net/ Might be a better option, as opposed to messing with JNI. @ruebot has experience with ImageMagick - thoughts?
You can get the image info with Apache Tika, which we already use in the project with language and mime type extraction. https://tika.apache.org/1.7/formats.html#Image_formats
* Add Extract Image Details API
* Change check for jpeg and fix spacing
* Add tiff parser
* Use AutoDetectParser and read Numeric fields
* Use ComputeImageSize
* Hex encode hash and base64 encode image bytes
* Fix test
* Change df column names
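As an illustration of the "hex encode hash and base64 encode image bytes" step, here is a minimal sketch of building one image-details row. It assumes plain Python with the standard library and does not use Tika's `AutoDetectParser`; dimensions are read only from a PNG header, and any other format falls back to (0, 0). The function names and the dictionary keys are hypothetical, not the column names chosen in the commit.

```python
import base64
import hashlib
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data):
    """Return (width, height) from a PNG IHDR chunk, or (0, 0) if not PNG."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", width, height.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        return (0, 0)
    return struct.unpack(">II", data[16:24])


def image_details(url, data):
    """Build one row of image details (key names are illustrative only)."""
    width, height = png_size(data)
    return {
        "url": url,
        "width": width,
        "height": height,
        # Hex-encoded MD5 digest of the raw bytes.
        "md5": hashlib.md5(data).hexdigest(),
        # Base64 keeps the binary payload safe inside a string column.
        "bytes": base64.b64encode(data).decode("ascii"),
    }
```

Storing the hash as hex and the payload as base64 keeps every column a plain string or integer, which is convenient when the rows are later written out as text-based formats.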
With #226 this is done. Closing.
Currently, we have RDD-based analytics for image analysis here:
https://archivesunleashed.org/aut/#image-analysis
Let's DataFrame-ify it - that is, build the DataFrame infrastructure that would support image analysis.
I'm thinking of creating two separate DataFrames: