local scanned images #10
size estimations
Since the scanned images take a lot of disk space, let's get some statistics on the current actual disk space at Cologne devoted to scanned images. Here is a listing of the 34 publicly available dictionaries, along with the space taken up by the scanned images used in the displays, and the number of image files.
So, all in all there is about 8.2GB of space used and about 41,000 individual scanned images.
Notes
|
Github is a viable location for keeping the images
Github has this to say about repository size limits (reference):
None of the image files exceeds 100MB; the average size is 0.2MB per image (8217MB / 40984 files). Based on the 'hard limit', all of the images could be kept in one repository (8.2GB < 100GB). |
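As a quick check of the arithmetic above, here is a minimal sketch that uses only the figures quoted in this comment (8217MB, 40984 files, and the 100GB 'hard limit'). The function names are made up for illustration, and 1GB is taken as 1000MB to match the 8217MB ≈ 8.2GB rounding.

```python
# Figures quoted in the comment above.
TOTAL_MB = 8217        # total space used by scanned images
FILE_COUNT = 40984     # number of scanned image files
HARD_LIMIT_GB = 100    # the per-repository 'hard limit' cited above


def average_image_mb(total_mb, file_count):
    """Average size of one image file, in MB."""
    return total_mb / file_count


def fits_in_one_repo(total_mb, limit_gb):
    """True if the total size is below the limit (assumes 1GB = 1000MB)."""
    return total_mb / 1000 < limit_gb


print(round(average_image_mb(TOTAL_MB, FILE_COUNT), 1))
print(fits_in_one_repo(TOTAL_MB, HARD_LIMIT_GB))
```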
Proposed 34 repository solution
I think there should be one repository for the images for each dictionary. This would allow
If the images for each dictionary were kept in a separate repository, then there would be 34 new repositories, and all but 1 (pwg) would take less space than the 1GB Github recommended repository maximum size,
Proposed repository naming convention
The repository names could be 'scans-xxx', where 'xxx' is one of the 34 (lower-case) dictionary
Proposed github project to contain the scan repositories
It might be simplest to add the 34 new repositories to the sanskrit-lexicon Github organization. The naming convention 'scans-xxx' would allow easy filtering of the image repositories from among all the sanskrit-lexicon repositories. |
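To illustrate the proposed naming convention and the filtering it would allow, here is a small sketch. The helper names are hypothetical, and the repository list is only a sample built from names mentioned in this thread, not a real listing of the organization.

```python
PREFIX = "scans-"  # prefix proposed in the comment above


def scan_repo_name(dict_code):
    """Build the proposed repository name for a dictionary code."""
    return PREFIX + dict_code.lower()


def filter_scan_repos(repo_names):
    """Keep only the image repositories from a list of repository names."""
    return [name for name in repo_names if name.startswith(PREFIX)]


# Sample list: two software repositories and two image repositories.
repos = ["csl-pywork", scan_repo_name("acc"), "csl-websanlexicon", scan_repo_name("PWG")]
print(filter_scan_repos(repos))
```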
request feedback
Please provide feedback regarding the above suggestion! In the meantime, I'll set up procedures along the lines indicated above, using one or two dictionaries for the prototyping. |
The images are not going to change frequently. So instead of one repository per dictionary,
The installation instruction may give a prompt "Do you want to download dictionary page images for local use? It will take roughly XXX MB of download and YYY MB of disc space." If the user says no, we don't download the images. If the user says yes, we download them. |
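The prompt suggested above could be sketched roughly as follows. This is only an illustration: the function names are hypothetical, the sizes passed in are placeholders for the XXX and YYY values, and the actual download step is left as a callback because the thread does not specify it.

```python
def image_prompt(download_mb, disk_mb):
    """Build the installation prompt text proposed in the comment above."""
    return (
        "Do you want to download dictionary page images for local use? "
        f"It will take roughly {download_mb} MB of download and "
        f"{disk_mb} MB of disc space. [y/N] "
    )


def wants_images(answer):
    """Interpret the user's reply; anything other than y/yes counts as no."""
    return answer.strip().lower() in ("y", "yes")


def install_images(answer, download):
    """Run the (hypothetical) download callback only if the user agreed."""
    if wants_images(answer):
        download()
        return True
    return False


# Placeholder sizes; a real installer would pass the user's typed reply.
print(image_prompt(500, 500))
print(install_images("no", lambda: None))
```

Treating any reply other than y/yes as "no" keeps the large download opt-in.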
I agree the images will almost never change. My experience with zip is that images compress very little. If there is one repository for all dictionaries, then cloning that repository will require a user to download
If the user only wants the images of, say, MW dictionary, then he would have to download an additional 7.5GB of unneeded stuff just to get the 500MB of images that he wants. If the user actually wants the images for all dictionaries, it will still take a long time -- about the same amount of time/space whether the images are in one repository or 34 repositories. What are the downsides of separate repositories? |
There is no downside to separate repositories, except that there would be too many repositories. |
sanskrit-lexicon-scans organization
To deal with the 'too many repositories' issue, we could put all the image repositories in another Github organization. As an experiment to this end, I've made a 'sanskrit-lexicon-scans' organization. I am currently working to automate the process of initializing the sanskrit-lexicon-scans/xxx repositories. |
sanskrit-lexicon-scans/acc
This repository now exists, and is populated with the images.
Also request feedback on the choice of sanskrit-lexicon-scans organization
sanskrit-lexicon-scans/ae
Also populated.
Next steps
|
Fantastic!! Please let me look through it, and I will see whether I can do some of the items listed as next steps. I might try csl-websanlexicon as well, to see if I understand it well enough to make working changes. |
csl-websanlexicon modified
The change is very brief. Just in dictinfo.php. Here's how to see the change in action. I'm assuming you already have a local machine or a server set up and populated with
Before updating to local images
update local csl-websanlexicon
regenerate cologne/xxx/web
Install the new code at least for xxx=acc and ae.
set up for local scanned images
get local images for acc and ae
test that local images are being used for acc, ae
Same steps as under 'Before updating to local images' above. But now, for example,
If you were to use your local copy of mw, it would still show images from cologne, since |
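The commands for the 'get local images for acc and ae' step are not preserved in this thread. As a rough sketch only, the clone commands for the sanskrit-lexicon-scans/xxx repositories could be built like this. The `target_root` directory ("scans" here) is a placeholder, since the layout expected by the modified dictinfo.php is not shown above, and the helper name is hypothetical.

```python
ORG_URL = "https://github.com/sanskrit-lexicon-scans"


def clone_command(dict_code, target_root):
    """Return the git command (as an argument list) to clone one image repository.

    target_root is a placeholder directory; the location that the local
    installation actually expects is an assumption, not taken from the thread.
    """
    code = dict_code.lower()
    return ["git", "clone", f"{ORG_URL}/{code}.git", f"{target_root}/{code}"]


# Print the commands for the two prototype dictionaries instead of running them.
for code in ("acc", "ae"):
    print(" ".join(clone_command(code, "scans")))
```

To actually run a command, the argument list can be passed to `subprocess.run(cmd, check=True)`.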
csl-apidev needs similar modification
csl-apidev is another piece that can be run locally. We haven't discussed it yet. |
Great! It works on my local VM:
I searched for 'karma' in MW; it points to:
Do we plan to have them all in Git so they can be pulled locally? Thank you! |
Yes. I wanted to get some feedback on the wording of the readme and the choice of license before |
Regarding licence, I prefer GPLv3. |
Readme should give installation instructions for local images. |
Makes sense to me. |
comparison between gplv3 and cc-by-sa
There are several comparisons between these two licenses. From this comparison,
The main reasons I suggested the CC-BY-SA license for these scanned image repositories:
Given the above, I still have a slight preference for CC-BY-SA license for these repositories. Currently our software repositories (e.g. csl-pywork and a couple of others) do not have a license; if we add a license, GPLv3 might be a good choice. Another option would be MIT license. @drdhaval2785 and @gasyoun : In light of these comments, do you have any further thoughts on the choice of license? Do you have a strong preference for the GPLv3 license for these scanned image repositories? |
Based on your comments, I am OK with CC for images and GPLv3 for csl-pywork, apidev and websanlexicon. |
Peter's suggestion
I asked Thomas Malten and Peter Scharf their opinion regarding license. Thomas is fine with CC BY-SA. Peter prefers CC BY-NC-SA. His reason:
Here is a link to cc by-nc-sa.
Here is an excerpt from https://wiki.creativecommons.org/wiki/NonCommercial_interpretation:
My own opinion is that it doesn't matter much. I'm fine to go with cc by-nc-sa. What do others think? |
CC BY-NC-SA is fine with me too. |
Will proceed with the scanned image installations under CC BY-NC-SA
Thomas also concurs with NC. The other thing that needs to be done (@drdhaval2785 requested above) is installation instructions (i.e. how to use the scanned images in a local installation). I'll make a 'sanskrit-lexicon-scans/documentation' repository, and make a link in the README.MD for each dictionary to the README.md in the documentation repository. |
Scanned images for all dictionaries
All repositories sanskrit-lexicon-scans/xxx have now been populated with the images. sanskrit-lexicon-scans/documentation/README.md exists, but is currently incomplete. Maybe someone else could work on this README.md. If needed, I'll provide some content next week. |
Exactly.
Time to add.
No, no strong preferences. MIT is good as well.
So am I.
It's owned only by you, Jim, right? I'm thinking about a case of emergency, which is why I ask.
@YevgenJohn give it a try? |
I think I 'invited' @drdhaval2785 , @YevgenJohn , and you (@gasyoun ) to the 'team' for the 'sanskrit-lexicon-scans' organization. Did you receive invitation? Although I created the organization, my intent was to have it jointly 'owned' by all 4. Do I need to do something in settings regarding ownership, so that I am not the only 'owner'? |
Thank you very much! I'm trying to make a standalone VM with images, disconnect its network interfaces and see if links to the pictures work (as it won't be able to reach out to Cologne server). Apologies for not contributing to the licenses discussion, as I don't know that subject well enough. |
Only by accident now I see it. Others are here by now.
Yes, for each person you set them to be a non-member, but owner. |
How do I change @drdhaval2785 (and others) from Member to Owner? |
@YevgenJohn Why don't you start an issue regarding this standalone VM. It would be interesting to better understand what is meant by a standalone VM, and how it would be used. |
Absolutely, very good idea! I wonder how much space the VM image would take with all scanned pages uploaded. I just added another disk to the VM to accommodate it. My goal is to provide a ready product linguists can plug in and use (when offline, or if they want to run a heavy query which would otherwise slow the shared server down, so we can remove the upper limit on the number of results), as asking them to run Linux commands to set it up locally seems a bit impractical to me. Thank you! |
Local scanned images have stabilized. Closing the issue. |
This issue is to deal with an enhancement to the local dictionary installation process (as described in the readme.md at csl-pywork/v02).
The feature regards installation of local copies of the scanned images for each dictionary; this feature was mentioned in #6 comments.