Washed CASIA-webface data set identities #30

Open
shimen opened this issue May 29, 2016 · 10 comments

@shimen

shimen commented May 29, 2016

Hi, is there a list of identities for the Washed CASIA-webface data set? Each identity is labeled only with a number. I would like to use several databases for training and would like to remove identities that appear in more than one database.
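
If name lists for the datasets were available, the overlap could be found with standard tools. This is a minimal sketch, assuming two hypothetical text files with one identity name per line; CASIA-webface itself only provides numeric IDs, so having such a list is an assumption, and `common_identities` is an illustrative name:

```shell
# Print identities that appear in both lists (one name per line in each file).
# The input files are hypothetical: the dataset itself only ships numeric IDs.
common_identities() {
    a=$(mktemp)
    b=$(mktemp)
    # Lower-case, sort, and de-duplicate each list so comm can compare them.
    tr '[:upper:]' '[:lower:]' < "$1" | sort -u > "$a"
    tr '[:upper:]' '[:lower:]' < "$2" | sort -u > "$b"
    # -12 suppresses lines unique to either file, leaving only the overlap.
    comm -12 "$a" "$b"
    rm -f "$a" "$b"
}
```

Calling `common_identities casia_names.txt other_names.txt` (both file names hypothetical) would list the names to drop from one of the two sets. Exact string matching will miss spelling variants of the same person.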

@shimen
Author

shimen commented Jun 5, 2016

Any update on the labels for the CASIA-webface data set?

@lazydroid

@shimen how did you manage to unpack the dataset? I've got 5 files of 734 MB each named .z01 to .z05 and one 340 MB .zip file that does not look like a zip file. If I concatenate all these files, I get a lot of errors like:

file #435315:  bad zipfile offset (lseek):  4403961856
file #435316:  bad zipfile offset (lseek):  4403970048
file #435316:  bad zipfile offset (lseek):  4403970048
file #435317:  bad zipfile offset (lseek):  4403986432

any idea how to correctly extract all the files?

thanks in advance!
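
One way to see what each piece is would be to look at its first four bytes. This is a minimal sketch; the function names are illustrative, and the signatures are the ones given in the ZIP application note (`PK\x03\x04` for a local file header, `PK\x07\x08` for the spanning marker at the start of the first segment of a split archive). As far as I understand, the final `.zip` segment of a split archive starts in the middle of the data stream, so it would be expected to show no signature at its start:

```shell
# Print the first four bytes of a file as hex, e.g. 504b0304.
first_bytes() {
    head -c 4 "$1" | od -An -tx1 | tr -d ' \n'
}

# Classify a file by its leading signature.
zip_signature() {
    case "$(first_bytes "$1")" in
        504b0304) echo "local file header (ordinary zip start)" ;;
        504b0708) echo "spanning marker (first segment of a split archive)" ;;
        *)        echo "no zip signature at start" ;;
    esac
}
```

For example, `zip_signature CASIA-maxpy-clean.z01` and `zip_signature CASIA-maxpy-clean.zip` could be compared to tell a split archive from a corrupted download.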

@lazydroid

Regarding the identities, I spoke with the author of the original dataset. He said they cannot release the identities at this time, but told me to check their website later. Not sure what that's supposed to mean =)

@lazydroid

that's what I thought. could you please try:

$ unzip -t combined.zip

to see if there are any errors in the archive?
In my case there are plenty; the archive seems damaged.
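
The integrity test can be wrapped so that it gives a one-line verdict instead of pages of per-file errors. This is a minimal sketch; `check_archive` is an illustrative name, and it only relies on unzip's `-t` (test) and `-q` (quiet) options:

```shell
# Run unzip's built-in integrity test and report a one-line verdict.
# Returns 0 only if unzip is installed and the archive tests clean.
check_archive() {
    if ! command -v unzip >/dev/null 2>&1; then
        echo "unzip not installed" >&2
        return 2
    fi
    # -t checks every entry without extracting; -q keeps the output short.
    if unzip -tq "$1" >/dev/null 2>&1; then
        echo "OK: $1"
    else
        echo "DAMAGED or not a complete zip: $1"
        return 1
    fi
}
```

Running `check_archive combined.zip` before extracting avoids ending up with a partially unpacked directory.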

On 06/16/2016 03:46 AM, kihyuks wrote:

@lazydroid just in case, you can concatenate them and unzip in Ubuntu:
$ cat CASIA-maxpy-clean.z01 CASIA-maxpy-clean.z02 CASIA-maxpy-clean.z03 CASIA-maxpy-clean.z04 CASIA-maxpy-clean.z05 CASIA-maxpy-clean.zip > combined.zip
$ unzip combined.zip

@kihyuks

kihyuks commented Jun 16, 2016

@lazydroid Actually, the approach I suggested before only unzips about 1/5 of the archive. You can try this instead:

$ zip -F CASIA-maxpy-clean.zip --out CASIA-maxpy-clean_fix.zip
$ unzip CASIA-maxpy-clean_fix.zip

This gives me around 450K images.

@sidgan

sidgan commented Feb 28, 2017

@shimen @lazydroid @kihyuks Could you please provide a link to the washed CASIA dataset?

@lazydroid

@sidgan try this: http://www.down20.com/f-170364248744426

I did not make it, I have just googled the link.

@yao5461

yao5461 commented Oct 31, 2017

@lazydroid @kihyuks @sidgan I downloaded the dataset but got only 439,532 images, so some images are missing. I unpacked the dataset with these commands:
$ zip -F CASIA-maxpy-clean.zip --out CASIA-maxpy-clean_fix.zip
$ unzip CASIA-maxpy-clean_fix.zip
Is there any advice?
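
To compare results between downloads, it helps to count images and identity folders the same way each time. This is a minimal sketch, assuming the extracted tree has one sub-directory per identity and `.jpg` images (both are assumptions about the layout); `count_dataset` is an illustrative name:

```shell
# Count image files and identity folders under an extracted dataset root.
# Assumes one sub-directory per identity and .jpg images (layout is an assumption).
count_dataset() {
    root=$1
    # All .jpg files anywhere below the root.
    images=$(find "$root" -type f -name '*.jpg' | wc -l | tr -d ' ')
    # Only the directories directly below the root, one per identity.
    ids=$(find "$root" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' ')
    echo "$images images in $ids identity folders"
}
```

Running `count_dataset` on the extracted directory after each unpacking method shows whether the method or the download itself is losing files.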

@Gerkam

Gerkam commented Apr 21, 2018

How many photos and classes should the washed CASIA-WebFace contain?

@t1t0n

t1t0n commented Aug 31, 2020

You should use these commands to unzip multi-part zip files.
source

zip -s- CASIA-maxpy-clean.zip -O combined.zip
unzip combined.zip
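
Merging only works if every segment sits in the same directory as the `.zip` file, so it may be worth checking that first. This is a minimal sketch; `check_parts` is an illustrative helper, and the part count of 5 comes from the file listing earlier in this thread:

```shell
# Verify that every segment of a split archive sits next to the .zip file.
# Usage: check_parts CASIA-maxpy-clean 5   (base name and number of .zNN parts)
check_parts() {
    base=$1
    n=$2
    missing=0
    [ -f "$base.zip" ] || { echo "missing: $base.zip"; missing=1; }
    i=1
    while [ "$i" -le "$n" ]; do
        # Segment names use two digits: .z01, .z02, ...
        part=$(printf '%s.z%02d' "$base" "$i")
        [ -f "$part" ] || { echo "missing: $part"; missing=1; }
        i=$((i + 1))
    done
    # Exit status is 0 when nothing is missing, 1 otherwise.
    return "$missing"
}
```

If `check_parts CASIA-maxpy-clean 5` reports nothing missing, the two commands above can be run as written.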


7 participants