
[DataCap Refresh] <3rd> Review of <DSPA Allocator> #274

Open
pandacrypto opened this issue Jan 20, 2025 · 11 comments
Labels

  • Awaiting Response from Allocator — If there is a question that was raised in the issue that requires comment before moving forward.
  • Refresh — Applications received from existing Allocators for a refresh of DataCap allowance

Comments

pandacrypto commented Jan 20, 2025

Basic Information

  1. Allocator Type: [Manual]
  2. Paste your JSON ID: [v5 Notary Allocator Application: DSPA (notary-governance#1045)]
  3. Allocator Verification: [Yes]
  4. Allocator Application
  5. Compliance Report
  6. Previous Audits

Current Allocation Distribution

| Client Name | Granted DC |
| --- | --- |
| MINDACE ACADEMY | 1.5 PiB |
| Human PanGenomics Project - HPGP-1 | 2 PiB |
| bharchitects | 2.5 PiB |
| DSPA-Asia | 4 PiB |
| jcphysics | 4 PiB |
| Xin Chuan Pictures | 2 PiB |
| Encyclopedia of DNA Elements (ENCODE) | 4 PiB |

I. [https://github.com/pandacrypto/DSPA-Allocator/issues/35]

  • Requested DC: 8 PiB
  • Currently Granted DC: 1.5 PiB

II. Dataset Completion

https://www.hcleducation.com.sg/en/course#N2
https://www.hcleducation.com.sg/en/course#P4
https://www.hcleducation.com.sg/en/course#S1
https://www.hcleducation.com.sg/en/course#S4

III. Does the list of SPs provided by the client and the SP list updated in the issues match the SP list used for transactions?

Basically consistent and updated in the GitHub process.

IV. Comparison between the number of copies declared by the client and the actual number created: 8 vs 6

V. Please provide a list of SPs used for transactions along with their retrieval rates.

SP Retrieval Rates

| SP ID | Retrieval Rate | >75% Retrieval? |
| --- | --- | --- |
| f03254061 | 76.71% | Yes |
| f03254125 | 67.44% | No |
| f03251828 | 84.41% | Yes |
| f03254063 | 74.93% | No |
| f03251827 | 75.37% | Yes |
| f03259560 | 66.23% | No |
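The 75% threshold applied in the table above can be checked mechanically. A minimal sketch (the SP IDs and rates are copied from the table; the `meets_threshold` helper is our own name, not part of any Fil+ tooling):

```python
# Flag SPs below the 75% retrieval threshold used in this review.
THRESHOLD = 75.0

sp_rates = {
    "f03254061": 76.71,
    "f03254125": 67.44,
    "f03251828": 84.41,
    "f03254063": 74.93,
    "f03251827": 75.37,
    "f03259560": 66.23,
}

def meets_threshold(rate: float, threshold: float = THRESHOLD) -> bool:
    """True when the retrieval rate strictly exceeds the threshold."""
    return rate > threshold

failing = [sp for sp, rate in sp_rates.items() if not meets_threshold(rate)]
print(f"{len(failing)}/{len(sp_rates)} SPs at or below {THRESHOLD}%: {failing}")
```

Note that 74.93% fails a strict "greater than 75%" check, which is why that row is marked "No".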
@filecoin-watchdog added the "Refresh" and "Awaiting Community/Watchdog Comment" labels on Jan 20, 2025.
filecoin-watchdog (Collaborator) commented Jan 20, 2025

@pandacrypto Could you, please, fill in the template properly?
Do you have any difficulties with the template?

filecoin-watchdog (Collaborator) commented:

@pandacrypto Any updates?

pandacrypto (Author) commented Jan 24, 2025

I. [https://github.com/pandacrypto/DSPA-Allocator/issues/14]

  • Requested DC: 5 PiB
  • Currently Granted DC: 4.5 PiB

II. Dataset Completion: https://github.com/human-pangenomics/hpgp-data

III. Do the SP lists provided by the client and those listed in the updated issues match the SP list used for trading?

  • Essentially consistent and updated in the GitHub process.

IV. Comparison of the number of copies declared by the client versus the actual number created: 5 vs 5

V. Please provide a list of SPs used for trading along with their retrieval rates:

| SP ID | Retrieval Percentage | Exceeds 75% Retrieval Rate? |
| --- | --- | --- |
| f03201941 (new) | 84.2% | Yes |
| f03188443 | - | Uncertain |
| f03188440 | - | Uncertain |
| f03254061 | 76.7% | Yes |
| f03196399 | - | Uncertain |
| f03254125 | 67.4% | No |
| f03251828 | 84.2% | Yes |
| f02227496 | 34.24% | No |
| f02956073 | 83.9% | Yes |
| f03190614 | - | Uncertain |
| f03173124 | - | Uncertain |
| f03228953 (new) | 83.5% | Yes |
| f03190616 | - | Uncertain |
| f03254063 | 74.9% | No |
| f03080854 | 83.9% | Yes |
| f02815438 | 62.62% | No |
| f03251827 | 75.4% | Yes |
| f03080852 | 49.9% | No |
| f03196401 | - | Uncertain |
| f03259560 | 66.2% | No |

I. [https://github.com/pandacrypto/DSPA-Allocator/issues/33]

  • Requested DC: 12 PiB
  • Currently Granted DC: 2.5 PiB

II. Dataset Completion: https://bharchitects.com/zh/projects-zh/

III. Do the SP lists provided by the client and those listed in the updated issues match the SP list used for trading?

  • Essentially consistent and updated in the GitHub process.

IV. Comparison of the number of copies declared by the client versus the actual number created: 10 vs 6

V. Please provide a list of SPs used for trading along with their retrieval rates:

| SP ID | Retrieval Percentage | Exceeds 75% Retrieval Rate? |
| --- | --- | --- |
| f03254061 | 83.76% | Yes |
| f03254125 | 67.96% | No |
| f03251828 | 84.2% | Yes |
| f03254063 | 81.83% | Yes |
| f03251827 | 81.61% | Yes |
| f03259560 | 75.19% | Yes |

pandacrypto (Author) commented Jan 24, 2025

I. [https://github.com/pandacrypto/DSPA-Allocator/issues/11]

  • Requested DC: 30 PiB
  • Currently Granted DC: 7.5 PiB

II. Dataset Completion: https://github.com/awslabs/open-data-docs/tree/main/docs/noaa/noaa-gefs-pds

III. Do the SP lists provided by the client and those listed in the updated issues match the SP list used for trading?

  • Essentially consistent and updated in the GitHub process.

IV. Comparison of the number of copies declared by the client versus the actual number created: 10 vs 10

V. Please provide a list of SPs used for trading along with their retrieval rates:

| SP ID | Retrieval Percentage | Exceeds 75% Retrieval Rate? |
| --- | --- | --- |
| f03201941 | 84.2% | Yes |
| f03188443 | - | Uncertain |
| f03028412 (new) | 76.1% | Yes |
| f03188440 | - | Uncertain |
| f03254061 | 69.07% | No |
| f010202 | 88.5% | Yes |
| f02227496 | 34.54% | No |
| f02956073 (new) | 83.9% | Yes |
| f03190614 | - | Uncertain |
| f03173124 | - | Uncertain |
| f03228953 | 83.5% | Yes |
| f03190616 | - | Uncertain |
| f03254063 | 59.48% | No |
| f03156617 | 83.4% | Yes |
| f03148356 | 84.2% | Yes |
| f02815438 | 67.47% | No |

I. [https://github.com/pandacrypto/DSPA-Allocator/issues/8]

  • Requested DC: 12 PiB
  • Currently Granted DC: 8.5 PiB

II. Dataset Completion: http://jcphysics.com/

III. Do the SP lists provided by the client and those listed in the updated issues match the SP list used for trading?

  • Essentially consistent and updated in the GitHub process.

IV. Comparison of the number of copies declared by the client versus the actual number created: 10 vs 10

V. Please provide a list of SPs used for trading along with their retrieval rates:

| SP ID | Retrieval Percentage | Exceeds 75% Retrieval Rate? |
| --- | --- | --- |
| f03201941 | 84.2% | Yes |
| f03188443 | - | Uncertain |
| f03028412 | 76.1% | Yes |
| f03188440 | - | Uncertain |
| f03254061 | 19.95% | No |
| f03196399 | - | Uncertain |
| f03091738 (new) | 87.22% | Yes |
| f010202 | 87.74% | Yes |
| f03254125 | - | Uncertain |
| f02953066 | 18.20% | No |
| f02837684 | 83.9% | Yes |
| f02956073 | 83.9% | Yes |
| f03190614 | - | Uncertain |
| f03173124 | - | Uncertain |
| f03228953 | 83.5% | Yes |
| f03190616 | - | Uncertain |
| f03254063 | 0.17% | No |
| f03080852 (new) | 49.9% | No |
| f03080854 | 83.9% | Yes |
| f03156617 | 83.4% | Yes |
| f03148356 | 84.2% | Yes |
| f03196401 | - | Uncertain |
| f03259560 | - | Uncertain |

pandacrypto (Author) commented:

I. [https://github.com/pandacrypto/DSPA-Allocator/issues/31]

  • Requested DC: 10 PiB
  • Currently Granted DC: 2 PiB

II. Dataset Completion:

III. Do the SP lists provided by the client and those listed in the updated issues match the SP list used for trading?

  • Essentially consistent and updated in the GitHub process.

IV. Comparison of the number of copies declared by the client versus the actual number created: 10 vs 6

V. Please provide a list of SPs used for trading along with their retrieval rates:

| SP ID | Retrieval Percentage | Exceeds 75% Retrieval Rate? |
| --- | --- | --- |
| f03254061 | 86.56% | Yes |
| f03254125 | 68.91% | No |
| f03251828 | 88.50% | Yes |
| f03254063 | 85.85% | Yes |
| f03251827 | 87.89% | Yes |
| f03259560 | 86.87% | Yes |

I. [https://github.com/pandacrypto/DSPA-Allocator/issues/6]

  • Requested DC: 10 PiB
  • Currently Granted DC: 8.5 PiB

II. Dataset Completion:

III. Do the SP lists provided by the client and those listed in the updated issues match the SP list used for trading?

  • Essentially consistent and updated in the GitHub process.

IV. Comparison of the number of copies declared by the client versus the actual number created: 5 vs 5

V. Please provide a list of SPs used for trading along with their retrieval rates:

| SP ID | Retrieval Percentage | Exceeds 75% Retrieval Rate? |
| --- | --- | --- |
| f03201941 | 84.2% | Yes |
| f03188443 | - | Uncertain |
| f03028412 | 76.1% | Yes |
| f03188440 | - | Uncertain |
| f03196399 | - | Uncertain |
| f03254061 | 69.07% | No |
| f03091738 | 86.8% | Yes |
| f03254125 | 67.4% | No |
| f02953066 | 23.61% | No |
| f02227496 | 34.54% | No |
| f02837684 | 83.9% | Yes |
| f02956073 | 83.9% | Yes |
| f03190614 | - | Uncertain |
| f03173124 | - | Uncertain |
| f03228953 | 83.5% | Yes |
| f03190616 | - | Uncertain |
| f03080854 | 83.9% | Yes |
| f03080852 | 49.9% | No |
| f03254063 | 59.48% | No |
| f03196401 | - | Uncertain |
| f03259560 | 66.2% | No |

pandacrypto (Author) commented Jan 24, 2025

Allocation Summary

  1. Allocator's Notes:
    Supports both legacy and new data clients, and facilitates cooperation between new and existing data clients and SP partners.

  2. Did the allocator timely report any issues or discrepancies that occurred during the application processing?
    Yes. A bot malfunction was reported in "Request for Assistance with DSPA-Allocator Recharge Process (Robot malfunction)" fidlabs/allocator-tooling#104; it has since been resolved.

  3. What steps were taken to minimize unfairness or risky behavior during the allocation process?
    1. For further KYC (Know Your Customer) review, we requested company business license information or personal IDs for identity verification; 2. We leveraged official bot reports to monitor compliance.

  4. How do these distributions add value to the Filecoin ecosystem?
    We generally make proactive inquiries: Was this dataset previously stored on Filecoin? If so, why is it being stored again? The client's response helps us gauge the dataset's value to the Filecoin ecosystem.

  5. Please confirm that for each allocation made to a client, you have maintained the standards set in the application and understood the Fil+ guidelines outlined in the application.
    Yes

  6. Please confirm that you understand that by submitting this GitHub request, you will be subject to due diligence review, which may require you to return to this issue to provide updates.
    Yes

pandacrypto (Author) commented Jan 24, 2025

> @pandacrypto Any updates?

Apologies for the delay, @filecoin-watchdog. Due to a high volume of data client application submissions and the adoption of a new official template, our team has spent the past few days manually organizing the information. Everything has now been submitted for your review; please contact us if anything is unclear or further details are required.

filecoin-watchdog (Collaborator) commented:

@pandacrypto

MINDACE ACADEMY #35

  • Proper KYB with bookkeeping.

  • The distribution, replication, and retrievability of the data appear to be healthy.

  • The SP indicated in the original form does not match the report. The client disclosed this in the comments; however, the application should be updated accordingly.

  • Insufficient information has been provided about the data preparation process. This lack of detail, particularly given the educational nature of the stored dataset, makes it difficult for community members to understand how they can benefit from the stored data.

National Human Genome Research Institute #14

  • The SP indicated in the original form does not match the data to which it is sealed. The client disclosed this in the comments; however, the application should be updated accordingly.

  • Dataset receiving DC in parallel: [DataCap Application] BitsAndBytes - human-pangenomics (cryptowhizzard/Fil-A-2#1). The data indicates it could be part of a very large set, portions of which were already stored on the network. There is also a lack of clarity regarding the preparation process, which could provide valuable information to the community on how stored files can be used by other network users.

  • Mixed results in retrievability. Signs of VPN usage by some service providers, which should trigger verification according to allocator rules.

  • There are already 6 replicas, despite only 5 being declared.

  • The requested DataCap is excessive. For a dataset size of 580 TiB with 5 replicas, the client should request 3 PiB, not 5 PiB.

  • 7 out of 20 SPs have retrieval of 0%.

bharchitects #33

  • Insufficient information provided on the data preparation steps and a lack of clear description regarding the nature of the stored data. This omission limits the community's understanding of its utility and how end users can benefit from the dataset.

  • The distribution, replication, and retrievability of the data appear healthy. All SPs used were properly disclosed in the main application.

The National Oceanic and Atmospheric Administration #11

  • Data indicates it could be part of a very large set, portions of which were already stored on the network. There is a lack of clarity regarding the preparation process, which could provide information to the community on how stored files can be utilized by other users.

  • 38 declared SPs versus 21 used, with good retrievability, replication, and percentage distribution.

jcphysics #8

  • Insufficient information has been provided about the data preparation process. This lack of detail, particularly given the educational nature of the stored dataset, makes it difficult for community members to understand how they can benefit from the data.

  • New SPs were disclosed only in comments, making due diligence very difficult.

  • Acceptable retrievability; however, some SP data is unavailable.

  • The requested DataCap is excessive. For a dataset size of 700 TiB with 10 replicas, the client should request 7 PiB, not 12 PiB.

  • Sum of Unique Data is 1.19PiB instead of declared 700TiB.

Hangzhou Xinchuan Film and Television Culture Media Co., Ltd. #31

  • No data sample or evidence provided to explain and justify the requested data size.

  • SPs with whom the client is cooperating do not match those in the initial application. Some were disclosed in comments, while others were not mentioned at all.

  • Very good retrieval rate and excellent geo-diversification of SPs.

  • Sum of Unique Data is 356TiB instead of declared 980TiB.

ENCODE Data Coordinating Center #6

  • “Has been stored before, but not very many times; meaningful data deserves to be stored over and over again.”

  • Referring to the user's statement: the dataset was stored dozens of times. This, combined with the lack of a clear description of what will be stored and how community users can identify parts of that data, makes the explanation insufficient.

  • Retrievability could be improved. Currently, 58.82% of storage providers have a retrieval success rate of less than 75%.

  • Likely VPN usage by some SPs (e.g., f03251828). While acceptable under allocator rules, this should trigger enhanced diligence processes.

  • 14 declared SPs in the original application versus 24 disclosed in the report. All SPs should be updated, and proper bookkeeping should be maintained.

  • The requested DataCap is excessive. For a dataset size of 1.1PiB with 5 replicas, the client should request 5.5-6 PiB, not 10 PiB.

  • Sum of Unique Data is 1.99PiB instead of declared 1.1PiB.

Overall Observations

  • Very good communication and attention to diligence.

  • Very good performance (retrievability) on most clients.

  • Improvements can be made in the following areas:

    1. Explaining the data preparation steps so the community can understand how files (in the case of open data) can be processed.

    2. Clearly identifying datasets and justifying their size and uniqueness.

    3. Ensuring SP verification and proper bookkeeping.

    4. Checking the dataset size and comparing it against the report.
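The "requested DataCap is excessive" remarks in this review follow simple arithmetic: a reasonable request is roughly dataset size times the declared number of replicas. A minimal sketch of that rule of thumb (the helper name is ours; this is inferred from the review, not an official Fil+ formula):

```python
# Rule-of-thumb DataCap sizing: dataset size x declared replicas, in PiB.
TIB_PER_PIB = 1024

def suggested_datacap_pib(dataset_tib: float, replicas: int) -> float:
    """Approximate DataCap (PiB) needed to store `replicas` copies of a dataset."""
    return dataset_tib * replicas / TIB_PER_PIB

# Figures taken from the review above:
print(round(suggested_datacap_pib(580, 5), 1))        # ~2.8 PiB -> "request 3 PiB, not 5 PiB"
print(round(suggested_datacap_pib(700, 10), 1))       # ~6.8 PiB -> "request 7 PiB, not 12 PiB"
print(round(suggested_datacap_pib(1.1 * 1024, 5), 1)) # 5.5 PiB  -> "request 5.5-6 PiB, not 10 PiB"
```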

@filecoin-watchdog added the "Awaiting Response from Allocator" label and removed the "Awaiting Community/Watchdog Comment" label on Jan 28, 2025.
pandacrypto (Author) commented Feb 4, 2025

Thank you to @filecoin-watchdog for the diligent efforts in meticulously reviewing every client application and providing an exceptionally thorough summary. We also appreciate @filecoin-watchdog's recognition of the overall operation of DSPA Allocator, and we commit to improving on the suggested points during future reviews.

  1. Regarding data preparation and whether data has been stored on Filecoin before: We will enforce stricter requirements for clients to provide detailed steps of data preparation and descriptions of how their datasets contribute value to the entire Filecoin ecosystem, making the process of data preparation and storage more educational. Moreover, DSPA-Allocator will encourage storing new datasets on Filecoin unless the application provides compelling reasons for storing existing ones.

  2. Concerning clearly identifying datasets and proving their size and uniqueness: During our last DataCap refresh, there was adequate communication about dataset sizes (2nd Community Diligence Review of DSPA-Asia Allocator #170 (comment)). For the latest client applications, we held offline discussions with clients to ensure deals are fully filled with data rather than partially filled, as in previous applications. For verifying dataset sizes claimed by clients, we will follow official guidelines for improvement, such as requesting proof of dataset size or additional samples.

  3. Ensuring SP verification and proper bookkeeping: Clients are required not only to update SP information in GitHub comments in a timely manner, but also to reflect these updates in their original application submissions.

We have a question regarding the statement from @filecoin-watchdog that "Sum of Unique Data is 356TiB instead of declared 980TiB." Could you please clarify how this 356 TiB figure was determined? Understanding this would greatly assist us in enhancing our review capabilities.

[image attachment]

filecoin-watchdog (Collaborator) commented:

> Regarding data preparation and whether data has been stored on Filecoin before:

Please remember that publicly available open data intended for community retrieval should also include an index as part of the process. This index should enable users to connect sealed data with the original dataset, allowing those who wish to use this backup for computing purposes to do so effectively. You can read more about it here: #125

> We have a question regarding the statement from @filecoin-watchdog that "Sum of Unique Data is 356TiB instead of declared 980TiB." Could you please clarify how this 356 TiB figure was determined? Understanding this would greatly assist us in enhancing our review capabilities.

You can add values of Unique Data, which will give you the total amount of unique and sealed data. In this case, all the data is stored across 7 replicas, but the unique data is only approximately 356 TiB.
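The summation described above can be illustrated with a small sketch. The per-batch figures below are hypothetical placeholders chosen to total 356 TiB; only the 356 TiB total and the 7 replicas come from the report:

```python
# Illustrative only: "Sum of Unique Data" vs. total sealed data across replicas.
# Per-batch unique-data values are made up; the 356 TiB total and 7 replicas
# are the figures quoted in the discussion above.
unique_data_tib = [60, 55, 48, 70, 40, 43, 40]  # hypothetical unique data per deal batch

total_unique = sum(unique_data_tib)      # unique data actually on the network
replicas = 7
total_sealed = total_unique * replicas   # raw sealed total across all replicas

print(total_unique)  # 356 TiB of unique data...
print(total_sealed)  # ...even though far more is sealed in total
```

This is why a report can show a large sealed total while the unique dataset is much smaller than declared.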

pandacrypto (Author) commented:

> Regarding data preparation and whether data has been stored on Filecoin before:
>
> Please remember that publicly available open data intended for community retrieval should also include an index as part of the process. This index should enable users to connect sealed data with the original dataset, allowing those who wish to use this backup for computing purposes to do so effectively. You can read more about it here: #125
>
> We have a question regarding the statement from @filecoin-watchdog that "Sum of Unique Data is 356TiB instead of declared 980TiB." Could you please clarify how this 356 TiB figure was determined? Understanding this would greatly assist us in enhancing our review capabilities.
>
> You can add values of Unique Data, which will give you the total amount of unique and sealed data. In this case, all the data is stored across 7 replicas, but the unique data is only approximately 356 TiB.

Understood; we have received your reply and will take your advice into account. Thank you very much, @filecoin-watchdog @Kevin-FF-USA!
