
[DataCap Refresh] <3rd> Review of <DSPA Allocator> #274

Open
pandacrypto opened this issue Jan 20, 2025 · 11 comments
Labels

  • Awaiting Response from Allocator — If there is a question that was raised in the issue that requires comment before moving forward.
  • Refresh — Applications received from existing Allocators for a refresh of DataCap allowance

Comments

pandacrypto commented Jan 20, 2025

Basic Information

  1. Allocator Type: [Manual]
  2. Paste your JSON ID: [v5 Notary Allocator Application: DSPA (notary-governance#1045)]
  3. Allocator Verification: [Yes]
  4. Allocator Application
  5. Compliance Report
  6. Previous Audits

Current Allocation Distribution

| Client Name | Granted DC |
| --- | --- |
| MINDACE ACADEMY | 1.5 PiB |
| Human PanGenomics Project - HPGP-1 | 2 PiB |
| bharchitects | 2.5 PiB |
| DSPA-Asia | 4 PiB |
| jcphysics | 4 PiB |
| Xin Chuan Pictures | 2 PiB |
| Encyclopedia of DNA Elements (ENCODE) | 4 PiB |

I. [https://github.com/pandacrypto/DSPA-Allocator/issues/35]

  • Requested DC: 8 PiB
  • Currently Granted DC: 1.5 PiB

II. Dataset Completion

https://www.hcleducation.com.sg/en/course#N2
https://www.hcleducation.com.sg/en/course#P4
https://www.hcleducation.com.sg/en/course#S1
https://www.hcleducation.com.sg/en/course#S4

III. Does the list of SPs provided by the client and the SP list updated in the issues match the SP list used for transactions?

Basically consistent and updated in the GitHub process.

IV. Comparison between the number of copies declared by the client and the actual number created: 8 vs 6

V. Please provide a list of SPs used for transactions along with their retrieval rates.

SP Retrieval Rates

| SP ID | Retrieval Rate | >75% Retrieval? |
| --- | --- | --- |
| f03254061 | 76.71% | Yes |
| f03254125 | 67.44% | No |
| f03251828 | 84.41% | Yes |
| f03254063 | 74.93% | No |
| f03251827 | 75.37% | Yes |
| f03259560 | 66.23% | No |
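The 75% threshold applied in the table above can be checked mechanically. A minimal sketch (the SP IDs and rates are copied from the table; the `meets_threshold` helper is our own name, not part of any Fil+ tooling):

```python
# Flag SPs below the 75% retrieval threshold used in this review.
THRESHOLD = 75.0

sp_rates = {
    "f03254061": 76.71,
    "f03254125": 67.44,
    "f03251828": 84.41,
    "f03254063": 74.93,
    "f03251827": 75.37,
    "f03259560": 66.23,
}

def meets_threshold(rate: float, threshold: float = THRESHOLD) -> bool:
    """True when the retrieval rate strictly exceeds the threshold."""
    return rate > threshold

failing = [sp for sp, rate in sp_rates.items() if not meets_threshold(rate)]
print(f"{len(failing)}/{len(sp_rates)} SPs at or below {THRESHOLD}%: {failing}")
```

Note that 74.93% fails a strict "greater than 75%" check, which is why that row is marked "No".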
@filecoin-watchdog added the "Refresh" and "Awaiting Community/Watchdog Comment" labels on Jan 20, 2025.
filecoin-watchdog (Collaborator) commented Jan 20, 2025

@pandacrypto Could you, please, fill in the template properly?
Do you have any difficulties with the template?

filecoin-watchdog (Collaborator) commented:

@pandacrypto Any updates?

pandacrypto (Author) commented Jan 24, 2025

I. [https://github.com/pandacrypto/DSPA-Allocator/issues/14]

  • Requested DC: 5 PiB
  • Currently Granted DC: 4.5 PiB

II. Dataset Completion: https://github.com/human-pangenomics/hpgp-data

III. Do the SP lists provided by the client and those listed in the updated issues match the SP list used for trading?

  • Essentially consistent and updated in the GitHub process.

IV. Comparison of the number of copies declared by the client versus the actual number created: 5 vs 5

V. Please provide a list of SPs used for trading along with their retrieval rates:

| SP ID | Retrieval Percentage | Exceeds 75% Retrieval Rate? |
| --- | --- | --- |
| f03201941 (new) | 84.2% | Yes |
| f03188443 | - | Uncertain |
| f03188440 | - | Uncertain |
| f03254061 | 76.7% | Yes |
| f03196399 | - | Uncertain |
| f03254125 | 67.4% | No |
| f03251828 | 84.2% | Yes |
| f02227496 | 34.24% | No |
| f02956073 | 83.9% | Yes |
| f03190614 | - | Uncertain |
| f03173124 | - | Uncertain |
| f03228953 (new) | 83.5% | Yes |
| f03190616 | - | Uncertain |
| f03254063 | 74.9% | No |
| f03080854 | 83.9% | Yes |
| f02815438 | 62.62% | No |
| f03251827 | 75.4% | Yes |
| f03080852 | 49.9% | No |
| f03196401 | - | Uncertain |
| f03259560 | 66.2% | No |

I. [https://github.com/pandacrypto/DSPA-Allocator/issues/33]

  • Requested DC: 12 PiB
  • Currently Granted DC: 2.5 PiB

II. Dataset Completion: https://bharchitects.com/zh/projects-zh/

III. Do the SP lists provided by the client and those listed in the updated issues match the SP list used for trading?

  • Essentially consistent and updated in the GitHub process.

IV. Comparison of the number of copies declared by the client versus the actual number created: 10 vs 6

V. Please provide a list of SPs used for trading along with their retrieval rates:

| SP ID | Retrieval Percentage | Exceeds 75% Retrieval Rate? |
| --- | --- | --- |
| f03254061 | 83.76% | Yes |
| f03254125 | 67.96% | No |
| f03251828 | 84.2% | Yes |
| f03254063 | 81.83% | Yes |
| f03251827 | 81.61% | Yes |
| f03259560 | 75.19% | Yes |

pandacrypto (Author) commented Jan 24, 2025

I. [https://github.com/pandacrypto/DSPA-Allocator/issues/11]

  • Requested DC: 30 PiB
  • Currently Granted DC: 7.5 PiB

II. Dataset Completion: https://github.com/awslabs/open-data-docs/tree/main/docs/noaa/noaa-gefs-pds

III. Do the SP lists provided by the client and those listed in the updated issues match the SP list used for trading?

  • Essentially consistent and updated in the GitHub process.

IV. Comparison of the number of copies declared by the client versus the actual number created: 10 vs 10

V. Please provide a list of SPs used for trading along with their retrieval rates:

| SP ID | Retrieval Percentage | Exceeds 75% Retrieval Rate? |
| --- | --- | --- |
| f03201941 | 84.2% | Yes |
| f03188443 | - | Uncertain |
| f03028412 (new) | 76.1% | Yes |
| f03188440 | - | Uncertain |
| f03254061 | 69.07% | No |
| f010202 | 88.5% | Yes |
| f02227496 | 34.54% | No |
| f02956073 (new) | 83.9% | Yes |
| f03190614 | - | Uncertain |
| f03173124 | - | Uncertain |
| f03228953 | 83.5% | Yes |
| f03190616 | - | Uncertain |
| f03254063 | 59.48% | No |
| f03156617 | 83.4% | Yes |
| f03148356 | 84.2% | Yes |
| f02815438 | 67.47% | No |

I. [https://github.com/pandacrypto/DSPA-Allocator/issues/8]

  • Requested DC: 12 PiB
  • Currently Granted DC: 8.5 PiB

II. Dataset Completion: http://jcphysics.com/

III. Do the SP lists provided by the client and those listed in the updated issues match the SP list used for trading?

  • Essentially consistent and updated in the GitHub process.

IV. Comparison of the number of copies declared by the client versus the actual number created: 10 vs 10

V. Please provide a list of SPs used for trading along with their retrieval rates:

| SP ID | Retrieval Percentage | Exceeds 75% Retrieval Rate? |
| --- | --- | --- |
| f03201941 | 84.2% | Yes |
| f03188443 | - | Uncertain |
| f03028412 | 76.1% | Yes |
| f03188440 | - | Uncertain |
| f03254061 | 19.95% | No |
| f03196399 | - | Uncertain |
| f03091738 (new) | 87.22% | Yes |
| f010202 | 87.74% | Yes |
| f03254125 | - | Uncertain |
| f02953066 | 18.20% | No |
| f02837684 | 83.9% | Yes |
| f02956073 | 83.9% | Yes |
| f03190614 | - | Uncertain |
| f03173124 | - | Uncertain |
| f03228953 | 83.5% | Yes |
| f03190616 | - | Uncertain |
| f03254063 | 0.17% | No |
| f03080852 (new) | 49.9% | No |
| f03080854 | 83.9% | Yes |
| f03156617 | 83.4% | Yes |
| f03148356 | 84.2% | Yes |
| f03196401 | - | Uncertain |
| f03259560 | - | Uncertain |

pandacrypto (Author) commented:

I. [https://github.com/pandacrypto/DSPA-Allocator/issues/31]

  • Requested DC: 10 PiB
  • Currently Granted DC: 2 PiB

II. Dataset Completion:

III. Do the SP lists provided by the client and those listed in the updated issues match the SP list used for trading?

  • Essentially consistent and updated in the GitHub process.

IV. Comparison of the number of copies declared by the client versus the actual number created: 10 vs 6

V. Please provide a list of SPs used for trading along with their retrieval rates:

| SP ID | Retrieval Percentage | Exceeds 75% Retrieval Rate? |
| --- | --- | --- |
| f03254061 | 86.56% | Yes |
| f03254125 | 68.91% | No |
| f03251828 | 88.50% | Yes |
| f03254063 | 85.85% | Yes |
| f03251827 | 87.89% | Yes |
| f03259560 | 86.87% | Yes |

I. [https://github.com/pandacrypto/DSPA-Allocator/issues/6]

  • Requested DC: 10 PiB
  • Currently Granted DC: 8.5 PiB

II. Dataset Completion:

III. Do the SP lists provided by the client and those listed in the updated issues match the SP list used for trading?

  • Essentially consistent and updated in the GitHub process.

IV. Comparison of the number of copies declared by the client versus the actual number created: 5 vs 5

V. Please provide a list of SPs used for trading along with their retrieval rates:

| SP ID | Retrieval Percentage | Exceeds 75% Retrieval Rate? |
| --- | --- | --- |
| f03201941 | 84.2% | Yes |
| f03188443 | - | Uncertain |
| f03028412 | 76.1% | Yes |
| f03188440 | - | Uncertain |
| f03196399 | - | Uncertain |
| f03254061 | 69.07% | No |
| f03091738 | 86.8% | Yes |
| f03254125 | 67.4% | No |
| f02953066 | 23.61% | No |
| f02227496 | 34.54% | No |
| f02837684 | 83.9% | Yes |
| f02956073 | 83.9% | Yes |
| f03190614 | - | Uncertain |
| f03173124 | - | Uncertain |
| f03228953 | 83.5% | Yes |
| f03190616 | - | Uncertain |
| f03080854 | 83.9% | Yes |
| f03080852 | 49.9% | No |
| f03254063 | 59.48% | No |
| f03196401 | - | Uncertain |
| f03259560 | 66.2% | No |

pandacrypto (Author) commented Jan 24, 2025

Allocation Summary

  1. Allocator's Notes:
    Supports both legacy and new data clients, and facilitates cooperation between new and existing data clients and SP partners.

  2. Did the allocator timely report any issues or discrepancies that occurred during the application processing?
    Yes. A bot malfunction was reported in "Request for Assistance with DSPA-Allocator Recharge Process (Robot malfunction)" fidlabs/allocator-tooling#104; it has since been resolved.

  3. What steps were taken to minimize unfairness or risky behavior during the allocation process?
    1. For further KYC (Know Your Customer) review, we requested company business license information or personal IDs for identity verification; 2. We leveraged official bot reports to monitor compliance.

  4. How do these distributions add value to the Filecoin ecosystem?
    We generally make proactive inquiries: Was this dataset previously stored on Filecoin? If so, why is it being stored again? The client's response helps us gauge the dataset's value to the Filecoin ecosystem.

  5. Please confirm that for each allocation made to a client, you have maintained the standards set in the application and understood the Fil+ guidelines outlined in the application.
    Yes

  6. Please confirm that you understand that by submitting this GitHub request, you will be subject to due diligence review, which may require you to return to this issue to provide updates.
    Yes

pandacrypto (Author) commented Jan 24, 2025

> @pandacrypto Any updates?

Apologies for the delay, @filecoin-watchdog. Due to a high volume of data client application submissions and the adoption of a new official template, our team has spent the past few days manually organizing the information. Everything has now been submitted for your review; please contact us if anything is unclear or further details are required.

filecoin-watchdog (Collaborator) commented:

@pandacrypto

MINDACE ACADEMY #35

  • Proper KYB with bookkeeping.

  • The distribution, replication, and retrievability of the data appear to be healthy.

  • The SP indicated in the original form does not match the report. The client disclosed this in the comments; however, the application should be updated accordingly.

  • Insufficient information has been provided about the data preparation process. This lack of detail, particularly given the educational nature of the stored dataset, makes it difficult for community members to understand how they can benefit from the stored data.

National Human Genome Research Institute #14

  • The SP indicated in the original form does not match the data to which it is sealed. The client disclosed this in the comments; however, the application should be updated accordingly.

  • Dataset receiving DC in parallel: [DataCap Application] BitsAndBytes - human-pangenomics (cryptowhizzard/Fil-A-2#1). The data indicates it could be part of a very large set, portions of which were already stored on the network. There is also a lack of clarity regarding the preparation process, which could provide valuable information to the community on how stored files can be used by other network users.

  • Mixed results in retrievability. Signs of VPN usage by some service providers, which should trigger verification according to allocator rules.

  • There are already 6 replicas, despite only 5 being declared.

  • The requested DataCap is excessive. For a dataset size of 580 TiB with 5 replicas, the client should request 3 PiB, not 5 PiB.

  • 7 out of 20 SPs have retrieval of 0%.

bharchitects #33

  • Insufficient information provided on the data preparation steps and a lack of clear description regarding the nature of the stored data. This omission limits the community's understanding of its utility and how end users can benefit from the dataset.

  • The distribution, replication, and retrievability of the data appear healthy. All SPs used were properly disclosed in the main application.

The National Oceanic and Atmospheric Administration #11

  • Data indicates it could be part of a very large set, portions of which were already stored on the network. There is a lack of clarity regarding the preparation process, which could provide information to the community on how stored files can be utilized by other users.

  • 38 declared SPs versus 21 used, with good retrievability, replication, and percentage distribution.

jcphysics #8

  • Insufficient information has been provided about the data preparation process. This lack of detail, particularly given the educational nature of the stored dataset, makes it difficult for community members to understand how they can benefit from the data.

  • New SPs were disclosed only in comments, making due diligence very difficult.

  • Acceptable retrievability; however, some SP data is unavailable.

  • The requested DataCap is excessive. For a dataset size of 700 TiB with 10 replicas, the client should request 7 PiB, not 12 PiB.

  • Sum of Unique Data is 1.19PiB instead of declared 700TiB.

Hangzhou Xinchuan Film and Television Culture Media Co., Ltd. #31

  • No data sample or evidence provided to explain and justify the requested data size.

  • SPs with whom the client is cooperating do not match those in the initial application. Some were disclosed in comments, while others were not mentioned at all.

  • Very good retrieval rate and excellent geo-diversification of SPs.

  • Sum of Unique Data is 356TiB instead of declared 980TiB.

ENCODE Data Coordinating Center #6

  • “Has been stored before, but not very many times; meaningful data deserves to be stored over and over again.”

  • Referring to the user's statement: the dataset was stored dozens of times. This, combined with the lack of a clear description of what will be stored and how community users can identify parts of that data, makes the explanation insufficient.

  • Retrievability could be improved. Currently, 58.82% of storage providers have a retrieval success rate of less than 75%.

  • Likely VPN usage by some SPs (e.g., f03251828). While acceptable under allocator rules, this should trigger enhanced diligence processes.

  • 14 declared SPs in the original application versus 24 disclosed in the report. All SPs should be updated, and proper bookkeeping should be maintained.

  • The requested DataCap is excessive. For a dataset size of 1.1PiB with 5 replicas, the client should request 5.5-6 PiB, not 10 PiB.

  • Sum of Unique Data is 1.99PiB instead of declared 1.1PiB.

Overall Observations

  • Very good communication and attention to diligence.

  • Very good performance (retrievability) on most clients.

  • Improvements can be made in the following areas:

    1. Explaining the data preparation steps so the community can understand how files (in the case of open data) can be processed.

    2. Clearly identifying datasets and justifying their size and uniqueness.

    3. Ensuring SP verification and proper bookkeeping.

    4. Checking the dataset size and comparing it against the report.
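The "requested DataCap is excessive" remarks in this review follow simple arithmetic: a reasonable request is roughly dataset size times the declared number of replicas. A minimal sketch of that rule of thumb (the helper name is ours; this is inferred from the review, not an official Fil+ formula):

```python
# Rule-of-thumb DataCap sizing: dataset size x declared replicas, in PiB.
TIB_PER_PIB = 1024

def suggested_datacap_pib(dataset_tib: float, replicas: int) -> float:
    """Approximate DataCap (PiB) needed to store `replicas` copies of a dataset."""
    return dataset_tib * replicas / TIB_PER_PIB

# Figures taken from the review above:
print(round(suggested_datacap_pib(580, 5), 1))        # ~2.8 PiB -> "request 3 PiB, not 5 PiB"
print(round(suggested_datacap_pib(700, 10), 1))       # ~6.8 PiB -> "request 7 PiB, not 12 PiB"
print(round(suggested_datacap_pib(1.1 * 1024, 5), 1)) # 5.5 PiB  -> "request 5.5-6 PiB, not 10 PiB"
```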

@filecoin-watchdog added the "Awaiting Response from Allocator" label and removed the "Awaiting Community/Watchdog Comment" label on Jan 28, 2025.
pandacrypto (Author) commented Feb 4, 2025

Thank you to @filecoin-watchdog for the diligent efforts in meticulously reviewing every client application and providing an exceptionally thorough summary. We also appreciate @filecoin-watchdog's recognition of the overall operation of DSPA Allocator, and we commit to improving on the suggested points during future reviews.

  1. Regarding data preparation and whether data has been stored on Filecoin before: We will enforce stricter requirements for clients to provide detailed steps of data preparation and descriptions of how their datasets contribute value to the entire Filecoin ecosystem, making the process of data preparation and storage more educational. Moreover, DSPA-Allocator will encourage storing new datasets on Filecoin unless the application provides compelling reasons for storing existing ones.

  2. Concerning clearly identifying datasets and proving their size and uniqueness: During our last DataCap refresh, there was adequate communication about dataset sizes (2nd Community Diligence Review of DSPA-Asia Allocator #170 (comment)). For the latest client applications, we held offline discussions with clients to ensure deals are fully filled with data rather than partially filled, as in previous applications. For verifying dataset sizes claimed by clients, we will follow official guidelines for improvement, such as requesting proof of dataset size or additional samples.

  3. Ensuring SP verification and proper bookkeeping: Clients are required not only to update SP information in GitHub comments in a timely manner, but also to reflect these updates in their original application submissions.

We have a question regarding the statement from @filecoin-watchdog that "Sum of Unique Data is 356TiB instead of declared 980TiB." Could you please clarify how this 356 TiB figure was determined? Understanding this would greatly assist us in enhancing our review capabilities.

[image attachment]

filecoin-watchdog (Collaborator) commented:

> Regarding data preparation and whether data has been stored on Filecoin before:

Please remember that publicly available open data intended for community retrieval should also include an index as part of the process. This index should enable users to connect sealed data with the original dataset, allowing those who wish to use this backup for computing purposes to do so effectively. You can read more about it here: #125

> We have a question regarding the statement from @filecoin-watchdog that "Sum of Unique Data is 356TiB instead of declared 980TiB." Could you please clarify how this 356 TiB figure was determined? Understanding this would greatly assist us in enhancing our review capabilities.

You can add values of Unique Data, which will give you the total amount of unique and sealed data. In this case, all the data is stored across 7 replicas, but the unique data is only approximately 356 TiB.
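The summation described above can be illustrated with a small sketch. The per-batch figures below are hypothetical placeholders chosen to total 356 TiB; only the 356 TiB total and the 7 replicas come from the report:

```python
# Illustrative only: "Sum of Unique Data" vs. total sealed data across replicas.
# Per-batch unique-data values are made up; the 356 TiB total and 7 replicas
# are the figures quoted in the discussion above.
unique_data_tib = [60, 55, 48, 70, 40, 43, 40]  # hypothetical unique data per deal batch

total_unique = sum(unique_data_tib)      # unique data actually on the network
replicas = 7
total_sealed = total_unique * replicas   # raw sealed total across all replicas

print(total_unique)  # 356 TiB of unique data...
print(total_sealed)  # ...even though far more is sealed in total
```

This is why a report can show a large sealed total while the unique dataset is much smaller than declared.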

pandacrypto (Author) commented:

> Regarding data preparation and whether data has been stored on Filecoin before:
>
> Please remember that publicly available open data intended for community retrieval should also include an index as part of the process. This index should enable users to connect sealed data with the original dataset, allowing those who wish to use this backup for computing purposes to do so effectively. You can read more about it here: #125
>
> We have a question regarding the statement from @filecoin-watchdog that "Sum of Unique Data is 356TiB instead of declared 980TiB." Could you please clarify how this 356 TiB figure was determined? Understanding this would greatly assist us in enhancing our review capabilities.
>
> You can add values of Unique Data, which will give you the total amount of unique and sealed data. In this case, all the data is stored across 7 replicas, but the unique data is only approximately 356 TiB.

Understood; we have received your reply and will take your advice into account. Thank you very much, @filecoin-watchdog @Kevin-FF-USA!
