v5 Notary Allocator Application: Storify #1070

Closed
shizhigu opened this issue Jan 17, 2024 · 7 comments

@shizhigu

v5 Notary Allocator Application

To apply to be an allocator, organizations will submit one application for each proposed pathway to DataCap. If you will be designing multiple specific pathways, you will need to submit multiple applications.

Please complete the following steps:

1. Fill out the information below and create a new GitHub Issue

  1. Notary Allocator Pathway Name (This can be your name, or the name of your pathway/program. For example "E-Fil+"): Storify Data Fortress

  2. Organization Name: Storify LLC.

  3. On-chain address for Allocator (Provide a NEW unique address. During ratification, you will need to initialize this address on-chain): f13a6ov3nrxllvyqkduwczpyashgu3luivodwyvgq

  4. Country of Operation (Where your organization is legally based): United States of America

  5. Region of Operation (What region will you serve?): North America

  6. Type of Allocator, diligence process: (Automated/programmatic, Market-based, or Manual (human-in-the-loop at some phase)): Manual

  7. DataCap requested for allocator for 12 months of activity (This should be an estimate of overall expected activity. Estimate the total amount of DataCap you will be distributing to clients in 12 months, in TiB or PiB): 100 PiB

2. Access allocator application (download to save answers)

Click link below to access a Google doc version of the allocator application that can be used to save your answers if you are not prepared to fully submit the application in Step 3. https://docs.google.com/document/d/1-Ze8bo7ZlIJe8qX0YSFNPTka4CMprqoNB1D6V7WJJjo/copy

3. Submit allocation application

Click the link below to access the full allocator questionnaire and officially submit your answers:
https://airtable.com/appvyE0VHcgpAkt4Z/shrQxaAIsD693e1ns

Note: Sections of your responses WILL BE posted back into the GitHub issue tracking your application.
The final section (Additional Disclosures) will NOT be posted to GitHub, and will be maintained by the Filecoin Foundation.
Application information for notaries not accepted and ratified in this round will be deleted.

@ghost

ghost commented Jan 21, 2024

Basic Information

1. Notary Allocator Pathway Name:
Storify Data Fortress

2. Organization:
Storify LLC.

3. On Chain Address for Allocator:
f13a6ov3nrxllvyqkduwczpyashgu3luivodwyvgq

4. Country of Operation:
United States of America

5. Region(s) of operation:
North America

6. Type of Allocator:
Manual

7. DataCap requested for allocator for 12 months of activity:
100PiB

8. Is your allocator providing a unique, new, or diverse pathway to DataCap? How does this allocator differentiate itself from other applicants, new or existing?:
We will allocate DataCap manually, as in the LDN process, but with tighter and more rigorous rules to reduce abuse and disputes. The main differences are:

  1. All clients must pass KYC to verify the applicant's identity, using documents including but not limited to utility bills (water, electricity, cable) or business licenses.
  2. Larger sample data. In addition to proof of data ownership, compliance, and data size, every client must provide sample data equal to 5% of the total requested DataCap.
  3. More multisig allocators per application, or randomly auto-assigned allocators, to reduce self-dealing and collusion.
  4. Each client may submit only ONE application at a time.
  5. Allocation ranges are set for specific clients according to clear standards.
  6. A punishment and reward system for allocators, SPs, and clients.

9. As a member in the Filecoin Community, I acknowledge that I must adhere to the Community Code of Conduct, as well other End User License Agreements for accessing various tools and services, such as GitHub and Slack.:
Acknowledge

Client Diligence

10. Who are your target clients?:
Small-scale developers or data owners, Enterprise Data Clients, Individuals learning about Filecoin, Other (specified above)

11. Describe in as much detail as possible how you will perform due diligence on clients. If you are proposing an automated pathway, what diligence mechanism will you use to determine client eligibility?:
We plan to apply different standards to different types of applicant. The specific rules are:
Existing client with a good history: provide the link to the previous application only. The allocator reviews that application and puts the address on the greenlist. No further due diligence is required.
Existing client with a bad history: an improvement plan is required, and the allocator will evaluate its feasibility. If the plan is reasonable, a small amount is granted as a trial.
New client applying as an individual: the GitHub account must be at least 6 months old. 10TiB is granted to start.
New client applying as an organization: KYC is mandatory, followed by the standard due diligence process, which covers client authentication, data ownership, and the SP plan. It is elaborated in the relevant sections below.

12. Please specify how many questions you’ll ask, and provide a brief overview of the questions.:
I am going to ask 15 questions. Beyond the questions in the standard GitHub template, I will ask further rounds of questions for additional information, to prevent fraud that could harm the community. The questions are not exhaustive, but they cover the following aspects:
Client identity information: organization name, business address, business license or KYB screenshot.
Data validity: data ownership, data size, data type and content, sample data.
SP qualifications: how SPs are selected.
SP distribution plan: SP list, SP location, SP organization.
SP management plan: how to ensure the selected SPs follow the rules as promised, and what the recovery plan is if they do not.
For the specific questions I am going to ask, please refer to the link here.
https://docs.google.com/spreadsheets/d/101GJe6tJes29-yJohF9coO3SDxzEJ54H/edit?usp=sharing&ouid=102234962091100491283&rtpof=true&sd=true

13. Will you use a 3rd-party Know your client (KYC) service?:
For organizations, clients must complete KYC offered by 3rd parties like Toggle or Qichacha. They need to submit a screenshot of the results, or a link to them, in GitHub as proof of identity.
For individuals, applicants can upload an ID card or driver's license if they don't mind. If they consider that too sensitive, they can share social accounts such as Twitter or TikTok; the more followers, the better. Legal identity proof is preferred, however, and clients who provide it will receive more DataCap than clients who do not.
Note: using someone else's identity, whether as an organization or an individual, is strictly prohibited. All clients must agree to the relevant Disclaimer terms before submitting the application; anyone who violates them bears all legal responsibility themselves.

14. Can any client apply to your pathway, or will you be closed to only your own internal clients? (eg: bizdev or self-referral):
Anyone who meets the requirements for applying for DataCap is welcome. In the future, incentives may be introduced to bring more new clients to store data on Filecoin. If permitted, marketing events could be held to educate people in relevant sectors about moving their data to Filecoin from other chains or traditional web2 providers.

15. How do you plan to track the rate at which DataCap is being distributed to your clients?:
We are going to use open-source tooling, mainly SA bot, to track the DataCap usage rate on the client side. In addition, allocator signing details collected from the Notary Registry or the Filecoin chain could in the future be analyzed and displayed in real time in a new UI. This information should be open to everyone, so that all allocators are under public supervision, which increases transparency and trust.

Data Diligence

16. As an operating entity in the Filecoin Community, you are required to follow all local & regional regulations relating to any data, digital and otherwise. This may include PII and data deletion requirements, as well as the storing, transmit:
Acknowledge

17. What type(s) of data would be applicable for your pathway?:
Public Open Dataset (Research/Non-Profit), Public Open Commercial/Enterprise

18. How will you verify a client’s data ownership? Will you use 3rd-party KYB (know your business) service to verify enterprise clients?:
First, we will collect basic information from the GitHub application. A KYB/KYC service such as Diro or Toggle is then recommended; the client may choose one of those 3rd-party providers. Once the client's identity is verified, we will pay special attention to its business scope and confirm whether the data content and type described is generated by its business operations or comes from somewhere else.
Second, all clients are required to agree to a privacy policy stating that the client bears all legal responsibility for any misappropriation of other people's data, and that the allocators are not jointly and severally liable.
In addition, we will conduct a retrieval test on the dataset after the first tranche, to further ensure the stored dataset is as described in the application.

19. How will you ensure the data meets local & regional legal requirements?:
We have rich experience in the blockchain sector and always follow compliance updates. We do business in strict accordance with government rules and regulations. All staff are well educated and have above-average legal awareness. Furthermore, we have dedicated legal & compliance staff for consulting and dispute resolution. We will continue to strictly follow the guidelines from the Fil+ governance team. We will not act on any ambiguous activity that may breach laws and regulations before consulting our legal staff. Special attention will be paid to sensitive datasets, including but not limited to government data, intellectual property, and private information and images; our legal & compliance colleagues will work out special terms and conditions for each category.

20. What types of data preparation will you support or require?:
Clients can use Singularity to prepare datasets. We can offer guidelines and tutorials to help them get started. We require thorough data preparation, including cleaning, standardization, de-identification, anonymization, as well as data enrichment and integration. We support data segmentation processing and provide automated data cleaning tools, data de-identification solutions, and data integration platforms to ensure data quality, security, and compliance with regulatory requirements.

21. What tools or methodology will you use to sample and verify the data aligns with your pathway?:
First, we will use the tooling of the Fil+ governance team. The reports from CID bot and retrieval bot will verify whether the stored data matches the dataset claimed in the application. Manual retrieval is required after the first tranche. If the data is not as described, the client can stop the deal immediately to reduce losses. The SP should be penalized as agreed between client and SP, and will be blacklisted from then on.

Data Distribution

22. How many replicas will you require to meet programmatic requirements for distribution?:
5+

23. What geographic or regional distribution will you require?:
3+

24. How many Storage Provider owner/operators will you require to meet programmatic requirements for distribution?:
5+

25. Do you require equal percentage distribution for your clients to their chosen SPs? Will you require preliminary SP distribution plans from the client before allocating any DataCap?:
No. DataCap is distributed to SPs based on their history, including reputation, the amount of data sealed, location, and retrieval rate. The overall rule is: the higher the credit score, the more DataCap an SP will be granted. However, a single SP will not take over 20% of the whole deal.
Before allocating DataCap to SPs, we plan to use a template to collect the necessary information about each SP and, based on the results, rate its true sealing capability. The SP information includes: SP ID, location, organization, previous history, retrieval success rate, etc. Based on this information, the client will rate the SP's capability, respect the SP's wishes, and allocate a reasonable percentage of the deal to each SP:
Grade A - 20%
Grade B - 10%
Grade C - 5%
The SP information template is attached here: https://docs.google.com/spreadsheets/d/1ht-iWcxzThR9W3iRYrynX7roTO99aYuR/edit?usp=sharing&ouid=102234962091100491283&rtpof=true&sd=true
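The grade caps above can be checked mechanically. The following is a minimal sketch, not part of the application: the function name, the SP IDs in the usage note, and the treatment of an ungraded SP as a violation are all assumptions made for illustration.

```python
from typing import Dict

# Maximum share of a client's deal that one SP may take, by grade (from the text above).
GRADE_CAP = {"A": 0.20, "B": 0.10, "C": 0.05}


def check_sp_shares(shares: Dict[str, float], grades: Dict[str, str]) -> Dict[str, str]:
    """Return {sp_id: reason} for every SP whose planned share breaks its grade cap.

    `shares` maps a (hypothetical) SP ID to its fraction of the deal, 0.0 to 1.0.
    `grades` maps the same IDs to "A", "B", or "C".
    """
    violations: Dict[str, str] = {}
    for sp_id, share in shares.items():
        grade = grades.get(sp_id)
        if grade not in GRADE_CAP:
            # Assumption: an SP without a grade on file cannot receive any share.
            violations[sp_id] = "no grade on file"
        elif share > GRADE_CAP[grade] + 1e-9:  # small tolerance for float rounding
            violations[sp_id] = (
                f"share {share:.0%} exceeds grade {grade} cap {GRADE_CAP[grade]:.0%}"
            )
    return violations
```

For instance, a plan giving a grade B provider 15% of the deal would be flagged, while a grade A provider at exactly 20% would pass.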

26. What tooling will you use to verify client deal-making distribution?:
We are going to use https://datacapstats.io, retrieval bot, and CID checker to track allocation. CID checker and retrieval bot compute the allocation statistics, and A/C bot sets the bar based on the collected information. If the client meets all standards, A/C bot will automatically allocate the subsequent DataCap to the client. If not, A/C bot will send a warning explaining why the subsequent DataCap was not granted.

27. How will clients meet SP distribution requirements?:
Clients are required to work with qualified SPs and have the right to choose their own. If their SP distribution plan is not as expected, or they lack the relevant resources, we will share a list of reputable SPs or help them identify suitable ones. A high-quality SP must have a good history.

28. As an allocator, do you support clients that engage in deal-making with SPs utilizing a VPN?:
In theory, VPN use should be banned, because a few SPs use VPNs to fake their location, which jeopardizes the core principles of diversity and decentralization. In practice, however, blockchain activities are restricted in several countries. In China, for example, people cannot access foreign websites, let alone take part in blockchain activities. We may use tooling like Tracert to determine an SP's true location, to prevent cheating and ensure VPNs are used legitimately.

DataCap Allocation Strategy

29. Will you use standardized DataCap allocations to clients?:
Yes, standardized

30. Allocation Tranche Schedule to clients:
For new clients or old clients with a history:
• First: 25TiB
• Second: 50TiB
• Third: 100TiB
• Fourth: 200TiB
• Max per client overall: 500TiB
For old clients with a good reputation and perfect history:
• First: 5% of total requested dataCap
• Second: 10% of total requested dataCap
• Third: 35% of total requested dataCap
• Fourth: 50% of total requested dataCap
• Max per client overall: 5PiB
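As a worked illustration of the two schedules above, here is a minimal sketch that computes tranche sizes in TiB. The function names are hypothetical, and two readings are assumptions rather than statements from the application: that fixed tranches stop once the request or the 500TiB cap is reached, and that the percentage schedule applies to the request after capping it at 5PiB.

```python
from typing import List

TIB_PER_PIB = 1024

# Figures below are taken from the schedule above.
NEW_CLIENT_TRANCHES_TIB = [25, 50, 100, 200]
NEW_CLIENT_MAX_TIB = 500
TRUSTED_TRANCHE_PERCENTS = [5, 10, 35, 50]
TRUSTED_MAX_TIB = 5 * TIB_PER_PIB


def new_client_schedule(requested_tib: float) -> List[float]:
    """Fixed tranches, cut off once the request or the overall cap is reached."""
    limit = min(requested_tib, NEW_CLIENT_MAX_TIB)
    schedule: List[float] = []
    granted = 0.0
    for size in NEW_CLIENT_TRANCHES_TIB:
        grant = min(size, limit - granted)
        if grant <= 0:
            break
        schedule.append(grant)
        granted += grant
    return schedule


def trusted_client_schedule(requested_tib: float) -> List[float]:
    """Percentage tranches of the request, with the request capped at 5PiB (assumed reading)."""
    total = min(requested_tib, TRUSTED_MAX_TIB)
    return [total * pct / 100 for pct in TRUSTED_TRANCHE_PERCENTS]
```

Note that the four fixed tranches sum to 375TiB, below the stated 500TiB maximum; the sketch does not model how the remaining 125TiB would be granted, since the application does not say.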

31. Will you use programmatic or software based allocations?:
No, manually calculated & determined

32. What tooling will you use to construct messages and send allocations to clients?:
The Notary Registry is used to send messages and allocations to clients.

33. Describe the process for granting additional DataCap to previously verified clients.:
SA bot will be used together with A/C bot. When less than 10% of the DataCap from the previous tranche remains, SA bot will trigger the request for the next tranche. If the previous tranche meets the requirements, A/C bot will allocate the DataCap directly, without manual signing.
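The trigger rule described above can be sketched as follows. This is an illustration only and does not reflect the actual SA bot or A/C bot code; the function names and the returned labels are hypothetical.

```python
def next_tranche_due(tranche_size_tib: float, remaining_tib: float,
                     threshold: float = 0.10) -> bool:
    """True when less than `threshold` (10% in the text) of the previous tranche remains."""
    if tranche_size_tib <= 0:
        raise ValueError("tranche size must be positive")
    return remaining_tib / tranche_size_tib < threshold


def decide(tranche_size_tib: float, remaining_tib: float,
           previous_tranche_compliant: bool) -> str:
    """Combine the usage trigger with the compliance result for the previous tranche."""
    if not next_tranche_due(tranche_size_tib, remaining_tib):
        return "wait"  # client still has enough DataCap left
    if previous_tranche_compliant:
        return "auto-allocate"  # no manual signing needed, per the text
    return "hold-and-warn"  # assumed behavior: withhold and explain why
```

With a 100TiB tranche, the request fires once fewer than 10TiB remain; at exactly 10TiB it does not, because the text says "less than 10%".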

34. Describe in as much detail as possible the tools used for: • client discoverability & applications • due diligence & investigation • bookkeeping • on-chain message construction • client deal-making behavior • tracking overall allocator health • disput:
client discoverability & applications: Clients apply for DataCap through GitHub, as before.
due diligence & investigation: The client submits basic information in the GitHub application template. We verify the identity of individuals and organizations using a 3rd-party KYC provider. In addition, we will prepare a template of questions to further verify the client's compliance and distribution plan.
bookkeeping: https://datacapstats.io/notaries. We suggest that this page be improved and opened to the public. For now, the information there is neither comprehensive nor updated in real time. More fields should be added, such as the reason for signing an application, along with the application number and link. Signing history should be shared among community members. Anyone who disagrees with a signing action should be able to submit a dispute proposal. Until the website is improved, an online Google form is recommended as an interim measure.
on-chain message construction: Ledger is used to authenticate and construct messages.
client deal-making behavior: SA bot, CID checker, retrieval bot, and A/C bot will be used together. SA bot should warn the client to use the DataCap on a reasonable schedule, to reduce DataCap abuse and waste. CID checker and retrieval bot should report the overall storage performance. Based on the bots' results, A/C bot should act on the metrics, and messages shall be sent warning the client to make adjustments in time.
tracking overall allocator health: A punishment and reward strategy should be created to supervise allocators' actions. For example, an allocator who follows the rules and has no dispute submitted against them will be awarded more DataCap, such as 5 PiB or higher; otherwise, the allocator will be penalized by forfeiting granted DataCap according to the level of non-compliance.
dispute discussion & resolution: A decentralized voting tool should be used to resolve any dispute or disagreement. Anyone who has a disagreement or something abnormal to report should submit a proposal covering the issue description, proof, poll start & end time, actions to be taken, etc. All community members have the right to vote and settle the dispute. In this way, everyone can participate, and transparency and fairness are increased.
community updates & comms: Slack is the primary way to communicate, because most people are using it for now, but responsiveness there is not sufficient. Based on our experience in the blockchain space, we are going to create a Discord server for Filecoin for better communication. Discord is better organized and makes useful information easier to find.

Tools and Bookkeeping

35. Will you use open-source tooling from the Fil+ team?:
Apart from the tools mentioned above, Ledger will be used to sign applications. We have chosen the manual pathway to allocate DataCap, but we will look for smart ways to automate allocation as much as possible, including a 3rd-party KYC provider, randomly assigned signing allocators, an optimized allocator-tracking UI at https://datacapstats.io/, automatic allocation of subsequent DataCap with A/C bot, and a decentralized voting tool to handle disputes. These are the measures we can think of right now; we will improve the process as we implement it.

36. Where will you keep your records for bookkeeping? How will you maintain transparency in your allocation decisions?:
I will use an online Google form for our own bookkeeping. The form will include the following items: application link, application number, address, tranche number (to avoid signing too often), granted DataCap amount, the reason for signing the application, and a note (for unexpected circumstances).
The form will be open to everyone. Anyone who questions why an application was signed can review the decision-making process.
If more information is needed, we will provide our Slack or Discord handle for contact.
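The bookkeeping fields listed above map naturally onto a flat record. The sketch below is one possible shape, assuming a CSV export alongside the form; the class name, field names, and units are hypothetical choices, not part of the application.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Iterable


@dataclass
class AllocationRecord:
    """One signing decision, using the fields named in the answer above."""
    application_link: str
    application_number: int
    client_address: str
    tranche_number: int
    granted_datacap_tib: float  # assumed unit: TiB
    signing_reason: str
    note: str = ""  # unexpected circumstances, if any


def to_csv(records: Iterable[AllocationRecord]) -> str:
    """Serialize records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=[f.name for f in fields(AllocationRecord)],
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buf.getvalue()
```

Keeping the signing reason as a required field means no tranche can be recorded without a stated justification, which is the transparency goal the answer describes.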

Risk Mitigation, Auditing, Compliance

37. Describe your proposed compliance check mechanisms for your own clients.:
First, we will record our allocation decisions in the Google form tracker, which helps us track allocations clearly. Before signing the next allocation, we will check the form to avoid repeated signing, a bad history, etc. All decisions will be based on sufficient proof.
Regular check-ins: every two weeks, we will check in on all applications we signed that month. We will run the commands to view the CID report and retrieval report and look for any non-compliant behavior. If the number of SPs, SP locations, replication rate, or retrieval rate is not as expected, we will leave a comment and suspend signing the next tranche.
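The check-in described above amounts to comparing report figures against the distribution requirements stated earlier in this application (5+ replicas, 3+ regions, 5+ SP owners, no SP above 20% of a deal). A minimal sketch follows; the report structure and function names are hypothetical, and the retrieval-rate threshold is left as a parameter because the application does not state one.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the Data Distribution answers in this application.
MIN_REPLICAS = 5
MIN_REGIONS = 3
MIN_SP_OWNERS = 5
MAX_SP_SHARE = 0.20


@dataclass
class DistributionReport:
    """Figures a reviewer would read off the CID and retrieval reports (assumed shape)."""
    replicas: int
    regions: int
    sp_owners: int
    max_sp_share: float    # largest single-SP fraction of the deal, 0.0 to 1.0
    retrieval_rate: float  # retrieval success fraction, 0.0 to 1.0


def compliance_issues(report: DistributionReport, min_retrieval_rate: float) -> List[str]:
    """List every requirement the report fails; an empty list means compliant."""
    issues: List[str] = []
    if report.replicas < MIN_REPLICAS:
        issues.append(f"replicas {report.replicas} < {MIN_REPLICAS}")
    if report.regions < MIN_REGIONS:
        issues.append(f"regions {report.regions} < {MIN_REGIONS}")
    if report.sp_owners < MIN_SP_OWNERS:
        issues.append(f"SP owners {report.sp_owners} < {MIN_SP_OWNERS}")
    if report.max_sp_share > MAX_SP_SHARE:
        issues.append(f"largest SP share {report.max_sp_share:.0%} > {MAX_SP_SHARE:.0%}")
    if report.retrieval_rate < min_retrieval_rate:
        issues.append(f"retrieval rate {report.retrieval_rate:.0%} < {min_retrieval_rate:.0%}")
    return issues


def may_sign_next_tranche(report: DistributionReport, min_retrieval_rate: float) -> bool:
    """Signing is suspended whenever any issue is found."""
    return not compliance_issues(report, min_retrieval_rate)
```

The returned list doubles as the text of the comment left on the application when signing is suspended.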

38. Describe your process for handling disputes. Highlight response times, transparency, and accountability mechanisms.:
Currently, the dispute resolution tracker is not rigorous and the process is not accepted by all community members. Disputes are not settled well because everyone holds a different opinion.
To ensure fairness and transparency, a decentralized voting tool should be used for dispute resolution. Anyone interested has the right to vote.
Based on the guidelines and principles of the allocation strategy, anyone with a dispute against another party can submit a poll including the application number, dispute description, proof, and proposed solutions. Whatever the kind of dispute, this method is intended to settle it quickly and in a way community members can accept.
In this way, voting reduces collusion and increases transparency. That makes everybody happy.

39. Detail how you will announce updates to tooling, pathway guidelines, parameters, and process alterations.:
It depends on the severity of the changes. Minor changes, such as tooling updates, parameter or schedule modifications, and software updates, will be announced in a regular allocator conference and shared in the Slack channel. For critical changes, including but not limited to allocation process updates, adding a staking mechanism, or automating the due diligence process, we will draft a proposal on GitHub, collect feedback from the community over a set period, and then evaluate feasibility. In the first half of 2024, we will make announcements on Slack, but we will also grow a Discord community and gradually shift from Slack to Discord.

40. How long will you allow the community to provide feedback before implementing changes?:
Before implementing changes, we plan to allow one month for feedback. This gives the community enough time to discuss and respond.
The basic rules should be pinned in the relevant channels. Governance team members or volunteer ambassadors will be assigned to specific channels as moderators. Bots will be introduced to moderate chats in real time and reduce extreme speech.
A list of frequently asked questions will be prepared and shared in the moderators channel, and updated on a regular basis, such as weekly or biweekly. The Filecoin community values different voices from different backgrounds.

41. Regarding security, how will you structure and secure the on-chain notary address? If you will utilize a multisig, how will it be structured? Who will have administrative & signatory rights?:
The address will be generated by Ledger. Signing is used, but we are considering increasing the number of allocators to 3~4 for signing each tranche. If possible, auto-assigning allocators would be effective in reducing fraud and collusion.

42. Will you deploy smart contracts for program or policy procedures? If so, how will you track and fund them?:
Not for now. But if we later work out a feasible, fully automated allocation plan with no human intervention, smart contracts would be an ideal way to implement it.

Monetization

43. Outline your monetization models for the services you provide as a notary allocator pathway.:
Apart from automating the allocation process as much as possible, preparing a punishment & reward system is necessary.
We don't yet have a detailed plan or formula for calculating the amount of collateral; consensus should be reached before implementing one. It would be great if a staking and slashing collateral strategy were prepared for allocators, clients, and SPs. Raising the cost of cheating is a great way to reduce fraud.

44. Describe your organization's structure, such as the legal entity and other business & market ventures.:
Storify was founded in California, USA. Its entity number is 202251417769. It is mainly engaged in the research and development of distributed storage software and is committed to improving storage efficiency and retrieval, as well as data compression, indexing, and query optimization technologies. Improving the security of distributed storage software is our other focus. Shizhi Gu has worked in distributed software research and development and hardware architecture for many years and has accumulated rich industry experience.

45. Where will accounting for fees be maintained?:
Once a staking & slashing mechanism is introduced, the accounting for fees can be maintained on chain, where it is clear and transparent.

Past Experience, Affiliations, Reputation

46. If you've received DataCap allocation privileges before, please link to prior notary applications.:
N/A

47. How are you connected to the Filecoin ecosystem? Describe your (or your organization's) Filecoin relationships, investments, or ownership.:
We have participated in the Filecoin ecosystem since 2020. We ran a few small nodes at first, then took part in the Slingshot program. We have stored a 20PiB dataset on Filecoin; the SP IDs are f0870558 f01106668 f01315096 f01518369 f01889668 f02131801 f02131855 f02131881. We are very optimistic about Filecoin, will keep investing, and want to grow and expand with the Filecoin ecosystem.

48. How are you estimating your client demand and pathway usage? Do you have existing clients and an onboarding funnel?:
As an experienced SP, we have many friends and acquaintances with strong demand for data storage, whose datasets are generated on a weekly basis. Beyond this, we are going to attract more web2 giants to Filecoin for storage.

@kevzak
Collaborator

kevzak commented Mar 15, 2024

Datacap Request for Allocator

Address

f23cfhgj4q2c475knavu4ulmaiv5wcolgf7vyhola

Datacap Allocated

5PiB

@filplus-bot
Collaborator

The request has been signed by a new Root Key Holder

Message sent to Filecoin Network

bafy2bzaced62itydrqz2zbjqkjgpnt4kdsp7nr45tymnqpzd62xptya33qiwa

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaced62itydrqz2zbjqkjgpnt4kdsp7nr45tymnqpzd62xptya33qiwa

@galen-mcandrew
Collaborator

Datacap Request for Allocator

Address

f23cfhgj4q2c475knavu4ulmaiv5wcolgf7vyhola

Datacap Allocated

10PiB

@shizhigu
Author

@galen-mcandrew Thank you for your patience in reviewing and for your pertinent comments. We will improve the operation of the allocator based on your suggestions and look forward to showing our progress next time. Thanks again.

@filplus-bot
Collaborator

The request has been signed by a new Root Key Holder

Message sent to Filecoin Network

bafy2bzacec6tsfuys6utk66mfequointgmhfe6aqb3xc5vvojvku4pbyyqory

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacec6tsfuys6utk66mfequointgmhfe6aqb3xc5vvojvku4pbyyqory

@shizhigu
Author

Respected Official Governance Team,

I would like to submit an update regarding our allocator application for Fil+ to ensure greater compliance with the program's regulations.

In January 2024, we submitted our first allocator application. At that time, we planned to allocate DataCap to clients in batches according to the following schedule:

The first batch: 5% of the total requested DataCap.
The second batch: 10% of the total requested DataCap.
The third batch: 35% of the total requested DataCap.
The fourth batch: 50% of the total requested DataCap.
Maximum allocation per client: 5 PiB

We have since found that the third and fourth allocations, as well as the 5 PiB maximum per client, seem too high, which makes it harder to monitor the DataCap sealing process effectively. Observing the operations of most allocators in the community, we have noticed that the highest allocation per round is generally capped at 2 PiB. In light of this, and in pursuit of a more compliant and cautious approach, we now request to adjust our allocation plan as follows:

The first batch: 5% of the total requested DataCap.
The second batch: 10% of the total requested DataCap.
The third batch: 20% of the total requested DataCap.
The fourth batch: 40% of the total requested DataCap.
The fifth batch: 80% of the total requested DataCap.
Maximum allocation per client per round: 2 PiB
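To make the revised plan concrete, here is a minimal sketch that computes batch sizes in TiB. The function name is hypothetical. Because the listed percentages add up to more than 100%, the sketch assumes each batch is additionally limited to whatever part of the request is still ungranted; that clamping is an assumption, not something stated in this comment.

```python
from typing import List

TIB_PER_PIB = 1024

REVISED_PERCENTS = [5, 10, 20, 40, 80]   # from the revised plan above
PER_ROUND_CAP_TIB = 2 * TIB_PER_PIB      # 2 PiB per round, from the revised plan


def revised_schedule(requested_tib: float) -> List[float]:
    """Batch sizes under the revised plan.

    Each batch is the smallest of: its percentage of the request, the
    per-round cap, and the amount not yet granted (assumed clamp).
    """
    schedule: List[float] = []
    remaining = requested_tib
    for pct in REVISED_PERCENTS:
        if remaining <= 0:
            break
        grant = min(requested_tib * pct / 100, PER_ROUND_CAP_TIB, remaining)
        schedule.append(grant)
        remaining -= grant
    return schedule
```

Under these assumptions a 1000TiB request is fully granted by the fifth batch, while a 10PiB request reaches only 7680TiB after five batches because the 2 PiB cap binds from the third batch onward.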

This adjustment aims to provide better control over the allocation process while ensuring adherence to the principles of Fil+. We appreciate your understanding and support in this matter.

Best regards,

Storify Team
