
strongest possible reputation system #31

Open · synctext opened this issue Mar 15, 2013 · 44 comments

@synctext (Member)

Idea: build the strongest possible reputation system by combining all existing science.

Problem: our scientists Rahim, Adele, Nitin and Dimitra did not combine their work.
Reason: the optimal publication strategy is one clean and simple idea.
There are no joint publications, no re-use of algorithms, and little collaboration.

We evaluated, in isolation, multiple enhancements that make accounting of contributions stronger and incentive-compatible:

  • max-flow calculation
  • delete old information to reduce storage cost
  • depletion of often-used paths, which provides Sybil resilience at the cost of additional bookkeeping
  • introduce negative trust; these negative edges increase Sybil-attack resilience
  • friendship as an edge property to strengthen the trust overlay against Sybils
  • age as a peer property to strengthen against Sybil attacks
  • biased random walk, taking into account peer and edge properties
  • double-signed signatures (proof-of-work)
  • append-only signatures; the misreporting attack degrades to the weaker white-washing attack
  • running total account of consumed and contributed resources, which degrades the misreporting attack
  • transfer encrypted information; only provide the decode key after the proof-of-work is signed
  • incremental signatures, reducing risk-taking by demanding signatures during transfer
  • use n-hop Max-Path to reduce computational cost
  • reason from the betweenness centrality of a node
  • use full gossip

Reputation system proposals often assume either:

  1. Tamper-proof and scalable storage of information
  2. Scalable storage of information

Option 1 is beyond the state-of-the-art unless one assumes an honest central server with infinite capacity. Option 2 is difficult to achieve. For scalability reasons, the storage layer will always be imperfect and will never offer instant, accurate, and complete recall of information. Thus we need to tolerate limited accuracy and coverage, in addition to the usual misreporting, lying, and Sybil creation.

A reputation system needs to determine (assuming background gossip of trust vectors):
 1. Who to talk to
 2. How to reach them
 3. What to say
 4. How much to trust

Existing work focuses mostly on no. 4, while the others have proven to be
much harder to solve. Nobody has identified them as problems, and no
solutions exist.
Trust calculation is just 50 lines of code, while nos. 1-3 are beyond
the state-of-the-art in epidemic protocols.

Missing topics:
 3. What to say: efficient synchronisation of BarterCast (subgraph) trust vectors

Existing material:
 1. Who to talk to
    Biased random walk
 2. How to reach them
    Churn resilience, state preservation, live overlay, NAT traversal, etc.
 4. How much to trust
    Meulpolder 2008: max-flow
    Rahim: various Sybil-limiters
    Adele: rewards also effort
    Dimitra: betweenness centrality
    Nitin: Sybil-limiters

@ghost assigned synctext Mar 15, 2013
@synctext (Member, Author) commented Apr 3, 2013

Churn and NAT traversal are critical concepts. All peers keep open connections with several neighbors to fix this. We need an algorithm to traverse such live edges.
[figure: BarterCast edges, peers going online/offline]

NielsZeilemaker added a commit that referenced this issue May 13, 2013
Loggers were disabled due to unittests, added disable_existing_loggers =
@synctext added this to the V9.0: Shadow Internet milestone Nov 3, 2014
@synctext (Member, Author) commented Jan 25, 2016

This will be moved earlier in the roadmap. Multi-chain will evolve in numerous small steps into this envisioned perfect system.

  • append-only datastructure
  • payout to seeders
  • isolated Gumby experiment with various peers
  • visualization of the multi-chain graph
  • crawler for dataset collection
  • stress experiment generating a million multi-chain records.
  • deploy inside Tribler
  • payout to hidden seeders and two intermediary proxies (1 private GB : 5 GB of network load)
  • ratio tracking inside GUI
  • keep-alive to top-N barter partners
  • random walk using edge traversal
  • peers start collecting barter records
  • hidden services biased towards reliable bandwidth providers
  • ratio enforcement on proxy requests and hidden seeding

If we go beyond one million users, it is possible the database becomes a bottleneck. We will first analyse, modify, and tune the algorithms. If this proves insufficient, we will evaluate libraries such as an "overkill" approach for billions of edges or a Python graph database (inactive).

@synctext (Member, Author)

branching concept
The two main issues with the current multi-chain design are the exclusive nature of signatures and the need for a block size. We want to get rid of 1) bloat and 2) blocking.

First is the simple addition of compression. Consider the following use case: two peers exchange 1 GByte of data. Multichain gets very bloated if each 25 MByte requires additional signatures and another sequence number in a new record. We can compress entries by allowing the two participants to sign the same record, with the same sequence number and the same participants, but with a higher amount.
Block size matters less when overwriting is allowed, as previous records are replaced. This sadly removes elegance and simplicity, as signing information with the same sequence number no longer constitutes lying.
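
A minimal sketch of this compression idea, with illustrative field names (not the actual MultiChain schema): both participants re-sign a record that keeps its sequence numbers but carries the accumulated amounts.

```python
from dataclasses import dataclass

@dataclass
class Record:
    """One record between two peers (field names are illustrative only)."""
    peer_a: str   # public key of peer A
    peer_b: str   # public key of peer B
    seq_a: int    # sequence number in A's chain
    seq_b: int    # sequence number in B's chain
    up: int       # total bytes A uploaded to B in this record
    down: int     # total bytes A downloaded from B in this record

def compress(current: Record, extra_up: int, extra_down: int) -> Record:
    """Fold another 25 MByte chunk into the existing record instead of
    appending a new one: same participants, same sequence numbers, higher
    amounts. Both peers must re-sign the returned record."""
    return Record(
        peer_a=current.peer_a,
        peer_b=current.peer_b,
        seq_a=current.seq_a,   # sequence number is reused, not bumped
        seq_b=current.seq_b,
        up=current.up + extra_up,
        down=current.down + extra_down,
    )
```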

Non-blocking.
Sequence numbers are no longer unique in this design modification. Each peer keeps a sequence number, but for concurrent events it is re-used. Instead of calling this a double-spending attack, we call it "branch racing". Only re-connected branches are valid. We still want to preserve the single-chain-per-peer design principle.
[photo: whiteboard sketch of the branching example for peers A, B, and C]

Shown above are three peers A, B, and C with their Multi-chain records. Their multichains are depicted starting at sequence numbers 21, 3, and 8 respectively. For peers A and B, two records are replaced with a final 300 / 0 units exchanged; the final record between A and C is 64 / 8 units. The key point is that the final sequence number for the B-C record is 5, merging it back into a single main chain.

Requirement for attack resilience: merging back does not require anybody's cooperation. So use a weld-node: a point where the total up/down is updated and a Merkle hash of both the main branch and the side branch is taken. Only signed by the owner.

@pimotte (Contributor) commented Jan 27, 2016

In my opinion, we should avoid branching at all costs.

The problem we are solving with branching is two-fold: first, the blocking issues on timeouts; second, the infinite growth of the data structure.

The second problem can be easily resolved by having some sort of secondary archive multichain. In this multichain, you are allowed to archive a transaction older than X period of time. This transaction will be your new genesis block and others are not allowed to ask you for earlier transactions (so you can delete them).

The first problem can be resolved by scaling the moment of the signature request by the available bandwidth. Incoming signature requests will never cause a problem for you, only for the other party, if you currently have an unanswered signature request outstanding. Hence, a client should only ask for a signature from someone else if the probability of receiving another request before receiving a response is sufficiently low. (This can be estimated using simple probability/queuing theory.)
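
As a rough illustration of that estimate, assuming signature requests arrive as a Poisson process with rate λ per second and our own request is answered after t seconds, the probability that another request arrives before the response is 1 - e^(-λt). A sketch:

```python
import math

def collision_probability(request_rate_per_s: float, round_trip_s: float) -> float:
    """P(at least one incoming signature request arrives before our own
    outstanding request is answered), assuming Poisson arrivals."""
    return 1.0 - math.exp(-request_rate_per_s * round_trip_s)

def safe_to_request(request_rate_per_s: float, round_trip_s: float,
                    threshold: float = 0.05) -> bool:
    """Only send a signature request if the collision probability is low."""
    return collision_probability(request_rate_per_s, round_trip_s) < threshold

# Example: 0.5 requests/s arriving, 2 s round trip -> P ≈ 0.63, so wait.
print(safe_to_request(0.5, 2.0))  # False
```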

@pimveldhuisen (Member)

On archiving:
Does this not make it possible to lie about your history before time X? Or do we intend to base reputation only on the period from now to X? This might be viable if X is sufficiently large, say a few years, but a long history might be useful for Sybil resilience.

On request timing:
A very elegant idea, but I think it is not guaranteed to work. What if there never comes a time when the probability of receiving another request before receiving a response is sufficiently low?
If a client interacts with N peers at the same time, and these peers are replaced by new peers at a rate of n per minute, the client will need to sign at least n blocks per minute, even if it never signs while a transfer is still ongoing. If the signing round trip takes t minutes, n is limited by 1/t. Let's say a round trip takes two seconds; then n is 30 new peers per minute. There is a sort of load balancing in making busy nodes do the relatively cheap responding and the quiet nodes the relatively expensive requesting, but this limit of n peers per minute is still there on average. Question: is this a viable limit?

@pimveldhuisen (Member)

Alternative concept: half-signed only, or optimistic requesting

Another way of getting around the blocking problem is to base the chain on half-signed blocks only.

  1. You note down your interaction in a block and sign it.
  2. You can then base your next block on the hash of only the block with your signature. This means there is no blocking.
  3. You request your peer to sign the same interaction in his chain. For him too, there is no blocking.
  4. The peer sends his block back to you, giving you irrefutable evidence of the interaction.

The peer can refuse to create the corresponding block on his chain, but this is no different from him refusing to sign your block. You can add fake blocks to your own chain, but without evidence they are not valid.
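
A minimal sketch of this optimistic, half-signed chaining, with hypothetical names and a caller-supplied signing function standing in for real cryptographic keys:

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def create_half_signed_block(chain: list, interaction: dict, sign) -> dict:
    """Steps 1-2: note the interaction, sign it yourself, and link it to the
    previous block by hash. No waiting on the counterparty, so no blocking."""
    prev = block_hash(chain[-1]) if chain else "genesis"
    block = {"prev_hash": prev, "interaction": interaction}
    block["my_signature"] = sign(block_hash(block))
    chain.append(block)
    return block

def attach_counterparty_block(block: dict, their_block: dict) -> None:
    """Steps 3-4: the peer records the same interaction on *their* chain and
    sends it back; keeping it gives irrefutable evidence of the interaction.
    Without such evidence, a block is not counted as valid by others."""
    block["counterparty_evidence"] = their_block
```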

@synctext (Member, Author) commented Feb 4, 2016

ToDo: merge major/minor seq idea, half-signed+revert.

@Captain-Coder

In a session with @synctext, we discussed the half-signed-only and branching options. He was especially concerned about vertical compression (updating transactions) and the difficulty with reusing sequence numbers. This could be solved by introducing a minor sequence number.

[photo: whiteboard sketch of the minor sequence number scheme]

(Alternatively, you could accept a gap in the sequence numbering, as long as it is not decreasing. It's the hashes that provide the real linkage anyway. Same result: you never use a sequence number twice.)
But what do you do if one of the partners suddenly goes offline (or refuses to sign your next block)? Let's say C refuses to sign the last block in this example. @synctext noted that it would be possible to have A update its last block to refer to the last block of the B-C interaction, resulting in this graph:

[photo: whiteboard sketch of the graph after A updates its last block]

This saves you from having to forgo all the credit from interaction B-C. Note that A will only update the block if it is still the head of A's chain, for obvious reasons. If A is not able to cooperate, then you have to accept that one of the branches cannot continue, probably the one with the smallest credit.

I'm thinking that nodes will have to be fair and open about all this parallel stuff and allow "unmergeable" transactions and any replaced/overridden blocks to be harvested by a crawler. Because what if C secretly decides to use 3.1 or 4.0? It would look bad if B denied any knowledge of this block.

Also, this would still combine with PimV.'s half-signed-by-default proposal.

@synctext (Member, Author) commented Feb 4, 2016

design dream 1: if you do not follow the protocol, you never get any of the benefits.
design dream 2: protocol violations by design result in an irrefutable "proof-of-misbehavior".
design dream 3: double spending attacks are easily detected afterwards (prevention==costly).

branching: signing a certificate with the same sequence number (major+minor) constitutes a double-spending attack, leading to instant banning. An elegant and easy mechanism. This is the key reason for sequence numbers: they become a coin that can only be spent once, or represent one truth. Append-only ensures any record freezes past behavior. A single record represents an exact position in the chain, which is impossible without sequence numbers.
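
A sketch of the resulting fraud check, assuming crawled blocks carry (major, minor) sequence numbers and the field names below (illustrative only): two distinct blocks signed by the same key at the same position are themselves the irrefutable proof-of-misbehavior.

```python
def find_double_spend(blocks):
    """Scan crawled blocks for two different blocks signed by the same peer
    with the same (major, minor) sequence number. The conflicting pair is an
    irrefutable proof-of-misbehavior; the signer can be banned instantly."""
    seen = {}
    for block in blocks:
        key = (block["public_key"], block["seq_major"], block["seq_minor"])
        if key in seen and seen[key]["hash"] != block["hash"]:
            return seen[key], block   # the pair itself is the proof
        seen[key] = block
    return None
```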

@pimotte (Contributor) commented Feb 8, 2016

The whole problem with having any branching option is that "design dream 2" mentioned above is lost. The fact that a client has multiple branches means that they cannot be forced to rebase or merge them back into their main branch. This means there exists a "side-branch hiding attack", in which a client cannot be forced to divulge that a side branch exists. To make up for this, there would need to be some mechanism that forces users to merge or rebase branches. However, this means that proofs of misbehavior get much more convoluted.

My proposal for the compression is to be fairly lax with sending signing requests, scaling blocks with both the number of interactions with that person and the bandwidth of the originator.

If during the experimental phase this still turns out to yield too much growth, or too many unsigned/dropped transactions because of the block size, I would be in favour of exploring a secondary data structure where we consolidate some of the information. But I would like to see whether the compression is an actual problem before moving towards those kinds of solutions.

@pimveldhuisen (Member)

For the alpha version of multichain:

  • Do not use branching: the costs seem to outweigh the benefits
  • Non-blocking multichain: base your chain on the hash of only your fields. Collect the other half as evidence
  • Sign a block for the remaining amount on the closure of the circuit

@pimveldhuisen (Member)

Another major flaw that needs fixing before alpha:

  • Currently, when a peer receives a block it will sign it, but not subtract the included amounts from the pending totals. Combined with the sign-anything policy, this means every MB will be signed twice.
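
The fix is a single piece of bookkeeping, sketched here with hypothetical names: deduct the signed amounts from the pending totals at the moment of signing.

```python
def on_block_received(block: dict, pending: dict, sign) -> None:
    """Sign the incoming block, then deduct its amounts from our pending
    (not-yet-signed) totals so they are not signed again in our own request."""
    sign(block)
    pending["up"] -= block["up"]      # without this, every MB is signed twice
    pending["down"] -= block["down"]
```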

@synctext (Member, Author)

future approach:

  • split mechanism and policy
  • tunnel community gives stats to a TrustModule
  • Multichain is a mechanism without a scheduler
  • TrustModule decides to request a signature
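
A sketch of this mechanism/policy split; the class and method names are assumptions for illustration, not the actual Tribler API:

```python
class MultiChain:
    """Mechanism only: can create and exchange signed records, but has no
    scheduler and never decides on its own when to request a signature."""
    def request_signature(self, peer, up_bytes, down_bytes):
        ...  # build the half-signed record and send it to the peer

class TrustModule:
    """Policy: consumes statistics from the tunnel community and decides
    when the mechanism should be triggered."""
    def __init__(self, multichain, threshold_bytes=25 * 1024 * 1024):
        self.multichain = multichain
        self.threshold = threshold_bytes

    def on_tunnel_stats(self, peer, up_bytes, down_bytes):
        # Policy decision: only request a signature once enough traffic
        # has accumulated with this peer.
        if up_bytes + down_bytes >= self.threshold:
            self.multichain.request_signature(peer, up_bytes, down_bytes)
```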

@synctext (Member, Author)

As of today we have a design freeze of multi-chain. It is far from feature-complete, merely a dry run of a concrete design. Accurate accounting of the totals is still out of scope. We will create a stable implementation of this design and include it in Tribler.

@pimveldhuisen mentioned this issue Feb 29, 2016
@Captain-Coder

I was having a good brainstorming session about the "parity for contributions" problem, where the number of relays is not fixed. Here is what I came up with:

Assume a seeder S, who has a proxy in front of him and a rendezvous proxy (Ps and Ps'). The downloader (D) only has a rendezvous proxy (Pd') to hide behind. In the naive approach to payout in this setup, the seeder asks for his true up and down. Each proxy then tacks on its total, and thus the total download cost gets billed to the downloader. There are five downsides to this scheme:

  1. The downloader gets billed for the anonymity of the seeder, since S can control the number of proxies to use on his side (Ps and Ps' in this case). D might be confronted with an unexpectedly large multichain bill; D might have assumed S to only use Ps' and not have Ps (or even more proxies) in between as well.
  2. The payment risk increases down the chain of proxies. If D decides not to pay the bill because it is abnormally large, Pd' has to absorb all the risk. This can seriously affect the balance of Pd'. Given that this problem exists, any proxy along the chain can decide not to sign for fear of having to absorb the risk. By not signing, a single other node in the network will not work with you anymore, but given the abundance of other nodes, this might be a worthwhile trade-off against running a large risk. This fear along the proxy chain is the opposite of the trust we want.
  3. S is exposed as a seeder to Ps (but only after the fact), since S will ask for up&down very close to what Ps has relayed. This is contrary to the assumptions of the tunnel community, namely that a proxy cannot discriminate between a circuit extension and a circuit initiation. On top of that, any node along the chain can make a fair guess as to the length of the circuit leading up to it.
  4. Keep-alive. All the peers (S, Ps, Ps', Pd' and D) have to remain responsive to requests in order for the multichain request to propagate all the way to D to be billed. If a peer disconnects (say Ps'), the whole chain is broken. By design, Ps only knows S and Ps', so there is no way to route around the defunct Ps' to Pd'.
  5. Seeder fraud incentive. If S sends a multichain request for double (or triple) his true upload amount, Ps can only assume that the previous node is a proxy who tacked on its up&down. Thus it is undetectable whether the seeder is getting paid double or not.

Some initial thoughts on remedies might go as follows:

  • To solve problem 3, S might ask for a random odd multiple of its true upload to disguise that it is a seeder. This directly conflicts with problem 5, and again D might be confronted with an unexpectedly large bill.
  • Wait for the whole chain, to solve problem 2. For example, Ps waits on Ps' before signing the request of S. However, this exacerbates problem 4. And in case of non-payment by D, all the proxies along the chain might not sign and thus never cooperate again. The only solution for this is to have a proxy absorb the risk, leading us back to problem 2 again.

To me it is obvious that propagating the true cost of downloading along the chain is neither feasible nor elegant.

An alternative solution outline that I have thought about is this:
(Note: I haven't fully thought out the implications to identity management for the tunnel community)

The S and D can sign directly to each other through the circuit. Next, the relays can sign the relay traffic among themselves (hops Ps-Ps' and Ps'-Pd'). However, this shorts Ps and Pd', since they don't get S-Ps and Pd'-D respectively. These two are the inverse of each other, so perhaps a scheme can be devised that "closes the loop", such that Ps and Pd' sign each other's outstanding up and down. This also disconnects the proxies from the seeder & downloader in multichain. S and D can include themselves in the proxy loop, since they can claim to be just a proxy. If S and D do not include themselves in the proxy loop, they indirectly expose themselves to Ps and Pd' respectively as seeder and downloader; at least there is no public record of this. A downside of this scheme is that the proxies are neutrally balanced, and since there is no public recognition for relaying in this system, there is not much incentive to relay traffic.
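
A small sketch of who signs with whom in this outline, for the example circuit S, Ps, Ps', Pd', D (pure illustration of the pairing, no cryptography):

```python
def signing_pairs(circuit):
    """For circuit [S, Ps, Ps', Pd', D]: S and D sign end-to-end through the
    circuit; adjacent relays sign the relay hops among themselves; the
    outermost relays 'close the loop' with each other for the hops they
    would otherwise miss (S-Ps and Pd'-D are the inverse of each other)."""
    s, *relays, d = circuit
    pairs = [(s, d)]                           # end-to-end through the circuit
    pairs += list(zip(relays, relays[1:]))     # relay-relay hops
    if len(relays) >= 2:
        pairs.append((relays[0], relays[-1]))  # close the loop
    return pairs

print(signing_pairs(["S", "Ps", "Ps'", "Pd'", "D"]))
# [('S', 'D'), ('Ps', "Ps'"), ("Ps'", "Pd'"), ('Ps', "Pd'")]
```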

To fix this, multichain could be extended with a new counter named "relay" and a corresponding "relay_total" (or a secondary multichain instance). The point of this counter would be to give recognition to the relays. However, this does not combine perfectly with the proxy-loop signing scheme, since S and D would then also get relay credits. So exposing S and D to the first and last proxy in the chain might be needed, so that they can be eliminated from the proxy signing loop.

This last bit is a rather fundamental problem: whatever scheme is devised, S and D can always claim to be, or present themselves as, just a proxy; that is the point of anonymity. However, to prevent them from getting credited (or paid, or counted) as a relay, their true role has to be exposed. The conflict between these two facts needs to be resolved. @synctext proposes to do this by having fixed-length proxy chains.

@synctext (Member, Author)

This week the alpha version of multichain got merged. Solid progress: this is two months after the idea of an alpha took root. We assume no redesign is needed. The features for the next Beta 1.0 release are:

  • Limited crawling. Peers start collecting multi-chain records from peers that helped them. Simple linear downloading. Warning: limit the crawling rate to avoid attempting to crawl exit-node servers.
  • Keep-alives. Add resilience to churn and keep open connections in the multi-chain community to 10 random prior peers which helped us (e.g. with upload).
  • Strict checks. Add the @Captain-Coder branch and the new database storage structure with strict checking and audits before storage.

Beta 2.0:

  • Live Edges. Get introduced by peers to the people that helped them. Random walk across live edges of the interaction graph. Use the keep-alive feature to traverse the time-ordered interaction graph and collect multi-chain records. Warning: ensure rate control to keep the database from overflowing.
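
A sketch of such a random walk over live edges; `open_connections()` and `collect_records()` are assumed helper methods for illustration, not existing Tribler APIs:

```python
import random

def live_edge_walk(start_peer, steps=10):
    """Random walk over *live* edges only: at each hop, ask the current peer
    for its open keep-alive connections to prior helpers and pick one at
    random, collecting multi-chain records along the way."""
    peer = start_peer
    records = []
    for _ in range(steps):
        neighbours = peer.open_connections()    # live edges kept by keep-alives
        if not neighbours:
            break                               # dead end: no live edges left
        peer = random.choice(neighbours)
        records.extend(peer.collect_records())  # gather multi-chain records
    return records
```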

@synctext changed the title from "BarterCast4 brainstorm: strongest possible reputation system" to "strongest possible reputation system" Apr 19, 2016
@synctext (Member, Author) commented Feb 6, 2017

Reputation systems are linked to both game theory and smart contracts. Our upcoming 'smarter smart contracts' work requires enforcement mechanisms. This is brought together in self-enforcing protocols.

@synctext (Member, Author) commented Feb 18, 2017

#67 from 2nd April 2013 seems to be the earliest public record of TrustChain / MultiChain / BarterCast3.

@qstokkink (Contributor)

What are your thoughts on making reputation tradable?

Given my current belief system, I would frame this as follows:

  1. You account a transfer of currency for bandwidth tokens.
  2. This is interpreted as useful work by others (how much currency equates to how much bandwidth should be determined somehow; this is outside of my current scope).
  3. As a result, your reputation increases.

So, I would say you do not trade reputation directly: instead, you trade currency in such a way that other honest participants increase the reputation score they hold of you. It all depends on the conversion mechanism from heterogeneous units of accounting to homogeneous units of work[1]. Whether it is fair to say this corresponds to meeting the buyer's price, I don't know -- it is probably subjective: paying your own Sybil is most likely not that impressive to others.

In conclusion, in a roundabout way:

Does this make sense?

Yes.

Lastly:

Are you also considering the allocation policy? Or is the scope limited to determining reputation of others?

Given my current 6 page limit, this will be limited to the latter: determining reputation of others.

[1] For example, you may define 10 Mb download equals -10 work, 10 Mb upload equals +10 work, 1 Bitcoin equals +1000 work. This will probably also fluctuate continuously due to complicating temporal factors, like current market prices or the local legal ramifications of uploading.
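
Written out as a sketch, using exactly the footnote's illustrative rates (which, as noted, would fluctuate in practice):

```python
# Illustrative conversion rates from the footnote; in practice these would
# fluctuate with market prices and other temporal factors.
WORK_RATES = {
    "mb_downloaded": -1.0,    # 10 Mb download = -10 work
    "mb_uploaded":   +1.0,    # 10 Mb upload   = +10 work
    "bitcoin":       +1000.0, # 1 Bitcoin      = +1000 work
}

def work_score(events: dict) -> float:
    """Convert heterogeneous units of accounting into homogeneous work."""
    return sum(WORK_RATES[unit] * amount for unit, amount in events.items())

print(work_score({"mb_uploaded": 100, "mb_downloaded": 40, "bitcoin": 0.5}))
# 100 - 40 + 500 = 560.0
```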

@synctext (Member, Author)

I'm currently not quite happy yet with the interconnection between the theorems: they do not work to some grand conclusion

So, the hard work now begins of proving some strong property that results from this neat model. For instance: if the number of fraudsters/Sybils is limited to 🔣, it is guaranteed that cooperation will emerge. Or: even with a tiny minority which is honest, we prove that the ecosystem eventually converges; honest peers find each other, collaborate, and can sustainably withstand continuous Sybil or fraud attacks of any kind.

@qstokkink (Contributor)

Small update: I'm now referring to the accounting mechanism as the accountability mechanism instead. That should lead to less confusion.

@devos50 (Contributor) commented Jul 16, 2020

An "accounting mechanism" differs from an "accountability mechanism". Accountability, in my opinion, is the extent to which someone can be held responsible/accountable for their actions and behaviour after-the-fact. Accounting, however, is the recording of actions in a community as they are performed. Given these (informal) definitions, the term "accounting mechanism" seems to align better with reputation/trust.

@qstokkink (Contributor)

I mostly agree, but I believe accounting has the subtle undertone of being a log of things holding value that are countable and transferable, which is not necessarily true for all events in the system. This is why I'm trying to find another term.

The proper terminology would probably be an eventually consistent log technology, but this is not really marketing-compatible 😃

@ichorid added the Epic label Jul 17, 2020
@synctext (Member, Author)

next step could be a list of deployed reputation systems:

@synctext (Member, Author)

Key related work: A Taxonomy to Express Open Challenges in Trust and Reputation Systems.
It lists all the challenges with in-depth analysis: trust value representation, information source, Sybil attacks, white-washing, slander, etc. Insightful comments on future work from 2012, still unsolved in 2020:

Proposals from the academic community are not always deployable and are usually designed from
scratch. Only in a very few cases do authors build on proposals from others. Hence, there is a
need for a set of sound, accepted principles for building trust and reputation systems. The design
space and limitations of mediated trust and reputation mechanisms should be explored and a set of
design parameters that work best in different settings should be understood. Formal models
of those systems in both monopolistic and competitive settings should be developed.


@qstokkink (Contributor)

Update: the #31 (comment) paper is now published. https://doi.org/10.1145/3428662.3428790

@synctext (Member, Author) commented Apr 21, 2021

New axiomatic principles:

  • past actions are known, recorded tamper-proof, shared, and widely available.
  • the present is independent, unwritten, and uncertain

inspiration: "Pragmatically: the past is shared; the present is independent", from the Jepsen intro to distributed systems.
Agent-based model of 'open collaboration' with 3 user types: cooperators, reciprocators, and freeriders. Message: "it" works even when cooperators are a small minority.
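
A toy rendition of such a model, heavily hedged: the encounter rule and the reciprocator heuristic below are invented for illustration and are simpler than the cited work's model.

```python
import random

def simulate(n=100, coop=0.1, recip=0.6, rounds=1000):
    """Toy agent-based model with three strategies. Cooperators (C) always
    help; reciprocators (R) help a peer only if that peer helped someone in
    the previous round; freeriders (F) never help. Illustrative only."""
    types = (["C"] * int(n * coop) + ["R"] * int(n * recip)
             + ["F"] * (n - int(n * coop) - int(n * recip)))
    helped_last = [True] * n          # optimistic start
    total_help = 0
    for _ in range(rounds):
        helped_now = [False] * n
        for i in range(n):
            j = random.randrange(n)   # random encounter
            gives = types[i] == "C" or (types[i] == "R" and helped_last[j])
            if gives:
                helped_now[i] = True
                total_help += 1
        helped_last = helped_now
    return total_help / rounds        # average helps per round

print(simulate())  # cooperation persists despite only 10% cooperators
```

With these invented parameters the helping fraction settles near coop / (1 - recip), i.e. roughly 25% of agents per round, illustrating how a small cooperating minority can sustain system-wide cooperation.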

@synctext (Member, Author) commented Sep 21, 2021

Related work (non-axiomatic):

@synctext (Member, Author) commented Mar 10, 2022

Database support for our axiomatic principles:

  • past actions are known, recorded tamper-proof, shared, and widely available.
  • the present is independent, unwritten, and uncertain

For 2023 I hope our lab is experienced enough to build a new (self-organising) distributed database paradigm @kozlovsky. Something on top of Pony, with integral reputation/trust tracking, something like Trustchain for each content insert, everybody running a "full node", and uncompromisingly permissionless. Like the perfect blend of Bitcoin with an ORM. The reputation function of content is used to determine whether information is actively suppressed or actively spread: Trustworthy_Gossip(suppress, delete, store, collect, spread), see trustworthy gossip.
An inspirational example appeared online this week; it is interesting to see what people are trying. Good to identify the weakness, as this will never scale beyond a demo: it cannot handle spam, fraud, and fake identities. Fake identities are huge; people do not always grasp the scale of these attacks. More fake accounts sign up for Facebook each year than real humans!

@qstokkink (Contributor)

To avoid making the same mistakes twice, two short post-mortems of our previous work:

  • Dispersy store_update_forward principle.
    Status: retired.
    Reason: by integrating the "store, update, forward" principle into all packets, Dispersy is 20 times slower at handling packets than IPv8, which does not integrate this principle into its network layer. Secondarily, it became nigh-impossible to escape the principle and build optimized synchronization primitives (e.g., TTL-based packets for MultiChain).
    Lesson: store packets sparingly and purposefully as a result of overlay logic, i.e., do not integrate generic database synchronization into the network layer.
  • IPv8's store, update, forward, vote, purge, delete paradigm.
    Status: never used.
    Reason: so far, existing communities have never fit this principle very well, requiring boilerplate code. More appropriate (i.e., smaller and faster) ways to synchronize information are usually available (e.g., Trustchain forwarding, GigaChannel synchronization and VSIDS-based channel votes).
    Lesson: a general approach for gossiping (thus far) has never been as useful for Tribler in comparison with approaches tailored to a specific use case.

@synctext (Member, Author) commented Mar 11, 2022

True! Application-specific gossip forwarding functions: generic approaches never get used or are inefficient.
Maybe the same holds for reputations: application-specific reputation functions. Generics just don't work.

@synctext (Member, Author) commented Mar 2, 2023

The 3 Es of Sybil resistance are: 1) entry cost, 2) existence cost, 3) exit cost.
Consensus-based approach: see the May 2015 related work at SF Bitcoin Devs, "Asynchronous Byzantine Consensus Suitable for Decentralized Networks", by the founder of a proof-of-useful-work VC-funded startup filled with former IBM people {dfinity}.
