This repository has been archived by the owner on Sep 20, 2023. It is now read-only.

Improvements to Event Cache #1947

Closed
nick opened this issue Apr 5, 2019 · 16 comments
Labels
dapp discussion enhancement New feature or request help wanted Extra attention is needed javascript P2 Small number of users are affected, major cosmetic issue

Comments

nick (Contributor) commented Apr 5, 2019

The Origin marketplace dapp is able to function with only two dependencies: an Ethereum node and an IPFS server. Every time a user takes an action, such as creating a listing or making an offer, an event with the following structure is emitted on the Marketplace contract:

Event(indexed party, indexed listingId, indexed offerId, ipfsHash)

The first 3 parameters, as well as the name of the event, are 'indexed', meaning they can be filtered on when requesting data from the Ethereum node. These indexed parameters are known as 'topics': topic 0 is the hashed event name, and topics 1-3 are the parameters marked as 'indexed'.

Having events indexed in this way is useful, as we can easily request events pertaining to a certain individual or to a particular listing or offer.

For example, to retrieve all events related to listing 345, we would issue a query for all Marketplace events with topics *, *, 345, * (any event name, any party, listing 345, and any offer). To retrieve all events created by a particular user, we would query for *, userWallet, *, *. To retrieve all listings created by a particular user we would query for ListingCreated, userWallet, *, *.
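These wildcard queries map directly onto eth_getLogs topic filters, where `null` plays the role of "*". The sketch below (not Origin's actual code; the helper names are made up) shows how the listing and party queries above could be encoded, with indexed uint and address parameters left-padded to 32 bytes:

```javascript
// Sketch of building eth_getLogs topic arrays for the wildcard queries above.
// null = "*" wildcard; indexed values are left-padded to 32 bytes (64 hex chars).
function pad32(hexOrNumber) {
  const hex =
    typeof hexOrNumber === 'number'
      ? hexOrNumber.toString(16)
      : hexOrNumber.replace(/^0x/, '')
  return '0x' + hex.padStart(64, '0')
}

// Topic layout on the Marketplace contract:
// [topic0 = hashed event signature, party, listingId, offerId]
function topicsForListing(listingId) {
  return [null, null, pad32(listingId), null] // *, *, listingId, *
}

function topicsForParty(wallet) {
  return [null, pad32(wallet), null, null] // *, party, *, *
}
```

These arrays could then be passed as the `topics` field of a `web3.eth.getPastLogs({ address, fromBlock, toBlock, topics })` call; pinning topic 0 to a specific event signature hash additionally restricts the event name.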

In order to build up the current state of a listing or offer, Origin has an event source module which, given a listing ID and/or offer ID will first fetch all related events from the Ethereum node, then pull all related data from IPFS, then construct the current 'state' of the listing or offer by using what is essentially map/reduce.
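The map/reduce step could be sketched roughly like this (the event names and payload shapes here are illustrative, not Origin's exact schema):

```javascript
// Minimal sketch of building up listing state by reducing over its events.
// Each event is assumed to carry its already-fetched IPFS data as `ipfsData`.
function reduceListing(events) {
  return events.reduce((state, e) => {
    switch (e.event) {
      case 'ListingCreated':
        return { ...state, ...e.ipfsData, status: 'active' }
      case 'ListingUpdated':
        return { ...state, ...e.ipfsData }
      case 'ListingWithdrawn':
        return { ...state, status: 'withdrawn' }
      default:
        return state
    }
  }, {})
}
```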

This presents several problems:

  • There can be a lot of duplicate events, e.g. requesting events for a particular user or listing can result in the same event being returned twice. It would be better if we only requested data we know we don't already have.
  • Requesting events from long ago is inefficient due to the Ethereum node having to look in every single block to see if any events were emitted on the given contract. In order to combat this, events need to be queried in batches.

In order to address these issues, we have an eventCache module which does the following:

  • Requests all events ever emitted on the Marketplace contract in batches of 20,000 blocks
  • Stores those events in localStorage and optionally IPFS.
  • When the user comes back to the app, events are loaded from localStorage and/or IPFS if available. The last queried block in the cache is compared with the current block number from the Ethereum node, and all events between those block numbers are queried for and appended to what is already cached locally.
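The batched catch-up logic described above could be sketched as follows (the cache shape and the fetchEvents helper are assumed names, not the actual eventCache internals):

```javascript
// Sketch: fetch all events since the last cached block in fixed-size batches.
const BATCH_SIZE = 20000

async function catchUp(cache, latestBlock, fetchEvents) {
  // cache = { events: [], lastQueriedBlock: n }; fetchEvents(from, to) wraps
  // an eth_getLogs / getPastEvents call for that block range.
  let from = cache.lastQueriedBlock + 1
  while (from <= latestBlock) {
    const to = Math.min(from + BATCH_SIZE - 1, latestBlock)
    const batch = await fetchEvents(from, to)
    cache.events.push(...batch)
    from = to + 1
  }
  cache.lastQueriedBlock = latestBlock
  return cache
}
```

On subsequent page loads, `cache.lastQueriedBlock` would be restored from localStorage/IPFS and `latestBlock` taken from `web3.eth.getBlockNumber()`.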

Although this has worked well so far, storing every single event locally in the browser is not
scalable. The easy way out would be to simply request data from a centralized server... but we should try to avoid that.

Instead, I think we should make our event cache smarter and more capable. Here are the features I propose for the new event cache:

Store events in IndexedDB
IndexedDB has a higher storage quota than localStorage and could be more performant when querying data, since it supports indexed queries. Our existing solution marshals the JSON into localStorage and then filters it by comparing every item, which is inefficient on large datasets.

Be contract agnostic
The existing eventCache has methods for listings and offers, but we can instead try to be contract agnostic so the same cache can be used by other contracts (such as IdentityEvents) and even other projects.

Only request data we need
We should only query events relating to data the user is interested in rather than getting everything.

@micahalcorn micahalcorn added dapp discussion enhancement New feature or request help wanted Extra attention is needed javascript labels Apr 5, 2019
franckc (Contributor) commented Apr 5, 2019

Thanks for starting this discussion! A few general thoughts:

  • Whatever solution we decide on, I think it's really important to make sure that it will work well on mobile. I'd even push this further and say we should aim to have the DApp be usable on a low-end Android device on a 3G network. At my past job we set up an intentionally throttled wifi network so we could easily test the user experience on a slow connection. It was eye-opening! :)
  • IndexedDB is definitely superior to localStorage. Is it (yet) universally supported on all the devices/browsers we care about?
  • It would be interesting for us to do some basic projections, based on the number of listings in the marketplace, of the amount of data we need to store and the expected number of network calls. In particular, I'd hate for us to invest time implementing a solution that works for up to 10k listings but breaks past that; in that case we should leapfrog to a solution that gets us to 100k listings now, even if it takes a bit more work.

@mikeshultz mikeshultz self-assigned this Apr 7, 2019
mikeshultz (Contributor) commented Apr 7, 2019

Just starting to look at this. For what it's worth, IndexedDB support is pretty solid.

Have a few questions before I start on this:

  • The eventCache module is part of @origin/graphql. Was that provided as an example, expected to be implemented as part of @origin/eventsource, or is @origin/graphql actually a client-side app?
  • @origin/eventsource appears to be implemented by server-side apps (e.g. @origin/graphql(?) and origin-discovery). I assume this needs to be taken into account and there should be some kind of fallback, or is that what the IPFS support was intended for?
  • I noticed there are no unit tests. What's the best/simplest way to test the @origin/eventsource package?
  • Caching events shouldn't be a big deal at 100k events; back-of-the-envelope math suggests ~10MB. However, pulling in listing and offer data from IPFS (e.g. images) could grow this to ridiculous sizes. Should that data be cached, or are we just trying to save requests to JSON-RPC for Ethereum events? IPFS caching might be best left to the IPFS daemon or a gateway.
  • It looks like @origin/graphql is intended to be run both in-browser and with node?

EDIT: Whoops, I think I misunderstood. You want this implemented in the eventCache module of @origin/graphql, not in eventsource. Those first 3 questions can be ignored, then.

Mobile is a challenge. React Native doesn't support IndexedDB. There is, however, react-native-async-storage.

@nick mentioned setting this up as its own @origin/event-cache package with support for Postgres as well. This should all be doable, it'll just require some mega-abstraction.
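As a sanity check of the ~10MB back-of-the-envelope estimate above, the arithmetic works out as follows (the ~100 bytes per serialized event is an assumed figure; real web3 event objects with topics and data are often larger):

```javascript
// Back-of-the-envelope storage estimate for 100k cached events.
const bytesPerEvent = 100     // assumed average serialized size
const eventCount = 100000
const totalMB = (bytesPerEvent * eventCount) / (1024 * 1024)
// totalMB comes out just under 10 (about 9.5)
```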

mikeshultz (Contributor) commented:

Kind of writing up a bit of an API spec to get an idea of what needs to be built. Thoughts?

EventCache API

For the purpose of this document, we'll call the primary class EventCache. This could probably continue to be a function, like the current implementation, attached as a method of web3.eth.Contract. Either way, the API stays the same and should require no changes from dependents.

Usage

// Example from graphql contract module:
context.marketplace.eventCache = new EventCache({
    contract: context.marketplace,
    fromBlock: epoch,
    web3: context.web3,
    config: context.config
})
  • contract - The initialized web3.js Contract
  • fromBlock - The block number to start from when fetching events
  • web3 - An initialized web3.js Web3 object to use for JSON-RPC calls
  • config - A configuration object. See below:

config object

Example config object provided to constructor:

{
    platform: ['browser', 'mobile', 'nodejs', 'ipfs', 'auto'],
    backend: [EventCacheBackend],
    ipfsEventCache: 'QmO0O0O0base64YOO000000....',
    ipfsGateway: 'http://localhost:8080'
}
  • platform tells EventCache which backend to use.
  • backend is an alternative to platform; it overrides any setting there and can be any object implementing the API below.
  • ipfsEventCache: The IPFS hash of the latest known cached results
  • ipfsGateway: The HTTP(S) IPFS gateway to fetch cached results from

NOTE: Notably, this config does not include an IPFS API endpoint, so there is no /add functionality. How will storage happen?

EventCache API

Leaving this pretty much the same as the current implementation to reduce any changes in dependents.

allEvents(eventNames, party, offerIds)

Retrieve all events according to the given args.

listings(listingId, eventName, blockNumber)

Get all listing events, given the parameters

offers(listingIds, offerId, eventNames, notParty, isParty)

Get filtered offers

EventCacheBackend API

This will be the storage interface that EventCache uses to store and fetch events. Many backends will need to be created to support the target storage platforms.

Target Backends

NOTE: Are all these needed immediately, or can they be implemented in stages?

  • ipfs (what uses this?)
  • IndexedDB (frontend)
  • PostgreSQL (backend/infra)
  • react-native-async-storage (mobile)
  • Generic in-memory (what would use this?)

setLatestBlock(blockNumber)

Set the latest block number

getLatestBlock()

Get the latest block number known by the backend

addEvent(eventObject)

Add an event to the storage.

eventObject

For the structure of eventObject, see the web3.js event Object.

addBatchEvents(arrayOfEventObjects)

Add a batch of events as an array of event objects (see above). These can be fed directly from a web3.eth.Contract's getPastEvents() call.

get(argMatchObject)

A general-purpose method to fetch events. Since there will be many storage backends, a lot of functionality is hidden behind this method, and each backend will probably have a very different implementation.

argMatchObject

Let's use the ListingCreated event as an example. We want to find any events created by a specific party.

const listingsBy = await backend.get({
    event: 'ListingCreated',
    party: '0x87489293284329adeadfeed000000000000000'
})
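To see how the pieces of the EventCacheBackend interface above could fit together, here is a hypothetical minimal in-memory implementation (method names are taken from the spec; the storage and matching details are illustrative):

```javascript
// Hypothetical in-memory EventCacheBackend implementing the spec above.
class InMemoryBackend {
  constructor() {
    this.events = []
    this.latestBlock = 0
  }
  setLatestBlock(blockNumber) { this.latestBlock = blockNumber }
  getLatestBlock() { return this.latestBlock }
  addEvent(eventObject) { this.events.push(eventObject) }
  addBatchEvents(arrayOfEventObjects) { this.events.push(...arrayOfEventObjects) }
  // Every key in argMatchObject must match; 'event' is the event name, and all
  // other keys are looked up in the event's returnValues (the web3.js shape).
  get(argMatchObject) {
    return this.events.filter(e =>
      Object.entries(argMatchObject).every(([key, value]) =>
        key === 'event' ? e.event === value : e.returnValues[key] === value
      )
    )
  }
}
```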

nick (Contributor, Author) commented Apr 8, 2019

Thanks Mike, here are a few comments:

  1. Let's deal with IPFS caching in @origin/ipfs; no need to worry about that in EventCache.
  2. We can pass the IPFS API endpoint in as part of the config too, for posting the cache to IPFS. We may need to do this in batches at some point in the future, e.g. 1,000 events per IPFS blob.
  3. Don't worry about the react-native backend for now. Since the dapp itself will run in a webview that has IndexedDB support, I don't think we need React Native support initially.
  4. The current EventCache API is pretty confusing, so I'd like to take the opportunity to rework it to be more similar to the web3.js API. I guess this style will only work for contracts like ours that keep indexed parameters in a consistent order. We'd have to pass the parameter names in via the config object, something like this:

orderedParams: [{ name: 'party', type: 'address' }, { name: 'listing', type: 'uint' }, { name: 'offer', type: 'uint' }]

Here's how I'd like to use the API for a Marketplace EventCache:

Get all events for listing 123:
eventCache.getEvents({ listing: 123 })

Get all events pertaining to user 0xabc...:
eventCache.getEvents({ party: '0xabc...' })

Get all events for offer '123-456' (listing 123, offer 456):
eventCache.getEvents({ listing: 123, offer: 456 })

Get all ListingCreated events for user 0xabc...:
eventCache.getEvents({ event: 'ListingCreated', party: '0xabc...' })

Get all OfferCreated events for listing 123 or 456:
eventCache.getEvents({ event: 'OfferCreated', listing: [123, 456] })

Get all Seller initiated events for listing 123:
eventCache.getEvents({ event: ['OfferAccepted', 'OfferRuling'], listing: 123 })

Here's how IdentityEvents would be used:

orderedParams: [{ name: 'account', type: 'address' }]

identityEventCache.getEvents({ account: '0xabc123...' })

It would be nice if we could also offer a web3.js compatible API, though filters would only work when given a specific event:

Get all ListingCreated events for user 0xabc...:
eventCache.getPastEvents('ListingCreated', { filter: { party: '0xabc...' } })

Get all OfferCreated events for listing 123 or 456:
eventCache.getPastEvents('OfferCreated', { filter: { listingID: [123, 456] } })

This is less important and not needed for v1, though worth bearing in mind.

mikeshultz (Contributor) commented Apr 8, 2019

We can skip the params arg and derive that from the provided web3.eth.Contract and event name, should they be needed.

Should be able to mimic the web3.js getPastEvents() method to an extent; however, using topic filters in the eth_getLogs fetch would make keeping track of cached data tricky. Keeping track of what is cached and what still needs to be fetched across a large combination of topics and events isn't a simple task. Maybe we can stick with allEvents for now in the JSON-RPC request, but use the provided topic filters for lookups in the cache storage backend? In that case, we don't really need to limit it to indexed params, either. If this eth_getLogs filtering is an original requirement and I misunderstood, let me know.

For backwards compat, should I still include those calls I have above or scrub them all in favor of getPastEvents()?

nick (Contributor, Author) commented Apr 8, 2019

We can skip the params arg and derive from the provided event names, sure.

We can start off by getting allEvents; however, I think that's only a short-term solution. If we ever get to millions of events, we'll no longer be able to rely on that. It's not an issue right now, but worth thinking about while we're in the design phase. Ideally we'd get to a point where we're only downloading the events needed to show the data the UI needs.

Let's not bother with .listings and .offers and just replace those calls (there aren't too many of them).

@mikeshultz mikeshultz mentioned this issue Apr 9, 2019
mikeshultz (Contributor) commented:

If anyone wants to take an early peek to make sure the API looks good, see #1975. So far it has the in-memory and the IndexedDB backends. I'm going to start the Postgres backend either tonight or tomorrow, but had a couple of questions before I do:

  • Is there a preferred postgres/db library I should use? I saw sequelize referenced in the repo, but only in ops/infra.
  • Should this package bother with connection init, or should the consumer of the package provide the connection?
  • If this package does connection init, should it do the DDL, or just assume the schema is already in place?

I generally prefer having the user provide a DSN and then having the package create the necessary schema. That said, I'd rather stick with the current style of things and stay in line with what's being done now. I'm also not super familiar with what will be using this package, so I'd only be guessing at how it'll be used with the Postgres backend.

nick (Contributor, Author) commented Apr 10, 2019

👆 @franckc @tomlinton @DanielVF

tomlinton (Contributor) commented:

This is looking awesome @mikeshultz.

  1. Yep, go with sequelize.
  2. Sequelize has a config system, but we just always tell it to use the DATABASE_URL env var (see for example). This is what we use for everything, so you can't go wrong with that.
  3. The package should provide its own sequelize migrations. It'll then be up to the packages that use this package to run them.

Great work!

nick (Contributor, Author) commented Apr 10, 2019

FYI, there are now some Puppeteer integration tests on the marketplace dapp that we can use to test the finished implementation. They can be run with npm test inside dapps/marketplace.

mikeshultz (Contributor) commented:

@tomlinton Thanks for the info and links!

@nick Puppeteer looks nice. Right now my IndexedDB tests are mocked using fake-indexeddb. It would be nice to get actual browser testing.

@micahalcorn micahalcorn added the P2 Small number of users are affected, major cosmetic issue label Apr 11, 2019
mikeshultz (Contributor) commented:

@nick I missed your mention of array params in your previous post (which you also just mentioned in Discord). The logic to do that would be a bit spicy. For instance, what would it do for this hypothetical request:

eventCache.getPastEvents('OfferCreated', { filter: { listingID: [123, 456], account: ['0x..1', '0x...2'] } })

Would it match everything as an OR and run 4 separate queries anyway? ANDing all of them would never return results. But maybe you'd need to AND between the first of each pair and the second of each pair.

Maybe it would be better for the consumer of this package to just run separate queries for each, since that's what each backend would be doing internally anyway (except maybe Postgres)?

Maybe a good compromise would be creating a custom method for this purpose. For example...

eventCache.getOr('OfferCreated', 'listingId', [123, 456])

That way it could be limited to one param and be explicit in its behavior.
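The proposed getOr compromise could be sketched as follows (a hypothetical helper; it just fans out one getEvents call per value and concatenates the results, with cache.getEvents assumed to follow the API discussed above):

```javascript
// Sketch of the single-param OR helper: one query per value, results merged.
async function getOr(cache, eventName, param, values) {
  const results = await Promise.all(
    values.map(value => cache.getEvents({ event: eventName, [param]: value }))
  )
  return results.flat()
}
```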

nick (Contributor, Author) commented Apr 12, 2019

The web3.js getPastEvents method will treat arrays as 'OR' I think:

{filter: {myNumber: [12,13]}} means all events where “myNumber” is 12 or 13.

I think if 'event' were passed in, it'd have to work as an 'AND' though... so the following:

eventCache.getEvents({ event: ['OfferCreated', 'OfferFinalized'], party: "0xabc" })

would be event IN ('OfferCreated', 'OfferFinalized') AND party = '0xabc'

eventCache.getEvents({ event: ['OfferCreated', 'OfferFinalized'], party: ["0xabc", "0xdef"] })

would be event IN ('OfferCreated', 'OfferFinalized') AND party IN ('0xabc', '0xdef')

It gets trickier when we want to specify an offer, though, as the offer ID is dependent on the listing ID. Maybe something like:

eventCache.getEvents({ event: ['OfferCreated', 'OfferFinalized'], or: [{ listingID: "1", offerID: "0" }, { listingID: "1", offerID: "3" }] })

would be event IN ('OfferCreated', 'OfferFinalized') AND ((listingID = '1' AND offerID = '0') OR (listingID = '1' AND offerID = '3'))
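These semantics (scalar means equality, array means IN, top-level keys are ANDed, and an `or` key holds sub-filters that are ORed) could be implemented in a backend-agnostic way with a small in-memory matcher like this hypothetical sketch:

```javascript
// Hypothetical filter matcher for the query semantics described above.
function matches(event, query) {
  return Object.entries(query).every(([key, value]) => {
    if (key === 'or') return value.some(sub => matches(event, sub)) // OR clause
    if (Array.isArray(value)) return value.includes(event[key])     // IN
    return event[key] === value                                     // equality
  })
}
```

A Postgres backend would instead translate the same query object into SQL, but the observable behavior should match this reference.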

mikeshultz (Contributor) commented:

Hrm, I didn't notice that in the web3 API. This is going to get pretty dirty, especially outside of the Postgres backend. I'll see what I can do.

nick (Contributor, Author) commented Apr 12, 2019

Dexie.js has good filtering support but might be too heavyweight 🤔

nick (Contributor, Author) commented Apr 24, 2019

Think we can call this done now 🙂

@nick nick closed this as completed Apr 24, 2019