ipfs_indexer

An ipfs indexer / search engine built in rust.

Build notes

On Ubuntu you may need to apt install: libssl-dev.

As of libp2p 0.44.0, it seems to require rust nightly: https://stackoverflow.com/questions/69848319/unable-to-specify-edition2021-in-order-to-use-unstable-packages-in-rust

Did init as a "binary" - not sure if this makes sense, or if other people think we should split this into a library bit and an application bit. I suppose we can always change it later as it grows.

Following this guide for libp2p: https://github.com/libp2p/rust-libp2p/blob/master/src/tutorial.rs

Trying to follow best practices from here: https://doc.rust-lang.org/cargo/commands/cargo-init.html

Adding Cargo.lock to version control - it seems like it might be best practice for a binary (app): https://stackoverflow.com/questions/62861623/should-cargo-lock-be-committed-when-the-crate-is-both-a-rust-library-and-an-exec

Building on mac

Requirements: xcode

Run xcode-select --install if you do not have xcode installed, need to update xcode, or run into xcode related build errors

Running with logging output

Run cargo build to build
Run RUST_LOG=info ./target/debug/ipfs_indexer to see logging output (adjust level accordingly)
Run RUST_LOG=info ./target/debug/ipfs_indexer 127.0.0.1:8080 to use your own ipfs gateway instead of ipfs.io

By default runs an endpoing on 0.0.0.0:9090 so you can go to

Running with docker

From the docker directory, run docker-compose up. Currently image is only ~26MB.

CI

I setup a workflow that should run a build at least on push, but doesnt run any tests because I have no idea how test suites work yet for rust.

What Needs to be Done

Discover content to be indexed, add them to the index queue
- Listen in on the gossip protocol Jason working on
- Start from some collection of pages on ipfs.io/ipfs
Implement an index queue processor
- Fetch the ipfs content
- Process the page for more ipfs links, Add those links into the index queue
- Index the pages somehow
  - Ranked keywords by frequency or something?
  - Need to update to support more than just html content (look at header and index files)
  - Update the except to be flexible - for images, it could be a small crop render of the original image, for videos it could be a gif preview render
- Store the index somehow (start with in-memory, then figure out how to do storage later) - Conor working on
  - A hashmap of map[keyword] -> sorted tree where the entries are sorted by keyword frequency and entries contain ipfs hash? - Conor working on
  - Will probably want to think of ejection mechanism sooner than later so we can eject to storage (least recently used? oldest? who knows?)
  - Farther out - need to think about how the store will be designed
- Probably also want to store an excerpt, page title of the page to present to front-end
Implement a backend API which a future front-end can use, and in short term we can use to debug
- search -> search result
  - ordered list of <title, link, excerpts>, possibly grouped by text, images, videos, other
- stats:
  - indexed entries
  - outstanding index queue
  - memory used / free
  - stoage used / free
Implement a front-end which queries the index storage and displays the page title, ipfs/io/ipfs link to the page and excerpt from the browser
Feedback loop from what people click on more often to rank those higher
Might a docker container:
- deploy will auto restart itself on crash (will also make it easy to see consumed memory with docker stats and other tools)
- will be able to deploy with a local ipfs instance all ready to go within the container
- can artificially restrict memory so we can test things like ejection mechanisms
Farther out -> hook into papertrail or some logging service so we can see what's up if it dies
Tests! - Conor working on

Name		Name	Last commit message	Last commit date
Latest commit History 283 Commits
.github/workflows		.github/workflows
docker		docker
src		src
.dockerignore		.dockerignore
.gitignore		.gitignore
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md
renovate.json		renovate.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ipfs_indexer

Build notes

Building on mac

Running with logging output

Running with docker

CI

What Needs to be Done

About

Releases

Packages

Contributors 4

Languages

License

compscidr/ipfs_indexer

Folders and files

Latest commit

History

Repository files navigation

ipfs_indexer

Build notes

Building on mac

Running with logging output

Running with docker

CI

What Needs to be Done

About

Topics

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages