
⛰️ caibg

Input data

Two custom scrapy spiders crawl the rifugi-bivacchi and sentieri subdomains (running the commands below requires the scrapy Python package)

pushd scrapy
scrapy crawl caibg_rifugi   --nolog -O ../data/caibg-rifugi.json
scrapy crawl caibg_sentieri --nolog -O ../data/caibg-sentieri.json
popd
jq -s '.[0] + .[1]' data/caibg-rifugi.json data/caibg-sentieri.json \
   | jq '{"objects": .}' \
   > data/caibg.json

The jq step glues the two results together into caibg.json, a graph of the links between these two subdomains, ready to be parsed by pgrank
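For readers without jq, the merge step can be sketched in Python. This is only an illustration of what the jq pipeline does (concatenate the two JSON arrays, then wrap them under an "objects" key); the function name `merge_crawls` is hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def merge_crawls(rifugi_path, sentieri_path, out_path):
    """Concatenate two JSON arrays and wrap them as {"objects": [...]}."""
    rifugi = json.loads(Path(rifugi_path).read_text(encoding="utf-8"))
    sentieri = json.loads(Path(sentieri_path).read_text(encoding="utf-8"))
    # Equivalent to: jq -s '.[0] + .[1]' a.json b.json | jq '{"objects": .}'
    merged = {"objects": rifugi + sentieri}
    Path(out_path).write_text(json.dumps(merged, indent=2), encoding="utf-8")
    return merged


if __name__ == "__main__":
    merge_crawls(
        "data/caibg-rifugi.json",
        "data/caibg-sentieri.json",
        "data/caibg.json",
    )
```

The sketch assumes both crawl outputs are JSON arrays, which is what scrapy's JSON feed export writes.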

Output data

pgrank data/caibg.json data/caibg.csv

The result is a CSV file containing the PageRank values in the same order as the input URL nodes in the JSON file. That's it: pgrank handles only the compute-intensive task, and further analysis can be delegated to a friendlier framework such as pandas
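Because the CSV rows follow the order of the input nodes, ranks and URLs can be paired by position. The sketch below uses only the standard library and rests on assumptions that are not confirmed by this README: that each entry under "objects" carries its address in a `url` field, and that the CSV holds one PageRank value per row in the first column with no header. The function name `load_ranked_urls` is hypothetical.

```python
import csv
import json


def load_ranked_urls(json_path, csv_path, url_key="url"):
    """Pair each input URL with its PageRank value, highest rank first."""
    with open(json_path, encoding="utf-8") as fh:
        # Assumption: every object exposes its URL under `url_key`.
        urls = [obj[url_key] for obj in json.load(fh)["objects"]]
    with open(csv_path, newline="", encoding="utf-8") as fh:
        # Assumption: one value per row, first column, no header line.
        ranks = [float(row[0]) for row in csv.reader(fh) if row]
    if len(urls) != len(ranks):
        raise ValueError(f"{len(urls)} urls but {len(ranks)} ranks")
    return sorted(zip(urls, ranks), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for url, rank in load_ranked_urls("data/caibg.json", "data/caibg.csv")[:10]:
        print(f"{rank:.6f}  {url}")
```

The length check is there because positional pairing silently misaligns if the two files come from different runs.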

To make the results more readable, caibg.py creates a summary markdown table, caibg.md

Note

requirements: pandas tabulate
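As a rough idea of what such a summary table involves, here is a minimal standard-library sketch that renders (url, pagerank) pairs as a markdown table. It is not the actual caibg.py; the function name `markdown_table`, the column headers, and the six-decimal formatting are all assumptions. With pandas, `DataFrame.to_markdown` (which relies on tabulate) produces a comparable table in one call.

```python
def markdown_table(ranked, headers=("url", "pagerank")):
    """Render (url, rank) pairs as a two-column markdown table."""
    lines = [
        f"| {headers[0]} | {headers[1]} |",
        "| --- | ---: |",  # right-align the numeric column
    ]
    lines += [f"| {url} | {rank:.6f} |" for url, rank in ranked]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy values for illustration only, not real results.
    print(markdown_table([("https://example.org/a", 0.5)]))
```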

An interactive cytoscape.js network view is available at https://andros21.github.io/pgrank/caibg/

Note

cyto/data.json can be created using cyto.py
requirements: pandas networkx
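To illustrate the kind of data cytoscape.js consumes, the sketch below converts crawl objects into the cytoscape.js elements layout (`nodes` and `edges`, each wrapped in a `data` object). It is not the actual cyto.py: the function name `to_cytoscape_elements` and the `url` and `links` field names are hypothetical, and the real cyto/data.json may carry more attributes. networkx also ships a `cytoscape_data` helper that exports a graph in a similar shape.

```python
import json


def to_cytoscape_elements(objects, url_key="url", links_key="links"):
    """Build cytoscape.js-style elements from crawl objects."""
    known = {obj[url_key] for obj in objects}
    nodes = [{"data": {"id": obj[url_key]}} for obj in objects]
    edges = [
        {"data": {"source": obj[url_key], "target": target}}
        for obj in objects
        for target in obj.get(links_key, [])
        if target in known  # skip links that leave the crawled set
    ]
    return {"nodes": nodes, "edges": edges}


if __name__ == "__main__":
    # Toy input for illustration only.
    toy = [{"url": "a", "links": ["b"]}, {"url": "b", "links": []}]
    print(json.dumps(to_cytoscape_elements(toy), indent=2))
```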
