
lexingxie/academic-graph


This repo contains scripts that process the Microsoft Academic Graph (MAG) to profile the citation influence and reference heritage of a publication venue (e.g., a conference).

Developer workflow to analyze a new conference/venue

  1. Prepare Papers.db (once for each new version of the MAG data): first run prune_papers.ipynb.
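The notebook's internals are not shown here. As a rough sketch of what such a pruning pass could look like, the Python below streams Papers.txt and keeps only (id, year, venueid), writing comma-separated lines that fit the `paper_pruned` table used in the import step. The column indices assume the 2016 MAG `Papers.txt` layout (tab-separated, 11 fields), and preferring the conference series ID over the journal ID is an assumption, not necessarily what prune_papers.ipynb does.

```python
from typing import Iterable, Iterator, Tuple

# ASSUMPTION: 2016 MAG Papers.txt layout, tab-separated, 0-based columns:
# 0 = paper ID, 3 = publish year, 8 = journal ID, 9 = conference series ID.
# Verify these against the MAG release you downloaded.
PAPER_ID, YEAR, JOURNAL_ID, CONF_ID = 0, 3, 8, 9


def prune_papers(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (id, year, venueid) for every well-formed paper record.

    venueid is the conference series ID if present, otherwise the journal ID,
    otherwise "" (unknown venue). This preference order is an assumption.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= CONF_ID:
            continue  # skip truncated or malformed rows
        venue = fields[CONF_ID] or fields[JOURNAL_ID]
        yield fields[PAPER_ID], fields[YEAR], venue


def write_pruned(src_path: str, dst_path: str) -> None:
    """Stream Papers.txt into comma-separated id,year,venueid lines."""
    with open(src_path, encoding="utf-8") as src, open(
        dst_path, "w", encoding="utf-8"
    ) as dst:
        for row in prune_papers(src):
            # IDs and years contain no commas, so a plain join is safe here.
            dst.write(",".join(row) + "\n")
```

For example, `write_pruned("data_txt/Papers.txt", "data_txt/Papers_pruned.txt")` streams the file line by line, so the 120M-row input never has to fit in memory.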

Then import the result into SQLite:

sqlite> create table paper_pruned(id TEXT, year INTEGER, venueid TEXT);
sqlite> .separator ","
sqlite> .import ./data_txt/Papers_pruned.txt paper_pruned

Alternatively, run:

sqlite3 Papers.db < paperdb.sql

Note: in the Jan 2016 release, 75M+ of the 120M papers have unknown venues; in the Apr 2016 release, 73M+ of 126M do.
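The same import can also be done from Python's built-in sqlite3 module, which makes it easy to reproduce the unknown-venue counts in the note above. A minimal sketch, assuming Papers_pruned.txt has exactly three comma-separated fields per line and that an empty venueid means the venue is unknown; `read_pruned`, `load_pruned`, and `venue_coverage` are hypothetical helper names.

```python
import sqlite3
from typing import Iterable, Iterator, Optional, Tuple

Row = Tuple[str, Optional[int], str]  # (id, year, venueid)


def read_pruned(path: str) -> Iterator[Row]:
    """Parse lines of the assumed form id,year,venueid."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            pid, year, venue = line.rstrip("\n").split(",")
            yield pid, int(year) if year else None, venue


def load_pruned(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """Create paper_pruned (same schema as the sqlite shell commands) and bulk-insert rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS paper_pruned(id TEXT, year INTEGER, venueid TEXT)"
    )
    conn.executemany("INSERT INTO paper_pruned VALUES (?, ?, ?)", rows)
    conn.commit()


def venue_coverage(conn: sqlite3.Connection) -> Tuple[int, int]:
    """Return (papers whose venue is empty or NULL, total papers)."""
    unknown, total = conn.execute(
        "SELECT SUM(venueid IS NULL OR venueid = ''), COUNT(*) FROM paper_pruned"
    ).fetchone()
    # SUM over an empty table is NULL, so fall back to 0.
    return unknown or 0, total
```

Usage: open `conn = sqlite3.connect("Papers.db")`, call `load_pruned(conn, read_pruned("./data_txt/Papers_pruned.txt"))`, then `venue_coverage(conn)` returns the (unknown, total) pair.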

  2. Get the venue's citing and cited records (~30 min): python export_citations.py WSDM
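Conceptually, this step scans the citation list once and keeps two sets of edges: references made by the venue's papers and citations received by them. A minimal sketch of that split, assuming MAG's `PaperReferences.txt` layout of one `citing_id<TAB>cited_id` pair per line and a set of the venue's paper IDs (for instance, the first column of papers.WSDM.txt); `split_citations` is a hypothetical name, not a function from export_citations.py.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (citing paper ID, cited paper ID)


def split_citations(
    reference_lines: Iterable[str], venue_ids: Set[str]
) -> Tuple[List[Edge], List[Edge]]:
    """Return (references, citations) for a venue in one pass.

    references: edges where a venue paper is the citing paper.
    citations:  edges where a venue paper is the cited paper.
    An edge between two venue papers appears in both lists.
    Assumes each line is "citing_id<TAB>cited_id".
    """
    references: List[Edge] = []
    citations: List[Edge] = []
    for line in reference_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue  # skip malformed rows
        citing, cited = parts[0], parts[1]
        if citing in venue_ids:
            references.append((citing, cited))
        if cited in venue_ids:
            citations.append((citing, cited))
    return references, citations
```

Using a Python `set` for the venue IDs keeps each membership check constant-time on average, which matters when the reference file has hundreds of millions of lines.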

Prep step (already done inside export_citations.py): get the subset of papers published at the venue:

xlx@braun:/data2/xlx/MicrosoftAcademicGraph$ grep WSDM data_txt/ConferenceSeries.txt
42C7B402 WSDM Web Search and Data Mining
xlx@braun:/data2/xlx/MicrosoftAcademicGraph$ grep 42C7B402 data_txt/Papers.txt > papers.WSDM.txt
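`grep` matches its pattern anywhere in a line, so a title or another ID field containing the same characters would also be picked up, and `grep WSDM` may return more than one series. A stricter sketch compares whole fields instead. It assumes `ConferenceSeries.txt` is tab-separated as series ID, short name, full name (consistent with the output above) and that the conference series ID sits in column 9 (0-based) of `Papers.txt`, as in the 2016 MAG layout; both helper names are hypothetical.

```python
from typing import Iterable, Iterator, Optional

# ASSUMPTION: 0-based column of the conference series ID in Papers.txt
# (2016 MAG layout); check it against your release.
CONF_ID_COL = 9


def find_series_id(series_lines: Iterable[str], short_name: str) -> Optional[str]:
    """Return the ID of the series whose short name equals short_name exactly.

    Expects tab-separated lines: series ID, short name, full name.
    """
    for line in series_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[1] == short_name:
            return fields[0]
    return None


def venue_papers(paper_lines: Iterable[str], series_id: str) -> Iterator[str]:
    """Yield the Papers.txt lines whose conference series column equals series_id."""
    for line in paper_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > CONF_ID_COL and fields[CONF_ID_COL] == series_id:
            yield line
```

With the files on disk, `find_series_id(open("data_txt/ConferenceSeries.txt"), "WSDM")` should return the series ID, and `venue_papers(open("data_txt/Papers.txt"), sid)` streams only the matching rows.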

  3. Do the necessary joins (can take a few hours): python construct_citation_table.py MM
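The script's exact joins are not described here. One plausible example of such a join, sketched with Python's sqlite3 module, attaches each cited paper's venue from `paper_pruned` and counts references per venue, which gives a rough "reference heritage" profile. The `citations(citing, cited)` table is a hypothetical name for the pairs exported in step 2, and `cited_venue_profile` is likewise illustrative.

```python
import sqlite3
from typing import List, Tuple


def cited_venue_profile(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """Count the venue's outgoing references per cited venue, most-cited first.

    Assumes tables citations(citing TEXT, cited TEXT) holding the venue's
    reference pairs and paper_pruned(id TEXT, year INTEGER, venueid TEXT).
    References to papers with an unknown (empty or NULL) venue are skipped.
    """
    # The join looks up one paper_pruned row per citation pair, so index the key.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_paper_pruned_id ON paper_pruned(id)")
    return conn.execute(
        """
        SELECT p.venueid, COUNT(*) AS n
        FROM citations AS c
        JOIN paper_pruned AS p ON p.id = c.cited
        WHERE p.venueid <> ''
        GROUP BY p.venueid
        ORDER BY n DESC, p.venueid
        """
    ).fetchall()
```

Joining on `c.citing` instead of `c.cited` (with the incoming-citation pairs) would give the mirror-image "citation influence" profile. Building the index once on a 120M-row table is slow, but later profile queries can reuse it.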

About

data amusement on the microsoft academic graph
