Extract a table of contents from a research article in pdf format

JohannLac/pdfContentsExtractor

Extract PDF Contents

A small Python program that extracts a table of contents from a research article in PDF format. Section headings must have the form "x.x Title of the section", where x.x is the number of the current (sub)section.
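The heading format described above can be illustrated with a regular expression. The following is only a minimal sketch, not the program's actual code: the names `HEADING_RE` and `parse_heading` are hypothetical, and the requirement that a title start with an uppercase letter is an assumption added here to reduce false matches.

```python
import re

# Hypothetical pattern (an assumption, not taken from the repository):
# a dotted section number such as "3", "3.1" or "3.1.2", an optional
# trailing dot, whitespace, then a title starting with an uppercase letter.
HEADING_RE = re.compile(r"^\s*(\d+(?:\.\d+)*)\.?\s+([A-Z].*?)\s*$")


def parse_heading(line):
    """Return (number, title) if the line looks like a numbered heading, else None."""
    match = HEADING_RE.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    for sample in ["3.1 Performance Metrics", "4.3.1 Standard RRT", "plain body text"]:
        print(repr(sample), "->", parse_heading(sample))
```

A pattern this simple will probably also match some body lines that happen to begin with a number followed by a capitalized word, so real input may need extra filtering.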

Use:

./extractPdfContent.py <path/to/pdf>

Based on textract. Tested on robotics articles.

Example:

>./extractPdfContent.py article.pdf
Extract text from article.pdf...
Process and find matching patterns...
Result:
----------------------------------------
1 Introduction
2 Related Work
3 Metrics 
  3.1 Performance Metrics
    3.1.1 Mean Execution Time 
    3.1.2 Relative Standard Deviation 
    3.1.3 Mean Path Distance 
    3.1.4 Relative Standard Deviation 
    3.1.5 Path Anomaly 
    3.1.6 Estimated Time of Traversal 
    3.1.7 Success Rate 
  3.2 Experimental Environments
    3.2.1 Environment 1 
    3.2.2 Environment 2 
    3.2.3 Environment 3 
    3.2.4 Environment 4 
    3.2.5 Environment 5 
  3.3 Map Container Specifications
  4.1 Probabilistic RoadMaps 
    4.1.1 Uniform Space Sampling 
    4.1.2 Random Space Sampling 
    4.1.3 Space Sampling with Halton Sequences 
    4.1.4 Uniform Incremental Space Sampling 
    4.2.1 Simple Visibility Graph 
  4.2 Visibility Graphs
    4.2.2 Visibility Graph with Sparse Uniform Sampling
  4.3 Rapidly Exploring Random Trees 
    4.3.1 Standard RRT
    4.3.2 RRT 
    4.3.3 Multiple RRTs 
    4.3.4 Multiple Incremental RRTs 
  4.4 Space Skeletonization
    4.4.1 Generalized Voronoi Diagram 
5 Experimental Results
6 Conclusions 
----------------------------------------
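The indentation in the output above follows the depth of the section number: two spaces per nesting level. A minimal sketch of that formatting step is shown below. The helper names `format_entry` and `format_contents` are hypothetical and are not taken from the program itself.

```python
def format_entry(number, title, indent="  "):
    """Indent an entry by one `indent` unit per dot in its section number."""
    depth = number.count(".")  # "3" -> 0, "3.1" -> 1, "3.1.1" -> 2
    return f"{indent * depth}{number} {title}"


def format_contents(entries):
    """Join (number, title) pairs into an indented table of contents."""
    return "\n".join(format_entry(number, title) for number, title in entries)


if __name__ == "__main__":
    print(format_contents([
        ("3", "Metrics"),
        ("3.1", "Performance Metrics"),
        ("3.1.1", "Mean Execution Time"),
    ]))
```

Entries are printed in the order given, so this sketch does not reorder or repair headings that were detected out of sequence.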
