Commit

Merge branch 'master' of github.com:tabulapdf/tabula
jeremybmerrill committed Sep 29, 2018
2 parents cc9c0c7 + b2b165e commit ac2b544
Showing 4 changed files with 57 additions and 25 deletions.
8 changes: 4 additions & 4 deletions README.md
@@ -41,7 +41,7 @@ a simple web interface.

**Caveat**: Tabula only works on text-based PDFs, not scanned documents. If you can click-and-drag to select text in your table in a PDF viewer (even if the output is disorganized trash), then your PDF is text-based and Tabula should work.

**Security Concerns?**: Tabula is designed with security in mind. Your PDF and the extracted data *never* touch the net -- when you use Tabula, as long as your browser's URL bar says "localhost" or "127.0.0.1", all processing takes place on your local machine. Tabula does download a list of Tabula versions from our server to alert you if Tabula has been updated (and we use hits to that list to count how often Tabula is being used); it also downloads a few badges and assets from the web.
**Security Concerns?**: Tabula is designed with security in mind. Your PDF and the extracted data *never* touch the net: when you run Tabula on your local machine, all processing takes place locally as long as your browser's URL bar says "localhost" or "127.0.0.1". Other than retrieving a few badges and other static assets, your browser makes two calls to external machines: one fetches the list of the latest Tabula versions from GitHub to alert you if Tabula has been updated, and the other calls a stats counter that helps us determine how often various versions of Tabula are being used. If this is a problem, the version check can be disabled by adding `-Dtabula.disable_version_check=1` to the command line at startup, and the stats counter call can be disabled by adding `-Dtabula.disable_notifications=1`. Please note: if you provide Tabula as a service behind an SSL reverse proxy, users [may notice a security warning](https://github.com/tabulapdf/tabula/issues/924) because our stats counter endpoint is hosted at a non-secure URL, so you may wish to disable notifications in this scenario.

## Using Tabula

@@ -137,7 +137,7 @@ Tabula has bindings for JRuby and R. If you end up writing bindings for another
- [tabulizer](https://github.com/leeper/tabulizer) provides [R](https://www.r-project.org/) bindings for tabula-java and is community-supported by @leeper.
- [tabula-js](https://github.com/ezodude/tabula-js) provides [Node.js](https://nodejs.org/en/) bindings for tabula-java; it is community-supported by @ezodude.
- [tabula-py](https://github.com/chezou/tabula-py) provides [Python](https://python.org) bindings for tabula-java; it is community-supported by @chezou.
- [tabula-extractor](https://github.com/tabulapdf/tabula-extractor/) *DEPRECATED* - Provides JRuby bindings for tabula-java
- [tabula-extractor](https://github.com/tabulapdf/tabula-extractor/) *DEPRECATED* - Provides JRuby bindings for tabula-java



@@ -187,12 +187,12 @@ version of the app.
    jruby -G -S rake war
    java -Dfile.encoding=utf-8 -Xms256M -Xmx1024M -jar build/tabula.jar

If you intend to develop against an unreleased version of [`tabula-java`](https://github.com/tabulapdf/tabula-java), you need to install its JAR to your local Maven repository. From the directory that contains `tabula-java` source:

    mvn install:install-file -Dfile=target/tabula-<version>-SNAPSHOT.jar -DgroupId=technology.tabula -DartifactId=tabula -Dversion=<version>-SNAPSHOT -Dpackaging=jar -DpomFile=pom.xml

Then, adjust the `Jarfile` accordingly.

### Building a packaged application version
25 changes: 17 additions & 8 deletions webapp/static/js/tabula.js
@@ -92,21 +92,26 @@ var TabulaRouter = Backbone.Router.extend({
});


Tabula.getVersion = function(){
Tabula.getSettings = function(){
Tabula.notification = new Backbone.Model({});
Tabula.new_version = new Backbone.Model({});
$.getJSON((base_uri || '/') + "version", function(data){
Tabula.api_version = data["api"];
Tabula.getNotifications();
$.getJSON((base_uri || '/') + "settings", function(data){
Tabula.api_version = data["api_version"];
if(data["disable_version_check"] === false) {
Tabula.getLatestReleaseVersion();
}
if(data["disable_notifications"] === false) {
Tabula.getNotifications();
}

// if(Tabula.api_version.slice(0,3) == "rev"){
// $('#dev-mode-ribbon').show();
// }

})
}
Tabula.getNotifications = function(){
if(localStorage.getItem("tabula-notifications") === false) return;


Tabula.getLatestReleaseVersion = function(){
$.get('https://api.github.com/repos/tabulapdf/tabula/releases',
function(data) {
if (data.length < 1) return;
@@ -150,6 +155,10 @@ Tabula.getNotifications = function(){
}
}
);
};


Tabula.getNotifications = function(){
$.ajax({
url: 'http://tabula.jeremybmerrill.com/tabula/notifications.jsonp',
dataType: "jsonp",
@@ -184,7 +193,7 @@ Tabula.getNotifications = function(){


$(function(){
Tabula.getVersion();
Tabula.getSettings();
window.tabula_router = new TabulaRouter();
Backbone.history.start({
pushState: true,
19 changes: 19 additions & 0 deletions webapp/tabula_settings.rb
@@ -5,6 +5,8 @@ module TabulaSettings

########## Defaults ##########
DEFAULT_DEBUG = false
DEFAULT_DISABLE_VERSION_CHECK = false
DEFAULT_DISABLE_NOTIFICATIONS = false

########## Helpers ##########
def self.getDataDir
@@ -74,6 +76,23 @@ def self.enableDebug
DEFAULT_DEBUG
end

def self.disableVersionCheck
disable_version_check = java.lang.System.getProperty('tabula.disable_version_check')
unless disable_version_check.nil?
return (disable_version_check.to_i > 0)
end

DEFAULT_DISABLE_VERSION_CHECK
end

def self.disableNotifications
disable_notifications = java.lang.System.getProperty('tabula.disable_notifications')
unless disable_notifications.nil?
return (disable_notifications.to_i > 0)
end

DEFAULT_DISABLE_NOTIFICATIONS
end

########## Constants that are used around the app, based on settings ##########
DOCUMENTS_BASEPATH = File.join(self.getDataDir, 'pdfs')
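
The two helpers added in this file share one parsing rule: an unset property falls back to the default, and a set property counts as "disabled" only when `to_i` yields a positive integer. A minimal sketch of that rule in plain Ruby follows. `flag_enabled?` is a hypothetical name, not part of Tabula, and the sketch assumes the value returned by `java.lang.System.getProperty` behaves like a Ruby `String` (or `nil` when unset) under JRuby.

```ruby
# Hypothetical helper mirroring the parsing used by the new settings methods.
# raw_value: the system property as a String, or nil when the -D flag is absent.
# default:   the value to fall back on when the property is unset.
def flag_enabled?(raw_value, default)
  return default if raw_value.nil?

  # String#to_i returns 0 for text with no leading digits, so only
  # positive integers switch the flag on.
  raw_value.to_i > 0
end

puts flag_enabled?(nil, false)    # property not set: falls back to the default
puts flag_enabled?("1", false)    # -Dtabula.disable_version_check=1
puts flag_enabled?("0", false)    # -Dtabula.disable_version_check=0
puts flag_enabled?("true", false) # non-numeric text parses as 0
```

One consequence of parsing with `to_i` is that a value such as `true` does not disable anything, so the numeric form shown in the README (`=1`) is the safe choice.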
30 changes: 17 additions & 13 deletions webapp/tabula_web.rb
@@ -104,10 +104,10 @@ def upload_template(template_file)
selection_count = template_data.size

# write to file and to workspace
Tabula::Workspace.instance.add_template({ "id" => template_id,
Tabula::Workspace.instance.add_template({ "id" => template_id,
"template" => template_data,
"name" => template_name,
"page_count" => page_count,
"name" => template_name,
"page_count" => page_count,
"time" => Time.now.to_i,
"selection_count" => selection_count})
return template_id
@@ -127,14 +127,14 @@ def upload_template(template_file)
run TabulaJobProgress
end

on "templates" do
on "templates" do
# GET /books/ .... collection.fetch();
# POST /books/ .... collection.create();
# GET /books/1 ... model.fetch();
# PUT /books/1 ... model.save();
# DEL /books/1 ... model.destroy();

on root do
on root do
# list them all
on get do
res.status = 200
@@ -143,22 +143,22 @@ def upload_template(template_file)
end

# create a template from the GUI
on post do
on post do
template_info = JSON.parse(req.params["model"])
template_name = template_info["name"] || "Unnamed Template #{Time.now.to_s}"
template_id = Digest::SHA1.hexdigest(Time.now.to_s + template_name) # just SHA1 of time isn't unique with multiple uploads
template_filename = template_id + ".tabula-template.json"
file_path = File.join(TabulaSettings::DOCUMENTS_BASEPATH, "..", "templates")
# write to file
# write to file
FileUtils.mkdir_p(file_path)
open(File.join(file_path, template_filename), 'w'){|f| f << JSON.dump(template_info["template"])}
page_count = template_info.has_key?("page_count") ? template_info["page_count"] : template_info["template"].map{|f| f["page"]}.uniq.count
selection_count = template_info.has_key?("selection_count") ? template_info["selection_count"] : template_info["template"].count
Tabula::Workspace.instance.add_template({
"id" => template_id,
"name" => template_name,
"page_count" => page_count,
"time" => Time.now.to_i,
"id" => template_id,
"name" => template_name,
"page_count" => page_count,
"time" => Time.now.to_i,
"selection_count" => selection_count,
"template" => template_info["template"]
})
@@ -251,8 +251,12 @@ def upload_template(template_file)
res.write(JSON.dump(Tabula::Workspace.instance.list_documents))
end

on 'version' do
res.write JSON.dump({api: $TABULA_VERSION})
on 'settings' do
res.write JSON.dump({
api_version: $TABULA_VERSION,
disable_version_check: TabulaSettings::disableVersionCheck(),
disable_notifications: TabulaSettings::disableNotifications(),
})
end

on 'pdf/:file_id/metadata.json' do |file_id|
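
The `settings` route and the client's strict `=== false` checks in `tabula.js` together decide which external calls are made. The sketch below models that contract in plain Ruby, assuming the payload shape shown in this diff. `settings_payload` and `external_calls` are hypothetical names used only for illustration, and the symbols stand in for `Tabula.getLatestReleaseVersion()` and `Tabula.getNotifications()`.

```ruby
require 'json'

# Hypothetical stand-in for the body of the 'settings' route. The real route
# reads $TABULA_VERSION and the TabulaSettings helpers instead of arguments.
def settings_payload(api_version, disable_version_check, disable_notifications)
  JSON.dump({
    api_version: api_version,
    disable_version_check: disable_version_check,
    disable_notifications: disable_notifications
  })
end

# Models the client-side gating: a call is made only when the flag is
# present in the payload and is exactly false.
def external_calls(json)
  data = JSON.parse(json)
  calls = []
  calls << :version_check if data["disable_version_check"] == false
  calls << :notifications if data["disable_notifications"] == false
  calls
end

# Version check left on, notifications disabled.
p external_calls(settings_payload("example-version", false, true))
```

Because the comparison is strict, a response that omits one of the keys results in no external call for that feature rather than a fallback to the default.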