Proselytism

Document converter, text and image extractor using OpenOffice headless server (JOD or PYOD converter), pdf_tools and net_pbm

Supported formats for document conversion: odt, doc, rtf, sxw, docx, txt, html, htm, wps, pdf

Note

This gem was originally written as a Rails 3.2 engine running on Ruby 1.8.7.

It is framework agnostic and has been tested on Ubuntu and MacOSX.

Installation

Install the required external libraries:

# aptitude install netpbm
# aptitude install xpdf
# aptitude install libreoffice

Add this line to your application's Gemfile:

gem 'proselytism'

Note: for Ruby 1.9, use the 1.9 branch:

gem 'proselytism', :git => "git://github.com/itkin/proselytism.git", :branch => "1.9"

And then execute:

$ bundle

Configuration

  • With a YAML config file:
rails g proselytism:config

As a Rails engine, Proselytism automatically loads /config/proselytism.yml (if the file exists) and sets its config params according to the current Rails environment.
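To picture what environment-dependent loading means, here is a minimal sketch in plain Ruby. It assumes the YAML file is keyed by environment name, which is a common Rails convention but is not confirmed here for Proselytism. The `log_path` key and the `settings_for` helper are hypothetical and only serve the illustration.

```ruby
require 'yaml'

# Hypothetical file content: the key names below are illustrative only,
# not Proselytism's documented settings.
SAMPLE_YAML = <<-YAML
development:
  log_path: log/proselytism_development.log
production:
  log_path: log/proselytism_production.log
YAML

# Return the settings for one environment, or an empty hash if the
# environment has no section in the file.
def settings_for(yaml_source, env)
  all_settings = YAML.load(yaml_source) || {}
  all_settings.fetch(env.to_s, {})
end

settings = settings_for(SAMPLE_YAML, :production)
puts settings['log_path']
```

Run `rails g proselytism:config` and check the generated file for the actual structure and parameter names.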

  • With an initializer (optional for Rails App) :

You can override the configuration file params by adding a custom initializer to /config/initializers. By default, Proselytism logs to a separate log file. To use the Rails logger instead:

#/config/initializers/proselytism.rb
Proselytism.config do |config|
  config.logger = Rails.logger
end

To generate a full config initializer:

rails g proselytism:initializer

Usage

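Proselytism.convert source_file_path, :to => :pdf do |converted_file_path|
  # use or copy the converted file here
end
Proselytism.extract_text source_file_path do |extracted_text|
  # use the extracted text here
end
Proselytism.extract_images source_file_path do |image_files_paths|
  # use or copy the extracted image files here
end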

Proselytism creates its converted files in temporary folders.

  • If you pass a block to one of the methods above, the folder is deleted automatically once the block has run, so use or copy the file content within the block
  • If you don't pass a block, the folder and its content remain on disk, so remember to remove them safely yourself
pdf_file_path = Proselytism.convert source_file_path, :to => :pdf
#my code
FileUtils.remove_entry_secure File.dirname(pdf_file_path)
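In the non-block form, the folder is left behind if your own code raises before the cleanup line runs. The following is a minimal sketch of both forms with that case handled. `fake_convert` is a hypothetical stand-in written for this example so that it runs without the gem; it only mimics the behaviour described above and is not Proselytism's implementation.

```ruby
require 'fileutils'
require 'tmpdir'

# Stand-in for Proselytism.convert, so the sketch runs without the gem.
# It mimics the documented behaviour only: the result lives in a fresh
# temporary folder, which is removed after the block when one is given.
def fake_convert(source_path, options = {})
  dir = Dir.mktmpdir('proselytism_sketch')
  name = File.basename(source_path, '.*') + ".#{options[:to]}"
  converted = File.join(dir, name)
  File.open(converted, 'w') { |f| f.write('converted content') }
  return converted unless block_given?
  begin
    yield converted
  ensure
    FileUtils.remove_entry_secure dir
  end
end

# Block form: copy the result somewhere durable before the block ends.
keep_dir = Dir.mktmpdir('kept')
fake_convert('report.odt', :to => :pdf) do |path|
  FileUtils.cp path, File.join(keep_dir, 'report.pdf')
end

# Non-block form: wrap your own code in begin/ensure so that cleanup
# also runs when that code raises.
pdf_file_path = fake_convert('report.odt', :to => :pdf)
begin
  puts File.read(pdf_file_path)
ensure
  FileUtils.remove_entry_secure File.dirname(pdf_file_path)
end
```

With the real gem, the same begin/ensure wrapper would go around the code that follows `Proselytism.convert`.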

Add your own converters

Add your own converter by extending Proselytism::Converters::Base

  • Your converter is selected automatically based on the formats given to the from and to methods
  • Add a perform method which
    • calls the execute method with your custom command
    • returns the converted file(s) path(s)

Proselytism::Converters::Base takes care of

  • raising an error (if the command execution fails)
  • logging the command output
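class MyConverter < Proselytism::Converters::Base
  class Error < parent::Base::Error; end

  # Source formats this converter accepts
  from :ext1, :ext2
  # Target formats it can produce
  to :ext3, :ext4

  def perform(origin, options={})
    destination = destination_file_path(origin, options)
    # Redirect stderr to stdout so the whole command output is captured
    command = "mycommand #{origin} #{destination} 2>&1"
    execute command
    # Return the path of the converted file
    destination
  end
end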
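The command in the example interpolates file paths directly into a shell string, which breaks on paths that contain spaces or other shell metacharacters. One possible way to guard against that, sketched here with Ruby's standard Shellwords module, is to escape each path before building the command. `build_command` is a hypothetical helper for illustration and `mycommand` is the same placeholder as above; neither is part of Proselytism.

```ruby
require 'shellwords'

# Hypothetical helper (not part of Proselytism): builds the command string
# for the placeholder `mycommand` with each path escaped for the shell.
def build_command(origin, destination)
  "mycommand #{Shellwords.escape(origin)} #{Shellwords.escape(destination)} 2>&1"
end

puts build_command('/tmp/my report.odt', '/tmp/out/my report.pdf')
```

Inside a perform method, the resulting string would be passed to execute in place of the hand-built command.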

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request
