PdfIndexer

From BITPlan Wiki
Revision as of 15:38, 22 August 2018 by Wf
OsProject
id  pdfindexer
state  
owner  WolfgangFahl
title  Java Library and Tool to Index and search PDF files using Apache Lucene and PDF Box
url  https://github.com/WolfgangFahl/pdfindexer
version  0.0.11
description  
date  2018/08/22
since  
until  

Motivation

In one of our projects we were asked to check a few dozen PDF documents for consistency, so we needed a way to cross-reference the documents and find keywords. At the time there was no SimpleGraph project yet, so we created a special-purpose solution and made it available as open source.

Using in Docker

In Issue #4 peebles asked how the example would be run in a Docker container.

Open a Java container that has access to the current directory:

# get a fresh copy of the PDF Indexer
git clone https://github.com/WolfgangFahl/pdfindexer
# change to the project directory
cd pdfindexer
# run a Docker container with OpenJDK Java 8, mounting the current directory at /deploy
# (the quotes keep the mount working if the path contains spaces)
docker run --rm -it -v "$(pwd)":/deploy -w /deploy openjdk:8 bash
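
The docker call above can be wrapped in a small shell function so that each flag is documented and the command can be inspected before it is run. This is only a sketch: the function name `run_java_container` and the `DRY_RUN` switch are made up for illustration and are not part of pdfindexer; the image and mount point are taken from the command above.

```shell
#!/bin/sh
# Sketch only: run_java_container and DRY_RUN are hypothetical helper names.
IMAGE="openjdk:8"        # image used in the command above
MOUNT_POINT="/deploy"    # where the current directory appears inside the container

run_java_container() {
  # --rm  remove the container when the shell exits
  # -it   interactive session with a terminal
  # -v    mount the current host directory into the container
  # -w    start in the mounted directory
  set -- docker run --rm -it -v "$(pwd):${MOUNT_POINT}" -w "${MOUNT_POINT}" "${IMAGE}" bash
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # only print the command that would be executed
    echo "$*"
  else
    "$@"
  fi
}
```

Calling `run_java_container` from the cloned pdfindexer directory should open the same shell as the command above; with `DRY_RUN=1` set it only prints the command. Once inside the container, `java -version` is a quick way to confirm that the Java 8 runtime is available.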