PdfIndexer

From BITPlan Wiki
Jump to navigation Jump to search
OsProject
edit
id  pdfindexer
state  
owner  WolfgangFahl
title  Java Library and Tool to Index and search PDF files using Apache Lucene and PDF Box
url  https://github.com/WolfgangFahl/pdfindexer
version  0.0.11
description  
date  2018/08/22
since  
until  

Motivation

In one of our project we were asked to check a few dozen PDF documents for consistency. So we needed a way to cross-reference the documents and find keywords. At the time there was no SimpleGraph project yet and we created a special solution end made it available as OpenSource.

Using in Docker

In Issue #4 peebles asked how the example would be run in a docker container.

Start an OpenJDK8 container mounting the pdfindexer directory

# get a fresh version of the PDF Indexer
git clone https://github.com/WolfgangFahl/pdfindexer
# change to the directory
cd pdfindexer
# run a docker Container with OpenJDK Java 8
docker run --rm -it -v $(pwd):/deploy -w /deploy openjdk:8 bash