PDF Extracting


Table of Contents

PDF? No, thanks — just the text, please 
extract the text 
extract the graphics 
paste from pdf 
xpdf text selection 
Can I extract .eps figures from a .pdf file? 
Copying image using acroread from pdf file 

PDF? No, thanks — just the text, please 

http://www.csun.edu/helpdesk/linux.htm

extract the text 

Ever wanted to extract the text out of a Portable Document Format (.pdf) file (also known as an Adobe Acrobat file)? You can use Adobe's utilities for this purpose, but chances are your system already contains a neat little utility that can do the job too. It's called pdftotext. The following command extracts the text from report.pdf and writes it to a file named report.txt:

pdftotext report.pdf report.txt
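
If you have a whole directory of PDFs, the same command can be wrapped in a small loop. The following is only a sketch: the helper name txt_name_for is made up for this example, and it assumes the files end in a lowercase .pdf suffix. The -layout option asks pdftotext to keep the physical layout of the page as far as it can, which often helps with multi-column text.

```shell
#!/bin/sh
# Hypothetical helper: swap a trailing .pdf for .txt to get the output name.
txt_name_for() {
    printf '%s\n' "${1%.pdf}.txt"
}

# Convert every PDF in the current directory.
for pdf in *.pdf; do
    # If no PDF matches, the pattern stays literal; skip it.
    [ -e "$pdf" ] || continue
    # -layout: keep the original physical layout where possible.
    pdftotext -layout "$pdf" "$(txt_name_for "$pdf")"
done
```

Each report.pdf ends up with a report.txt next to it. Drop -layout if you would rather have the text in plain reading order.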

extract the graphics 

Like to extract the graphics? The pdfimages command works the same kind of magic for the pictures in the file; it writes each of them to a file that's named with a root filename, an automatically appended number, and a suffix that's appropriate for the type of file that's written (by default, Portable Pixmaps or Portable Bitmaps). The following command extracts the images from report.pdf:

pdfimages report.pdf report

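The extracted images are named report-000.ppm, report-001.ppm, etc. (monochrome images get a .pbm suffix instead, and the number of digits can differ between pdfimages versions).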
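
Since a single PDF can produce a lot of image files, it may be tidier to send them to their own directory. Here is a rough sketch under a few assumptions: the input is called report.pdf, and the helper name image_root_for and the _images directory suffix are invented for this example. The -j option tells pdfimages to write JPEG-compressed images as .jpg files instead of converting them to PPM.

```shell
#!/bin/sh
# Hypothetical helper: the image root is the PDF name without its .pdf suffix.
image_root_for() {
    printf '%s\n' "${1%.pdf}"
}

pdf=report.pdf   # assumed input file; change as needed

if [ -e "$pdf" ]; then
    root=$(image_root_for "$pdf")
    outdir="${root}_images"
    mkdir -p "$outdir"
    # -j: save DCT (JPEG) images as .jpg rather than converting to PPM.
    pdfimages -j "$pdf" "$outdir/$root"
    # Show what was extracted.
    ls "$outdir"
fi
```

The script does nothing if report.pdf is missing, so it is safe to try in any directory.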