




Message-ID:<halv2u$5jo$1@news.eternal-september.org>
Subject: Can Results From A Search By Adobe Reader 9 Or Another Software Be Exported?


Date:Fri, 9 Oct 2009 01:11:08 +0100


Hello,

I'm currently using Adobe Reader 9 (free) to search for common terms within
a folder in which I keep certain PDF files.

Quite frequently, I locate multiple instances of the searched term, which
then results in several hours of noting, by pen on paper, the particulars
involving each term located.

Though I have yet to locate it, does Adobe Reader 9 have a function that can
be enabled to export the results, either one by one or as a batch (much
preferred!) into TXT or another format?

Failing the above, is there any software available that can accomplish my
objective of saving/exporting results?

As always, any and all constructive suggestions are greatly appreciated.

Martin

BTW, if there is another NG more appropriate for my topic, please provide
its name. TY.







Message-ID:<7j9s0lF34o81iU1@mid.individual.net>
Subject: Re: Can Results From A Search By Adobe Reader 9 Or Another Software Be Exported?


Date:Fri, 9 Oct 2009 23:32:52 +0100


Martin wrote:
> Hello,
> 
> I'm currently using Adobe Reader 9 (free) to search for common terms within
> a folder in which I keep certain PDF files.
> 
> Quite frequently, I locate multiple instances of the searched term, which
> then results in several hours of noting, by pen on paper, the particulars
> involving each term located.
> 
> Though I have yet to locate it, does Adobe Reader 9 have a function that can
> be enabled to export the results, either one by one or as a batch (much
> preferred!) into TXT or another format?
> 
> Failing the above, is there any software available that can accomplish my
> objective of saving/exporting results?

It depends on what "particulars" you need to extract/record for each 
hit. Most PDF files contain little structural information that a 
computer can detect, so if you wanted -- for example -- to record the 
structural location (eg chapter 17, section 5, subsection 4) you'd 
probably have to do that by hand no matter what kind of searching you used.

On the other hand, if all you need is some words of context (eg the 
sentence and the page) then it's not hard to do this with a simple 
script. pdftotext, for example, outputs paragraphs as long lines and 
inserts a FF character at each page break, so a few lines of Perl or awk 
can easily find the word you are looking for, counting pages and 
paragraphs as it goes. I did something similar recently to extract a 
5-word-either-side context for a simple search:

> for f in *.pdf; do
>     echo "$f"
>     # pdftotext emits a form feed (\f) at each page break
>     pdftotext "$f" - |
>         awk -v f="$f" -v q="text" '
>             /^\f/ { ++page; line = NR - 1 }
>             {
>                 n = split($0, w, " ")
>                 for (i = 1; i <= n; ++i)
>                     if (tolower(w[i]) ~ tolower(q)) {
>                         print f, "p." page+1, "para", NR-line
>                         s = ""
>                         for (j = i-5; j <= i+5; ++j)
>                             if (j >= 1 && j <= n) s = s " " w[j]
>                         print s
>                     }
>             }'
> done

(substitute the word you want for "text"). This formats the output as:

> thesis.pdf p.83 para 12
>  the hierarchy which merely quoted text from an earlier post 
> thesis.pdf p.101 para 12
>  in almost any situation where text has to be typed or

etc.
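If the aim is the batch export to TXT asked about in the original post, the same idea can be wrapped in a function whose output is redirected to a file. What follows is only a minimal sketch: the function names context_hits and export_hits, their arguments, and the tab-separated layout (file, page, line within page, context) are my own assumptions, and it assumes pdftotext is installed and marks each page break with a form feed. It matches the word as a plain substring, ignoring case, rather than as a regular expression.

```shell
#!/bin/sh
# Sketch only: context_hits / export_hits and the TSV layout are illustrative
# assumptions, not part of pdftotext or of any existing tool.

# Read pdftotext-style text on stdin (form feed = page break) and print one
# tab-separated row per hit: label, page, line within page, context.
context_hits() {
    # $1 = label to print (e.g. the PDF file name), $2 = search word
    awk -v f="$1" -v q="$2" '
        BEGIN { OFS = "\t"; q = tolower(q) }
        /^\f/ { ++page; line = NR - 1 }      # a new page starts on this line
        {
            n = split($0, w, " ")
            for (i = 1; i <= n; ++i)
                if (index(tolower(w[i]), q)) {   # case-insensitive substring
                    s = ""
                    # up to five words of context on either side of the hit
                    for (j = i - 5; j <= i + 5; ++j)
                        if (j >= 1 && j <= n)
                            s = (s == "" ? w[j] : s " " w[j])
                    print f, page + 1, NR - line, s
                }
        }'
}

# Run the search over every PDF in the current folder.
export_hits() {
    # $1 = search word; rows go to standard output
    for f in *.pdf; do
        [ -e "$f" ] || continue              # no PDFs here: nothing to do
        pdftotext "$f" - | context_hits "$f" "$1"
    done
}
```

Calling export_hits text > results.txt in the folder holding the PDFs would then collect every hit in one file, which opens in a text editor or a spreadsheet.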

///Peter




 
