hiltdeal.blogg.se - Pdfinfo cern cmssw

#PDFINFO CERN CMSSW PDF#
#PDFINFO CERN CMSSW SOFTWARE#

The wrapper functions described here will be used in my pdf2searchablepdf project.

The pdfinfo technique in Ocaso's answer below is also very fast, about the same as the pdftoppm one. Testing them with the time command in front shows that the strings one is comparatively slow, taking ~0.200 sec on a 142-page PDF, whereas the pdftoppm one takes ~0.020 sec or less on the same PDF, which makes it at least 10x faster.
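For reference, a pdfinfo-based page count can be sketched as below. This is a minimal sketch, not the code from the answer mentioned above: the function names `parse_pdfinfo_pages` and `get_num_pgs_pdfinfo` are hypothetical, and it assumes that pdfinfo (from poppler-utils) is installed and prints a line of the form `Pages:          142`.

```shell
# Extract the page count from `pdfinfo` output read on stdin.
# pdfinfo prints one "Pages:" line, e.g. "Pages:          142".
parse_pdfinfo_pages() {
    awk '/^Pages:/ { print $2; exit }'
}

# Hypothetical wrapper (illustrative name): print the page count of a PDF.
# Usage: num_pgs="$(get_num_pgs_pdfinfo "path/to/mypdf.pdf")"
get_num_pgs_pdfinfo() {
    pdfinfo "$1" 2>/dev/null | parse_pdfinfo_pages
}
```

To repeat the speed comparison in bash, put `time` in front of the call, for example `time get_num_pgs_pdfinfo "path/to/mypdf.pdf"`.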

# SUPER SLOW! Putting `time` just in front of the `strings` cmd shows it takes ~0.200 sec on a 142


This information describes the exact setup for the CMS software executable that was used in the data-processing steps and it is provided only for. The data records on this portal keep track of this information, and provide the job-configuration files used in the processing as well as the CMSSW version and the Global Tag for condition data.

On Linux, pdfinfo (v0.12.4) does not print the correct number of pages: it says 12,052 while Adobe says 20,131. The first method, however, does report the same number as Adobe.

# Usage (works on ALL PDFs, whether password-protected or not!):
# num_pgs="$(getNumPgsInPdf "path/to/mypdf.pdf")"


Here is a total hack using pdftoppm, which comes preinstalled on Ubuntu (tested on Ubuntu 18.04 and 20.04 at least):

# for a pdf withOUT a password
# for a pdf WITH a password which is `1234`

How does this work? Well, if you specify a first page which is larger than the number of pages in the PDF (I specify page number 1000000, which is too large for all known PDFs), it will print the following error to stderr:

Wrong page range given: the first page (1000000) can not be after the last page (142).

So, I redirect that stderr msg to stdout with 2>&1, then I pipe that to grep to match the (142). part with this regular expression (([0-9]*)\.$), then I pipe that to grep again with this regular expression ([0-9]*) to find just the number, which is 142 in this case. That's it!

Wrapper functions and speed testing

Here are a couple wrapper functions to test these:

# get the total number of pages in a PDF; technique 1.
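The pdftoppm approach described above can be sketched as a pair of shell functions. This is a minimal sketch under stated assumptions, not the original code: the names `parse_last_page` and `get_num_pgs_pdftoppm` are hypothetical, the password argument is an illustrative addition, and the parsing assumes pdftoppm's error message has the form quoted in the text.

```shell
# Pull the last-page number out of pdftoppm's "Wrong page range" message,
# read on stdin. The first grep keeps the trailing "(142)." part; the second
# keeps only the digits (one or more, so empty matches are never considered).
parse_last_page() {
    grep -o '([0-9]*)\.$' | grep -o '[0-9][0-9]*'
}

# Hypothetical wrapper (illustrative name): print the page count of a PDF.
# Requesting a first page far beyond the end of the document makes pdftoppm
# report the real last page on stderr, which 2>&1 redirects into the pipe.
# Usage: get_num_pgs_pdftoppm "path/to/mypdf.pdf" [user_password]
get_num_pgs_pdftoppm() {
    local pdf="$1" password="${2:-}"
    if [ -n "$password" ]; then
        pdftoppm -upw "$password" -f 1000000 "$pdf" 2>&1 | parse_last_page
    else
        pdftoppm -f 1000000 "$pdf" 2>&1 | parse_last_page
    fi
}
```

Keeping the parsing in its own function means it can be checked against the quoted error message without needing a PDF at hand.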