How do I find the orientation of a PDF using PHP or a Linux script?

I need to scan an uploaded PDF to determine if the pages within are all portrait or if there are any landscape pages. Is there someway I can use PHP or a linux command to scan the PDF for these pages?


(Updated answer -- scroll down...)

You can use either pdfinfo (part of either the poppler-utils or the xpdf-tools) or identify (part of the ImageMagick toolkit).

identify -format "%f  Page %s:   Width: %W -- Height: %H\n" T-VD7.PDF

Example output:

T-VD7.PDF  Page 0:  Width: 595 -- Height: 842
T-VD7.PDF  Page 1:  Width: 595 -- Height: 842
T-VD7.PDF  Page 2:  Width: 1191 -- Height: 842
T-VD7.PDF  Page 11:  Width: 595 -- Height: 421
T-VD7.PDF  Page 12:  Width: 595 -- Height: 842

Or a bit simpler:

identify -format "%s: %Wx%H\n" T-VD7.PDF


0:  595x842
1:  595x842
2:  1191x842
11:  595x421
12:  595x842

Note, how identify uses a zero-based page counting mechanism!

Pages are 'landscape' if their width is bigger than their height. They are neither-nor, if both are equal.

The advantage is that identify lets you tweak the output format quite easily and very extensively.

pdfinfo input.pdf | grep "Page.*size:"

Example output:

Page size:      595.276 x 841.89 pts (A4)

pdfinfo is definitely faster and also more precise than identify, if it comes to multi-page PDFs. (The 13-page PDF I tested this with took identify to 31 seconds to process, whereas pdfinfo needed less than half a second....)

Be warned: by default pdfinfo does report the size for the first page only. To get sizes for all pages (as you may know, there are PDFs which use mixed page sizes as well as mixed orientations), you have to modify the command:

pdfinfo -f 3 -l 13 input.pdf | grep "Page.*size:"

Output now:

Page    1 size: 595.276 x 841.89 pts (A4)
Page    2 size: 595.276 x 841.89 pts (A4)
Page    3 size: 1191 x 842 pts (A3)
Page   12 size: 595 x 421 pts (A5)
Page   13 size: 595.276 x 841.89 pts (A4)

This will print the sizes of page 3 (f irst to report) through page 13 (l ast to report).

Scripting it:
  pdfinfo \
    -f 1 \
    -l 1000 \
     Vergleich-VD7.PDF \
| grep "Page.* size:" \
| \
| while read Page _pageno size _width x _height rest; do 
  [ "$(echo "${_width} / 1"|bc)" -gt "$(echo "${_height} / 1"|bc)" ] \
     && echo "Page $_pageno is landscape..." \
    || echo "Page $_pageno is portrait..."  ; \

(the bc-trick is required because the -gt comparison works for the shell only with integers. Dividing by 1 with bc will take round the possible real values to integers...)


Page 1 is portrait...
Page 2 is portrait...
Page 3 is landscape...
Page 12 is landscape...
Page 13 is portrait...

Update: Using the 'right' pdfinfo to discover page rotations...

My initial answer tooted the horn of pdfinfo. Serenade X says in a comment that his/her problem is to discover rotated pages.

Ok now, here is some additional info which is not yet known widely and therefor has not yet been really absorbed by all pdfinfo users...

As I mentioned, there are two different pdfinfo utilities around:

  1. the one which comes as part of the xpdf-utils package (on some platform also named xpdf-tools).
  2. the one which comes as part of the poppler-utils package (on some platforms also named poppler-tools, and sometimes it is not separated out as a packages but is part of the main poppler package).
Poppler's pdfinfo output

So here is a sample output from Poppler's pdfinfo command. The tested file is a 2-page PDF where the first page is in portrait A4 and the second page is in landscape A4 format:

kp@mbp:~$  pdfinfo -f 1 -l 2 a4portrait+landscape.pdf 
Producer:       GPL Ghostscript 9.05
CreationDate:   Thu Jul 26 14:23:31 2012
ModDate:        Thu Jul 26 14:23:31 2012
Tagged:         no
Form:           none
Pages:          2
Encrypted:      no
Page    1 size: 595 x 842 pts (A4)
Page    1 rot:  0
Page    2 size: 842 x 595 pts (A4)
Page    2 rot:  0
File size:      3100 bytes
Optimized:      no
PDF version:    1.4

Do you see the lines saying Page 1 rot: 0 and Page 1 rot: 0?

Do you notice the lines saying Page 1 size: 595 x 842 pts (A4) and Page 2 size: 842 x 595 pts (A4) and the differences between the two?

XPDF's pdfinfo output

Now let's compare this to the output of XPDF's pdfinfo:

kp@mbp:~$  xpdf-pdfinfo -f 1 -l 2 a4portrait+landscape.pdf 
Producer:       GPL Ghostscript 9.05
CreationDate:   Thu Jul 26 14:23:31 2012
ModDate:        Thu Jul 26 14:23:31 2012
Tagged:         no
Pages:          2
Encrypted:      no
Page    1 size: 595 x 842 pts (A4)
Page    2 size: 842 x 595 pts (A4)
File size:      3100 bytes
Optimized:      no
PDF version:    1.4

You may notice one more difference, if you look closely enough. I won't point my finger to it, and will keep my mouth shut for now... :-)

Poppler's pdfinfo correctly reports rotation of page 2

Next, I rotate the second page of the file by 90 degrees using pdftk (I don't have Adobe Acrobat around):

pdftk \
  a4portrait+landscape.pdf \
  cat 1 2E \
  output a4portrait+landscape---page2-landscaped-by-pdftk.pdf 

Now Poppler's pdfinfo reports this:

kp@mbp:~$  pdfinfo -f 1 -l 2 a4portrait+landscape---page2-landscaped-by-pdftk.pdf 
Creator:        pdftk 1.44 -
Producer:       itext-paulo-155 (
CreationDate:   Thu Jul 26 14:39:47 2012
ModDate:        Thu Jul 26 14:39:47 2012
Tagged:         no
Form:           none
Pages:          2
Encrypted:      no
Page    1 size: 595 x 842 pts (A4)
Page    1 rot:  0
Page    2 size: 842 x 595 pts (A4)
Page    2 rot:  90
File size:      1759 bytes
Optimized:      no
PDF version:    1.4

As you can see, the line Page 2 rot: 90 tells us what we are looking for. XPDF's pdfinfo would essentially report the same info about the changed file as it does about the original one. Of course, it would still correctly capture the changed Creator:, Producer: and *Date: infos, but it would miss the rotated page...

Also note this detail: page 2 originally was designed as a landscape page, which can be seen from the Page 2 size: 842 x 595 pts (A4) info part. However, it shows up in the current PDF as a portrait page, as can be seen by the Page 2 rot: 90 part.

Also note that there are 4 different values that could appear for the rotation info:

  • 0 (no rotation),
  • 90 (rotation to the East, or 90 degrees clockwise),
  • 180 (rotation to the South, tumbled page image, upside-down, or 180 degrees clockwise),
  • 270 (rotation to the West, or 90 degrees counter-clockwise, or 270 degrees clockwise).
Some Background Info

Popper (developed by The Poppler Developers) is a fork of XPDF (developed by Glyph & Cog LLC), that happened around 2005. (As one of their important reason for their forking the Poppler developer at the time gave: Glyph & Cog didn't always provide timely bugfixes for security-related problems...)

Anyway, the Poppler fork for a very long time kept the associated commandline utilities, their commandline parameters and syntax as well as the format of their output compatible to the original (XPDF/Glyph & Cog LLC) ones.

Existing Poppler tools gaining additional features over competing XPDF tools

However, more recently they started to add additional features. Out of the top of my head:

  • pdfinfo now also reports the rotation status of each page (starting with Poppler v0.19.0, released March 1st, 2012).
  • pdffonts now also reports the font encoding for each font (starting with Poppler v0.19.1, released March 15th, 2012).
Poppler tools getting more siblings

The Poppler tools also provide some extra commandline utilities which are not in the original XPDF package (some of which have been added only quite recently):

  • pdftocairo - utility for creating PNG, JPEG, PostScript, EPS, PDF, SVG (using Cairo)
  • pdfseparate - utility to extract PDF pages
  • pdfunite - utility to merge PDF files
  • pdfdetach - utility to list or extract embedded files from a PDF
  • pdftohtml - utility to convert HTML from PDF files

identify which comes with ImageMagick will give you the width and height of a given PDF file (it also requires GhostScript be installed on the system).

$ identify -format "%g\n" FILENAME.PDF

Where 1417 is the width, 1106 is the height, and you (for this purpose) can ignore the +0+0.

Edit: Sorry, I was referring to Mike B's comment on the original question - as he said, after knowing the width and height you can determine if you have a portrait or landscape image (if height > width then portrait else landscape).

Also, the \n added to the -format argument (as suggested by Kurt Pfeifle) will separate each page into its own line. He also mentions the %W and %H format parameters; all the possible format parameters can be found here (there are a lot of them).

Need Your Help

How can I "force" a branch upon the trunk, in the case I can't "reintegrate"?

svn merge branch

We created a branch from the trunk on which a major refactoring was done. Meanwhile, the trunk advanced a few revisions with some fixes. We don't want these changes on the branch, so we don't want ...

adhoc struct/class in C#?

c# generics reflection class anonymous-class

Currently i am using reflection with sql. I find if i want to make a specialize query it is easiest to get the results by creating a new class inheriting from another and adding the 2 members/colum...