OSINT Professional Course

Open Source Intelligence Training • 144 Lessons • 30 Sections

Discovering PDF Documents Associated with Targets

Lesson 7 • 16 min • Search Engine OSINT – Google Search Operators
Lesson 7: Discovering PDF Documents Associated with Targets

When you are tasked with gathering information from a company website, the filetype: search operator is essential. It restricts results to a specific file type, so you can tell Google to show only PDF, Excel, Word, or PowerPoint files, among others.

We can combine it with the site: search operator to find a specific file type on a certain website.

Example: To find all indexed PDF files on zsecurity.org, use the query:
site:zsecurity.org filetype:pdf

This search returns only the PDF files that Google has indexed on that domain (82 results in the example). You can then open or download any of these files.
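
The way the two operators combine can be sketched in a few lines of Python. This is only an illustration: the helper names build_query and search_url are made up for this example, and the URL simply uses Google's standard q= search parameter. It builds a link to open in a browser and does not fetch or scrape results.

```python
from urllib.parse import urlencode


def build_query(filetype, site=None, phrase=None):
    """Combine search operators into a single query string."""
    parts = []
    if site:
        parts.append(f"site:{site}")      # limit results to one domain
    parts.append(f"filetype:{filetype}")  # limit results to one file type
    if phrase:
        parts.append(f'"{phrase}"')       # exact-phrase match
    return " ".join(parts)


def search_url(query):
    """Return a Google search URL for the query (to open manually)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_query("pdf", site="zsecurity.org")
print(query)
print(search_url(query))
```

Swapping "pdf" for another extension such as "xlsx" or "docx" gives the same kind of query for other document types.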

Once you download a PDF, you can analyze its metadata, which is hidden information embedded in the file and may include:

  • Author name
  • Creator/producer
  • Creation date and time
  • Software used to create the file
  • Sometimes geolocation data (covered in later lessons)

This metadata is not shown when you simply open the PDF in a viewer, but it can be extracted with online metadata viewers or dedicated tools. The software names and versions recorded in the metadata are worth noting: an outdated version suggests the target may be running software with known, exploitable vulnerabilities.
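
To make the idea concrete, here is a minimal sketch of where these fields live inside the file. It assumes the PDF stores its Info dictionary uncompressed and as plain literal strings, which is not true of every PDF; dedicated tools such as ExifTool or the pypdf library handle far more cases. The function names and the sample bytes are invented for illustration.

```python
import re
from datetime import datetime

# Info-dictionary keys matching the metadata fields discussed above.
FIELDS = ("Author", "Creator", "Producer", "CreationDate")


def read_info_fields(pdf_bytes):
    """Return Info-dictionary entries stored as plain literal strings.

    Simplification: hex or UTF-16 encoded values, values with nested or
    escaped parentheses, and compressed object streams are not decoded.
    """
    found = {}
    for key in FIELDS:
        pattern = rb"/" + key.encode() + rb"\s*\(([^()\\]*)\)"
        match = re.search(pattern, pdf_bytes)
        if match:
            found[key] = match.group(1).decode("latin-1")
    return found


def parse_pdf_date(value):
    """Parse the leading D:YYYYMMDDHHmmSS part of a PDF date string."""
    match = re.match(r"D:(\d{14})", value)
    if not match:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


# Made-up fragment standing in for a downloaded file's raw bytes.
sample = (
    b"1 0 obj\n<< /Author (Jane Doe) /Creator (Example Word Processor 2010) "
    b"/Producer (Example PDF Library 1.4) "
    b"/CreationDate (D:20200315101500+01'00') >>\nendobj"
)

info = read_info_fields(sample)
print(info)
print(parse_pdf_date(info["CreationDate"]))
```

For a real file, the bytes would come from open(path, "rb").read(); if the function returns nothing, the metadata is probably compressed or encoded, and a full PDF parser is the better choice.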

Another example, this time targeting a person:
To find PDF files related to the name "rishi cabra", a useful query is:
filetype:pdf "rishi cabra"

Results may include documents where the person contributed or is mentioned. In one case:

  • A document listed contributor names and an email: rishicabra132@gmail.com (matching a previously noted GitHub username)
  • Another PDF from the person's university (SRM) mentioned their name, which can be cross-referenced with their LinkedIn profile

These techniques help uncover usernames, emails, affiliations, and other personal or organizational details that are publicly indexed but not immediately obvious.
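
The cross-referencing step can be sketched as follows: pull email addresses out of text extracted from a document, then compare the part before the @ with usernames already collected. The email regex is a simplified pattern rather than a full address validator. The sample text and the list of known usernames are hypothetical, including the assumption that the username is identical to the email's local part.

```python
import re

# Simplified email pattern; good enough for spotting addresses in text.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text):
    """Return the unique email addresses in the text, lowercased and sorted."""
    return sorted({m.lower() for m in EMAIL_RE.findall(text)})


def match_usernames(emails, known_usernames):
    """Pair each email with any known username equal to its local part."""
    hits = []
    for email in emails:
        local = email.split("@", 1)[0]
        for username in known_usernames:
            if username.lower() == local:
                hits.append((email, username))
    return hits


# Hypothetical text as it might appear in a contributors section.
sample_text = "Contributors: A. Example, Rishi Cabra (rishicabra132@gmail.com)"
# Hypothetical usernames noted earlier in an investigation.
known_usernames = ["rishicabra132", "another-handle"]

emails = find_emails(sample_text)
print(emails)
print(match_usernames(emails, known_usernames))
```

An exact match on the local part is a deliberately strict rule; in practice a match is a lead to verify against other sources, not proof that two accounts belong to the same person.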