Gather PDF Authors - Metasploit


This page contains detailed information about how to use the auxiliary/gather/http_pdf_authors Metasploit module. For a list of all Metasploit modules, visit the Metasploit Module Library.

Module Overview


Name: Gather PDF Authors
Module: auxiliary/gather/http_pdf_authors
Source code: modules/auxiliary/gather/http_pdf_authors.rb
Disclosure date: -
Last modification time: 2019-05-29 22:36:50 +0000
Supported architecture(s): -
Supported platform(s): -
Target service / protocol: http, https
Target network port(s): 80, 443, 3000, 8000, 8008, 8080, 8443, 8880, 8888
List of CVEs: -

This module downloads PDF documents and extracts the author's name from the document metadata. This module expects a URL to be provided using the URL option. Alternatively, multiple URLs can be provided by supplying the path to a file containing a list of URLs in the URL_LIST option. The URL_TYPE option is used to specify the type of URLs supplied. By specifying 'pdf' for the URL_TYPE, the module will treat the specified URL(s) as PDF documents. The module will download the documents and extract the authors' names from the document metadata. By specifying 'html' for the URL_TYPE, the module will treat the specified URL(s) as HTML pages. The module will scrape the pages for links to PDF documents, download the PDF documents, and extract the author's name from the document metadata.
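
The 'html' mode described above amounts to finding links that end in .pdf and resolving them against the page URL. The following is a minimal sketch of that idea, not the module's own implementation: the helper name pdf_links, the regular expression, and the example.com URLs are all assumptions made for illustration, and a real HTML parser would be more robust than a regex.

```ruby
require 'uri'

# Hypothetical helper (not taken from the module source): return absolute
# URLs for every link in an HTML string whose path ends in ".pdf".
def pdf_links(html, base_url)
  # Naive href matcher; it only handles quoted attribute values.
  hrefs = html.scan(/href\s*=\s*["']([^"']+)["']/i).flatten

  hrefs
    # Ignore any query string or fragment when checking the extension.
    .select { |href| href.split(/[?#]/).first.to_s.downcase.end_with?('.pdf') }
    # Resolve relative links against the URL of the page they came from.
    .map { |href| URI.join(base_url, href).to_s }
    .uniq
end

html = <<~HTML
  <a href="/docs/report.pdf">Report</a>
  <a href="about.html">About</a>
  <a href="http://example.com/files/Guide.PDF?v=2">Guide</a>
HTML

puts pdf_links(html, 'http://example.com/index.html')
```

Each URL returned this way would then be handled like a URL supplied directly with URL_TYPE set to 'pdf'.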

Module Ranking and Traits


Module Ranking:

  • normal: The exploit is otherwise reliable, but depends on a specific version and can't (or doesn't) reliably autodetect. More information about ranking can be found here.

Basic Usage


msf > use auxiliary/gather/http_pdf_authors
msf auxiliary(http_pdf_authors) > show options
    ... show and set options ...
msf auxiliary(http_pdf_authors) > set URL [URL]
msf auxiliary(http_pdf_authors) > run

Knowledge Base


This module downloads PDF files and extracts the author's name from the document metadata.

Verification Steps


  1. Start msfconsole
  2. Do: use auxiliary/gather/http_pdf_authors
  3. Do: set URL [URL]
  4. Do: run

Options


URL

The URL of a PDF to analyze.

URL_LIST

File containing a list of PDF URLs to analyze.

OUTFILE

File to store extracted author names.

Scenarios


URL

  msf auxiliary(http_pdf_authors) > set url http://127.0.0.1/test4.pdf
  url => http://127.0.0.1/test4.pdf
  msf auxiliary(http_pdf_authors) > run

  [*] Processing 1 URLs...
  [*] Downloading 'http://127.0.0.1/test4.pdf'
  [*] HTTP 200 -- Downloaded PDF (38867 bytes)
  [+] PDF Author: Administrator
  [*] 100.00% done (1/1 files)

  [+] Found 1 authors: Administrator
  [*] Auxiliary module execution completed

URL_LIST with OUTFILE

  msf auxiliary(http_pdf_authors) > set outfile /root/output
  outfile => /root/output
  msf auxiliary(http_pdf_authors) > set url_list /root/urls
  url_list => /root/urls
  msf auxiliary(http_pdf_authors) > run

  [*] Processing 8 URLs...
  [*] Downloading 'http://127.0.0.1:80/test.pdf'
  [*] HTTP 200 -- Downloaded PDF (89283 bytes)
  [*]  12.50% done (1/8 files)
  [*] Downloading 'http://127.0.0.1/test2.pdf'
  [*] HTTP 200 -- Downloaded PDF (636661 bytes)
  [+] PDF Author: sqlmap developers
  [*]  25.00% done (2/8 files)
  [*] Downloading 'http://127.0.0.1/test3.pdf'
  [*] HTTP 200 -- Downloaded PDF (167478 bytes)
  [+] PDF Author: Evil1
  [*]  37.50% done (3/8 files)
  [*] Downloading 'http://127.0.0.1/test4.pdf'
  [*] HTTP 200 -- Downloaded PDF (38867 bytes)
  [+] PDF Author: Administrator
  [*]  50.00% done (4/8 files)
  [*] Downloading 'http://127.0.0.1/test5.pdf'
  [*] HTTP 200 -- Downloaded PDF (34312 bytes)
  [+] PDF Author: ekama
  [*]  62.50% done (5/8 files)
  [*] Downloading 'http://127.0.0.1/doesnotexist.pdf'
  [*] HTTP 404 -- Downloaded PDF (289 bytes)
  [-] Could not parse PDF: PDF is malformed
  [*]  75.00% done (6/8 files)
  [*] Downloading 'https://127.0.0.1/test.pdf'
  [-] Connection failed: Failed to open TCP connection to 127.0.0.1:443 (Connection refused - connect(2) for "127.0.0.1" port 443)
  [*] Downloading 'https://127.0.0.1:80/test.pdf'
  [-] Connection failed: SSL_connect returned=1 errno=0 state=unknown state: unknown protocol

  [+] Found 4 authors: sqlmap developers, Evil1, Administrator, ekama
  [*] Writing data to /root/output...
  [*] Auxiliary module execution completed


Msfconsole Usage


Here is how the gather/http_pdf_authors auxiliary module looks in the msfconsole:

msf6 > use auxiliary/gather/http_pdf_authors

msf6 auxiliary(gather/http_pdf_authors) > show info

       Name: Gather PDF Authors
     Module: auxiliary/gather/http_pdf_authors
    License: Metasploit Framework License (BSD)
       Rank: Normal

Provided by:
  bcoles

Check supported:
  No

Basic options:
  Name        Current Setting  Required  Description
  ----        ---------------  --------  -----------
  STORE_LOOT  true             no        Store authors in loot
  URL                          no        The target URL
  URL_LIST                     no        File containing a list of target URLs
  URL_TYPE    html             yes       The type of URL(s) specified (Accepted: pdf, html)

Description:
  This module downloads PDF documents and extracts the author's name 
  from the document metadata. This module expects a URL to be provided 
  using the URL option. Alternatively, multiple URLs can be provided 
  by supplying the path to a file containing a list of URLs in the 
  URL_LIST option. The URL_TYPE option is used to specify the type of 
  URLs supplied. By specifying 'pdf' for the URL_TYPE, the module will 
  treat the specified URL(s) as PDF documents. The module will 
  download the documents and extract the authors' names from the 
  document metadata. By specifying 'html' for the URL_TYPE, the module 
  will treat the specified URL(s) as HTML pages. The module will 
  scrape the pages for links to PDF documents, download the PDF 
  documents, and extract the author's name from the document metadata.

Module Options


This is a complete list of options available in the gather/http_pdf_authors auxiliary module:

msf6 auxiliary(gather/http_pdf_authors) > show options

Module options (auxiliary/gather/http_pdf_authors):

   Name        Current Setting  Required  Description
   ----        ---------------  --------  -----------
   STORE_LOOT  true             no        Store authors in loot
   URL                          no        The target URL
   URL_LIST                     no        File containing a list of target URLs
   URL_TYPE    html             yes       The type of URL(s) specified (Accepted: pdf, html)

Advanced Options


Here is a complete list of advanced options supported by the gather/http_pdf_authors auxiliary module:

msf6 auxiliary(gather/http_pdf_authors) > show advanced

Module advanced options (auxiliary/gather/http_pdf_authors):

   Name                  Current Setting                                     Required  Description
   ----                  ---------------                                     --------  -----------
   DOMAIN                WORKSTATION                                         yes       The domain to use for Windows authentication
   DigestAuthIIS         true                                                no        Conform to IIS, should work for most servers. Only set to false for non-IIS servers
   FingerprintCheck      true                                                no        Conduct a pre-exploit fingerprint verification
   HttpClientTimeout                                                         no        HTTP connection and receive timeout
   HttpPassword                                                              no        The HTTP password to specify for authentication
   HttpRawHeaders                                                            no        Path to ERB-templatized raw headers to append to existing headers
   HttpTrace             false                                               no        Show the raw HTTP requests and responses
   HttpTraceColors       red/blu                                             no        HTTP request and response colors for HttpTrace (unset to disable)
   HttpTraceHeadersOnly  false                                               no        Show HTTP headers only in HttpTrace
   HttpUsername                                                              no        The HTTP username to specify for authentication
   SSLVersion            Auto                                                yes       Specify the version of SSL/TLS to be used (Auto, TLS and SSL23 are auto-negotiate) (Accepted: Auto, TLS, SSL23, SSL3, TLS1, TLS1.1, TLS1.2)
   UserAgent             Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)  no        The User-Agent header to use for all requests
   VERBOSE               false                                               no        Enable detailed status messages
   WORKSPACE                                                                 no        Specify the workspace for this module

Auxiliary Actions


This is a list of all auxiliary actions that the gather/http_pdf_authors module can do:

msf6 auxiliary(gather/http_pdf_authors) > show actions

Auxiliary actions:

   Name  Description
   ----  -----------

Evasion Options


Here is the full list of possible evasion options supported by the gather/http_pdf_authors auxiliary module in order to evade defenses (e.g. Antivirus, EDR, Firewall, NIDS etc.):

msf6 auxiliary(gather/http_pdf_authors) > show evasion

Module evasion options:

   Name                          Current Setting  Required  Description
   ----                          ---------------  --------  -----------
   HTTP::header_folding          false            no        Enable folding of HTTP headers
   HTTP::method_random_case      false            no        Use random casing for the HTTP method
   HTTP::method_random_invalid   false            no        Use a random invalid, HTTP method for request
   HTTP::method_random_valid     false            no        Use a random, but valid, HTTP method for request
   HTTP::pad_fake_headers        false            no        Insert random, fake headers into the HTTP request
   HTTP::pad_fake_headers_count  0                no        How many fake headers to insert into the HTTP request
   HTTP::pad_get_params          false            no        Insert random, fake query string variables into the request
   HTTP::pad_get_params_count    16               no        How many fake query string variables to insert into the request
   HTTP::pad_method_uri_count    1                no        How many whitespace characters to use between the method and uri
   HTTP::pad_method_uri_type     space            no        What type of whitespace to use between the method and uri (Accepted: space, tab, apache)
   HTTP::pad_post_params         false            no        Insert random, fake post variables into the request
   HTTP::pad_post_params_count   16               no        How many fake post variables to insert into the request
   HTTP::pad_uri_version_count   1                no        How many whitespace characters to use between the uri and version
   HTTP::pad_uri_version_type    space            no        What type of whitespace to use between the uri and version (Accepted: space, tab, apache)
   HTTP::uri_dir_fake_relative   false            no        Insert fake relative directories into the uri
   HTTP::uri_dir_self_reference  false            no        Insert self-referential directories into the uri
   HTTP::uri_encode_mode         hex-normal       no        Enable URI encoding (Accepted: none, hex-normal, hex-noslashes, hex-random, hex-all, u-normal, u-all, u-random)
   HTTP::uri_fake_end            false            no        Add a fake end of URI (eg: /%20HTTP/1.0/../../)
   HTTP::uri_fake_params_start   false            no        Add a fake start of params to the URI (eg: /%3fa=b/../)
   HTTP::uri_full_url            false            no        Use the full URL for all HTTP requests
   HTTP::uri_use_backslashes     false            no        Use back slashes instead of forward slashes in the uri
   HTTP::version_random_invalid  false            no        Use a random invalid, HTTP version for request
   HTTP::version_random_valid    false            no        Use a random, but valid, HTTP version for request


Error Messages


This module may fail with the following error messages:

Check the code snippets below, taken from the module source code, for possible causes. They can often help identify the root cause of a problem.

No URL(s) specified


Here is a relevant code snippet related to the "No URL(s) specified" error message:

  def load_urls
    return [ datastore['URL'] ] unless datastore['URL'].to_s.eql? ''

    if datastore['URL_LIST'].to_s.eql? ''
      fail_with Failure::BadConfig, 'No URL(s) specified'
    end

    unless File.file? datastore['URL_LIST'].to_s
      fail_with Failure::BadConfig, "File '#{datastore['URL_LIST']}' does not exist"
    end

File '<URL_LIST>' does not exist


Here is a relevant code snippet related to the "File '<URL_LIST>' does not exist" error message:

    if datastore['URL_LIST'].to_s.eql? ''
      fail_with Failure::BadConfig, 'No URL(s) specified'
    end

    unless File.file? datastore['URL_LIST'].to_s
      fail_with Failure::BadConfig, "File '#{datastore['URL_LIST']}' does not exist"
    end

    File.open(datastore['URL_LIST'], 'rb') { |f| f.read }.split(/\r?\n/)
  end
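
The snippet above implies that URL takes precedence over URL_LIST and that the list file holds one URL per line, with either LF or CRLF line endings. Below is a stand-alone sketch of the same logic for experimenting outside Metasploit. It is an approximation under stated assumptions: the keyword arguments replace the datastore, and ArgumentError replaces fail_with, neither of which is how the module itself does it.

```ruby
require 'tempfile'

# Hypothetical stand-alone version of the option handling. The module reads
# these values from its datastore and calls fail_with instead of raising.
def load_urls(url: nil, url_list: nil)
  # A single URL wins over a list file.
  return [url] unless url.to_s.empty?

  raise ArgumentError, 'No URL(s) specified' if url_list.to_s.empty?
  raise ArgumentError, "File '#{url_list}' does not exist" unless File.file?(url_list.to_s)

  # One URL per line; both LF and CRLF line endings are accepted.
  File.binread(url_list).split(/\r?\n/)
end

# Write a small list file with mixed line endings and read it back.
Tempfile.create('urls') do |f|
  f.write("http://127.0.0.1/test.pdf\r\nhttp://127.0.0.1/test2.pdf\n")
  f.flush
  p load_urls(url_list: f.path)
end
```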

Could not parse PDF: PDF is malformed (MalformedPDFError)


Here is a relevant code snippet related to the "Could not parse PDF: PDF is malformed (MalformedPDFError)" error message:

    Timeout.timeout(10) do
      reader = PDF::Reader.new data
      return parse reader
    end
  rescue PDF::Reader::MalformedPDFError
    print_error "Could not parse PDF: PDF is malformed (MalformedPDFError)"
    return
  rescue PDF::Reader::UnsupportedFeatureError
    print_error "Could not parse PDF: PDF contains unsupported features (UnsupportedFeatureError)"
    return
  rescue SystemStackError

Could not parse PDF: PDF contains unsupported features (UnsupportedFeatureError)


Here is a relevant code snippet related to the "Could not parse PDF: PDF contains unsupported features (UnsupportedFeatureError)" error message:

    end
  rescue PDF::Reader::MalformedPDFError
    print_error "Could not parse PDF: PDF is malformed (MalformedPDFError)"
    return
  rescue PDF::Reader::UnsupportedFeatureError
    print_error "Could not parse PDF: PDF contains unsupported features (UnsupportedFeatureError)"
    return
  rescue SystemStackError
    print_error "Could not parse PDF: PDF is malformed (SystemStackError)"
    return
  rescue SyntaxError

Could not parse PDF: PDF is malformed (SystemStackError)


Here is a relevant code snippet related to the "Could not parse PDF: PDF is malformed (SystemStackError)" error message:

    return
  rescue PDF::Reader::UnsupportedFeatureError
    print_error "Could not parse PDF: PDF contains unsupported features (UnsupportedFeatureError)"
    return
  rescue SystemStackError
    print_error "Could not parse PDF: PDF is malformed (SystemStackError)"
    return
  rescue SyntaxError
    print_error "Could not parse PDF: PDF is malformed (SyntaxError)"
    return
  rescue Timeout::Error

Could not parse PDF: PDF is malformed (SyntaxError)


Here is a relevant code snippet related to the "Could not parse PDF: PDF is malformed (SyntaxError)" error message:

    return
  rescue SystemStackError
    print_error "Could not parse PDF: PDF is malformed (SystemStackError)"
    return
  rescue SyntaxError
    print_error "Could not parse PDF: PDF is malformed (SyntaxError)"
    return
  rescue Timeout::Error
    print_error "Could not parse PDF: PDF is malformed (Timeout)"
    return
  rescue => e

Could not parse PDF: PDF is malformed (Timeout)


Here is a relevant code snippet related to the "Could not parse PDF: PDF is malformed (Timeout)" error message:

    return
  rescue SyntaxError
    print_error "Could not parse PDF: PDF is malformed (SyntaxError)"
    return
  rescue Timeout::Error
    print_error "Could not parse PDF: PDF is malformed (Timeout)"
    return
  rescue => e
    print_error "Could not parse PDF: Unhandled exception: #{e}"
    return
  end

Could not parse PDF: Unhandled exception: <E>


Here is a relevant code snippet related to the "Could not parse PDF: Unhandled exception: <E>" error message:

    return
  rescue Timeout::Error
    print_error "Could not parse PDF: PDF is malformed (Timeout)"
    return
  rescue => e
    print_error "Could not parse PDF: Unhandled exception: #{e}"
    return
  end

  def parse(reader)
    # PDF
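
The parse errors above all come from one pattern: the parsing call runs under a time limit, and each failure class is rescued separately. SystemStackError and SyntaxError do not inherit from StandardError, so a bare rescue would not catch them, which is presumably why the module names them explicitly. The sketch below shows the pattern using only the Ruby standard library. The method name guarded_parse is hypothetical, and the block stands in for the PDF::Reader call so that the example runs without the pdf-reader gem.

```ruby
require 'timeout'

# Hypothetical wrapper (not the module's code): run a parsing block under a
# time limit and turn any failure into nil plus a message on stderr.
def guarded_parse(limit_seconds = 10)
  Timeout.timeout(limit_seconds) { yield }
rescue Timeout::Error
  # Listed first because Timeout::Error is also a StandardError.
  warn 'Could not parse PDF: timed out'
  nil
rescue SystemStackError, SyntaxError => e
  # These are not StandardError subclasses, so they need their own clause.
  warn "Could not parse PDF: #{e.class}"
  nil
rescue StandardError => e
  warn "Could not parse PDF: Unhandled exception: #{e}"
  nil
end

p guarded_parse { 'Administrator' }   # the block's value is passed through
p guarded_parse { raise 'boom' }      # any other error becomes nil
p guarded_parse(0.1) { sleep 1 }      # a slow parse is cut off and becomes nil
```

Returning nil on every failure lets the caller skip a bad document and continue with the next URL, which matches the behavior shown in the URL_LIST scenario.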

Found no links to PDF files


Here is a relevant code snippet related to the "Found no links to PDF files" error message:

    if datastore['URL_TYPE'].eql? 'html'
      urls = extract_pdf_links urls

      if urls.empty?
        print_error 'Found no links to PDF files'
        return
      end

      print_line
      print_good "Found links to #{urls.size} PDF files:"

Found no authors


Here is a relevant code snippet related to the "Found no authors" error message:

    authors = extract_authors urls

    print_line

    if authors.empty?
      print_status 'Found no authors'
      return
    end

    print_good "Found #{authors.size} authors: #{authors.join ', '}"

Warning: Truncated author's name at <MAX_LEN> characters


Here is a relevant code snippet related to the "Warning: Truncated author's name at <MAX_LEN> characters" error message:

      pdf.puts file
      author = read pdf
      unless author.blank?
        print_good "PDF Author: #{author}"
        if author.length > max_len
          print_warning "Warning: Truncated author's name at #{max_len} characters"
          authors << author[0...max_len]
        else
          authors << author
        end
      end
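
The truncation step above can be tried in isolation. The sketch below is an illustration under assumptions: collect_authors is a hypothetical name, max_len is passed in because the excerpt does not show its value, and a plain nil/empty check stands in for blank?, which comes from ActiveSupport rather than core Ruby.

```ruby
# Hypothetical helper: keep non-empty author names, truncating any that are
# longer than max_len characters.
def collect_authors(names, max_len)
  names.each_with_object([]) do |author, authors|
    # Skip missing or whitespace-only values (the module uses blank?).
    next if author.nil? || author.strip.empty?

    if author.length > max_len
      warn "Warning: Truncated author's name at #{max_len} characters"
      # The exclusive range keeps exactly the first max_len characters.
      authors << author[0...max_len]
    else
      authors << author
    end
  end
end

p collect_authors(['Administrator', '', nil, 'ekama'], 5)
```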


Authors


bcoles

Version


This page has been produced using Metasploit Framework version 6.1.36-dev. For more modules, visit the Metasploit Module Library.
