How to search pdf contents in SQL Server

Actually SQL Server by default does not support content search from PDF files. DMV sys.fulltext_document_types tells us the file types that are supported in your SQL Server instance. To read/index PDF files in SQL Server, we need to load a plugin/class on SQL Server instance. In this post we will see, how can we index keywords from a PDF file and search using fulltext index.

Step 01: Enable FILESTREAM

filestream-enable

Step 02: Create a filestream supported table to store the documents. The table schema is self-descriptive.

Step 03: Check if the SQL Server instance supports PDF; if not then install the same.

If no rows returned, then we need to install the pdf plug-in iFilter. Follow the installation steps.

Note: Consider iFilter v9 for 64 bit SQL Server.

Download link: Adobe iFilter

  • Install the PDFiFilter plugin.
  • Set the environment PATH variable. ‘C:\Program Files\Adobe\Adobe PDF iFilter 9 for 64-bit platforms\bin\’
    • RUN → control sysdm.cpl → Advanced → Environment Variables
    • iFilter 9 bin folder path may vary as per your installation.
  • Restart the PC/Server to take effect of env path.
  • Register the plugin with the SQL server full text index by executing the commands below:
  • Now below query must return a row otherwise there is some problem with the iFilter installation and configuration.

Step 04: Enable the full-text feature on the database.

Step 05: Create full-text index catalog.

Step 06: Create full-text index and enable it on the Document table.

Step 08: Insert a few PDF files into the Document table.

Step 09: Search the keywords in PDFs.

 

Prasad Sahoo

Prasad Sahoo, is an enthusiast of Microsoft technologies in general and a passionate database professional, mainly focusing on SQL Server performance tuning and business intelligence. He has contributed a long time of his career working on SQL Server and other RDBMS. If he is not busy with SQL stuff then he must be watching some Hollywood movies.

Leave a Reply

%d bloggers like this: