File indexing or documents feature

Documents feature

One gap in native WordPress search is file content search: it cannot search the contents of a file. ElasticProbe covers this gap with its documents feature.

Document indexing, or file indexing, lets users search through file contents. ElasticProbe supports PDF, Microsoft Office, and plain-text file formats. The default file format list is:

  • .docx (Current Microsoft Word file format)
  • .doc (Old Microsoft Word file format)
  • .xlsx (Current Microsoft Excel file format)
  • .xls (Old Microsoft Excel file format)
  • .ppt (Old Microsoft PowerPoint file format)
  • .pptx (Current Microsoft PowerPoint file format)
  • .pdf (Adobe PDF)
  • .csv (Comma-separated values)
  • .txt (Plain text file)

Extending document file indexing

ElasticProbe users can extend the default file format list with a custom code snippet. Check out the available file formats. If you plan to extend the list on your own Elasticsearch (ES) deployment, be aware that the supported formats depend on the exact version of Apache Tika used by that deployment.

To add new file formats to the list of defaults, first look up the MIME type of your file format, then use the following code snippet.

				
add_filter(
    'eprobe_allowed_documents_ingest_mime_types',
    function() {
        // Return the full list of allowed types as: file extension => MIME type.
        // Add your own entries to this array.
        return array(
            'pdf'  => 'application/pdf',
            'ppt'  => 'application/vnd.ms-powerpoint',
            'pptx' => 'application/vnd.openxmlformats-officedocument.presentationml.presentation',
            'xls'  => 'application/vnd.ms-excel',
            'xlsx' => 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
            'doc'  => 'application/msword',
            'docx' => 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
            'csv'  => 'text/csv',
            'txt'  => 'text/plain',
        );
    }
);
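
As an alternative to listing every type again, a callback can append a single entry to the list it receives. The following is a minimal sketch, not the plugin's documented method: it assumes the filter passes the current list as its first argument (the usual WordPress filter behavior), the function name my_theme_allow_rtf_documents is hypothetical, and RTF is only an example format that your Apache Tika version may or may not handle.

```php
<?php
/**
 * Append RTF to the MIME types allowed for document indexing.
 *
 * Assumption: $mime_types is the current "extension => MIME type" array
 * handed to the callback by the filter.
 */
function my_theme_allow_rtf_documents( $mime_types ) {
    // Fall back to an empty list if the value is not an array.
    if ( ! is_array( $mime_types ) ) {
        $mime_types = array();
    }

    // Example entry only; replace with the extension and MIME type you need.
    $mime_types['rtf'] = 'application/rtf';

    return $mime_types;
}

// Register the callback only when WordPress is loaded.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'eprobe_allowed_documents_ingest_mime_types', 'my_theme_allow_rtf_documents' );
}
```

Appending instead of redefining keeps any defaults that a later plugin version adds, at the cost of relying on the assumption that the defaults are passed in.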
				
			

Prerequisites

If you use, or plan to use, our hosted SaaS service, there is nothing to worry about: ElasticProbe has you covered. If you run the ElasticProbe plugin with your own Elasticsearch, keep the following in mind.

To ingest documents, Elasticsearch requires “ingest attachment”. On Elasticsearch 7.x this is a plugin that must be installed manually. On 8.x and later it is a built-in part of Elasticsearch known as the “ingest attachment processor”.