ExamineFileIndexer

A custom Examine indexer that can index any Umbraco media node. Under the hood it uses Apache Tika to extract content and metadata from Umbraco media files, and Tika can handle a wide range of file formats. The package also supports virtual path providers (VPP), so media files stored in Azure or a similar location are indexed as well.

Getting started

This package is supported on Umbraco 7.6.1+.

Installation

ExamineFileIndexer is available from Our Umbraco, NuGet, or as a manual download directly from GitHub.

Our Umbraco repository

PLEASE NOTE THAT THE UMBRACO PACKAGE DOES NOT CONTAIN THE TIKA DLLS BECAUSE OF THE 10 MB FILE SIZE RESTRICTION ON OUR UMBRACO. YOU NEED TO INSTALL THE PACKAGE AND THEN COPY OVER THE TIKA DLLS, WHICH ARE AVAILABLE ON DROPBOX (https://www.dropbox.com/s/1d9wkom2gbaax2h/tika.zip?dl=0).

DOWNLOAD THE ZIP AND EXTRACT THE DLLS TO THE BIN FOLDER OF YOUR SITE AFTER INSTALLING THE UMBRACO PACKAGE.

Usage

After installation, your ExamineIndex.config and ExamineSettings.config files will be updated. The following entries will be added.

ExamineIndex.config

  <IndexSet SetName="MediaIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/MediaIndexSet">
    <IndexAttributeFields>
      <add Name="id" />
      <add Name="nodeName" />
      <add Name="updateDate" />
      <add Name="writerName" />
      <add Name="path" />
      <add Name="nodeTypeAlias" />
      <add Name="parentID" />
    </IndexAttributeFields>
    <IncludeNodeTypes>
      <add Name="File" />
    </IncludeNodeTypes>
  </IndexSet>

ExamineSettings.config

Under ExamineIndexProviders/providers:

<add name="MediaIndexer" type="Cogworks.ExamineFileIndexer.UmbracoMediaFileIndexer, Cogworks.ExamineFileIndexer" 
extensions=".pdf,.docx" 
umbracoFileProperty="umbracoFile" />

Under ExamineSearchProviders/providers:

<add name="MediaSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" indexSet="MediaIndexSet" 
analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" />

By default the following file types will be indexed: .pdf and .docx. To index other file types, update the MediaIndexer entry in ExamineSettings.config:

<add name="MediaIndexer" type="Cogworks.ExamineFileIndexer.UmbracoMediaFileIndexer, Cogworks.ExamineFileIndexer" 
extensions=".pdf,.docx" 
umbracoFileProperty="umbracoFile" />

Update the extensions attribute and add any other file types. They need to be separated by commas (,).

You can also add image file types, e.g. .jpg. PLEASE NOTE THAT INDEXING IMAGES WILL ONLY ADD EXIF METADATA.
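
As an illustration of the change described above, the entry might look like this once .jpg has been appended. This is only a sketch: the extension list is an example, and every other attribute is copied unchanged from the entry that the installer adds.

```xml
<!-- Example only: .jpg appended to the default extension list.
     Image files contribute EXIF metadata only, not text content. -->
<add name="MediaIndexer" type="Cogworks.ExamineFileIndexer.UmbracoMediaFileIndexer, Cogworks.ExamineFileIndexer"
     extensions=".pdf,.docx,.jpg"
     umbracoFileProperty="umbracoFile" />
```

Media items that already exist will most likely need the index to be rebuilt before the newly added file types show up in search results.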

Project owner

The Cogworks

Project Compatibility

This project is compatible with the following versions as reported by community members who have downloaded this package:
Untested or doesn't work on Umbraco Cloud
7.7.x (untested)
7.6.x (100%)
7.5.x (untested)
7.4.x (untested)
7.3.x (untested)
7.2.x (untested)
7.1.x (untested)
7.0.x (untested)
6.1.x (untested)
6.0.x (untested)

Project Information

  • Project owner: The Cogworks
  • Created: 14/06/2017
  • Current version: 1.0.3
  • .NET version: 4.5.2
  • License: MIT
  • Downloads: 147
