examine pdf search

AB 29 posts 80 karma points

Jun 11, 2014 @ 10:10

Using Umbraco 7

Hi All,

I am putting together a query to return PDFs by name. I have followed many tutorials on the subject, but I never get any PDF results back. The index looks OK and is present.

Here is my code:

var searcherPdf = ExamineManager.Instance.SearchProviderCollection["PDFSearcher"];
var searchCriteriaPdf = searcherPdf.CreateSearchCriteria();
var queryPdf = searchCriteriaPdf.Field("name", Request.QueryString["q"]).Or().Field("nodeName", Request.QueryString["q"]).Compile();
var searchResultsPdf = searcherPdf.Search(queryPdf);

Are there any obvious mistakes? Can anyone point me to a good example of a config that is known to work?

Thanks

Adam

Dave Woestenborghs 3504 posts 12133 karma points MVP 8x admin c-trib

Jun 11, 2014 @ 12:32

Are the fields name and nodeName present in your PDF index?

Can you post your Examine config files?

Dave

AB 29 posts 80 karma points

Jun 12, 2014 @ 10:01

Hi Dave,

Here are the config files:

<ExamineLuceneIndexSets>
  <!-- The internal index set used by Umbraco back-office - DO NOT REMOVE -->
  <IndexSet SetName="InternalIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/Internal/"/>

  <!-- The internal index set used by Umbraco back-office for indexing members - DO NOT REMOVE -->
  <IndexSet SetName="InternalMemberIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/InternalMember/">
    <IndexAttributeFields>
      <add Name="id" />
      <add Name="nodeName"/>
      <add Name="updateDate" />
      <add Name="writerName" />
      <add Name="loginName" />
      <add Name="email" />
      <add Name="nodeTypeAlias" />
    </IndexAttributeFields>
  </IndexSet>
    
  <IndexSet SetName="PDFIndexSet" IndexPath="~/App_Data/ExamineIndexes/PDFIndexSet/" />
  <!-- Default Indexset for external searches, this indexes all fields on all types of nodes-->
  <IndexSet SetName="ExternalIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/External/" />
  
</ExamineLuceneIndexSets>

 <ExamineIndexProviders>
    <providers>
      <add name="InternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
           supportUnpublished="true"
           supportProtected="true"
           analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"/>

      <add name="InternalMemberIndexer" type="UmbracoExamine.UmbracoMemberIndexer, UmbracoExamine"
           supportUnpublished="true"
           supportProtected="true"
           analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"/>

        <!-- default external indexer, which excludes protected and unpublished pages-->
        <add name="ExternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"/>

      <add name="PDFIndexer" type="UmbracoExamine.PDF.PDFIndexer, UmbracoExamine.PDF"
         umbracoFileProperty="umbracoFile" />
      <!--<add name="PDFIndexer" type="UmbracoExamine.PDF.PDFIndexer, UmbracoExamine.PDF"/>-->

      <!--<add name="ContentSearchIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine" />-->
      
    </providers>
  </ExamineIndexProviders>

  <ExamineSearchProviders defaultProvider="ExternalSearcher">
    <providers>
      <add name="InternalSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
           analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"/>
        
      <add name="ExternalSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" />
      
      <add name="InternalMemberSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
           analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" enableLeadingWildcard="true"/>

      <add name="PDFSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" />
      
    </providers>
  </ExamineSearchProviders>

Thanks in advance,

Adam

Dave Woestenborghs 3504 posts 12133 karma points MVP 8x admin c-trib

Jun 12, 2014 @ 10:31

Config seems okay.

According to this post, the Examine PDF index contains two fields that can be returned: http://our.umbraco.org/forum/developers/api-questions/35922-Problems-with-PDFIndexer-PDFSearcher

FileTextContent
__NodeId (yes, that's two underscores)

So you need to update your search query to use those fields instead of name and nodeName.
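
As a rough sketch of what that change could look like, assuming the index really does expose FileTextContent as the linked thread describes: the class and method names here (PdfSearch, FindPdfNodeIds) are made up for illustration, and only the Examine calls already used in the original snippet are relied on. It has not been run against this site's index.

```csharp
using System.Collections.Generic;
using System.Linq;
using Examine;

// Hypothetical helper, not part of Umbraco or Examine.
public static class PdfSearch
{
    // Returns the ids of the media nodes whose extracted PDF text matches the term.
    public static IEnumerable<int> FindPdfNodeIds(string term)
    {
        // An empty term would produce an empty query, so return early.
        if (string.IsNullOrWhiteSpace(term))
        {
            return Enumerable.Empty<int>();
        }

        var searcherPdf = ExamineManager.Instance.SearchProviderCollection["PDFSearcher"];
        var searchCriteriaPdf = searcherPdf.CreateSearchCriteria();

        // Assumption: the PDF index stores the extracted text in "FileTextContent"
        // and has no "name" or "nodeName" field.
        var queryPdf = searchCriteriaPdf.Field("FileTextContent", term).Compile();

        return searcherPdf.Search(queryPdf).Select(result => result.Id).ToList();
    }
}
```

It could be called with PdfSearch.FindPdfNodeIds(Request.QueryString["q"]), and each returned id used to load the media item and show its name. Note that this matches on the text inside the PDF, not on the file name.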

Dave

Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib

Jun 12, 2014 @ 10:52

Adam,

Did you look at the index with Luke or Examine Inspector, just to check that you have data in the index?

Regards

Ismail

Matt Taylor 873 posts 2086 karma points

Sep 08, 2015 @ 08:22

Hello all,

Is PDF searching in the core of Umbraco 7?

I've not seen a great deal about how to use it, and I've seen many OLD posts about people doing it with third-party packages, so I'm just looking for confirmation before I start trying to make it work, probably by copying the code above.

Kind regards,

Matt

Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib

Sep 08, 2015 @ 08:51

Examine does have a PDF indexer and searcher, although you have to add them to the config to set them up; the configs in the posts above show how to do it. The other option, which is not out of the box, is the cogmedia indexer I wrote many moons ago, which was aimed more at indexing PDFs and other file types, Word and Excel to name a few.

Regards

Ismail

Matt Taylor 873 posts 2086 karma points

Sep 08, 2015 @ 09:20

Thanks Ismail,

Yes I've used the cogworks package for PDF indexing in the past. It was good. :-)

This post led me to believe it was finally working in the core, but I wasn't sure. I tried using it in the core many moons ago but got nowhere, hence the use of your package.

I'll give it another bash now, on your say so. ;-)

Kind regards,

Matt
