umbraco examine search result highlighting - Using Umbraco And Getting Started

Thanh Pham 23 posts 140 karma points

Feb 08, 2018 @ 01:36

Umbraco Examine - Search result highlighting

Using Umbraco And Getting Started

Umbraco 7

Hi guys,

I'm trying to implement search result highlighting (like Google's) in an Umbraco web app. I followed this thread: https://our.umbraco.org/forum/developers/extending-umbraco/13571-Umbraco-Examine-Search-Results-Highlighting. However, it's 8 years old, and I want to target multiple fields with fuzzy search, so here is my code:

        var stdAnalyzer = new StandardAnalyzer(Version.LUCENE_29);
        var formatter = new SimpleHTMLFormatter();
        var finalQuery = new BooleanQuery();
        var tmpQuery = new BooleanQuery();

        var multiQueryParser = new MultiFieldQueryParser(Version.LUCENE_29, fields, stdAnalyzer);
        var externalIndexSet = Examine.LuceneEngine.Config.IndexSets.Instance.Sets["ExternalIndexSet"];
        var externalSearcher = new IndexSearcher($"{externalIndexSet.IndexDirectory.FullName}\\Index", true);
        var terms = searchTerm.RemoveStopWords().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (var term in terms)
        {
            tmpQuery.Add(multiQueryParser.Parse(term.Replace("~", "") + $@"~{fuzzyScore}"),
                BooleanClause.Occur.SHOULD);
        }
        tmpQuery.Add(multiQueryParser.Parse("noIndex:1"), BooleanClause.Occur.MUST_NOT);

        finalQuery.Add(multiQueryParser.Parse($@"{tmpQuery}"),
            BooleanClause.Occur.MUST);
        finalQuery.Add(multiQueryParser.Parse("__IndexType:content"), BooleanClause.Occur.MUST);


        var hits = externalSearcher.Search(finalQuery, 100);
        var qs = new QueryScorer(finalQuery);
        var highlighter = new Highlighter(formatter, qs);
        var fragmenter = new SimpleFragmenter();
        highlighter.SetTextFragmenter(fragmenter);
        highlighter.SetMaxDocBytesToAnalyze(int.MaxValue);

        foreach (var item in hits.ScoreDocs)
        {
            var document = externalSearcher.Doc(item.doc);
            var description = document.Get("description");
            var tokenStream = TokenSources.GetTokenStream(externalSearcher.GetIndexReader(), item.doc,
                "description", stdAnalyzer);
            var frags = highlighter.GetBestFragments(tokenStream, description, 10);
        }

        externalSearcher.Dispose();

Everything seems to work fine, except that I can't get a token stream, no matter how many methods from different classes I've tried, so no fragments are returned. I then looked at the Lucene.Net source code at https://lucenenet.apache.org/docs/3.0.3/df/d43/tokensources8cssource.html and found that the GetTokenStream method throws an ArgumentException if the "description" field I use above is not a TermPositionVector. I got exactly this exception when I debugged it. How do I fix this issue?

I use the default ExternalSearcher and ExternalIndexSet provided by Umbraco (7.7.6) to index and query content within the BackOffice.

Thanks.

TP
