
  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Sep 30, 2016 @ 09:05

    Examine query generation issue

    Guys,

    I have the following code:

    BaseSearchProvider searcher;
    
    string fieldToSearch = "contents";
    
    string HideFromNavigation = "umbracoNaviHide";
    
    var umbraco = new UmbracoHelper(UmbracoContext.Current);
    
    IPublishedContent currentNode = umbraco.TypedContent(Model.Id);
    
    // If we are on the Chinese site, use the Chinese searcher
    if (currentNode.GetCulture().TwoLetterISOLanguageName.ToLower().Contains("zh"))
    {
        searcher = ExamineManager.Instance.SearchProviderCollection["External_CN_Searcher"];
    }
    else
    {
        searcher = ExamineHelper.GetMultiSearcher(new[] { "ExternalIndexer", "PDFIndexer" });
    }
    
    var criteria = searcher.CreateSearchCriteria(BooleanOperation.And);
    
    var searchTerm = string.Empty;
    
    searchTerm = string.IsNullOrEmpty(Request["query"]) ? string.Empty : Request["query"];
    
    searchTerm = searchTerm.MakeSearchQuerySafe();
    
    if (searchTerm == string.Empty)
    {
        <p>Enter search term</p>
    }
    else
    {
    
        int siteRootId = 0;
    
        if (Current.Parent == null)
        {
            siteRootId = Current.Id;
        }
        else
        {
            siteRootId = Current.Parent.Id;
        }
        var examineQuery = criteria.Field("SearchablePath", siteRootId.ToString());
    
        examineQuery.Not().Field(HideFromNavigation, 1.ToString());
    
    
        string[] terms = searchTerm.Split(' ');
    
        examineQuery.And().GroupedOr(new List<string> { fieldToSearch, "FileTextContent" }, terms);
    
        examineQuery.And().OrderByDescending("reviewDate");
    
        var results = searcher.Search(examineQuery.Compile());
        <p>@criteria.ToString()</p>
        if (results.Any())
        {
            <p>You searched for "<strong>@searchTerm</strong>" and found @results.Count() results</p>
            @RenderResults(results, umbraco)
        }
        else
        {
            <p>No results found for query @searchTerm</p>
        }
    }
    

    When my site instance is not Chinese, the code takes the else branch and gets a MultiSearcher object, and the generated query looks like:

    +SearchablePath:1068 -umbracoNaviHide:1 +(contents:umbraco FileTextContent:umbraco)
    

    That is correct. However, when I am on the Chinese site I get the query:

    +(contents:test FileTextContent:test)
    

    It seems to be ignoring the SearchablePath and umbracoNaviHide parts of the query that I am passing in using the fluent API.

    Both the MultiSearcher and the searcher from the collection are of type BaseSearchProvider. Has anyone seen this before?

    Regards

    Ismail

  • Marc Goodson 2126 posts 14217 karma points MVP 8x c-trib
    Oct 03, 2016 @ 21:19

    Hi Ismail

    If you are using the Lucene.net Chinese Analyzer from here:

    https://lucenenet.apache.org/docs/3.0.3/dir_354f6a4a03ec35feea9a4444b3b86ec9.html

    You will see this in the comments of the ChineseFilter:

    https://lucenenet.apache.org/docs/3.0.3/d6/dab/chinesefilter8cssource.html

    /// A {@link TokenFilter} with a stop word table.
    /// <ul>
    /// <li>Numeric tokens are removed.</li>
    /// <li>English tokens must be larger than 1 char.</li>
    /// <li>One Chinese char as one Chinese word.</li>
    /// </ul>
    

    So when you are using this analyser, any numeric tokens are filtered out.

    The bit of code that does the filtering:

    switch (char.GetUnicodeCategory(text[0]))
    {
        case UnicodeCategory.LowercaseLetter:
        case UnicodeCategory.UppercaseLetter:
            // English word/token should larger than 1 char.
            if (termLength > 1)
            {
                return true;
            }
            break;
        case UnicodeCategory.OtherLetter:
            // One Chinese char as one Chinese word.
            // Chinese word extraction to be added later here.
            return true;
    }
    

    So you can hopefully see there is no case for a UnicodeCategory of DecimalDigitNumber.

    You are filtering by a decimal digit number in both SearchablePath and umbracoNaviHide. Even though the values are passed as strings, the filter checks the Unicode category of the first character of each token, so tokens that start with a digit match no case. Their values are stripped by this filter, which is why those clauses are missing from your generated query.

    So in theory you can get around this by compiling your own version of ChineseFilter.cs with a case to handle numbers and allow them as tokens...

    switch (Char.GetUnicodeCategory(text[0]))
    {
        case UnicodeCategory.LowercaseLetter:
        case UnicodeCategory.UppercaseLetter:
            // English word/token should larger than 1 character.
            if (text.Length > 1)
            {
                return token;
            }
            break;
        case UnicodeCategory.DecimalDigitNumber:
            // Added case: keep tokens that start with a digit (e.g. 1 or 1068).
            return token;
        case UnicodeCategory.OtherLetter:
            // One Chinese character as one Chinese word.
            // Chinese word extraction to be added later here.
            return token;
    }
    

    This will allow you to pass 1 to umbracoNaviHide and 1068 to SearchablePath. To be honest, I've no idea if this is a good thing to do or not :-) but hopefully it explains the puzzle of what is going on!
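
    The two switch statements can be compared in isolation, without Lucene.Net or an index. This is only a rough sketch: TokenRuleSketch, KeepsOriginal and KeepsWithDigits are hypothetical names, not part of Lucene.Net or Examine, and they mirror only the first-character check quoted above (the stop word table in the real filter is ignored, and tokens are assumed to be non-empty).

    ```csharp
    using System;
    using System.Globalization;

    // Hypothetical helper, not part of Lucene.Net: it mirrors only the
    // first-character check from the switch statements quoted above.
    public static class TokenRuleSketch
    {
        // Mirrors the original rule: only letters and Chinese characters pass.
        // Assumes text is non-empty.
        public static bool KeepsOriginal(string text)
        {
            switch (char.GetUnicodeCategory(text[0]))
            {
                case UnicodeCategory.LowercaseLetter:
                case UnicodeCategory.UppercaseLetter:
                    // English tokens must be longer than one character.
                    return text.Length > 1;
                case UnicodeCategory.OtherLetter:
                    // One Chinese character counts as one word.
                    return true;
                default:
                    // Digits and everything else match no case and are dropped.
                    return false;
            }
        }

        // Same rule with the extra DecimalDigitNumber case added.
        public static bool KeepsWithDigits(string text)
        {
            if (char.GetUnicodeCategory(text[0]) == UnicodeCategory.DecimalDigitNumber)
            {
                return true;
            }
            return KeepsOriginal(text);
        }

        public static void Main()
        {
            // Values taken from the thread, plus a one-letter and a Chinese token.
            foreach (var term in new[] { "1068", "1", "umbraco", "a", "测" })
            {
                Console.WriteLine("{0}: original={1}, with digits={2}",
                    term, KeepsOriginal(term), KeepsWithDigits(term));
            }
        }
    }
    ```

    Under these assumptions, 1068 and 1 are rejected by the first method and accepted by the second, while umbraco and the Chinese character pass both and the single letter a passes neither.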

    regards

    Marc
