  • Donald Kerr 5 posts 75 karma points
    Nov 25, 2015 @ 22:09

    Examine Lucene Search Issue and Character Stripping

    I have built a number of indexes (Standard, Whitespace, etc.) so that I can set up and test searches. I also have Luke running so that I can test queries against them directly.

    It seems like quite a simple issue. I am searching for a phrase, e.g. "This is to the right", and I am using the Whitespace index.

      // Raw Lucene phrase query against the bodyText field
      luceneStringBodyText = "bodyText:(\"This is to the right\")";
      // Hand the raw query string to Examine
      var query = searchCriteria.RawQuery(luceneStringBodyText);
    

    That should find every instance of "This is to the right" in any bodyText record. It does, unless the sentence in the index ends in a period/full stop, i.e. "This is to the right." In that case the phrase is not found.

    In Luke, the phrase is found if my search includes the period/full stop, but if I omit it the phrase is not found.

    In the second line of code above, RawQuery() changes the text and strips out the period/full stop if I put it in, so there appears to be no way to construct the search phrase so that it includes the period/full stop. RawQuery() turns the above search string into: "bodyText:"? ? to ? right"", even if I include a period/full stop after the last word, i.e. "right."

    Notice also that certain words are replaced by question marks/question points.

    There are two questions really:

    1. How can I set up a search to find that phrase?

    2. Is there any way to stop RawQuery() from dropping the period/full stop if I include it at the end of the search phrase? If I can do that, then I can sort this out programmatically with a workaround. I can live with RawQuery() changing certain words to "?" as that does not seem to affect the search.

    Any help appreciated. :)

  • Donald Kerr 5 posts 75 karma points
    Nov 26, 2015 @ 23:24

    Given that there seems to be no easy way to get around RawQuery() stripping out Lucene stop words leaving the likes of "+()" and "?", I have had to strip out those stop words from the search querystring prior to using the querystring search terms as the basis for a Lucene search.
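
    A minimal sketch of that kind of pre-filtering follows. It is an illustration, not the exact code used here. The class name StopWordFilter and the method StripStopWords are hypothetical, and the stop word list is assumed to be Lucene's default English set, so it should be checked against whatever analyzer the searcher is actually configured with.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical helper: removes stop words from user input before it is
    // turned into a raw Lucene query string.
    public static class StopWordFilter
    {
        // Assumption: Lucene's default English stop word set.
        // Verify against the analyzer your searcher really uses.
        private static readonly HashSet<string> StopWords = new HashSet<string>(
            new[]
            {
                "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
                "in", "into", "is", "it", "no", "not", "of", "on", "or", "such",
                "that", "the", "their", "then", "there", "these", "they", "this",
                "to", "was", "will", "with"
            },
            StringComparer.OrdinalIgnoreCase);

        // Splits on whitespace and drops every term that is a stop word once
        // trailing punctuation is ignored. Kept terms are returned unchanged.
        public static string StripStopWords(string phrase)
        {
            if (string.IsNullOrWhiteSpace(phrase)) return string.Empty;

            var kept = phrase
                .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                .Where(term => !StopWords.Contains(
                    term.TrimEnd('.', ',', ';', ':', '!', '?')));

            return string.Join(" ", kept);
        }
    }
    ```

    With this list, StopWordFilter.StripStopWords("This is to the right") returns "right", which can then be wrapped as "bodyText:(" + terms + ")" and passed to RawQuery() as in the first post. Trailing punctuation on kept terms is left alone on purpose, so how a term such as "right." is matched still depends on the analyzer in use.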

    It has been helpful to use Luke to check that the resultant Lucene search string is sound.
