Setup Lucene/Examine for hyphenated words - API Questions

Topic author was deleted

Apr 16, 2014 @ 16:24

So I'm looking for the best Examine/Lucene setup for searching hyphenated words.

Using the StandardAnalyzer (for both indexing and querying), words like M1-1234 are not found when keywords such as M1, M1- or M1-* are entered.

Any experience with which analyzer combinations make this work would be appreciated.

Cheers,

Kevin

Comment author was deleted

Apr 16, 2014 @ 16:35

Looks like using the WhitespaceAnalyzer for both indexing and searching is the trick, but it appears to be case-sensitive. I'll have to work through that next.
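WhitespaceAnalyzer only splits on whitespace, so M1-1234 stays one token, but it does not lowercase anything, which is why it behaves case-sensitively. One common remedy is an analyzer that chains a whitespace tokenizer with a lowercase filter, used at both index and query time. The plain C# sketch below does not use Lucene at all; `LowercaseWhitespaceTokens` and its methods are hypothetical names that only mimic that kind of token stream, to show why lowercasing both sides lets `M1-1234` and `m1-1234` match:

```csharp
using System;
using System.Linq;

// Hypothetical helper (not Lucene code): mimics "whitespace tokenizer +
// lowercase filter" so the resulting tokens can be inspected.
public static class LowercaseWhitespaceTokens
{
    public static string[] Tokenize(string text)
    {
        // A null separator splits on any whitespace only, so '-' stays
        // inside the token; every token is then lowercased.
        return text
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Select(t => t.ToLowerInvariant())
            .ToArray();
    }

    // True when every query token equals some indexed token (exact term match).
    public static bool Matches(string indexedText, string queryText)
    {
        var indexed = Tokenize(indexedText);
        return Tokenize(queryText).All(q => indexed.Contains(q));
    }
}
```

With this kind of tokenization a query for `m1` alone still does not match, because the indexed token is the whole `m1-1234`; the M1 and M1- inputs would additionally need a prefix (wildcard) query.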

Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib

Apr 16, 2014 @ 16:39
Kevin,

The StandardAnalyzer will definitely remove '-' and any other non-alphanumeric characters, so m1-1234 will end up in the index as m1 1234.

I am also certain that when you search, the searcher (if it uses the StandardAnalyzer) will take the term m1-1234 under the hood and turn it into m1 1234. Depending on how you code your search, you may end up querying just m1 on its own.

What I usually do (and ezSearch has something similar) is take the term, test whether it contains a space (or, in your case, a -), split it, and then build the query with GroupedOr, so that in Lucene it becomes something like +(contents:m1 contents:1234). See below:
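```
// Multiple values in one field: split the search value into separate terms
// (on spaces, and on '-' for hyphenated input such as M1-1234)
if (qsValue.Contains(" ") || qsValue.Contains("-"))
{
    string[] terms = qsValue.Split(new[] { ' ', '-' }, StringSplitOptions.RemoveEmptyEntries);
    // Any of the terms may match in field 'key': +(key:m1 key:1234)
    queryToBuild = queryToBuild.And().GroupedOr(new List<string> { key }, terms);
}
```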
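To see the query shape this aims for, the sketch below builds the equivalent raw Lucene query string by hand in plain C#. The helper name `BuildGroupedQuery`, the field name `contents`, lowercasing the terms and splitting on '-' as well as spaces are all assumptions for illustration; with Examine you would normally let `GroupedOr`/`GroupedAnd` produce the query for you. Passing `requireAll: true` marks every term as required, which is the AND counterpart.

```csharp
using System;
using System.Linq;

public static class GroupedQuerySketch
{
    // Hypothetical helper: splits a search value on spaces and hyphens and
    // wraps the resulting terms in one required group for a single field.
    // Terms are lowercased on the assumption that the index was lowercased.
    //   requireAll = false -> +(f:a f:b)    any term may match
    //   requireAll = true  -> +(+f:a +f:b)  every term must match
    public static string BuildGroupedQuery(string field, string value, bool requireAll = false)
    {
        var terms = value
            .ToLowerInvariant()
            .Split(new[] { ' ', '-' }, StringSplitOptions.RemoveEmptyEntries);

        var prefix = requireAll ? "+" : "";
        var clauses = terms.Select(t => prefix + field + ":" + t);
        return "+(" + string.Join(" ", clauses) + ")";
    }
}
```

An empty input would produce an empty group `+()`, so a real implementation should skip the clause when no terms remain.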
You could change it to GroupedAnd. I also have the following, which I run my search term through before doing the query:
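```
using System.Text.RegularExpressions;

// Extension methods must live in a static class (class name is illustrative)
public static class SearchExtensions
{
    /// <summary>
    /// Strips every character that is not a word character, whitespace or '-'.
    /// Taken from http://stackoverflow.com/questions/263081/how-to-make-the-lucene-queryparser-more-forgiving
    /// </summary>
    /// <param name="query">The raw search term entered by the user.</param>
    /// <returns>The search term with disallowed characters removed.</returns>
    public static string MakeSearchQuerySafe(this string query)
    {
        if (string.IsNullOrEmpty(query)) return query;
        var regex = new Regex(@"[^\w\s-]");
        return regex.Replace(query, "");
    }
}
```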
You may want to update the regex so that it replaces those characters with a space instead. In some old code I also have:
```
// Hyphenated value: pass it to Examine as an escaped value instead of splitting it
if (qsValue.Contains("-"))
{
    queryToBuild = queryToBuild.And().Field(key, qsValue.Escape());
}
```
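One plausible reason for escaping hyphenated input is that '-' is a query-syntax character in Lucene's classic query parser (a leading '-' prohibits a term), as are `+ ! ( ) : ^ [ ] " { } ~ * ? | &` and the backslash. The sketch below is a hypothetical plain C# helper that backslash-escapes that character set, modelled on what Lucene's `QueryParser.Escape` handles. It is not Examine's `Escape()` extension, whose exact behaviour should be checked against the Examine version in use.

```csharp
using System.Text;

public static class LuceneEscapeSketch
{
    // Characters with special meaning in Lucene's classic query syntax.
    private const string SpecialChars = "\\+-!():^[]\"{}~*?|&";

    // Hypothetical helper: prefixes each special character with a backslash
    // so the query parser treats it as literal text.
    public static string EscapeQueryText(string input)
    {
        var sb = new StringBuilder(input.Length * 2);
        foreach (var c in input)
        {
            if (SpecialChars.IndexOf(c) >= 0)
            {
                sb.Append('\\');
            }
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Escaping also disables wildcards: `M1-*` becomes `M1\-\*`, a literal asterisk, so if a trailing wildcard is wanted only the term part should be escaped.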
However, I have no idea why I did it. Oh Anthony, I am sorry for not commenting enough!!! (Anthony Dang always used to tell me off during code reviews for not commenting enough, doh!!)

Have a play with the above and see if it gets you any further. However, I reckon using MakeSearchQuerySafe with the space test and creating a GroupedOr or GroupedAnd should do the trick.
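A minimal sketch of that combination, assuming the regex is changed to replace disallowed characters with a space and to no longer keep '-', so that M1-1234 is broken into separate terms ready to pass to `GroupedOr` (or `GroupedAnd`). `ToSearchTerms` is a hypothetical name; the original `MakeSearchQuerySafe` keeps hyphens and deletes other characters instead.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SearchTermSketch
{
    // Anything that is not a word character or whitespace ('-' included)
    // is treated as a separator.
    private static readonly Regex NonWord = new Regex(@"[^\w\s]");

    // Hypothetical helper: "M1-1234" -> ["M1", "1234"].
    public static string[] ToSearchTerms(string query)
    {
        var spaced = NonWord.Replace(query ?? string.Empty, " ");
        // A null separator splits on any whitespace; empty entries are dropped.
        return spaced.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

When the returned array is empty (for example, the input was only punctuation), the grouped clause should be skipped rather than added with no terms.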

Regards

Ismail
Pushpendra Singh 61 posts 116 karma points

Jul 09, 2014 @ 15:04

Ismail,

I am using two analyzers, WhitespaceAnalyzer and StandardAnalyzer, in my ExamineSettings config.

My field is present in ExamineIndex.config. The problem occurs only for alphanumeric searches (e.g. "test1") in the boost field (metakeyword), not for purely alphabetic ones.

My Umbraco version is 4.11.8.

Regards,

Pushpendra Singh

Michaela Ivanova 12 posts 104 karma points

Nov 18, 2016 @ 09:07

Actually, it splits the word by '-'. The problem is not in the index or the settings. Try using the UmbracoHelper class and its TypedSearch(keywords, false, "YourSearcher") method; note that useWildCards is set to false. For more info, see the Search method here: https://github.com/umbraco/Umbraco-CMS/blob/dev-v7/src/Umbraco.Web/PublishedContentExtensions.cs
