examine questions - Extending Umbraco

overflew 87 posts 110 karma points

Jul 26, 2010 @ 13:16

Hi, I'm loving the Examine functionality - thanks. A few questions:

Is it possible to run a query on an index set without any parameters?

I'd like to recycle my current search page result setup as a 'browse all', as it has all the templates + pagination set up for it nicely.

So far, "*".MultipleWildcard is disallowed, and misc dummy calls (like an .OrderBy, or .Range("id", 1, 1000000)) to create the IBoolean don't seem to return results..

Do I need to set an additional parameter in the config to allow for ordering by a value?

I'm sure I saw it in a previous (Internet) search, but can't for the life of me find it. Currently the following doesn't modify the search results:

IBooleanOperation query = sc.OrderByDescending(new string[] { "updateDate" });

I'm sure it went in here:

<add Name="updateDate" />

Is there an easy way to perform a 'contains' operation for fields that hold a comma-separated value list?

... or would I have to check for the following: (to ensure it doesn't run over the ends of the ids)

id
,id
id,

Can I check if the nodeId is in a given list?

Given a list of, say, 20 ints, can I reduce a query to the nodeIds in that set? (E.g. searching for 'toast', and wanting to compare it to the list of nodeIds from the relations DB.)

I'll also look into implementing 'GatheringNodeData' (mentioned in another post), but would like to know if this is possible...

Any / all help much appreciated.

Aaron Powell 1708 posts 3046 karma points c-trib

Jul 26, 2010 @ 14:03

1) No, there is no "get all documents" option; that would be a performance sink and is not how Lucene is designed to be used.

2) You need to enable a field as sortable in the config, this is so that tokenization of the field can be taken into account. See this slide from Shannon's CG10 session.

3) Indexing doesn't lend itself to a 'contains' query; in fact, Lucene actively discourages a %blah% style query. What you should do is use the Examine events to replace the delimiter character with a space, so that each item in the collection is indexed separately. Then you can run an explicit query against it.

4) Are you wanting to relate data from a non-Umbraco data source to a record in Examine?
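As a rough illustration of point 3, the delimiter swap itself is plain string handling. The sketch below is not Examine code: `DelimitedFieldHelper` and `ToSpaceSeparated` are made-up names, and the assumption is that you would call something like this on the field value from whichever Examine event handler you attach, before the value is indexed.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of Examine or Lucene.Net.
static class DelimitedFieldHelper
{
    // Turns "1064, 1071,1102" into "1064 1071 1102" so that an analyzer
    // that splits on whitespace sees each id as its own token.
    public static string ToSpaceSeparated(string csv)
    {
        if (string.IsNullOrEmpty(csv)) return string.Empty;

        var parts = csv.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
                       .Select(p => p.Trim())
                       .Where(p => p.Length > 0)
                       .ToArray();

        return string.Join(" ", parts);
    }
}
```

With each id indexed as a separate token, a query for one exact id no longer needs to handle the `id`, `,id` and `id,` cases from the original question.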

overflew 87 posts 110 karma points

Jul 27, 2010 @ 02:24

Thanks Slace,

For the larger query mentioned in 1, would the .Skip & .Take alleviate any performance issue?

I'm just looking up the GatheringNodeData now - are there any online examples of its implementation with Umbraco?

Aaron Powell 1708 posts 3046 karma points c-trib

Jul 27, 2010 @ 03:28

Skip and Take have performance considerations built in: items are not hydrated from the Lucene index until they are required, so if you skip 100 results, those 100 won't be accessed from Lucene.
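To make the idea of deferred hydration concrete, here is a small stand-alone sketch. It is not how Examine is implemented; `LazyResults`, `Page` and `HydratedCount` are invented for illustration. It only shows the general pattern of a result set that knows its hit ids up front but loads a document only when that hit is enumerated.

```csharp
using System;
using System.Collections.Generic;

// Illustrative stand-in for a search result set (not Examine's type).
class LazyResults
{
    private readonly int[] _hitIds;

    // Counts how many hits were actually loaded.
    public int HydratedCount { get; private set; }

    public LazyResults(int[] hitIds) { _hitIds = hitIds; }

    // Yields one page of results; hits before 'skip' are never touched.
    public IEnumerable<string> Page(int skip, int take)
    {
        int start = Math.Max(0, skip);
        int end = Math.Min(_hitIds.Length, start + Math.Max(0, take));

        for (int i = start; i < end; i++)
        {
            HydratedCount++;                  // the "expensive" load happens here
            yield return "doc-" + _hitIds[i]; // placeholder for a real document
        }
    }
}
```

Enumerating `Page(100, 10)` over 1,000 hits loads only the ten documents on that page, which is the behaviour described above for skipped results.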

For GatheringNodeData you should check out shan's code demo, that's all the doco we've got so far (we're a bit behind in the writing of it)

RoboDog 21 posts 41 karma points

Aug 12, 2010 @ 12:34

Hi, can anyone tell me how I can make the indexer store text in its original case? For example, a property called name containing "John Doe" is currently stored as "john doe". How can I retain the original casing in the Lucene index?

Aaron Powell 1708 posts 3046 karma points c-trib

Aug 12, 2010 @ 14:37

You have to store both a case-sensitive and a case-insensitive version of the data, as Lucene isn't really designed for data retrieval.

To do this with Examine you have to attach to the UmbracoExamine.LuceneExamineIndexer.DocumentWriting event (which may have moved into the LuceneEngine with the latest check ins, I'm not 100% sure).

This event is fired in a Lucene scope and provides you with access to the Lucene Document object as it's being written to; this is where you'll need to add your un-analyzed version of the content.

Here's an example of how we did it in a recent project for showing in search results:

void indexer_DocumentWriting(object sender, DocumentWritingEventArgs e)
{
    var doc = e.Document;

    // Find the title
    string title = !e.Fields.ContainsKey("PageTitle") || string.IsNullOrEmpty(e.Fields["PageTitle"]) ? e.Fields["nodeName"] : e.Fields["PageTitle"];

    // Default content is nothing:
    string content = string.Empty;
    // Unless a description is found:
    if (e.Fields.ContainsKey("Description") && !string.IsNullOrEmpty(e.Fields["Description"]))
    {
        content = e.Fields["Description"];
    }
    // Or BodyContent is found:
    else if (e.Fields.ContainsKey("BodyContent") && !string.IsNullOrEmpty(e.Fields["BodyContent"]))
    {
        content = e.Fields["BodyContent"];
    }

    // Store the title and content with text casing unchanged
    doc.Add(new Field("__PageTitle", title, Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("__Content", content, Field.Store.YES, Field.Index.NOT_ANALYZED));
}

And when we display the search results we show the __PageTitle and __Content fields, not the 'real' fields.

Check out this article I wrote to better understand the Store and Index concepts: http://www.aaron-powell.com/documents-in-lucene-net

RoboDog 21 posts 41 karma points

Aug 23, 2010 @ 17:09

Hi, I can't get any of this to work, as attaching to events seems to be broken. Any ideas?

Aaron Powell 1708 posts 3046 karma points c-trib

Aug 24, 2010 @ 00:43

When and how are you attaching the events?

I blogged this last night: http://farmcode.org/post/2010/08/23/Text-casing-and-Examine.aspx

Folke Friedrichsen Rügge 13 posts 33 karma points

Sep 14, 2010 @ 15:18

Hi,

I am (still) very new to Umbraco.

As some of the comments say on the umbraco.tv session called "Adding a search to your website", it would be very nice to have some examples on how to "further tweak the results to include page excepts (with possible keyowrd highlight), paging and any other common scenarios that would be required in a search for a site."

I have this exact requirement: showing a text snippet from the "hit" document(s) that contains the word/phrase - even highlighting would be a bonus! :-)

From a rather reliable source I have got a hint regarding IDictionary, which should hold the indexed user fields and through which you should be able to access the items containing the search term.

I will try this out, but maybe there's a bunch of other/better ways to accomplish this..(?)

Please knock me over with all your great ideas and - even better - examples :-)

Cheers

Folke

Ismail Mayat 4511 posts 10091 karma points MVP 2x admin c-trib

Sep 14, 2010 @ 15:43

Folke,

There are contributions to the Lucene project that do some of the above, such as Highlight, which highlights your search term matches in the result set. For paging you need to write something yourself; I have a .NET control to do it. I highly recommend the book Lucene in Action, 2nd edition, which covers a lot of the contrib stuff.

Regards

Ismail

Folke Friedrichsen Rügge 13 posts 33 karma points

Sep 14, 2010 @ 16:12

Hi Ismail,

I just might buy that book to really understand Lucene and its possibilities.

Another thing that puzzles me is the fact that this site specifies the use of operators like "AND", "OR", "+", "-" and a lot more in the search query. But that doesn't seem to work via Examine, or have I just totally missed it? If I use an "OR" operator between two words that I know exist, I get zero hits - like:

byggeri OR tag

gives me 0 hits, while the words byggeri and tag (in two separate searches) both give me hits.

Doesn't Examine support these operators, or do I just use them ... erh.. in a bad/stupid way?

Kind regards

Folke

Aaron Powell 1708 posts 3046 karma points c-trib

Sep 14, 2010 @ 16:49

You would need to use the Lucene.Net contrib library for highlighting, and you may have to implement your own searcher to make it work completely (just overriding the built-in one should be sufficient). I've never had the need on a project, so it's not something I've really looked into, and since adding the contrib library adds another dependency we won't be doing it out of the box.

As for using the Lucene query syntax you have to pass that into the Fluent API's "RawQuery" method. This is because adding search using proper query-parser syntax adds overhead and it's a fairly advanced task so it's left to a more advanced API call.
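If you go the raw-query route, the query text is just a string in Lucene query-parser syntax, so it can be assembled with ordinary string handling before it is handed to `RawQuery`. The sketch below is a minimal, hypothetical helper (`LuceneQueryText`, `AnyOf` and `FieldIn` are invented names, and the field name `id` is an assumption about what your index contains). It does not escape Lucene's special characters, so it is only suitable for simple terms.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper for building Lucene query-parser text.
static class LuceneQueryText
{
    // Joins terms with OR, e.g. ("byggeri", "tag") -> "byggeri OR tag".
    public static string AnyOf(params string[] terms)
    {
        var cleaned = terms.Where(t => !string.IsNullOrEmpty(t) && t.Trim().Length > 0)
                           .Select(t => t.Trim())
                           .ToArray();
        return string.Join(" OR ", cleaned);
    }

    // Restricts a field to a set of values,
    // e.g. ("id", 1064, 1071) -> "id:(1064 OR 1071)".
    public static string FieldIn(string field, params int[] values)
    {
        if (values == null || values.Length == 0)
            throw new ArgumentException("At least one value is required.", "values");

        var parts = values.Select(v => v.ToString(CultureInfo.InvariantCulture)).ToArray();
        return field + ":(" + string.Join(" OR ", parts) + ")";
    }
}
```

`AnyOf("byggeri", "tag")` produces the same text as the query typed above, and `FieldIn` shows one possible way to express the "is the node id in this list" restriction from the first post, assuming the ids are indexed in a field you can query.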

Lastly check out this article for how to do paging: http://farmcode.org/post/2010/08/18/Paging-with-Examine.aspx

Folke Friedrichsen Rügge 13 posts 33 karma points

Sep 14, 2010 @ 16:59

Hi Slace,

Thanks a lot for your comments. I actually had the feeling that I needed to "transfer" or parse the query into the Fluent API to make it work. Very nice to have that settled :-)

My "own searcher".. well, that would be an interesting (challenging) task for me to accomplish, but I'll give it a go, as well as the pagination, and have some of my more experienced fellow devs help me along the way :-)

Thanks a lot! :D

Cheers

Folke

Aaron Powell 1708 posts 3046 karma points c-trib

Sep 14, 2010 @ 17:06

From what I recall of the highlighter it's just a custom analyzer, so you may be able to get away without having to do anything but specify it in the Examine config (I'm not that far into the book yet, Ismail is a lot further than I am, I just skimmed that chapter and then started from page 1 of the book :P).

The only reason I can think you'd need your own searcher is because of how the results are retrieved and returned.
