  • Niclas Schumacher 67 posts 87 karma points
    Apr 16, 2014 @ 10:35

    Creating my own index with Examine

    Hello fellow examiners!
    In my team we are building a site whose users live outside Umbraco, and we need to create an index of these users. How is this done? Examine seems to be built for Umbraco; can it be used for data that isn't Umbraco content?
    The bottom line is that Umbraco doesn't offer enough user functionality for us to keep our users inside Umbraco, so we have our own table of users in the database, which means I can't get Umbraco to index them.

    Does anyone have experience creating their own index with Examine, or should I go straight at it and use Lucene directly?
    I would like to use Examine, so that I can later use a multi-index search across both content and users.

    Can anyone point me in the right direction? :)

  • Alex Skrypnyk 6131 posts 23950 karma points MVP 7x admin c-trib
    Apr 16, 2014 @ 11:04

    Hi Niclas,

    A few weeks ago we built the same functionality, and a custom index is easy to do. For an example of reading from the index, see: http://stackoverflow.com/questions/20927236/proper-way-to-get-readers-writers-in-lucene-net

    Our class for working with Lucene:

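    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Web;
    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;
    using Lucene.Net.QueryParsers;
    using Lucene.Net.Search;
    using Lucene.Net.Store;
    
    // Uses the Lucene.Net 3.0 API (Version.LUCENE_30).
    // PostData, PostTypeEnum and LoggerHelper are this project's own types (not shown).
    public static class LuceneSearch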
    {
        #region members
    
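        // Physical folder of the index. AppDomainAppVirtualPath would return a virtual
        // path such as "/", which places the index outside the application folder.
        private static readonly string _luceneDir = Path.Combine(HttpRuntime.AppDomainAppPath, "lucene_index");
    
        private static FSDirectory _directoryTemp;
    
        //private static float _minRelevance = GlobalConfig.LuceneFeedsMinRelevance;
    
        private static FSDirectory _directory
        {
            get
            {
                if (_directoryTemp == null) _directoryTemp = FSDirectory.Open(new DirectoryInfo(_luceneDir));
                // Forcibly clears a stale write lock (e.g. one left behind by a crashed app pool).
                // This is only safe as long as no other IndexWriter is open at the same time.
                if (IndexWriter.IsLocked(_directoryTemp)) IndexWriter.Unlock(_directoryTemp);
                var lockFilePath = Path.Combine(_luceneDir, "write.lock");
                if (File.Exists(lockFilePath)) File.Delete(lockFilePath);
                return _directoryTemp;
            }
        }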
    
        #endregion members
    
        #region public methods
    
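        /// <summary>
        /// Searches the index, passing the input through unchanged as Lucene query syntax.
        /// </summary>
        /// <param name="input">The query text.</param>
        /// <param name="fieldName">Field to search; empty searches the default fields.</param>
        /// <returns>The matching posts, or an empty list when the input is empty.</returns>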
        public static IEnumerable<PostData> SearchDefault(string input, string fieldName = "")
        {
            return string.IsNullOrEmpty(input) ? new List<PostData>() : _search(input, fieldName);
        }
    
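        /// <summary>
        /// Runs a prefix search: every word in the input gets a trailing wildcard,
        /// so "umb exa" is searched as "umb* exa*".
        /// </summary>
        /// <param name="input">The query text.</param>
        /// <param name="fieldName">Field to search; empty searches the default fields.</param>
        /// <returns>The matching posts, or an empty list when the input is empty.</returns>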
        public static IEnumerable<PostData> Search(string input, string fieldName = "")
        {
            if (string.IsNullOrEmpty(input)) return new List<PostData>();
    
            var terms = input.Trim().Replace("-", " ").Split(' ').Where(x => !string.IsNullOrEmpty(x)).Select(x => x.Trim() + "*");
            input = string.Join(" ", terms);
    
            return _search(input, fieldName);
        }
    
        /// <summary>
        /// Adds or updates a single post in the index.
        /// </summary>
        /// <param name="sampleData">The post to index.</param>
        public static void AddUpdateLuceneIndex(PostData sampleData)
        {
            AddUpdateLuceneIndex(new List<PostData> { sampleData });
        }
    
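        /// <summary>
        /// Adds or updates posts in the index. A single post replaces its previous entry;
        /// a batch rebuilds the whole index when its size differs from the indexed document count.
        /// </summary>
        /// <param name="sampleDatas">The posts to index.</param>
        public static void AddUpdateLuceneIndex(IEnumerable<PostData> sampleDatas)
        {
            // enumerate the input only once
            var items = sampleDatas.ToList();
    
            // init lucene
            var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
            try
            {
                if (items.Count == 1)
                {
                    // open the existing index (no create flag) so all other documents are kept,
                    // then replace the entry for this Id
                    using (var writer = new IndexWriter(_directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
                    {
                        writer.DeleteDocuments(new Term("Id", items[0].Id.ToString()));
                        _addToLuceneIndex(items[0], writer);
                    }
                    LoggerHelper.Log("1 item added/updated in Lucene index");
                    return;
                }
    
                int indexedCount;
                using (var writer = new IndexWriter(_directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
                {
                    indexedCount = writer.NumDocs();
                }
    
                if (items.Count == indexedCount)
                {
                    LoggerHelper.Log("No changes found for Lucene index");
                    return;
                }
    
                // create=true starts from an empty index, then every post is re-added
                using (var writer = new IndexWriter(_directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
                {
                    foreach (var item in items)
                    {
                        _addToLuceneIndex(item, writer);
                    }
                }
                LoggerHelper.Log(string.Format("{0} items added to Lucene index", items.Count));
            }
            finally
            {
                // close handles
                analyzer.Close();
            }
        }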
    
        /// <summary>
        /// Removes a single post from the index.
        /// </summary>
        /// <param name="record_id">Id of the post to remove.</param>
        /// <param name="postType">Type of the post. Only documents whose "Type" field also matches
        /// are removed, so that field must be indexed (NOT_ANALYZED) for this to find anything.</param>
        public static void ClearLuceneIndexRecord(long record_id, PostTypeEnum postType)
        {
            // init lucene
            var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
            try
            {
                using (var writer = new IndexWriter(_directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
                {
                    // remove the entry matching both Id and Type
                    var booleanQuery = new BooleanQuery();
                    booleanQuery.Add(new TermQuery(new Term("Id", record_id.ToString())), Occur.MUST);
                    booleanQuery.Add(new TermQuery(new Term("Type", postType.ToString())), Occur.MUST);
    
                    writer.DeleteDocuments(booleanQuery);
                }
            }
            finally
            {
                // close handles
                analyzer.Close();
            }
        }
    
        /// <summary>
        /// Deletes every document from the index.
        /// </summary>
        /// <returns><c>true</c> on success; <c>false</c> if the index could not be cleared.</returns>
        public static bool ClearLuceneIndex()
        {
            var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
            try
            {
                // create=true already starts an empty index; DeleteAll makes the intent explicit
                using (var writer = new IndexWriter(_directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
                {
                    writer.DeleteAll();
                }
                return true;
            }
            catch (Exception e)
            {
                LoggerHelper.ErrorLog(e);
                return false;
            }
            finally
            {
                analyzer.Close();
            }
        }
    
        /// <summary>
        /// Merges the index segments to speed up searching.
        /// </summary>
        public static void Optimize()
        {
            var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
            try
            {
                using (var writer = new IndexWriter(_directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
                {
                    writer.Optimize();
                }
            }
            finally
            {
                // close the analyzer only after the writer is done
                analyzer.Close();
            }
        }
    
        /// <summary>
        /// Finds posts matching every field/query pair (all pairs must match).
        /// </summary>
        /// <param name="searchTerms">Field name mapped to Lucene query text for that field.</param>
        /// <returns>The matching posts; empty if the index does not exist or the search fails.</returns>
        /// <exception cref="System.ArgumentNullException">searchTerms</exception>
        public static List<PostData> Find(Dictionary<string, string> searchTerms)
        {
            if (searchTerms == null)
                throw new ArgumentNullException("searchTerms");
    
            var result = new List<PostData>();
            var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
    
            try
            {
                // check the same directory that is searched below
                if (!IndexReader.IndexExists(_directory))
                    return result;
    
                var mainQuery = new BooleanQuery();
                foreach (var pair in searchTerms)
                {
                    var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, pair.Key, analyzer);
                    mainQuery.Add(parser.Parse(pair.Value), Occur.MUST);
                }
    
                // read-only: this searcher never modifies the index
                using (var searcher = new IndexSearcher(_directory, true))
                {
                    var hits = searcher.Search(mainQuery, null, 1000, Sort.RELEVANCE).ScoreDocs;
                    result = _mapLuceneToDataList(hits, searcher).ToList();
                }
            }
            catch (Exception e)
            {
                LoggerHelper.ErrorLog(e);
            }
            finally
            {
                analyzer.Close();
            }
    
            return result;
        }
    
        #endregion public methods
    
        #region private methods
    
        /// <summary>
        /// Runs a query against one field, or against the Id and Summary fields when no field is given.
        /// </summary>
        /// <param name="searchQuery">The query text (Lucene syntax).</param>
        /// <param name="searchField">Field to search; empty searches Id and Summary.</param>
        /// <returns>Up to 1000 matching posts, ordered by relevance.</returns>
        private static IEnumerable<PostData> _search(string searchQuery, string searchField = "")
        {
            // validation: nothing to search for if the query consists only of wildcards
            if (string.IsNullOrEmpty(searchQuery.Replace("*", "").Replace("?", ""))) return new List<PostData>();
    
            const int hitsLimit = 1000;
            var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
            try
            {
                // set up a read-only lucene searcher
                using (var searcher = new IndexSearcher(_directory, true))
                {
                    // search a single field, or several fields at once
                    QueryParser parser = string.IsNullOrEmpty(searchField)
                        ? new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_30, new[] { "Id", "Summary" }, analyzer)
                        : new QueryParser(Lucene.Net.Util.Version.LUCENE_30, searchField, analyzer);
    
                    var query = parseQuery(searchQuery, parser);
                    var hits = searcher.Search(query, hitsLimit).ScoreDocs;
    
                    // the mapping materializes a list before the searcher is disposed
                    return _mapLuceneToDataList(hits, searcher);
                }
            }
            finally
            {
                analyzer.Close();
            }
        }
    
        /// <summary>
        /// Parses the query; on a syntax error, retries with the special characters escaped.
        /// </summary>
        /// <param name="searchQuery">The search query.</param>
        /// <param name="parser">The parser.</param>
        /// <returns>The parsed query.</returns>
        private static Query parseQuery(string searchQuery, QueryParser parser)
        {
            Query query;
            try
            {
                query = parser.Parse(searchQuery.Trim());
            }
            catch (ParseException e)
            {
                query = parser.Parse(QueryParser.Escape(searchQuery.Trim()));
    
                LoggerHelper.ErrorLog(e);
            }
    
            return query;
        }
    
        /// <summary>
        /// Adds one post to the index as a new document.
        /// </summary>
        /// <param name="sampleData">The post to index.</param>
        /// <param name="writer">An open index writer.</param>
        private static void _addToLuceneIndex(PostData sampleData, IndexWriter writer)
        {
            // This method only adds; removing a previous entry with the same Id is left to the caller.
            var doc = new Document();
    
            // add lucene fields mapped to db fields
            // (Id is NOT_ANALYZED so it can be matched exactly with a Term)
            doc.Add(new Field("Id", sampleData.Id.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            // Field rejects a null value, so index an empty string instead
            doc.Add(new Field("Summary", sampleData.Summary ?? string.Empty, Field.Store.YES, Field.Index.ANALYZED));
    
            // add entry to index
            writer.AddDocument(doc);
        }
    
        /// <summary>
        /// Maps a stored Lucene document back to a PostData.
        /// </summary>
        /// <param name="doc">The document.</param>
        /// <returns>The mapped post.</returns>
        private static PostData _mapLuceneDocumentToData(Document doc)
        {
            return new PostData
            {
                Id = Convert.ToInt32(doc.Get("Id")),
                Summary = doc.Get("Summary")
            };
        }
    
        /// <summary>
        /// Maps a sequence of Lucene documents to a list of posts.
        /// </summary>
        /// <param name="hits">The documents.</param>
        /// <returns>The mapped posts.</returns>
        private static IEnumerable<PostData> _mapLuceneToDataList(IEnumerable<Document> hits)
        {
            return hits.Select(_mapLuceneDocumentToData).ToList();
        }
    
        /// <summary>
        /// Maps search hits to a list of posts, loading each stored document from the searcher.
        /// </summary>
        /// <param name="hits">The hits.</param>
        /// <param name="searcher">The searcher that produced the hits.</param>
        /// <returns>The mapped posts.</returns>
        private static IEnumerable<PostData> _mapLuceneToDataList(IEnumerable<ScoreDoc> hits, IndexSearcher searcher)
        {
            return hits.Select(hit => _mapLuceneDocumentToData(searcher.Doc(hit.Doc))).ToList();
        }
    
        #endregion private methods
    }
    
  • Alex Skrypnyk 6131 posts 23950 karma points MVP 7x admin c-trib
    Apr 16, 2014 @ 11:07

    We add all our data to the index on application start.

    Thanks, Alex

  • Niclas Schumacher 67 posts 87 karma points
    Apr 16, 2014 @ 11:49

    Thanks for the code, Alex. I'll give it a look.
    Was it an Umbraco project?
    How come you didn't use Examine for indexing, or have I misunderstood what Examine does for indexing? Off the top of my head, I imagined that I should use Lucene for indexing and afterwards use Examine for searching the index.

  • Alex Skrypnyk 6131 posts 23950 karma points MVP 7x admin c-trib
    Apr 16, 2014 @ 12:23

    It was an MVC project without Umbraco. Examine is just a tool for using Lucene, so you can use it too. The question is what you prefer :)

  • Alex Skrypnyk 6131 posts 23950 karma points MVP 7x admin c-trib
    Apr 16, 2014 @ 12:23

    We used Examine for an Umbraco project, and it was not bad :) but a little bit hard the first time.

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Apr 16, 2014 @ 12:30

    Niclas,

    To create your own index using Examine, see https://github.com/Shandem/Examine/tree/master/Projects. There is a sample project that indexes a database table; you can use that, or use the same method and provide the data to index however you like. You can then use the Examine multi-searcher.

    One thing to note: when you build a custom index with Examine you use the SimpleDataIndexer, which indexes all items at once; it does not let you add, update, or remove one item at a time. For example, if you are indexing a database table, every index run indexes the whole table, so your index is a snapshot of the table. Anything added to the table will not end up in your index until you rebuild. So if the data in the table doesn't change often, a scheduled task that rebuilds the index may be enough.

    Regards

    Ismail
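    The scheduled rebuild suggested above can be sketched with a System.Threading.Timer. This is a minimal sketch, assuming the full rebuild is passed in as a delegate (with Examine it would wrap whatever full-rebuild call your indexer offers); the class name ScheduledIndexRebuild is hypothetical. An in-process timer stops when the app pool recycles, so an external scheduled task may be more reliable.

```csharp
using System;
using System.Threading;

// Hypothetical helper: runs a full index rebuild at a fixed interval.
// The rebuild delegate is an assumption, e.g. a call into your indexer.
public sealed class ScheduledIndexRebuild : IDisposable
{
    private readonly Action _rebuild;
    private readonly Timer _timer;
    private int _running; // 1 while a rebuild is in progress, otherwise 0

    public ScheduledIndexRebuild(Action rebuild, TimeSpan interval)
    {
        if (rebuild == null) throw new ArgumentNullException("rebuild");
        _rebuild = rebuild;
        // First run after one interval, then repeat every interval
        _timer = new Timer(_ => Tick(), null, interval, interval);
    }

    private void Tick()
    {
        // Skip this tick if the previous rebuild is still running
        if (Interlocked.CompareExchange(ref _running, 1, 0) != 0) return;
        try
        {
            _rebuild();
        }
        catch (Exception)
        {
            // Log in real code. An unhandled exception on a timer thread
            // would terminate the process, so it must not escape here.
        }
        finally
        {
            Interlocked.Exchange(ref _running, 0);
        }
    }

    public void Dispose()
    {
        _timer.Dispose();
    }
}
```

    Keep the instance referenced (for example in a static field) for the lifetime of the application; a Timer that is garbage-collected stops firing.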

  • Markus Johansson 1902 posts 5706 karma points MVP c-trib
    Nov 30, 2016 @ 21:48

    Hi! I know this is an old thread, but if I need to update just one item in the index, e.g. when something has changed, is there an alternative to the SimpleDataIndexer approach?

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Dec 01, 2016 @ 08:12

    Markus,

    If the index is a custom one, you can do it as described here: https://our.umbraco.org/forum/extending-umbraco-and-using-the-api/78745-custom-examinelucene-index-with-ability-to-insert-update-individual-entries

    However, you will need some way to call that method when something changes. In Umbraco, the reindex or add-to-index call happens after the publish event. Here, depending on what data you are indexing, you will need to call the single-item update method after each insert, update, or delete. So if the data is in a SQL Server table, maybe a trigger that calls into a .NET assembly (messy, but doable).

    Regards

    Ismail
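    As an in-application alternative to the SQL Server trigger mentioned above, the data layer itself can raise an event after each insert, update, or delete, and a handler then calls the single-item index update. This is a minimal sketch, assuming all writes go through one repository; User, UserRepository and the in-memory dictionary standing in for the database table are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical user record from the external users table
public sealed class User
{
    public int Id { get; set; }
    public string FirstName { get; set; }
}

// Hypothetical repository: every write path raises an event after the change,
// so index maintenance can hook in without a database trigger.
public sealed class UserRepository
{
    // In-memory stand-in for the real users table
    private readonly Dictionary<int, User> _table = new Dictionary<int, User>();

    public event Action<User> Saved;  // raised after an insert or update
    public event Action<int> Deleted; // raised after a delete, with the user id

    public void Save(User user)
    {
        if (user == null) throw new ArgumentNullException("user");
        _table[user.Id] = user; // real code: INSERT or UPDATE
        var handler = Saved;
        if (handler != null) handler(user);
    }

    public void Delete(int id)
    {
        // Only notify when a row was actually removed
        if (_table.Remove(id))
        {
            var handler = Deleted;
            if (handler != null) handler(id);
        }
    }
}
```

    At startup you would subscribe one handler to Saved that updates that user's index entry and one to Deleted that removes it. Writes that bypass the repository (direct SQL, other applications) are not caught, which is where a trigger would still be needed.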

  • Niclas Schumacher 67 posts 87 karma points
    Apr 16, 2014 @ 14:10

    Alex,

    Great to hear. And once again, thanks for the code; I'll go through it and study your approach!

    Ismail,

    I have seen that project, but what I hear you saying is that with Examine I can't add extra fields to the index, for instance which group the user is a member of, where the group is Umbraco content?

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Apr 16, 2014 @ 14:13

    Alex,

    I am confused. I thought you had all your members in a separate system, and I assumed the groups they belong to as well? Or do you have users in one place and the groups they belong to in Umbraco?

    Regards

    Ismail

  • Niclas Schumacher 67 posts 87 karma points
    Apr 16, 2014 @ 15:01

    Ismail,

    I understand the confusion.
    We have users in a table outside of Umbraco, but they relate to content items inside Umbraco.
    I want to be able to search for users who are, for instance, related to the same content items I am related to, so that I find users who are indirectly related to me.

    An example:
    In Umbraco we have some workgroups, where users can write and so on.
    Users are members of these groups.
    I want to be able to find users and weight them by how I, as a user, relate to other users.

    So as I see it, I need to create an index of the users and add the related content IDs to the respective fields (id: 1023, workgroupIds: 1034,1234,1567, unitIds: 1111,1543).
    Then I find users who are connected to the same workgroups and units as I am and, for instance, boost a user in the search results if he is.

    Does this make sense?
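    The weighting described above maps onto ordinary Lucene query-parser syntax: one optional clause per workgroup or unit the current user belongs to, each with a ^ boost. Users who share more groups match more clauses and so generally score higher. This is a minimal sketch, assuming the field names workgroupIds and unitIds from the example and that each ID is indexed as a separate token; RelatedUserQuery and the boost values are hypothetical.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper: builds a Lucene query-parser string that finds users
// related to the current user through shared workgroups or units.
public static class RelatedUserQuery
{
    public static string Build(IEnumerable<int> myWorkgroupIds, IEnumerable<int> myUnitIds,
                               float workgroupBoost = 2f, float unitBoost = 1f)
    {
        // One optional (SHOULD) clause per shared id; "^n" boosts that clause.
        var clauses = myWorkgroupIds.Distinct()
            .Select(id => Clause("workgroupIds", id, workgroupBoost))
            .Concat(myUnitIds.Distinct()
                .Select(id => Clause("unitIds", id, unitBoost)));

        // Empty when the user has no relations; QueryParser will not parse an
        // empty string, so callers should skip the search in that case.
        return string.Join(" ", clauses);
    }

    private static string Clause(string field, int id, float boost)
    {
        // Invariant culture so a boost of 1.5 is written "1.5", never "1,5"
        return string.Format(CultureInfo.InvariantCulture, "{0}:{1}^{2}", field, id, boost);
    }
}
```

    For the user in the example, Build(new[] { 1034, 1234, 1567 }, new[] { 1111, 1543 }) returns "workgroupIds:1034^2 workgroupIds:1234^2 workgroupIds:1567^2 unitIds:1111^1 unitIds:1543^1", which a Lucene QueryParser can parse and which can be combined with other clauses, such as a required name match.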

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Apr 16, 2014 @ 15:03

    Niclas,

    OK, so if you look at the code in that link, it loops over the database rows. Inside that loop you can get the given user's linked content and add it to the index as a field; then it's searchable.

    Regards

    Ismail
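    Inside that loop, the linked content can be flattened into ordinary string fields. This is a minimal sketch, assuming each user row already carries the Umbraco node IDs of its workgroups and units (UserRow and UserIndexFields are hypothetical). The IDs are joined with spaces rather than commas because the Lucene.Net 3.0 StandardAnalyzer keeps comma-separated digits such as 1034,1234 together as a single token, which would make individual IDs unsearchable.

```csharp
using System.Collections.Generic;

// Hypothetical row type: one user from the external users table plus the
// Umbraco node ids of the workgroups and units the user is linked to.
public sealed class UserRow
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public List<int> WorkgroupIds { get; set; } = new List<int>();
    public List<int> UnitIds { get; set; } = new List<int>();
}

public static class UserIndexFields
{
    // Builds the field-name -> value map for one user.
    public static Dictionary<string, string> Build(UserRow user)
    {
        return new Dictionary<string, string>
        {
            // Lucene's Field rejects null values, so missing names become ""
            { "firstName", user.FirstName ?? string.Empty },
            { "lastName", user.LastName ?? string.Empty },
            // Space-separated so StandardAnalyzer emits one token per id
            { "workgroupIds", string.Join(" ", user.WorkgroupIds) },
            { "unitIds", string.Join(" ", user.UnitIds) },
        };
    }
}
```

    With a custom Examine index, each of these field names also has to be listed in the index set's configuration, as discussed further down the thread.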

  • Niclas Schumacher 67 posts 87 karma points
    Apr 22, 2014 @ 09:35

    Ismail,

    Sorry for the late answer, I've been away over Easter :)

    I'm having major problems adding fields to the nodes. The nodes get created properly, but no data is assigned to them, and I can't figure out why.

     

    var list = new List<XElement>();
    foreach (var user in users)
    {
        var dic = new Dictionary<string, string>();
        // space-separated, so the StandardAnalyzer indexes each skill id as its own token
        dic.Add("skillId", string.Join(" ", user.Skills));
        dic.Add("unitId", user.UnitId.ToString());
        dic.Add("firstName", user.FirstName);
        dic.Add("lastName", user.LastName);
        list.Add(ExamineXmlExtensions.ToExamineXml(dic, user.Id, "user"));
    }

    this.AddNodesToIndex(list, "users");

    The index entries get created with the id, but nothing from the dictionary gets added.
    Has anyone had this problem?

    - Niclas Schumacher 

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Apr 22, 2014 @ 10:19

    Niclas,

    In the ExamineIndex.config file, do you have an entry with fields for that index? Here are the settings for one of my custom ones:

    <IndexSet SetName="directoryIndexSet"
              IndexPath="~/App_Data/ExamineIndexes/directory/">
      <IndexUserFields>
        <add Name="name" EnableSorting="true"/>
        <add Name="nodeName" EnableSorting="true"/>
        <add Name="nodeTypeAlias"/>
        <add Name="directoryType"/>
        <add Name="countryName"/>
        <add Name="countryId"/>
        <add Name="address1"/>
        <add Name="address2"/>
        <add Name="address3"/>
        <add Name="postCode"/>
        <add Name="emailAddress"/>
        <add Name="webSiteUrl"/>
        <add Name="clientNameForUrl"/>
        <add Name="region"/>
      </IndexUserFields>
    </IndexSet>

    If you don't, the fields will not end up in the index. With custom indexes you have to make sure every field is listed in the config file; it is not like the Umbraco indexes, where everything is added if you don't list any fields.

    Regards

    Ismail

  • Niclas Schumacher 67 posts 87 karma points
    Apr 25, 2014 @ 09:57

    Ismail, 
    That worked great!
    Now I'll try to create my own indexer, so I gain more control.
    Thanks! 
