  • James Jackson-South 489 posts 1747 karma points c-trib
    Nov 30, 2015 @ 23:45

    Synchronising Examine indexes.

    Hi all,

    I have the following web app setup on Azure:

    • Master
    • Slave (can be scaled)

    We are using custom merged fields within the external Examine index to search against. It's pretty standard stuff triggered on the GatheringNodeData event.

    BaseIndexProvider baseIndexProvider = ExamineManager.Instance.IndexProviderCollection[SearchConstants.IndexerName];
    
    if (baseIndexProvider != null)
    {
        baseIndexProvider.GatheringNodeData += (sender, e) => this.GatheringNodeData(sender, e, helper);
    }
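    The handler body isn't shown above, so for anyone unfamiliar with merged fields: the usual idea is to concatenate several field values into one extra field that a single query can target. Below is a minimal, framework-free sketch of that step, assuming the indexer hands you the node's fields as a string dictionary (in Umbraco 7 / Examine this should be `e.Fields` on the GatheringNodeData event args, but verify against your version). The class name and the combined field name `searchContent` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MergedFields
{
    // Hypothetical combined field name; use whatever your searcher queries against.
    public const string CombinedFieldName = "searchContent";

    // Concatenates the values of the named source fields (skipping missing or blank ones)
    // into one space-separated value stored under CombinedFieldName.
    // Assumption: in a GatheringNodeData handler you would pass the event's field dictionary.
    public static void AddCombinedField(IDictionary<string, string> fields, IEnumerable<string> sourceFieldNames)
    {
        if (fields == null) throw new ArgumentNullException(nameof(fields));
        if (sourceFieldNames == null) throw new ArgumentNullException(nameof(sourceFieldNames));

        var parts = sourceFieldNames
            .Where(name => fields.ContainsKey(name) && !string.IsNullOrWhiteSpace(fields[name]))
            .Select(name => fields[name].Trim())
            .ToList();

        // Indexer assignment instead of Add, so gathering the same node twice doesn't throw on a duplicate key.
        fields[CombinedFieldName] = string.Join(" ", parts);
    }
}
```

    Inside the handler this would be something like `MergedFields.AddCombinedField(e.Fields, new[] { "nodeName", "bodyText" })`, again treating `e.Fields` as an assumption to check.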
    

    This all works... to a point.

    Since there are multiple web apps, there are multiple Examine indexes. Master always stays correctly synchronised, since we use the back office on that instance to publish content. Slave, however, goes out of sync, resulting in the following messages being logged:

     2015-11-28 11:10:34,167 [P8776/D2/T94] WARN  Umbraco.Web.PublishedCache.XmlPublishedCache.PublishedMediaCache - Could not retrieve media 1449 from Examine index, reverting to looking up media via legacy library.GetMedia method
     2015-11-28 11:10:34,292 [P8776/D2/T94] WARN  Umbraco.Web.PublishedCache.XmlPublishedCache.PublishedMediaCache - Could not retrieve media 2065 from Examine index, reverting to looking up media via legacy library.GetMedia method
     2015-11-28 11:10:34,292 [P8776/D2/T94] WARN  Umbraco.Web.PublishedCache.XmlPublishedCache.PublishedMediaCache - Could not retrieve media 2067 from Examine index, reverting to looking up media via legacy library.GetMedia method
     2015-11-28 11:10:34,308 [P8776/D2/T94] WARN  Umbraco.Web.PublishedCache.XmlPublishedCache.PublishedMediaCache - Could not retrieve media 2069 from Examine index, reverting to looking up media via legacy library.GetMedia method
    

    This is happening a lot...

    What I would like to know is the correct way to synchronise the two indexes whenever Master changes the content or media tree.

    I know about the PageCacheRefresher.CacheUpdated event. Is that the best place to do this? We are already invalidating our donut cache via an event handler, and I've thrown in a little Examine code to test.

        private void PageCacheRefresherCacheUpdated(PageCacheRefresher sender, CacheRefresherEventArgs e)
        {
            MessageType kind = e.MessageType;
    
            if (kind == MessageType.RefreshById || kind == MessageType.RemoveById)
            {
                // Attempt to remove cache by document type alias and template alias.
                int? id = e.MessageObject as int?;
    
                if (id.HasValue)
                {
                    IContentService contentService = ApplicationContext.Current.Services.ContentService;
                    IContent entity = contentService.GetById(id.Value);
    
                    this.ClearOutputCacheItem(this, entity);
    
                    if (kind == MessageType.RemoveById)
                    {
                        // TODO: Test the crap out of this
                        ExamineManager.Instance.IndexProviderCollection[SearchConstants.IndexerName].DeleteFromIndex(
                            id.ToString());
                    }
                    else
                    {
                        // TODO: How do I reindex one item?
                        ExamineManager.Instance.IndexProviderCollection[SearchConstants.IndexerName].RebuildIndex();
                    }
                }
    
            }
            else if (kind == MessageType.RefreshAll)
            {
                // Remove all caches.
                this.OutputCacheManager.RemoveItems();
                ExamineManager.Instance.IndexProviderCollection[SearchConstants.IndexerName].RebuildIndex();
            }
        }
    

    You'll see from my comments that I'm a little unsure whether I'm doing the correct thing, and I'm really not happy having to rebuild the entire index when I have the id of the node in question. I couldn't figure out a way to clear one item with the objects available in this event.
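    The policy being asked for here can be separated from the Umbraco plumbing: a message carrying a single id should only ever touch that one document, and only RefreshAll should trigger a full rebuild. A small sketch of that decision, with a locally defined stand-in enum (Umbraco's own is `MessageType`) so it compiles without Umbraco references; `CacheMessageKind`, `IndexAction` and `IndexSyncPolicy` are hypothetical names.

```csharp
using System;

// Stand-in for the message kinds used in the handler above; defined locally so the sketch is self-contained.
public enum CacheMessageKind { RefreshAll, RefreshById, RemoveById }

public enum IndexAction { None, ReindexNode, DeleteNode, RebuildIndex }

public static class IndexSyncPolicy
{
    // Picks the cheapest index operation for a cache-refresher message.
    // Single-id messages map to single-document operations; only RefreshAll rebuilds.
    public static IndexAction Decide(CacheMessageKind kind, int? id)
    {
        switch (kind)
        {
            case CacheMessageKind.RefreshById:
                return id.HasValue ? IndexAction.ReindexNode : IndexAction.None;
            case CacheMessageKind.RemoveById:
                return id.HasValue ? IndexAction.DeleteNode : IndexAction.None;
            case CacheMessageKind.RefreshAll:
                return IndexAction.RebuildIndex;
            default:
                return IndexAction.None;
        }
    }
}
```

    In the handler, DeleteNode corresponds to the existing `DeleteFromIndex(id.ToString())` call and ReindexNode to a single-node `ReIndexNode(...)` call like the one in Matt's reply, leaving `RebuildIndex()` for the RefreshAll branch only.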

    Am I on the right track? If so, could you post a code example showing the best way to perform the synchronisation?

    Cheers!

  • James Jackson-South 489 posts 1747 karma points c-trib
    Dec 02, 2015 @ 01:07

    OK, so it seems I might be able to reindex a single node by converting the IContent to XML using the PackagingService. However, I cannot find any documentation on that service, nor any clarity on what the type parameter is.

    The code below is untested. Is this the correct way to do it?

    ExamineManager.Instance
                  .IndexProviderCollection[SearchConstants.IndexerName]
                  .ReIndexNode(entity.ToXml(packagingService), entity.ContentType.Alias);
    

    I pass the packaging service in as follows:

    IPackagingService packagingService = applicationContext.Services.PackagingService;
    PageCacheRefresher.CacheUpdated += (sender, e) => this.PageCacheRefresherCacheUpdated(sender, e, packagingService);
    
  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Dec 02, 2015 @ 12:42

    James,

    Publishing on master should cause publishing on the slaves and trigger all related events, including indexing, so in theory you should not need to do this manually. Unless it does not work?

    With regard to reindexing, Matt had to reindex a node recently and got the node XML from the cache, thereby saving a database hit. He is digging out some code.

  • Matt Brailsford 4123 posts 22194 karma points MVP 9x c-trib
    Dec 02, 2015 @ 12:47

    I think my scenario was slightly different, as I have a parent/child document setup where, when a child node indexes itself, it grabs content from the parent node to index along with it. So what follows is how I force the child nodes to reindex when the parent node gets updated in the cache. Not sure how much it will help, but it might:

    CacheRefresherBase<PageCacheRefresher>.CacheUpdated += (sender, args) =>
        {
            IPublishedContent publishedContent = null;
    
            if (args.MessageType == MessageType.RefreshById)
            {
                publishedContent = UmbracoContext.Current.ContentCache.GetById((int)args.MessageObject);
            }
            else if (args.MessageType == MessageType.RefreshByInstance)
            {
                publishedContent = UmbracoContext.Current.ContentCache.GetById(((IContent)args.MessageObject).Id);
            }
    
            if (publishedContent != null && publishedContent.DocumentTypeAlias == Constants.DocTypeAliases.MyDocTypeAlias)
            {
                // Reindex child nodes
                foreach (var child in publishedContent.Children)
                {
                    var xmlStr = umbraco.library.GetXmlNodeById(child.Id.ToString()).Current.OuterXml;
                    var xml = XElement.Parse(xmlStr);
    
                    ExamineManager.Instance.IndexProviderCollection["ExternalIndexer"]
                        .ReIndexNode(xml, IndexTypes.Content);
                }
            }
        };
    
  • Veronica Burd 76 posts 201 karma points
    Dec 02, 2015 @ 13:30

    Thanks Matt.

    This is really helpful as I have the same scenario as you and was stuck as to how to get the child nodes to refresh in the index.

    Ver

  • Dave Woestenborghs 3504 posts 12133 karma points MVP 8x admin c-trib
    Dec 02, 2015 @ 16:13

    Hi James,

    We are experiencing similar issues. I think you are also using the new flexible load balancing introduced in 7.3.

    How this works is that when you make a change to a content or media item, a cache instruction is written to the database (table umbracoCacheInstruction).

    On the first request to your slave app, it will look in the file App_Data/TEMP/DistCache/{machineName}/*-lastsynced.txt. This file contains the id of the last cache instruction that was run on your slave server. It will then load all cache instructions from the database with a later id than the one in the file.

    So if you change something on master, these changes aren't executed immediately on the slave instances; a slave has to receive a request first.

    As Examine indexes are updated asynchronously, it's possible that a request to your page is executed before the index is updated, giving you out-of-date content.

    And as I see you are also using output caching, your page will be cached and will not reflect the changes until the next time the cache is refreshed.

    We started noticing this when editors began changing the focal point on images. The change was immediately visible on the master server but not on the slaves. To make it visible on the slave servers, we need another content change in the back office (to clear the output cache).
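    To make the mechanism concrete, here is a simplified in-memory model of it. This is not Umbraco's actual implementation: instructions get increasing ids, each slave remembers the last id it processed (the role of the `*-lastsynced.txt` file), and on its next request it applies everything newer. All class and member names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One row of the (modelled) umbracoCacheInstruction table.
public sealed class CacheInstruction
{
    public int Id { get; }
    public string Payload { get; }
    public CacheInstruction(int id, string payload) { Id = id; Payload = payload; }
}

// Shared store written to by the master; ids only ever increase.
public sealed class InstructionTable
{
    private readonly List<CacheInstruction> _rows = new List<CacheInstruction>();
    private int _nextId = 1;

    public int Write(string payload)
    {
        var row = new CacheInstruction(_nextId++, payload);
        _rows.Add(row);
        return row.Id;
    }

    // All instructions with an id greater than the given one, oldest first.
    public IReadOnlyList<CacheInstruction> After(int lastSyncedId) =>
        _rows.Where(r => r.Id > lastSyncedId).OrderBy(r => r.Id).ToList();
}

// A slave: nothing is applied until a request arrives, then it catches up in id order.
public sealed class SlaveServer
{
    private readonly InstructionTable _table;
    public int LastSyncedId { get; private set; }   // what *-lastsynced.txt would hold
    public List<string> Applied { get; } = new List<string>();

    public SlaveServer(InstructionTable table, int lastSyncedId = 0)
    {
        _table = table;
        LastSyncedId = lastSyncedId;
    }

    // Modelled start-of-request hook: apply every instruction newer than LastSyncedId.
    public void OnRequest()
    {
        foreach (var instruction in _table.After(LastSyncedId))
        {
            Applied.Add(instruction.Payload);   // real code would refresh caches/indexes here
            LastSyncedId = instruction.Id;
        }
    }
}
```

    The window between the master's `Write` and the slave's next `OnRequest` is the delay described above: until a slave receives a request, its caches (and anything refreshed from them, such as the Examine index) still reflect the old content.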

    Still looking for a way on how to solve this.

    Dave
