  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Feb 01, 2016 @ 11:46
    Ismail Mayat
    0

    Examine corruption issues

    Hello,

    We are using Umbraco 7.3.4 and parts of the site use Examine searches extensively. Content updates to the site are occasional so publishes etc are not that frequent. On publish we do tap into examine events to inject fields into the index.

    Since updating to Umbraco 7.3.4 we have been getting issues with indexes corrupting.

    There is nothing in the Umbraco logs. We are also using elmah.io and there is nothing in elmah either. The site is sitting on Azure and we have looked at the event log as well, but cannot see anything there. We initially thought it might be something to do with the app pool restarting and the index rebuilding, so we updated the ExamineSettings config and set RebuildOnAppStart to false. This fixed the issue for about 2 weeks, however recently it happened again.

    We are having to rebuild indexes, and in some instances restart the app pool and then rebuild indexes, to get the search-powered functionality back up.

    Has anyone else seen this?

    Regards

    Ismail

  • Matt Brailsford 4123 posts 22194 karma points MVP 9x c-trib
    Feb 01, 2016 @ 11:52
    Matt Brailsford
    1

    Could it be this? http://issues.umbraco.org/issue/U4-6338

    We've had issues with indexes locking which then get out of sync / corrupted requiring app pool to be stopped to free up the lock and reindex. Applying the hot fix mentioned in that issue seems to have resolved it for us so far.

    Matt

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Feb 01, 2016 @ 11:55
    Ismail Mayat
    0

    Matt,

    Site is on azure so not sure if this applies?

    Regards

    Ismail

  • Dave Woestenborghs 3504 posts 12133 karma points MVP 8x admin c-trib
    Feb 01, 2016 @ 12:01
    Dave Woestenborghs
    2

    Hi Ismail,

    We had some issues with Examine as well on an Azure web app, and also noticed performance issues accessing Examine.

    After changing some configuration our problems were solved.

    Apply the settings on this page under Common load balancing setup : https://our.umbraco.org/documentation/Getting-Started/Setup/Server-Setup/load-balancing/

    Then apply the settings described here : https://our.umbraco.org/documentation/Getting-Started/Setup/Server-Setup/load-balancing/flexible

    Dave

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Feb 02, 2016 @ 08:28
    Ismail Mayat
    0

    Dave,

    Will take a look however we are not load balancing

    Regards

    Ismail

  • Dave Woestenborghs 3504 posts 12133 karma points MVP 8x admin c-trib
    Feb 02, 2016 @ 08:55
    Dave Woestenborghs
    0

    Hi Ismail,

    We only have one web app for the moment and these config changes made a drastic impact on performance.

    Dave

  • James Jackson-South 489 posts 1747 karma points c-trib
    Feb 02, 2016 @ 22:16
    James Jackson-South
    0

    Hi Ismail,

    Are your logs showing errors similar to the ones in this issue?

    http://issues.umbraco.org/issue/U4-7869

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Feb 03, 2016 @ 07:51
    Ismail Mayat
    1

    James,

    Not that I can see, will take another peek though. Also we will upgrade to 7.3.7 ASAP. Hopefully that may fix it. Jeavon's investigations look very promising, and his coding skills are a good reason for his back-to-back MVPs. The guy is a legend!!

    Cheers

    Ismail

  • James Jackson-South 489 posts 1747 karma points c-trib
    Feb 04, 2016 @ 08:10
    James Jackson-South
    1

    Isn't he just!

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Mar 11, 2016 @ 15:40
    Ismail Mayat
    0

    Guys,

    Still getting this issue at least once every 2 weeks. Anyone seen this before?

    I did see this in the umbracolog:

    ERROR Umbraco.Core.UmbracoApplicationBase - An unhandled exception occurred

    System.UnauthorizedAccessException: Access to the path 'D:\home\site\wwwroot\App_Data\TEMP\ExamineIndexes\RD000D3A203E7D\External\Index\segments.gen' is denied.
       at Lucene.Net.Store.SimpleFSDirectory.OpenInput(String name, Int32 bufferSize)
       at Lucene.Net.Index.SegmentInfos.FindSegmentsFile.Run(IndexCommit commit)
       at Lucene.Net.Index.SegmentInfos.Read(Directory directory)
       at Lucene.Net.Index.SegmentInfos.ReadCurrentVersion(Directory directory)
       at Lucene.Net.Index.DirectoryReader.IsCurrent()
       at Lucene.Net.Index.DirectoryReader.DoReopenNoWriter(Boolean openReadOnly, IndexCommit commit)
       at Examine.LuceneEngine.Providers.LuceneSearcher.ValidateSearcher(Boolean forceReopen) in x:\Projects\Examine\Examine\Projects\Examine\LuceneEngine\Providers\LuceneSearcher.cs:line 359
       at Examine.LuceneEngine.Providers.LuceneSearcher.GetSearcher() in x:\Projects\Examine\Examine\Projects\Examine\LuceneEngine\Providers\LuceneSearcher.cs:line 239
       at Examine.LuceneEngine.Providers.BaseLuceneSearcher.Search(ISearchCriteria searchParams, Int32 maxResults) in x:\Projects\Examine\Examine\Projects\Examine\LuceneEngine\Providers\BaseLuceneSearcher.cs:line 175
       at Application.BusinessLogic.Services.SearchService.SearchTalentByUrl(String type, String urlPart) in e:\TeamCity\buildAgent3\work\98d835a35a6fb5ea\Source\Application.BusinessLogic\Services\SearchService.cs:line 252
       at Application.BusinessLogic.Services.TalentDetailService.GetTalentByUrl(String type, String urlPart) in e:\TeamCity\buildAgent3\work\98d835a35a6fb5ea\Source\Application.BusinessLogic\Services\TalentDetailService.cs:line 109
       at Application.Web.ContentFinders.TalentContentFinder.TryFindContent(PublishedContentRequest request) in e:\TeamCity\buildAgent3\work\98d835a35a6fb5ea\Source\Application.Web\ContentFinders\TalentContentFinder.cs:line 33
       at System.Linq.Enumerable.Any[TSource](IEnumerable`1 source, Func`2 predicate)
       at Umbraco.Web.Routing.PublishedContentRequestEngine.FindPublishedContent()
       at Umbraco.Web.Routing.PublishedContentRequestEngine.FindPublishedContentAndTemplate()
       at Umbraco.Web.Routing.PublishedContentRequestEngine.PrepareRequest()
       at Umbraco.Web.UmbracoModule.ProcessRequest(HttpContextBase httpContext)
       at System.Web.HttpApplication.SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
       at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)

    Regards

    Ismail

  • Sebastiaan Janssen 5044 posts 15475 karma points MVP admin hq
    Mar 11, 2016 @ 16:25
    Sebastiaan Janssen
    0

    I gather from Twitter that you're using Azure? Have you tried setting the Indexers and Searchers to store data on the web worker? You can set it to LocalOnly or Sync. We've seen problems with the default configuration where performance is very poor because the indexes are stored on a remote fileshare. Apart from bad performance, it's possible that file locking issues occur: sometimes your Azure site gets moved to a different web worker (server), and file locks are not always released when that move happens, so your files will be in use for some time, leading to errors like the one above.

    For config options check out the explanations here: http://issues.umbraco.org/issue/U4-5993
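
    As a rough sketch of what that could look like in ExamineSettings.config, assuming the stock ExternalIndexer and ExternalSearcher provider names and a root element with RebuildOnAppStart="false" as used elsewhere in this thread (other providers are omitted here, and your own file may carry extra attributes):

    ```xml
    <Examine RebuildOnAppStart="false">
      <ExamineIndexProviders>
        <providers>
          <!-- Sync: copy the index from the fileshare to the web worker, then apply updates to both -->
          <add name="ExternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
               useTempStorage="Sync"/>
        </providers>
      </ExamineIndexProviders>
      <ExamineSearchProviders defaultProvider="ExternalSearcher">
        <providers>
          <!-- Set the same value on the searcher as on its indexer -->
          <add name="ExternalSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
               useTempStorage="Sync"/>
        </providers>
      </ExamineSearchProviders>
    </Examine>
    ```

    The same attribute would go on each Umbraco indexer/searcher pair you want stored on the web worker.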

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Mar 14, 2016 @ 08:27
    Ismail Mayat
    0

    Sebastiaan,

    We are using Azure. Currently in ExamineSettings.config we have

    <Examine RebuildOnAppStart="false">
    

    We did this because we thought the issue was that after deploys the index was getting rebuilt and barfing. So looking at http://issues.umbraco.org/issue/U4-5993 we have 2 options:

    1. Sync
    2. LocalOnly

    With RebuildOnAppStart set to false, if we use either of those 2 options, would the index get trashed after a deploy to bin, thus requiring a manual rebuild?

    Regards

    Ismail

  • Sebastiaan Janssen 5044 posts 15475 karma points MVP admin hq
    Mar 14, 2016 @ 09:16
    Sebastiaan Janssen
    0

    Okay, so you tried to work around what exactly? Do you have a huge index?

    In which case, I think Sync makes the most sense, as the only thing it does is copy the existing indexes from the fileshare to your site's temp folder on the web worker. Then it updates both local and fileshare when updates are needed.

    If you don't have a huge index that takes loads of time rebuilding then I would set RebuildOnAppStart to true again and see how it goes, and then use LocalOnly with that.
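
    For the small-index case, a minimal sketch of that combination might look like the following. It assumes the stock ExternalIndexer and ExternalSearcher provider names; other providers are left out for brevity:

    ```xml
    <!-- RebuildOnAppStart="true": build the index on startup when it does not exist yet -->
    <Examine RebuildOnAppStart="true">
      <ExamineIndexProviders>
        <providers>
          <!-- LocalOnly: the index lives only on the web worker, so it is rebuilt when the site moves to a new one -->
          <add name="ExternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
               useTempStorage="LocalOnly"/>
        </providers>
      </ExamineIndexProviders>
      <ExamineSearchProviders defaultProvider="ExternalSearcher">
        <providers>
          <add name="ExternalSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
               useTempStorage="LocalOnly"/>
        </providers>
      </ExamineSearchProviders>
    </Examine>
    ```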

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Mar 14, 2016 @ 10:00
    Ismail Mayat
    0

    Sebastiaan,

    It is a big index and takes a while to rebuild, so I will try it with Sync. We have staging set up on Azure, so I will try those config updates, then slam the site and see what happens.

    Cheers

    Ismail

  • Sebastiaan Janssen 5044 posts 15475 karma points MVP admin hq
    Mar 14, 2016 @ 11:03
    Sebastiaan Janssen
    0

    Great, try with Sync and RebuildOnAppStart=false then!

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Mar 14, 2016 @ 14:47
    Ismail Mayat
    0

    Sebastiaan,

    I have turned this on for our staging site. I have also hit it with some load and cannot get the indexes to fall over. However, what I don't understand is how this swapping to TempStorage will fix the issue.

    We are not load balancing; we are on a single Azure website. So does Azure under the hood do some voodoo NAS stuff with website folders?

    We just want to confirm this will work before trying it on live as we have a cranky client and do not want to promise any more false dawns.

    Regards

    Ismail

  • Sebastiaan Janssen 5044 posts 15475 karma points MVP admin hq
    Mar 14, 2016 @ 14:53
    Sebastiaan Janssen
    0

    The voodoo is that all of your site's files on Azure live on a fileshare, which means that they don't actually exist on the machine that your website is running from (so the IIS server points to a fileshare to get all the files).

    This means there's a significant lag in actually reading and writing files.

    Examine/Lucene.net obviously doesn't like this lag very much. Part of the reason that it's so fast is that it has really fast access to files on disk. If the disk is remote it doesn't have that great access.

    By setting up the TempStorage you force the files to actually move to the temp folder on the IIS server, so they no longer just live on the fileshare.

    Hope that makes sense!

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Mar 14, 2016 @ 15:11
    Ismail Mayat
    0

    Aha that makes sense. Cool. We will get that deployed and hopefully should fix this issue.

    Cheers

    Ismail

  • Anthony Dang 1404 posts 2558 karma points MVP 3x c-trib
    Mar 14, 2016 @ 15:25
    Anthony Dang
    0

    Hi Seb

    We looked at the code which uses the temp storage folder. It seems that if the temp storage doesn't exist, then it falls back to the standard one.

    This can be a problem as there will be times when the temp storage doesn't exist, AND the standard index is broken.

    So this "fix" would probably only mitigate the issue, not actually solve it. The real fix is to figure out why the index keeps getting broken.

  • Sebastiaan Janssen 5044 posts 15475 karma points MVP admin hq
    Mar 14, 2016 @ 15:36
    Sebastiaan Janssen
    0

    I agree. And good luck with that. ;-)

    I don't know what you mean "if it doesn't exist". If the indexes exist in the website (App_Data/TEMP/ExamineIndexes) and you set TempStorage to Sync then if the indexes do not exist in the ASP.NET temp folder when the site starts, the indexes will be copied from the fileshare to the web worker's ASP.NET temp folder and be kept in sync with the indexes on the fileshare. If it doesn't exist on the fileshare then.. yeah, you will need to manually build them as RebuildOnAppStart is set to false.

    I'd say try this setup out first. Then consider setting RebuildOnAppStart back to true. This shouldn't constantly rebuild indexes, ONLY when they don't exist yet on app start.

  • Anthony Dang 1404 posts 2558 karma points MVP 3x c-trib
    Mar 14, 2016 @ 15:53
    Anthony Dang
    0

    You're right. We had RebuildOnAppStart = false.

    You said: "if the indexes do not exist in the ASP.NET temp folder when the site starts, the indexes will be copied from the fileshare to the web worker's ASP.NET temp folder"

    So the index is copied? It's not re-generated at the temp storage location?

    The issue we had initially is that the index was corrupt. So what you're saying is the corrupt index will be copied?

  • Sebastiaan Janssen 5044 posts 15475 karma points MVP admin hq
    Mar 14, 2016 @ 16:04
    Sebastiaan Janssen
    0

    What you had before doesn't count any more when you turn Sync on. Yes, they will be copied on app start then when any updates happen they get applied to both the web worker's asp.net temp folder AND the remote fileshare. If your site gets moved to a new web worker (this happens once in a while) the indexes once again first get copied from the fileshare to the web worker's asp.net temp folder and then updates will be applied in both places again.

    If the index does manage to get corrupted on the fileshare then yes, it will be copied in that exact state. Obviously, you could set the setting to LocalOnly but it would mean a rebuild each time the site moves to a different server, which means startup time will be very long when that happens (as you've indicated you have a large amount of content to index).

    So yes, again, it would be great to figure out why indexes get corrupted. I am no help there I'm afraid.

  • Anthony Dang 1404 posts 2558 karma points MVP 3x c-trib
    Mar 14, 2016 @ 17:18
    Anthony Dang
    2

    "If the index does manage to get corrupted on the fileshare then yes, it will be copied in that exact state."

    This is why I drink :(

  • Shannon Deminick 1524 posts 5269 karma points MVP 2x
    Mar 15, 2016 @ 09:01
    Shannon Deminick
    0

    Firstly - it would seem that part of this problem is that you are using the old TempStorage (https://github.com/Shazwazza/UmbracoExamine.TempStorage) which is obsolete because the functionality is included in the Core. Furthermore, the functionality included in the core has many fixes and works much better. Secondly, the legacy TempStorage provider is discontinued and will no longer be developed (I'll make a note of this on GitHub).

    The only ways that the index can actually get corrupted - meaning that it is unreadable/unopenable, for example because there are missing Lucene files - are:

    • If someone is mucking around with those files directly - never do this, reading or writing
    • If somehow the IIS process is unexpectedly terminated without warning specifically during the exact moment that Lucene is attempting to write files

    Having "Sync" turned on in no way means that your primary index storage (i.e. non temp storage) is more likely to get corrupted.

    I plan on migrating this functionality into the Examine core at some stage whilst allowing more storage options for Azure but I have no time right now. The LocalOnly and Sync options are only available for Umbraco indexers/searchers, these options are not available for any custom indexers/searchers that you may have ... which is part of the reason this functionality needs to be migrated to Examine Core.

    In the meantime, please ensure you are using the Umbraco Core indexers/searchers and not the TempStorage provider.

    And as always, providing steps to reproduce goes a long way (once you are using the Umbraco Core indexers), this includes all information regarding how your environment is setup.

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Mar 15, 2016 @ 09:07
    Ismail Mayat
    0

    Shannon,

    I am not using the obsolete package but using what is in core. We do not mess with Lucene files directly in any way, so that rules that one out. The IIS process unexpectedly terminating could be a possibility.

    We are not using any custom indexers just External and internal indexers.

    I will keep an eye on it and if it goes again try and report back a bit more information.

    Regards

    Ismail

  • Lars-Erik Aabech 349 posts 1100 karma points MVP 7x c-trib
    Mar 15, 2016 @ 09:09
    Lars-Erik Aabech
    0

    Here's the config we have on the site that loses indexed content most often, which we then rebuild from a local server. (A couple of others do it too, but only about every six months.) I guess it's as it should be?

    <?xml version="1.0"?>
    <!-- 
    Umbraco examine is an extensible indexer and search engine.
    This configuration file can be extended to add your own search/index providers.
    Index sets can be defined in the ExamineIndex.config if you're using the standard provider model.
    
    More information and documentation can be found on CodePlex: http://umbracoexamine.codeplex.com
    -->
    <Examine RebuildOnAppStart="false">
      <ExamineIndexProviders>
        <providers>
          <add name="InternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
               supportUnpublished="true"
               supportProtected="true"
               runAsync="true"
               useTempStorage="Sync"
               analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"/>
    
          <add name="InternalMemberIndexer" type="UmbracoExamine.UmbracoMemberIndexer, UmbracoExamine"
               supportUnpublished="true"
               supportProtected="true"
               runAsync="true"
               useTempStorage="Sync"
               analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"/>
    
          <!-- default external indexer, which excludes protected and unpublished pages-->
          <add name="ExternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
               runAsync="true"
               useTempStorage="Sync"/>
    
        </providers>
      </ExamineIndexProviders>
    
      <ExamineSearchProviders defaultProvider="ExternalSearcher">
        <providers>
          <add name="InternalSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
               analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"
               useTempStorage="Sync"
               />
    
          <add name="ExternalSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
               useTempStorage="Sync"
               />
    
          <add name="InternalMemberSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
               analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"
               enableLeadingWildcard="true"
               useTempStorage="Sync"
               />
    
        </providers>
      </ExamineSearchProviders>
    
    </Examine>
    
  • Lars-Erik Aabech 349 posts 1100 karma points MVP 7x c-trib
    Mar 15, 2016 @ 09:11
    Lars-Erik Aabech
    0

    @ismail Do you have runAsync="true" btw?
    Strikes me that async combined with the IIS process terminating might be the real culprit.

  • Shannon Deminick 1524 posts 5269 karma points MVP 2x
    Mar 15, 2016 @ 09:28
    Shannon Deminick
    0

    Just FYI: runAsync="true" is totally unnecessary.

    I need to understand what the actual issue people are seeing here is. 'Corrupted' could mean many things. Ismail is the only one that has posted a stack trace. Is this the exact same issue everyone is seeing?

        System.UnauthorizedAccessException: Access to the path 'D:\home\site\wwwroot\App_Data\TEMP\ExamineIndexes\RD000D3A203E7D\External\Index\segments.gen' is denied.
           at Lucene.Net.Store.SimpleFSDirectory.OpenInput(String name, Int32 bufferSize)
           at Lucene.Net.Index.SegmentInfos.FindSegmentsFile.Run(IndexCommit commit)
           at Lucene.Net.Index.SegmentInfos.Read(Directory directory)
           at Lucene.Net.Index.SegmentInfos.ReadCurrentVersion(Directory directory)
           at Lucene.Net.Index.DirectoryReader.IsCurrent()
           at Lucene.Net.Index.DirectoryReader.DoReopenNoWriter(Boolean openReadOnly, IndexCommit commit)
           at Examine.LuceneEngine.Providers.LuceneSearcher.ValidateSearcher(Boolean forceReopen) in x:\Projects\Examine\Examine\Projects\Examine\LuceneEngine\Providers\LuceneSearcher.cs:line 359
           at Examine.LuceneEngine.Providers.LuceneSearcher.GetSearcher() in x:\Projects\Examine\Examine\Projects\Examine\LuceneEngine\Providers\LuceneSearcher.cs:line 239
           at Examine.LuceneEngine.Providers.BaseLuceneSearcher.Search(ISearchCriteria searchParams, Int32 maxResults) in x:\Projects\Examine\Examine\Projects\Examine\LuceneEngine\Providers\BaseLuceneSearcher.cs:line 175
           at Application.BusinessLogic.Services.SearchService.SearchTalentByUrl(String type, String urlPart) in e:\TeamCity\buildAgent3\work\98d835a35a6fb5ea\Source\Application.BusinessLogic\Services\SearchService.cs:line 252
           at Application.BusinessLogic.Services.TalentDetailService.GetTalentByUrl(String type, String urlPart) in


    From that stack trace it shows that temp storage is not used since it is trying to access the master index during searching.

    @lars what does loses indexed content mean?

    And when everyone says they are using 'Azure', I'm assuming you mean Azure Apps right? not Azure VMs?

    By now I'm sure you realize that Azure web apps store all files on a remote file share which can cause all sorts of issues. This temp storage technique is currently the only way to resolve issues caused by how their infrastructure is set up. I have discovered that we can persist local files in a different persistent temp storage area on Azure instead of the ASP.Net temp files (which are prone to being cleared when the /bin folder changes). The things I'd like to pursue sometime in the future are:

    • Move this temp storage logic into Examine Core
    • Add configuration options to store the temp data in the persistent temp storage area on Azure apps instead of ASP.Net temp storage
    • Port the AzureDirectory blob storage functionality into Examine Core so that it works against Lucene 2.9 so people could have this option as well
    • Test using the logic that exists in AzureDirectory to perform the Sync logic between the main index and the temp index instead of performing a backup/copy as it does now
    • Create an Azure Search Examine provider - Darren already started this IIRC

    I honestly have no idea when I'd find time to do any of this but it's on my ever-growing TODO list... if anyone wants to help then let's start the discussion on the Examine GitHub repo.

    Another thing to note is that Azure now supports running your sites 'locally'. I'm assuming they've enabled this feature specifically because people have so many problems running sites from remote file shares.

    https://channel9.msdn.com/Shows/Cloud+Cover/Episode-201-Azure-Web-App-Local-Cache-with-Cory-Fowler

    "Local Cache enables your Apps to copy their code to storage on the local VMs running their site. Normally, Apps are run from a network based disk. Changing to storage local to the VM greatly improve the performance of languages like PHP, Node.js, and any other platform that needs to frequently read the files running it."

  • Lars-Erik Aabech 349 posts 1100 karma points MVP 7x c-trib
    Mar 15, 2016 @ 09:46
    Lars-Erik Aabech
    0

    @Shannon I'm sure I've sent you a few logs via zendesk with the only exceptions I have. Think I remember seeing the same as Ismail.

    By "losing content" I mean that the index doesn't go corrupt. It goes empty and doesn't rebuild. Might end up with 0 docs, might end up with 5 docs. No apparent exceptions.

    Anyway, I appreciate that this is close to impossible for HQ to tackle, and cross my fingers our collective efforts will solve this sooner or later.

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Mar 15, 2016 @ 10:37
    Ismail Mayat
    0

    Shannon,

    We are running an Azure web app, so hopefully the temp storage should fix the problem for us. Lars, we are not doing runAsync=true, however we have set

    RebuildOnAppStart="false"
    

    Just like you.

    So for us, mostly we end up with 0 documents in both the internal and external indexes, although we once had it that external had 5 docs. We should have 20,000 plus in both.

    Anyway, fingers crossed the temp storage will resolve our issue. Will report back with any more issues.

    Regards

    Ismail

  • Shannon Deminick 1524 posts 5269 karma points MVP 2x
    Mar 15, 2016 @ 10:40
    Shannon Deminick
    0

    To re-iterate some facts:

    • runAsync=true is irrelevant... Examine is always async, never ever ever ever set this to false. Just remove this setting altogether and it will run async by default
    • RebuildOnAppStart - if this is false and you have no indexes then you have to manually build them. This setting does one thing: if your index doesn't exist during app startup, it will be created. If you set this to false and your index doesn't exist on app startup, then you will have no index at all and you will need to figure out how you get your index there.
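
    To illustrate both points, here is a sketch of an indexer entry based on the config posted earlier in this thread, with runAsync simply dropped. RebuildOnAppStart is written out as "true" here only to make the startup behaviour explicit; pick the value that suits your index size:

    ```xml
    <!-- RebuildOnAppStart="true": if the index does not exist at app startup, it is created -->
    <Examine RebuildOnAppStart="true">
      <ExamineIndexProviders>
        <providers>
          <!-- No runAsync attribute: indexing runs async by default -->
          <add name="InternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
               supportUnpublished="true"
               supportProtected="true"
               useTempStorage="Sync"
               analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"/>
        </providers>
      </ExamineIndexProviders>
    </Examine>
    ```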
  • Anthony Dang 1404 posts 2558 karma points MVP 3x c-trib
    Mar 15, 2016 @ 10:59
    Anthony Dang
    0

    @Shannon

    The crux of our issue is that the document count ends up being 0.

    So I guess we need to know what's causing that, and is there a possible fix/workaround?

  • Shannon Deminick 1524 posts 5269 karma points MVP 2x
    Mar 15, 2016 @ 11:09
    Shannon Deminick
    0

    This is before you've used "Sync" temp storage according to Ismail so let us know what the outcome of that is.

    If you end up with a non-corrupt (i.e. still readable) index that is simply empty, it's probably due to app restarts. Say for example your site starts and you have no index, Umbraco will start building it. Let's say that at that moment your site restarts - this could be due to any number of things, maybe you are deploying and the file copying is slow so your site is restarting multiple times because multiple /bin files are changing, or multiple config files are changing, etc... Then the index will probably be created but nothing put in it because the site has restarted at that moment.

    If your index is already there, there's no way for it to suddenly end up with 0 docs, the only way this happens is if the index is rebuilt. This can happen if:

    • You rebuild the index manually
    • The indexes don't already exist and you have RebuildOnAppStart=true (which is the default if this setting isn't there)
    • You are scaling out your application and/or your site that is part of a load balanced cluster comes online after being offline for a long time, in which case it needs to 'cold boot', which means all caches are rebuilt.

    A corrupted index (unreadable) is a different story - these are two different things.

  • Lars-Erik Aabech 349 posts 1100 karma points MVP 7x c-trib
    Mar 15, 2016 @ 11:16
    Lars-Erik Aabech
    0

    There must be a way for it to suddenly end up with 0 docs.

    We don't rebuild it manually.
    We have RebuildOnAppStart = false
    We only have one server, with fcnMode="single"

    The only way I see it can end up with 0 is the DeleteAll statement in EnsureIndex. Somehow, it must have been called with true for force.

  • Lars-Erik Aabech 349 posts 1100 karma points MVP 7x c-trib
    Mar 15, 2016 @ 11:20
  • Lars-Erik Aabech 349 posts 1100 karma points MVP 7x c-trib
    Mar 15, 2016 @ 11:10
    Lars-Erik Aabech
    0

    Right.

    We've got more or less the same numbers as Ismail. (0 or 5, but should have >20.000)

    Just throwing out ideas here after skimming LuceneIndexer.cs:

    • EnsureIndex does DeleteAll if forced. When is it forced?
    • There are try/catches around, but do they really catch all exceptions? Any unhandled exception will make everything go bonkers when async
    • There are commits. How about rollback on any exception?
    • Only commit when done, so nothing happens if IIS process is ripped out from under our feet. (?)
  • Shannon Deminick 1524 posts 5269 karma points MVP 2x
    Mar 15, 2016 @ 12:33
    Shannon Deminick
    0

    Answers to your questions:

    EnsureIndex does DeleteAll if forced. When is it forced?

    Only here, this is how index rebuild works: https://github.com/Shazwazza/Examine/blob/master/Projects/Examine/LuceneEngine/Providers/LuceneIndexer.cs#L772

    That said, this logic is unnecessary when rebuilding an existing index since Lucene can re-create an existing index using the same logic found in CreateNewIndex ( https://github.com/Shazwazza/Examine/blob/master/Projects/Examine/LuceneEngine/Providers/LuceneIndexer.cs#L732 ) without causing problems with current searches. See: https://github.com/Shazwazza/Examine/issues/37

    There are try/catches around, but do they really catch all exceptions? Any unhandled exception will make everything go bonkers when async

    Yes they do during indexing, I've not seen any unhandled exceptions. Any errors during indexing will be reported and there's an event.

    There are commits. How about rollback on any exception?

    We don't commit if there are errors: https://github.com/Shazwazza/Examine/blob/master/Projects/Examine/LuceneEngine/Providers/LuceneIndexer.cs#L1545 It could be feasible to perform a rollback (which closes the writer) and then re-open the Writer. I don't think this is going to affect the issue you're seeing but could be worthwhile. If an error occurred though, you'd see it in your logs.

    Only commit when done, so nothing happens if IIS process is ripped out from under our feet. (?)

    Yes, this is what happens: https://github.com/Shazwazza/Examine/blob/master/Projects/Examine/LuceneEngine/Providers/LuceneIndexer.cs#L1545

    Regarding app restarts, see this code here when an indexer is disposed: https://github.com/Shazwazza/Examine/blob/master/Projects/Examine/LuceneEngine/Providers/LuceneIndexer.cs#L1856

    When the webapp is shut down it notifies every IRegisteredObject, one of which is the ExamineManager: https://github.com/Shazwazza/Examine/blob/master/Projects/Examine/ExamineManager.cs#L310 which in turn disposes all indexers. If an index rebuild is currently happening and the app is shut off, we cannot just lock the app from shutting down and wait until re-indexing is done since this would cause all sorts of problems. Instead the only option is to lock for a small amount of time (5 seconds) if there are still items in the queue and release when either the items in the queue are processed or the timeout is finished. This is the only way I can see you ending up with 0 (or a few) items in your index - a rebuild was issued for one reason or another and the app domain shut down.

  • Lars-Erik Aabech 349 posts 1100 karma points MVP 7x c-trib
    Mar 15, 2016 @ 12:39
    Lars-Erik Aabech
    1

    Thanks for the details.
    When I get back to the project in question, I'll create a derived indexer and add all sorts of logging with System.IO.File from all events to see if I can get some more details on when/how it happens.

  • Shannon Deminick 1524 posts 5269 karma points MVP 2x
    Mar 15, 2016 @ 13:20
    Shannon Deminick
    0

    It does log quite a lot if you turn on Debug level logging in log4net

  • Lars-Erik Aabech 349 posts 1100 karma points MVP 7x c-trib
    Mar 15, 2016 @ 13:25
    Lars-Erik Aabech
    0

    I know, but I don't want debug logging in production. I get more control if I just hack it. :)

    (Oh well, I might as well get to know how to get only examine in debug mode. Lazy towards log4net config.)
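
    For what it's worth, log4net lets you raise the level for one logger hierarchy without touching the root level. A hedged sketch for config/log4net.config follows; the logger names "Examine" and "UmbracoExamine" are assumptions (log4net matches loggers by dotted-name prefix, and nothing in this thread confirms which names the indexing messages are logged under), so check the logger column in your existing log file first:

    ```xml
    <log4net>
      <!-- existing <root> and <appender> elements stay as they are -->

      <!-- Assumed logger names: adjust to whatever appears in your log output -->
      <logger name="Examine">
        <level value="DEBUG"/>
      </logger>
      <logger name="UmbracoExamine">
        <level value="DEBUG"/>
      </logger>
    </log4net>
    ```

    If an appender has its own threshold set above DEBUG, these messages would still be filtered out there.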

  • James Strugnell 84 posts 192 karma points
    Mar 16, 2016 @ 11:55
    James Strugnell
    0

    Just wanted to reference my issue here, which sounds similar:

    https://our.umbraco.org/forum/umbraco-7/using-umbraco-7/75289-azure-website-not-initializing-examine-index-after-scaling

    So my issue occurs after scaling or when Azure moves the host machine from under you, which is what might be happening to you guys when your issue only happens every couple of weeks. I've not resolved our issue yet as it fell down the priority list.

  • Shannon Deminick 1524 posts 5269 karma points MVP 2x
    Mar 17, 2016 @ 09:03
    Shannon Deminick
    2

    Hi all, just thought I'd post this since it hasn't been documented yet (and I sort of forgot about it): http://issues.umbraco.org/issue/U4-7614

    This is relevant to Azure only and allows for using a different persistent location for locally stored indexes instead of the volatile storage space that is ASP.Net temp files.

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Mar 17, 2016 @ 09:49
    Ismail Mayat
    0

    Shannon,

    We are using 7.3.8. Is the tempStorageDirectory property present in that version? The issue tracker says it's due in 7.3.5.

    Also, with Sync on, the current location of our index is

    D:\local\Temporary ASP.NET Files\root\cb0cbdf4\7beae6bd\App_Data\TEMP\ExamineIndexes\RD000D3A207D42\Internal\

    It's in App_Data, not App_Code, so it should not get cleared out after updates to bin?

    Regards

    Ismail

  • Shannon Deminick 1524 posts 5269 karma points MVP 2x
    Mar 17, 2016 @ 09:56
    Shannon Deminick
    0

    If it says it is due in 7.3.5 then it is... you can always try it or look into the code.

    If you are not using the option I just mentioned, it IS stored in ASP.Net temp files, just like your path says: D:\local\Temporary ASP.NET Files\

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Mar 17, 2016 @ 12:30
    Ismail Mayat
    0

    Shannon,

    I did a rebuild yesterday and the indexes were still there, so I will just leave it like this for now. Keeping an eye on it; so far so good.

    Regards

    Ismail

  • Shannon Deminick 1524 posts 5269 karma points MVP 2x
    Mar 17, 2016 @ 13:07
    Shannon Deminick
    0

    I'll warn you that if you are not using this setting: http://issues.umbraco.org/issue/U4-7614 then at some stage your local indexes will be cleared out. This is fine if you are using Sync because they will just be re-copied from your master directory, but if you are using LocalOnly then they will need to be rebuilt, and if you have RebuildOnAppStart=false then that will be a manual job.

    My suggestion if you are using Azure, use the setting mentioned in http://issues.umbraco.org/issue/U4-7614 so that your locally stored indexes are not blown away if you change global.asax, /bin folder (amongst a few others) which will clear your ASP.Net temp files.

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Mar 29, 2016 @ 08:32
    Ismail Mayat
    0

    Shannon,

    The indexes died again. I will try tempStorageDirectory attribute see if that helps.

    Regards

    Ismail

  • Shannon Deminick 1524 posts 5269 karma points MVP 2x
    Mar 29, 2016 @ 09:02
    Shannon Deminick
    0

    Hi Ismail,

    Can you please explain what "indexes died" means?

    Do you mean it's just empty, or is it actually corrupt? Are there logs this time? I'm assuming this is now that you are using "Sync" or "LocalOnly"?

    I really need everyone to be very specific when reporting issues. I don't know if your problem is the same as others' and I cannot read minds.

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Mar 29, 2016 @ 10:00
    Ismail Mayat
    0

    Shannon,

    Apologies for the vague message. It had 0 documents. I am using Sync, but I do not have tempStorageDirectory="UmbracoExamine.LocalStorage.AzureLocalStorageDirectory, UmbracoExamine"

    I am updating to that now hopefully that should fix up everything.

    Regards

    Ismail

  • Shannon Deminick 1524 posts 5269 karma points MVP 2x
    Mar 29, 2016 @ 10:18
    Shannon Deminick
    2

    Thanks Ismail, that certainly seems very odd and at this point I can only chalk this up to something strange going on with Azure and its file system. So the index is empty in the normal AppData/TEMP storage? If that is the case I don't think that AzureLocalStorageDirectory will help, because when using Sync it still needs to ensure that the master index stored in AppData/TEMP is written correctly, and since it somehow gets rewritten as empty that is not good.

    Please keep me updated on this though, I may be wrong and this could 'solve' it. In any case it's better to use this than the ASP.Net temp file location.

    I might have just thought of why indexes get zero'd out. The simplest answer is that Azure has transferred your site to a new web worker with a totally different machine name. If you have your Examine configuration as the default that we ship with, with the {machinename} token in your paths, this would of course mean the machine name is now totally different from before and the index no longer exists at that location. Can you confirm this is how you have your index paths configured? If so, then I will have to assume this is probably the cause and something I didn't really consider before!

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Mar 29, 2016 @ 10:36
    Ismail Mayat
    0

    Shannon,

    That is how they are currently configured. Is there anything we can do so that they are not configured this way?

    Regards

    Ismail

  • Dave Woestenborghs 3504 posts 12133 karma points MVP 8x admin c-trib
    Mar 29, 2016 @ 10:41
    Dave Woestenborghs
    0

    We are also having the issues Ismail reported.

    We have this in our config:

    useTempStorage="Sync" tempStorageDirectory="UmbracoExamine.LocalStorage.AzureLocalStorageDirectory, UmbracoExamine"
    

    And have the machine name in the path. We mostly face issues when azure transfers to a new Web worker which seems to happen quite often.

    Dave

  • James Strugnell 84 posts 192 karma points
    Mar 29, 2016 @ 10:33
    James Strugnell
    2

    Hi Shannon,

    I can confirm we have this issue every time Azure moves us to a new host machine (I mentioned this above).

    I keep an eye out for whether we've been moved (in the umbracoServer DB table) and if we have, I have to manually kick off the indexers. The same goes if we manually scale out.

    We are using the LocalOnly setting (as opposed to Sync) so the AppData\Temp path only contains empty folders at the moment.

    James.

  • Lars-Erik Aabech 349 posts 1100 karma points MVP 7x c-trib
    Mar 29, 2016 @ 10:42
    Lars-Erik Aabech
    0

    This is spot on!

    I can confirm that our "index watcher" has rebuilt indexes at the exact same times as the registeredDate columns in umbracoServer. :)

    Great work, James!

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    May 10, 2016 @ 10:04
    Ismail Mayat
    0

    James,

    When you say keep an eye out for it, how do you mean? Do you poll that table or you have a trigger on it that then does the re index?

    Regards

    Ismail

  • Shannon Deminick 1524 posts 5269 karma points MVP 2x
    Mar 29, 2016 @ 11:05
    Shannon Deminick
    2

    Ok, this sounds like the issue!! So YES you can fix this, just remove the {machinename}/ token from your paths.

    This is there primarily for load balancers - so people don't have to manually configure this when they wish to load balance. I did not take into account this scenario when working in virtualized cloud appliances where sites are moved from one machine to another.

    I'll update the defaults to not include this and update docs for load balancers who do actually require this setting.
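
    As a rough before/after sketch of that change in ExamineIndex.config: the path layout with the token is an assumption based on the index locations quoted earlier in this thread, and only one of the two lines would exist in a real file.

    ```xml
    <!-- Before: path includes the {machinename} token (assumed default layout),
         so a new worker with a new machine name looks in a new, empty folder -->
    <IndexSet SetName="ExternalIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/{machinename}/External/" />

    <!-- After: token removed, so the path stays the same when Azure moves the site -->
    <IndexSet SetName="ExternalIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/External/" />
    ```

    The same edit would apply to every index set in the file, not just the external one.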

  • Dave Woestenborghs 3504 posts 12133 karma points MVP 8x admin c-trib
    Mar 29, 2016 @ 11:07
    Dave Woestenborghs
    0

    I already updated my environments. Will keep you posted if we keep seeing issues.

    Dave

  • Lars-Erik Aabech 349 posts 1100 karma points MVP 7x c-trib
    Mar 29, 2016 @ 11:07
    Lars-Erik Aabech
    0

    But shouldn't rebuildOnAppStart trigger and do full rebuild when the site is moved? Seems like it doesn't.

  • James Strugnell 84 posts 192 karma points
    Mar 29, 2016 @ 11:15
    James Strugnell
    0

    Surely Azure should be considered a "Load balanced" environment? What is the fix if you run more than one instance?

  • Shannon Deminick 1524 posts 5269 karma points MVP 2x
    Mar 29, 2016 @ 13:46
    Shannon Deminick
    0

    @James - No Azure websites is not load balanced. There is no more than a single instance of a web worker accessing the db and file system at any given time. This is not load balancing. If you do run more than one instance - this can take several shapes:

    • Scaling out on Azure websites
      • In this case - you would still require the {machinename} token so that your indexes are stored per machine
    • Non Azure websites (i.e. VMs) and/or doing load balancing according to our docs
      • Whether the {machinename} token is required depends on how your file system is set up: shared vs replicated. If it is replicated, the token is not required

    @Lars - yes rebuildOnAppStart="true" (which is the default if not specified) should trigger a full rebuild when no index exists. So I agree there is some other issue at play here. When you do scale out, it does rebuild the indexes on the newly created web workers; I've never seen this not happen in all of my tests. I would have to assume something quite specific happens when a site is moved to another worker/machine, most likely due to something Azure does with its file systems and how it controls ASP.Net app domain restarts. Perhaps Examine attempts to rebuild the index, which starts out by creating an empty index, and then, based on some timing or whatever Azure is doing in the background, the app domain is terminated again, leaving an empty index. On the next restart Examine will not rebuild because there's a valid (although empty) index there.

    It would be interesting to add Debug level logging to your Examine loggers so you can see the full log output of Examine during these times. You can do this without changing your global log4net logging level by targeting a specific logger, for example see: https://github.com/umbraco/Umbraco-CMS/blob/dev-v7/src/Umbraco.Web.UI/config/log4net.config under the <!--Here you can change the way logging works for certain namespaces --> text. You could add

    <logger name="UmbracoExamine.DataServices">
        <level value="DEBUG" />
    </logger>
    

    There's certainly a way that we can work around this behavior I just need to determine if it's an Examine change needed or if it should be in UmbracoExamine, but we could:

    • On app start when no indexes are detected and are scheduled to be rebuilt, we create marker files for each index
    • When indexes are rebuilt, these marker files are removed
    • If the app domain is terminated before the rebuild occurs, the marker files will remain
    • During app startup we also check for these marker files, if they exist but a valid index is there, we still rebuild
  • Lars-Erik Aabech 349 posts 1100 karma points MVP 7x c-trib
    Mar 29, 2016 @ 13:55
    Lars-Erik Aabech
    0

    Well, then again I've still got rebuildOnAppStart="false", so that's a faceslap. We'll go back to true. :/
    Soz. (Quite preoccupied these days, this is less than tertiary in brain process)

  • James Strugnell 84 posts 192 karma points
    Mar 29, 2016 @ 14:11
    James Strugnell
    0

    @Shannon - Do you mean to say Azure websites is not load balanced unless you scale out? Since an Azure website can be scaled at any time (if enabled) I would always want to configure an Azure website to work in a load-balanced manner, even if it runs as one instance 90% of the time.

    I haven't specified rebuildOnAppStart so should be using the default of "true". I posted my trace logs on the following forum post previously:

    https://our.umbraco.org/forum/umbraco-7/using-umbraco-7/75289-azure-website-not-initializing-examine-index-after-scaling

    Hope that helps.

  • Shannon Deminick 1524 posts 5269 karma points MVP 2x
    Mar 30, 2016 @ 07:48
    Shannon Deminick
    1

    @James - when you are not scaling out on Azure websites you are definitely not load balancing. If you are scaling out then you are ... BUT, you need to do this in a particular way with a master and slave setup (see docs, NOTE: we do not support having a single Azure website instance that scales). In this case (and based on this problem above), you would need to:

    • Set your Examine config on your master (non-scaled) environment to not have the {machinename} token
    • Set your Examine config on your slave (scalable) environment to have the {machinename} token
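
    A minimal sketch of how the two environments might differ, assuming each one deploys its own copy of ExamineIndex.config and that the index set name and folder layout match the ones quoted elsewhere in this thread:

    ```xml
    <!-- Master (non-scaled) environment: no token, path is stable across machine moves -->
    <IndexSet SetName="ExternalIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/External/" />

    <!-- Slave (scalable) environment, separate copy of the file:
         token kept so each scaled-out worker writes to its own folder -->
    <IndexSet SetName="ExternalIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/{machinename}/External/" />
    ```

    How the two variants get deployed (config transforms, separate files, etc.) is left open here.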
  • Shannon Deminick 1524 posts 5269 karma points MVP 2x
    Mar 30, 2016 @ 07:56
    Shannon Deminick
    1

    I realize though that if you have a scalable solution set up (a master + slave environment) and you only have a single slave (not scaled), then this potential problem remains, since when your slave is moved between machines it will need to rebuild its indexes because the machine name will have changed... this rebuild should happen automatically. Though there is a chance something like this could happen when sites are transferred to a different machine: https://our.umbraco.org/forum/developers/extending-umbraco/74731-examine-corruption-issues#comment-243531

    I would have to see if I can actually somehow replicate such an issue to see if writing a marker file would actually solve such a problem since we don't actually know if this is a problem or not.

    In the future I would like to backport the AzureDirectory project for Lucene to work with Lucene 2.9 and release under a different fork. Then we can utilize this for more effective scaling with regards to Lucene.

  • James Strugnell 84 posts 192 karma points
    Mar 30, 2016 @ 08:17
    James Strugnell
    0

    Not an easy one. I should re-state that we didn't have this problem until we upgraded from 7.3.1 to 7.3.7, although I've never been sure if the upgraded binaries caused it, or a mistake in my upgrade process.

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Apr 01, 2016 @ 12:13
    Jeroen Breuer
    0

    Hello,

    Is there a status update on this issue? We've got an Azure website where scaling is enabled. For now we removed the {machinename} token and everything seems to be working.

    Jeroen

  • Shannon Deminick 1524 posts 5269 karma points MVP 2x
    Apr 01, 2016 @ 13:59
    Shannon Deminick
    0

    Did you read above? https://our.umbraco.org/forum/developers/extending-umbraco/74731-examine-corruption-issues#comment-243584 You can't just remove the token on your scaling worker instance, you won't be able to scale... unless you have LocalOnly turned on

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Apr 01, 2016 @ 14:07
    Jeroen Breuer
    1

    This is our current setup for the external indexer which we use a lot:

    ExamineIndex.config

    <IndexSet SetName="ExternalIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/External/" />
    

    ExamineSettings.config

    <add name="ExternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine" useTempStorage="Sync"
               tempStorageDirectory="UmbracoExamine.LocalStorage.AzureLocalStorageDirectory, UmbracoExamine"/>
    

    If we add the {machinename} token we run into issues where we could lose all index files.

    Jeroen

  • Dave Woestenborghs 3504 posts 12133 karma points MVP 8x admin c-trib
    Apr 01, 2016 @ 14:14
    Dave Woestenborghs
    1

    Hi Jeroen, Shannon,

    Also good to mention is that all editing is done on a separate environment (VM on Azure). There is no content creation on the web app except for creating members.

    The issue we ran into is that when Azure changes the machine name, the indexes need to be rebuilt. If you have the machine name in your index path this will always happen. Without it, it doesn't.

    And we have some huge indexes. So building them from scratch took a long time that caused other issues for us.

    Dave

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Apr 07, 2016 @ 08:52
    Jeroen Breuer
    0

    Hello,

    Dave explained the environment setup a bit better ;-). So far with this setup it's still best to remove the {machinename} token, but please correct us if we are wrong :-).

    @Shannon you said: "I realize though that if you have a scale-able solution setup - you have a master + slave environment - and you only have a single slave (not-scaled), then this potential problem remains since when your slave is moved between machines it will need to rebuild it's indexes since the machine name will have changed... this rebuild should happen automatically."

    Like Dave said we have huge indexes and building from scratch took too long. So that is not really an option for us.

    Jeroen

  • Shannon Deminick 1524 posts 5269 karma points MVP 2x
    Apr 07, 2016 @ 09:28
    Shannon Deminick
    0

    @jeroen If you are scaling, then perhaps it's better to have 2x scaled out websites setup - and to reiterate - you MUST have {machinename} in the path when you are scaling out. In this setup, you would have 2 instances that are active, one of which should certainly have available indexes, if the other one gets moved to another server and needs to rebuild, Azure should consider that one non responsive and send more requests to the one that does have an index available. Then when you scale out more, yes each one would need to rebuild there as well.

    @all:

    Currently there is no perfect solution to this, you would need to write your own until we can have something in place which may require updates to Examine and/or UmbracoExamine. To solve your problem, here's what you can do:

    • Clone the https://github.com/azure-contrib/AzureDirectory and backport it to work with Lucene 2.9
    • Create sub classes of the examine indexers and searchers shipped with Umbraco and use an AzureDirectory instance instead of the default lucene Directory instance

    What AzureDirectory does is similar to what Umbraco is doing:

    • It stores the master index in Blob storage
    • Only a master server can write to it
    • For each slave server, the blob storage index files are synced to the local machine

    OR

    • There's potentially an 'easy' solution in which you just deal with the fact that the index will need rebuilding on new workers (or if workers are moved between hosts)
    • In this case we can terminate any request coming to the website when we detect that an index or the app isn't ready, returning a 503 with a Retry-After header. This should tell the Azure Load Balancer that the node isn't ready, and the Azure LB will try again based on its own timeout.

    OR

    • During startup you could have some logic to detect if there isn't an index on the slave server
    • In that case you could create some sort of REST endpoint on your master server that your slaves could talk to and request a snapshot of the index which they can download and store locally

    OR

    • Attempt to create an Azure Search or Elastic Search Examine provider and host indexes in a centralized place

    OR

    • If it's just the Member index that is the main problem regarding rebuilding - only the back office member search uses the member indexer, I don't believe there is anything else in the core that specifically uses the Member indexer
    • You could disable the member indexer and index members in your own custom way with Azure Search, Elastic Search, etc... and use those APIs to search your member data

    Due to the nature of Azure, the way it structures its file systems, the way it virtualizes things and moves sites between workers, there isn't a perfect solution that we can simply ship from an Umbraco core perspective that will solve everyone's particular problems. Not everyone uses Azure, and there would probably be other/different issues with other virtualized hosts that can scale out. Again, a specific solution would probably need to be created for that in one way or another.

    I would certainly enjoy some help with any of this since there are quite a lot of options, work, etc...

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    May 10, 2016 @ 09:47
    Ismail Mayat
    0

    Guys,

    We took the machine name out of the config and we turned on tempStorage. It all seemed to be working fine, however now we get regular:

     System.IO.DirectoryNotFoundException: Could not find a part of the path 'D:\local\Temporary ASP.NET Files\root\a38b9d33\a911d035\App_Data\TEMP\ExamineIndexes\Internal\segments_7'.
    

    at Lucene.Net.Index.SegmentInfos.FindSegmentsFile.Run(IndexCommit commit)
    at Lucene.Net.Index.DirectoryReader.Open(Directory directory, IndexDeletionPolicy deletionPolicy, IndexCommit commit, Boolean readOnly, Int32 termInfosIndexDivisor)
    at UmbracoExamine.UmbracoExamineSearcher.OpenNewReader()
    at Examine.LuceneEngine.Providers.LuceneSearcher.ValidateSearcher(Boolean forceReopen) in x:\Projects\Examine\Examine\Projects\Examine\LuceneEngine\Providers\LuceneSearcher.cs:line 307
    --- End of inner exception stack trace ---

    errors. For now I have turned off temp storage and will see how it behaves, so there is no machine name in the config and auto rebuild is false. My question, I guess for Shannon: will this still barf when Azure does its voodoo? If it does, am I going to have to do daily 2am index rebuilds?

    Regards

    Ismail

  • Shannon Deminick 1524 posts 5269 karma points MVP 2x
    May 10, 2016 @ 09:56
    Shannon Deminick
    3

    See above where I mention http://issues.umbraco.org/issue/U4-7614, available in 7.3.5+ specifically for Azure

    Storing index files in ASP.Net temp storage is slightly dangerous because that storage is volatile (i.e. if your /bin or global.asax changes it will be cleared out). Also, because Azure does some voodoo when moving your site between web workers, this becomes a pain. With U4-7614, the local Azure storage is less volatile ... though when Azure moves your site between web workers your index will still need to rebuild, unless of course you have 'Sync' turned on (which is the whole point of Sync).

    Let's re-iterate some facts:

    • Lucene + Azure web apps (Remote file storage) == Awful for performance
    • Lucene files need to be stored locally for good performance == useTempStorage
    • The better option for local Lucene storage with Azure is with U4-7614
    • Having rebuild on startup turned off + LocalOnly (Sync turned off) + useTempStorage == you will not have indexes
    • Having {machinename} in your path on Azure is annoying because Azure moves your site between workers/machines so the path will change. BUT if you are auto-scaling on Azure then you have to have {machinename} there or else everything will explode. But this means that when a site comes online and the index location doesn't exist, it needs to be rebuilt

    In my opinion, if you are using Azure web apps and are NOT auto-scaling, you should use these settings:

    • useTempStorage="Sync"
    • use this feature to store local index files: http://issues.umbraco.org/issue/U4-7614
    • Remove the {machinename} token from your index path
    • RebuildOnAppStart="true" - since this should only happen one time
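
    Put together, those settings might look roughly like the fragment below. This is a sketch only: it assumes RebuildOnAppStart sits on the root Examine element of ExamineSettings.config and that the provider names match the defaults; the attribute values are the ones quoted elsewhere in this thread.

    ```xml
    <!-- ExamineSettings.config (fragment, Azure web app that is NOT auto-scaling) -->
    <Examine RebuildOnAppStart="true">
      <ExamineIndexProviders>
        <providers>
          <!-- Sync keeps a master copy and a fast local copy of the index -->
          <add name="ExternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
               useTempStorage="Sync"
               tempStorageDirectory="UmbracoExamine.LocalStorage.AzureLocalStorageDirectory, UmbracoExamine"/>
        </providers>
      </ExamineIndexProviders>
      <!-- Searchers need the same two attributes; the index paths in
           ExamineIndex.config should not contain the {machinename} token -->
    </Examine>
    ```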

    If you are using Azure web apps and are load balancing w/ auto-scaling your front-end workers then:

  • Jamie Gilbert 8 posts 39 karma points
    May 11, 2016 @ 13:47
    Jamie Gilbert
    0

    Hi,

    Using Azure web apps and balancing with auto scaling my front end workers, I have the recommended configuration as per your post, however I have a problem-

    When the number of workers scales out, the machine name changes and the 'RebuildOnAppStart=true' setting doesn't seem to be working: the indexes aren't rebuilt and are broken due to the machine name change.

    Regards, Jamie

  • Shannon Deminick 1524 posts 5269 karma points MVP 2x
    May 11, 2016 @ 20:54
    Shannon Deminick
    0

    As always, it's difficult to help without knowing version information, etc... There's been reports of 'cold boot' not working effectively on 7.3.x, please ensure you are using the latest version and see if that works. When a server comes online for the first time it 'cold boots', you can set the log4net level to Debug and see if you get log entries for index items being created. If you don't it's probably an issue with the umbraco version you are using.

  • Jamie Gilbert 8 posts 39 karma points
    May 13, 2016 @ 14:50
    Jamie Gilbert
    0

    Hi,

    Using version 7.38 at the moment - limited to this due to Archetype support for now. Is there a known issue with it?

    Cheers, Jamie

  • Chris Foster 1 post 71 karma points
    Jun 05, 2017 @ 13:04
    Chris Foster
    0

    If you are using Azure web apps and are load balancing w/ auto-scaling your front-end workers then:

    • You must have the {machinename} token from your index path

    Can you confirm that this should only be set on the Slave? I followed the instructions from the docs and set it on both master and slave. We have experienced many problems related to indexing and multiple restarts on warm up, often resulting in outages of over 20 minutes. I was about to disable the scaling options until I stumbled across this post.

    Also very much looking forward to having an easy to configure AzureDirectory.

    Many thanks

    Chris

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    May 10, 2016 @ 10:15
    Ismail Mayat
    0

    Shannon,

    I had:

    <add name="InternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
         supportUnpublished="true"
         supportProtected="true"
         useTempStorage="Sync"
         tempStorageDirectory="UmbracoExamine.LocalStorage.AzureLocalStorageDirectory, UmbracoExamine"
         analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"/>
    

    So useTempStorage=sync and tempstorage directory was for local azure storage. Using Umbraco 7.3.8 so this should not be using folder like

    D:\local\Temporary ASP.NET Files\root\a38b9d33\a911d035\App_Data\TEMP\ExamineIndexes\Internal
    

    I am confused?

    Regards

    Ismail

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    May 10, 2016 @ 10:40
    Ismail Mayat
    0

    Shannon,

    One thing in the issue tracker you mention:

    NOTE: Just like the useTempStorage option, this would need to be specified for the searcher as well.

    So in my config for the searcher bit i should have:

    useTempStorage="Sync"
           tempStorageDirectory="UmbracoExamine.LocalStorage.AzureLocalStorageDirectory, UmbracoExamine"
    

    as well?

    So my whole examinesettings.config should look like:

      <add name="InternalMemberIndexer" type="UmbracoExamine.UmbracoMemberIndexer, UmbracoExamine"
           supportUnpublished="true"
           supportProtected="true"
             useTempStorage="Sync"
           tempStorageDirectory="UmbracoExamine.LocalStorage.AzureLocalStorageDirectory, UmbracoExamine"
           analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"/>
    
        <!-- default external indexer, which excludes protected and unpublished pages-->
        <add name="ExternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine" 
             useTempStorage="Sync"
             tempStorageDirectory="UmbracoExamine.LocalStorage.AzureLocalStorageDirectory, UmbracoExamine"/>
    </providers>
    

      <add name="ExternalSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"  useTempStorage="Sync"
           tempStorageDirectory="UmbracoExamine.LocalStorage.AzureLocalStorageDirectory, UmbracoExamine"/>
    
      <add name="InternalMemberSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"  useTempStorage="Sync"
           tempStorageDirectory="UmbracoExamine.LocalStorage.AzureLocalStorageDirectory, UmbracoExamine"
           analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" enableLeadingWildcard="true" />
    
    </providers>
    

    Regards

    Ismail

  • Tim 1193 posts 2675 karma points MVP 3x c-trib
    May 10, 2016 @ 14:03
    Tim
    0

    Just to chime in here, we're having identical issues to Ismail. Even though everything's set to use Sync, the Internal Indexer occasionally craps out looking for the segments file in the ASP.Net temp folder, rather than the user temp folder.

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    May 10, 2016 @ 14:40
    Ismail Mayat
    0

    Tim,

    I have updated my config so I am now rebuilding indexes on restart, and have pushed that up to my Azure staging. I am hoping that after the web worker switch, which I am assuming is like a restart, it will find the empty index and rebuild it?

    Regards

    Ismail

  • Shannon Deminick 1524 posts 5269 karma points MVP 2x
    May 10, 2016 @ 15:24
    Shannon Deminick
    0

    Tim, I feel like all of this information is just being lost, or people are not finding it and/or simply not reading it. Please see: https://our.umbraco.org/forum/extending-umbraco-and-using-the-api/74731-examine-corruption-issues#comment-246649

  • Tim 1193 posts 2675 karma points MVP 3x c-trib
    May 10, 2016 @ 16:01
    Tim
    0

    Hi Shannon,

    We had all of that set, except the rebuild on startup, have re-published and will keep an eye on it and see if the issue happens again.

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jul 22, 2016 @ 06:39
    Jeroen Breuer
    0

    If you upgrade to Examine 0.1.69-beta it's easier to use AzureDirectory like Shannon described here: https://our.umbraco.org/forum/extending-umbraco-and-using-the-api/74731-examine-corruption-issues#comment-244293

    I've got it working and my Examine indexes are now stored in blob storage. More info about my setup in this topic: https://our.umbraco.org/forum/extending-umbraco-and-using-the-api/78818-using-azuredirectory-with-examine

    Jeroen

  • Tim 1193 posts 2675 karma points MVP 3x c-trib
    Jul 22, 2016 @ 08:30
    Tim
    0

    How's the performance of the blob storage provider?

  • Jeroen Breuer 4908 posts 12265 karma points MVP 4x admin c-trib
    Jul 22, 2016 @ 08:42