  • Mayfly Media 12 posts 177 karma points
    Aug 20, 2015 @ 09:58

    Large media index takes ages to run

    We have an Umbraco 6.2.5 site within which we are storing approximately 200,000 media items in the Umbraco media library. The images we are storing are all between about 500KB and 1MB in size.

    We use the CogUmbracoExamineMediaIndexer to index our media. When media items are uploaded in bulk (20 at a time), the media indexer kicks off, but the site's memory usage gradually increases and the CPU maxes out. The process appears to run for many hours before dying down.

    The client often performs multiple bulk uploads in a day, which results in the indexer running almost continuously.

    A few questions: any ideas why the indexer could run for so long for only 20 images? Is the high memory and CPU usage a known issue with the indexer?

  • Shannon Deminick 1524 posts 5269 karma points MVP 2x
    Aug 20, 2015 @ 10:05

    Firstly, the performance problem is the data lookups. The actual indexing process is CPU intensive, but it is fast.

    I don't know what CogUmbracoExamineMediaIndexer does, but I would suspect the problem lies there. Perhaps for each item it is also running some queries or other operations (e.g. if it's analyzing each image, that would be really terrible for performance), and you'll end up with an N+1 problem. My advice would be to start looking there to see what is happening.
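    To illustrate the N+1 pattern mentioned above, here is a minimal sketch assuming a hypothetical in-memory store that counts round trips. `FakeMediaStore`, `index_n_plus_one` and `index_batched` are illustrative names only; they are not part of Umbraco, Examine or the cog indexer.

```python
from dataclasses import dataclass


@dataclass
class FakeMediaStore:
    """Hypothetical in-memory stand-in for the media database; counts round trips."""
    rows: dict
    queries: int = 0

    def list_ids(self):
        self.queries += 1
        return list(self.rows)

    def get_one(self, media_id):
        self.queries += 1
        return self.rows[media_id]

    def get_many(self, media_ids):
        self.queries += 1
        return [self.rows[i] for i in media_ids]


def index_n_plus_one(store):
    # One query for the id list, then one extra query per item: N + 1 round trips
    return [store.get_one(i) for i in store.list_ids()]


def index_batched(store):
    # One query for the id list, then a single batched lookup: 2 round trips
    return store.get_many(store.list_ids())


if __name__ == "__main__":
    slow = FakeMediaStore({i: f"image-{i}.jpg" for i in range(20)})
    index_n_plus_one(slow)
    print(slow.queries)  # 21

    fast = FakeMediaStore({i: f"image-{i}.jpg" for i in range(20)})
    index_batched(fast)
    print(fast.queries)  # 2
```

    With 20 items the per-item version makes 21 round trips versus 2 for the batched one; if each per-item step also opens and analyzes a file, that cost multiplies in the same way.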

    Also, what version of Umbraco are you using? This can greatly change the performance, since older versions of Umbraco don't look up data very efficiently.

  • Shannon Deminick 1524 posts 5269 karma points MVP 2x
    Aug 20, 2015 @ 10:06

    Also, I hope you are not rebuilding this index? Adding to the index 20 items at a time should be fine. Rebuilding it would be quite costly, but it should still work unless CogUmbracoExamineMediaIndexer is doing something it shouldn't under the hood.
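    The cost difference between adding to an index and rebuilding it can be sketched with a hypothetical toy index that counts how many files it has to process. `ToyIndex` and its methods are assumptions for illustration only, not the Examine API; `_extract` merely stands in for the expensive per-file work.

```python
class ToyIndex:
    """Hypothetical toy index (not the Examine API); counts per-file extractions."""

    def __init__(self):
        self.docs = {}
        self.extractions = 0

    def _extract(self, item):
        # Stands in for the expensive per-file work (e.g. opening and parsing a file)
        self.extractions += 1
        return item.lower()

    def add(self, items):
        # Incremental update: only the items passed in are processed
        for item in items:
            self.docs[item] = self._extract(item)

    def rebuild(self, all_items):
        # Full rebuild: discard the index and re-process every item
        self.docs.clear()
        self.add(all_items)


if __name__ == "__main__":
    existing = [f"IMG_{i}.JPG" for i in range(200_000)]
    upload = [f"NEW_{i}.JPG" for i in range(20)]

    index = ToyIndex()
    index.add(existing)

    before = index.extractions
    index.add(upload)
    print(index.extractions - before)  # 20

    before = index.extractions
    index.rebuild(existing + upload)
    print(index.extractions - before)  # 200020
```

    With roughly 200,000 media items, an incremental add of one 20-item upload processes 20 files, while a rebuild processes every file again.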

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Aug 20, 2015 @ 10:07

    The cogmediaindexer uses Tika, which is Java, with an IKVM wrapper around it, so that will add up in terms of performance when you have lots of media.

    Regards

    Ismail

  • Shannon Deminick 1524 posts 5269 karma points MVP 2x
    Aug 20, 2015 @ 12:23

    Yikes! I'd suggest that is probably most of the issue here. You'll be spinning up a Java VM for this, which has got to be pretty processor heavy, and then I assume Tika will open all of your files to read them, which will probably consume a lot of memory and CPU.

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Aug 20, 2015 @ 15:49

    A few other people have had performance issues when indexing on the order of thousands of items; I only tested with 10 to 20 documents.

    Are you looking to build some kind of front-end searchable image library? Are you using cogmediaindexer to get EXIF data out of images? If not, do you need cogmediaindexer at all?
