


  • Harriet Lawrie 7 posts 78 karma points
    Feb 07, 2018 @ 12:51

    Multilingual Umbraco Website cannot be scraped?

    I have created a multilingual Umbraco website which has three domain names pointing to it, one for each language. The site has gone live and people are starting to share links to it on LinkedIn and other social media. The website contains metadata that should be picked up when these links are shared. However, when a link is shared on LinkedIn, the strap-line reads 'coming soon', which is what the holding page said months ago. This suggests the site isn't being re-scraped.
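    One way to check what metadata a link scraper would actually find is to parse the `<meta>` tags out of the HTML the server returns. The sketch below uses only Python's standard `html.parser`; the class and function names are hypothetical, and it assumes the share metadata is written as Open Graph tags (`<meta property="og:...">`) or named tags such as `description`.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect <meta property=...> and <meta name=...> tags from an HTML page."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing
        # <meta ... /> tags are routed here as well.
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("property") or attributes.get("name")
        content = attributes.get("content")
        if key and content is not None:
            # If a key appears more than once, keep the first occurrence.
            self.tags.setdefault(key, content)


def extract_meta(html: str) -> dict:
    """Return a {key: content} mapping of the meta tags found in html."""
    collector = MetaTagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags
```

    Feeding it the HTML served by each of the three domains (saved from a browser's "view source", for instance) shows whether `og:title` and `og:description` contain the live text or still the holding-page text.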

    I tried the Facebook link debugging tool and it returned a runtime error with a 500 response code.

    My co-worker insists that there is nothing wrong with the DNS and that there are no errors in the website's code, so does anyone have any ideas why the website cannot be scraped?

    There is also another issue, which may be related: one of the domains sometimes doesn't redirect to its www. version, despite having a redirect set up on the DNS.
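    DNS records on their own only map names to addresses; the redirect from a bare domain to its www. version has to be issued as an HTTP response by some web server (a registrar's forwarding service, or a rewrite rule on the site's own server). A minimal sketch for seeing that first response, without following it, uses the standard `http.client` module; the function name `first_hop` is hypothetical.

```python
import http.client
from urllib.parse import urlsplit


def first_hop(url: str, timeout: float = 10.0):
    """Return (status, Location header) for url without following redirects."""
    parts = urlsplit(url)
    conn_class = (
        http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    )
    conn = conn_class(parts.hostname, parts.port, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        # http.client never follows redirects, so this is the raw first response.
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

    A working redirect should return a 301 or 302 with a `Location` pointing at the www. address. Because the problem is intermittent, running the check several times, and from different networks, may show whether some requests reach a server that doesn't issue the redirect.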

    Is there some specific Umbraco configuration that I may have missed? Or a bug within Umbraco that may cause this?

    Aside from this issue, the website is working fine; it's just that these scrapers seem unable to reach the website successfully.

    This was also posted on Stack Overflow: https://stackoverflow.com/questions/48664191/multilingual-umbraco-website-cannot-be-scraped

  • Dan Diplo 1554 posts 6205 karma points MVP 5x c-trib
    Feb 07, 2018 @ 13:05

    I very much doubt this has anything to do with Umbraco itself. It will probably be a server/DNS or configuration issue.

    Things I'd check:

    1. Do you have a robots.txt file that could be blocking access to crawlers?

    2. Investigate the 500 error from the Facebook debugging tool. Check the Umbraco logs to see what the issue is. Check the URL directly.

    3. If the site is registered with Google's Webmaster Tools, check there for any crawl issues.

    4. Try accessing the site from various networks to confirm that DNS is resolving correctly (e.g. try your home Wi-Fi, 4G mobile and office network). There are also plenty of third-party websites that will check a site and tell you whether there are any connection issues.

    5. Check that your HTML validates: https://validator.w3.org/

    6. Check your IIS logs for more information on connections; the user agent string is usually logged there.
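    For check 1, Python's standard `urllib.robotparser` can evaluate a robots.txt file against specific crawler names. This is a sketch: the function name is hypothetical, and the agent tokens `LinkedInBot` and `facebookexternalhit` are assumed to be the ones those scrapers identify with, so confirm them against each platform's current documentation.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser

# Assumed crawler tokens; check each platform's docs for the current strings.
CRAWLERS = ["LinkedInBot", "facebookexternalhit", "Googlebot"]


def blocked_crawlers(robots_txt: str, url: str, agents=CRAWLERS) -> list[str]:
    """Return the agents that this robots.txt content forbids from fetching url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]
```

    Paste in the robots.txt served by each domain; an empty result means none of the listed crawlers is blocked for that URL.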
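    For check 2, a 500 that only the debugging tool sees can sometimes be reproduced by requesting the page with a crawler-like User-Agent. A minimal sketch with the standard library; the function name is hypothetical, and the shortened agent strings in the note below are assumptions rather than the exact headers the crawlers send.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Request url with the given User-Agent and return the final HTTP status code."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        # urlopen follows redirects by default, so this is the final response.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are raised as HTTPError; report their code instead.
        return error.code
```

    For example, calling `fetch_status(url, "facebookexternalhit/1.1")` and `fetch_status(url, "Mozilla/5.0")` on the same page and getting 500 versus 200 would point at the site or server reacting to the crawler, rather than at DNS; the matching entry in the Umbraco log should then show the exception.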
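    For check 4, the addresses each domain resolves to can be listed with `socket.getaddrinfo` and compared across networks. A sketch under the assumption that all three domains (and their www. versions) should resolve to the same server; the function name is hypothetical, and the hostnames you pass in are your own.

```python
import socket


def resolve_all(hostnames):
    """Map each hostname to the sorted IP addresses this machine resolves it to.

    Hostnames that fail to resolve map to an empty list.
    """
    results = {}
    for host in hostnames:
        try:
            infos = socket.getaddrinfo(host, 80, proto=socket.IPPROTO_TCP)
        except socket.gaierror:
            results[host] = []
            continue
        # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
        results[host] = sorted({info[4][0] for info in infos})
    return results
```

    Running it on the home, mobile and office networks and diffing the output shows whether any domain still points at an old address on some resolvers.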
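    For check 6, IIS commonly writes logs in the W3C extended format, where a `#Fields:` directive names the space-separated columns and spaces inside the User-Agent are logged as `+`. A sketch that pulls out crawler requests and their status codes; the function name is hypothetical, and it assumes the `cs(User-Agent)`, `sc-status` and `cs-uri-stem` fields are enabled in the site's logging settings.

```python
def crawler_requests(log_lines, agent_substrings=("facebookexternalhit", "LinkedInBot")):
    """Yield (status, uri, user_agent) for log entries whose User-Agent matches a crawler."""
    fields = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # Column names can change mid-file if logging settings change.
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#") or not fields:
            continue  # other directives, or data before any #Fields line
        record = dict(zip(fields, line.split()))
        agent = record.get("cs(User-Agent)", "")
        if any(s.lower() in agent.lower() for s in agent_substrings):
            yield record.get("sc-status", "?"), record.get("cs-uri-stem", ""), agent
```

    Reading a log file with `open(path, encoding="utf-8")` and passing it in lists every scraper hit; a crawler whose requests never appear at all suggests it is not reaching this server.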
