Why Are Crawl Counts Different in Server Logs and Search Console?

Q. (00:55) My question is about the crawling of our website. The numbers in our server logs are different from the crawl stats in Search Console. For instance, our server logs show roughly three times as many crawls from Google as Search Console reports. Could it be that something is wrong, since the numbers are different?

  • (1:29) I think the numbers would always be different, just because of the way that, in Search Console, we report on all crawls that go through the Googlebot infrastructure. That also includes other types of requests; for example, I think AdsBot also uses the Googlebot infrastructure, those kinds of things, and they have different user agents. So if you look at your server logs and only count the Googlebot user agent that we use for web search, those numbers will never match what we show in Search Console.
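To compare like with like, one option is to count crawls per user agent when going through the raw access logs, so web-search Googlebot is separated from AdsBot and other Google crawlers. Here is a minimal sketch, assuming a hypothetical access.log path and a standard combined log format where the user agent is the last quoted field; adjust the pattern to your own server's layout. Even then, the totals will not line up exactly with the Crawl stats report, since Search Console aggregates everything that goes through Googlebot's infrastructure.

```python
import re
from collections import Counter

# Assumptions: "access.log" path and combined log format (user agent is the
# last quoted field on each line). Adjust both to match your server setup.
LOG_PATH = "access.log"
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')  # captures the last quoted field

counts = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = UA_PATTERN.search(line)
        if not match:
            continue
        ua = match.group(1)
        if "Googlebot/" in ua:          # the crawler used for web search
            counts["Googlebot (web search)"] += 1
        elif "AdsBot-Google" in ua:     # ads checks, same crawling infrastructure
            counts["AdsBot"] += 1
        elif "Google" in ua:            # other Google crawlers (Images, etc.)
            counts["Other Google crawlers"] += 1

for crawler, total in counts.items():
    print(f"{crawler}: {total}")
```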

Why Did Discover Traffic Drop?

Q. (03:10) I have a question in mind. We have a website, and for the last three to four months we have been working on Google Web Stories. It was going very well. Up until the 5th of March, we were seeing somewhere around 400 to 500 real-time visitors coming to our Web Stories from Google Discover. But suddenly we saw a sharp drop in our visitors from Google Discover, and Search Console is not even showing any errors. So what could be a possible reason for that?

  • (3:48) I don’t think there needs to be any specific reason for that, because Discover, especially, is something that we would consider to be additional traffic to a website, and it’s something that can change very quickly. Anecdotally, I’ve heard from sites in these office hours that sometimes they get a lot of traffic from Discover, then suddenly it goes away, and then it comes back again. So it doesn’t necessarily mean that you’re doing anything technically wrong. It might just be that things in Discover have changed slightly, and suddenly you get more traffic or suddenly you get less. We do have a bunch of policies that apply to content we show in Google Discover, and I would double-check those, just to make sure that you’re not accidentally touching on any of those areas. I don’t know them offhand, but I think they include something along the lines of clickbait content, those kinds of things. So I would double-check those guidelines to make sure that you’re all in line there. But even if everything is in line with the guidelines, it can be that you suddenly get a lot of traffic from Discover, and then suddenly you get a lot less.

Can We Still Fix the robots.txt for Redirects?

Q. (14:55) We have had a content publishing website since 2009, and we went through a bad migration in 2020, where we saw a huge drop in organic traffic. We ended up with a lot of broken links, so we used 301 redirects to point those broken links to the original articles. But in robots.txt we disallowed those links, so that crawl budget wouldn’t be spent crawling all of these 404 pages. So the main question is: now that we have fixed all of these redirects and they point to the same articles with the proper names, can we remove those entries from robots.txt, and how much time does it take for Google to actually pick that up?

  • (15:53) So if the page is blocked by robots.txt, we wouldn’t be able to see the redirect. So if you set up a redirect, you would need to remove that block in robots.txt. With regards to the time that takes, there is no specific time, because we don’t crawl all pages at the same speed. Some pages we may pick up within a few hours, and other pages might take several months to be recrawled. So that’s, I think, kind of tricky. The other thing worth mentioning here is that if this is from a migration that is two years back now, I don’t think you would get much value out of just making those 404 links show content now. I can’t imagine that would be the reason a website would be getting significantly less traffic, unless these pages are the most important pages of your website, but then you would have noticed that. If these are just generic pages on a bigger website, then I can’t imagine that the overall traffic to the website would drop because they were no longer available.
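A quick way to check whether the old URLs are still blocked, and therefore whether Googlebot can even see the 301s, is to test them against the live robots.txt with Python's standard-library parser. This is a minimal sketch; example.com and the sample paths are placeholders for your own domain and redirected URLs.

```python
from urllib.robotparser import RobotFileParser

# Placeholders: swap in your own domain and the old URLs you redirected.
robots = RobotFileParser()
robots.set_url("https://example.com/robots.txt")
robots.read()

redirected_urls = [
    "https://example.com/old-broken-article-1",
    "https://example.com/old-broken-article-2",
]

for url in redirected_urls:
    if robots.can_fetch("Googlebot", url):
        print(f"Crawlable, so the redirect can be seen: {url}")
    else:
        print(f"Still disallowed in robots.txt, redirect invisible: {url}")
```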

Some Blog Posts Aren’t Indexed, What Can We Do?

Q. (18:54) My question is a crawling question pertaining to Discovered - currently not indexed. We have run a two-sided marketplace since 2013 that’s fairly well established. We have about 70,000 pages, and about 70% of those are generally indexed. Then there’s kind of a budget that crawls the new pages that get created, and we see movement on that, so old pages go out and new pages come in. At the same time, our editorial team is also writing blog entries, and to get those to the top of the queue we always use Request Indexing on them so they go in quicker. We add them to the sitemap as well, but we write them and then want them to get into Google as quickly as possible. As we’ve been growing over the last year and have more content on our site, we’ve seen that this sometimes doesn’t work as well for the new blog entries, and they also sit in this Discovered - currently not indexed queue for a longer time. Is there anything we can do with internal links or something? Is it content-based, or do we just have to live with the fact that some of our blog posts might not make it into the index?

  • (20:13) I think, overall, it’s kind of normal that we don’t index everything on a website. That can happen to the entries you have on the site and also to the blog posts on the site; it’s not tied to a specific kind of content. I think using the URL Inspection tool to submit them for indexing is fine. It definitely doesn’t cause any problems. But I would also try to find ways to make it as clear as possible that you care about those pages. Essentially, internal linking is a good way to do that: really make sure that, from your home page, you’re saying, here are the five new blog posts, and you link to them directly. That way it’s easy for Googlebot, when we crawl and index your home page, to see, oh, there’s something new, and it’s linked from the home page, so maybe it’s important and maybe we should go off and look at it.
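One way to sanity-check that internal-linking advice is to fetch the home page and confirm the newest posts are actually linked from it. A small stdlib-only sketch, where the home page URL and the post paths are placeholders for your own site:

```python
from html.parser import HTMLParser
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.add(value)

# Placeholders: your home page and the new posts you want crawled quickly.
HOME = "https://example.com/"
NEW_POSTS = ["/blog/new-post-1", "/blog/new-post-2"]

html = urlopen(HOME).read().decode("utf-8", errors="replace")
collector = LinkCollector()
collector.feed(html)

for path in NEW_POSTS:
    linked = any(path in href for href in collector.links)
    status = "linked from home page" if linked else "NOT linked from home page"
    print(f"{path}: {status}")
```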

Can a Low Page Speed Score Affect the Site’s Ranking?

Q. (27:28) Could low mobile ratings on Google PageSpeed, for metrics like LCP and FID, have affected our website’s ranking after the introduction of the new algorithm last summer? We were ranked around fourth in my city for a keyword like “web agency”, and we dropped after the introduction of this algorithm. When we check, these parameters like LCP and FID have a bad rating on mobile, around 48, but not on desktop, where it is 90, so that’s OK. So could this be the problem?

  • (28:24) It could be. It’s hard to say just based on that. I think there are maybe two things to watch out for. The number that you gave me sounds like the PageSpeed Insights score that is generated for desktop and mobile, that number from 0 to 100. We don’t use that in Search for rankings. We use the Core Web Vitals, which are LCP, FID, and CLS, and the metrics we use are based on what users actually see. So if you go into Search Console, there’s the Core Web Vitals report, and that should show you those numbers and whether they’re within the good or bad ranges.
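For reference, the field-data thresholds behind the Core Web Vitals report are fixed, so you can classify your own measurements instead of relying on the 0 to 100 lab score. A small sketch using the published good / needs-improvement / poor boundaries (LCP at 2.5 s and 4 s, FID at 100 ms and 300 ms, CLS at 0.1 and 0.25); the example values at the bottom are made up for illustration.

```python
# Published Core Web Vitals thresholds: (good upper bound, poor lower bound).
THRESHOLDS = {
    "LCP": (2.5, 4.0),    # seconds
    "FID": (100, 300),    # milliseconds
    "CLS": (0.1, 0.25),   # unitless layout-shift score
}

def rate(metric: str, value: float) -> str:
    """Classify a field measurement the way the Core Web Vitals report does."""
    good, poor = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= poor:
        return "needs improvement"
    return "poor"

# Example field values (illustrative numbers only).
for metric, value in [("LCP", 3.1), ("FID", 80), ("CLS", 0.28)]:
    print(f"{metric} = {value} -> {rate(metric, value)}")
```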

Can Google Crawl Pagination With “View More” Buttons?

Q. (39:51) I recently redesigned my website and changed the way I list my blog posts and other pages from pages one, two, three, four to a View More button. Can Google still crawl the posts that are not shown on the main blog page? What is the best practice? If not, and let’s say those pages are not important when it comes to search and traffic, would the site as a whole be affected in terms of how relevant it is for the topic for Google?

  • (40:16) So on the one hand, it depends a bit on how you have that implemented. A View More button could be implemented as a button that does something with JavaScript, and those kinds of buttons we would not be able to crawl through to actually see more content. On the other hand, you could also implement a View More button essentially as a link to page two of those results, or from page two to page three. And if it’s implemented as a link, we would follow it as a link, even if it doesn’t have a label that says page two on it. So that’s, I think, the first thing to double-check: is it actually something that can be crawled or not? And if it can’t be crawled, then usually what would happen is that we would focus primarily on the blog posts that are linked directly from those pages. We would probably keep the old blog posts in our index, because we’ve seen them and indexed them at some point, but we would focus on the ones that are currently there. One way you can help to mitigate this is to cross-link your blog posts as well. Sometimes that is done with category pages or these tag pages that people add. Sometimes blogs have a mechanism for linking to related blog posts, and all of those kinds of mechanisms add more internal linking to a site. That makes it possible that even if we initially just see the first page of the results from your blog, we would still be able to crawl to the rest of your website. And one way you can double-check this is to use a local crawler. There are various third-party crawling tools available. If you crawl your website and you see that, oh, it only picks up five blog posts, then probably those are the five blog posts that are findable. On the other hand, if it goes through those five blog posts and then finds a bunch more and a bunch more, then you can be pretty sure that Googlebot will be able to crawl the rest of the site as well.
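The "use a local crawler" check can also be approximated with a short script that follows plain anchor links from the blog's first page and counts what it can reach; a JavaScript-only View More button stops a crawler like this just as it stops Googlebot's link discovery. A minimal sketch with placeholder URLs, assuming posts live under a /blog/ path:

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

START = "https://example.com/blog/"   # placeholder start page
MAX_PAGES = 200                       # safety limit for the sketch

class LinkParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)

seen, queue = {START}, deque([START])
while queue and len(seen) <= MAX_PAGES:
    url = queue.popleft()
    try:
        html = urlopen(url).read().decode("utf-8", errors="replace")
    except Exception:
        continue  # skip pages that fail to fetch or decode
    parser = LinkParser()
    parser.feed(html)
    for href in parser.hrefs:
        absolute = urljoin(url, href)
        # Stay on the same host and only follow plain links, as a crawler would.
        if urlparse(absolute).netloc == urlparse(START).netloc and absolute not in seen:
            seen.add(absolute)
            queue.append(absolute)

blog_urls = [u for u in seen if "/blog/" in u]
print(f"Reachable same-host URLs: {len(seen)}; under /blog/: {len(blog_urls)}")
```

If the count stays stuck at the handful of posts shown on the first page, the View More button is probably not crawlable as a link.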

To What Degree Does Google Follow the robots.txt Directives?

Q. (42:34) To what degree does Google honour the robots.txt? I’m working on a new version of my website that’s currently blocked with a robots.txt file and I intend to use robots.txt to block the indexing of some URLs that are important for usability but not for search engines. So I want to understand if that’s OK.

  • (42:49) That’s perfectly fine. So when we recognise disallow entries in a robots.txt file, we will absolutely follow them. The only kind of situation I’ve seen where that did not work is where we were not able to process the robots.txt file properly. But if we can process the robots.txt file, if it’s properly formatted, then we will absolutely stick to that when it comes to crawling. One caveat here is that we usually update robots.txt files maybe once a day, depending on the website. So if you change your robots.txt file now, it might take a day until it takes effect. With regards to blocking crawling: you mentioned blocking indexing, but essentially the robots.txt file blocks crawling. So if you block crawling of pages that are important for usability but not for search engines, usually that’s fine. What would happen, or could happen, is that we would index the URL without the content. So if you do a site: query for those specific URLs, you would still see them. But if the content is on your crawlable pages, then for any normal query that people do when they search for a specific term on your pages, we will be able to focus on the pages that are actually crawled and indexed, and show those in the search results. So from that point of view, that’s all fine.
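If you want to dry-run a draft robots.txt before deploying the new site version, you can parse it in memory and check which URLs end up blocked. A minimal sketch; the /account/ and /cart/ paths are made-up stand-ins for "important for usability but not for search engines".

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft robots.txt for the new site version.
ROBOTS_TXT = """\
User-agent: *
Disallow: /account/
Disallow: /cart/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ["/account/settings", "/cart/checkout", "/products/blue-widget"]:
    allowed = parser.can_fetch("Googlebot", f"https://example.com{path}")
    # Remember: a disallow blocks crawling, not indexing. A blocked URL can
    # still appear without content for a site: query if it is linked elsewhere.
    print(f"{path}: {'crawlable' if allowed else 'blocked by robots.txt'}")
```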

If 40% of Content Is Affiliate, Will Google Consider the Site a Deals Website?

Q. (53:27) Does the proportion of content created by a publisher matter? I mean that in the sense of affiliate, or maybe even sponsored, content. The context is a Digiday newsletter that went out today, which mentioned that publishers are concerned that if, let’s say, 40% of your traffic or content is commerce or affiliate, your website will be considered by Google a deals website, and then your authority may be dinged a little bit. Is there such a thing happening algorithmically in the ranking systems?

  • (54:05) I don’t think we would have any threshold like that. Partially, because it’s really hard to determine a threshold like that. You can’t, for example, just take the number of pages and say, this is this type of website because it has 50% pages like that. Because the pages can be visible in very different ways. Sometimes, you have a lot of pages that nobody sees. And it wouldn’t make sense to judge a website based on something that, essentially, doesn’t get shown to users.
