No-Index and Crawling
Q. If a website's pages carried a noindex tag at some point and Google hasn’t picked them up after they became indexable again, the fix is usually a push to get the pages noticed by the system
- (08:18) The person asking the question is concerned that a handful of URLs on his website had a noindex tag at some point. A couple of months have passed since the noindex was removed, but Search Console still shows those pages as noindexed from months ago. He resubmitted the sitemap and requested indexing via Search Console, but the pages are still not indexed. John says that Google is sometimes a little bit conservative with indexing requests. If Google sees that a page has had a noindex tag for a long period of time, it usually slows down crawling of that page. That also means that when the page becomes indexable again, Google will pick crawling back up, so essentially one push is all that’s needed.
Another thing is that, since Search Console reports on essentially all the URLs that Google knows of for the website, the picture might look worse than it actually is. That is something that could be checked, for example, by looking at the performance report and filtering for that section of the website or those URL patterns, to see whether the high number of noindexed pages in Search Console is mostly made up of pages that weren’t really important, while the important pages from those sections are actually indexed.
A sitemap is a good start, but there is another thing that could make everything clearer for Google: internal linking. It is a good idea to make it clear through internal linking that these pages are very important for the website, so that Google crawls them a little bit faster. That can be temporary internal linking where, for example, individual products are linked from the homepage for a couple of weeks. When Google finds that the internal linking has changed significantly, it will go off and double-check those pages. That can be a temporary approach to pushing things into the index again. It’s not saying that those pages are important across the web, but rather that they’re important relative to the rest of the website. Because a significant change in internal linking can also cause other parts of the website that were only barely indexed to drop out at some point, such changes should be made on a temporary basis and reverted afterwards. Before pushing anything, it’s also worth double-checking that the noindex signal is really gone, as sketched below.
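As a minimal sketch of that double-check (not an official Google tool), the snippet below looks for leftover noindex directives in both the HTTP headers and the HTML of a few affected URLs. The URLs are placeholders, and the requests and beautifulsoup4 libraries are assumed to be installed.

```python
# Spot-check that the noindex signal is gone from both headers and HTML.
# URLs are placeholders; pip install requests beautifulsoup4 is assumed.
import requests
from bs4 import BeautifulSoup

URLS_TO_CHECK = [
    "https://www.example.com/products/widget-1",
    "https://www.example.com/products/widget-2",
]

def noindex_signals(url: str) -> list[str]:
    """Return any noindex directives still visible at the URL."""
    findings = []
    response = requests.get(url, timeout=10)

    # 1. Header-level directive: X-Robots-Tag: noindex
    header = response.headers.get("X-Robots-Tag", "")
    if "noindex" in header.lower():
        findings.append(f"X-Robots-Tag header: {header}")

    # 2. Page-level directive: <meta name="robots" content="noindex">
    soup = BeautifulSoup(response.text, "html.parser")
    for meta in soup.find_all("meta", attrs={"name": "robots"}):
        content = (meta.get("content") or "").lower()
        if "noindex" in content:
            findings.append(f'meta robots tag: "{content}"')

    return findings

for url in URLS_TO_CHECK:
    issues = noindex_signals(url)
    status = "still blocked" if issues else "indexable"
    print(url, "->", status, issues)
```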
Canonical and Alternate
Q. Rel=“canonical” indicates that the URL mentioned in the link is the preferred one; rel=“alternate” means there are alternate versions of the page as well
- (14:25) If a page has a rel=“canonical” on it, it essentially means that the URL mentioned in that link is the preferred one, while rel=“alternate” means there are alternate versions of the page as well. For example, if there are different language versions of a page, say one in English and one in French, there would be rel=“alternate” links between those two language versions. That is not saying that the page the link is on is the alternate, but rather that these are two different versions, one in English and one in French, and each of them can be canonical on its own; having that combination is usually fine. The one place to watch out a little bit is that the canonical should not point across languages: the French page should not have a canonical set to the English version, because they’re essentially different pages.
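As a small, hedged sketch of that setup, the snippet below fetches assumed English and French URLs and checks that each carries a self-referencing canonical plus rel="alternate" hreflang links, rather than a canonical pointing across languages. It again assumes the requests and beautifulsoup4 libraries; the URLs are placeholders.

```python
# Check the pattern described above: each language version keeps its own
# canonical and links to the other version via rel="alternate" hreflang.
import requests
from bs4 import BeautifulSoup

PAGES = {
    "en": "https://www.example.com/en/page",
    "fr": "https://www.example.com/fr/page",
}

for lang, url in PAGES.items():
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")

    canonical = soup.find("link", rel="canonical")
    canonical_href = canonical["href"] if canonical else None
    # The canonical should stay within the same language version.
    verdict = "OK" if canonical_href == url else "check: cross-language canonical?"
    print(f"[{lang}] canonical: {canonical_href} ({verdict})")

    # rel="alternate" hreflang links should list both language versions.
    for alt in soup.find_all("link", rel="alternate"):
        print(f"[{lang}] alternate hreflang={alt.get('hreflang')} -> {alt.get('href')}")
```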
Rel=“canonical” or no-index?
Q. For URLs that don’t need to be indexed, the choice between rel=“canonical” and noindex depends on whether those pages must not be shown in search at all or whether they should just most likely not be shown
- (16:49) John says that both options, rel=“canonical” and noindex, are fine to use for pages that are not supposed to be indexed. What he would usually look at is where the strong preference lies. If the strong preference is that the content should not be shown in search at all, then a noindex tag is the better option. If the preference is more that everything should be combined into one page, and it doesn’t matter much if some individual pages still show up, then a rel=“canonical” is a better fit. Ultimately the effect is similar in that the page will likely not be shown in search, but with a noindex it is definitely not shown, whereas with a rel=“canonical” it is just more likely not to be shown.
Response Time and Crawling Rate
Q. If the crawl rate decreases due to issues such as high response time, it takes a little while for it to come back to normal once the issue is fixed
- (20:25) John says that Google’s systems are very responsive in slowing down to make sure they’re not causing any problems, but a little bit slower in ramping back up again. It usually takes more than a few days, maybe a week or longer. There is a way to try to help that: in the Google Search Console Help Center there’s a link to a form where one can request that someone from the Googlebot team take a look at the crawling of the website, and provide them with all the related information, especially if it’s a larger website with lots of URLs to crawl. The Googlebot team sometimes has the time to act on these kinds of situations and will adjust the crawl rate up manually if they see that there’s actually demand on the Google side and that the website has changed. Sometimes that’s a bit faster than the automatic systems, but it’s not guaranteed.
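If response time is the suspected trigger, a rough self-check like the sketch below can confirm whether the server is back to responding quickly before asking for a manual crawl-rate review. The URLs are placeholders, the requests library is assumed, and the threshold is purely illustrative, not a Google-published number.

```python
# Measure response times for a small sample of URLs and report the median.
import statistics
import requests

SAMPLE_URLS = [
    "https://www.example.com/",
    "https://www.example.com/category/shoes",
    "https://www.example.com/products/12345",
]

timings = []
for url in SAMPLE_URLS:
    response = requests.get(url, timeout=30)
    # elapsed measures from sending the request until the response headers
    # are parsed, a reasonable proxy for server response time here.
    timings.append(response.elapsed.total_seconds())
    print(f"{response.status_code} {response.elapsed.total_seconds():.3f}s {url}")

median = statistics.median(timings)
print(f"median response time: {median:.3f}s")
if median > 1.0:  # illustrative threshold only
    print("Consistently slow responses may keep the crawl rate reduced.")
```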
Indexed Pages Drop
Q. A drop in indexed pages usually has to do with Google judging the website’s content to be less relevant
- (26:02) The person asking the question has seen the number of indexed pages drop on her website, along with a drop in crawl rate, and asks John whether the drop in crawl rate could be the cause of the drop in indexed pages. John says that Google crawling pages less frequently is not related to a drop in indexed pages: indexed pages are kept in the index, and it’s not that pages expire after a certain time. So that wouldn’t be related to the crawl rate, unless there are issues where Google receives a 404 instead of content. There can be a lot of reasons why indexed pages drop, but the main one John sees is the quality of those pages: Google’s systems understand that the relevance or quality of the website has gone down, and because of that they decide to index less.
Improving Website Quality
Q. A website’s quality is not a single quantifiable indicator – it’s a combination of different factors
- (34:35) Website quality is not really quantifiable, in the sense that Google doesn’t have a Quality Score for web search the way it does for ads. When it comes to web search, Google has lots of different algorithms that try to understand the quality of a website, so it’s not just one number. John says that sometimes he talks with the Search Quality team about whether there’s some quality metric they could show, for example, in Search Console. But it’s tricky: they could create a separate quality metric to show in Search Console, but that wouldn’t be the quality metric actually used for search, so it would be almost misleading. And if they were to show exactly the quality metrics they do use, that would on the one hand open things up a little bit for abuse, and on the other hand make it a lot harder for the teams internally to keep improving those metrics.
Website Framework and Rankings
Q. The way a website is built doesn’t really affect its rankings, as Google processes everything as an HTML page
- (36:00) A website can be made with lots of different frameworks and formats, and for the most part Google sees it as a normal HTML page. So if it’s a JavaScript-based website, Google will render it and then process it like a normal HTML page; the same goes for a site that is plain HTML from the start. The different frameworks and CMSs behind it are usually ignored by Google.
So, for example, if someone changes their framework, that isn’t necessarily reflected in their rankings. If a website starts ranking better after changing its framework, it’s more likely because the new website has different internal linking or different content, because it has become significantly faster or slower, or because of other factors that are not specific to the framework used.
PageSpeed and Lighthouse
Q. PageSpeed Insights and Lighthouse take completely different approaches to assessing a website and run in completely different environments
- (37:39) PageSpeed Insights and Lighthouse work completely differently, in the sense that PageSpeed Insights runs in a data center somewhere on essentially emulated devices, where it tries to act like a normal computer. It has restrictions in place that, for example, make the internet connection a little bit slower. Lighthouse basically runs on the computer of the person using it, with their internet connection. John thinks that within Chrome, Lighthouse also applies some restrictions to make everything a little bit slower than the computer might be able to manage, just to make sure the results are comparable. Essentially, these two tools run in completely different environments, and that’s why they often produce different numbers.
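As a hedged illustration, the sketch below queries the public PageSpeed Insights v5 API, which runs Lighthouse in Google’s lab environment; a local Lighthouse run in Chrome against the same page will often report different numbers for the reasons above. The page URL is a placeholder, and for regular use an API key parameter may be required.

```python
# Fetch the lab performance score from the PageSpeed Insights v5 API.
import requests

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"
page = "https://www.example.com/"  # placeholder URL

data = requests.get(
    PSI_ENDPOINT,
    params={"url": page, "strategy": "mobile"},
    timeout=60,
).json()

# Lab data: Lighthouse run in Google's data-center environment.
lab_score = data["lighthouseResult"]["categories"]["performance"]["score"]
print(f"PSI lab performance score: {lab_score:.2f}")

# Field data (if available): real Chrome users, which no single lab run reflects.
field = data.get("loadingExperience", {}).get("metrics", {})
for metric, values in field.items():
    print(metric, values.get("percentile"))
```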
Bold Text and SEO
Q. Bolding important parts of a paragraph might actually have some effect on the SEO performance of the page
- (40:22) Usually, Google tries to understand what the content is about on a web page and it looks at different things to try to figure out what is actually being emphasised there. That includes things like headings on a page, but it also includes things like what is actually bolded or emphasised within the text on the page. So to some extent that does have a little bit of extra value there in that it’s a clear sign that this page or paragraph is considered to be about a particular topic that is being emphasised in the content. Usually that aligns with what Google thinks the page is about anyway, so it doesn’t change that much.
The other thing is that this is to a large extent relative within the web page. So if someone makes the whole page bold, thinking that Google will then view the whole page as the most important one, it won’t work: when the whole page is bold, everything has the same level of importance. But if someone takes a handful of words or sentences within the full page, decides that these are the really important ones and bolds them, it’s a lot easier for Google to recognise those parts as important and give them a little bit more value.
Google Discover Traffic Drop
Q. Different factors can cause a traffic drop in Google Discover, from overall quality issues to content that falls under Discover’s specific policies
- (47:09) John shares that he gets reports from a lot of people that their Discover traffic is essentially either on or off, in the sense that the moment Google’s algorithms determine they’re not going to show much content from a certain website, basically all of that website’s Discover traffic disappears. It also works the other way: if Google decides to show something from the website in Discover again, there is suddenly a big rush of traffic.
The issues people usually talk about are, on the one hand, quality issues, where the quality of the website is not so good. On the other hand, there are the individual policies that Google has for Discover: these policies are different from the web search ones, and the recommendations are different too. John thinks they apply to things like adult content, clickbait content and so on, all of which is covered on the Help Center page Google has for Discover. A lot of websites have a little bit of a mix of all of these kinds of things, and John suspects that sometimes Google’s algorithms simply find a little too much of it and decide to be careful with that website.
Response Time
Q. What counts as an acceptable response time doesn’t really depend on the type of website, but rather on how many URLs need to be crawled
- (50:40) Response time plays into Google’s ability to figure out how much crawling a server can take. From a practical point of view, the response time largely determines how many parallel connections are required to crawl. If Google wants to crawl 1,000 URLs from a website over the course of a day, the response time can be fairly high and the crawling can still be spread out comfortably; but if Google wants to crawl a million URLs from a website and the response time is high, it ends up needing a lot of parallel connections to the server. Google has limits in place because it doesn’t want to cause issues on the server, and that’s why response time is very directly connected to the crawl rate. A rough version of that arithmetic is sketched below.
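As a back-of-the-envelope sketch (the numbers are illustrative, not Google’s), the snippet below shows how the average response time and the number of URLs crawled per day translate into the rough number of parallel connections a crawler would need to keep busy.

```python
# Rough arithmetic: connections needed = total fetch time / seconds in a day.
SECONDS_PER_DAY = 86_400

def parallel_connections_needed(urls_per_day: int, avg_response_seconds: float) -> float:
    """Approximate concurrent connections needed to fetch urls_per_day in one day."""
    total_fetch_seconds = urls_per_day * avg_response_seconds
    return total_fetch_seconds / SECONDS_PER_DAY

# 1,000 URLs/day at a slow 2s response: well under one connection on average.
print(parallel_connections_needed(1_000, 2.0))      # ~0.02

# 1,000,000 URLs/day at the same 2s response: ~23 connections busy all day long.
print(parallel_connections_needed(1_000_000, 2.0))  # ~23.1
```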