Indexing API

Q. Using the Indexing API for things other than job postings and broadcast events doesn’t bring any benefits.

  • (06:49) The person asking the question is interested in whether the Indexing API can be used for pages other than job postings and broadcast events, such as news articles and blog posts. John says that people try doing that, but essentially what Google has documented is what it uses the API for. If there isn’t content that falls into those categories, then the API isn’t really going to help, although trying it won’t negatively affect the content.
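For sites that do have job posting or broadcast event pages, notifying the Indexing API is a single authenticated POST per URL. A minimal sketch in Python, assuming the google-auth library and a service-account key that has been granted access to the Indexing API; the key file path and the page URL are placeholders:

```python
# Minimal sketch: notify the Indexing API about a job-posting URL.
# Assumes a service-account JSON key that has access to the Indexing API;
# the key path and the URL below are placeholders.
from google.oauth2 import service_account
from google.auth.transport.requests import AuthorizedSession

SCOPES = ["https://www.googleapis.com/auth/indexing"]
ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"

credentials = service_account.Credentials.from_service_account_file(
    "service-account.json", scopes=SCOPES)  # placeholder key file
session = AuthorizedSession(credentials)

# URL_UPDATED signals that the page was added or changed;
# URL_DELETED is used when a posting is taken down.
body = {"url": "https://example.com/jobs/backend-engineer", "type": "URL_UPDATED"}
response = session.post(ENDPOINT, json=body)
print(response.status_code, response.json())
```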

Unlinked Brand Mention

Q. An unlinked brand mention doesn’t really affect SEO.

  • (12:01) When an article mentions the website without linking to it, that is not a bad thing. It is only slightly inconvenient for users, because they have to search for the website being mentioned. Otherwise, John says he wouldn’t assume there is some SEO factor trying to figure out where someone mentions the website’s name.

User Reviews and Comments

Q. Non-spammy comments help Google understand the page a little better.

  • (12:58) A really useful thing about user comments is that people often write about the page in their own words, and that gives Google a little more information on how it can show the page in the search results. From that point of view, comments are a good thing on a page. Finding a way to maintain them in a reasonable way is tricky, because people also spam comments and all kinds of crazy stuff happens in them. But once a way to maintain comments on a web page is found, they give the page a little more context and help people who search in different ways to also find the content.
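One small, common safeguard when keeping comments open is to mark any links that commenters drop as user-generated content, so spammy links don’t look like editorial endorsements. A minimal sketch, assuming comments are rendered server-side in Python; the render_comment helper and the exact markup are illustrative only:

```python
# Minimal sketch: render user comments so that any links they contain carry
# rel="ugc nofollow". The helper and the markup shape are illustrative only.
import html
import re

LINK_RE = re.compile(r"(https?://\S+)")

def render_comment(text: str) -> str:
    """Escape the comment text and wrap bare URLs in rel='ugc nofollow' anchors."""
    escaped = html.escape(text)
    return LINK_RE.sub(r'<a href="\1" rel="ugc nofollow">\1</a>', escaped)

print(render_comment("Great post! More details at https://example.com/some-page"))
```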

SSL Certificates

Q. Any kind of valid SSL certificate works fine for Google.

  • (14:03) Different types of SSL certificates are not important for SEO, and free SSL certificates are perfectly fine. Which type of certificate to use is more a question of what else is wanted from it. From Google’s point of view, all that matters is whether the certificate is valid or not.
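Since validity is the only thing that matters here, a quick check that the certificate validates against the system trust store is usually enough. A minimal sketch using Python’s standard library, with a placeholder hostname:

```python
# Minimal sketch: verify that a host serves a certificate that validates
# against the system trust store, which is the part that matters here.
# The hostname is a placeholder.
import socket
import ssl

def certificate_is_valid(hostname: str, port: int = 443) -> bool:
    context = ssl.create_default_context()  # verifies the chain and the hostname
    try:
        with socket.create_connection((hostname, port), timeout=10) as sock:
            with context.wrap_socket(sock, server_hostname=hostname) as tls:
                expires = tls.getpeercert()["notAfter"]
                print(f"{hostname}: certificate is valid, expires {expires}")
                return True
    except (ssl.SSLError, OSError) as err:
        print(f"{hostname}: certificate problem: {err}")
        return False

certificate_is_valid("example.com")
```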

Multiple Job Postings

Q. Posting the same job posting on different subdomains of the same root domain is fine.

  • (14:47) John assumes that posting the same job on different subdomains, each with the same JobPosting structured data, is perfectly fine, because it’s very common to have the same job posted on different websites, and those different websites could have structured data on them as well. From that point of view, having it posted multiple times on the same website or on different subdomains should be perfectly fine too. However, John mentions that he doesn’t know all the details of the guidelines around job postings, so it might be that there’s some obscure mention that a job should only be listed once on each website.
  • John also says that Google usually tries to de-duplicate listings, and that is done for all kinds of content. Whether it’s an image, a web page, or anything else, if Google can recognise that it’s the same primary content, it will try to show it just once. He assumes the same applies to Google Jobs.
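As an illustration of the setup being described, the same JobPosting structured data can simply be emitted on each subdomain that carries the listing. A minimal sketch in Python; the field values are placeholders and this is not a complete set of the properties Google documents for job postings:

```python
# Minimal sketch: the same JobPosting structured data emitted for listings on
# two subdomains of one root domain. Field values are placeholders.
import json

def job_posting_jsonld(url: str) -> str:
    data = {
        "@context": "https://schema.org",
        "@type": "JobPosting",
        "title": "Backend Engineer",
        "description": "Build and maintain backend services.",
        "datePosted": "2022-03-01",
        "validThrough": "2022-06-01",
        "hiringOrganization": {"@type": "Organization", "name": "Example GmbH"},
        "jobLocation": {
            "@type": "Place",
            "address": {"@type": "PostalAddress", "addressCountry": "DE"},
        },
        "url": url,
    }
    return f'<script type="application/ld+json">{json.dumps(data, indent=2)}</script>'

# The same posting rendered on two subdomains of the same root domain.
for host in ("jobs.example.com", "careers.example.com"):
    print(job_posting_jsonld(f"https://{host}/backend-engineer"))
```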

Internal Duplicate Content

Q. Having the same content as a PDF on the website and as a blog article is not viewed as duplicate content.

  • (17:10) The person asking the question has published a piece of content as a PDF file on her website, wants to use the same content as an HTML blog article on the same website, and is worried that this might be viewed as duplicate content. John assures her that Google wouldn’t see it as duplicate content, because it’s different content: one is an HTML page, the other is a PDF. Even if the primary piece of content is the same, everything around it is different, so at that level Google wouldn’t see it as duplicate content. At most, the difficulty is that both of these can show up in the search results at the same time. From an SEO point of view, that is not necessarily a bad thing, but there may be a strategic reason to have either the PDF or the HTML page be the more visible one.
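If the HTML article is the version that should be the more visible one, one documented option is to serve the PDF with an HTTP Link header that points at the HTML page as its canonical. A minimal sketch, assuming a Flask app and placeholder paths and URLs:

```python
# Minimal sketch, assuming Flask and placeholder paths: serve the PDF with an
# HTTP Link header pointing at the HTML article as the canonical version,
# nudging search towards showing the HTML page.
from flask import Flask, send_file

app = Flask(__name__)

@app.route("/guides/whitepaper.pdf")
def whitepaper_pdf():
    response = send_file("static/whitepaper.pdf", mimetype="application/pdf")
    response.headers["Link"] = '<https://example.com/guides/whitepaper>; rel="canonical"'
    return response

if __name__ == "__main__":
    app.run()
```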

Paginated Content

Q. In paginated content, Google views the first pages as more important than the pages that come after.

  • (19:39) The person asking the question has a website with discussions, where a thread can have too many comments to show them all on one long page. He wants to paginate the thread but is not sure whether the newest comments should appear on the first page or on the last pages. John says that is ultimately up to him and which comments he wants to prioritise. John assumes that if something is on page four, Google would have to crawl pages one, two and three first to find it, and usually that means it’s further away from the main part of the website. From Google’s point of view, what would probably happen is that Google wouldn’t give it that much weight and probably wouldn’t recrawl that page as often. So if the newest comments should be the most visible ones, then maybe it makes sense to reverse the order and show them differently. If the newest comments are right on the main page, it’s a lot easier for Google to recrawl that more often and to give it a little bit more weight in the search results. How to balance that is up to the website owner.
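To make the reversed ordering concrete, the thread can be paginated newest-first so the freshest comments sit on page 1, closest to the main part of the site. A minimal sketch in Python with placeholder data:

```python
# Minimal sketch, with placeholder data: paginate a comment thread so the
# newest comments land on page 1, the page closest to the main part of the
# site and the easiest one for Google to recrawl often.
from datetime import datetime

PAGE_SIZE = 20

def paginate_newest_first(comments: list, page: int) -> list:
    """Return one page of comments, newest first (page numbering starts at 1)."""
    ordered = sorted(comments, key=lambda c: c["posted_at"], reverse=True)
    start = (page - 1) * PAGE_SIZE
    return ordered[start:start + PAGE_SIZE]

comments = [
    {"author": "anna", "posted_at": datetime(2022, 3, 1), "text": "First!"},
    {"author": "ben", "posted_at": datetime(2022, 3, 5), "text": "A later reply"},
]
print(paginate_newest_first(comments, page=1))  # the newest comment comes first
```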

Number of Pages

Q. Google doesn’t have a specific ratio on how many pages or how many indexable pages a website should have.

  • (28:48) From Google’s point of view, there’s no specific ratio that Google would call out for how many pages, or how many indexable pages, a website should have. That’s ultimately up to the website owner. What John tends to see is that fewer pages tend to perform better, in the sense that if the value of the content is concentrated on fewer pages, then, in general, those few pages tend to be a lot stronger than if the content were diluted across a lot of different pages. That plays out across the board: from a ranking point of view, Google can give these pages a little bit more weight, and from a crawling point of view, it’s easier for Google to keep up with them. So especially for a new website, John recommends starting off small, focusing on something specific and then expanding from there, rather than going in and creating 500,000 pages that Google needs to index. When a new website starts off with a big number of pages, chances are Google will just pick up a small sample of those pages, and whether that small sample includes the pages most important to the website is questionable.

Referring Pages

Q. It is perfectly fine if the URLs referring to a website’s pages are from long-retired microsite domains.

  • (30:36) John says that referring URLs from long-retired microsite domains pointing at the important pages of the website are nothing to worry about. The referring page in the URL Inspection Tool is simply where Google first saw a mention of the pages, and if it first saw them on some random website, then that’s just where it saw them. That is what is listed there; it doesn’t mean there’s anything bad about those pages. From Google’s point of view, it’s purely a technical thing, not a sign that the pages need to have been first found on some very important part of the website. If the pages are indexed, that’s the important part. The referring page is useful when there’s a question of how Google even found a page, or where it comes from. If there are weird URL parameters, or something really weird in the URL that Google should never have found in the first place, then looking at the referring URL helps figure out where that actually comes from.

A Drop In Crawl Stats

Q. There are several things that Google takes into account when deciding on the amount of crawling it does on a website.

  • (35:09) On the one hand, Google tries to figure out how much it needs to crawl from a website to keep things fresh and useful in its search results. That relies on understanding the quality of the website and how things change on it. Google calls this the crawl demand. On the other hand, there are the limitations Google sees from the server, the website and the network infrastructure with regard to how much can be crawled. Google tries to balance these two.
  • The restrictions tend to be tied to two main things – the overall response time to requests to the website and the number of errors, specifically, server errors, that can be seen during crawling. If Google sees a lot of server errors, then it will slow down crawling, because it doesn’t want to cause more problems. If it sees that the server is getting slower, then it will also slow down crawling. So those are the two main things that come into play there. The difficulty with the speed aspect is that there are two ways of looking at speed. Sometimes that gets confusing when looking at the crawl rate.
  • Specifically for the crawl rate, Google just looks at how quickly it can request a URL from the server. The other aspect of speed is everything around Core Web Vitals and how quickly a page loads in a browser. The speed that it takes in a browser tends not to be related directly to the speed that it takes for Google to fetch an individual URL on a website, because in a browser the JavaScript needs to be processed, external files need to be pulled in, content needs to be rendered, and all of the positions of the elements on the page need to be recalculated. That takes a different amount of time than just fetching that URL. That’s one thing to watch out for.
  • When trying to diagnose a change in crawl rate, there’s no need to look at how long it takes for a page to render; instead, it’s better to look purely at how long it takes to fetch that URL from the server (see the sketch after this list).
  • The other thing that comes into play, from time to time and depending on what is done with the website, is that Google tries to understand where the website is actually hosted. If Google recognises that a website is changing hosting from one server to a different server – that could be a different hosting provider, moving to a CDN, or changing CDNs, anything like that – Google’s systems will automatically go back to some safe rate where they know they won’t cause any problems, and then increase again step by step.
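For the fetch-time check mentioned above, it’s enough to time the raw HTML response from the server, without rendering anything. A minimal sketch in Python, assuming the requests library and a placeholder URL:

```python
# Minimal sketch, assuming the requests library and a placeholder URL: time how
# long the server takes to return the raw HTML, which is the speed relevant to
# crawl rate, as opposed to how long the page takes to render in a browser.
import statistics
import time
import requests

URL = "https://example.com/"

timings = []
for _ in range(5):
    start = time.perf_counter()
    response = requests.get(URL, timeout=30)
    timings.append(time.perf_counter() - start)

print(f"{URL}: median fetch time {statistics.median(timings):.3f}s, "
      f"last status {response.status_code}")
```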

Link Juice

Q. Internal linking, rather than “link juice”, is a great way to tell Google which pages on the website are important.

  • (46:01) “Link juice” is always one of those terms people have very conflicting feelings about, because it’s not really how Google’s systems look at things. Internal linking, on the other hand, is one of the most important elements of a website, because it is a great way to tell Google which pages are considered important. Most websites have a home page that is seen as the most important part of the website, and links from those important pages to other pages that are thought to be important are really useful for Google. These can be temporary links too. For example, if an e-commerce site links to a new product from the home page, that’s a really fast way for Google to recognise the new product, crawl and index it as quickly as possible, and give it a little bit of extra weight. But of course, if those links are removed, then that internal connection is gone as well. With regard to how quickly that is picked up, it’s essentially picked up immediately, as soon as Google recrawls and reindexes those pages.
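One simple way to sanity-check this is to list which pages the home page actually links to internally. A minimal sketch in Python, assuming the requests and beautifulsoup4 packages and a placeholder domain:

```python
# Minimal sketch, assuming the requests and beautifulsoup4 packages and a
# placeholder domain: list the pages that the home page links to internally,
# since a link from the home page is a strong hint about which pages matter.
from urllib.parse import urljoin, urlparse
import requests
from bs4 import BeautifulSoup

HOME = "https://example.com/"

html = requests.get(HOME, timeout=30).text
soup = BeautifulSoup(html, "html.parser")

internal_links = set()
for anchor in soup.find_all("a", href=True):
    url = urljoin(HOME, anchor["href"])
    if urlparse(url).netloc == urlparse(HOME).netloc:
        internal_links.add(url.split("#")[0])  # drop fragments

for url in sorted(internal_links):
    print(url)
```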

Crawling

Q. There are a couple of ways to deal with “discovered, not indexed” URLs.

  • (58:21) John says Google finds all kinds of URLs across the web, and a lot of those URLs don’t need to be crawled and indexed, because maybe they’re just variations of URLs Google already knows, or maybe some random forum or scraper script has copied URLs from the website and included them in a broken way. Google finds these linked all the time across the web, so it’s very normal to have a lot of URLs that are either crawled and not indexed or discovered and not crawled, simply because there are so many different sources of URLs. John suggests first downloading a sample list of those URLs, so that individual examples can be looked at and classified into URLs that are actually important and URLs that can be ignored. Anything that looks really weird as a URL is better ignored. For the important ones, it’s worth trying to figure out how to tie them better into the website, for example through internal linking.
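A first pass at that classification can be scripted over the exported list. A minimal sketch in Python, assuming a CSV export named discovered_not_indexed.csv with a URL column; the parameter names and the heuristics used to flag noise are placeholders to adapt to the site:

```python
# Minimal sketch: split an exported list of "discovered, not indexed" URLs
# into ones worth tying into the site and ones that are probably noise.
# The file name, column name and heuristics below are placeholders.
import csv
from urllib.parse import urlparse, parse_qs

IGNORABLE_PARAMS = {"sessionid", "utm_source", "replytocom"}  # illustrative

important, ignorable = [], []
with open("discovered_not_indexed.csv", newline="") as fh:
    for row in csv.DictReader(fh):
        url = row["URL"]
        parsed = urlparse(url)
        params = set(parse_qs(parsed.query))
        if params & IGNORABLE_PARAMS or "%" in parsed.path:
            ignorable.append(url)   # likely noise: tracking parameters, broken paths
        else:
            important.append(url)   # candidates to tie into internal linking

print(f"{len(important)} URLs worth tying into the site, "
      f"{len(ignorable)} URLs that can probably be ignored")
```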
