More indexed pages – higher website quality?

Q. A higher number of indexed pages doesn’t make a website more authoritative

  • (03:52) John says that it’s not the case that if a website has more pages indexed, Google thinks it’s better than other websites with fewer indexed pages. The number of indexed pages is not a sign of quality.

Error page redirects during crawling

Q. Sometimes issues with rendering a page can lead to crawling running into error pages

  • (06:05) When there is a problem with rendering a website’s pages, crawling can end up on error pages. When those pages are tested in Search Console, it might be that they work well 9 times out of 10, and the 1 time out of 10 they don’t, the test redirects to an error page. There might be too many requests needed to render the page, or something complicated with the JavaScript that sometimes takes too long and sometimes works fine. It could even be that the page can’t be fetched when traffic is high, while everything works well when traffic is low.
    John explains that what basically happens is that Google crawls the HTML page and then tries to process it in a Chrome-type browser. For that, Google tries to pull in all of the resources that are mentioned there. In the developer console in Chrome, in the network section, there is a waterfall diagram of everything that gets loaded to render the page. If there are lots of things that need to be loaded, it can happen that some of them time out and crawling runs into the error situation. As a possible solution, John suggests getting the developer team to combine the different JavaScript files, combine the CSS files, minimise the images, and so on.
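
    One rough way to see this kind of intermittent behaviour from the outside is to request the page repeatedly and log the outcome. Below is a minimal sketch of that check, assuming the third-party requests library and a made-up URL.

```python
# Minimal sketch: fetch a page several times to spot the kind of intermittent
# timeouts or error responses described above. The URL, attempt count and
# timeout are placeholders.
import requests

URL = "https://example.com/some-rendered-page"  # hypothetical page
ATTEMPTS = 10
TIMEOUT_SECONDS = 10

failures = 0
for i in range(ATTEMPTS):
    try:
        response = requests.get(URL, timeout=TIMEOUT_SECONDS, allow_redirects=True)
        # A redirect to an error page usually shows up as a non-200 status or
        # an unexpected final URL, so log both.
        print(f"attempt {i + 1}: status {response.status_code}, final URL {response.url}")
        if response.status_code != 200:
            failures += 1
    except requests.RequestException as exc:
        print(f"attempt {i + 1}: failed ({exc})")
        failures += 1

print(f"{failures}/{ATTEMPTS} attempts did not return a clean 200 response")
```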

Pages for different search intents

Q. Website pages for different search intents don’t really define the purpose of a website as a whole

  • (10:00) Google doesn’t really have rules on how a website is perceived as a whole depending on whether it has more informational, transactional, or other types of pages. John says it’s more of a page-level thing than something on a website level: a lot of websites have a mix of different kinds of content, and Google tries to figure out which of these pages match the searcher’s intent and tries to rank those appropriately. For example, adding lots of informational pages to a website that sells products doesn’t dilute the product pages.

Redirecting old pages to the parent category page

Q. Redirects from old pages to parent category pages will be treated as soft 404s

  • (13:17) The person asking the question has a situation where people link to his website’s pages, but sometimes those pages get changed or deleted – the content comes and goes. The question is, for example, if a subcategory is linked in a backlink and that subcategory gets deleted, is it okay to temporarily redirect to the parent category? John says that if Google sees this happening at a larger scale – redirects going to the parent level – it will probably treat them as soft 404s and decide that the old page is gone. The redirects might be better for users, but Google will essentially see a 404 either way, so there is little SEO difference. Redirect or no redirect – there’s no penalty.
    When it comes to 301 versus 302, John says there is no difference either, as Google will see it either as a 404 or as a canonicalisation question. If it’s a canonicalisation question, it comes down to which URL Google shows in the search results. Usually the higher-level one has stronger signals anyway, and Google will focus on that one, so it doesn’t matter whether it’s a 301 or a 302.
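
    A minimal sketch of inspecting such a redirect chain, assuming the third-party requests library and made-up URLs:

```python
# Minimal sketch: follow the redirect chain for a deleted subcategory URL and
# print each hop, to see whether it ends at the parent category page.
import requests

old_url = "https://example.com/category/old-subcategory/"  # deleted subcategory (placeholder)
response = requests.get(old_url, allow_redirects=True, timeout=10)

for hop in response.history:
    print(f"{hop.status_code}  {hop.url}")   # each redirect hop, e.g. 301 or 302
print(f"{response.status_code}  {response.url}")  # final destination, e.g. the parent category
```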

Q. If a page that’s linked to gets deleted and then comes back, it doesn’t change much in terms of crawling

  • (16:04) If a page that is linked through a backlink gets deleted and then comes back, John says there is minimal difference in terms of recrawling it. One thing to know is that crawling of that page will be slowed down: if the page returns a 404, there is nothing there, and if there is a redirect, the focus will be on the primary URL rather than on this one. Crawling stays slow until Google gets new signals telling it that there is something there again – internal linking or a sitemap file are strong indications that the page needs to be crawled.
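
    A minimal sketch of one such signal – regenerating a sitemap entry with a fresh lastmod date for the restored URL – using only the Python standard library; the URLs and file name are placeholders:

```python
# Minimal sketch: write a small sitemap file with a current <lastmod> for a
# page that has come back, as one signal that the URL should be recrawled.
from datetime import date
from xml.sax.saxutils import escape

restored_urls = ["https://example.com/category/restored-page/"]  # placeholder

entries = "\n".join(
    "  <url>\n"
    f"    <loc>{escape(url)}</loc>\n"
    f"    <lastmod>{date.today().isoformat()}</lastmod>\n"
    "  </url>"
    for url in restored_urls
)

sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    f"{entries}\n"
    "</urlset>\n"
)

with open("sitemap-restored.xml", "w", encoding="utf-8") as f:
    f.write(sitemap)
```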

References

Q. Nothing changes from linking to a source in your content – it’s purely a usability thing

  • (23:25) John says that while referencing the original source when quoting makes sense in terms of website usability, it doesn’t really change anything SEO-wise. It used to be one of the spammy techniques, where people would create a low-quality page, link to CNN, Google and Wikipedia at the bottom, and then hope that Google would think the page is good because it referenced CNN.

Guest posts

Q. Guest posts are a good way to raise awareness about your business

  • (27:54) Google’s guidance for links in guest posts is that they should be nofollow. Writing guest posts to drive awareness of a business is perfectly fine. John says the important thing about guest posts is keeping in mind that the links should be nofollow, so that the post drives awareness, talks about what the business does, and makes it easy for users to go to the linked page. Essentially, it’s just an ad for a business.
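
    A minimal sketch of scanning a guest post’s outbound links for a nofollow (or sponsored) attribute, using only the Python standard library; the HTML snippet is made up:

```python
# Minimal sketch: flag outbound links in guest-post HTML that lack
# rel="nofollow" or rel="sponsored".
from html.parser import HTMLParser

class LinkChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.missing_nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        rel_values = (attrs.get("rel") or "").lower().split()
        if href.startswith("http") and "nofollow" not in rel_values and "sponsored" not in rel_values:
            self.missing_nofollow.append(href)

guest_post_html = """
<p>Read more on <a href="https://example-business.com/">our site</a> and our
<a href="https://example-business.com/pricing" rel="nofollow">pricing page</a>.</p>
"""  # made-up example

checker = LinkChecker()
checker.feed(guest_post_html)
print("Links without nofollow/sponsored:", checker.missing_nofollow)
```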

Product price and ranking

Q. From a web search point of view, the price of a product doesn’t play a role in ranking

  • (32:25) Purely from a web search point of view, the price of a product doesn’t make any difference in terms of ranking – it’s not the case that Google recognises the price on a page and makes the cheaper product rank higher. However, John points out, a lot of these products end up in the product search results, which could be because a feed was submitted or because the product information on the page was recognised. There the price might be taken into account and influence the order in which products appear, but John is not sure. So from a web search point of view the price of a product doesn’t matter; from a product search point of view, it’s possible. The tricky part is that these different aspects of search are often combined on one search results page, so there may be product results on the side, or price may have an effect in some other way.
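
    One common way the product information on a page (including price) gets recognised is structured data. Below is a minimal sketch of emitting schema.org Product markup as JSON-LD; the product details are made up:

```python
# Minimal sketch: emit schema.org Product markup as JSON-LD so the price can
# be picked up for product results. All product details are placeholders.
import json

product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Widget",
    "offers": {
        "@type": "Offer",
        "price": "19.99",
        "priceCurrency": "EUR",
        "availability": "https://schema.org/InStock",
    },
}

print(f'<script type="application/ld+json">{json.dumps(product, indent=2)}</script>')
```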

Sitemap files and URLs

Q. Generally, it’s better to keep the same URLs in the same sitemap files, but doing otherwise is not really problematic

  • (34:04) John says that, as a general rule of thumb, it’s better to keep the same URLs in the same sitemap files. The main reason is that Google processes sitemap files at different rates. So if a URL is moved from one sitemap file to another, Google might have the same URL in its system from multiple sitemap files, and if there is different information for that URL – different change dates, for example – Google wouldn’t know which attribute to actually use. From that point of view, keeping the same URLs in the same sitemap files makes it a lot easier for Google to understand and trust that information. John advises avoiding shuffling URLs around randomly, but at the same time doing so usually doesn’t break processing of a sitemap file and doesn’t have a ranking effect on a website. There’s nothing in Google’s sitemap system that maps to the quality of a website.
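
    A minimal sketch of one way to keep URLs in stable sitemap files – assigning each URL to a file deterministically by hashing it – with made-up URLs and file names:

```python
# Minimal sketch: hash each URL to pick its sitemap file, so the same URL
# always lands in the same file instead of being shuffled between generations.
import hashlib

NUM_SITEMAPS = 4  # placeholder shard count

def sitemap_file_for(url: str) -> str:
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return f"sitemap-{int(digest, 16) % NUM_SITEMAPS}.xml"

for url in [
    "https://example.com/products/a",
    "https://example.com/products/b",
    "https://example.com/blog/post-1",
]:
    print(url, "->", sitemap_file_for(url))
```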

SEO for beginners

Q. There isn’t one ultimate SEO checklist for beginners, but there are lots of useful sources

  • (35:41) John recommends looking at different SEO starter guides, as there is no official SEO checklist. He suggests looking at the starter guide by Google. There are also starter guides available from various SEO tools, which for the most part contain correct information. John says it seems much less common these days for people to publish something wrong, especially on the beginner side of SEO. He suggests focusing on the aspects that actually play a role for one’s own website.
    The tricky part is that all of these starter guides, at least the ones he has seen, are often based on an almost old-school model of websites where HTML pages were created by hand. Usually when small businesses go online, they don’t create HTML pages anymore – they use WordPress or Wix or some other common hosting platform. They create pages by putting text in, dragging images in, and so on, and don’t realise that in the back there’s actually an HTML page. So sometimes starter guides can feel very technical and not really map to what is actually being done when these web pages are created. For example, when it comes to title elements, people don’t look at the HTML and tweak that; rather, they find the relevant field in whatever hosting system they have and think about what to put there. So the guides might seem very technical, but in practice it’s more about filling in the fields and making sure the links are there – that’s something to keep in mind about SEO guides.

Multi-regional websites

Q. When creating a multi-regional website, it’s advised to choose one version of a page as canonical 

  • (38:13) When creating a website for different countries, there is the aspect of geo-targeting, which makes things fairly straightforward. But for versions of a website within the same country – specifically a multi-regional website – the issue of duplicate content becomes more important. The tricky aspect of websites like this is that a multi-regional website ends up competing with itself. For example, if one news article gets published across five or six different regional websites, then all of these regional websites try to rank for exactly the same article, which could result in the article not ranking as well as it otherwise could. John recommends trying to define canonical URLs for these individual articles (sketched below), so that out of the copies on the regional websites there is one preferred version. Then Google can concentrate all of its efforts and signals on that one preferred version and rank it a little bit better. The canonical doesn’t have to point to the same regional site every time – one article might have its canonical version on one regional site, while a different article is canonical on another regional site.
    As for the categories, sections and home pages, the content there tends to be more unique and more specific to the individual region. Because of that, John recommends keeping those separate at the index level, so that they can all be indexed individually. This works across different domain names as well: if there are different domains for the individual regions but they are all part of the same group, shifting the canonical across the different versions can still be done. If it’s done within the same domain with subdirectories, that’s fine too.
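
    A minimal sketch of the per-article canonical choice described above – picking one regional copy of each article and emitting the corresponding link element for every copy – where the regions, URLs and selection rule are all made up:

```python
# Minimal sketch: choose one regional copy per article as the canonical and
# print the <link rel="canonical"> that each regional copy should carry.
article_copies = {
    "flood-warning": {
        "north": "https://north.example.com/news/flood-warning",
        "south": "https://south.example.com/news/flood-warning",
        "west": "https://west.example.com/news/flood-warning",
    },
}
# The canonical doesn't have to sit on the same regional site every time;
# here we assume it is the region where the story originated.
originating_region = {"flood-warning": "north"}

for article, copies in article_copies.items():
    canonical = copies[originating_region[article]]
    for region, url in copies.items():
        print(f'{url}: <link rel="canonical" href="{canonical}">')
```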

301 Redirects

Q. Redirecting all pages at once during a site move is the easiest approach

  • (44:34) John says there isn’t a sandbox effect when a website redirects all of its URLs, at least from his point of view. So he suggests that redirecting all of a website’s pages at once is the easiest approach when making a site move. Google is also tuned to this a little and tries to recognise the process: when it sees that a website starts redirecting all of its pages to a different website, it tries to reprocess that a bit faster so that the site move is handled as quickly as possible. It’s definitely not the case that Google slows things down when it notices a site move – quite the opposite.
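
    A minimal sketch of redirecting every path at once to the same path on a new domain, using only the Python standard library; the domains are placeholders, and real setups would normally do this in the web server or CDN configuration:

```python
# Minimal sketch: a tiny server that 301-redirects every request to the same
# path on the new domain, i.e. moving the whole site at once.
from http.server import BaseHTTPRequestHandler, HTTPServer

NEW_DOMAIN = "https://new.example.com"  # placeholder

class SiteMoveRedirect(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(301)  # permanent redirect
        self.send_header("Location", NEW_DOMAIN + self.path)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), SiteMoveRedirect).serve_forever()
```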

APIs and crawling

Q. Whether an API affects crawling depends on how the API is embedded in the page

  • (46:13) John notes two things about an API’s influence on page crawling. On the one hand, if the API endpoints are requested when a page is rendered, they are included in crawling and count towards the crawl budget, essentially because those URLs need to be crawled to render the page. They can be blocked by robots.txt if it’s preferred that they aren’t crawled or used during rendering, which makes sense if the API is costly to maintain or takes a lot of resources (a quick way to check whether an endpoint is blocked is sketched below). The tricky thing is that if crawling of the API endpoint is disallowed, Google won’t be able to use any data the API returns for indexing. So if the page’s content comes purely from the API and API crawling is disallowed, Google won’t have that content. If the API does something supplementary to the page – for example, drawing a map, or a graphic of a numeric table that is already on the page – then maybe it doesn’t matter if that content isn’t included in indexing.
    The other thing is that it’s sometimes non-trivial how a page behaves when the API is blocked. In particular, if JavaScript is used and the API calls are blocked by robots.txt, that exception needs to be handled somehow. Depending on how the JavaScript is embedded in the page and what is done with the API, it’s important to make sure the page still works. If the API call fails and the rest of the page’s rendering breaks completely, Google can’t index much, as there’s nothing left to render. However, if the API breaks and the rest of the page can still be indexed, that might be perfectly fine.
    It’s trickier if the API is run for other people, because if crawling it is disallowed, there is a second-order effect: someone else’s website might depend on this API, and depending on what the API does, that website might suddenly not have indexable content.
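
    A minimal sketch of checking whether an API endpoint is disallowed for Googlebot, using the standard library robots.txt parser; the robots.txt location and API URL are made up:

```python
# Minimal sketch: check whether an API endpoint used during rendering is
# blocked by robots.txt for Googlebot.
from urllib import robotparser

parser = robotparser.RobotFileParser("https://example.com/robots.txt")
parser.read()  # fetches and parses the live robots.txt

api_url = "https://example.com/api/products?id=123"
if parser.can_fetch("Googlebot", api_url):
    print("Googlebot may fetch the API, so its responses can be used when rendering the page")
else:
    print("The API is disallowed; make sure the page still renders something useful without it")
```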

Google Search Console

Q. If a website loses its verification and later gets verified again, the data from the period when the site was unverified is not processed in Google Search Console

  • (56:12) When a website loses its verification, Google Search Console stops processing the data and starts processing it again when the site is verified again. Whereas if a website was never verified at all, Google tries to recreate all of the old data. So if someone needs to regenerate the missing data, one thing to try is verifying a subsection of the website: if there is a subdirectory or a subdomain, or instead of doing the domain verification, John recommends trying the specific hostname verification and seeing whether that triggers regenerating the rest of the data. He points out, though, that there’s no guarantee this will work.
