
WebMaster Hangout – Live from October 22, 2021

Core Web Vitals

Q. The weight of Core Web Vitals doesn’t change depending on what kind of website is being assessed.

  • (00:50) Google doesn’t evaluate what kind of website it’s assessing and decide that some Core Web Vitals metrics are more important in a particular case. In some search results the competition is strong and everyone performs similarly well, so it might look as though some metric carries more weight, but that is not actually the case.

Reviews from applications

Q. Google doesn’t pick up reviews left on Android and iOS applications

  • (05:44) John says that, at least for web search, Google doesn’t take Android and iOS application reviews into account. Google doesn’t have a notion of a quality score when it comes to web search. Indirectly, these reviews might be picked up and indexed if they are published somewhere on the web, but if they’re only in an app store, Google probably doesn’t even see them, either for web search or for other kinds of search.

Crawl Request

Q. The number of crawl requests depends on two things: crawl demand and crawl capacity

  • (07:29) When it comes to the number of requests that Google makes on a website, it has two things to balance: crawl demand and crawl capacity. Crawl demand is how much Google wants to crawl from a website. For a typical website, crawl demand usually stays pretty stable. It can go up if Google sees there is a lot of new content, or it can go down if there is very little content, but these changes happen slowly over time.
    Crawl capacity is how much Google thinks the server can support from crawling without causing any problems, and that is something that is evaluated on a daily basis. So Google reacts quickly if it thinks there is a critical problem on the website. Among critical problems are having lots of server errors, Google not being able to access the website properly, the server speed going down significantly (not the time to render a page, but the time to access HTML files directly) – those are the three aspects that play into that. For example, if the speed goes down significantly and Google decides that it’s from crawling too much, crawl capacity will scale back fairly quickly.
    Also, 5xx errors are considered more problematic than 404 errors, as the latter basically mean the content doesn’t exist – if a page disappears, that doesn’t cause problems.
    Once these problems are addressed, the crawl rate usually goes back to what it was step by step within a couple of days.

Search Console Parameter Tool

Q. The parameter tool acts differently from robots.txt

  • (15:32) The parameter tool is used as a way of recognising pages that shouldn’t be indexed and making better canonical choices. If Google has never seen a page listed in the tool before, nothing gets indexed; if it has seen the page before and there was a rel canonical on it, the tool helps Google understand that the website owner doesn’t want it indexed. So Google doesn’t index it and follows the rel canonical instead.

Random increase in keyword impressions

Q. Random keyword impression increases in Search Console can be caused by bots and scrapers

  • (18:42) Google tries to filter and block bots and scrapers at various levels in the search results, and it can certainly happen that some of these get through into Search Console as well.
    It’s a strange situation: if someone runs these scrapers to see what their position or ranking would be on these pages, they’re getting some metrics, but they’re also skewing other metrics, and that is discouraged by Google’s terms of service. It’s better to ignore these kinds of things when they happen, because it’s not something that you can filter out in Search Console or manually do anything about.

Internal Linking

Q. Internal linking is about giving a relative importance to certain pages on a website

  • (20:37) Internal linking can be used to spread the value of external links pointing to a page to other pages on the website, but only in a relative sense: Google understands that you think these pages are important and takes that feedback on board. For example, if all the external links go to the homepage, and that’s where all of the signals get collected, and the homepage has absolutely no links, then Google can focus purely on the homepage. As soon as the homepage has other links as well, Google in a way distributes that value across all of those links. Depending on how the internal linking is set up, certain places within the website are, relatively speaking, more important, and that can have an impact on rankings – at the very least it tells Google what’s important to you. It’s not a one-to-one mapping of internal linking to ranking, but it does give a sense of relative importance within the website. From that point of view it makes sense to link to important and new things on the website – Google will pick them up a little faster and might give them a little more weight in the search results. It doesn’t mean they will automatically rank better; it just means that Google will recognise their importance to you and try to treat them appropriately.

Website Speed and Core Web Vitals

Q. It takes about a month for the Core Web Vitals to catch up with changes in website speed

  • (26:28) For the Core Web Vitals, Google takes into account data that is delayed by 28 days or so. That means that if significant speed changes are made on the website that affect the Core Web Vitals – and, accordingly, the page experience ranking factor – it should take about a month for them to become visible in the search results. So if changes in search happen the next day, they wouldn’t be related to speed changes made the previous day.

Nested Pages for FAQ

Q. FAQ doesn’t have to be nested as long as the script is included in the page header and the data can be pulled out

  • (28:35) FAQ structured data doesn’t necessarily have to be nested. If there’s an FAQ on the page, John suggests using the appropriate structured data testing tools to make sure the data can be pulled out. Testing tools essentially do what Google would do for indexing and tell the website owner whether everything is fine.
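For reference, a minimal FAQPage structured-data sketch in schema.org JSON-LD format (the question and answer text are placeholders); it goes inside a `<script type="application/ld+json">` element, typically in the page head:

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How long does delivery take?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Orders usually arrive within 3-5 business days."
      }
    }
  ]
}
```

Running the page through the Rich Results Test confirms whether Google can pull the questions and answers out.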

Delayed loading of non-critical JavaScript elements

Q. It’s perfectly fine to delay loading of non-critical JavaScript until the first user interaction

  • (30:17) If someone lazy loads the functionality that takes effect when a user starts to interact with the page, and not the actual content, John says that’s perfectly fine. It’s similar to what is called “hydration” on JavaScript-based sites, where the content is loaded as a static HTML page and the JavaScript functionality is added on top of that.
    From Google’s point of view, if the content is visible for indexing then it can be taken into account, and Googlebot will use it. It’s not the case that Googlebot will go off and click on different things; it essentially just needs the content to be there. The one place where clicking on different things might come into play is links on a page: if those links are not loaded as link elements, Google won’t be able to recognise them as links.
    John refers to an earlier question about lazy loading of images on a page. If the images are not loaded as image elements, Google doesn’t recognise them as image elements for image search. For that, it’s good to have a backup in the form of structured data or an image sitemap file. That way, Google understands that even if those images are currently not loaded on the page, they should be associated with that page.
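As a sketch of that backup, an image sitemap entry can tie a lazy-loaded image to its page using Google’s image sitemap extension (the URLs are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://example.com/product-page</loc>
    <image:image>
      <image:loc>https://example.com/images/product-photo.jpg</image:loc>
    </image:image>
  </url>
</urlset>
```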

Out of stock products

Q. There are different ways to handle temporarily out of stock products from an SEO point of view: structured data, internal linking, Merchant Center

  • (33:38) There can be situations where some or many products are out of stock on the website, and the situation needs handling on the SEO side. For those situations, John suggests, it’s best to keep the URL online for things that are temporarily out of stock, in the sense that the URL remains indexable and structured data indicates that the product is currently not available. In that case, Google can at least keep the URL in the index and keep refreshing it regularly to pick up the change in availability as quickly as possible. However, if the website owner decides to noindex these kinds of pages, or to just remove the internal linking to them, then when that state changes back, Google should try to pick that up fairly quickly as well. Google will try to understand these state changes through things like sitemaps and internal links. So especially if the product is added back and suddenly has internal links again, that helps Google pick it up. This process can be sped up a little by setting internal links deliberately. For example, these products can be linked from the homepage, as Google views internal links from the homepage as a little more important. It’s a good idea, when adding the products back, to add a link from the homepage saying that these things are in stock again.
    Another thing that can be done for out of stock products is pairing the website’s SEO with product search: if a Merchant Center feed is submitted, those products can be shown within the product search sidebar. Google then doesn’t necessarily have to recrawl the individual pages to recognise that the products are back in stock – it can recognise that from the submitted feed.
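The availability markup mentioned above can be sketched with schema.org Product/Offer JSON-LD (the product details are placeholders); switching `availability` back to `https://schema.org/InStock` signals the state change:

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Widget",
  "offers": {
    "@type": "Offer",
    "price": "49.99",
    "priceCurrency": "EUR",
    "availability": "https://schema.org/OutOfStock"
  }
}
```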

Security Vulnerabilities

Q. Security vulnerabilities that can be found by using Lighthouse, for example, don’t affect SEO directly

  • (37:28)John says that security vulnerabilities are not something that Google would flag as an SEO issue. But if these are real vulnerabilities on scripts that are being used and that means that the website ends up getting hacked, then the hacked state of the website would be a problem for SEO. But just the possibility that it might be hacked is not an issue with regard to SEO.

Authorship and E-A-T

Q. E-A-T mostly matters for medical and finance-related websites, not more generic content

  • (38:48) E-A-T, which stands for Expertise, Authoritativeness, Trustworthiness, basically applies to sites that are really critical – essentially websites where medical or financial information is given. In those cases it’s always better to make sure that an article is written by someone who’s trustworthy or has authority on the topic. When it comes to something more general, like theatre or SEO news or anything random on the web, the trustworthiness of the author is not necessarily a big issue. For an ordinary business, it might even be better to say there’s no individual author and that a piece of content is written by the website.
    The one place where the author name does come into play is some types of structured data that have information for the author. In that case it might be something that is shown in the rich results on a page, so from that point of view it’s better to make sure there’s a reasonable name there.

Impressions and Infinite Scroll

Q. Impression works the usual way with infinite scroll, the difference being that some websites will probably get a little bit more impressions

  • (45:51) From Google’s side, even with infinite scroll, it’s still loading the search results in groups of 10, and as a user scrolls down, it loads the next set of 10 results. When that set of 10 results is loaded, that counts as an impression. That basically means that when a user scrolls down and starts seeing page two of the search results, Google sees it as page two, and the page gets impressions similar to if someone had clicked through to page two directly. From that point of view not much changes. John suggests that what will change is that users will probably scroll a little more easily to pages two, three or four, and based on that, the number of impressions a website gets in the search results will probably go up a little. He also suggests that the click-through rate will look a little weird: it will probably go down slightly, and that may be due to the number of impressions going up rather than something being done wrong on the website.

Average Response Time

Q. Average response time can affect crawling

  • (52:26) There is no fixed number for the average response time, but John recommends 200 milliseconds at most. It affects how quickly Google can crawl the website. If Google wants to crawl 100 URLs from the website, and it thinks it can make five connections in parallel, then based on the response time those 100 URLs will be spread out, and Google won’t be able to crawl as much per day. That’s the primary effect of average response time on crawling.
    Average response time is about the HTTP requests that Google sends to the website’s server. So if a page has CSS and images and things like that, the overall loading time goes into the Core Web Vitals, but the individual HTTP requests go into the crawl rate. The crawl rate doesn’t affect rankings – it’s purely about how much Google can crawl from a technical point of view.
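As a rough back-of-the-envelope sketch of why response time caps crawling (the numbers are illustrative and this is not Google’s actual scheduling logic):

```python
def max_crawls_per_day(response_time_ms: float, parallel_connections: int) -> int:
    """Theoretical upper bound on fetches per day.

    Assumes each connection fetches one URL at a time and every
    response takes the same fixed time -- purely illustrative.
    """
    seconds_per_day = 24 * 60 * 60
    fetches_per_connection = seconds_per_day / (response_time_ms / 1000)
    return int(fetches_per_connection * parallel_connections)

print(max_crawls_per_day(200, 5))   # 2160000 -- at the recommended 200 ms
print(max_crawls_per_day(1000, 5))  # 432000 -- a 1 s response cuts this five-fold
```

Real crawling is far more conservative, but the proportionality is the point: doubling the response time roughly halves what can be fetched in a day.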

FAQ not showing in the search results

Q. FAQ not showing in the results might be due to its quality or technical issues, and there is a way to check that

  • (52:54) The person asking the question is concerned that after their customer redesigned their website, all of the FAQ schema stopped being displayed in Google search results. John says there are two things that might have happened. The first is that the website might have been re-evaluated in terms of quality at about the same time the changes were made. If that coincidence took place, then Google is probably no longer convinced about the quality of the website, in which case it wouldn’t show any rich results, and that includes FAQs. One way to double-check is to do a site: query for the individual pages and see if rich results show up. If they do, they’re technically recognised by Google but Google doesn’t want to show them – a hint that quality needs to improve. If they don’t show up, something technical is still broken.

Sign up for our Webmaster Hangouts today!

GET IN CONTACT TODAY AND LET OUR TEAM OF ECOMMERCE SPECIALISTS SET YOU ON THE ROAD TO ACHIEVING ELITE DIGITAL EXPERIENCES AND GROWTH 

WebMaster Hangout – Live from October 08, 2021

More indexed pages – higher website quality?

Q. A website having a higher number of indexed pages doesn’t affect its authority

  • (03:52) John says that it’s not the case that if a website has more pages indexed, Google thinks it’s better than other websites with fewer indexed pages. The number of indexed pages is not a sign of quality.

Error page redirects during crawling

Q. Sometimes there can be issues with rendering a page that lead crawling to run into error pages

  • (06:05) When there is a problem with rendering a website’s pages, it might cause crawling to reach error pages. When those pages are tested in Search Console, it might be the case that 9 times out of 10 it works well, and 1 time out of 10 it doesn’t and redirects to an error page. There might be too many requests to render the page, or something complicated with the JavaScript that sometimes takes too long and sometimes works well. It could even be the case that the page is not found when traffic is high, while everything works well when traffic is down. John explains that what basically happens is that Google crawls the HTML page and then tries to process it in a Chrome-type browser, pulling in all of the resources that are mentioned there. In the developer console in Chrome, in the network section, there is a waterfall diagram of everything that is loaded to render the page. If there are lots of things to load, it can happen that things time out and crawling runs into the error situation. As a possible solution, John suggests getting the developer team to combine the different JavaScript files, combine CSS files, minimise the images, and so on.

Pages for different search intents

Q. Website pages for different search intents don’t really define the purpose of a website as a whole

  • (10:00) Google doesn’t really have rules on how a website would be perceived as a whole, depending on whether it has more informational or transactional or some other types of pages. John says that it’s more of a page-level thing. A lot of websites have a mix of different kinds of content, and Google tries to figure out which of these pages match the searcher’s intent and tries to rank those appropriately. He thinks it’s a page-level thing rather than something on a website level. For example, adding lots of informational pages on a website that sells products, doesn’t dilute the product pages.

Redirecting old pages to the parent category page

Q. Redirects from old pages to parent category pages will be treated as soft 404s

  • (13:17) The person asking the question has a situation where people are linking to his website’s pages, but the pages sometimes change or get deleted – the content comes and goes. The question is: if, for example, a subcategory is linked in a backlink and that subcategory gets deleted, is it okay to temporarily redirect to the parent category? John says that if Google sees this happening at a larger scale – redirects to the parent level – it will probably treat them as soft 404s and decide that the old page is gone. Redirects might be better for users, but Google will essentially see a 404 either way, so there is little SEO difference. Redirect or no redirect – there’s no penalty.
    As for 301 versus 302, John says there is no difference either, as Google will see it either as a 404 or as a canonicalisation question. If it’s a canonicalisation question, then it comes down to which URL Google shows in the search results. Usually the higher-level URL will have stronger signals anyway, and Google will focus on it, so it doesn’t matter whether it’s a 301 or a 302.

Q. If a page that’s linked to gets deleted and then comes back, it doesn’t change much in terms of crawling

  • (16:04) If a page that is linked through a backlink gets deleted and then comes back, John says there is minimal difference in terms of recrawling it. One thing to know is that crawling of that page will slow down: if the page is seen as a 404, there is nothing there, and if there is a redirect, the focus will be on the primary URL, not this one. Crawling stays slow until Google gets new signals telling it there is something new again – internal linking or a sitemap file are strong indications of a need for crawling.

References

Q. Nothing changes from linking to a source in content – it’s purely a usability thing

  • (23:25) John says that while referencing the original source when making a quote makes sense in terms of website usability, it doesn’t really change anything SEO-wise. It used to be a spammy technique: people would create a low-quality page, link to CNN, Google and Wikipedia at the bottom, and hope that Google would think the page was good because it referenced CNN.

Guest posts

Q. Guest posts are a good way to raise awareness about your business

  • (27:54) Google’s guidance for links in guest posts is that they should be nofollow. Writing guest posts to drive awareness of a business is perfectly fine; the important thing, John says, is to keep the links nofollow, so that the post drives awareness, talks about what the business does and makes it easy for users to go to the linked page. Essentially, it’s just an ad for a business.
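In HTML terms, that guidance comes down to the link’s `rel` attribute (the URL is a placeholder; `rel="sponsored"` is Google’s other qualifier for paid placements):

```html
<a href="https://example.com/" rel="nofollow">Example Business</a>
```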

Product price and ranking

Q. From a web search point of view the price of a product doesn’t play a role in ranking

  • (32:25) Purely from a web search point of view, the price of a product doesn’t make any difference in terms of ranking – it’s not the case that Google recognises a price on a page and makes the cheaper product rank higher. However, John points out, a lot of these products end up in the product search results, either because a feed was submitted or because the product information on the page was recognised. There the price of a product might be taken into account and influence the order in which products appear, though John is not sure. So from a web search point of view the price of a product doesn’t matter; from a product search point of view, it’s possible. The tricky part is that in search these different aspects are often combined on one results page – there may be product results on the side, for example – so the price could have an effect that way.

Sitemap files and URLs

Q. Generally, it’s better to keep the same URLs in the same sitemap files, but doing otherwise is not really problematic

  • (34:04) John says that, as a general rule of thumb, it’s better to keep the same URLs in the same sitemap files. The main reason is that Google processes sitemap files at different rates. If a URL is moved from one sitemap file to another, Google might have the same URL in its system from multiple sitemap files, and if there is different information for that URL – different change dates, for example – Google wouldn’t know which attribute to actually use. From that point of view, keeping the same URLs in the same sitemap files makes it a lot easier for Google to understand and trust that information. John advises against shuffling URLs around randomly. At the same time, doing so usually doesn’t break the processing of a sitemap file, and it doesn’t have a ranking effect on a website – there’s nothing in Google’s sitemap system that maps to the quality of a website.
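The conflict described here can be pictured with a standard sitemap entry: if the same `<loc>` appears in two files with different `<lastmod>` dates, Google has no way to know which date to trust (URL and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/category/page</loc>
    <lastmod>2021-10-08</lastmod>
  </url>
</urlset>
```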

SEO for beginners

Q. There isn’t a one ultimate SEO checklist for beginners, but there are lots of useful sources

  • (35:41) John recommends looking at the different SEO starter guides, as there are no official SEO checklists. He suggests starting with the starter guide from Google. There are also starter guides available from various SEO tools that, for the most part, contain correct information. John says it seems much rarer these days for people to publish something wrong, especially on the beginner side of SEO. He suggests focusing on the aspects that actually play a role for one’s website.
    The tricky part is that all of these starter guides, at least the ones he has seen, are often based on an almost old school model of websites where HTML pages were created. And usually when small businesses go online, they don’t create HTML pages anymore – they use WordPress or Wix or any other common hosting platform. They create pages by putting text in, dragging images in and those kinds of things. They don’t realise that in the back, there’s actually an HTML page. So sometimes starter guides can feel very technical and not really map to what is actually being done when these web pages are created. For example, when it’s about title elements, people don’t look at HTML and try to tweak that, but rather they try to find the field in whatever hosting system that they have, and think about what they need to put there. So the guides might seem very technical, but now it’s actually more about filling in the fields and making sure the links are there, and that’s something to keep in mind about the SEO guides.

Multi-regional websites

Q. When creating a multi-regional website, it’s advised to choose one version of a page as canonical 

  • (38:13) When creating a website for different countries, there is the aspect of geo-targeting, which makes everything pretty straightforward. But for versions of a website within the same country – specifically a multi-regional website – the issue of duplicate content becomes more important. The tricky aspect of websites like this is that a multi-regional website competes with itself. For example, if one news article gets published across five or six different regional websites, then all of those regional websites try to rank for exactly the same article. That can result in the article not ranking as well as it otherwise could. John recommends trying to find canonical URLs for these individual articles, so that there is a preferred version of an article that appears on five regional websites. Then Google can concentrate all of its efforts and signals on that one preferred version and rank it a little better. It doesn’t have to be the same version every time – one news article might have the version from one region as canonical, while a different article has another region’s version as canonical.
    As for the categories, sections and home pages, the content there tends to be more unique and more specific to the individual region. Because of that, John recommends keeping those separate at the index level, so that they can all be indexed individually. This works across different domain names as well: if there are different domains for individual regions, but they’re all part of the same group, canonicals can still be shifted across the different versions. If it’s done within the same domain with subdirectories, that’s fine too.
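Sketching the canonical setup described above, each regional copy of the article would point at the preferred version from its head section (the domains and path are placeholders):

```html
<!-- On region-b.example/news/article-slug: -->
<link rel="canonical" href="https://region-a.example/news/article-slug" />
```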

301 Redirects

Q. Redirecting all pages at once during a site move is the easiest approach

  • (44:34) John says that there isn’t a sandbox effect when a website redirects all of its URLs, at least from his point of view. So he suggests that redirecting all of a website’s pages at once is the easiest approach when making a site move. Google is also tuned to this a little and tries to recognise the process: when it sees that a website starts redirecting all pages to a different website, it tries to reprocess that a bit faster, so the site move can be processed as quickly as possible. It’s definitely not the case that Google slows things down if it notices a site move – quite the opposite.

APIs and crawling

Q. Whether an API affects crawling depends on how the API is embedded on the page

  • (46:13) John notes two things about an API’s influence on page crawling. On the one hand, if API calls are included when a page is rendered, they are included in crawling and count towards the crawl budget, essentially because those URLs need to be crawled to render the page. They can be blocked by robots.txt if it’s preferred that they’re not crawled or used during rendering – which makes sense if the API is costly to maintain or takes a lot of resources. The tricky thing is that if crawling of the API endpoint is disallowed, Google won’t be able to use any data the API returns for indexing. So if the page’s content comes purely from the API, and crawling the API is disallowed, Google won’t have that content. If the API does something supplementary to the page – for example, draws a map or a graphic of a numeric table that is already on the page – then maybe it doesn’t matter that this content isn’t included in indexing.
    The other thing is that it’s sometimes non-trivial how a page functions when the API is blocked. In particular, if JavaScript is used and the API calls are blocked because of robots.txt, that exception needs to be handled somehow. Depending on how the JavaScript is embedded on the page and what is done with the API, it’s important to make sure the page still works. If a failing API call breaks the rest of the page’s rendering completely, Google can’t index much, as there’s nothing left to render. However, if the API breaks and the rest of the page can still be indexed, that might be perfectly fine.
    It’s trickier if the API is run for other people, because if crawling it is disallowed, there is a second-order effect: someone else’s website might depend on this API, and depending on what the API does, that website might suddenly not have indexable content.
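A minimal robots.txt sketch of that kind of blocking (the /api/ path is a placeholder), with the trade-off that any page content rendered from those calls then can’t be indexed:

```txt
# Keep Googlebot away from API endpoints used during rendering
User-agent: Googlebot
Disallow: /api/
```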

Google Search Console

Q. If a website loses its verification and later gets verified again, the data from the unverified period is not processed in Google Search Console

  • (56:12) When a website loses its verification, Google Search Console stops processing the data, and starts processing it again when the site is verified again. Whereas if a website was never verified before, Google tries to recreate all of the old data when it is first verified. So if someone needs to regenerate the missing data, one thing to try is verifying a subsection of the website: a subdirectory or a subdomain, or, instead of domain verification, a specific hostname verification, to see if that triggers regenerating the rest of the data. But, John points out, there’s no guarantee that will work.


WebMaster Hangout – Live from October 01, 2021

Internal Relative URLs pointing to absolute canonical URLs

Q. Is it fine for Google when internal relative URLs point to absolute canonical URLs?

  • (0:33) Relative URLs are perfectly fine, as is a mix of relative and absolute URLs on the site. Pay attention to the relative URLs to ensure there are no mistakes in the path.
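One quick way to sanity-check how a relative URL resolves against the page it appears on is Python’s standard-library `urljoin` (the URLs are made up):

```python
from urllib.parse import urljoin

base = "https://example.com/blog/post-1/"

# Relative to the current directory of the page:
print(urljoin(base, "images/photo.jpg"))   # https://example.com/blog/post-1/images/photo.jpg

# A leading slash makes it root-relative -- a common source of path mistakes:
print(urljoin(base, "/images/photo.jpg"))  # https://example.com/images/photo.jpg
```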

Other questions from the floor:

MUM and BERT

Q. A couple of years ago Google noted that, when it came to ranking results, BERT would better understand and impact about ten percent of searches in the US. Has that percentage changed for BERT, and what percentage of searches is MUM expected to better understand and impact?

  • (02:15) John is pretty sure that the percentage has changed since then, because everything changes, but he’s not sure there is a fixed number for BERT or for MUM. MUM is more of a multi-purpose machine-learning library anyway, so it’s something that can be applied in lots of different parts of search. It’s not something that’s isolated to just ranking; rather, it might be used for understanding things at a very fine-grained level, interwoven into many different kinds of search results, but he doesn’t think there are any fixed numbers. In his opinion, it’s always tricky to look at the marketing around machine-learning algorithms, because it’s very easy to find exceptional examples, but that doesn’t mean everything is as flashy. When he was talking with some of the search quality folks, they were really happy with the way these kinds of machine-learning models are working.

Pages blocked by robots.txt

Q. The question is about the “indexed, though blocked by robots.txt” message in Search Console. For a certain page type, the person asking has about 20,000 valid indexed pages, which looks about right. However, he’s alarmed that there’s a warning for about 40,000 pages saying “indexed, though blocked by robots.txt”. When he inspected these, it turned out they are auxiliary pages linked from the main page. The URLs are either not accessible to users who are not logged in, or they’re just thin informational pages, so he did indeed add a disallow for them in robots.txt. He guesses Google must have hit them a while back, before he got the chance to disallow them. What’s alarming is that Search Console says they’re “indexed”. Does this actually mean they are, and, more importantly, do they count towards his site’s quality? When he pulls them up using a site: query, about one in five shows up.

  • (04:22) John says this is mostly a warning for situations where site owners were not aware of what’s happening. If they’re certain these pages should be blocked by robots.txt, then that’s pretty much fine; but if they weren’t aware the pages were blocked and actually wanted them indexed and ranking well, then that’s something they could take action on. So he wouldn’t see this as alarming: it’s mostly just that Google found these pages and they’re blocked by robots.txt. If you care about these pages, you might want to take some action; if you don’t, just leave them the way they are. It’s not going to count against your website. It can happen that they appear in the search results, but usually only for very artificial queries like a site: query.
    The main reason it’s mostly those kinds of queries is that, for all of the normal queries going to your pages, you almost certainly have reasonable content that is actually indexable and crawlable and that will rank instead. If Google has the choice between a robotted page whose contents it doesn’t know and a reasonable page on your website that has those keywords on it, it will just show your normal pages. So it’s probably extremely rare that these pages would show up in search; the message is more a warning in case you weren’t aware of it.
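As a rough illustration, a disallow rule like the one described might look like this in robots.txt (the paths are hypothetical placeholders, not from the question):

```txt
# robots.txt
# Note: Disallow blocks crawling, not indexing. URLs Google discovered
# before the rule was added (or that are linked from elsewhere) can
# still show as "Indexed, though blocked by robots.txt".
User-agent: *
Disallow: /members/
Disallow: /info/
```

If the goal were to remove such pages from the index entirely, they would need to stay crawlable and carry a noindex instead, since Google can’t see a noindex on a page it isn’t allowed to fetch.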

Indexing pages

Q. Consider a site with 100 million pages, of which maybe 10 million are believed to be low quality, and maybe 5 million of those are actually indexed. Suppose a noindex tag is added to those 10 million pages and the number of internal links they receive is reduced. Over a hypothetical four months, Google crawls and removes two million from the index while the other three million remain. A few months down the road, it’s determined that a few hundred thousand are actually of decent quality, so the noindex tag is removed and meaningful internal links are added back. Would the history of being indexed, noindexed, and then indexed again be detrimental? Would Google be reluctant to crawl these URLs again, knowing they previously weren’t indexed, and would there be any issues getting them indexed again after they’re crawled?

  • (10:24) John thinks there definitely wouldn’t be any long-term effect from that. What you might see initially is that if Google has seen a noindex for a longer period of time, it might not crawl that URL as frequently as otherwise. But if this is a case of “we added a noindex, it dropped out, then we changed our mind and removed it again”, that’s not a long-term noindex situation; Google will probably still crawl that URL at the normal crawl rate. When you’re talking about millions of pages, Google has to spread that crawling out anyway, so you might see it trying to crawl those pages individually every couple of weeks or every month. John doesn’t see that crawl rate dropping off to completely zero; maybe it goes from one month to two months if Google sees a noindex for a longer period of time, but not to zero. So adding a noindex and then changing your mind later on is usually perfectly fine, especially if you’re working on internal linking at the same time: if you add more internal links to those pages, Google will crawl them a little bit more, because it sees they’re freshly linked from places within your website and there may be something it actually needs to index. He adds that on a website with millions of pages, any optimisation at this level is so small that you can’t actually measure it.
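The noindex discussed here is typically a robots meta tag; a minimal sketch:

```html
<!-- Placed in the <head> of each low-quality page. Removing this tag
     later (and linking the page internally again) lets Google index
     the page again on a subsequent crawl. -->
<meta name="robots" content="noindex">
```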

Q. The person asking found in his research that his client was cloaking internal links. He checked the Wayback Machine and found they made a certain template change involving footer links, and those footer links have been there since January. When he checked their Google Search Console, he didn’t see a penalty. He wonders how long it takes before a penalty arrives, as he wants to advise his client; they’ve been doing this for approximately nine months.

  • (17:03) John doesn’t see the webspam team taking action on that, because internal linking like that has quite a subtle effect: you’re essentially just shuffling things around within your own website. It would be trickier if they were buying links somewhere else and then hiding them; that would be problematic, and it might be something the algorithms pick up on, or that the webspam team might at some point look at manually. But it’s different when it’s within the same website, even if the links are set to display: none.
    John doesn’t think it’s a great practice. If you think it’s an important link, make it visible to people. But it’s not going to be something where the webspam team removes the site or does anything drastic.

Question-oriented title tags

Q. A person asked about the effect of question-oriented titles (what, how, which, who) in the content, in terms of compatibility with a semantic search engine.

  • (28:44) John doesn’t know exactly what direction the person is headed with this question, so it’s hard to answer generally. He would recommend focusing on keyword research to understand what people are actually searching for. He always finds it a little bit risky to try to match those queries one-to-one, because queries can change fairly quickly, so that’s not something he would focus on too much.

How does Google handle interactive content

Q. A person asks how Google evaluates interactive content, like a quiz or questionnaire that helps users figure out which product they need. Are rankings still based on the static content that appears on the page?

  • (29:57) Yes, ranking is essentially based on the static content on these pages. So if you have something like a full page of questions, Google ranks that page based on those questions. If products are only findable after going through those questions, you should also make sure you have internal linking to those products that doesn’t require going through the questionnaire – something like a normal category setup on the website alongside the product wizard that helps users make decisions. John thinks that’s really important. The questionnaire pages can still be useful in search: if you recognise that people search in a way that shows they’re not sure which particular product matches their needs, and you address that in the text on the questionnaire, then those questionnaire pages can also appear in search. But it’s not the case that Googlebot goes in, fills out a questionnaire and sees what happens.
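A sketch of the kind of plain, crawlable category links described here, so products remain reachable without completing the questionnaire (the paths and names are hypothetical):

```html
<!-- Normal category navigation alongside the product wizard:
     Googlebot follows these links instead of filling out the quiz. -->
<nav>
  <a href="/products/model-a">Model A</a>
  <a href="/products/model-b">Model B</a>
  <a href="/products/model-c">Model C</a>
</nav>
```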

Q. A person asks: if they give a do-follow link to a trusted, authoritative site, is that good for SEO?

  • (31:28) John thinks this is something people used to do way back in the beginning: they would create a spammy website with links to Wikipedia and CNN at the bottom, hoping search engines would look at that and decide it must be a legitimate website. It was almost a traditional spam technique, and John doesn’t know if it ever actually worked. So he would say no, this doesn’t make any sense on its own. Obviously, if you have good content on your website and part of it references other existing content, that whole structure makes a bit more sense and suggests the website overall is a good thing. But just having a link to some authoritative page doesn’t change anything from Google’s point of view.

Relevant keyword research on Taiwanese culture

Q. A person is going to do basic research on what foreigners google the most about Taiwanese culture and which keywords related to Taiwan are most relevant on Google Search. He asks whether he could generate a ranking list of that information, so that he could design a campaign for certain products.

  • (32:32) John doesn’t have that information, so he can’t provide it. Essentially, what the person is looking for is keyword research: there is lots of content written on how to do keyword research, there are some tools from Google that can help figure it out, and there are a lot of third-party tools as well. John has no insight into everything involved there, so he can’t really help with that, and he definitely can’t provide a list of the queries that people in Taiwan do.

Two languages on the one landing page

Q. A person had two languages, Hindi and English, on the same page and was ranking well on Google, but after the December core update he lost rankings, mostly for Hindi keywords. He asks what he should do to get them back.

  • (33:31) John doesn’t know. On the one hand, he doesn’t recommend having multiple languages per page, because it makes it really hard for Google to understand which language is the primary language of the page. A configuration with Hindi on one side and English on the other side of a single page can be problematic on its own, so John would avoid that setup and instead make pages that are clearly in Hindi and clearly in English. With separate pages like that, it’s a lot easier for Google to say “someone is searching in Hindi for this keyword, here’s a page on that specific topic”. If Google can’t recognise the language properly, it might decide it has an English page while the user is searching in Hindi, so it probably shouldn’t show it; and if Google isn’t sure about the language of a page at all, that’s also tricky, especially when there are comparable pages out there that are clearly in Hindi. The other thing is that, with regard to core updates, Google has a lot of blog posts about them, and John would go through those as well. If you see a change happening together with a core update, it might be due to the two languages on the page, but it’s more likely due to the general core update changes. So John would look at those blog posts and think about what you might want to do to make sure your site is still very relevant to users.

Doorway page creation

Q. A person asks if it’s okay from an SEO perspective to create doorway pages when they actually help users – for example, a page that leads users who searched for the non-scientific name of a cactus to the original page.

  • (35:34) John doesn’t know about this specific situation. Usually Google would call things doorway pages if they essentially lead to the same funnel afterwards: you take a lot of different keywords and guide people to exactly the same content in the end. In the case of something like an encyclopedia, it isn’t the same content – there are essentially very unique pieces of content – and just because it covers a lot of keywords doesn’t necessarily mean it’s a doorway page. So without digging into this specific site in detail, his guess is that it would not be considered a doorway page. A doorway page might be something like this: you have a cactus page on your website, and for “cactuses in” all the cities nearby you make individual city pages, where all of the traffic is essentially funnelled in the same direction on your website. That would be considered doorway pages: you’re creating all of these small doorways, but they all lead to the same house.

Classified websites

Q. A question related to classified websites that show ad listings on their internal search results pages. The person allows those pages to be crawled and indexed. If he has no ad listings for some time, should he disallow indexing, or should he let Google decide? And if the search results have no ad listings, would excluding those pages from the sitemap also be good practice?

  • (37:14) For the sake of clarity, John thinks the search results this person means are the search results within their own website: when someone searches for a specific kind of content, the website pulls together all the ads it knows about. It’s those search results, not Google search results. Essentially, the question is what to do with empty internal search results pages, and Google’s preference is to be able to recognise them as empty, which could be done by just adding noindex to those pages. That’s the ideal situation, because what Google wants to avoid is having a page like that in its index: if someone is searching for a blue car of a specific model and make, and your page essentially says you don’t know of anyone selling that kind of car, then sending people to your website for that query would be a really bad user experience. So Google tries to recognise those pages, either treating them as soft 404s when it recognises they’re empty search results pages, or through a noindex that tells it the page is empty. If you can recognise it ahead of time, John would generally prefer a noindex directly from your side; if you can’t, then using JavaScript to add a noindex might be an option. With regard to the sitemap: the sitemap file only helps Google with additional crawling within a website, it doesn’t prevent Google from crawling pages. Removing these pages from the sitemap file would not result in Google dropping them from search, and would not result in Google recognising that they don’t have any content, so removing something from a sitemap file doesn’t negatively affect the normal crawling and indexing of individual pages. Those are the two aspects: if you can recognise an empty search results page, put a noindex on it; removing it from a sitemap file is not going to remove it from the index.
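The two options mentioned here might be sketched like this (the `.listing` selector is a hypothetical placeholder):

```html
<!-- Option 1: server-side, when the result count is known to be zero -->
<meta name="robots" content="noindex">

<!-- Option 2: client-side, when emptiness is only known after rendering.
     Google's renderer can pick up a robots meta tag injected by JavaScript,
     provided the page itself is not blocked by robots.txt. -->
<script>
  if (document.querySelectorAll('.listing').length === 0) {
    var meta = document.createElement('meta');
    meta.name = 'robots';
    meta.content = 'noindex';
    document.head.appendChild(meta);
  }
</script>
```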

Not updating data in Google Search Console

Q. The person’s question is that Google is not indexing websites, even fresh sites, and is also not updating data in Google Search Console. Is there any hidden update going on?

  • (42:35) There are always updates going on, so that’s hard to say, but John doesn’t think there’s anything explicitly hidden. What he does sometimes see is that, because it’s so much easier to create websites nowadays, people create websites with a large number of pages, focus on the technical aspect of getting millions of pages up, and disregard a lot of the quality aspects. Because Search Console tries to provide more and more information about the indexing process, it’s a lot easier to recognise that Google is not actually indexing everything from a website, and the assumption is often that this is a technical issue to tweak. Usually it’s more of a quality issue: when Google looks at the website overall and isn’t convinced about its quality, its systems won’t invest in more crawling and indexing. If you give Google a million pages and the pages it ends up showing initially don’t convince it, it won’t spend time getting all of those millions of pages indexed; it will hold off and keep a small subset. If, over time, that subset does really well and shows all the signs Google looks at with regard to quality, it will go off and try to crawl more. But just because there are a lot of pages on a website does not mean Google will crawl and index a lot of pages from it.

Password protection and Google penalties

Q. A person created a small website for his mom’s business using Squarespace, a CMS that automatically submits a sitemap once you create a new page. About two weeks ago they decided to add e-commerce functionality, and during that time the site was password protected. His first question is whether Google penalises you in some way if users can’t access a page because it’s password protected. Second, the site and its pages were indexed and shown really nicely before, but after adding all those products and pages he requested crawling in Google Search Console, and his mom is now giving him a hard time about when these pages are going to be shown again.

  • (48:09) John thinks there’s no penalty for having password protection, but it means Google can’t access the content behind the password, so that is probably what initially happened. With regard to turning on an e-commerce or shop section on a website, Google now has a whole article on that in its search documentation, specifically for e-commerce sites, so John would take a look at that – there might be some tricks that were missed that can help speed things up.

The performance measurement of Google Discover

Q. The only way to measure Google Discover performance is in Search Console

  • (50:57) John thinks the only way to measure it is in Search Console, because in Analytics in particular, traffic from Discover is almost always folded into Google Search and can’t be separated out. Only in Search Console do you see the bigger picture.

Internal search pages

Q. A person had a question about internal search pages. His site allows indexation of on-site searches: when someone does a search on the site, a page is created for it, and that has gone a bit out of control, so he now has hundreds of millions of these pages. How would you recommend sorting that out, and are there actually any benefits to cleaning it up, or shouldn’t he worry about it?

  • (52:34) John thinks for the most part it does make sense to clean that up, because it makes crawling a lot harder. The direction he would take is to think about which pages you actually want crawled and indexed, and to help Google’s systems focus on those. It’s not that you should get rid of all internal search pages – some of them might be perfectly fine to show in search – but really try to avoid the situation where anyone can create a million new pages on your website just by linking to random URLs or words that appear on your pages. Take control of which pages you want crawled and indexed, rather than leaving it to whatever randomly happens on the internet.
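One common way to stop the unbounded growth described here is to keep internal search URLs out of crawling entirely; a sketch, assuming the search pages live under a hypothetical `/search` path:

```txt
# robots.txt — stops Googlebot from crawling internal search results,
# so random external links can't generate millions of crawlable pages.
User-agent: *
Disallow: /search
```

The subset of search pages genuinely worth showing could be left crawlable, or rebuilt as curated category pages, rather than blanket-blocked.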

Sign up for our Webmaster Hangouts today!

GET IN CONTACT TODAY AND LET OUR TEAM OF ECOMMERCE SPECIALISTS SET YOU ON THE ROAD TO ACHIEVING ELITE DIGITAL EXPERIENCES AND GROWTH 

WebMaster Hangout – Live from September 24, 2021

Blog Pages are Ranking, while Product Pages are not

Q. If on a website certain types of pages rank well, while others don’t, it might be about the user query that lands them on the website

  • (03:18) The person asking the question describes a situation where the blog posts on his website get attention, while the product pages don’t really rank. John explains that for some types of queries Google tries to understand the intent behind the query – whether someone is looking for information about a product or looking to buy one. For this particular website, it might be that Google is interpreting the queries landing on those pages as information-seeking rather than transactional. Since that is Google’s assumption about what the user is asking for, it’s not easy to change. What John suggests is making it as easy as possible for people to get to the products, with a clear call to action in the blog post. If people landing on the blog pages don’t want to buy products but are searching for information, that’s really out of SEO’s control.
    John advises improving the website’s visibility by making the overall quality good and encouraging people to buy the products, review them and recommend the website to others. Over time, this converts into better visibility, and an easy-to-follow call to action from the blog pages to the product pages will in turn convert that visibility into valuable results.

Building Visibility

Q. It’s more reasonable to build visibility by first trying to rank for more specific unique keywords rather than general ones

  • (10:34) Even with time and continued effort, it might be too hard to rank for general keywords like “green tea” and get visibility from them. John suggests finding more specific queries and more specific kinds of products where there is less competition – a specific type of green tea or a special leaf type, something unique where you can stand out. Those kinds of queries don’t get traffic comparable to the general keywords, but usually they get enough for a website that is just starting out. From there, you can take on the next specific keyword and keep expanding.

Reviews

Q. Reviews are an indirect factor of website assessment

  • (13:54) Reviews gathered over time on the products sold on the website are not a direct signal that Google takes into account, but seeing people engage with the products is a good sign, and it means that other signals might build up over time.

Page Experience Update

Q. If there is a drop in website traffic right after the Page Experience Update rollout, it might not be due to the update

  • (14:39) The Page Experience Update started rolling out in July and finished at the end of August, and it was applied on a per-page basis. That means that if Google saw a website was slow on Core Web Vitals, there would be a gradual change in traffic over time. So if there was some kind of drastic change, whether gains or losses, around those dates, it probably means something else is causing it, not the update.

More Pages or Fewer Pages?

Q. It’s better to create fewer but stronger pages for the areas where there is more competition, and vice versa

  • (16:54) Having pages on a website is all about balancing more general and more specific pages. John says that when there are fewer, more general pages, those pages tend to be a little bit stronger, whereas if there are a lot of pages, the value is in a way spread out across them. For a specific topic where the competition is stronger, it’s better to have fewer but very strong pages; if the targeted area doesn’t have high competition, having more pages is fine. So when starting out, it’s generally wiser to have fewer, very strong pages so the website can be as strong as possible in that area, and over time, as the website consolidates itself there, those pages can be split off into more niche topics.

Internal Linking

Q. The way to tell Google which pages are more of a priority is internal linking

  • (18:48) There isn’t really a way to give a priority to a certain page over the others but this can be helped by internal linking. So within the website it is possible to highlight certain pages by making sure they’re well linked internally, and maybe it’s also a good idea to have non-priority pages a little bit less well linked internally. John suggests linking to important pages from the homepage, and to the less important ones from category and subcategory pages. Google looks at the website, and it knows the homepage is very important, and the pages the homepage points to are also important. Google doesn’t always follow that, but it’s a way to give that kind of information.

Canonical URL

Q. Setting up canonical URLs for internal linking is not necessary

  • (20:39) Canonical URLs are important when multiple URLs show the same content – for example, when there are tracking URLs within a blog – because the canonical URL helps Google understand which is the primary page. For a normal website with just links to different things, there is no critical need for canonical URLs: it’s good practice to have them, but there are no direct SEO benefits.
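A minimal sketch of the tracking-URL case mentioned here (the URLs are hypothetical): both variants declare the same primary page.

```html
<!-- In the <head> of both /blog/post and /blog/post?utm_source=newsletter,
     so Google consolidates the duplicates onto the primary URL. -->
<link rel="canonical" href="https://www.example.com/blog/post">
```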

New Language Version of Website

Q. If there is a new language version of a website, it’s good to add a JavaScript-based banner to direct users to the right version of the website

  • (21:26) John suggests adding a JavaScript-based banner to pages that have another language version, to try to recognise when a user is on the wrong version of the page, for example by the browser language or, if possible, the user’s location. The banner at the top should say that there is a better version for this user, with a link to the right version. Using a banner like that means Google can still index all of these pages, while users are guided to the appropriate version a little bit faster. If this were done server-side, for example with a redirect, the problem could be that Googlebot never sees the other version because it always gets redirected. The banner is like a backup plan: usually hreflang and geotargeting will help, but they don’t guarantee that only the right users reach these pages, so the banner helps users find the right pages.
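A sketch of that setup, assuming hypothetical English and Hindi versions of the same page: hreflang annotations plus a JavaScript banner rather than a redirect, so Googlebot can still fetch both versions.

```html
<!-- hreflang: tells Google which language version to show to which users -->
<link rel="alternate" hreflang="en" href="https://www.example.com/en/page" />
<link rel="alternate" hreflang="hi" href="https://www.example.com/hi/page" />

<script>
  // Backup banner: suggest (not force) the Hindi version to Hindi-language
  // browsers. No redirect, so crawlers still see this version's content.
  if (navigator.language.indexOf('hi') === 0 &&
      window.location.pathname.indexOf('/hi/') !== 0) {
    // showBanner() is a hypothetical helper that renders a dismissible
    // "View this page in Hindi" link at the top of the page.
    showBanner('/hi/page');
  }
</script>
```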

Wrong Publish Date in the Search Results

Q. There are 2 important things about publish date: date alignment across the page and time zones.

  • (27:47) When it comes to dates in the search results, Google tries to find dates that align best across all of the signals it gets. It looks at things like structured data and page text to work out what the date might be. If Google can’t recognise that the same date and time are used across multiple locations, it tries to figure out which one might be the most relevant. When, instead of a date, the visible part of the article says things like “10 minutes ago” or “5 hours ago”, Google can’t match that, because it doesn’t know exactly what is meant. Making the date and time visible in the article, and matching them with the structured data, is a good way of getting Google to use them. Watching out for things like time zones is also important, as that is one of the things that usually goes wrong.
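A sketch of keeping the visible date and the structured data aligned, including an explicit time-zone offset (the values are hypothetical):

```html
<!-- Visible byline: an absolute date, not "5 hours ago" -->
<p>Published October 22, 2021, 09:00 (UTC+8)</p>

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "headline": "Example article",
  "datePublished": "2021-10-22T09:00:00+08:00",
  "dateModified": "2021-10-22T11:30:00+08:00"
}
</script>
```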

Spam Traffic

Q. Google has a fairly good understanding of spam traffic and it doesn’t end up causing problems for websites

  • (30:02) Google sees lots of weird spam traffic on the web over time and has a good understanding of it. There are certain things Google watches out for, and it filters out the usual spam traffic, so it shouldn’t cause any problems for websites.

E-A-T

Q. E-A-T is not determined by some specific technical factors on a website

  • (33:47) E-A-T stands for Expertise, Authoritativeness and Trustworthiness, and it comes from Google’s Quality Rater Guidelines. The Quality Rater Guidelines are not a handbook to Google’s algorithms, but rather something Google gives to the people who review the changes it makes to its algorithms. E-A-T is specific to certain kinds of sites and certain kinds of content – there is no E-A-T score based on a specific number of links or anything like that. It’s more about Google improving its algorithms and Quality Raters reviewing those improvements; there are no specific technical elements involved that act as an SEO factor. John suggests looking into E-A-T if the website maps into the broad areas where Google has mentioned E-A-T in the Quality Rater Guidelines.

Recognising Sarcasm

Q. Google is not adept at recognising sarcasm, so it’s better to make important messages very clear 

  • (36:40) There is always a risk that Google misunderstands things, and it doesn’t understand when there’s sarcasm on a page. Especially when it’s really critical to get the right message across to Google and to all users, making sure the message is as clear as possible is important. It’s generally better to avoid sarcasm when, for example, talking about medical information; when writing about an entertainment topic, it’s probably less of an issue.

Captcha

Q. If content is visible without the need to fill out the captcha, Google is okay with that, otherwise it might be a problem

  • (41:57) Googlebot doesn’t fill out any captchas, even Google-based ones. If the captcha needs to be completed before the content is visible, Google won’t have access to the content. But if the content is available without needing to do anything and the captcha is just shown on top, that is usually fine for Google to crawl and index. To test this, John suggests using the URL Inspection tool in Search Console and fetching those pages to see what comes back: on the one hand the rendered page, to make sure it matches the visible content, and on the other the rendered HTML, to make sure it includes the content that should be indexed. He restates that, from a policy point of view, serving the full content while requiring the captcha on the user side is fine – doing things slightly differently for Googlebot or other search engines compared to an average user in that way is acceptable.

One Author Writing for Different Websites

Q. There are no guidelines on one person writing content for different websites, but it’s better to help Google recognise it’s the same person

  • (43:41) From Google’s point of view, there are no guidelines on where people can write and what kind of content they can create; people creating content on multiple sites is perfectly fine. From a practical point of view, it helps if the author creates something like a profile page that collects all the information about the things they do. Pointing to an author page or profile page like that is a good way to make sure search engines understand this is the same person writing for different sites. It’s not something that must be done according to any policy or guideline, but it’s good practice.
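One way to point to such a profile page is structured data on each article; a sketch with hypothetical names and URLs:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Example article",
  "author": {
    "@type": "Person",
    "name": "Jane Doe",
    "url": "https://www.example.com/authors/jane-doe"
  }
}
</script>
```

The `url` here would point to the author’s profile page that collects their work across sites.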

Recrawling Thin Content

Q. Updating and expanding existing content might take longer to recrawl and re-index, and trying to push that by submitting it manually might not be the best strategy

  • (46:10) When publishing thin content, it sometimes takes a little longer for Google to recrawl the page and pick up the new version of the content, so John suggests trying to avoid doing that on a regular basis. Sometimes updating an article is necessary, and as it expands over time, Google tries to pick that up over time.
    The person asking the question is primarily worried about Google not recrawling and re-indexing his recent article updates. John says the issue might actually be that he manually submits the links of every article after publishing. That makes Google a little bit nervous and pickier about the content of the website: usually, if there’s fantastic content on a website, Google crawls it regularly, so there’s no need to submit everything manually.

Content Author’s Qualifications

Q. The author’s qualifications are not a direct factor in terms of SEO

  • (51:06) The authority of the person writing the content (citations in journals, etc.) doesn’t really play a role as a ranking factor, John says. However, he points out that being associated with a strong author might come into play in the bigger picture of the website – it’s a long-term thing rather than a direct SEO factor.

Discovered, not Indexed

Q. If a website often runs into the problem of Googlebot discovering new content but not indexing it, the problem might be the overall quality of the website

  • (54:54) Sometimes Google might not be really sure about the quality of a website, so when new things are published there, Google understands that new content exists and discovers it, but ends up not indexing it. The main way to solve this is to increase the overall quality of the website – sometimes that means removing some old things and making sure that everything being published is fantastic. If the systems are convinced of the quality of the website, they will crawl and index more and try to pick things up as quickly as possible; if they are not 100 percent sure, it works sometimes and sometimes it doesn’t.

Sign up for our Webmaster Hangouts today!

GET IN CONTACT TODAY AND LET OUR TEAM OF ECOMMERCE SPECIALISTS SET YOU ON THE ROAD TO ACHIEVING ELITE DIGITAL EXPERIENCES AND GROWTH 

WebMaster Hangout – Live from September 17, 2021

Quality of Old Pages

Q. When assessing new pages on a website, the quality of older pages matters too

  • (15:38) When Google tries to understand the quality of a website overall, it’s a process that takes a lot of time. If 5 new pages are added to a website that already has 10,000 pages, the focus will be on most of the site first. Over time Google will see how that settles down with the new content as well, but the quality of most of the website matters in the context of new pages.

Q. Links from high-value pages are treated with a little more weight than links from some random page on the Internet

  • (17:02) When assessing the value of a link from another website, Google doesn’t look at the referral traffic or at the propensity to click on that link. John explains that Google doesn’t expect people to click on every link on a page – people don’t follow a link just because some expert pointed at a website so they can confirm whether what is written about it is true; they follow a link when they want to find out more about the product. Therefore, referral traffic and the propensity to click are not taken into account when evaluating a link.
    When evaluating a link, Google looks at page-level factors and the quality of the linking website. It works almost the same way PageRank works – a certain value is assigned to an individual page, and a fraction of that value is passed on through its links. So if a page is of high value, links from that page are treated with more weight than links from random pages on the Internet. It’s not exactly the way it was in the beginning, but it’s a good way of thinking about these things.
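The PageRank mechanics John alludes to – each page holding a value and passing a fraction of it through its links – can be illustrated in a few lines. This is a simplified teaching sketch, not how Google ranks pages today:

```python
# Simplified PageRank sketch: each page's value is split among the pages it
# links to on every iteration. Illustrative only, not Google's current system.
def pagerank(links, iterations=50, damping=0.85):
    """links: dict mapping page -> list of pages it links to."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                continue
            share = damping * rank[page] / len(outlinks)  # value split across links
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank

# "a" is linked from a high-value page ("home"), "c" receives no links at all.
graph = {"home": ["a", "b"], "a": ["home"], "b": ["home"], "c": ["a"]}
ranks = pagerank(graph)
print(ranks["a"] > ranks["c"])
```

The point of the sketch: a link from a page that itself accumulates a lot of value passes on more weight than a link from an obscure page.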

Other questions from the floor:

Category Page Dropping in the Rankings for One Keyword

Q. If a page tries to cover more than one search intent, it might lead to the page dropping in rankings

  • (00:34) The person asking the question describes a category page on his website, carrying a huge block of advisory text, that is dropping in the rankings for a particular keyword in its plural form. John says that this category page has so much textual information that it’s hard to tell whether the page is for people who want to buy the products or for those who want more information about them – whether it’s for someone looking for one product or for an overview of the whole space. He therefore suggests splitting it into two pages with different purposes instead of one page that tries to cover everything.
    As for the singular and plural forms of the keyword, he thinks that someone searching for the plural probably wants to see different kinds of products, while someone searching for the singular might want a product landing page. Over time the systems try to figure out the intent behind these queries, and also how that intent changes – for example, “Christmas tree” might be an informational query that becomes more transactional around December. If a category page covers both, that’s good in the sense that both sides are covered, but Google might see only the transactional side and ignore the informational one. So putting the two intents on separate pages is a reasonable thing to do. There are different ways to do that: some people have completely separate pages, others write informational blog posts from which they link to category pages or individual products.

Q. Even though this drop might happen only to certain keywords, Google doesn’t have a list of keywords for these things

  • (04:52) The person asking the question wonders why this happens only to some keywords, and whether there is a list of specific keywords it happens to. John says it’s doubtful Google would manually create such a list – it’s something the systems try to pick up automatically, and they might pick it up one way for certain pages and differently for others, and change over time.

Content Word Limit

Q. There is no required word count for the content on a category page

  • (09:05) There needs to be some information about the products on the page so that Google understands what the topic is, but generally very little is needed. In many cases Google understands the topic from the products listed on the page, if the product names are clear enough – for example, “running shoes”, “Nike running shoes” and running shoes by other brands make it obvious the page is about running shoes, so there is no need for extra text. But sometimes product names are a little hard to understand, and in that situation it makes sense to add some text – John suggests around 2–3 sentences.

Q. The same chunks of text can be used in category pages and blog posts

  • (10:19) Having a small amount of text duplicated is not a problem. For example, using a few sentences of text from a blog post in category pages is fine.

Merging Websites

Q. There is no fixed timeline for when the pages of merged websites are crawled and the results become visible

  • (10:56) Pages across a website get crawled at different speeds: some pages are recrawled every day, some once a month or once every few months. If the content is on a page that rarely gets recrawled, it’s going to take a long time for changes to be reflected, whereas if the content is crawled very actively, the changes should be visible within a week or so.

Index Coverage Report

Q. After merging websites, if traffic is going to the right pages and the shift is visible in the Performance report, there is no need to watch the Index Coverage report

  • (13:25) The person asking the question is concerned that after merging two websites, they’re not seeing any change in the Index Coverage report. John says that when pages are merged, Google’s systems first need to find a new canonical for each page, and it takes a little time for that to be reflected in the reporting. When everything is simply moved, it’s just a transfer and there is no canonicalisation to figure out; a merge takes more time.

301 Redirect, Validation Error

Q. The Change of Address tool is not necessary for migration; checking the redirects is a higher priority

  • (20:53) John says that although some people use the Change of Address tool when migrating a website, it’s just an extra signal and not a requirement – as long as the redirects are set up properly, Change of Address doesn’t really matter. If there is something like a 301 redirect error, the redirects need to be re-checked, but it’s hard to tell what the issue might be without looking at it case by case. For example, if there is a www version and a non-www version of the website, the redirects should be checked step by step: the site might be redirecting to the non-www version first and then redirecting to the new domain, while the Change of Address was submitted for the version of the site that is not the primary one. Whether the version submitted in Search Console is the version that is (or was) actually indexed, or an alternate version, is one of the things to double-check.
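One common cleanup for the chain described above is collapsing both hops into a single 301. A minimal Apache sketch, assuming hypothetical domains `old-example.com` and `new-example.com`:

```apache
# Hypothetical .htaccess on the old site: send both www and non-www
# straight to the new domain in one 301 hop, preserving the path.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www\.)?old-example\.com$ [NC]
RewriteRule ^(.*)$ https://new-example.com/$1 [R=301,L]
```

With a single hop, the question of which host name variant was submitted in Search Console becomes much easier to reason about.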

Several Schemas on a Page

Q. Any number of structured data types can be placed on a page, but note that only one kind will usually be shown as a rich result

  • (23:36) There can be any number of schema types on one page, but John points out that in most cases, when it comes to the rich results Google shows in search, only one kind of structured data will be picked. If a page has multiple types of structured data, there is a very high chance Google will pick just one of those types to show in rich results. So if one particular type needs to be featured, and there is no combined use of types in the search results, it’s better to focus on the one type of structured data that should be shown.
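Multiple types can legitimately coexist on one page. A hypothetical sketch carrying both `Product` and `FAQPage` markup, of which Google would typically surface only one as a rich result:

```html
<!-- Hypothetical page with two structured data types in one JSON-LD block -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Product",
      "name": "Example running shoe",
      "offers": { "@type": "Offer", "price": "79.99", "priceCurrency": "EUR" }
    },
    {
      "@type": "FAQPage",
      "mainEntity": [{
        "@type": "Question",
        "name": "Do these shoes run true to size?",
        "acceptedAnswer": { "@type": "Answer", "text": "Yes, order your usual size." }
      }]
    }
  ]
}
</script>
```

If the Product rich result is the one that matters, dropping the secondary type removes the ambiguity about which one Google picks.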

Random 404 Pages

Q. Random 404 URLs on a website don’t really affect anything

  • (24:39) The person asking the question is concerned about a steady increase in 404 pages on his website that are not part of the website, making up over 40% of the crawl responses. John argues that there is nothing to worry about: these are probably random URLs found on some scraper site that is scraping things in a bad way – a very common thing. When Google tries to crawl them, they return 404, so the crawlers start to ignore them; the pages don’t really exist. When looking at a website, Google tries to understand which URLs it needs to crawl and how frequently. After working out what needs to be done, Google looks at what it can do additionally and starts trying to crawl an additional set of URLs, which sometimes includes URLs from scraper sites. So if these random URLs are being crawled, it means the most important URLs have already been crawled and there is time and capacity to crawl more. In a way, 404s are not an issue but a sign that there is enough capacity – if there were more content linked within the website, Google would probably crawl and index that too. It’s a good sign, and these 404 pages don’t need to be blocked by robots.txt or suppressed in any way.
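The 40% figure mentioned above comes from log analysis. A small sketch of how that share could be measured from server access logs – the log format and sample lines are hypothetical:

```python
# Sketch: what fraction of Googlebot requests in an access log returned 404?
# Assumes a common log format where the status code follows the quoted request.
import re

def googlebot_404_share(log_lines):
    """Return the fraction of Googlebot requests that got a 404 response."""
    total = hits_404 = 0
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        m = re.search(r'" (\d{3}) ', line)  # status code after the request field
        if not m:
            continue
        total += 1
        if m.group(1) == "404":
            hits_404 += 1
    return hits_404 / total if total else 0.0

sample = [
    '1.2.3.4 - - [01/Oct/2021] "GET /shoes HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '1.2.3.4 - - [01/Oct/2021] "GET /scraped-junk HTTP/1.1" 404 0 "-" "Googlebot/2.1"',
    '5.6.7.8 - - [01/Oct/2021] "GET /shoes HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
print(googlebot_404_share(sample))  # 0.5
```

As John notes, a high share here is not by itself a problem – what matters is whether the real pages are also being crawled.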

Blocking Traffic from Other Countries

Q. When blocking undesired traffic from other countries, don’t block the U.S., since websites are crawled from there

  • (27:34) The person asking the question is concerned that their Core Web Vitals scores are going down because the website is based in France and gets a lot of traffic from other countries with bad bandwidth. John advises against blocking traffic from other countries, especially the U.S.: Google crawls from the U.S., and if it is blocked, the website’s pages wouldn’t be crawled and indexed. So if the website owner does block other countries, he should at least keep the U.S.

Q. Blocking content for users while showing it to Googlebot is against the guidelines

  • (28:47) The guidelines are clear that a website should show crawlers what it shows users from that country. One way to handle undesired traffic from countries the website doesn’t serve, John says, is to use paywall structured data: after the content is marked up with the paywall markup, users who have access can log in and get the content, and the page can still be crawled.
    Another way, John suggests, is to provide some level of information in the U.S. For example, casino content is illegal in the U.S., so some websites serve a simplified version of the content there – more like descriptive information about the content. If, for instance, there are movies that can’t be provided in the U.S., the description of a movie can still be served there, even if the movie itself can’t.
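The paywall markup John mentions tells Google that gated content is intentionally restricted rather than cloaked. A minimal sketch, with a hypothetical CSS class name:

```html
<!-- Hypothetical paywalled-content markup: the section matched by
     ".paywalled-section" is declared as not freely accessible -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "headline": "Example gated article",
  "isAccessibleForFree": false,
  "hasPart": {
    "@type": "WebPageElement",
    "isAccessibleForFree": false,
    "cssSelector": ".paywalled-section"
  }
}
</script>
```

The `cssSelector` points at the gated section so Google can distinguish a legitimate paywall from serving crawlers different content than users.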

Page Grouping

Q. When it comes to grouping, there is no exact definition of how Google does it, because grouping evolves over time depending on the amount of data Google has for a website

  • (35:36) The first thing John highlights about grouping is that if Google has a lot of data for a lot of different pages on a website, it can make the grouping slightly more fine-grained rather than rough, while if there is not a lot of data, it might end up treating the whole website as one group.
    The second thing John points out is that the data collected is the field data the website owner sees in Search Console, which means it’s not so much Google taking the value of each individual page and averaging by the number of pages; rather, Google does something like a traffic-weighted average. Pages with more traffic contribute more data, and pages with less traffic contribute less. So if a lot of people go to the home page of the website and not so many to individual products, the home page may weigh a little higher simply because it has more data. Therefore, it’s more reasonable to look at Google Analytics (or any other analytics), figure out which pages get a lot of traffic, and improve the user experience on those pages – that is what will count towards Core Web Vitals. Essentially, it is less an average across the number of pages and more an average across the traffic of what people actually see when they come to the website.
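The difference between a plain per-page average and the traffic-weighted average described above can be shown with a few illustrative numbers (hypothetical pages and LCP values, not real CrUX data):

```python
# Plain per-page average vs traffic-weighted average of an LCP metric.
# Numbers are made up to illustrate why high-traffic pages dominate.
pages = [
    {"url": "/",          "traffic": 9000, "lcp": 3.5},  # slow but popular home page
    {"url": "/product-a", "traffic": 500,  "lcp": 2.0},
    {"url": "/product-b", "traffic": 500,  "lcp": 2.0},
]

plain_avg = sum(p["lcp"] for p in pages) / len(pages)
weighted_avg = (sum(p["lcp"] * p["traffic"] for p in pages)
                / sum(p["traffic"] for p in pages))

print(plain_avg)     # 2.5
print(weighted_avg)  # 3.35 – dominated by the heavily-visited home page
```

This is why fixing the few pages that carry most of the traffic moves the grouped score far more than fixing many rarely-visited ones.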

Subdomains and Folders

Q. Subdomains and subdirectories are almost equivalent in terms of content, but there are differences in other aspects

  • (45:25) According to the Search Quality Team, subdomains and subdirectories are essentially equivalent – content can be put either way, even though some people in SEO might think otherwise.
    John notes there are a few aspects where the choice does play a role, less with regards to SEO and more with regards to reporting – for example, whether the performance of these sections is tracked separately on separate host names or together on one host name. For some websites, Google might treat things on a subdomain slightly differently because it looks more like a separate website. John suggests that even though these aspects can come into play, it’s more important to focus on the website infrastructure first and see what makes sense for that particular case.

Gambling Website Promotion

Q. From the SEO side, there is no problem in publishing gambling related things

  • (48:59) John is not sure about the policies for publishers when it comes to gambling content, but he says that SEO-wise, people can publish whatever they want.

Removing Old Content

Q. Blindly removing old content from a website is not the best strategy

  • (49:42) John says he wouldn’t recommend removing old content from a website just for the sake of removing it – old content might still be useful. He doesn’t see value in removal from the SEO side of things, but he points out that archiving old pages for usability or maintenance reasons should be fine.

Duplicate Content

Q. Duplicate content is not penalised, with a few exceptions to this rule

  • (55:05) John says that the only time Google would apply something like a penalty – an algorithmic or manual action – is when the whole website is purely duplicated content, i.e. one website scraping another. If, for example, an ecommerce website uses the same product description as another site but the rest of the website is different, that’s perfectly fine – it doesn’t lead to any kind of demotion or drop in rankings.
    With duplicate content, there are two things Google looks at. The first is whether the whole page is the same – that includes everything: the header, the footer, the address of the store and so on. If it’s just a description on an ecommerce website matching the manufacturer’s description, but everything around it is different, it’s fine.
    The second thing comes into play when Google shows a snippet in the search results. Essentially, Google tries to avoid search result pages where one site’s snippet is exactly the same as another’s. So if someone searches for something generic that appears only in that product description, and the snippet Google would show for the website and for the manufacturer’s website would be identical, Google tries to pick just one page – the best out of those that share exactly the same description.
