Internal Relative URLs pointing to absolute canonical URLs
Q. When we have internal relative URLs pointing to the canonical absolute URLs, is that fine for Google?
- (0:33) Relative URLs are perfectly fine, as is a mix of relative and absolute URLs on the site. Check the relative URLs carefully to make sure there are no mistakes in the path.
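As a rough illustration of how a path mistake can creep into a relative URL, here is a minimal Python sketch using the standard library's `urljoin`. The page URL and the hrefs are hypothetical examples, not taken from the hangout; the point is only that a missing leading slash changes where a link resolves.

```python
from urllib.parse import urljoin

# Hypothetical page on which the links appear (illustration only).
PAGE_URL = "https://www.example.com/blog/post-1/"


def resolve(href: str, base: str = PAGE_URL) -> str:
    """Resolve an href against the page URL, following standard URL resolution rules."""
    return urljoin(base, href)


# A root-relative, a parent-relative, a directory-relative and an absolute href.
for href in ["/products/", "../about/", "post-2/", "https://www.example.com/contact/"]:
    print(f"{href!r} -> {resolve(href)}")
```

Note that `post-2/` resolves underneath `/blog/post-1/`, which is easy to do by accident when `/blog/post-2/` was intended.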
Other questions from the floor:
MUM and BERT
Q. So a couple of years ago Google noted that when it came to ranking results, BERT would better understand and impact about ten percent of searches in the US. So has that percentage changed for BERT and what percentage is MUM expected to better understand and impact searches?
- (02:15) John is fairly sure that the percentage has changed since then, because everything keeps changing, but he is not sure whether Google has a fixed number for BERT or for MUM. MUM is more like a multi-purpose machine-learning library, so it can be applied in lots of different parts of search. It is not isolated to ranking; it might be used to understand things at a very fine-grained level, and that understanding is then interwoven into many different kinds of search results. He doesn’t think there are any fixed numbers. In his opinion, it’s always tricky to look at the marketing around machine-learning algorithms, because it’s very easy to find exceptional examples, but that doesn’t mean everything is as flashy. From his conversations with the search quality folks, they are really happy with the way these machine-learning models are working.
Pages blocked by robots.txt
Q. A question about the Search Console message “Indexed, though blocked by robots.txt”. For a certain page type, the person asking has about 20,000 valid indexed pages, which looks about right. However, he is alarmed by a warning for about 40,000 pages marked as indexed though blocked by robots.txt. When he inspected them, they turned out to be auxiliary pages linked from the main page. These URLs are either inaccessible to users who are not logged in or are thin informational pages, so he did add a disallow for them in robots.txt. He guesses that Google must have found them a while back, before he had a chance to disallow them. What alarms him is that Search Console says they are “indexed”. Does this actually mean that they are and, more importantly, do they count toward his site’s quality? When he pulls them up using a site: query, about one in five show up.
- (04:22) John says this is mostly a warning for situations where the site owner was not aware of what’s happening. If they are certain that these pages should be blocked by robots.txt, that’s fine. But if they weren’t aware that the pages were blocked and actually wanted them indexed and ranking well, then that’s something they could act on. So he wouldn’t see this as alarming: Google found these pages, and they’re blocked by robots.txt. If you care about these pages, you might want to take some action; if you don’t, just leave them the way they are. It’s not going to count against your website. The pages can appear in the search results, but usually only for very artificial queries like a site: query.
The main reason it’s mostly those kinds of queries is that, for all the normal queries that lead to your pages, you almost certainly have reasonable content that is indexable and crawlable, and that will rank instead. If Google has the choice between a robots.txt-blocked page whose content it doesn’t know and a reasonable page on your website that contains those keywords, it will just show your normal pages. So it’s probably extremely rare that the blocked pages would show up in search; the message is more a warning in case the site owner wasn’t aware of the block.
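To double-check which URLs a disallow rule actually covers, a small script can help. The following is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt rules and the URLs are hypothetical, and Python's parser does not replicate every detail of how Google matches rules, so treat it as a sanity check rather than a definitive answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt blocking logged-in and thin informational sections.
ROBOTS_TXT = """\
User-agent: *
Disallow: /account/
Disallow: /info/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the parsed rules allow the user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


for url in [
    "https://www.example.com/products/widget",
    "https://www.example.com/account/settings",
    "https://www.example.com/info/shipping",
]:
    print(url, "->", "crawlable" if is_crawlable(url) else "blocked")
```

A blocked URL can still be listed as indexed if it was discovered through links, which is the situation the Search Console warning describes.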
Indexing pages
Q. Consider a site with 100 million pages, of which maybe 10 million are believed to be low quality, and maybe 5 million of those are actually indexed. Suppose we want to add a noindex tag to these 10 million pages and reduce the number of internal links they receive. Over a hypothetical four months, Google crawls and removes two million from the index, while the other three million remain. A few months down the road, we determine that a few hundred thousand are actually of decent quality and we want to reinstate them, so we remove the noindex tag and start adding meaningful internal links back. Would the history of being indexed, then noindexed, then put up for indexing again be detrimental? Would Google be reluctant to crawl these URLs again, knowing that they were previously not indexed, and would there be any issues getting them indexed again once they are crawled?
- (10:24) John thinks there definitely wouldn’t be any long-term effect from that. What you might see initially is that, if Google has seen a noindex for a longer period of time, it might not crawl that URL as frequently as it otherwise would. But if the sequence is that you added a noindex, the page dropped out, and then you changed your mind and removed it again, that is not a long-term noindex situation. Google will probably still crawl that URL at roughly the normal crawl rate, and with millions of pages the crawling has to be spread out anyway. Depending on the pages, you might see Google trying to crawl them individually every couple of weeks or every month. John doesn’t see that crawl rate dropping to zero; maybe it goes from one month to two months if Google sees a noindex for a longer period of time. So adding a noindex and changing your mind later is usually perfectly fine. This is especially the case if you’re working on internal linking at the same time: when you add more internal links to those pages, Google will crawl them a little more, because it sees that they are freshly linked from various places within your website, so maybe there is something it does need to index. He adds that for a website with millions of pages, any optimisation of this kind operates at such a small level that you can’t really measure it.
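When reinstating pages at this scale, it can be useful to verify that the noindex tag is really gone from the templates. Here is a minimal sketch, using only Python's standard `html.parser`, that checks a page's HTML for a noindex directive in a robots meta tag. The class and function names are hypothetical, and the sketch ignores the `X-Robots-Tag` HTTP header, which would need a separate check.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() in {"robots", "googlebot"}:
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, follow".
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def has_noindex(html: str) -> bool:
    """Return True if the HTML carries a noindex directive in a robots meta tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


print(has_noindex('<head><meta name="robots" content="noindex, follow"></head>'))
print(has_noindex('<head><meta name="robots" content="index, follow"></head>'))
```

Running this over a sample of the reinstated URLs is one way to confirm the template change took effect before waiting for a recrawl.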
Hiding internal links with CSS
Q. The person asking found in his research that his client was cloaking internal links. He then checked the Wayback Machine and found that they had made a template change involving footer links, and that these footer links were already there back in January. When he checked their Google Search Console, he didn’t see a penalty. He wonders how long a penalty takes, because he wants to advise his client before they get one; they have been doing this for approximately 9 months.
- (17:03) John doesn’t see the webspam team taking action on that. Internal linking like this has quite a subtle effect within the website; you’re essentially just shuffling things around within your own site. He thinks it would be trickier if they were buying links somewhere else and then hiding them. That would be problematic, and it might be something that Google’s algorithms pick up on or that the webspam team might manually look at at some point.
If the links are within the same website and simply set to display “none”, John doesn’t think it’s a great practice. If you think a link is important, make it visible to people. But it’s not going to be something where the webspam team takes action and removes the site or does anything crazy.
Question oriented title tags
Q. A person asked about the effect of question-oriented titles, meaning titles containing words such as what, how, which and who, in terms of being compatible with a semantic search engine.
- (28:44) John doesn’t know exactly what direction this question is headed in, so it’s hard to answer in general. He would recommend focusing on things like keyword research to understand what people are actually searching for. As for whether titles need to match those queries one-to-one, he always finds it a little risky to try, because queries can change fairly quickly, so he wouldn’t focus on that so much.
How does Google handle interactive content
Q. A person asks how Google evaluates interactive content, like a quiz or questionnaire that helps users figure out which product or thing they need. Would rankings still be based on the static content that appears on the page?
- (29:57) Yes. Ranking is essentially based on the static content on these pages. If you have a full page of questions, Google ranks that page based on those questions. That means that if products are only findable after going through those questions, you should also make sure you have internal links to those products that don’t require going through the questionnaire. In other words, have a normal category setup on the website as well as something like a product wizard to help users make those decisions. John thinks that’s really important. The questionnaire pages can still be useful in search: if you recognise that people are searching in a way that shows they’re not sure which product matches their needs, and you address that in the text on the questionnaire, then those questionnaires can appear in search as well. But Googlebot does not go in, fill out a questionnaire and see what happens.
Do-follow links to trusted authoritative sites
Q. A person asks if they give a do-follow link to a trusted authoritative site is that good for SEO?
- (31:28) John says this is something people used to do way back in the beginning: they would create a spammy website, put links to Wikipedia and CNN at the bottom, and hope that search engines would look at that and conclude it must be a legitimate website. It was almost a traditional spam technique, and John doesn’t know if it ever actually worked. So from that point of view, John would say no, this doesn’t make sense. Obviously, if you have good content within your website and part of it references existing content elsewhere, then that whole structure makes a bit more sense and means that your website overall is a good thing. But just having a link to some authoritative page doesn’t change anything from Google’s point of view.
Relevant keyword research on Taiwanese culture
Q. A person is going to do basic research into what foreigners Google the most about Taiwanese culture and which keywords are most relevant to Taiwan on Google Search. He asked whether he could get a ranking list of those queries, so that he could then design a campaign for certain products.
- (32:32) John doesn’t have that information, so he can’t provide it. What the person is looking for is essentially keyword research, and there’s lots of content written about how to do that. There are some tools from Google that can help, and a lot of third-party tools as well. John has no insight into everything that’s involved there, so he couldn’t really help, and he definitely can’t give out a list of the queries that people in Taiwan make.
Two languages on one landing page
Q. A person had two languages, Hindi and English, on the same page and was ranking well on Google, but after the December core update he lost rankings, mostly for Hindi keywords. He asks what he should do to get them back.
- (33:31) John doesn’t know. On the one hand, he doesn’t recommend having multiple languages per page, because it makes it really hard for Google to understand which language is the primary language of the page. A configuration with Hindi on one side and English on the other on a single page can be problematic on its own, so John would avoid that setup and instead make pages that are clearly in Hindi and pages that are clearly in English. With separate pages, it’s a lot easier for Google to say “someone is searching in Hindi for this keyword, here’s a page on that specific topic”. If Google can’t recognise the language properly, it might conclude that it has an English page while the user is searching in Hindi, so it probably shouldn’t show that page. Being unsure about the language of a page is especially tricky when there are other comparable pages out there that are clearly in Hindi. On the other hand, with regard to core updates, Google has a lot of blog posts about them, and John would go through those as well. If you’re seeing this kind of change together with a core update, it might be due to the two languages on the page, but it’s more likely due to the general core update changes. So John would look at those blog posts and think about what you might do to make sure your site is still very relevant to modern users.
Doorway page creation
Q. A person asks if it’s okay from an SEO perspective to create doorway pages when they actually help users. For example, a page that leads users who searched for the non-scientific name of a cactus to the original page.
- (35:34) John doesn’t know about this specific situation. Usually, Google would call pages doorway pages if they essentially lead to the same funnel afterward, where you take a lot of different keywords and guide people to exactly the same content in the end. In the case of something like an encyclopedia, it isn’t the same content; it’s essentially very unique pieces of content, and covering a lot of keywords doesn’t necessarily make something a doorway page. So, without digging into this specific site in detail, his guess is that it would not be considered a doorway page. A doorway page would be more like this: you have a cactus page on your website, you want to target cactuses in all nearby cities, and you make individual city pages where all of the traffic is funneled in the same direction on your website. That would be considered a doorway page, because you’re creating all of these small doorways but they all lead to the same house.
Classified websites
Q. A question about classified websites. The site has ad listings on its search results pages, which the person allows to be crawled and indexed. If a search results page has had no ad listings for some time, should he disallow indexing, or should he let Google decide? And would excluding those pages from the sitemap also be good practice?
- (37:14) For the sake of clarity, John thinks the search results this person means are the search results within their own website, not Google’s search results: someone searches for a specific kind of content and the website pulls together all the ads it knows. So the question is what to do with empty internal search results pages. Google’s preference is to be able to recognise these empty pages, which can be done by adding a noindex to them. That’s the ideal situation, because Google wants to avoid having a page like that in its index. If someone is searching for a blue car of a particular make and model, and the page on your website says it doesn’t know of anyone selling that kind of car, then sending people to your website for that query would be a really bad user experience. Google tries to recognise those pages either as soft 404s, in that it detects an empty search results page, or through a noindex that tells it the page is empty. If you can recognise an empty page ahead of time, John would generally prefer a noindex directly from your side; if you can’t, then using JavaScript to add a noindex might be an option. As for the sitemap, the sitemap file only helps Google with additional crawling within a website; it doesn’t prevent Google from crawling these pages. Removing the pages from the sitemap file would not result in Google dropping them from search, and it would not help Google recognise that they don’t have any content. Removing something from a sitemap file doesn’t negatively affect the natural crawling and indexing that Google does for individual pages. So those are the two aspects: if you can recognise that it’s an empty search results page, put a noindex on it; removing it from a sitemap file is not going to remove it from the index.
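For the case where an empty results page can be recognised ahead of time, the server-side logic can be very small. The following is a minimal Python sketch under that assumption; the function names and the template structure are hypothetical and not tied to any particular CMS or framework.

```python
from html import escape


def robots_meta_for_search_page(result_count: int) -> str:
    """Return the robots meta tag for an internal search results page.

    Empty result pages get noindex; pages with listings get no tag,
    which leaves them indexable by default.
    """
    if result_count == 0:
        return '<meta name="robots" content="noindex">'
    return ""


def render_head(query: str, result_count: int) -> str:
    """Build a minimal <head> for a hypothetical search results template."""
    parts = [f"<title>Search results for {escape(query)}</title>"]
    meta = robots_meta_for_search_page(result_count)
    if meta:
        parts.append(meta)
    return "<head>" + "".join(parts) + "</head>"


print(render_head("blue car", 0))
print(render_head("blue car", 3))
```

Because the tag is derived from the live result count, a page becomes indexable again as soon as new listings appear, without anyone having to edit the sitemap.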
Not updating data in Google Search Console
Q. The person says that Google is not indexing websites, even fresh sites, and is also not updating data in Google Search Console. Is there any hidden update going on?
- (42:35) There are always updates going on, so that’s hard to say, but John doesn’t think there’s anything explicitly hidden going on. What he does sometimes see is this: because it’s so much easier to create websites nowadays, people create websites with a large number of pages, focus on the technical aspect of getting millions of pages up, and disregard a lot of the quality aspects. Because Search Console provides more and more information about the indexing process, it’s a lot easier to see that Google is not indexing everything from a website. The assumption is then often that this is a technical issue that just needs a tweak. Usually, it’s more of a quality issue. When Google looks at the website overall and is not convinced about its quality, its systems are not going to invest in more crawling and indexing of that website. If you give Google a million pages and the pages it ends up showing initially aren’t convincing, it’s not going to spend time getting all of those pages indexed. It will hold off and keep a small subset. If over time that subset does really well and has all the signs Google looks for with regard to quality, then it will try to crawl more. But just because there are a lot of pages on a website does not mean that Google is going to crawl and index a lot of pages from that website.
Password protection and Google penalties
Q. A person created a small website for their mom’s business using a CMS called Squarespace, which he knows automatically submits a sitemap once you create a new page. About two weeks ago they decided to add e-commerce functionality, and the site was password protected. His first question is whether Google penalises you in some way if users can’t access a page because it’s password protected. His second: the site and its pages were previously indexed and shown really nicely, but after editing all those products and pages he checked crawling in Google Search Console, and his mom is now giving him a hard time about when these pages are going to be shown again.
- (48:09) John says there’s no penalty for having password protection, but it means that Google can’t access the content behind the password, so that is probably the initial thing that happened. With regard to turning on an e-commerce or shop section on your website, there is now a whole article on that in Google’s search documentation specifically for e-commerce sites. John would take a look at that; there might be some tricks you missed that can help to speed things up.
The performance measurement of Google Discover
Q. Updating and expanding existing content might take longer to recrawl and re-index, and trying to push that by submitting manually might not be the best strategy.
- (50:57) John thinks the only way to measure it is in Search Console. In Analytics in particular, the traffic from Discover is almost always folded into Google Search traffic, so you can’t separate it out. Only in Search Console do you see the bigger picture.
Internal search pages
Q. A person had a question about internal search pages. His site allows indexation of on-site searches: when someone does a search on the site, a page is created for it. That has gotten a bit out of control, and he now has hundreds of millions of these pages. How would John recommend sorting that out, and are there actually any benefits to cleaning it up, or should he not worry about it?
- (52:34) John thinks that, for the most part, it does make sense to clean that up, because it makes crawling a lot harder. The direction he would take is to think about which pages you actually want to have crawled and indexed, and to help Google’s systems focus on those. That doesn’t mean you should get rid of all internal search pages; some of them might be perfectly fine to show in search. But try to avoid the situation where anyone can create a million new pages on your website just by linking to random URLs or words that you might have on your pages. The goal is for you to control which pages get crawled and indexed, rather than leaving it to whatever randomly happens on the internet.