Disallow is an instruction that is entered in Robots.txt and is used to deny search engine bots access to a specific page or an entire directory.
The Disallow instruction belongs to the Robots Exclusion Protocol, which was originally created in 1994, and on July 1, 2019, Google began the process of turning it into a formal Internet standard.
This protocol establishes that search engine crawlers must fetch and read a file called robots.txt before starting to crawl a website. For this reason, sites should place a file called robots.txt in the root directory; that file is where the Disallow instruction goes.
Disallow is a directive, not an obligation
It is important to note that the Disallow instruction is a directive, not a binding rule: if the search engine in question considers that it should access the content, it may do so anyway.
This usually happens when we try to block a specific directory or page but external websites link to it. In that case, it is normal for the search engine to index the very content we wanted to block with the Disallow instruction.
For this reason, if we want Google not to index a certain page or folder, it is recommended to use other, more effective methods, such as the noindex directive of the robots meta tag, the X-Robots-Tag HTTP header, or even blocking access to the page at the server level.
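For reference, the two noindex mechanisms mentioned above look like this (illustrative snippets; the header would be sent in the server's HTTP response for the page):

```html
<!-- robots meta tag, placed in the <head> of the page to be excluded -->
<meta name="robots" content="noindex">
```

The equivalent HTTP response header is `X-Robots-Tag: noindex`, which is useful for non-HTML resources such as PDFs.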
If we use the noindex directive, we must not block the page in robots.txt, since the crawler needs to crawl the page in order to see the tag.
What is the disallow instruction used for?
As we have previously mentioned, the Disallow instruction is used to tell search engine robots what content they should not crawl.
One common use is trying to hide information. However, as we have said, this is not a good option, for two reasons: first, the information remains accessible to anyone who has the URL; and second, the search engine may still index it if it considers it relevant.
How should we use the disallow directive?
The Disallow directive is very easy to use, since there are very few options, which we detail below:
Disallow: /
This is the typical instruction used while a website is under development: it tells the bots that crawl our website not to access any part of it, which helps keep the site out of search results (although, as noted above, it does not guarantee that its URLs will not be indexed).
User-agent: *
Disallow: /
User-agent: * means that the instruction applies to all robots. Disallow: / means that access to every page and file on that domain is blocked.
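How a compliant crawler interprets these two rules can be checked with Python's standard urllib.robotparser module (the domain below is a placeholder):

```python
from urllib.robotparser import RobotFileParser

# Rules equivalent to a robots.txt containing:
#   User-agent: *
#   Disallow: /
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
])

# Every path on the domain is blocked, for every user agent.
print(rp.can_fetch("*", "https://example.com/"))            # False
print(rp.can_fetch("Googlebot", "https://example.com/page"))  # False
```

This only models crawling permission; as the article notes, it says nothing about whether a URL ends up indexed.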
It is important to note that if pages on our site are linked from external websites, search engines may still index those pages.
On the other hand, if we use the following instruction, search engines will crawl everything:
Disallow:
An empty Disallow: line tells the search engine bot that it may access all content.
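The same urllib.robotparser check confirms that an empty Disallow value blocks nothing (placeholder domain again):

```python
from urllib.robotparser import RobotFileParser

# Rules equivalent to a robots.txt containing:
#   User-agent: *
#   Disallow:
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow:",
])

# Nothing is blocked: every path may be crawled.
print(rp.can_fetch("*", "https://example.com/any/page"))  # True
```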