How to effectively block the Sogou spider from crawling your website content?
Method 1: Use the robots.txt file
To prevent the Sogou spider from crawling your website content, create a robots.txt file and add the following content to it:
User-agent: Sogou web spider
Disallow: /
User-agent: sogou spider
Disallow: /
User-agent: *
Disallow:
Because it is unclear whether Sogou's crawler identifies itself as "sogou spider" or "Sogou web spider", both User-agent blocks are included. Most other search engines document their spider names, but Sogou does not. Upload the file to the root directory of the website for it to take effect. Note, however, that the Sogou spider does not always honor the robots.txt protocol, so it may continue to crawl even when disallowed.
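If you want to confirm that the rules say what you intend, one option is a quick sanity check with Python's standard urllib.robotparser module. This is only a minimal sketch; the rule list below simply mirrors the robots.txt content above:

# Minimal sketch: verify that the robots.txt rules block both Sogou
# user-agent strings while leaving other crawlers unaffected.
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: Sogou web spider",
    "Disallow: /",
    "User-agent: sogou spider",
    "Disallow: /",
    "User-agent: *",
    "Disallow:",
]

parser = RobotFileParser()
parser.parse(rules)

# Both Sogou user-agent strings should be blocked everywhere...
print(parser.can_fetch("Sogou web spider", "/"))           # expected: False
print(parser.can_fetch("sogou spider", "/any-page.html"))  # expected: False
# ...while other crawlers remain allowed.
print(parser.can_fetch("Googlebot", "/"))                  # expected: True

Of course, this only checks that the rules are written correctly; it does not stop a crawler that ignores robots.txt, which is why the next method exists.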
Method 2: Use the .htaccess file
In conjunction with the robots.txt file, you can also create a .htaccess file in the website root and add the following content to it:
#block spider
<Limit GET POST>
order allow,deny
#Sogou block
deny from 220.181.125.71
deny from 220.181.125.68
deny from 220.181.125.69
deny from 220.181.94.235
deny from 220.181.94.233
deny from 220.181.94.236
deny from 220.181.19.84
allow from all
</Limit>
Upload the file to the root directory of the website. The listed IP addresses all belong to the Sogou spider. Because these addresses change frequently, you can add new ones to the list at any time.
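Since the list goes stale quickly, one way to find fresh Sogou IPs is to pull them out of your server's access log. Below is a minimal Python sketch; the log path and the assumption that the client IP is the first field of each line (combined log format) are placeholders, so adjust them for your server:

# Minimal sketch: collect the source IPs behind Sogou spider requests
# so they can be appended as new "deny from" lines in .htaccess.
import re
from collections import Counter

LOG_PATH = "/var/log/apache2/access.log"  # assumed location; adjust as needed

ip_counts = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        # Match either reported spider name, case-insensitively.
        if re.search(r"sogou (web )?spider", line, re.IGNORECASE):
            ip = line.split()[0]  # client IP is the first field in combined format
            ip_counts[ip] += 1

for ip, count in ip_counts.most_common():
    print(f"deny from {ip}    # {count} requests")

Running this periodically and copying the printed lines into the .htaccess block keeps the deny list current as Sogou rotates its crawler IPs.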