Website anomaly detection and handling

When doing SEO, you often dig through a website's log files to analyze it and make judgments. But if you run hundreds or thousands of sites, would you still go through each log file one by one?

When dealing with a large number of websites, we generally monitor them according to how important each site is. Important resource sites, for example, may be analyzed and maintained as carefully as main sites, while some sites only get attention when problems arise. Others are simply left to thrive or die according to the strategy set for the site group; there is no one-size-fits-all approach.

Generally, I only run anomaly monitoring on site groups: when an anomaly is detected I analyze and handle it manually, and otherwise I rarely look at them.

Definition of Abnormal Situations

When monitoring how websites are running, we need to define what counts as abnormal. The abnormal situations I have personally defined fall into the following five types; a rough code sketch follows the list:

  1. Abnormal spider visit frequency: for example, a sudden loss of rankings that leaves spiders no longer visiting, or unusually frequent spider visits around a ranking drop.
  2. Website traffic anomaly: traffic across the site group should not normally swing sharply; if it does, someone may be scraping or attacking the site.
  3. 404 errors: the requested pages no longer exist and need to be dealt with promptly.
  4. Special page traffic anomaly: traffic anomalies on important pages, such as Taobao affiliate redirect pages; comparing traffic against conversion rates helps pin down where the traffic is coming from.
  5. Special keyword traffic anomaly: if market search volume and snippet click-through rate stay roughly constant, traffic on key keywords reflects how those keywords are ranking.
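
As a rough sketch only, the five indicators can be given code-level names and mapped to the per-indicator data tables described in the next section (the enum names and the A-E mapping below are my own labels, not part of the original setup):

    // Hypothetical labels for the five monitored indicators; the single-letter
    // comments mirror the per-indicator tables (A-E) used later in the text.
    enum AnomalyType
    {
        SpiderVisitFrequency,    // table A: spider/crawler hits per day
        SiteTraffic,             // table B: total visits per day
        NotFoundErrors,          // table C: number of 404 responses per day
        SpecialPageTraffic,      // table D: hits on key pages (e.g. affiliate redirects)
        SpecialKeywordTraffic    // table E: visits arriving via key search terms
    }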

Monitoring Methods

To monitor the anomalies above, create a data table for each indicator (call them A through E), and then set up an automated task that writes each website's daily data into the database.
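
A minimal sketch of what those tables could look like, assuming each holds one row per site per day (the SiteId / LogDate / Hits schema and the connection string below are my assumptions, not part of the original setup):

    using System;
    using System.Data.SqlClient;

    class CreateMetricTables
    {
        // Placeholder connection string; point it at the monitoring database.
        const string ConnStr = "Server=127.0.0.1;Database=SiteMonitor;User Id=sa;Password=***;";

        static void Main()
        {
            // One table per indicator, named A-E as in the text.
            string[] tables = { "A", "B", "C", "D", "E" };

            using (var conn = new SqlConnection(ConnStr))
            {
                conn.Open();
                foreach (var t in tables)
                {
                    // Assumed minimal schema: which site, which day, and that day's count.
                    string sql = $@"
                        IF OBJECT_ID('{t}', 'U') IS NULL
                        CREATE TABLE {t} (
                            SiteId  INT  NOT NULL,
                            LogDate DATE NOT NULL,
                            Hits    INT  NOT NULL,
                            CONSTRAINT PK_{t} PRIMARY KEY (SiteId, LogDate)
                        )";
                    using (var cmd = new SqlCommand(sql, conn))
                    {
                        cmd.ExecuteNonQuery();
                    }
                }
            }
        }
    }

Note that Logparser's SQL output mode can also create the destination table itself (its createTable parameter), in which case the columns simply mirror the SELECT list.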

Under IIS, I recommend Microsoft's Logparser tool, which lets you process log files with SQL-style queries; the specific parameters are easy to look up with a search engine.

Specific Usage Method

Taking the first anomaly situation as an example, you can monitor abnormal spider visit frequency using the following command:

Logparser -i:iisw3c "SELECT COUNT(*) AS hits INTO A FROM xxx.log WHERE cs(User-Agent) LIKE '%spider%'" -o:SQL -server:<ServerIP> -driver:"SQL Server" -database:<DatabaseName> -username:sa -password:***
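
The other indicators follow the same pattern. For instance, a daily 404 count could be written into table C with a query along these lines (the log path, server, and database names are placeholders, as above):

Logparser -i:iisw3c "SELECT COUNT(*) AS hits INTO C FROM xxx.log WHERE sc-status = 404" -o:SQL -server:<ServerIP> -driver:"SQL Server" -database:<DatabaseName> -username:sa -password:***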

Exception Handling

For processing, compare today's data with yesterday's to get the difference, and set a threshold: anything beyond the threshold counts as an exception. For example, traffic anomalies can be judged as a percentage, with a change of more than 30% treated as an anomaly; 404 errors can be judged by simple subtraction of the two counts.
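
A minimal sketch of that comparison in C#, assuming the daily traffic counts sit in table B with the SiteId / LogDate / Hits layout sketched earlier (the connection string, table layout, and threshold handling are my assumptions):

    using System;
    using System.Data.SqlClient;

    class TrafficCheck
    {
        // Placeholder connection string for the monitoring database.
        const string ConnStr = "Server=127.0.0.1;Database=SiteMonitor;User Id=sa;Password=***;";

        // Returns true when today's count deviates from yesterday's by more
        // than the given ratio (e.g. 0.30 for the 30% rule mentioned above).
        static bool IsTrafficAnomaly(int siteId, double threshold)
        {
            using (var conn = new SqlConnection(ConnStr))
            {
                conn.Open();
                var cmd = new SqlCommand(
                    @"SELECT Hits FROM B
                      WHERE SiteId = @site AND LogDate IN (@today, @yesterday)
                      ORDER BY LogDate DESC", conn);
                cmd.Parameters.AddWithValue("@site", siteId);
                cmd.Parameters.AddWithValue("@today", DateTime.Today);
                cmd.Parameters.AddWithValue("@yesterday", DateTime.Today.AddDays(-1));

                int? today = null, yesterday = null;
                using (var reader = cmd.ExecuteReader())
                {
                    if (reader.Read()) today = reader.GetInt32(0);
                    if (reader.Read()) yesterday = reader.GetInt32(0);
                }

                // Not enough data for a comparison: don't raise a false alarm.
                if (today == null || yesterday == null || yesterday == 0) return false;

                double change = Math.Abs(today.Value - yesterday.Value) / (double)yesterday.Value;
                return change > threshold;
            }
        }

        static void Main()
        {
            if (IsTrafficAnomaly(siteId: 1, threshold: 0.30))
                Console.WriteLine("Traffic anomaly detected for site 1.");
        }
    }

404 counts can reuse the same query against table C, comparing the absolute difference instead of the ratio.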

I use a C# program to handle exceptions, for example by comparing the latest 404 data against the previous day's to spot anomalies. When an exception is found, the program sends an email notification so it can be dealt with promptly.
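
The notification step itself can stay very small; here is a sketch using .NET's built-in System.Net.Mail (SMTP host, port, addresses, and credentials below are placeholders):

    using System.Net;
    using System.Net.Mail;

    class AlertMailer
    {
        // Sends a plain-text alert; all SMTP settings here are placeholders.
        public static void SendAlert(string subject, string body)
        {
            var message = new MailMessage("monitor@example.com", "admin@example.com", subject, body);

            using (var client = new SmtpClient("smtp.example.com", 25))
            {
                client.Credentials = new NetworkCredential("monitor@example.com", "***");
                client.Send(message);
            }
        }
    }

    // Example call when a 404 spike is detected:
    // AlertMailer.SendAlert("[SiteMonitor] 404 spike on site <id>",
    //                       "New 404s today: <today> (yesterday: <yesterday>).");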

Other Suggestions

In addition to the methods above, you can also use Logparser to split the logs and then send them to a specified FTP address via FTP commands, so the data can be used directly without manual processing each time.
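
For the FTP step, a hedged sketch using .NET's built-in FtpWebRequest (host, credentials, and file paths are placeholders; the log splitting itself would be done beforehand with Logparser):

    using System;
    using System.IO;
    using System.Net;

    class LogUploader
    {
        // Uploads one split log file to an FTP server; all settings are placeholders.
        static void Main()
        {
            string localFile = @"C:\logs\split\site01.log";
            string remoteUri = "ftp://ftp.example.com/logs/site01.log";

            var request = (FtpWebRequest)WebRequest.Create(remoteUri);
            request.Method = WebRequestMethods.Ftp.UploadFile;
            request.Credentials = new NetworkCredential("ftpuser", "***");

            using (var fileStream = File.OpenRead(localFile))
            using (var ftpStream = request.GetRequestStream())
            {
                fileStream.CopyTo(ftpStream);
            }

            using (var response = (FtpWebResponse)request.GetResponse())
            {
                Console.WriteLine("Upload finished: " + response.StatusDescription);
            }
        }
    }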

Overall, anomaly monitoring is an important way to keep a website secure and running stably. Detecting and handling anomalies promptly keeps the site operating normally and protects the user experience.