It is important to understand what contributes to traffic bandwidth usage on your site to manage it effectively.
The primary traffic sources contributing to bandwidth usage on your website are:
- Traffic from spiders/bots for indexing
- Traffic from visitors
- Optionally, web service traffic from integrations via our Web Service API
- Optionally, large videos streamed from your website
Traffic from spiders/bots:
Your website is crawled by spiders and bots so that it can be indexed by search engines, comparison engines and more. However, spider/bot bandwidth usage can add up quickly, leading to OnDemand Licensing Fees after your monthly reserve has been used. The key to managing spider traffic is to review which spiders/bots visit your site and to block/disallow those that are not beneficial to your business. Example: If you don't sell internationally, it may not be beneficial for your site to be crawled by international search engine bots. Some international crawlers can take up very significant amounts of bandwidth, sometimes bordering on abuse.
To restrict such spiders from indexing your site and using your monthly reserve, use the Visitor Traffic & Analytic Rules feature in AmeriCommerce online stores under Tools > Power Features > Rule Engine > Visitor Traffic & Analytics.
Spiders can be of known or unknown origin.
Known spiders are those that AmeriCommerce online stores has already identified and named. Identifying and naming spiders is an ongoing effort at AmeriCommerce online stores, and every spider that we identify is named and added to your reports.
Below is an example of how to identify and block a known international spider.
Step 1: Select the Top Spiders/Crawlers Report and Identify International Spiders to block
Step 2: Set up a Rule under Visitor Traffic & Analytics to block the identified International Spiders
There are two parts to this step:
- Identify the User Agent
- Set up a rule for the User Agent
Browse to Tools > Power Features > Rule Engine > Visitor Traffic & Analytics
To identify the User Agent name, find the admin rule for it:
- Use the arrows to browse the pages to find the spider in the list and identify the User Agent
- Find the Spider you wish to block and click on the edit icon beside it
- Make note of the User Agent. (This will be used in the rule for blocking)
To set up the rule to block the spider:
Select User Rules instead of Admin Rules, then click the orange New button in the upper right.
Set up a Rule to Block the Spider:
- Enter a Rule Name (any name of your choice) and mark the rule as 'Active'
- Select the Condition Type as User Agent and, as the Condition, enter a unique portion of the user agent shown in the spider's visitor session; Click the '+' button to add the condition
- Select the Action as 'BlockUserRequest'; Click the '+' button to add the Action
- Click Save
After this is set up, if the international spider visits your site, the condition will be met and the Action of blocking the user request will be triggered. In effect, the spider visit will be blocked/disallowed, which reduces unnecessary bandwidth usage. The spider will be shown a typical blocked request message that is customizable via the Store Text & Languages module in AmeriCommerce online stores.
Additionally, you can redirect the spider to a page of your choice as well.
You can repeat the rule setup step for all the international spiders that you need to block.
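The effect of such a rule can be pictured with a small sketch. This is not the platform's rule engine: the `UserAgentRule` class, the `evaluate` function, the spider name `ExampleSpider`, and the assumption that the condition is a case-insensitive substring match are all hypothetical, used only to illustrate how a unique portion of a user agent leads to the 'BlockUserRequest' action.

```python
from dataclasses import dataclass


@dataclass
class UserAgentRule:
    """Hypothetical stand-in for a User Rule; not the platform's real data model."""
    name: str
    user_agent_contains: str          # the unique portion noted from the Admin Rule
    action: str = "BlockUserRequest"
    active: bool = True


def evaluate(rules, user_agent):
    """Return the action of the first active matching rule, or None if none match."""
    ua = user_agent.lower()
    for rule in rules:
        # Assumption: the condition behaves like a case-insensitive substring match.
        if rule.active and rule.user_agent_contains.lower() in ua:
            return rule.action
    return None


rules = [UserAgentRule(name="Block ExampleSpider", user_agent_contains="ExampleSpider")]

print(evaluate(rules, "Mozilla/5.0 (compatible; ExampleSpider/2.0)"))
print(evaluate(rules, "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```

The first call matches the rule and yields the blocking action; the second is an ordinary browser string and matches nothing, which is why the condition should be a portion of the user agent that only the spider carries.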
You can also block a broader range of spiders if they are malicious and keep showing up after you have tried to block them using the steps above. To do so, take the spider's IP address and block it by IP address. You can do this with the rule engine; however, a better way is to use the IP Blacklist/Whitelist feature.
To view a session, go to the Admin Panel -> Reports -> Visitor Sessions. A session that repeatedly visits the site, views a few pages in a fraction of a second and then does it again is a good indication of a spider. Click the edit button next to that remote address to see the session details; this information is what you use to create the rules discussed above.
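If you export or copy session data out of the report, the "few pages in a fraction of a second" pattern can be checked mechanically. The sketch below is an illustration only: the `(remote address, timestamp)` input format, the function name `find_burst_addresses`, and the thresholds are assumptions, and the addresses are reserved documentation addresses rather than real visitors.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_burst_addresses(hits, max_hits=3, window=timedelta(seconds=1)):
    """Return addresses with more than `max_hits` page views inside any `window`."""
    by_address = defaultdict(list)
    for address, timestamp in hits:
        by_address[address].append(timestamp)

    flagged = set()
    for address, times in by_address.items():
        times.sort()
        start = 0
        for end in range(len(times)):
            # Slide the left edge forward until the span fits inside the window.
            while times[end] - times[start] > window:
                start += 1
            if end - start + 1 > max_hits:
                flagged.add(address)
                break
    return sorted(flagged)


t0 = datetime(2024, 1, 1, 12, 0, 0)
hits = (
    # Five page views within half a second: spider-like behavior.
    [("203.0.113.7", t0 + timedelta(milliseconds=100 * i)) for i in range(5)]
    # Five page views spread over two minutes: ordinary browsing.
    + [("198.51.100.20", t0 + timedelta(seconds=30 * i)) for i in range(5)]
)

print(find_burst_addresses(hits))
```

Addresses flagged this way are candidates to look up and, if they turn out to be unwanted, to add to the IP Blacklist/Whitelist feature.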
While we continue to identify and name spiders, this is an ongoing process and there will always be unknown spiders indexing your site. The Visitor Analysis and Top Visits by User Agent reports provide detailed information on User Agents. You can review them to identify visits that are using up bandwidth and then use the Visitor Traffic & Analytics Rules feature to disallow them. Be wary of making rules too simple or blocking too much traffic: an improperly written rule could easily block beneficial traffic, or even every visitor to your site.
The search tool at http://www.botsvsbrowsers.com/ can help identify bots/crawlers by IP or User Agent.
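Before saving a rule, it can help to check how much traffic a candidate condition would catch. The following sketch assumes you have copied `(user agent, bytes)` pairs out of a report; the function names, the sample data and the spider name `ExampleSpider` are hypothetical, and the matching is assumed to be a case-insensitive substring test.

```python
from collections import Counter


def bandwidth_by_user_agent(sessions):
    """Sum bytes per user agent, largest consumers first."""
    totals = Counter()
    for user_agent, num_bytes in sessions:
        totals[user_agent] += num_bytes
    return totals.most_common()


def share_matched(sessions, fragment):
    """Fraction of sessions whose user agent contains `fragment` (case-insensitive)."""
    if not sessions:
        return 0.0
    needle = fragment.lower()
    matched = sum(1 for user_agent, _ in sessions if needle in user_agent.lower())
    return matched / len(sessions)


sessions = [
    ("Mozilla/5.0 (compatible; ExampleSpider/2.0)", 900_000),
    ("Mozilla/5.0 (compatible; ExampleSpider/2.0)", 700_000),
    ("Mozilla/5.0 (Windows NT 10.0; Win64; x64)", 120_000),
    ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)", 80_000),
]

print(bandwidth_by_user_agent(sessions)[0])      # heaviest user agent and its bytes
print(share_matched(sessions, "ExampleSpider"))  # narrow condition
print(share_matched(sessions, "Mozilla"))        # overly broad condition
```

In this sample the narrow condition matches only the spider's sessions, while a condition of just "Mozilla" matches every session, including ordinary browsers, which is the kind of over-blocking to avoid.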
Reducing Traffic from Spiders using Robots.txt
Suppose there is a spider you do want visiting your site (Google, for example), but it is requesting pages in such rapid succession that it is running up your bandwidth. In that case, head to the Admin Panel -> Settings -> Search Optimization -> Robots.txt
Here you will see a text file. You can add an entry for that spider that sets a delay between its crawl attempts.
Fill out the entry with the spider's name and the delay you want; a delay of 20 seconds is generally a good starting point. So if it were Google crawling the site over and over, the entry would name Google's crawler and set the delay to 20 seconds.
Be sure to save the file; this should reduce how often the spider crawls the site.
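As a sketch of the kind of entry described above, assuming the widely used `Crawl-delay` directive and `Googlebot` as the user agent token (both are assumptions about what the entry contains), the text could look like the `ROBOTS_ENTRY` string below. The helper `crawl_delay_for` is a hypothetical name; it uses Python's standard `urllib.robotparser` only to confirm that the entry parses as intended. Support for `Crawl-delay` varies by crawler and some ignore it, so check the documentation of the crawler you are trying to slow down.

```python
from urllib.robotparser import RobotFileParser

# Assumed entry: ask the named crawler to wait 20 seconds between requests.
ROBOTS_ENTRY = """\
User-agent: Googlebot
Crawl-delay: 20
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the crawl delay (in seconds) that robots_txt sets for user_agent, if any."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


print(crawl_delay_for(ROBOTS_ENTRY, "Googlebot"))
print(crawl_delay_for(ROBOTS_ENTRY, "SomeOtherBot"))
```

Only the two lines inside `ROBOTS_ENTRY` would go into the Robots.txt file; the rest is just a local check. Crawlers not named in the entry get no delay from it.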
Traffic from Visitors to your site:
Traffic from visitors to your site is good for business and should not be blocked. Blocking visitors from accessing the site is helpful only in rare instances if you are trying to block a fraudulent user or trying to block your site from untargeted traffic. It is not recommended that visitor traffic be blocked unless you are certain that a visitor needs to be blocked or visitors from a specific link/referring domain should be blocked.
In some cases hackers may mimic standard user agent strings to appear like a normal browser. Stay wary of abnormally high numbers, and add IP Address or HostName to your report columns to see whether one IP or Host is abusing your site; you can block these specific items as well.
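When the user agent looks like a normal browser, grouping by remote address is what exposes the abuse. The sketch below assumes `(IP address, user agent)` pairs copied from a report with the IP Address column added; the function name, the 50% threshold and the sample addresses (reserved documentation ranges) are illustrative assumptions.

```python
from collections import Counter


def addresses_over_share(sessions, threshold=0.5):
    """Return addresses responsible for more than `threshold` of all sessions."""
    counts = Counter(address for address, _user_agent in sessions)
    total = sum(counts.values())
    return sorted(a for a, n in counts.items() if total and n / total > threshold)


BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
sessions = (
    # One address produces most of the sessions despite a browser-like user agent.
    [("203.0.113.7", BROWSER_UA)] * 8
    + [("198.51.100.20", BROWSER_UA), ("192.0.2.44", BROWSER_UA)]
)

print(addresses_over_share(sessions))
```

An address that stands out like this is worth reviewing in Visitor Sessions before deciding whether to block it.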