
BCB Spider Tracker - Installation

by: www.bluecollarbrain.com


Description
BCB Spider Tracker will track hits from Search Engine Bots (aka Spiders) and record their visits in a database for you (and your site's human visitors) to view.
Other Features:
  • Easily add/remove Spiders to track (over 100 spiders tracked AS IS!).
  • Sort Result Pages by Date, Spider, or Pages Crawled
  • Paginated Results - customize how many Results per page you want to view.

NOTE: This script only works on files that are parsed by php.


Files
Here's a short description of the files included in this package.

bcb_spider.sql            Use this to build the database table used to store spider hits.
install.html              Umm, yer lookin' at it.
spider_agent_graph.php    A php file that builds a PNG graph image for use in spider_agent_report.php.
spider_agent_report.php   A report of total visits by each spider that has visited your site.
spider_bcb.php            The main file that will be referenced (by 'include') in every page you want to track.
spider_results.php        A report of all tracked spider visits, including date, spider name and page visited.
spider_config.php         The database configuration file. You will need to edit this to reflect your setup.
bcb_spider.css            A simple style sheet for the report pages.
bcb_spider.jpg            Our graphic.
graph_bg.png              A small gradient that will be used as a background image for the spider graph.
spider_array.php          Main list of spiders that can be tracked.
spider_functions.php      Just a function that the reports use to modify the date/time display.

 


Requirements
BCB Spider Tracker requires php and a MySQL database. That's it - pretty standard stuff these days.
Also, you should have at least a passing familiarity with working with your database as I will not go into detail regarding that aspect of this script (I strongly recommend using phpMyAdmin for database administration).

Installation
Load the bcb_spider.sql file (provided) into your database. bcb_spider.sql will build the table structure the script requires to log the spider visits to your site.
I have included 5 records just for testing purposes, so you can see the results page even before any spiders crawl your site. Please delete these records after testing is complete.
  Edit the file config/spider_config.php to reflect your MySQL database setup. The placeholder values (myusername, mypassword, mydbname and myhost) must be changed, or you will get errors. DO NOT remove the quotation marks.
Here are the lines you need to edit from a sample spider_config.php file:

$usernam="myusername";       // Your MySQL username
$pass="mypassword";          // Your MySQL Password
$db="mydbname";              // The name of your MySQL database
$db_host="myhost";           // Your MySQL Host
$tablename = "bcb_spider";   // Change this value only if you rename your db table.
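
If you want to confirm your edited values before uploading anything, a quick check along these lines may help. This is only a sketch and is not part of the package: the function name bcb_check_config is made up for illustration, and it assumes your server has the mysqli extension available. It tries to connect with the values from spider_config.php and then looks for the table.

```php
<?php
// Hypothetical helper (NOT shipped with BCB Spider Tracker).
// Assumes the mysqli extension is installed on your server.
function bcb_check_config(string $db_host, string $usernam, string $pass, string $db, string $tablename): string
{
    // Ask mysqli to return false on failure instead of throwing an exception.
    mysqli_report(MYSQLI_REPORT_OFF);

    $link = @mysqli_connect($db_host, $usernam, $pass, $db);
    if ($link === false) {
        return "Connection failed: " . mysqli_connect_error();
    }

    // Look for the table that bcb_spider.sql should have created.
    $escaped = mysqli_real_escape_string($link, $tablename);
    $result  = mysqli_query($link, "SHOW TABLES LIKE '" . $escaped . "'");
    $found   = ($result !== false && mysqli_num_rows($result) > 0);

    mysqli_close($link);

    return $found ? "OK" : "Connected, but table '" . $tablename . "' was not found.";
}
```

To use it, include config/spider_config.php first, then echo bcb_check_config($db_host, $usernam, $pass, $db, $tablename); from a scratch page, and delete that page when you are done.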

  FTP the entire Spider folder (including sub-folders) to your web site. Where you put it is up to you, but I suggest placing it at the root, so you can browse to it more easily when you want to check your results page, and so you won't have to hunt through all the files to change paths to this and that.
Modify your pages
Insert the following line into any php page you want tracked.

If you uploaded the Spider folder to the root of your site, insert this:

<?php include $_SERVER['DOCUMENT_ROOT']."/Spider/includes/spider_bcb.php"; ?>

If you uploaded it somewhere else, insert this, and change mypath to reflect the path you used:

<?php include $_SERVER['DOCUMENT_ROOT']."/mypath/Spider/includes/spider_bcb.php"; ?>

If you use a common included header for all the pages of your site, you can just insert the line in it, and it will work for all your pages that use that header (nifty, eh?).
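
For a shared header, you might prefer a slightly more defensive version of the include line. The sketch below is an assumption on my part, not something the package provides: bcb_tracker_path is a made-up helper that builds the same path shown above, and the is_readable() check simply skips tracking if the file is missing instead of printing a warning on every page.

```php
<?php
// Hypothetical helper for a common header file (not part of the package).
// $subPath is the folder you uploaded Spider into: '' for the site root,
// or e.g. 'mypath' if you put it somewhere else.
function bcb_tracker_path(string $docRoot, string $subPath = ''): string
{
    $parts = [rtrim($docRoot, '/')];
    if (trim($subPath, '/') !== '') {
        $parts[] = trim($subPath, '/');
    }
    $parts[] = 'Spider/includes/spider_bcb.php';
    return implode('/', $parts);
}

$tracker = bcb_tracker_path($_SERVER['DOCUMENT_ROOT'] ?? '', '');

// Only include the tracker if it is really there.
if (is_readable($tracker)) {
    include $tracker;
}
```

The trade-off is that a wrong path now fails silently, so check your results page after the first spider visit (or use the plain include line while testing).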

View your Results
To view your results, simply browse to www.yoursite.com/Spider/spider_results.php. If you didn't install the folder Spider at the root, just modify the path to the Spider/spider_results.php file.
Support / Contact

This script is provided AS IS, and we at bluecollarbrain assume no liability for any problem you may encounter due to the use of this script. That being said, we wouldn't be opposed to giving a little help if you followed the installation instructions and can't seem to get it to work. Email any problems you encounter to bassface@bluecollarbrain.com, and we'll try to help you get it sorted out.

Also, if you get it working and are liking it, let us know that, too! Email us at bassface@bluecollarbrain.com with a link to your results page.

Notes
You may change anything about the results page you want: colors, fonts, etc. You can change/remove the .jpg, too. We would ask that you leave the link at the bottom, especially if the page is available for public view. Thanks.

Customizing
There are a few things you may want to customize about this script. Here's how to do a few of the ones that we think you'll be most interested in.
Adding/Deleting Spiders to Track

First, a little background on what's going on. When a search engine's spider (or anyone, for that matter) visits your site, a little information is given about the visiting machine (NOT personal information). We can use php to extract pieces of that information, and from that info we can tell if it's a spider or not. One of these pieces is called the HTTP_USER_AGENT, which will include the spider's name. For example, Google's™ current spider has the HTTP_USER_AGENT:
Googlebot/2.1 (+http://www.google.com/bot.html)

Now in order for a spider to be tracked by our script, we have to tell the script what names we want to look out for. Using Google™ again as our example, we can tell the script to watch out for any HTTP_USER_AGENT that contains the word googlebot. A match on the word googlebot will trigger the script to record the time, page visited, and HTTP_USER_AGENT of the spider into your database. If a match is NOT made, then nothing happens, and your page loads like it always has.

All names to be watched are kept in the file includes/spider_array.php. So, to add a new spider to track, all we have to do is tell the script to watch out for a new word. Let's say we want to look out for the new search engine spider named GeddyBot/2.0. Open up the spider_array.php file and add a line to the bottom of the list, following the format of all the other lines...

$spider_array[] = "geddybot";

Upper/lower case is not important, and we don't need the entire string of text, just the unique part that we can use to identify this particular spider.

That's it. To delete a spider, just delete the entire line from the spider_array.php file, and it will no longer be tracked.
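
If you're curious how a match like this can work, here is a minimal sketch of the idea described above: a case-insensitive search for each watched word inside the HTTP_USER_AGENT. The function name bcb_match_spider is made up for this example, and the real code in includes/spider_bcb.php may well do it differently.

```php
<?php
// Hypothetical sketch of the matching step (the shipped script may differ).
// Returns the first watched word found in the user agent, or null if none.
function bcb_match_spider(string $userAgent, array $spiderArray): ?string
{
    foreach ($spiderArray as $name) {
        // stripos() is a case-insensitive substring search.
        if ($name !== '' && stripos($userAgent, $name) !== false) {
            return $name;
        }
    }
    return null;
}

// Same format as the lines in spider_array.php.
$spider_array   = [];
$spider_array[] = "googlebot";
$spider_array[] = "geddybot";

$agent = $_SERVER['HTTP_USER_AGENT'] ?? '';
$match = bcb_match_spider($agent, $spider_array);
// $match is null for ordinary visitors, so nothing would be recorded.
```

This is also why only the unique part of the name is needed: "geddybot" is found inside "GeddyBot/2.0" no matter what version number follows it.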

Number of Results per Page
There is a variable in the spider_results.php page that controls how many results are displayed per page. It is called $limit, and it is set to 20 by default. Just change that 20 to whatever you want, and the results page will change accordingly.
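
To give a feel for what $limit does, here is a rough sketch of how a per-page limit usually turns into a MySQL query. It is an illustration only: bcb_page_offset is a made-up helper, the query is a guess at the general shape, and spider_results.php may build its query differently.

```php
<?php
// Hypothetical helper: turns a 1-based page number and $limit into the
// offset for a MySQL "LIMIT offset, count" clause.
function bcb_page_offset(int $page, int $limit, int $totalRows): array
{
    $limit = max(1, $limit);                           // never divide by zero
    $pages = max(1, (int) ceil($totalRows / $limit));  // at least one page
    $page  = min(max(1, $page), $pages);               // clamp into range
    return [
        'page'   => $page,
        'pages'  => $pages,
        'offset' => ($page - 1) * $limit,
    ];
}

$limit = 20;                               // the default mentioned above
$info  = bcb_page_offset(3, $limit, 45);   // 45 rows -> 3 pages, page 3 starts at row 40

// Assumed query shape; the table name is the default from spider_config.php.
$sql = sprintf("SELECT * FROM bcb_spider LIMIT %d, %d", $info['offset'], $limit);
```

So with 45 recorded hits and $limit set to 20 you get three pages, and raising $limit to 50 would put everything on one page.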
BCB SpiderTracker
© 2004 - www.bluecollarbrain.com