|
Description |
• |
BCB
Spider Tracker will track hits from Search Engine Bots (aka Spiders)
and record their visits in a database for you (and your site's
human visitors) to view.
Other Features:
- Easily add or remove spiders to track (over 100 spiders are tracked out of the box).
- Sort result pages by date, spider, or pages crawled.
- Paginated results: choose how many results to show per page.
NOTE: This script only works on files that are parsed by PHP. |
|
Files |
• |
Here's a short description of the files included in this package.
| bcb_spider.sql |
Use
this to build the database table used to store spider
hits. |
| install.html |
Umm, yer
lookin' at it. |
| spider_agent_graph.php |
A
PHP file that builds a PNG graph image for use in spider_agent_report.php. |
| spider_agent_report.php |
A
report of total visits by each spider that has visited
your site. |
| spider_bcb.php |
The
main file that will be referenced (by 'include') in every
page you want to track. |
| spider_results.php |
A
report of all tracked spider visits, including date, spider
name and page visited. |
| spider_config.php |
The
database configuration file. You will need to edit this
to reflect your setup. |
| bcb_spider.css |
A
simple style sheet for the report pages. |
| bcb_spider.jpg |
Our
graphic. |
| graph_bg.png |
A
small gradient that will be used as a background image
for the spider graph. |
| spider_array.php |
Main
list of spiders that can be tracked. |
| spider_functions.php |
Just
a function that the reports use to modify the date/time
display. |
|
|
Requirements |
• |
BCB
Spider Tracker requires PHP and a MySQL database. That's it - pretty
standard stuff these days. You should also have at least a passing
familiarity with your database, as I will not go into detail on that
aspect of this script (I strongly recommend using phpMyAdmin for
database administration). |
Installation |
• |
Load the bcb_spider.sql
file (provided) into your database. bcb_spider.sql
will build the table structure the script requires to log the spider
visits to your site.
I have included 5 records just for testing purposes, so you can see
the results page even before any spiders crawl your site. Please delete
these records after testing is complete. |
| |
• |
Edit
the file config/spider_config.php
to reflect your MySQL database setup. The placeholder values
(myusername, mypassword, mydbname and myhost) must be changed, or you
will get errors. DO NOT remove the quotation marks.
Here are the lines you need to edit from a sample spider_config.php
file:
$usernam="myusername";        // Your MySQL username
$pass="mypassword";           // Your MySQL password
$db="mydbname";               // The name of your MySQL database
$db_host="myhost";            // Your MySQL host
$tablename = "bcb_spider";    // Change this value only if you rename your db table. |
|
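Since unedited sample values are the most likely cause of errors at this step, a quick sanity check can help. The following is a minimal sketch, not part of the BCB Spider Tracker package: the function name bcb_unedited_settings is hypothetical, and it assumes you pass it the four values from the sample spider_config.php shown above, keyed by their variable names.

```php
<?php
// Hypothetical helper (an assumption for illustration, not shipped with the script).
// Returns the names of settings that are missing or still hold the sample values.
function bcb_unedited_settings(array $settings): array
{
    // Sample values copied from the spider_config.php example above.
    $placeholders = [
        'usernam' => 'myusername',
        'pass'    => 'mypassword',
        'db'      => 'mydbname',
        'db_host' => 'myhost',
    ];

    $unedited = [];
    foreach ($placeholders as $name => $sample) {
        if (!isset($settings[$name]) || $settings[$name] === $sample) {
            $unedited[] = $name;
        }
    }
    return $unedited;
}

// Example with made-up values: only the host was left at its sample value.
$problems = bcb_unedited_settings([
    'usernam' => 'tracker_user',
    'pass'    => 's3cret',
    'db'      => 'site_stats',
    'db_host' => 'myhost',
]);
echo implode(', ', $problems), "\n";
```

An empty result means none of the four settings still match the sample file; it does not prove that the values are correct for your server.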
| |
• |
FTP
the entire Spider folder
(including sub-folders) to your web site. Where you put it is up to
you, but I suggest placing it at the root, so you can browse to it
more easily when you want to check your results page, and so you won't
have to hunt through all the files to change paths to this and that.
|
Modify
your pages |
• |
Insert
the following line into any PHP page you want tracked.
If
you uploaded the Spider
folder to the root of your site, insert this:
<?php
include $_SERVER['DOCUMENT_ROOT']."/Spider/includes/spider_bcb.php";
?>
If
you uploaded it somewhere else, insert this, and change mypath
to reflect the path you used:
<?php
include $_SERVER['DOCUMENT_ROOT']."/mypath/Spider/includes/spider_bcb.php";
?>
If
you use a common included header for all the pages of your site,
you can just insert the line in it, and it will work for all your
pages that use that header (nifty, eh?). |
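If you put the line in a shared header, you may prefer a version that does not break your pages when the path is wrong. Here is a rough sketch under that assumption; the helper name bcb_tracker_path is hypothetical, and mypath is the same stand-in used above for wherever you uploaded the Spider folder.

```php
<?php
// Hypothetical helper (not part of the package): builds the path to spider_bcb.php.
// $installPath is the folder under the web root that contains Spider; '' means the root.
function bcb_tracker_path(string $documentRoot, string $installPath = ''): string
{
    $parts = [rtrim($documentRoot, '/')];
    $installPath = trim($installPath, '/');
    if ($installPath !== '') {
        $parts[] = $installPath;
    }
    $parts[] = 'Spider/includes/spider_bcb.php';
    return implode('/', $parts);
}

// Include the tracker only if the file is actually there and readable.
$tracker = bcb_tracker_path($_SERVER['DOCUMENT_ROOT'] ?? '', 'mypath');
if (is_readable($tracker)) {
    include $tracker;
}
```

The trade-off is that a wrong path fails silently instead of showing a warning, so check your results page after installing.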
View
your Results |
• |
To
view your results, simply browse to www.yoursite.com/Spider/spider_results.php.
If you didn't install the Spider folder
at the root, adjust the path
to the Spider/spider_results.php file accordingly. |
Support
/ Contact |
• |
This
script is provided AS IS, and we at bluecollarbrain
assume no liability for any problem you may encounter due to the
use of this script. That being said, we wouldn't be opposed to
giving a little help if you followed the installation instructions
and can't seem to get it to work. Email any problems you encounter
to bassface@bluecollarbrain.com,
and we'll try to help you get it sorted out.
Also,
if you get it working and are liking it, let us know that, too!
Email us at bassface@bluecollarbrain.com
with a link to your results page. |
Notes |
• |
You
may change anything about the results page you want: colors, fonts,
etc. You can change/remove the .jpg, too. We would ask that you leave
the link at the bottom, especially if the page is available for public
view. Thanks. |
| Customizing |
• |
There
are a few things you may want to customize about this script. Here's
how to do a few of the ones that we think you'll be most interested
in. |
| Adding/Deleting
Spiders to Track |
• |
First,
a little background on what's going on. When a search engine's
spider (or anyone, for that matter) visits your site, the visiting
machine sends along a little information about itself (NOT personal
information). We can use PHP to extract pieces of that information,
and from it we can tell whether the visitor is a spider. One of
these pieces is called the HTTP_USER_AGENT, which will include
the spider's name. For example, Google's™ current spider
has the HTTP_USER_AGENT
Googlebot/2.1 (+http://www.google.com/bot.html).
Now
in order for a spider to be tracked by our script, we have to
tell the script what names we want to look out for. Using Google™
again as our example, we can tell the script to watch out for
any HTTP_USER_AGENT that contains the word googlebot.
A match on the word googlebot
will trigger the script to record the time, page visited, and
HTTP_USER_AGENT of the spider into your database. If a match is
NOT made, then nothing happens, and your page loads like it always
has.
All
names to be watched are kept in the file includes/spider_array.php.
So, to add a new spider to track, all we have to do is tell the
script to watch out for a new word. Let's say we want to look
out for the new search engine spider named GeddyBot/2.0.
Open up the spider_array.php file and add a line to the bottom
of the list, following the format of all the other lines...
$spider_array[] = "geddybot";
UpperCase/LowerCase
is not important, and we don't need the entire string of text,
just the unique part that we can use to identify this particular
spider.
That's
it. To delete a spider, just delete the entire line from the spider_array.php
file, and it will no longer be tracked. |
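To make the matching idea concrete, here is a minimal sketch of a case-insensitive substring check against the watch list. It is an illustration only: the function name bcb_match_spider is hypothetical, and the actual code in spider_bcb.php may work differently.

```php
<?php
// Hypothetical helper illustrating the matching described above.
// Returns the first watched name found in the user agent, or null if none matches.
function bcb_match_spider(string $userAgent, array $spiderArray): ?string
{
    foreach ($spiderArray as $name) {
        // stripos() ignores case, so "geddybot" matches "GeddyBot/2.0".
        if ($name !== '' && stripos($userAgent, $name) !== false) {
            return $name;
        }
    }
    return null;
}

// A two-entry watch list in the same format as spider_array.php.
$spider_array = [];
$spider_array[] = "googlebot";
$spider_array[] = "geddybot";

var_dump(bcb_match_spider("GeddyBot/2.0", $spider_array));
var_dump(bcb_match_spider("Mozilla/5.0 (Windows NT 10.0)", $spider_array));
```

The first call returns "geddybot" and the second returns null, which corresponds to the "nothing happens" case for ordinary visitors.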
Number
of Results per Page |
• |
There
is a variable in the spider_results.php
page that controls how many results are displayed per page. It is
called $limit, and it
is set to 20 by default. Just change
that 20 to whatever you want, and the
results page will change accordingly. |
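For readers curious how a per-page setting like $limit typically turns into a paginated query, here is a small sketch. It assumes a 1-based page number and MySQL's LIMIT ... OFFSET syntax; the helper name bcb_limit_clause is hypothetical, and spider_results.php may build its query differently.

```php
<?php
// Hypothetical helper: turns a 1-based page number and $limit into a SQL LIMIT clause.
function bcb_limit_clause(int $page, int $limit): string
{
    // Guard against zero or negative input.
    $page   = max(1, $page);
    $limit  = max(1, $limit);
    // Rows to skip: all rows shown on the earlier pages.
    $offset = ($page - 1) * $limit;
    return sprintf('LIMIT %d OFFSET %d', $limit, $offset);
}

$limit = 20; // the default mentioned above
echo bcb_limit_clause(1, $limit), "\n"; // LIMIT 20 OFFSET 0
echo bcb_limit_clause(3, $limit), "\n"; // LIMIT 20 OFFSET 40
```

Raising $limit shows more rows per page and reduces the number of pages; the offset for any given page grows by the same factor.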
|