I've been working on a web stats solution for my
company, using AWStats to process our customers'
web logs.
Today I was working on the script that will set up
configs and directories for new clients, servers, or
websites, and I ran into a problem that I haven't
decided how to tackle yet. My original plan was to
take the stats from any one box and dice up the log
files so that each line is examined to determine
which website it pertains to. The script would then
shunt that line to another file (using something like
SEC) for processing for that specific site on that
specific server. The problem I ran into had to do with how
to determine which website was being hit. A typical
line in a log file might look like this:
66.249.65.77 - - [28/Jun/2005:13:26:14 -0400] "GET /galleries/ HTTP/1.1" 404 346 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
I have no way to tell which website the person was
hitting, only that they tried to "GET" /galleries/.
So I'm trying to decide whether to set things up so
that each customer has a different directory for each
website to dump stats into. It's causing me a brain
cramp, so I thought I'd take a break and whine about it.
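
To make the problem concrete, here is a minimal Python sketch that pulls apart the sample line above. It assumes the log is in the Apache "combined" format, which is what the sample looks like; the names COMBINED and parse_line are made up for this illustration and are not part of AWStats or SEC. The point is simply that none of the fields it can extract names the website.

```python
import re

# Pattern for the Apache "combined" log format (assumed; it matches the
# sample line quoted above). Note that no field carries the site's hostname.
COMBINED = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one combined-format line as a dict, or None."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


# The sample line from the post, joined back onto a single line.
sample = (
    '66.249.65.77 - - [28/Jun/2005:13:26:14 -0400] '
    '"GET /galleries/ HTTP/1.1" 404 346 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

fields = parse_line(sample)
print(sorted(fields))     # every field name the line provides
print(fields["request"])  # GET /galleries/ HTTP/1.1
```

The request field gives the path (/galleries/) but nothing about which site it belongs to, so a per-line splitter has nothing to key on unless the site is recorded somewhere else, for example by keeping a separate log directory per website.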