soledad penadés
repeat 4[fd 100 rt 90]

Find out the full referrer (with the shell)

Are you fed up with Google Analytics not showing the full referrer URL, and just showing something like http://www.example.com/forum/viewtopic.php? So am I. I love to know who's linking to me (yeah, I'm curious!).

My hosting compresses access logs once they reach a certain size, so when I downloaded the access log files I got a bunch of .gz files, which I'm not going to uncompress by hand… So I went to the terminal and, once in the folder where the log files are, typed:

find . -name "*.gz" -exec gunzip {} \;

Now I have lots of files like access_log.20060929, access_log.20060930, etc. To search for, say, a referrer called example.com that I saw in GA, I do:

cat * | grep example.com

and that returns the Apache log lines where the term appears.

For example:

81.39.91.97 - - [26/Sep/2006:11:27:47 +0000] "GET /index.php HTTP/1.1" 200 9562 "http://example.com/viewtopic.php?t=747" "Mozilla/5.0 (Windows; U; Windows NT 5.1; es-ES; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7"
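
Here is a minimal, self-contained sketch of the same search that you can run without touching real logs. The sample lines, the temporary directory and the names `logdir` and `matches` are made up for illustration. It uses `grep -F` so that the dot in example.com is matched literally (in a plain pattern, `.` matches any character), and `-h` so that file names are not prefixed to the output.

```shell
# Throwaway directory with two tiny, made-up sample logs.
logdir=$(mktemp -d)
printf '%s\n' '81.39.91.97 - - [26/Sep/2006:11:27:47 +0000] "GET /index.php HTTP/1.1" 200 9562 "http://example.com/viewtopic.php?t=747" "Mozilla/5.0"' > "$logdir/access_log.20060926"
printf '%s\n' '10.0.0.1 - - [27/Sep/2006:08:00:00 +0000] "GET /index.php HTTP/1.1" 200 9562 "-" "Mozilla/5.0"' > "$logdir/access_log.20060927"

# -F: treat the pattern as a literal string; -h: omit file names from the output.
matches=$(grep -hF 'example.com' "$logdir"/access_log.*)
printf '%s\n' "$matches"
```

Passing the files straight to grep also saves the extra `cat`, although on files this small the difference is not noticeable.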

It's a bit of a brute-force approach, as it searches all the files (I've just realized it's even searching the compressed files, since I haven't removed them yet, haha!). But it's still very fast!

With a bit more love this could be a rudimentary stats script, but I'm not that much into shell scripting (and I'm trying to force myself to really learn regular expressions so I can write that stats script in Ruby instead).
Oh, and I forgot to say that this works in any decent shell: Linux, Mac… I think I could also do it on a Windows box with unxutils installed (so that you get the funky stuff like grep, find, cat, etc.).

// 4 responses to Find out the full referrer (with the shell)

winden^network^batman.group
20061003

sola, zcat * will decrunch on the fly without writing anything to disk :)

sole
20061003

oh, i didn't know about zcat!!! :D
but I can't manage to make it work or find a good example of it. Any suggestion? If I just do
zcat * | grep whatever
it gives me this error message:
zcat: access_log.20061002.Z: No such file or directory

[LuY]
20061004

2 things..

- gzip -dc | grep..

And if that helps.. check this.

- gzip -dc | awk '{print $11}' | sort | uniq

I think that the referrer is in the 11th field.
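
(For reference, the two pipelines above need the compressed files passed to gzip. A minimal sketch, assuming a single made-up log in a temporary directory; `logdir`, `hits` and `referrers` are illustrative names, and the `-c` on uniq plus the final `sort -rn` are additions that turn the list into a count.)

```shell
# Throwaway directory with one made-up, gzip-compressed sample log.
logdir=$(mktemp -d)
cat > "$logdir/access_log.20061002" <<'EOF'
81.39.91.97 - - [26/Sep/2006:11:27:47 +0000] "GET /index.php HTTP/1.1" 200 9562 "http://example.com/viewtopic.php?t=747" "Mozilla/5.0"
81.39.91.98 - - [26/Sep/2006:11:30:02 +0000] "GET /index.php HTTP/1.1" 200 9562 "http://example.com/viewtopic.php?t=747" "Mozilla/5.0"
10.0.0.1 - - [26/Sep/2006:12:00:00 +0000] "GET /about.php HTTP/1.1" 200 1234 "http://example.org/links.html" "Mozilla/5.0"
EOF
gzip "$logdir/access_log.20061002"   # leaves only access_log.20061002.gz

# 1) Decompress to stdout and search; nothing is written to disk.
hits=$(gzip -dc "$logdir"/*.gz | grep -F 'example.com')

# 2) Print the 11th whitespace-separated field (the quoted referrer),
#    then count how often each one appears, most frequent first.
referrers=$(gzip -dc "$logdir"/*.gz | awk '{print $11}' | sort | uniq -c | sort -rn)

printf '%s\n' "$hits" "$referrers"
```

Globbing `*.gz` instead of `*` keeps already-uncompressed files out of the pipeline. Counting on the 11th field relies on the log lines having the same layout as the example line in the post.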

sole
20061004

ahhh good old awk :)
