"Francis A. Vidal" wrote:
>
> hello everyone,
>
> i'm trying to build a list of sites that i want to ban. i'm getting the
> list from the logfile of all the sites that have been visited by all
> users.
>
> this is the format of the logfile:
>
> 907389399.705 61 192.168.2.57 TCP_HIT/200 2172 GET http://www.excite.com/pfp/excite/images/big_logo.gif - NONE/- image/gif
>
> can someone help me on creating a script that will extract all domains
> that has no TCP_DENIED tag to a file with no duplication? i'm not familiar
> with sed, gawk or perl so i need your help on this.
>
> i would like the format to be (from the above example) one domain per
> line:
>
> excite.com
>
There's not a terrible lot of distinction between domain names and
FQDNs. So:
fgrep -v TCP_DENIED access.log | cut -d/ -f4 | cut -d: -f1 | sort -u
gets you all the host names, one per line, with no duplicates. If you want
to pare it down to domains, go play count-the-dots with the output.
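
For the count-the-dots step, here is a rough sketch rather than a definitive answer. It assumes a "domain" is simply the last two dot-separated labels of the host name, which gives the wrong answer for names like www.example.co.uk and for bare IP addresses. The function name hosts_to_domains is made up for illustration, and grep -Fv stands in for fgrep -v.

```shell
# Sketch: reduce host names (one per line on stdin) to their last two
# dot-separated labels, then drop duplicates.
# Assumption: "last two labels" is a good enough definition of a domain.
hosts_to_domains() {
    awk -F. '
        NF >= 2 { print $(NF-1) "." $NF }   # www.excite.com -> excite.com
        NF == 1 { print $1 }                # single-label names pass through
    ' | sort -u
}

# Demo on the sample log line from the question; in practice, run the
# pipeline against access.log instead of the printf.
printf '%s\n' '907389399.705 61 192.168.2.57 TCP_HIT/200 2172 GET http://www.excite.com/pfp/excite/images/big_logo.gif - NONE/- image/gif' |
    grep -Fv TCP_DENIED | cut -d/ -f4 | cut -d: -f1 | hosts_to_domains
```

On the sample line this prints excite.com. Redirect the output to a file to get the list the question asks for.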
D
--
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GAT d- s++: a C++++$ UL++++B+++S+++C++H++U++V+++$ P+++$ L+++ E- W+++(--)$ N++ w++$>--- t+ 5++ X+() R+ tv b++++ DI+++ e- h-@
------END GEEK CODE BLOCK------

Received on Mon Oct 05 1998 - 06:49:32 MDT