Here is an example: Suppose you were asked to take a standard Web Server log file for one month and produce a list of hits per host and the number of unique hosts hitting the site. Each entry in the log file looks like this:
fw-gb-1.somehost.com - - [01/Mar/1999:17:09:58 +0000] "GET / HTTP/1.0" 200 6409 "-" "Mozilla/4.5 [en] (X11; I; Linux 2.0.36 i686)"
Now, how do you quickly write a program to solve this problem? You could use Perl, and that is a perfectly acceptable answer, but consider how easy this is to do in C++:
#include <iostream>
#include <string>
#include <map>
using namespace std;

typedef map<string, int>::const_iterator CMI;

int main(int argc, char *argv[])
{
  map<string, int> counter;
  int total = 0;
  // Note: eof() is set only after a read fails, so input that ends
  // in a newline yields one final, empty entry.
  while(!cin.eof())
  {
    string entry;
    string ignore;
    cin >> entry;
    getline(cin, ignore); // Ignore everything but the first word
    if(!counter[entry]) total++; // A new hostname entry
    counter[entry]++;
  }
  cout << "Unique hosts: " << total << endl;
  cout << endl << "Here is a list of hits per host:" << endl;
  for(CMI l = counter.begin(); l != counter.end(); ++l)
    cout << l->second << ": " << l->first << endl;
}
Not only was that simple, but it also runs quickly and works well.
I compiled this (g++ cnt.cc -o cnt) on a Linux box using egcs-1.1 and ran it against my Web Server log file for February 1999. Here is a partial output:
# cat netwaysglobal-logs/access_log|./cnt
Unique hosts: 351
Here is a list of hits per host:
1:
3: 131.104.82.154
3: 131.96.156.174
1: 137.51.82.43
7: 144.207.57.1
1: 146.115.62.35
6: 155.74.112.126
1: 159.148.84.166
9: 159.218.12.43
1: 164.166.137.70
94: 170.65.25.14
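The first entry in the listing, a count of 1 with no host name, most likely comes from the read loop: cin.eof() becomes true only after a read has failed, so when the log ends with a newline the loop runs one extra time and records an empty string. Below is a minimal sketch of a variant that tests the read itself. The function name count_hosts and the line-by-line parsing through std::istringstream are illustrative choices, not part of the original program, and the host is assumed to be the first whitespace-delimited field of each line.

```cpp
#include <iostream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (not from the original program): counts hits per host.
// Assumes the host name is the first whitespace-delimited field of each line.
std::map<std::string, int> count_hosts(std::istream& in)
{
    std::map<std::string, int> counter;
    std::string line;
    while (std::getline(in, line)) {      // stops as soon as a read fails
        std::istringstream fields(line);
        std::string host;
        if (fields >> host)               // blank lines produce no entry
            ++counter[host];              // operator[] starts new keys at 0
    }
    return counter;
}
```

Called as count_hosts(std::cin), this makes the separate total counter unnecessary: the number of unique hosts is the size() of the returned map.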
Here are some links to other sites and resources for generic programming: