As always, if you can't figure out how to do something, raise your hand. Also, when you are done, call over the instructor and show what you have before logging off.
You might remember you can use perl0 in my subman (e.g., http://www.cs.wustl.edu/~loui/513f03/subman.cgi?manterm=perl0&tagterm=split) and you can look at the examples in http://www.cs.wustl.edu/~loui/363s04/4/.
First, write a program in gawk which takes lines from stdin, then reports the total number of lines, total number of words, and total number of bytes. This is essentially the wc program. Then do the same in perl. Now show which is faster by timing each one on a very large file. You might try repeating the task 100 times if your run time is too short to measure.
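As a starting point for the gawk half of the first task, here is a minimal sketch, not the expected answer. The names wc_awk, repeat_n, and big.txt are made up for illustration. It assumes LC_ALL=C so that length() counts bytes rather than characters, and it assumes every line ends with a newline (a final line without one is overcounted by one byte). It sticks to POSIX awk features, so it should behave the same under gawk.

```shell
# wc_awk (hypothetical name): print "lines words bytes" for stdin or the named files.
wc_awk() {
  LC_ALL=C awk '
    {
      words += NF               # NF = number of whitespace-separated fields
      bytes += length($0) + 1   # +1 for the newline awk removed from the line
    }
    END { print NR, words + 0, bytes + 0 }   # +0 prints 0 on empty input
  ' "$@"
}

# repeat_n (hypothetical name): run a command N times, discarding its output,
# so that a very short run time becomes long enough to measure.
repeat_n() {
  n=$1; shift
  i=0
  while [ "$i" -lt "$n" ]; do
    "$@" > /dev/null
    i=$((i + 1))
  done
}

# Possible usage in bash, where big.txt stands for any large file:
#   time repeat_n 100 wc_awk big.txt
```

Checking the three numbers against the real wc on the same file is a quick sanity test before you start timing anything.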
Second, write a program which takes lines from a file, then records the frequency of each word (after stripping all non-alpha characters).
So if the input file is:
this is not the entire file at all! this is not the entire file either! this is not the entire file yet! this completes the entire file!!!
it will have output
4 this
4 the
4 file
4 entire
3 not
3 is
1 yet
1 either
1 completes
1 at
1 all
And now do it in perl. Compare which is faster and by how much.
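One possible shape for the gawk version of the word-frequency task, offered as a sketch under a few assumptions. The name word_freq is made up. "Stripping" is read here as deleting every character that is neither a letter nor whitespace. Case is left alone, since the sample is all lowercase. The ordering comes from sort -nr, which is only one way to get the order the sample shows (count descending, ties in reverse alphabetical order).

```shell
# word_freq (hypothetical name): print "count word" for each distinct word.
word_freq() {
  LC_ALL=C awk '
    {
      gsub(/[^A-Za-z \t]/, "")      # delete non-letters; awk re-splits $0 afterwards
      for (i = 1; i <= NF; i++)
        count[$i]++                 # associative array keyed by word
    }
    END {
      for (w in count)
        print count[w], w           # array order is unspecified, so sort below
    }
  ' "$@" | LC_ALL=C sort -nr
}

# Possible usage, where sample.txt is a placeholder file name:
#   word_freq sample.txt
```

Because the sort runs outside awk, you may want to decide whether it counts toward the run time when you compare against perl.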
11 e 9 th 8 s 7 is 5 le 5 he 5 e 4 ti 4 t 4 re 4 nt 4 ir 4 il 4 hi 4 fi 4 en 4 ! 4 t 4 f 3 ot 3 no 3 n 3 i 2 et 2 !! 2 a 1 ye 1 te 1 t! 1 r! 1 pl 1 om 1 mp 1 ll 1 l! 1 it 1 es 1 er 1 ei 1 e! 1 co 1 at 1 al 1 y 1 c
Now do it again in perl and compare the run-times on a large file.