WHY NOT PROGRAM THIS WEEK?

Maybe by learning to program early in the semester,
you'll have something to brag about.

1.  There's this awesome language called gawk, which is my favorite
	language in the world.  Totally.  I want to hang out at
	Wesleyan because their three best alumni, as far as I can
	tell, are Joss Whedon, creator of Buffy the Vampire Slayer
	for TV, David Kohan, co-creator of Will and Grace, and Peter
	Weinberger, the W in AWK.  (A is for Al Aho, who teaches
	at Columbia, and K is for Brian Kernighan, whom Princeton
	keeps trying to claim as their own.  The G is for genius,
	as far as I can tell.)

	We can program in gawk by logging onto a UNIX machine and
	learning the editor.  Or we can use my nifty HTML interface.
	It's here:  http://k9.cs.wustl.edu/~loui/gawkterm.html

	And that's actually a program to print "hello world."

	Try running it.
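	Outside the web page, on a UNIX machine, the same program can
	be run straight from the shell.  This is a minimal sketch that
	assumes an `awk` command (gawk or any POSIX awk) is installed.
	The function name `hello` is just a label made up for this
	example.  The `BEGIN` block runs once, before any data is read.

```shell
# hello is a made-up name for this sketch.
# BEGIN runs once before any input is read, so no data is needed.
hello() {
  awk 'BEGIN { print "hello world." }'
}

hello
```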

2.  You can have it print other things.  See if you can make the
	program print things you wouldn't want to say in public.

3.  You can print numbers.  Have it print the numbers 1 through 10.

4.  As you know, computers are good at doing repetitive work like
	this with much less instruction.  The looping construct is:

	for (i=1; i<=10; i=i+1) print i

	Try it.

	Try changing the upper and lower bounds.

	Try printing the even numbers from 20 to 100.
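	A `for` line has three parts: where to start (i=1), the test
	that keeps the loop going (i<=10), and the step taken after
	each pass (i=i+1).  Any of the three can change.  In the sketch
	below, which counts backward from 10 to 1, the function name
	`countdown` is made up:

```shell
# countdown is a made-up name for this sketch.
# Start at 10, keep going while i is at least 1, subtract 1 each pass.
countdown() {
  awk 'BEGIN { for (i = 10; i >= 1; i = i - 1) print i }'
}

countdown
```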

5.  You can make the program do two things in a loop by putting braces
	around the body of the loop:

	for (i=1; i<=10; i=i+1) {
	  printf i".  "
	  print "hello world."
	}

	Try this.  
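	The braces matter.  Without them, only the first statement
	after the `for` line is repeated, and the rest runs once after
	the loop ends.  Here is a sketch to try from a UNIX shell, in
	which the function name `nobraces` is made up:

```shell
# nobraces is a made-up name for this sketch.
# Without braces, only the printf belongs to the loop;
# the print runs once, after the loop has finished.
nobraces() {
  awk 'BEGIN {
    for (i = 1; i <= 3; i = i + 1)
      printf i".  "
    print "hello world."
  }'
}

nobraces
```

	It prints "1.  2.  3.  hello world." on a single line, because
	the loop repeats only the printf.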

6.  If you want the program to do something special when a certain
	condition arises, you can use a conditional statement,
	such as:

	for (i=1; i<=10; i=i+1) {
	  printf i".  "
	  print "hello world."
	  if (i==10) print "goodbye world."
	}

	Here, the "==" tests for equality (a single "=", as in i=i+1,
	assigns a value instead).  You can change the "10" to whatever
	number you want.  Make it say goodbye on the 7th line.
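	"==" is one of several comparison operators.  awk also has
	"!=" (not equal), "<", "<=", ">" and ">=".  Here is a sketch
	that uses ">", with the made-up function name `bigones`:

```shell
# bigones is a made-up name for this sketch.
# The if body runs only on the passes where i > 7 is true.
bigones() {
  awk 'BEGIN {
    for (i = 1; i <= 10; i = i + 1) {
      if (i > 7) print i " is bigger than 7"
    }
  }'
}

bigones
```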

7.  Gawk is pretty good at doing things with data.
	The right hand window is for data that you want to
	feed the program.

	Put ten numbers, each on its own line, in the
	right hand window.

	Now, a gawk program has a beginning part, an end part, and a
	middle part.  Do you see that we have just been using the beginning part?

	This time, we are going to use just the middle and end parts.
	The middle part holds statements that are executed once for
	each line of the data.

	Put "print $1 + $1"

	in the middle part.

	Then run it.  It should double the number on each line.

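	It does this because $1 refers to the first "field" on each line.
	By default, the fields are the words on a line, separated by
	spaces or tabs.  In our case, each line has only one field, and
	that field is a number.  Sometimes this is not true.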
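	From a UNIX shell, the middle part is written as a block in
	braces with nothing in front of it, and awk runs it once for
	each line of input.  In the sketch below, `printf` stands in for
	the right hand window by feeding in three lines of data.  The
	name `double` is made up:

```shell
# double is a made-up name for this sketch.
# The { ... } block with nothing in front of it runs once per input line.
double() {
  printf '3\n10\n21\n' | awk '{ print $1 + $1 }'
}

double
```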

	Try hiding your numbers between words, so that the number is the
	second word in each line, e.g., "go 55 south".

	Now try "print $2 + $2" and see if that works.

	What if you go back to "print $1 + $1"?
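	awk also keeps a count of the fields on the current line in the
	built-in variable `NF`.  The sketch below prints the second field
	and the field count of one line.  The function name `fields` is
	made up:

```shell
# fields is a made-up name for this sketch.
# $2 is the second word on the line; NF is how many words the line has.
fields() {
  echo "go 55 south" | awk '{ print "second field:", $2; print "number of fields:", NF }'
}

fields
```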

8.  Any idea how to sum all of your numbers?

	You need a variable that keeps its value from one line to the
	next, to serve as an accumulator for your sum.

	How about 

	s = s + $1 + $2
	print s

	as your middle part?

	I put " + $1 + $2" because at the moment I don't really care
	what your data looks like, or whether your numbers are in the
	first field or the second.
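	Two things make this work.  A variable you have not used yet
	starts out as zero, so `s` needs no setup.  A word such as "go"
	that does not look like a number counts as 0 when you add it.
	If you want only the grand total instead of a running total,
	move the `print` into the end part, which awk calls `END`.  The
	sketch below uses made-up data and the made-up name `total`:

```shell
# total is a made-up name for this sketch.
# The middle part adds fields 1 and 2 on every line ("go" and "more" count as 0);
# END runs once, after the last line, and prints the final sum.
total() {
  printf 'go 55 south\n10 more\n' | awk '{ s = s + $1 + $2 } END { print s }'
}

total
```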

	In fact, try changing your data so that you have numbers
	and words all over the place.  Maybe one line is "go 55 miles south on 15 then 5 east"
	and another line is "15 is fifteen and 10 is ten".

	Can you sum all the numbers in all the fields?

	A loop will help here.  Assume that there are at most 10 words
	on each line.

	How about 

	for (i=1; i<=10; i=i+1) s = s+$i
	print s

	as your middle part?
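	Instead of assuming at most 10 words, the loop can stop at
	`NF`, awk's built-in count of the fields on the current line.
	An `END` block can then print just the grand total.  The sketch
	below uses the two example lines above, and `sumall` is a made-up
	name:

```shell
# sumall is a made-up name for this sketch.
# NF is the number of fields on the current line, so every field is
# visited and none past the end of the line.
sumall() {
  printf 'go 55 miles south on 15 then 5 east\n15 is fifteen and 10 is ten\n' |
    awk '{ for (i = 1; i <= NF; i = i + 1) s = s + $i } END { print s }'
}

sumall
```

	The numbers are 55, 15 and 5 on the first line and 15 and 10 on
	the second, so it prints 100.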

9.  It's getting late.  I'm not going to help, but I want you to generate
	this table of squares, using a loop.  ("3 * 4" is the way to multiply
	two numbers, by the way.)

	1 1
	2 4
	3 9
	4 16
	5 25
	6 36
	7 49
	8 64
	9 81
	10 100

	Then I want you to generate this table of factorials.
	Ha, that's a bit harder.

	1 1
	2 2
	3 6
	4 24
	5 120
	6 720
	7 5040
	8 40320
	9 362880
	10 3628800

10.  This is much easier.  Modify your program so that if the second number
	is greater than 1000, it also prints "and I would like that in pesos."
	at the end of the same line.

	You should output:

	1 1
	2 2
	3 6
	4 24
	5 120
	6 720
	7 5040 and I would like that in pesos.
	8 40320 and I would like that in pesos.
	9 362880 and I would like that in pesos.
	10 3628800 and I would like that in pesos.

11.  It seems that proper timing requires me to have you do 12 things instead of just 10.
	
	You can print parts of a string using the "substr" function.

	Try printing all the substrings of "hello".

	Here is a program to print all the prefixes:

	BEGIN {
	  for (i=1; i<=length("hello"); i=i+1) print substr("hello",1,i)
	}

	But that doesn't include "ell", which is also a substring.

	You'll need a DOUBLE loop:

	BEGIN {
	  for (i=1; i<=5; i=i+1) {
	    for (j=1; j<=5; j=j+1) {
	      print substr("hello",i,j)
	    }
	  }
	}

	Except this prints several of the suffixes too many times.
	It prints substr("hello",5,1) and substr("hello",5,2),
	but since there is no substring starting at position 5
	with length 2, substr just returns what is left, "o", so
	"o" gets printed again.  In fact, "o" comes out five times,
	once for each value of j.
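	`substr(s, start, len)` returns `len` characters of `s`,
	beginning at position `start` and counting from 1.  If `len`
	runs past the end of the string, you simply get whatever is left.
	In the sketch below, the function name `substrs` is made up:

```shell
# substrs is a made-up name for this sketch.
substrs() {
  awk 'BEGIN {
    print substr("hello", 2, 3)   # "ell": 3 characters starting at position 2
    print substr("hello", 5, 1)   # "o": the last character
    print substr("hello", 5, 2)   # also "o": only 1 character is left
  }'
}

substrs
```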

	Can you fix the j loop so that it doesn't print redundant stuff?

	Sure you can.

12.  Put some new text in the right hand window.

	Name ten people, and say something about them in a sentence.

	For example, "I like Sahil Kumar because he is so gd pleasant."

	We are going to print all the capitalized words in each line:

	{
	  for (i=1; i<=100; i=i+1) {
	    if (substr($i,1,1)=="A") print $i
	    if (substr($i,1,1)=="B") print $i
	    ...
	    if (substr($i,1,1)=="Z") print $i
	  }
	}

	Sorry, but you have to fill in the missing lines (the "...")
	yourself to make it work.

	Ok, there is this thing called a "regular expression test"
	which is just a pattern match.  You can say:

	if (substr($i,1,1) ~ /[A-Z]/) print $i

	which will test for any capital letter!

	Fix your program so that it just has one conditional in the body of
	the loop.

	In fact, regular expressions (regexps) are really powerful.

	Try just 

	if ($i ~ /^[A-Z]/) print $i

	which says (that is what the ^ is for) that the uppercase letter
	has to be at the beginning of the word.
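	To see what the `^` changes, try both tests on a word whose
	capital letter is not at the front.  The word list and the
	function name `capcheck` in this sketch are made up.  `/[A-Z]/`
	matches a capital letter anywhere in the word, and `/^[A-Z]/`
	matches only one at the beginning:

```shell
# capcheck is a made-up name for this sketch.
capcheck() {
  echo "Sahil iPhone hello" | awk '{
    for (i = 1; i <= NF; i = i + 1) {
      if ($i ~ /[A-Z]/)  print $i, "has a capital somewhere"
      if ($i ~ /^[A-Z]/) print $i, "starts with a capital"
    }
  }'
}

capcheck
```

	Only "Sahil" passes the second test.  "iPhone" passes just the
	first, and "hello" passes neither.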


That's it!  You learned an awful lot in this lab, I think.