ALPHA 264/500MHz-4ML2/512MB/SCSI-Wx3iswap (40MB) store-twice test

fields  lines   speedup us(sec) them(sec) avg-len-index(us) avg-len-index(them) size(M) us(swp) them(swp)
3	100000	1.50	4	6	47.0232	47.0232	  9		
3	200000	1.25	8	10	47.0329	47.0329	 19		
3	300000	1.15	13	15	47.0413	47.0413	 28		
3	400000	1.12	17	19	47.0402	47.0402	 38		
3	500000	1.14	21	24	47.0301	47.0301	 47		
3	600000	1.04	27	28	47.0172	47.0172	 56		
3	700000	1.10	30	33	47.0192	47.0192	 66		
3	800000	1.14	36	41	47.0081	47.0081	 75		
3	900000	1.13	39	44	46.9988	46.9988	 85		
6	100000	0.86	7	6	83.1966	83.1966	 17		
6	200000	1.15	13	15	83.2014	83.2014	 33		
6	300000	1.05	19	20	83.2083	83.2083	 50		
6	400000	1.12	25	28	83.2047	83.2047	 67		
6	500000	1.09	33	36	83.1972	83.1972	 83		
6	600000	1.07	40	43	83.1847	83.1847	100		
6	700000	1.09	47	51	83.1876	83.1876	116		
6	800000	1.09	55	60	83.1778	83.1778	133		
6	900000	1.07	71	76	83.1695	83.1695	150		
9	100000	1.11	9	10	132.393	132.393	 26		
9	200000	1.05	19	20	132.398	132.398	 53		
9	300000	1.08	26	28	132.349	132.349	 79		
9	400000	1.05	38	40	132.346	132.346	106		
9	500000	1.07	46	49	132.358	132.358	132		
9	600000	1.07	54	58	132.318	132.318	159		
9	700000	1.08	65	70	132.316	132.316	185		
9	800000	0.89	951	845	132.294	132.294	212		
9	900000	0.95	7838	7450	132.279	132.279	238		
12	100000	1.09	11	12	186.857	186.857	 37		
12	200000	1.09	22	24	186.824	186.824	 75		
12	300000	1.09	33	36	186.825	186.825	112		
12	400000	1.09	45	49	186.829	186.829	149		
12	500000	1.11	57	63	186.825	186.825	187		
12	600000	1.09	68	74	186.823	186.823	224		
12	700000	0.93	3880	3611	186.82	186.82	262		
12	800000	0.96	25276	24254	186.81	186.81	299		
12	900000	1.00	46945	46726	186.801	186.801	336		
15	100000	1.08	13	14	209.739	209.739	 42		
15	200000	1.12	25	28	209.647	209.647	 84		
15	300000	1.08	37	40	209.607	209.607	126		
15	400000	1.10	51	56	209.613	209.613	168		
15	500000	1.09	64	70	209.6	209.6	210		
15	600000	1.00	136	136	209.545	209.545	251		
15	700000	1.00	7574	7590	209.546	209.546	293		
15	800000	0.99	38517	37968	209.506	209.506	335		
15	900000	0.98	71315	69843	209.48	209.48	377		

f	100k	200k	300k	400k	500k	600k	700k	800k	900k
3	1.50	1.25	1.15	1.12	1.14	1.04	1.10	1.14	1.13
6	0.86	1.15	1.05	1.12	1.09	1.07	1.09	1.09	1.07
9	1.11	1.05	1.08	1.05	1.07	1.07	1.08	0.89	0.95
12	1.09	1.09	1.09	1.09	1.11	1.09	0.93	0.96	1.00
15	1.08	1.12	1.08	1.10	1.09	1.00	1.00	0.99	0.98
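
the speedup column and the per-field grid above look like plain them/us ratios rounded to two places. a minimal sketch of that arithmetic, assuming that reading is right (the function names are made up for illustration, and the rows are copied from the table above):

```python
# sketch: derive the speedup column and the fields-by-lines grid
# from (fields, lines, us_sec, them_sec) rows.
# speedup() and summarize() are illustrative names, not from the original scripts.

def speedup(us_sec, them_sec):
    """Speedup of our build over the normal one: them / us, 2 decimal places."""
    return round(them_sec / us_sec, 2)

def summarize(rows):
    """Pivot rows into {fields: {lines: speedup}}."""
    grid = {}
    for fields, lines, us_sec, them_sec in rows:
        grid.setdefault(fields, {})[lines] = speedup(us_sec, them_sec)
    return grid

# a few rows copied from the ALPHA 264 table
rows = [
    (3, 100000, 4, 6),
    (6, 100000, 7, 6),
    (9, 800000, 951, 845),
    (12, 900000, 46945, 46726),
]

grid = summarize(rows)
for fields in sorted(grid):
    print(fields, grid[fields])
```

a value below 1.00 means our version was slower than normal gawk on that run.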

SUN ULTRA 10/333MHz-2ML2/128MB/SCSI store-twice test




ALPHA 164/533MHz-2ML2/256MB/SCSI-UW2 (60MB?) store-twice test

fields  lines   speedup us(sec) them(sec) avg-len-index(us) avg-len-index(them) size(M) us(swp) them(swp)
3	100000	1.12	8	9	47.0232	47.0232	  9	10848	11760
3	200000	1.13	15	17	47.0329	47.0329	 19	10816	11752
3	300000	1.23	22	27	47.0413	47.0413	 28	10816	11640
3	400000	1.13	30	34	47.0402	47.0402	 38	10816	11616
3	500000	1.02	40	41	47.0301	47.0301	 47	10816	11616
3	600000	13.60	47	639	47.0172	47.0172	 56	10952	38696
3	700000	77.29	58	4483	47.0192	47.0192	 66	10952	81128
3	800000	185.72	65	12072	47.0081	47.0081	 75	10832	123304
3	900000	390.79	87	33999	46.9988	46.9988	 85	10816	172480
6	100000	1.27	11	14	83.1966	83.1966	 17	10816	80
6	200000	1.16	25	29	83.2014	83.2014	 33	10816	80
6	300000	9.76	33	322	83.2083	83.2083	 50	10672	24072
6	400000	162.61	44	7155	83.2047	83.2047	 67	10672	106480
6	500000	432.53	58	25087	83.1972	83.1972	 83	10672	189600
6	600000	43.08	1251	53896	83.1847	83.1847	100	46768	276672
6	700000	.	5625		83.1876		116	90048	
6	800000	.	14346		83.1778		133	139088	
6	900000	.	45621		83.1695		150	179008	
9	100000	1.13	15	17	132.393	132.393	 26	10672	7568
9	200000	1.13	31	35	132.398	132.398	 53	10672	7584
9	300000	204.08	50	10204	132.349	132.349	 79	10672	119192
9	400000	373.78	86	32145	132.346	132.346	106	15520	230848
9	500000	49.99	1986	99274	132.358	132.358	132	74624	354976
9	600000	.	8526		132.318		159	133792	
9	700000	.	20386		132.316		185	193296	
9	800000	.	65654		132.294		212	253720	
9	900000	.	74223		132.279		238	312296	
12	100000	1.05	19	20	186.857	186.857	 37	11896	11640
12	200000	45.85	39	1788	186.824	186.824	 75	11872	59640
12	300000	305.82	57	17432	186.825	186.825	112	11848	203760
12	400000	26.08	3141	81929	186.829	186.829	149	61360	348624
12	500000	.	10194		186.825		187	132096	
15	100000	1.14	21	24	209.739	209.739	 42	11864	11784
15	200000	102.09	45	4594	209.647	209.647	 84	11864	93840
15	300000	461.33	64	29525	209.607	209.607	126	11848	261064
15	400000	.	5513		209.613		168	80280	

f	100k	200k	300k	400k	500k	600k	700k	800k	900k
3	1.12	1.13	1.23	1.13	1.02	13.60	77.29	185.72	390.79
6	1.27	1.16	9.76	162.61	432.53	43.08	.	.	.
9	1.13	1.13	204.08	373.78	49.99	.	.	.	.
12	1.05	45.85	305.82	26.08	.	 	 	 	 
15	1.14	102.09	461.33	.	 	 	 	 	 

DUAL PPRO 2x180MHz-256kL2/144MB/SCSI-W store-twice test

fields  lines   speedup us(sec) them(sec) avg-len-index(us) avg-len-index(them) size(M) us(swp) them(swp)
1	100000	1.09	11	12	41.5803	41.5803	  8	10256	10944
1	200000	1.00	20	20	41.5843	41.5843	 17	10256	10944
1	300000	1.04	28	29	41.5941	41.5941	 25	10488	10944
1	400000	1.03	36	37	41.5941	41.5941	 33	11108	10944
1	500000	1.04	46	48	41.5854	41.5854	 42	12424	10864
1	600000	1.06	72	76	41.5694	41.5694	 50	6052	324
1	700000	1.06	84	89	41.571	41.571	 58	6452	1248
1	800000	1.06	96	102	41.5601	41.5601	 66	7064	1784
1	900000	1.06	108	115	41.5516	41.5516	 75	7564	1604
3	100000	1.06	16	17	47.0232	47.0232	  9	7752	10868
3	200000	1.07	28	30	47.0329	47.0329	 19	7752	10868
3	300000	1.07	40	43	47.0413	47.0413	 28	7752	10832
3	400000	1.04	56	58	47.0402	47.0402	 38	8144	10740
3	500000	1.03	75	77	47.0301	47.0301	 47	9908	10960
3	600000	1.06	98	104	47.0172	47.0172	 56	8108	9984
3	700000	1.10	113	124	47.0192	47.0192	 66	9948	13108
3	800000	1.14	173	197	47.0081	47.0081	 75	22884	24276
3	900000	1.87	1557	2909	46.9988	46.9988	 85	81880	63360
4	100000	1.07	15	16	69.7447	69.7447	 14	11020	10752
4	200000	1.03	33	34	69.7518	69.7518	 28	10860	10700
4	300000	1.08	50	54	69.7606	69.7606	 42	11768	10864
4	400000	1.07	73	78	69.7576	69.7576	 56	13376	11332
4	500000	1.41	1924	2716	69.7491	69.7491	 70	51496	51688
4	600000	1.20	11866	14191	69.736	69.736	 84	143188	92348
4	700000	1.74	25044	43629	69.739	69.739	 98	176480	130388
4	800000	.	46327		69.7291		112	204740	
6	100000	1.09	23	25	83.1966	83.1966	 17	9908	10720
6	200000	1.07	44	47	83.2014	83.2014	 33	10032	11000
6	300000	1.10	62	68	83.2083	83.2083	 50	10916	11192
6	400000	2.52	1205	3041	83.2047	83.2047	 67	94172	75972
6	500000	2.05	12630	25870	83.1972	83.1972	 83	131952	106656
6	600000	2.34	29506	69123	83.1847	83.1847	100	165676	148560
6	700000	2.43	57020	138341	83.1876	83.1876	116	203660	191480

f	100k	200k	300k	400k	500k	600k	700k	800k	900k
1	1.09	1.00	1.04	1.03	1.04	1.06	1.06	1.06	1.06
3	1.06	1.07	1.07	1.04	1.03	1.06	1.10	1.14	1.87
4	1.07	1.03	1.08	1.07	1.41	1.20	1.74	.	 
6	1.09	1.07	1.10	2.52	2.05	2.34	2.43	 	 

AMD K7	900MHz-256kL2/128MBPC100/IDE(UDMA33)x2iswap store-twice test

fields  lines   speedup us(sec) them(sec) avg-len-index(us) avg-len-index(them) size(M) us(swp) them(swp)
3	100000	0.03	298	9	47.0232	47.0232	  9	47268	9200
3	200000	0.00	6299	12	47.0329	47.0329	 19	212264	8984
3	300000	0.00	16511	15	47.0413	47.0413	 28	368832	8500

f	100k	200k	300k	400k	500k	600k	700k	800k	900k
3	0.03	0.00	0.00	 	 	 	 	 	 


To: khk1@express.cec.wustl.edu, mwa@cs, mlp2@cec
Subject: another good number...
Cc: arnold@skeeve.com


well, alphadog finally choked on the benchmark... we finished in under
2 hrs:  5625sec to do 700,000 lines of 83 chars (on avg), swapping 90M
past 256M real.  the old gawk churned my poor disks nonstop since March
6, finally reaching the 512M swap limit.

but 2hrs vs. 2weeks is a big big deal. 

	(* esp since this is mechanical work -- disk -- not just
	cpu heat)

here are the numbers i have so far (this is the long-index benchmark,
where we just read in lines and store them twice as array indexes):

we should start collecting them for our report -- i'll be filling in 
all of the trivial computation numbers just so we can get as complete
a table as possible.
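
for anyone who has not seen the benchmark: the idea described above (read lines, store each one twice as an array index) can be sketched roughly like this. the real test is a gawk script that is not reproduced here, so everything below is an assumption: that "twice" means two separate arrays, that the index is the first N fields of the line, and that avglength is the mean index length. store_twice is a made-up name.

```python
import io

def store_twice(stream, nfields):
    """Read lines, build a key from the first nfields whitespace-separated
    fields, and store that key in two separate tables.
    (assumption: 'store twice' means two arrays; the original may differ.)"""
    first, second = {}, {}
    total_len = 0
    count = 0
    for line in stream:
        key = " ".join(line.split()[:nfields])
        first[key] = 1
        second[key] = 1
        total_len += len(key)
        count += 1
    # average key length, used here as a cheap checksum on the work done
    avg_len = total_len / count if count else 0.0
    return count, avg_len, first, second

# tiny stand-in input; the real runs use 100000 to 900000 lines
sample = io.StringIO("a bb ccc dddd\nee f gg hhh\n")
count, avg_len, first, second = store_twice(sample, 3)
print(f"avglength={avg_len}")
print(f"count={count}")
```

the point of the job is that the tables grow until they no longer fit in real memory, so the timing mostly measures how badly each version pages.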

oh yeah -- please don't do anything on alphadog or bdog unless the load
is 0.00.

		ALPHA/533-256M/SCSIUW2 BENCHMARKS 


(times are in seconds)

		MEDIUM INDEXES

ours	normal	   speedup
47	639	   13.60x
avglength=47.0172
count=600000

58	4483	   77.29x
avglength=47.0192
count=700000

65	12072	  185.72x
avglength=47.0081
count=800000

87	33999	  390.79x
avglength=46.9988
count=900000



1251	53896	   43.08x
avglength=83.1847
count=600000

5625	dies(2wk)	>144x (job incomplete)
avglength=83.1876
count=700000


		LONG INDEXES

ours	normal	speedup
19	20	    1.05x
avglength=186.857
count=100000

39	1788	   45.85x
avglength=186.824
count=200000

57	17432	  305.82x
avglength=186.825
count=300000

3141	81929	   26.08x
avglength=186.829
count=400000



21	24	    1.14x
avglength=209.739
count=100000

45	4594	  102.09x
avglength=209.647
count=200000

64	29525	  461.33x
avglength=209.607
count=300000

5513	dies(27h8m)	>18x (job incomplete)
avglength=209.613
count=400000

		SMALLER JOBS


ours	normal	    speedup
8	9	    1.12x
avglength=47.0232
count=100000

15	17	    1.13x
avglength=47.0329
count=200000

22	27	    1.23x
avglength=47.0413
count=300000

30	34	    1.13x
avglength=47.0402
count=400000

40	41	    1.02x
avglength=47.0301
count=500000

11	12	    1.09x
avglength=83.1966
count=100000

25	27	    1.08x
avglength=83.2014
count=200000

33	283	    8.58x
avglength=83.2083
count=300000

44	6737	  153.11x
avglength=83.2047
count=400000


		PPRO-180xDUAL-144M/IDE33

ours    normal  speedup
72      76          1.06x
avglength=41.5694
count=600000

84      89          1.06x
avglength=41.571
count=700000

96      102         1.06x
avglength=41.5601
count=800000

108     115         1.06x
avglength=41.5516
count=900000

11866   14191       1.20x
avglength=69.736
count=600000

25044   43629       1.74x
avglength=69.739
count=700000

46327           x
avglength=69.7291
count=800000



> you should talk to kevin krouse, khk1@cec.wustl.edu, to see what he
> did.  basically, he calls our version of malloc, mmmalloc from
> array.c.  in doug lea's new release of malloc, he names it
> independent_comalloc.  you basically send an array of sizes to malloc,
> rather than a single size.  it tries to return a location that fits the
> sum of the sizes, and allocates them individually so that they can be
> freed separately.
>
> the idea is to localize dynamically allocated objects (we see 10x in
> c++ hash tables), to prevent chasing pointers on disk when you go to
> virtual memory.  we also sort the entries, which gives 2x when the hash
> chain is fully traversed (but that's just 1% to 10% of the time).  and
> we put the data with the index (this can also help avoid paging).  more
> importantly, on the alpha, we get much smaller memory footprints, like
> 30% smaller, instead of 1% smaller on pentiums.  we are not sure why,
> but this often causes speedups of 50x-450x when our program stays in
> memory and the normal version starts paging heavily.
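
the co-allocation idea in the quoted note (one request for an array of sizes, one contiguous region, separate pieces handed back) can be illustrated with a toy model. this is not the mmmalloc code from array.c and not doug lea's implementation; comalloc below is a hypothetical helper that only shows the layout, and it does not model freeing pieces individually.

```python
def comalloc(sizes):
    """Toy co-allocation: reserve one contiguous buffer big enough for
    sum(sizes) and hand back one adjacent slice per requested size.
    (illustrative only; a real allocator returns separately freeable chunks.)"""
    arena = bytearray(sum(sizes))
    view = memoryview(arena)
    pieces = []
    offset = 0
    for size in sizes:
        pieces.append((offset, view[offset:offset + size]))
        offset += size
    return arena, pieces

# hypothetical hash-table entry: node header, index string, value
arena, pieces = comalloc([16, 47, 8])
offsets = [off for off, _ in pieces]
print(offsets)

# writing through one piece lands in the shared region, next to its neighbours
pieces[1][1][:5] = b"hello"
print(bytes(arena[16:21]))
```

because the three pieces of an entry sit next to each other, touching one of them tends to bring the others in on the same page, which is the locality effect the note is after.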
From loui@cs.wustl.edu Sat Apr  7 00:21 CDT 2001
From: "R. Prescott Loui" 
Date: Sat, 7 Apr 2001 00:21:52 -0500 (CDT)
To: adl4@cec.wustl.edu, khk1@cec.wustl.edu, mdeters@cs.wustl.edu,
        mlp2@cec.wustl.edu, mwa@cs.wustl.edu, plezbert@cs.wustl.edu
Subject: the alphadog vs. k9 numbers
Cc: loui@cs.wustl.edu


actually, i am no longer concerned that the alpha speedups are out-of-line
due to our failure to upgrade the old (4.2) release.  if you look at the
source of the speedup, it is because the footprint is much smaller than
any of the other machines.  the checksum (avg length) shows that the
right computation is really being done.  it's just that the alpha tends
to have 50%-75% of the memory use.  why?  i dunno.  but on three
different machines, i did the 6x600000 test:

http://www.cs.wustl.edu/~loui/gawkreport.html


fields  lines   speedup us(sec) them(sec) avg-len-index(us) avg-len-index(them) size(M) us(swp) them(swp)

k7/900MHz/128MB-133/UDMA33:
6       600000  1.00    33674   33771   83.1847 83.1847 100     172300  170472

ppro/2x180MHz/144MB-33/SCSI-W:
6       600000  2.34    29506   69123   83.1847 83.1847 100     165676  148560

alpha164/533MHz/256MB-66/SCSI-UW:
6       600000  43.08   1251    53896   83.1847 83.1847 100     46768   276672

note:  the speeds are ballpark the same, within a factor of 2.5x, except
	for the one run (1251sec) where most of the job fit in memory.

note:  most of the swap use is the same, within a factor of 2x, except for
	the very impressive compaction (46768KB), about 1/3 of the use
	compared to "nominal".

note:  on the alpha, normal gawk took almost as long as on the inferior 
	machine and used much more swap.  on the ppro2x180, our version
	was faster than even the mighty k7/900.  it proves that we are
	very vm-bound, and even the 4% smaller footprint (plus locality,
	comparing 165676 to 172300) makes a 15% speedup (comparing
	33674 to 29506 sec).
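
for checking the ratios in these notes, here is a small sketch that recomputes them from the three 6x600000 rows above. the dictionary layout and the pct_smaller helper are made up for illustration; the numbers are copied from the rows.

```python
# machine -> (us_sec, them_sec, us_swp, them_swp), copied from the rows above
runs = {
    "k7":    (33674, 33771, 172300, 170472),
    "ppro":  (29506, 69123, 165676, 148560),
    "alpha": (1251, 53896, 46768, 276672),
}

def pct_smaller(a, b):
    """How much smaller a is than b, in percent."""
    return 100.0 * (b - a) / b

# our swap use on the ppro compared to our swap use on the k7
footprint = pct_smaller(runs["ppro"][2], runs["k7"][2])

# how much longer our version took on the k7 than on the ppro, in percent
time_gain = 100.0 * (runs["k7"][0] / runs["ppro"][0] - 1)

# our swap use on the alpha as a fraction of our swap use on the ppro
alpha_vs_ppro_swap = runs["alpha"][2] / runs["ppro"][2]

print(round(footprint, 1), round(time_gain, 1), round(alpha_vs_ppro_swap, 2))
```

this prints 3.8 14.1 0.28.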

so why no speedup on k9?  still no idea.  i think we should have adl4
look at mlp2's code to see if he used alpha-specific optimizations!!!

i can't wait to get the pIIIeb2x1000MHz/1GB/UDMA66 machine munching on
these jobs.  i really just want to see if it also has the k9 non-speedup
problem.  

i wonder if khk1 could take a look at k9/~loui/gawk/gawk* to see if i
did the make correctly.  perhaps we did something wrong to be no faster
on the k9 than the ppro.  if you look at the "them" times (33771 vs 69123),
normal gawk seems to be speeding along twice as fast on k9.  we know since the job is
disk bound that this can't be because of the processor.  it isn't because
of the footprint either.  and they both have very recent redhat linux.
so it isn't the speculative prefetching that the job candidate thought
might be responsible.  i wonder if it is because of the big 2MB disk cache.
perhaps i should swap a small-cache IDE disk into k9 and rerun the test.