Fall '09                                        STA 3163
                                               Statistical Methods I

Text: 1) Applied Regression Analysis and Other Multivariable Methods, by Kleinbaum, D. G., Kupper, L. L., Nizam, A., and Muller, K. E., 4th Edition, Duxbury Press
2) SAS / STAT User's Guide, Vol 1 and 2. (4th Edition)

Instructor : Dr. Sen
Office No. : 14/2706
Phone No. : 620 - 3724
Office Hr. : 3 - 4 P.M. on Mondays and Wednesdays, 2:30 - 3:30 P.M. on Tuesdays and Thursdays, and by appointment


Week                                Subject Material
1 - 3                     Review: Chapters 1 - 3, Introduction to SAS.
4 - 9                     Regression Analysis: Linear Regression, Prediction: Chapters 4 - 7.
10 - 15                 Analysis of Variance: One-way and two-way classification, multiple comparison methods, analysis for some standard designs: Chapters 17 - 19.


Grading: Final grades will be computed from a number of assignments, a midterm, and a final exam. The assignments will be graded mainly on interpretation of results. You will use SAS for most assignments, so you must get acquainted with it by the end of the 3rd week. All assignments must be typed on regular paper; you may cut and paste to attach computer output. I do not need to see your command statements, but I do need to see the results from the output. Output files should include only the information pertinent to the question. I will not accept any late assignments. If you have trouble running your program, contact me or ask for help. 45% of your grade will come from assignments, 25% from the midterm, and the remaining 30% from the final exam. The exams are open book and open notes.

Midterm:                         October 22 (Thursday)

Last day of classes:          December 4 (Friday)

Final Exam:                     December 08 (Tuesday) 3:00 - 4:50 P.M.

Holidays:                        Sept. 7 (Mon.), Nov. 11 (Wed.), Nov. 26-28 (Thur. - Sat.)

Last Day to Withdraw:    November 6 (Friday), 2009


Dr. Sen                                  Assignment # 1
Fall '09                                   STA 3163                                                                Due on 09/03/09
Total: 20 points

For each problem, show all work. To show a probability, shade the region in question. For a confidence interval, state the formula, and for a testing problem, write the proper steps. You may use a calculator, but you do not need a computer for this assignment.


From chapter 3 # 3, 4, 5, 6, 9, 10.

 


Dr. Sen                                  Assignment # 2
Fall '09                                   STA 3163                                                                Due on 09/10/09
Total 50 points

Read Chapters 1 - 3 from text. Then do the following problems.

1. Consider only the SBP (systolic blood pressure) data given in #2 on page 65.
a) Make class intervals of width 5.
b) Draw a relative frequency histogram for the classes.
c) Draw a cumulative relative frequency polygon for the data.
d) Calculate the mean and the standard deviation for the data.

Now assume that the data above follow a normal distribution with a mean of 144 and a standard deviation of 14.
e) What proportion of the scores will exceed 177?
f) What proportion will be less than 126?
g) What proportion will be between 126 and 168?
h) What is the percentile rank of a score of 179?
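
As a cross-check on the table lookups for parts e) - h), the normal probabilities can be sketched in Python with the standard library. This is only an illustration under the stated model (mean 144, standard deviation 14); the variable names are made up for this sketch, and it does not replace the shaded-region work the assignment asks for.

```python
from statistics import NormalDist

# Model stated in the problem: scores ~ Normal(mean = 144, sd = 14).
sbp = NormalDist(mu=144, sigma=14)

p_above_177 = 1 - sbp.cdf(177)            # part e): upper-tail area
p_below_126 = sbp.cdf(126)                # part f): lower-tail area
p_between = sbp.cdf(168) - sbp.cdf(126)   # part g): area between the two scores
pct_rank_179 = 100 * sbp.cdf(179)         # part h): percent of scores below 179

print(f"P(X > 177)        = {p_above_177:.4f}")
print(f"P(X < 126)        = {p_below_126:.4f}")
print(f"P(126 < X < 168)  = {p_between:.4f}")
print(f"Percentile rank of 179 = {pct_rank_179:.2f}")
```

Each line corresponds to one z-score lookup, z = (x - 144) / 14, so the printed values should agree with the normal table to rounding.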

2. If we treat the data given in #1 as a population and take a sample of size 5, find the sampling distribution of the sample mean.

3. Problems # 3.11, 3.13, 3.14, and 3.15 from the text, pages 32 and 33.


Dr. Sen                                  Assignment # 3
Fall '09                                   STA 3163                                                                Due on 09/17/09

Total 30 points

1. The lifetimes (in years) of ten automobile batteries of a certain brand are
2.4 1.9 2.0 2.1 1.8 2.3 2.1 2.3 1.7 2.0

a. Calculate the mean and the standard deviation for the data.

b. Estimate the mean lifetime, using a 95% confidence interval.

c. If the manufacturer of the batteries guarantees the average lifetime of the batteries to be at least 2.0 years, do the data support their claim at the 1% significance level?

d. What is the p-value of the test?
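
The arithmetic behind parts a - c can be sketched in Python as follows. This is a hedged illustration, not the required write-up: the critical value 2.262 is assumed to be the t-table entry for 9 degrees of freedom with upper-tail area 0.025, and the choice of hypotheses, the 1% critical value, and the p-value are left to the testing steps.

```python
from math import sqrt
from statistics import mean, stdev

lifetimes = [2.4, 1.9, 2.0, 2.1, 1.8, 2.3, 2.1, 2.3, 1.7, 2.0]

n = len(lifetimes)
xbar = mean(lifetimes)
s = stdev(lifetimes)          # sample standard deviation (divisor n - 1)
se = s / sqrt(n)              # standard error of the sample mean

# Assumption: t-table value for df = 9 and upper-tail area 0.025.
T_CRIT_95 = 2.262
ci_95 = (xbar - T_CRIT_95 * se, xbar + T_CRIT_95 * se)

# Test statistic against the guaranteed mean of 2.0 years.
MU_0 = 2.0
t_stat = (xbar - MU_0) / se

print(f"mean = {xbar:.3f}, sd = {s:.4f}, se = {se:.4f}")
print(f"95% CI for the mean: ({ci_95[0]:.3f}, {ci_95[1]:.3f})")
print(f"t statistic (df = {n - 1}): {t_stat:.3f}")
```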

2. A pollution-control inspector suspected that a riverside community was releasing semi-treated sewage into a river and that this was consequently changing the level of dissolved oxygen in the river. To check this, he drew 5 randomly selected specimens of river water at a location above the town and another 5 specimens at a location below the town. The dissolved oxygen readings, in parts per million, are given in the accompanying table. Do the data provide sufficient evidence to indicate a difference in mean oxygen content between locations above and below the town at the 5% significance level?

Above town    | 4.8   5.2   5.0   4.9   5.1
___________ | _________________________________
Below town     | 5.0   4.7   4.9   4.8   4.9

Find the p-value of the test.
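
One way to organize the two-sample calculation is sketched below in Python. It assumes equal population variances (a pooled t test), which is an assumption of this sketch rather than something the problem states; the critical value and the p-value still come from the t table with the degrees of freedom shown.

```python
from math import sqrt
from statistics import mean, variance

above = [4.8, 5.2, 5.0, 4.9, 5.1]
below = [5.0, 4.7, 4.9, 4.8, 4.9]

n1, n2 = len(above), len(below)
df = n1 + n2 - 2

# Pooled variance: assumes the two locations share a common variance.
sp2 = ((n1 - 1) * variance(above) + (n2 - 1) * variance(below)) / df
se_diff = sqrt(sp2 * (1 / n1 + 1 / n2))

t_stat = (mean(above) - mean(below)) / se_diff

print(f"means: above = {mean(above):.2f}, below = {mean(below):.2f}")
print(f"pooled variance = {sp2:.4f}, se of difference = {se_diff:.4f}")
print(f"t statistic = {t_stat:.3f} on {df} degrees of freedom")
```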

3. Consider the data given in problem # 5.11 p. 79 in your text.
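Plot the data using SAS for Gas type. Follow the instructions below.

/* Read gas type (character), rate, and weight */
Data two;
Input gas $ rate weight;
Cards;
a 3.85 4.0
b.............
;
/* Plot both variables against gas type on one set of axes */
proc plot;
plot rate*gas = '*' weight*gas = '-' / overlay;
run;
data;
run;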
These commands will give you a nice graph on a single page.


Dr. Sen                                  Assignment # 4
Fall '09                                   STA 3163                                                                Due on 09/24/09

Total 30 points

1. A regional IRS auditor ran a test on a sample of returns filed by March 15 to determine whether the average refund for taxpayers is larger this year than last year. Sample data are shown here for a random sample of 100 returns for each year.
                   Last year                        This year
_________________________________________________
Mean              320                                410
Variance         300                                350
Sample size     100                                100
_________________________________________________

Do the data support the claim at the 5% significance level?
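
With samples of size 100, a large-sample z test is a natural way to set this up. The Python sketch below uses only the summary statistics in the table; treating the test as one-sided (this year larger than last year) follows the wording of the claim, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the table.
mean_last, var_last, n_last = 320, 300, 100
mean_this, var_this, n_this = 410, 350, 100

# Standard error of the difference between two independent sample means.
se_diff = sqrt(var_last / n_last + var_this / n_this)
z_stat = (mean_this - mean_last) / se_diff

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(0.95)       # one-sided critical point at the 5% level
p_value = 1 - std_normal.cdf(z_stat)    # upper-tail p-value
reject = z_stat > z_crit

print(f"z = {z_stat:.2f}, critical point = {z_crit:.3f}, reject H0: {reject}")
print(f"p-value = {p_value:.6f}")
```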

2. As part of a detailed driver-training program, school officials are requiring teenagers to take a depth-perception test. In one phase of this test, the student is asked to judge the distance between a parked vehicle and a pedestrian stationed a given distance from the student. The recorded distances in feet are listed below for 15 driver-education students.

5   8   7   7   10   6   4   11
6   8   4   9    9    6   5

Use these data to construct a 99% confidence interval for the variance of the depth-perception distances.
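
The interval has the form ((n - 1)s^2 / chi-square upper, (n - 1)s^2 / chi-square lower). A minimal Python sketch of that arithmetic follows; the two chi-square values are assumed table entries for 14 degrees of freedom (upper-tail areas 0.005 and 0.995) and should be confirmed against the table in the text.

```python
from statistics import variance

distances = [5, 8, 7, 7, 10, 6, 4, 11, 6, 8, 4, 9, 9, 6, 5]

n = len(distances)
df = n - 1
s2 = variance(distances)      # sample variance (divisor n - 1)

# Assumption: chi-square table values for df = 14.
CHI2_UPPER = 31.319           # upper-tail area 0.005
CHI2_LOWER = 4.075            # upper-tail area 0.995

ci_var = (df * s2 / CHI2_UPPER, df * s2 / CHI2_LOWER)

print(f"n = {n}, sample variance = {s2:.3f}")
print(f"99% CI for the variance: ({ci_var[0]:.3f}, {ci_var[1]:.3f})")
```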

3. A chemist at an iron ore mine suspects that the variance in the amount (weight, in ounces) of iron oxide per pound of ore tends to increase as the mean amount of iron oxide per pound increases. To test this theory, ten one-pound specimens of iron ore are selected at each of two locations, one, location 1, containing a much higher mean content of iron oxide than the other, location 2. The amounts of iron oxide contained in the ore specimens are shown in the accompanying table.

Location 1 | 8.1   7.4   9.3   7.5   7.1   8.7   9.1   7.9   8.4   8.8
______________________________________________________________________
Location 2 | 3.9   4.4   4.7   3.6   4.1   3.9   4.6   3.5   4.0   4.2

(a) Test for the equality of the variances for the two locations at 5% significance level.

(b) Do the data provide sufficient information to indicate that the amount of iron oxide is higher at location 1 than at location 2? Use a 1% significance level.

(c) What is the p-value for the test?                                          


Dr. Sen                                  Assignment # 5
Fall '09                                   STA 3163                                                                Due on 10/01/09

Total 50 points

1. Consider the data in problem 5.2 on page 65 from the text.

i) Use a computer to do the following problems. Just cut out the results from the computer printout and paste them with your answers.

ii) Show all steps of your work. Answers without work will not be graded.

iii) All papers must be stapled together.

iv) Type neatly with spacing between the problems.

v) For testing you must write the following steps:
a) Statement of your hypotheses
b) Test statistic
c) Critical point(s) and decision rule
d) Calculations and conclusion


Questions:

1. Find the means and standard deviations for the three variables SBP, QUET, AGE for smoking and nonsmoking group separately.

2. Estimate with 99% confidence intervals the average differences between the smokers and the nonsmokers for the three variables separately.

3. Test for no difference in the means of the three variables for smokers vs. nonsmokers. You may use a 5% significance level. What do you conclude?

4. Report the p-values for the above three tests.

5. Answer 5.2 (b) 1 - 7 from the text.


Dr. Sen                                  Assignment # 6
Fall '09                                   STA 3163                                                                Due on 10/08/09

Total 40 points

1. Consider the data for 5.5 on page 70 and run the linear regression model.

a) Use the Plot statement in SAS and use 'overlay' to put the estimated line on the scatter plot.

b) Plot residuals vs. x and look for any violation of assumptions.

c) For the linear regression model answer c - e.

For c - d write the steps of testing. Explain your conclusion.

2. Use SAS to do problem # 5.8. Consult the class example for your computer program. Use the results from your output to do the following:

a) Write the equation of the estimated regression line.
b) Test for the slope parameter equal to zero.
c) Construct a 90% confidence interval for the intercept parameter.
d - f) Answer the questions from the book.
g) Plot the predicted line with the observations on the same graph. What do you find? Do you have a linear or a nonlinear line?
h) Plot residuals vs. x and look for any violation of assumptions.


Dr. Sen                                  Assignment # 7
Fall '09                                   STA 3163                                                                Due on 10/15/09

Total 40 points.

This homework set is a very good indicator of what to expect on the test. You should pay extra attention and take your time with the assignment.

Use a computer as much as possible to do the assigned problems.

1. Problem # 5.6 (page 71) (a - f) from the text. For your graphs you may:

Plot the observed values, the predicted values, and the upper and lower 95% confidence bands against the independent variable on the same page.

Plot the residuals separately.

Comment on the fit of your model. Is it a good fit? If not, what seems to be the problem?


2. Refer to problems # 6.7-12 on page 105. Use only the data in # 5.6.

(a) - (c) Same as in the text.

(d) Based on your calculations in part (c), can you infer whether or not the true correlation coefficient is -.9? Keep the significance level at 5%.

3. Chapter 7 #11 page 112.


Dr. Sen                                  Review Sheet for Midterm (Oct 22)
Fall '09                                   STA 3163                                                                 10/15/09

1. Sixteen batches of the plastic were made, and from each batch one test item was molded. Each test item was randomly assigned to one of the four predetermined time levels, and the hardness was measured after the assigned elapsed time. The results are shown below: X is the elapsed time in hours, and Y is the hardness in Brinell units.

i:    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
___________________________________________________________________________________
X:   16   16   16   16   24   24   24   24   32   32   32   32   40   40   40   40
Y:  199  205  196  200  218  220  215  223  237  234  235  230  250  248  253  246
___________________________________________________________________________________

a. Obtain the estimated regression function. Plot the estimated regression function and the data. Does a linear function appear to give a good fit here?

b. Obtain a point estimate of the mean hardness when X = 40 hours

c. Obtain a point estimate of the change in mean hardness when X increases by one hour. Also construct a 99% confidence interval. Interpret your result.

d. Conduct a test to determine whether or not there is a linear association between X and Y here. State the alternatives, a decision rule and conclusion. Use significance level at .10. What is the p-value for your test?

e. The plastic manufacturer has stated that the mean hardness should increase by 2 Brinell units per hour. Conduct a two-sided test to decide whether this standard is being satisfied; use .01 significance level. State the alternatives, decision rule, and conclusion. What is the p-value of the test?

f. Obtain a 98% prediction interval for the hardness of a newly molded test item with an elapsed time of 30 hours.
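
For checking hand or SAS results on parts a - c, the least-squares quantities can be sketched in Python as below. This is an illustration only: the helper name predict and the variable names are invented for the sketch, and the t critical values needed for the intervals and tests are left to the table.

```python
from math import sqrt
from statistics import mean

hours = [16] * 4 + [24] * 4 + [32] * 4 + [40] * 4
hardness = [199, 205, 196, 200, 218, 220, 215, 223,
            237, 234, 235, 230, 250, 248, 253, 246]

n = len(hours)
xbar, ybar = mean(hours), mean(hardness)

# Corrected sums of squares and cross products.
sxx = sum((x - xbar) ** 2 for x in hours)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(hours, hardness))

slope = sxy / sxx                    # estimated change in mean hardness per hour
intercept = ybar - slope * xbar


def predict(x):
    """Point estimate of mean hardness at elapsed time x (hours)."""
    return intercept + slope * x


# Residual mean square and standard error of the slope (df = n - 2).
residuals = [y - predict(x) for x, y in zip(hours, hardness)]
mse = sum(r * r for r in residuals) / (n - 2)
se_slope = sqrt(mse / sxx)

print(f"fitted line: Y = {intercept:.3f} + {slope:.4f} X")
print(f"estimated mean hardness at X = 40: {predict(40):.3f}")
print(f"MSE = {mse:.3f}, se(slope) = {se_slope:.4f}")
```

The slope and its standard error feed directly into the confidence interval in part c and the t statistics in parts d and e.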


Dr. Sen                                  Assignment # 8
Fall '09                                   STA 3163                                                                Due on 10/29/09

Total 40 points

1. Do problem # 17.1 (a - d). For part (a) use a computer to find the mean and the standard deviation of the data for each factor level.

1 (e). Assuming equal variances construct 99% confidence intervals for the differences between pairs of means. Identify any different means from your confidence intervals.


2. Thirty trainees are randomly divided into three groups of 10, and each group is given instruction in the use of a different word processing system. At the end of the training period, each trainee is given the same "benchmark" word-processing project to complete, and the time required for completion is recorded. An ANOVA model will be used to test whether or not the mean time is the same for the three systems. The data are given below.

                            Trainee
Instruction |   1    2    3    4    5    6    7    8    9   10
_________________________________________________________________
I           |  23   25   21   22   21   22   20   23   19   22
II          |  28   27   27   29   26   29   27   30   28   27
III         |  23   20   25   21   22   23   21   20   19   20
_________________________________________________________________

a) Identify the 'dependent variable', 'factor studied' and the 'factor levels'.

b) Is the factor a random or a fixed factor? Would you answer differently if each trainee had been allowed to select the word processing system of his or her choice?

c) Obtain the analysis of variance table.

d) Conduct the F test for equality of factor level means at .01 significance level. State the alternatives, decision rule, and conclusion. What is the p-value of the test?
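
The entries of the one-way ANOVA table in part c can be cross-checked with a short Python sketch such as the one below. It is not a substitute for the SAS output; the dictionary layout and variable names are illustrative, and the F critical value and p-value are left to the table or to SAS.

```python
from statistics import mean

# Completion times by instruction system, from the table above.
times = {
    "I":   [23, 25, 21, 22, 21, 22, 20, 23, 19, 22],
    "II":  [28, 27, 27, 29, 26, 29, 27, 30, 28, 27],
    "III": [23, 20, 25, 21, 22, 23, 21, 20, 19, 20],
}

all_obs = [t for group in times.values() for t in group]
grand_mean = mean(all_obs)
k, n_total = len(times), len(all_obs)

# Between-groups (factor) and within-groups (error) sums of squares.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in times.values())
ss_within = sum((t - mean(g)) ** 2 for g in times.values() for t in g)

df_between, df_within = k - 1, n_total - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

print(f"{'Source':<10}{'df':>4}{'SS':>12}{'MS':>12}{'F':>10}")
print(f"{'Between':<10}{df_between:>4}{ss_between:>12.3f}{ms_between:>12.3f}{f_stat:>10.2f}")
print(f"{'Within':<10}{df_within:>4}{ss_within:>12.3f}{ms_within:>12.3f}")
print(f"{'Total':<10}{n_total - 1:>4}{ss_between + ss_within:>12.3f}")
```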


Dr. Sen                                  Assignment # 9
Fall '09                                   STA 3163                                                                Due on 11/05/09

Total 30 points

1. Do problem #'s 17.8 and 17.14 from the text.


2. An experiment was conducted to test the effects of five different diets in turkeys. Six turkeys were randomly assigned to each of the five diet groups and were fed for a fixed period of time.
__________________________________________________________
Group                             Weight Gained (pounds)
__________________________________________________________

Control diet                        4.1,     3.3,       3.1,        4.2,       3.6,       4.4
Control diet + additive A    5.2,     4.8,       4.5,        6.8,       5.5,       6.2
Control diet + additive B    6.3,     6.5,       7.2,         7.4,      7.8,       6.7
Control diet + additive C    6.5,     6.8,      7.3,          7.5,      6.9,       7.0
Control diet + additive D    9.5,     9.6,      9.2,          9.1,      9.8,       9.1
___________________________________________________________


a) Assuming that the five groups were comparable with respect to the initial weights of the turkeys, can you conclude that the diets have equal effects on the weight gain? Use a .01 significance level for the test. Make sure that you use all four steps for the test. What is the p-value?

b) Use an appropriate multiple comparison method to test which treatment means are different. Use α = .01.


Dr. Sen                                  Assignment # 10
Fall '09                                   STA 3163                                                                Due on 11/12/09

Total 40 points

1. Do problem #'s 17.10 and 17.13 from the text.

The test statistic for (g) is MS(Lack of Fit) / MSE.


Dr. Sen                                  Assignment # 11
Fall '09                                   STA 3163                                                                Due on 11/19/09

Total 40 points

1. Do problems # 18.7 and 18.9 from the text.

2. Fat in diets. A researcher studied the effects of three experimental diets with varying fat contents on the total lipid (fat) level in plasma. Total lipid level is a widely used predictor of coronary heart disease. Fifteen male subjects who were within 20% of their ideal body weight were grouped into five blocks according to age. Within each block, the three experimental diets were randomly assigned to the three subjects. Data on reduction in lipid level (in grams per liter) after the subjects were on the diet for a fixed period of time follow.
Fat content of diet
__________________________________________________________
Block (i)            Extremely low        Fairly low             Moderately low
Age                      j = 1                       j = 2                         j = 3
___________________________________________________________
15-24                   .73                         .67                          .15
25-34                   .86                         .75                          .21
35-44                   .94                         .81                          .26
45-54                 1.40                       1.32                          .75
55-64                 1.62                       1.41                          .78
___________________________________________________________

a. Why do you think that age of subject was used as a blocking variable?

b. Obtain the ANOVA table. Also get the means by the fat contents of the diet and by the age groups.

c. Test whether or not the mean reductions in lipid level differ for the three diets; use a .05 significance level. State the alternatives, a decision rule, and a conclusion. What is the p-value of the test?

d. Test whether or not the blocking effects are present at .05 significance level. Also write Ha, a decision rule and conclusion.

e. Estimate the model parameters.

f. Use a multiple comparison method to detect which treatment means are different. The best method will have the lowest cut-off points. Compare LSD, Tukey, and Scheffe cut-off points and use the one with the smallest value.
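
For parts b - d, the randomized block decomposition (total = diets + blocks + error) can be sketched in Python as follows. This is a hedged cross-check of the sums of squares only; the list layout and names are illustrative, and the F critical values, p-values, and multiple-comparison cut-off points still come from the tables or from SAS.

```python
from statistics import mean

# Rows are age blocks (15-24, ..., 55-64); columns are diets j = 1, 2, 3.
reduction = [
    [0.73, 0.67, 0.15],
    [0.86, 0.75, 0.21],
    [0.94, 0.81, 0.26],
    [1.40, 1.32, 0.75],
    [1.62, 1.41, 0.78],
]

b, t = len(reduction), len(reduction[0])      # number of blocks, treatments
grand_mean = mean([v for row in reduction for v in row])

diet_means = [mean([row[j] for row in reduction]) for j in range(t)]
block_means = [mean(row) for row in reduction]

ss_diet = b * sum((m - grand_mean) ** 2 for m in diet_means)
ss_block = t * sum((m - grand_mean) ** 2 for m in block_means)
ss_total = sum((v - grand_mean) ** 2 for row in reduction for v in row)
ss_error = ss_total - ss_diet - ss_block      # remainder after diets and blocks

df_diet, df_block, df_error = t - 1, b - 1, (b - 1) * (t - 1)
ms_error = ss_error / df_error
f_diet = (ss_diet / df_diet) / ms_error
f_block = (ss_block / df_block) / ms_error

print("diet means: ", [round(m, 3) for m in diet_means])
print("block means:", [round(m, 3) for m in block_means])
print(f"SS diets = {ss_diet:.5f} (df {df_diet}), F = {f_diet:.1f}")
print(f"SS blocks = {ss_block:.5f} (df {df_block}), F = {f_block:.1f}")
print(f"SS error = {ss_error:.5f} (df {df_error}), MSE = {ms_error:.6f}")
```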


Final Exam December 8, 3:00 - 4:50 P.M.
