Fall '09 STA 3163
Statistical Methods I
Text: 1) Applied Regression Analysis and Other Multivariable Methods,
by Kleinbaum, D. G., Kupper, L. L., Nizam, A., and Muller, K. E., 4th Edition, Duxbury Press
2) SAS/STAT User's Guide, Vol. 1 and 2 (4th Edition)
Instructor : Dr. Sen
Office No. : 14/2706
Phone No. : 620 - 3724
Office Hr. : 3 - 4 P.M. on Mondays and Wednesdays, 2:30 - 3:30 P.M. on
Tuesdays and Thursdays, and by appointment
Week      Subject                 Material
1 - 3     Review                  Chapters 1 - 3, Introduction to SAS
4 - 9     Regression Analysis     Linear Regression, Prediction: Chapters 4 - 7
10 - 15   Analysis of Variance    One-way and two-way classification, multiple comparison
                                  methods, analysis for some standard designs: Chapters 17 - 19
Grading: Final grades will be based on a number of assignments,
a midterm, and a final exam. The assignments will be graded mainly
on interpretation of results. For most assignments you will use
SAS, so you must get acquainted with it by the end of the 3rd
week. All assignments must be typed on regular paper; you may
use cut and paste to attach computer output. I do not need to
see your command statements, but I do need to see the results from the
output. Output files should include only the information pertinent
to the question. I will not accept any late assignments.
If you have trouble running your program, contact me
or ask for help. Assignments count for 45% of your grade,
the midterm for 25%, and the final exam for the remaining 30%. The exams
are open book and open notes.
Midterm: October 22 (Thursday)
Last day of classes: December 4 (Friday)
Final Exam: December 8 (Tuesday), 3:00 - 4:50 P.M.
Holidays: Sept. 7 (Mon.), Nov. 11 (Wed.), Nov. 26 - 28 (Thur. - Sat.)
Last Day to Withdraw: November 6 (Friday), 2009
Dr. Sen                    Assignment # 1
Fall '09 STA 3163          Due on 09/03/09
Total: 20 points
For each problem, show all work. To show a probability,
shade the region in question. For a confidence interval, state
the formula; for a testing problem, write out the proper steps.
You may use a calculator, but you do not need a computer for
this assignment.
From Chapter 3: # 3, 4, 5, 6, 9, 10.
Dr. Sen                    Assignment # 2
Fall '09 STA 3163          Due on 09/10/09
Total: 50 points
Read Chapters 1 - 3 from text. Then do the following problems.
1. Consider only the SBP (systolic blood pressure) data given
in #2 on page 65.
a) Make class intervals of width 5.
b) Draw a relative frequency histogram for the classes.
c) Draw a cumulative relative frequency polygon for the data.
d) Calculate the mean and the standard deviation for the data.
Now suppose the data above follow a normal distribution with
a mean of 144 and a standard deviation of 14.
e) What proportion of the scores will exceed 177?
f) Less than 126?
g) Between 126 and 168?
h) What is the percentile rank of a score of 179?
2. Treat the data given in #1 as a population from which a sample
of size 5 is to be taken. Find the sampling distribution of the
sample mean.
3. Problems # 3.11, 3.13, 3.14, and 3.15 from the text, pages 32 & 33.
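The normal-curve questions in parts (e) - (g) of problem 1 can be cross-checked outside SAS. The following is a minimal sketch in Python (standard library only), assuming the model stated in the problem: mean 144 and standard deviation 14. The variable names are illustrative and not part of the assignment.

```python
from statistics import NormalDist

# Model stated in problem 1: SBP ~ Normal(mean = 144, sd = 14).
sbp = NormalDist(mu=144, sigma=14)

p_exceed_177 = 1 - sbp.cdf(177)              # part (e): P(X > 177)
p_below_126 = sbp.cdf(126)                   # part (f): P(X < 126)
p_126_to_168 = sbp.cdf(168) - sbp.cdf(126)   # part (g): P(126 < X < 168)

print(f"P(X > 177)       = {p_exceed_177:.4f}")
print(f"P(X < 126)       = {p_below_126:.4f}")
print(f"P(126 < X < 168) = {p_126_to_168:.4f}")
```

A shaded sketch of the region is still expected with the written answer; the code only checks the areas read from the normal table.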
Dr. Sen                    Assignment # 3
Fall '09 STA 3163          Due on 09/17/09
Total: 30 points
1. The lifetimes (in years) of ten automobile batteries of a certain
brand are
2.4 1.9 2.0 2.1 1.8 2.3 2.1 2.3 1.7 2.0
a. Calculate the mean and the standard deviation for the data.
b. Estimate the mean lifetime, using a 95% confidence interval.
c. If the manufacturer of the batteries guarantees the average
lifetime of the batteries to be at least 2.0 years, do the data
support their claim at the 1% significance level?
d. What is the p-value of the test?
2. A pollution-control inspector suspected that a riverside
community was releasing semi-treated sewage into a river and that this,
as a consequence, was changing the level of dissolved oxygen in
the river. To check this, he drew 5 randomly selected specimens
of river water at a location above the town and another 5 specimens
at a location below the town. The dissolved oxygen readings, in
parts per million, are given in the accompanying table. Do the data
provide sufficient evidence to indicate a difference in mean oxygen
content between locations above and below the town at the 5% significance
level?
Above town | 4.8  5.2  5.0  4.9  5.1
___________|_________________________
Below town | 5.0  4.7  4.9  4.8  4.9
Find the p-value of the test.
3. Consider the data given in problem # 5.11, p. 79 in your text.
Plot the data by gas type using SAS. Follow the instructions below.
data two;
  input gas $ rate weight;   /* gas type, then the two measurements */
  cards;
a 3.85 4.0
b.............
;
run;
proc plot data=two;
  /* overlay rate and weight against gas type on one set of axes */
  plot rate*gas = '*' weight*gas = '-' / overlay;
run;
These commands will give you a nice graph on a single page.
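For problem 1 of this assignment (the battery lifetimes), the hand calculations for parts (a) - (c) can be checked with a short script. This is only a sketch in Python, not the required method. It assumes the critical value t(0.025; 9) = 2.262 read from a t table, and all variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Battery lifetimes (years) from problem 1.
lifetimes = [2.4, 1.9, 2.0, 2.1, 1.8, 2.3, 2.1, 2.3, 1.7, 2.0]

n = len(lifetimes)
xbar = mean(lifetimes)
s = stdev(lifetimes)          # sample standard deviation (divisor n - 1)
se = s / sqrt(n)              # standard error of the mean

# Assumption: t(0.025; 9) = 2.262, taken from a t table (not computed here).
T_CRIT_95 = 2.262
ci_lower = xbar - T_CRIT_95 * se
ci_upper = xbar + T_CRIT_95 * se

# Test statistic for the guaranteed mean of 2.0 years.
MU0 = 2.0
t_stat = (xbar - MU0) / se

print(f"mean = {xbar:.3f}, sd = {s:.4f}")
print(f"95% CI = ({ci_lower:.3f}, {ci_upper:.3f})")
print(f"t = {t_stat:.3f} on {n - 1} df")
```

The hypotheses, critical point at the 1% level, and the conclusion still have to be written out as the testing steps require.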
Dr. Sen                    Assignment # 4
Fall '09 STA 3163          Due on 09/24/09
Total: 30 points
1. A regional IRS auditor ran a test on a sample of returns filed
by March 15 to determine whether the average refund for taxpayers
is larger this year than last year. Sample data are shown here
for a random sample of 100 returns for each year.
                  Last year    This year
_________________________________________________
Mean                 320          410
Variance             300          350
Sample size          100          100
_________________________________________________
Do the data support the claim at 5% significance level?
2. As part of a detailed driver-training program, school officials
are requiring teen-agers to take a depth-perception test. In one
phase of this test, the student is asked to judge the distance
between a parked vehicle and a pedestrian stationed a given distance
from the student. The recorded distances in feet are listed below
for 15 driver-education students.
5  8  7  7  10  6  4  11  6  8  4  9  9  6  5
Use these data to construct a 99% confidence interval for the
variance of the depth-perception distances.
3. A chemist at an iron ore mine suspects that the variance in
the amount (weight, in ounces) of iron oxide per pound of ore
tends to increase as the mean amount of iron oxide per pound increases.
To test this theory, ten one-pound specimens of iron ore are selected
at each of two locations, one, location 1, containing a much higher
mean content of iron oxide than the other, location 2. The amounts
of iron oxide contained in the ore specimens are shown in the
accompanying table.
Location 1 | 8.1  7.4  9.3  7.5  7.1  8.7  9.1  7.9  8.4  8.8
______________________________________________________________________
Location 2 | 3.9  4.4  4.7  3.6  4.1  3.9  4.6  3.5  4.0  4.2
(a) Test for the equality of the variances for the two locations
at 5% significance level.
(b) Do the data provide sufficient evidence to indicate that
the mean amount of iron oxide is higher at location 1 than at location
2? Use a 1% significance level.
(c) What is the p-value for the test?
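For problem 2 of this assignment (the depth-perception distances), the interval for the variance uses the formula ((n - 1)s^2 / chi-square upper, (n - 1)s^2 / chi-square lower). A minimal sketch in Python follows. The two chi-square values for 14 degrees of freedom are assumptions read from a chi-square table, and the names are illustrative.

```python
from statistics import variance

# Judged distances (feet) for the 15 driver-education students in problem 2.
distances = [5, 8, 7, 7, 10, 6, 4, 11, 6, 8, 4, 9, 9, 6, 5]

n = len(distances)
df = n - 1
s2 = variance(distances)      # sample variance (divisor n - 1)

# Assumption: table values for 14 df, leaving 0.5% in each tail.
CHI2_UPPER = 31.319           # upper-tail area 0.005
CHI2_LOWER = 4.075            # upper-tail area 0.995

var_lower = df * s2 / CHI2_UPPER
var_upper = df * s2 / CHI2_LOWER

print(f"s^2 = {s2:.4f} on {df} df")
print(f"99% CI for the variance = ({var_lower:.3f}, {var_upper:.3f})")
```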
Dr. Sen                    Assignment # 5
Fall '09 STA 3163          Due on 10/01/09
Total: 50 points
1. Consider the data in problem 5.2 on page 65 of the text.
i) Use the computer to do the following problems. Cut the
results out of the computer printout and paste them with your answers.
ii) Show all steps of your work. Answers alone will not be graded.
iii) All papers must be stapled together.
iv) Type neatly, with spacing between the problems.
v) For testing, you must write the following steps:
a) Statement of your hypotheses
b) Test statistic
c) Critical point(s) and decision rule
d) Calculations and conclusion
Questions:
1. Find the means and standard deviations for the three variables
SBP, QUET, AGE for smoking and nonsmoking group separately.
2. Estimate with 99% confidence intervals the average differences
between the smokers and the nonsmokers for the three variables
separately.
3. Test for no difference in the means of the three variables for
smokers vs. nonsmokers. You may use a 5% significance level. What do
you conclude?
4. Report the p-values for the above three tests.
5. Answer 5.2 (b) 1 - 7 from the text.
Dr. Sen                    Assignment # 6
Fall '09 STA 3163          Due on 10/08/09
Total: 40 points
1. Using the data for problem 5.5 on page 70, fit the linear regression
model.
a) Use the PLOT statement in SAS with the 'overlay' option to put the estimated
line on the scatter plot.
b) Plot residuals vs. x and look for any violation of assumptions.
c) For the linear regression model, answer parts c - e from the text.
For c - d, write out the steps of testing. Explain your conclusions.
2. Use SAS to do problem # 5.8. Consult the class example for
your computer program. Use the results from your output to do
the following:
a) Write the equation of the estimated regression line.
b) Test for the slope parameter equal to zero.
c) Construct a 90% confidence interval for the intercept parameter.
d - f) Answer the questions from the book.
g) Plot the predicted line with the observations on the same graph.
What do you find? Does the relationship look linear or nonlinear?
h) Plot residuals vs. x and look for any violation of assumptions.
Dr. Sen                    Assignment # 7
Fall '09 STA 3163          Due on 10/15/09
Total: 40 points
This homework set is a very good indicator of what to expect
on the test. You should pay extra attention and take your time
with the assignment.
Use the computer as much as possible to do the assigned problems.
1. Problem # 5.6 (page 71) (a - f) from the text. For your graphs you may
plot the observed values, the predicted values, and the upper and lower 95%
confidence bands against the independent variable on the same
page.
Plot the residuals separately.
Comment on the fit of your model. Is it a good fit? If not, what
seems to be the problem?
2. Refer to problems # 6.7-12 on page 105. Use only the data in
#5.6.
(a) - (c) Same as in the text.
(d) Based on your calculations in part (c), can you infer whether
the true correlation coefficient is -.9 or not? Keep the significance
level at 5%.
3. Chapter 7, #11, page 112.
Dr. Sen                    Review Sheet for Midterm (Oct 22)
Fall '09 STA 3163          10/15/09
1. Sixteen batches of the plastic were made, and from each batch
one test item was molded. Each test item was randomly assigned
to one of the four predetermined time levels, and the hardness
was measured after the assigned elapsed time. The results are
shown below: X is the elapsed time in hours, and Y is the hardness in
Brinell units.
i:    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
___________________________________________________________________________________
X:   16   16   16   16   24   24   24   24   32   32   32   32   40   40   40   40
Y:  199  205  196  200  218  220  215  223  237  234  235  230  250  248  253  246
___________________________________________________________________________________
a. Obtain the estimated regression function. Plot the estimated
regression function and the data. Does a linear function appear
to give a good fit here?
b. Obtain a point estimate of the mean hardness when X = 40 hours
c. Obtain a point estimate of the change in mean hardness when
X increases by one hour. Also construct a 99% confidence interval.
Interpret your result.
d. Conduct a test to determine whether or not there is a linear
association between X and Y here. State the alternatives, a decision
rule and conclusion. Use significance level at .10. What is the
p-value for your test?
e. The plastic manufacturer has stated that the mean hardness
should increase by 2 Brinell units per hour. Conduct a two-sided
test to decide whether this standard is being satisfied; use .01
significance level. State the alternatives, decision rule, and
conclusion. What is the p-value of the test?
f. Obtain a 98% prediction interval for the hardness of a newly
molded test item with an elapsed time of 30 hours.
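Parts (a) and (b) of the review problem rest on the least-squares formulas b1 = Sxy / Sxx and b0 = ybar - b1 * xbar. The sketch below applies them to the hardness data in Python as a check on SAS output. It is an illustration only, with hypothetical variable and function names, and it does not cover the intervals or tests in parts (c) - (f).

```python
# Elapsed time X (hours) and hardness Y (Brinell units) for the 16 test items.
x = [16] * 4 + [24] * 4 + [32] * 4 + [40] * 4
y = [199, 205, 196, 200, 218, 220, 215, 223,
     237, 234, 235, 230, 250, 248, 253, 246]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sums of squares and cross products.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = s_xy / s_xx              # estimated slope: change in mean hardness per hour
b0 = y_bar - b1 * x_bar       # estimated intercept


def fitted(x_new):
    """Point estimate of mean hardness at elapsed time x_new."""
    return b0 + b1 * x_new


print(f"estimated line: Y-hat = {b0:.4f} + {b1:.5f} X")
print(f"point estimate at X = 40: {fitted(40):.3f}")
```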
Dr. Sen                    Assignment # 8
Fall '09 STA 3163          Due on 10/29/09
Total: 40 points
1. Do problem # 17.1 (a - d). For part (a), use the computer to find
the mean and the standard deviation of the data for each factor level.
1 (e). Assuming equal variances, construct 99% confidence intervals
for the differences between pairs of means. Use your confidence
intervals to identify which means differ.
2. Thirty trainees are randomly divided into three groups of 10
and each is given instruction in the use of a different word processing
system. At the end of the training period, each trainee is given
the same "benchmark" word-processing project to complete
and the time required for completion is recorded. An ANOVA model
will be used to test whether or not the mean time is the same
for the three systems. The data are given below.
                                 Trainee
Instruction |   1   2   3   4   5   6   7   8   9  10
_________________________________________________________________
I           |  23  25  21  22  21  22  20  23  19  22
II          |  28  27  27  29  26  29  27  30  28  27
III         |  23  20  25  21  22  23  21  20  19  20
_________________________________________________________________
a) Identify the 'dependent variable', 'factor studied' and the
'factor levels'.
b) Is the factor a random or a fixed factor? Would you answer
differently if each trainee had been allowed to select the word
processing system of his or her choice?
c) Obtain the analysis of variance table.
d) Conduct the F test for equality of factor level means at .01
significance level. State the alternatives, decision rule, and
conclusion. What is the p-value of the test?
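The ANOVA table asked for in part (c) of problem 2 splits the total variation into a between-systems part and a within-systems part. The sketch below works through that arithmetic in Python for the word-processing data. It is offered as a cross-check on SAS output, not as the required program, and the names are illustrative. The critical value and p-value still come from an F table or SAS.

```python
from statistics import mean

# Completion times for the three word-processing systems (10 trainees each).
times = {
    "I":   [23, 25, 21, 22, 21, 22, 20, 23, 19, 22],
    "II":  [28, 27, 27, 29, 26, 29, 27, 30, 28, 27],
    "III": [23, 20, 25, 21, 22, 23, 21, 20, 19, 20],
}

all_obs = [t for group in times.values() for t in group]
grand_mean = mean(all_obs)
k = len(times)                # number of factor levels
n_total = len(all_obs)

# Between-groups (treatment) and within-groups (error) sums of squares.
ss_treat = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in times.values())
ss_error = sum((t - mean(g)) ** 2 for g in times.values() for t in g)

df_treat = k - 1
df_error = n_total - k
ms_treat = ss_treat / df_treat
ms_error = ss_error / df_error
f_stat = ms_treat / ms_error

print(f"{'Source':<10}{'df':>4}{'SS':>12}{'MS':>12}{'F':>10}")
print(f"{'System':<10}{df_treat:>4}{ss_treat:>12.4f}{ms_treat:>12.4f}{f_stat:>10.3f}")
print(f"{'Error':<10}{df_error:>4}{ss_error:>12.4f}{ms_error:>12.4f}")
print(f"{'Total':<10}{n_total - 1:>4}{ss_treat + ss_error:>12.4f}")
```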
Dr. Sen                    Assignment # 9
Fall '09 STA 3163          Due on 11/05/09
Total: 30 points
1. Do problem #'s 17.8 and 17.14 from the text.
2. An experiment was conducted to test the effects of five different
diets in turkeys. Six turkeys were randomly assigned to each of
the five diet groups and were fed for a fixed period of time.
__________________________________________________________
Group                         Weight Gained (pounds)
__________________________________________________________
Control diet                  4.1, 3.3, 3.1, 4.2, 3.6, 4.4
Control diet + additive A     5.2, 4.8, 4.5, 6.8, 5.5, 6.2
Control diet + additive B     6.3, 6.5, 7.2, 7.4, 7.8, 6.7
Control diet + additive C     6.5, 6.8, 7.3, 7.5, 6.9, 7.0
Control diet + additive D     9.5, 9.6, 9.2, 9.1, 9.8, 9.1
___________________________________________________________
a) Assuming that the five groups were comparable with respect
to the initial weights of the turkeys, can you conclude that the
diets have equal effects on weight gain? Use a .01 significance
level for the test. Make sure that you use all four steps for the
test. What is the p-value?
b) Use an appropriate multiple comparison method to test which
treatment means are different. Use α = .01.
Dr. Sen                    Assignment # 10
Fall '09 STA 3163          Due on 11/12/09
Total: 40 points
1. Do problem #'s 17.10 and 17.13 from the text.
The test statistic for (g) is MS(Lack of Fit) / MSE.
Dr. Sen                    Assignment # 11
Fall '09 STA 3163          Due on 11/19/09
Total: 40 points
1. Do problems # 18.7 and 18.9 from the text.
2. Fat in diets. A researcher studied the effects of three experimental
diets with varying fat contents on the total lipid (fat) level
in plasma. Total lipid level is a widely used predictor of coronary
heart disease. Fifteen male subjects who were within 20% of their
ideal body weight were grouped into five blocks according to age.
Within each block, the three experimental diets were randomly
assigned to the three subjects. Data on reduction in lipid level
(in grams per liter) after the subjects were on the diet for a
fixed period of time follow.
                         Fat content of diet
__________________________________________________________
Block (i)     Extremely low    Fairly low    Moderately low
Age               j = 1           j = 2           j = 3
___________________________________________________________
15-24              .73             .67             .15
25-34              .86             .75             .21
35-44              .94             .81             .26
45-54             1.40            1.32             .75
55-64             1.62            1.41             .78
___________________________________________________________
a. Why do you think that age of subject was used as a blocking
variable?
b. Obtain the ANOVA table. Also get the means by the fat contents
of the diet and by the age groups.
c. Test whether or not the mean reductions in lipid level differ
among the three diets; use a .05 significance level. State the
alternatives, decision rule, and conclusion. What is the p-value
of the test?
d. Test whether or not blocking effects are present at the .05
significance level. Again state Ha, the decision rule, and the conclusion.
e. Estimate the model parameters.
f. Use a multiple comparison method to detect which treatment
means are different. The best method will have the lowest cut-off
points. Compare LSD, Tukey, and Scheffe cut-off points and use
the one with the smallest value.
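For parts (b) - (d) of the fat-in-diets problem, the randomized block ANOVA splits the total sum of squares into block, diet, and error parts. The sketch below carries out that decomposition in Python as a check on the SAS table. The variable names are illustrative, and the F critical values and p-values are left to a table or to SAS.

```python
# Lipid reductions (grams per liter): rows are age blocks 15-24 through 55-64,
# columns are diets j = 1 (extremely low), 2 (fairly low), 3 (moderately low).
reductions = [
    [0.73, 0.67, 0.15],
    [0.86, 0.75, 0.21],
    [0.94, 0.81, 0.26],
    [1.40, 1.32, 0.75],
    [1.62, 1.41, 0.78],
]

n_blocks = len(reductions)
n_diets = len(reductions[0])
n_total = n_blocks * n_diets

grand_mean = sum(sum(row) for row in reductions) / n_total
block_means = [sum(row) / n_diets for row in reductions]
diet_means = [sum(col) / n_blocks for col in zip(*reductions)]

ss_total = sum((v - grand_mean) ** 2 for row in reductions for v in row)
ss_blocks = n_diets * sum((m - grand_mean) ** 2 for m in block_means)
ss_diets = n_blocks * sum((m - grand_mean) ** 2 for m in diet_means)
ss_error = ss_total - ss_blocks - ss_diets   # block-by-diet residual

df_blocks = n_blocks - 1
df_diets = n_diets - 1
df_error = df_blocks * df_diets

ms_error = ss_error / df_error
f_diets = (ss_diets / df_diets) / ms_error
f_blocks = (ss_blocks / df_blocks) / ms_error

print(f"{'Source':<8}{'df':>4}{'SS':>10}{'F':>10}")
print(f"{'Blocks':<8}{df_blocks:>4}{ss_blocks:>10.5f}{f_blocks:>10.2f}")
print(f"{'Diets':<8}{df_diets:>4}{ss_diets:>10.5f}{f_diets:>10.2f}")
print(f"{'Error':<8}{df_error:>4}{ss_error:>10.5f}")
print("diet means: ", [round(m, 3) for m in diet_means])
print("block means:", [round(m, 3) for m in block_means])
```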
Final Exam December 8, 3:00 - 4:50 P.M.