Title: | Sampling: Design and Analysis |
---|---|
Description: | Functions and Datasets from Lohr, S. (1999), Sampling: Design and Analysis, Duxbury. |
Authors: | Tobias Verbeke |
Maintainer: | Tobias Verbeke <[email protected]> |
License: | GPL-3 |
Version: | 0.1-5 |
Built: | 2025-02-23 04:46:45 UTC |
Source: | https://github.com/cran/SDaA |
Data from the U.S. 1992 Census of Agriculture
agpop
agpop
Data frame with the following 15 variables:
county name
state abbreviation
number of acres devoted to farms, 1992
number of acres devoted to farms, 1987
number of acres devoted to farms, 1982
number of farms, 1992
number of farms, 1987
number of farms, 1982
number of farms with 1000 acres or more, 1992
number of farms with 1000 acres or more, 1987
number of farms with 1000 acres or more, 1982
number of farms with 9 acres or fewer, 1992
number of farms with 9 acres or fewer, 1987
number of farms with 9 acres or fewer, 1982
factor with levels S (south), W (west), NC (north central), NE (northeast)
U.S. 1992 Census of Agriculture
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. TODO and 437.
Data from a SRS of size 300 from the U.S. 1992 Census of Agriculture
agsrs
agsrs
Data frame with the following 14 variables:
county name
state abbreviation
number of acres devoted to farms, 1992
number of acres devoted to farms, 1987
number of acres devoted to farms, 1982
number of farms, 1992
number of farms, 1987
number of farms, 1982
number of farms with 1000 acres or more, 1992
number of farms with 1000 acres or more, 1987
number of farms with 1000 acres or more, 1982
number of farms with 9 acres or fewer, 1992
number of farms with 9 acres or fewer, 1987
number of farms with 9 acres or fewer, 1982
U.S. 1992 Census of Agriculture
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. TODO and 437.
Data from a stratified random sample of size 300 from the U.S. 1992 Census of Agriculture.
agstrat
agstrat
Data frame with the following 17 variables:
county name
state abbreviation
number of acres devoted to farms, 1992
number of acres devoted to farms, 1987
number of acres devoted to farms, 1982
number of farms, 1992
number of farms, 1987
number of farms, 1982
number of farms with 1000 acres or more, 1992
number of farms with 1000 acres or more, 1987
number of farms with 1000 acres or more, 1982
number of farms with 9 acres or fewer, 1992
number of farms with 9 acres or fewer, 1987
number of farms with 9 acres or fewer, 1982
factor with levels S (south), W (west), NC (north central), NE (northeast)
random numbers used to select sample in each stratum
sampling weights for each county in sample
U.S. 1992 Census of Agriculture
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. TODO and 437.
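The sampling weights listed above can be illustrated with a small sketch. Under stratified simple random sampling, a unit sampled from stratum h is usually given the weight N_h / n_h. The stratum sizes, the response `y`, and all object names below are hypothetical and are not taken from agstrat.

```r
# Hypothetical stratum population sizes N_h and sample sizes n_h
# (illustrative numbers only, not the agstrat values).
strata <- data.frame(
  region = c("NC", "NE", "S", "W"),
  N_h    = c(1000, 200, 1400, 400),
  n_h    = c(100, 25, 140, 50)
)

# Each sampled unit in stratum h represents N_h / n_h population units.
strata$weight <- strata$N_h / strata$n_h

# Toy sample: one row per sampled unit, with a simulated response y.
set.seed(1)
toy <- data.frame(
  region = rep(strata$region, times = strata$n_h),
  y      = rpois(sum(strata$n_h), lambda = 50)
)
toy$weight <- strata$weight[match(toy$region, strata$region)]

# Weighted estimate of the population total of y.
t_hat <- sum(toy$weight * toy$y)

# The weights themselves sum to the population size.
N_hat <- sum(toy$weight)
```

With these assumed sizes, `N_hat` reproduces the total of the `N_h` column, which is a quick sanity check on any set of stratified weights.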
Length of left middle finger and height for 3000 criminals
anthrop
anthrop
Data frame with the following 2 variables:
length of left middle finger (cm)
height (inches)
Macdonell, W. R. (1901). On criminal anthropometry and the identification of criminals, Biometrika, 1: 177–227.
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. TODO and 438.
Length of left middle finger and height for an SRS of 200 criminals from the anthrop dataset
anthsrs
anthsrs
Data frame with the following 2 variables:
length of left middle finger (cm)
height (inches)
Macdonell, W. R. (1901). On criminal anthropometry and the identification of criminals, Biometrika, 1: 177–227.
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. TODO and 438.
Length of left middle finger and height for an unequal-probability sample of criminals of size 200 from the anthrop dataset. The probability of selection, psi[i], was proportional to 24 for y < 65, 12 for y = 65, 2 for y = 66 or 67, and 1 for y > 67.
anthuneq
anthuneq
Data frame with the following 3 variables:
length of left middle finger (cm)
height (inches)
probability of selection
Macdonell, W. R. (1901). On criminal anthropometry and the identification of criminals, Biometrika, 1: 177–227.
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. TODO and 438.
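The selection rule described for anthuneq can be written out as a short sketch. The helper name `rel_size` and the example heights are hypothetical, and the code assumes heights are recorded in whole inches, as the rule (y = 65, y = 66 or 67) suggests.

```r
# Relative selection sizes from the description of anthuneq:
# 24 for y < 65, 12 for y = 65, 2 for y = 66 or 67, 1 for y > 67.
# Assumes y is recorded in whole inches.
rel_size <- function(y) {
  ifelse(y < 65, 24,
         ifelse(y == 65, 12,
                ifelse(y <= 67, 2, 1)))
}

# Hypothetical heights (inches); not values from the anthrop data.
height <- c(62, 65, 66, 67, 70)
size   <- rel_size(height)

# Normalise so the selection probabilities for a single draw sum to 1.
psi <- size / sum(size)
```

In this toy population the shortest person is 24 times as likely to be drawn as the tallest, which is why analyses of such a sample need the probability-of-selection column.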
Selection of Accounts for Audit in Example 6.11
audit
audit
Data frame with the following 6 variables:
audit unit
book value of account
cumulative book value
random number 1 selecting account
random number 2 selecting account
random number 3 selecting account
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. TODO and 439.
Data from home owner's survey on total number of books
books
books
Data frame with the following 6 variables:
shelf number
number of the book selected
purchase cost of the book
replacement cost of book
Used in Exercise 6 of Chapter 5.
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. TODO and 439.
Data from the 1994 Survey of ASA Membership on Certification
certify
certify
Data frame with the following 11 variables:
should the ASA develop some form of certification? factor with levels yes, possibly, noopinion, unlikely and no
would you approve of a certification program similar to that described in the July 1993 issue of Amstat News? factor with levels yes, possibly, noopinion, unlikely and no
Should there be specific certification programs for statistics subdisciplines? factor with levels yes, possibly, noopinion, unlikely and no
If the ASA developed a certification program, would you attempt to become certified? factor with levels yes, possibly, noopinion, unlikely and no
If the ASA offered certification, should recertification be required every several years? factor with levels yes, possibly, noopinion, unlikely and no
Major subdiscipline; factor with levels BA (Bayesian), BE (business and economic), BI (biometrics), BP (biopharmaceutical), CM (computing), EN (environment), EP (epidemiology), GV (government), MR (marketing), PE (physical and engineering), QP (quality and productivity), SE (statistical education), SG (statistical graphics), SP (sports), SR (survey research), SS (social statistics), TH (teaching statistics in health sciences), O (other)
Highest collegiate degree; factor with levels B (BS or BA), M (MS), N (none), P (PhD) and O (other)
Employment status; factor with levels E (employed), I (in school), R (retired), S (self-employed), U (unemployed) and O (other)
Primary work environment; factor with levels A (academia), G (government), I (industry), O (other)
Primary work activity; factor with levels C (consultant), E (educator), P (practitioner), R (researcher), S (student) and O (other)
For how many years have you been a member of ASA?
The full dataset is on Statlib
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. TODO and 439. http://lib.stat.cmu.edu/asacert/certsurvey
Selected information on egg size from coots, from a study by Arnold (1991). Data courtesy of Todd Arnold.
coots
coots
Data frame with the following 11 variables:
clutch number from which eggs were subsampled
number of eggs in clutch (Mi)
length of egg (mm)
maximum breadth of egg (mm)
calculated as 0.00507 x length x breadth^2
received supplemental feeding? factor with levels no and yes
Not all observations are used for this data set, so results may not agree with those in Arnold (1991)
Arnold, T.W. (1991). Intraclutch variation in egg size of American Coots, The Condor, 93: 19–27
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. TODO and 440.
Data from an SRS of 100 of the 3141 Counties in the U.S.
counties
counties
Data frame with the following 18 variables:
random number used to select the county
state (two-letter abbreviation)
county
land area, 1990 (square miles)
total population, 1992
active nonfederal physicians on Jan. 1, 1990
school enrollment in elementary or high school, 1990
percent of school enrollment in public schools
civilian labor force, 1991
number unemployed, 1991
farm population, 1990
number of farms, 1987
acreage in farms, 1987
total expenditures in federal funds and grants, 1992 (millions of dollars)
civilians employed by federal government, 1990
military personnel, 1990
number of veterans, 1990
percentage of veterans from Vietnam era, 1990
U.S. Bureau of Census, 1994
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. TODO and 440.
Data from a sample of divorce records for states in the Divorce Registration Area (National Center for Health Statistics 1987)
divorce
divorce
Data frame with the following 20 variables:
state name
state abbreviation
sampling rate for state
number of records sampled in state
number of records in sample with husband's age < 20
number of records with 20 <= husband's age <= 24
number of records with 25 <= husband's age <= 29
number of records with 30 <= husband's age <= 34
number of records with 35 <= husband's age <= 39
number of records with 40 <= husband's age <= 44
number of records with 45 <= husband's age <= 49
number of records with husband's age >= 50
number of records in sample with wife's age < 20
number of records with 20 <= wife's age <= 24
number of records with 25 <= wife's age <= 29
number of records with 30 <= wife's age <= 34
number of records with 35 <= wife's age <= 39
number of records with 40 <= wife's age <= 44
number of records with 45 <= wife's age <= 49
number of records with wife's age >= 50
National Center for Health Statistics (1987). TODO
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. TODO and 440.
Simple Random Sample (SRS) of 120 golf courses taken from the population of the (now defunct) Website www.golfcourse.com
golfsrs
golfsrs
Data frame with the following 16 variables:
random number used to select golf course for sample
state name
number of holes
type of course; factor with levels priv (private), semi (semi-private), pub (public), mili (military) and res (resort)
year the course was built
greens fee for 18 holes during week
greens fee for 9 holes during week
greens fee for 18 holes on weekend
greens fee for 9 holes on weekend
back-tee yardage
course rating
par for course
golf cart rental fee for 18 holes
golf cart rental fee for 9 holes
Are caddies available? factor with levels yes and no
Is a golf pro available? factor with levels yes and no
The now defunct website golfcourse.com (https://web.archive.org/web/19991108203827/http://golfcourse.com/)
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. TODO and TODO.
Height and gender of 2000 persons in an artificial population
htpop
htpop
height of person, cm
factor with levels F (female) and M (male)
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. 230–234 and 441.
Height and gender for an SRS of 200 persons, taken from htpop
htsrs
htsrs
random number used to select the unit
height of person, cm
factor with levels F (female) and M (male)
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. 230–234 and 442.
Height and gender for a stratified random sample of 160 women and 40 men taken from the htpop population
htstrat
htstrat
random number used to select the unit
height of person, cm
factor with levels F (female) and M (male)
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. 230–234 and 442.
Types of Sampling Used for Articles in a Sample of Journals
journal
journal
Data frame with the following 3 variables:
number of articles in 1988 that used sampling
number of articles that used probability sampling
number of articles that used nonprobability sampling
Jacoby and Handlin (1991). TODO
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. TODO and 442.
Draw Samples Using Lahiri's Method
lahiri.design(relsize, n, clnames = seq(along = relsize))
lahiri.design(relsize, n, clnames = seq(along = relsize))
relsize | vector of relative sizes of population PSUs |
n | desired sample size |
clnames | vector of PSU names for population |
clusters vector of n PSUs selected with replacement and with probability proportional to relsize
Original code from Lohr (1999), p. 452 – 453.
Sharon Lohr, slightly modified by Tobias Verbeke
Lahiri, D. B. (1951). A method of sample selection providing unbiased ratio estimates, Bulletin of the International Statistical Institute, 33: 133 – 140.
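As a rough illustration of the idea behind Lahiri's method, the sketch below draws PSUs with replacement and with probability proportional to relative size by accept/reject sampling. It is a minimal sketch, not the package's `lahiri.design` code: the function name `lahiri_sketch` and the example sizes and names are hypothetical.

```r
# Accept/reject sketch of Lahiri's method (hypothetical helper, not the
# SDaA implementation). Each trial:
#   1. pick a candidate PSU uniformly at random;
#   2. draw u uniformly between 0 and the largest relative size;
#   3. accept the candidate if u falls below its relative size.
lahiri_sketch <- function(relsize, n, clnames = seq_along(relsize)) {
  stopifnot(length(relsize) > 0, all(relsize >= 0), max(relsize) > 0)
  n_psu   <- length(relsize)
  max_rel <- max(relsize)
  chosen  <- integer(0)
  while (length(chosen) < n) {
    i <- sample.int(n_psu, 1)
    u <- runif(1, min = 0, max = max_rel)
    if (u < relsize[i]) {
      chosen <- c(chosen, i)   # the same PSU may be accepted again
    }
  }
  clnames[chosen]
}

# Hypothetical relative sizes and PSU names.
set.seed(42)
relsize <- c(10, 20, 30, 40)
draws   <- lahiri_sketch(relsize, n = 5, clnames = c("A", "B", "C", "D"))
```

Because a candidate is accepted with probability relsize[i] / max(relsize), no cumulative totals are needed, which is the main practical appeal of the method when the list of PSUs is long.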
Roberts et al. (1995) report on the results of a survey of parents whose children had not been immunized against measles during a recent campaign to immunize all children in the first five years of secondary school.
measles
measles
Data frame with 11 variables. A parent who refused consent (variable 4) was asked why, with responses in variables 5-10. A parent could give more than one reason for not having the child immunized.
school attended by child
parent received consent form
parent returned consent form
parent gave consent for measles immunization
child had already had measles
child had been immunized against measles
parent concerned about side effects
parent wanted GP (general practitioner) to give vaccine
child did not want injection
parent thought measles not serious illness
GP advised that vaccine was not needed
The original data were unavailable; univariate and multivariate summary statistics from these artificial data, however, are consistent with those in the paper.
Roberts R. J. et al. (1995). Reasons for non-uptake of measles, mumps, and rubella catch up immunisation in a measles epidemic and side effects of the vaccine, British Medical Journal, 310, 1629–1632.
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. TODO and 442.
Selected variables for victimization incidents in the July-December 1989 NCVS. Note that some variables were recoded from the original data file.
ncvs
ncvs
Data frame with the following seven variables:
incident weight
factor with levels male and female
violent crime? factor with levels no and yes
did the victim have injuries? factor with levels no and yes
factor with levels yes if the victim received medical care and no otherwise
was the incident reported to the police? factor with levels yes and no
number of offenders involved in crime; factor with levels one, more (more than one) and dontknow
Incident-level concatenated file, NCS8864I, in NCJ-130915, U.S. Department of Justice 1991.
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. TODO and 443.
Data collected in the New York Bight for June 1974 and June 1975 (Wilk et al. 1977)
nybight
nybight
Data frame with the following 7 variables:
year
stratum membership, based on depth
number of fish caught during trawl
total weight (kg) of fish caught during trawl
number of species of fish caught during trawl
depth of station (m)
surface temperature (degrees Celsius)
Two of the original strata were combined because of insufficient sample sizes.
Wilk, S.J. et al. (1977). Fishes and associated environmental data collected in New York bight, June 1974 - June 1975. NOAA Technical Report NMFS SSRF-716. Washington, D.C: Government Printing Office.
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. TODO and 443.
Data on number of holts (dens) in Shetland, United Kingdom used in Kruuk et al. (1989). (Data courtesy of Hans Kruuk).
otters
otters
Data frame with the following three variables:
coastline section
type of habitat (stratum)
number of holts
Kruuk, H.A. et al. (1989). An estimate of numbers and habitat preferences of otters Lutra lutra in Shetland, UK., Biological Conservation, 49: 241–254.
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. TODO and 443.
Hourly ozone readings in parts per billion (ppb) from Eskdalemuir, Scotland, for 1994 and 1995
ozone
ozone
Data frame with the following 25 variables:
date (day/month/year)
ozone reading at 1:00 GMT
ozone reading at 2:00 GMT
ozone reading at 3:00 GMT
ozone reading at 4:00 GMT
ozone reading at 5:00 GMT
ozone reading at 6:00 GMT
ozone reading at 7:00 GMT
ozone reading at 8:00 GMT
ozone reading at 9:00 GMT
ozone reading at 10:00 GMT
ozone reading at 11:00 GMT
ozone reading at 12:00 GMT
ozone reading at 13:00 GMT
ozone reading at 14:00 GMT
ozone reading at 15:00 GMT
ozone reading at 16:00 GMT
ozone reading at 17:00 GMT
ozone reading at 18:00 GMT
ozone reading at 19:00 GMT
ozone reading at 20:00 GMT
ozone reading at 21:00 GMT
ozone reading at 22:00 GMT
ozone reading at 23:00 GMT
ozone reading at 24:00 GMT
Air Quality Information Centre: retrieved from a now defunct URL (http://www.aeat.co.uk)
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. TODO and 443.
All possible SRSs that can be generated from the population in Example 2.1 of Lohr (1999).
samples
samples
Data frame with the following 10 variables:
sample number
first unit in sample
second unit in sample
third unit in sample
fourth unit in sample
value for first unit in sample
value for second unit in sample
value for third unit in sample
value for fourth unit in sample
t hat, i.e. estimate of the population total based on the given sample
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. 26–27 and 444.
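A table with this layout can be rebuilt for any small population by enumerating every sample of four units. The sketch below assumes, purely for illustration, a population of eight units with the values shown; the values, the column names, and the object names are assumptions and are not read from the samples data frame.

```r
# Hypothetical population of N = 8 values (an illustrative assumption).
y <- c(1, 2, 4, 4, 7, 7, 7, 8)
N <- length(y)
n <- 4

# Every subset of n unit indices, one sample per row.
units <- t(combn(N, n))
colnames(units) <- paste0("u", 1:n)

# Values of the sampled units, in the same layout.
values <- matrix(y[units], nrow = nrow(units))
colnames(values) <- paste0("y", 1:n)

# Estimated population total for each sample: N times the sample mean.
t_hat <- N * rowMeans(values)

all_samples <- data.frame(sample = seq_len(nrow(units)),
                          units, values, t_hat)

# Averaging t_hat over all equally likely samples recovers the true total.
mean_t_hat <- mean(t_hat)
```

The last line shows the unbiasedness of the estimator under simple random sampling: the average of `t_hat` over all possible samples equals `sum(y)`.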
Data on number of breathing holes found in sampled areas of Svalbard fjords, reconstructed from summary statistics given in Lydersen and Ryg (1991)
seals
seals
Data frame with the following 2 variables:
zone number for sampled area
number of breathing holes Imjak found in area
The data are used in Chapter 4, Exercise 11.
Lydersen, C. and Ryg, M. (1991). Evaluating breeding habitat and populations of ringed seals Phoca hispida in Svalbard fjords, Polar Record, 27: 223–228.
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. TODO and 444.
Steps used in selecting the simple random sample (SRS) in Example 2.4 of Lohr (1999).
selectrs
selectrs
Data frame with the following 5 variables:
random number generated between 0 and 1
ceiling(3048*RN), with RN the random number in column a
distinct values in column b
new values generated to replace duplicates in b
final set of distinct values to be used in sample
The set of indices in column e was used to select observations from agpop into dataset agsrs.
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. 31–34 and 444.
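The steps recorded in the columns above can be sketched as a small function. This is a minimal sketch under stated assumptions: the function name `select_srs_indices` is hypothetical, and the population size `N` and sample size `n` are passed as arguments rather than fixed to the values used in the book.

```r
# Sketch of the selection steps (hypothetical helper):
#   a. generate random numbers between 0 and 1;
#   b. convert them to indices with ceiling(N * RN);
#   c. keep the distinct indices;
#   d. generate new values to replace duplicates;
#   e. stop once n distinct indices are available.
select_srs_indices <- function(N, n) {
  stopifnot(n <= N)
  rn   <- runif(n)               # step a
  idx  <- ceiling(N * rn)        # step b
  keep <- unique(idx)            # step c
  while (length(keep) < n) {
    extra <- ceiling(N * runif(n - length(keep)))   # step d
    keep  <- unique(c(keep, extra))
  }
  keep                           # step e
}

# Small hypothetical example where duplicates are very likely.
set.seed(7)
idx <- select_srs_indices(N = 20, n = 15)
```

The resulting indices could then be used to subset a population data frame by row, for example `pop[idx, ]` for a hypothetical data frame `pop`.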
Counties selected with probability proportional to 1992 population
statepop
statepop
state abbreviation
county
land area of county, 1990 (square miles)
population of county, 1992
number of physicians, 1990
farm population, 1990
number of farms, 1987
number of acres devoted to farming, 1987
number of veterans, 1990
percent of veterans from Vietnam era, 1990
City and Counties Book, 1994
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. 190 – 192 and 444.
Number of counties, land area, and population for the 50 states plus the District of Columbia
statepps
statepps
Data frame with the following 7 variables:
state name
number of counties in state
cumulative number of counties
land area of state, 1990 (square miles)
cumulative land area
population of state, 1992
cumulative population
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. TODO and 445.
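Cumulative columns like those in statepps are what the cumulative-size method of probability-proportional-to-size selection works from. The sketch below shows one way this can be done; the unit names, sizes, and helper names are hypothetical and are not the statepps values.

```r
# Hypothetical units with a size measure and its running total.
units <- data.frame(name = c("A", "B", "C", "D"),
                    size = c(5, 10, 25, 60))
units$cum_size <- cumsum(units$size)

# Map a number r in 1..total to the unit whose cumulative range holds it:
# unit k is chosen when cum_size[k - 1] < r <= cum_size[k].
which_unit <- function(r, cum_size) {
  findInterval(r, cum_size, left.open = TRUE) + 1L
}

# Draw n units with replacement, with probability proportional to size.
pps_draw <- function(cum_size, n) {
  total <- cum_size[length(cum_size)]
  r <- sample.int(total, n, replace = TRUE)
  which_unit(r, cum_size)
}

set.seed(3)
picked <- units$name[pps_draw(units$cum_size, n = 6)]
```

Here unit D covers 60 of the 100 possible random numbers, so it is selected on about 60 percent of draws.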
The 1987 Survey of Youth in Custody sampled juveniles and young adults in long-term, state-operated juvenile institutions. Residents of facilities at the end of 1987 were interviewed about family background, previous criminal history, and drug and alcohol use. Selected variables from the survey are contained in the syc data frame.
syc
syc
stratum number
psu (facility) number
number of eligible residents in psu
initial weight
final weight
random group number
age of resident
race of resident: factor with levels 1 (white), 2 (black), 3 (Asian/Pacific Islander), 4 (American Indian, Aleut, Eskimo), 5 (other)
ethnicity; factor with levels hispanic and notHispanic
highest grade before sent to correctional institution; factor with levels 0 (never attended), 1-12 (highest grade attended), 13 (GED), 14 (other)
factor with levels male and female
factor with levels 1 (mother only), 2 (father only), 3 (both mother and father), 4 (grandparents), 5 (other relatives), 6 (friends), 7 (foster home), 8 (agency or institution), 9 (someone else)
Has anyone in your family, such as your mother, father, brother, sister, ever served time in jail or prison? factor with levels yes and no
most serious crime in current offense; one of violent (e.g. murder, rape, robbery, assault), property (e.g. burglary, larceny, arson, fraud, motor vehicle theft), drug (drug possession or trafficking), publicorder (weapons violation, perjury, failure to appear in court), juvenile (juvenile-status offense, e.g. truancy, running away, incorrigible behavior)
Ever put on probation or sent to correctional institution for violent offense? factor with levels no and yes
number of times arrested (integer)
number of times on probation
number of times previously committed to correctional institution
Prior to being sent here, did you ever serve time in a correctional institution? factor with levels yes and no
previously arrested for violent offense; factor with levels no and yes
previously arrested for property offense; factor with levels no and yes
previously arrested for drug offense; factor with levels no and yes
previously arrested for public-order offense; factor with levels no and yes
previously arrested for juvenile-status offense; factor with levels no and yes
age first arrested (integer)
Did you use a weapon... for this incident? factor with levels yes and no
Did you drink alcohol at all during the year before being sent here this time? factor with levels yes, noduringyear, noatall
Ever used illegal drugs? factor with levels no, yes
Inter-University Consortium on Political and Social Research, NCJ-130915, U.S. Department of Justice 1989.
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. 235–239 and 445.
Selected variables from a study on elementary school teacher workload in Maricopa County, Arizona.
teachers
teachers
data frame with the following 6 variables:
school district size; factor with levels large and me/sm (medium/small)
school identifier
number of hours required to work at school per week
class size
minutes spent per week in school on preparation
minutes per week that a teacher's aide works with the teacher in the classroom
The study is described in Exercise 16 of Chapter 15. The psu sizes are given in teachmi. The large stratum had 245 schools; the small/medium stratum had 66 schools.
Data courtesy of Rita Gnap (1995).
Gnap, R. (1995). Teacher load in Arizona elementary school districts in Maricopa County. Ph.D. diss., Arizona State University
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. TODO and 446.
Cluster sizes for the study on elementary school teacher workload in Maricopa County, Arizona.
teachmi
teachmi
data frame with the following 6 variables:
school district size; factor with levels large and me/sm (medium/small)
school identifier
number of teachers in that school
number of surveys returned from that school
The study is described in Exercise 16 of Chapter 15. The actual data are given in teachers.
Data courtesy of Rita Gnap (1995).
Gnap, R. (1995). Teacher load in Arizona elementary school districts in Maricopa County. Ph.D. diss., Arizona State University
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. TODO and 446.
Follow-up study of nonrespondents from the Gnap (1995) study on the workload of elementary school teachers in Maricopa County, Arizona.
teachnr
teachnr
data frame with the following 6 variables:
number of hours required to work at school per week
class size
minutes spent per week in school on preparation
minutes per week that a teacher's aide works with the teacher in the classroom
The study is described in Exercise 16 of Chapter 15. The actual data are given in teachers. Cluster size data for the original study are given in teachmi.
Data courtesy of Rita Gnap (1995).
Gnap, R. (1995). Teacher load in Arizona elementary school districts in Maricopa County. Ph.D. diss., Arizona State University
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. TODO and 446.
Selected variables from the Arizona State University Winter Closure Survey, taken in January 1995. This survey was taken to investigate the attitudes and opinions of university employees toward the closing of the university between December 25 and January 1.
winter
winter
data frame with the following 6 variables:
stratum number; factor with levels faculty, classstaff (classified staff), admstaff (administrative staff) and acprof (academic professional)
factor with levels 1 (1-2 years), 2 (3-4 years), 3 (5-9 years), 4 (10-14 years) and 5 (15 or more years)
In the past, have you usually taken vacation days in the entire period between December 25 and January 1? factor with levels no and yes
Did you work on campus during Winter Break Closure? factor with levels no and yes
Did the Winter Break Closure cause you any difficulty/concerns? factor with levels no and yes
Did the Winter Break Closure negatively affect your work productivity? factor with levels no and yes
I was unable to obtain staff support in my department/office. factor with levels yes and no
I was unable to obtain staff support in other departments/offices. factor with levels yes and no
I was unable to access computers, copy machine, etc. in my department/office. factor with levels yes and no
I was unable to endure environmental conditions - e.g., not properly climatized. factor with levels yes and no
I was unable to access university services necessary to my work; factor with levels yes and no
I was unable to work on my assignments because I work in another department/office; factor with levels yes and no
I was unable to work on my assignments because my office was closed; factor with levels yes and no
compared to other departments/offices, I feel staff in my department/office were treated fairly; factor with levels strongagr (strongly agree), agree, undecided, disagree, strdisagr (strongly disagree)
compared to other people working in my department/office, I feel I was treated fairly; factor with levels strongagr (strongly agree), agree, undecided, disagree, strdisagr (strongly disagree)
How satisfied are you with the process used to inform staff about Winter Closure? factor with levels verysat (very satisfied), satisfied, undecided, dissatisfied and verydissat (very dissatisfied)
How satisfied are you with the fact that ASU had a Winter Break Closure this year? factor with levels verysat (very satisfied), satisfied, undecided, dissatisfied and verydissat (very dissatisfied)
Would you want to have Winter Break Closure again? factor with levels no and yes
Data courtesy of the ASU Office of University Evaluation.
Lohr (1999). Sampling: Design and Analysis, Duxbury, p. TODO and 447–448.