| carb.adults.survey.methods {heR.ActivityData} | R Documentation |
A Report to California Air Resources Board Sacramento, California by the Survey Research Center of the University of California. Please note that the survey data base was revised and corrected subsequent to the completion of this report.
Thomas Piazza and Yu-Teh Cheng
Survey Research Center, University of California, Berkeley
The 1987-88 California Activity Pattern Survey was designed to measure the exposure of California residents to airborne toxics in the home and the workplace and to obtain some additional information about the respondents' opinions and behavior. The survey was funded by the Air Resources Board of the State of California.
The Survey Research Center of the University of California, Berkeley, contracted to design a telephone sample of households of the State of California, assist in questionnaire construction, conduct the interviewing, and prepare the data for analysis. This report summarizes the sampling methods used for this study. The general design of the sample is given first. Then various aspects of the design are described in more detail.
The sample is a clustered random-digit telephone sample of all households in California. The sample was generated using procedures described by J. Waksberg ("Sampling Methods for Random Digit Dialing," Journal of the American Statistical Association, vol. 73, March 1978, pp. 40-46). Households with no telephone, of course, are excluded. Households with no English-speaking adults were also excluded by design, in order to avoid the cost of translating the questionnaire and hiring bilingual or multilingual interviewers.
Prior to selection, all of the telephone exchanges in the state were grouped into three strata: Los Angeles and the South Coast, the San Francisco Bay Area, and the rest of the state. When clusters of telephone numbers were selected for the study, the sampling fraction was doubled for the Bay Area, in comparison with Los Angeles and the South Coast; the sampling fraction was doubled again for the rest of the state. This oversampling was carried out in order to spread the selected households more widely over a variety of climatic zones.
Within each selected household, one adult aged 18 or over was selected at random to be interviewed. Part of the adult interview included an enumeration of children aged 12 through 17 residing in the household. If a child in that age range resided there, permission was sought from the appropriate parent or guardian to administer a shortened version of the interview to the child. If more than one child in that age range resided in the household, one child, referred to as the "youth respondent," was selected at random to be interviewed.
A goal of this study was to obtain information from households in a wide variety of climatic zones. Since most of the California population is clustered in a few metropolitan areas, an unrestricted sample would result in the completion of very few interviews in other more sparsely populated, but climatically diverse, areas of the state. A stratification of all the telephone exchanges in the state was carried out, therefore, in order to provide a means of oversampling the non-metropolitan areas and of distributing the sample over as many climatic zones as possible. There are two aspects of this stratification: the creation of three explicit major strata, and an implicit geographic stratification within each of the major strata. Let us review each of these aspects in turn.
3.1 Creation of Three Major Strata
A list of all the central office telephone codes (prefixes) in California was taken from the August 1987 American Telephone and Telegraph V & H Coordinate Tape (produced monthly by AT&T). The record for each prefix includes the area code, the prefix (first three digits of a phone number), the name of the city or billing location, and two geographic coordinates (north-south and east-west). After deletion of prefixes for directory assistance and time, and of a few other prefixes known to be non-residential, the remaining prefixes were divided into three groups or strata.
The first stratum was the South Coast area, comprising the Los Angeles air basin and San Diego County. Information on the boundaries of the Los Angeles air basin was obtained from the Los Angeles Air Quality Management District. That information was then compared with the city names on the prefix records after sorting them on geographic coordinates in order to decide into which stratum to place each telephone prefix. It turned out that the prefixes in the 818, 213, and 714 area codes cover that area almost exactly. As for San Diego County, prefixes in the southern part of the 619 area code were sorted from east to west; then place names were compared with a map; and the western portion was placed into the South Coast stratum.
The second stratum was the San Francisco Bay Area. Boundary information for that air basin was obtained from the Bay Area Air Quality Management District. The five-county center of that area coincides with the 415 area code. However, the air basin also includes Napa County, the southern portions of Sonoma and Solano counties, and the northern portion of Santa Clara County (principally San Jose). Prefixes in the 707 and 408 area codes were therefore sorted from north to south; then place names were compared with a map; and the appropriate prefixes were placed into the Bay Area stratum.
The third major stratum consisted of all the California prefixes left over, after creating the first two strata. Because of the heterogeneity of this third stratum, we found it desirable to carry out some further stratification, as described next.
3.2 Further Implicit Stratification
Prior to the selection of primary clusters, prefixes within each area code (or within each part of an area code, if it had been divided between major strata) were sorted geographically, using the north-south and the east-west coordinates on the AT&T tape. The direction of the sort for each area code is shown in Table 1. For example, all of the prefixes in area code 818 fall within the South Coast major stratum, and they were sorted from north to south (n-s).
The purpose of this sorting was to distribute the sample proportionately over the various regions within each major stratum. The third stratum in particular (the "rest of the state") includes several regions of distinct interest. This sorting procedure ensured that each of the regions was included in the sample in proportion to its number of telephone prefixes.
After the prefixes were sorted within area code, all of the prefix lists within a major stratum were put together into one list, in the order given in Table 1. The lists for each major stratum were then ready for the selection of primary clusters.
Table 1 Stratification of Area Codes and Prefixes (Treatment of 10 California Area Codes)
1. South Coast
| Area Code | Portion | Sort | Location |
| (818) | all | n-s | Los Angeles Co. |
| (213) | all | n-s | Los Angeles Co. |
| (714) | all | w-e | Orange, Riverside, part of S. Bern. counties |
| (619) | SW part | e-w | San Diego County |
2. San Francisco Bay Area
| Area Code | Portion | Sort | Location |
| (707) | S part | n-s | Napa Co., S. parts of Sonoma and Solano |
| (415) | all | n-s | S.F., Alameda, Contra Costa, S. Mateo, Marin |
| (408) | N part | n-s | Santa Clara Co. |
3. Rest of State
| Area Code | Portion | Sort | Location |
| (707) | N part | s-n | North coast |
| (916) | all | n-s | Northern valley and mountains |
| (619) | N & SE | n-s | Desert |
| (209) | all | n-s | Central valley |
| (805) | all | e-w | Central valley and coast |
| (408) | S part | s-n | Central coast |
Note: Prior to systematic random selection of primary clusters, prefixes within each (part of an) area code were sorted geographically in the direction indicated; the sorted area code lists were then put together in the order shown into one list for each major stratum.
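The sorting and concatenation described in the note to Table 1 can be sketched roughly as follows. This is only an illustration, not the original procedure: the `Prefix` record, its `north` and `east` fields (larger values meaning farther north or east), and the sample data are all hypothetical, and the sketch assumes the prefixes passed in have already been restricted to one major stratum.

```python
from typing import NamedTuple


class Prefix(NamedTuple):
    area_code: str
    prefix: str
    north: float  # hypothetical coordinate: larger means farther north
    east: float   # hypothetical coordinate: larger means farther east


# For each sort code used in Table 1: (field to sort on, sort descending?)
SORT_RULES = {
    "n-s": ("north", True),
    "s-n": ("north", False),
    "e-w": ("east", True),
    "w-e": ("east", False),
}


def build_stratum_list(prefixes, plan):
    """Sort prefixes within each area code, then concatenate in plan order."""
    ordered = []
    for area_code, direction in plan:
        field, descending = SORT_RULES[direction]
        block = [p for p in prefixes if p.area_code == area_code]
        block.sort(key=lambda p: getattr(p, field), reverse=descending)
        ordered.extend(block)
    return ordered


# Made-up prefixes, for illustration only.
prefixes = [
    Prefix("213", "555", north=2.0, east=1.0),
    Prefix("818", "700", north=5.0, east=0.5),
    Prefix("818", "701", north=7.0, east=0.2),
    Prefix("714", "300", north=1.0, east=3.0),
    Prefix("714", "301", north=1.5, east=2.0),
]
plan = [("818", "n-s"), ("213", "n-s"), ("714", "w-e")]
stratum_list = build_stratum_list(prefixes, plan)
```

Because the list stays in geographic order, a systematic selection that steps through it at a fixed interval touches each region roughly in proportion to its number of prefixes.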
Our goal for the first stage of sampling was to identify approximately 250 clusters of residential telephone numbers throughout the state. Most random telephone numbers are either non-working, business, or government numbers. In order to identify 250 residential numbers, we estimated (based on past experience) that we should start with about 935 numbers. Since it is preferable to subsample an equal number of units within each cluster, the oversampling of certain parts of the state was done at this stage of primary cluster selection.
Within each of the three major strata we selected a certain proportion of possible telephone numbers by systematic random sampling; that is, by setting a selection interval, taking a random start, and then selecting every nth number. The systematic nature of the procedure ensured that the implicit geographic stratification of the prefixes would be preserved.
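A minimal sketch of systematic random sampling as described above (selection interval, random start, every nth unit) might look like the following. The function names, the integer interval, and the mapping from a frame position to a telephone number are assumptions made for illustration; the report does not give the intervals that were actually used.

```python
import random


def systematic_sample(n_units, interval, rng):
    """Return 0-based frame positions: a random start, then every interval-th unit."""
    start = rng.randrange(interval)  # random start in [0, interval)
    return list(range(start, n_units, interval))


def position_to_number(position, ordered_prefixes):
    """Map a frame position to a telephone number.

    Assumes the frame is the geographically ordered prefix list, with each
    prefix contributing 10,000 consecutive possible numbers (0000-9999).
    """
    area_code, prefix = ordered_prefixes[position // 10_000]
    return f"({area_code}) {prefix}-{position % 10_000:04d}"


# Made-up frame of three prefixes, i.e. 30,000 possible numbers.
ordered_prefixes = [("818", "700"), ("818", "701"), ("213", "555")]
rng = random.Random(1987)  # seeded only so the example is repeatable
picks = systematic_sample(30_000, 7_000, rng)
selected_numbers = [position_to_number(p, ordered_prefixes) for p in picks]
```

Since the picks advance through the ordered frame at a constant step, they are spread evenly over the sorted prefixes rather than bunching in one region.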
The proportion of telephone numbers selected from each major stratum is shown in Table 2. In the South Coast stratum, for example, we selected 194 out of 1532 prefixes (each of which has 10,000 possible telephone numbers), or .127 of the prefixes. The proportion selected was doubled for the San Francisco Bay Area, and doubled again for the rest of the state. This disproportionate sampling was carried out in order to spread the sample over a wide variety of climatic zones. Without such disproportionate selection, the sample would have been clustered primarily in a few large urban areas. Note that a weight inversely proportional to the rate of oversampling must be used in the data analysis if statewide estimates of statistics are made.
After the primary telephone numbers were selected, each was called and administered a short screening interview to determine if the number was a residence. If it was not, that cluster was dropped from the sample. If, on the other hand, the number was a residence, additional telephone numbers within that cluster were generated for the main study. Of the 936 original telephone numbers, 252 were determined to be residences and formed the clusters for our sample.
Table 2 Selection of Primary Clusters
| Major Strata | No. Prefix | Selections | Fraction |
| South Coast | 1532 | 194 | .127 |
| S.F. Bay Area | 759 | 192 | .253 |
| Rest of State | 1085 | 550 | .507 |
| Total | 3376 | 936 | |
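As a check on the arithmetic in Table 2, the sampling fractions can be recomputed from the prefix and selection counts, and relative weights inversely proportional to those fractions derived from them. The sketch below is illustrative only: scaling the weights so that the most heavily sampled stratum gets 1 is an assumption, and it covers only the stratum-level oversampling, not any other adjustment.

```python
# Counts taken from Table 2.
strata = {
    "South Coast": {"prefixes": 1532, "selections": 194},
    "S.F. Bay Area": {"prefixes": 759, "selections": 192},
    "Rest of State": {"prefixes": 1085, "selections": 550},
}

# Sampling fraction = selections / prefixes, as in the table's last column.
fractions = {
    name: s["selections"] / s["prefixes"] for name, s in strata.items()
}

# Relative weight inversely proportional to the sampling fraction,
# scaled (by assumption) so the most heavily sampled stratum has weight 1.
max_fraction = max(fractions.values())
weights = {name: max_fraction / f for name, f in fractions.items()}

for name in strata:
    print(f"{name}: fraction={fractions[name]:.3f}, weight={weights[name]:.2f}")
```

The resulting weights are close to 4, 2, and 1, which reflects the doubling of the sampling fraction from the South Coast to the Bay Area and again to the rest of the state.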
The telephone numbers within each cluster were generated by varying at random the last two digits of the primary number. For example, if the primary number for a cluster was (415) 642-6578, additional telephone numbers within the cluster were generated by replacing the "78" with one of the 99 other two-digit possibilities.
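The generation of numbers within a cluster can be sketched as below. The function name, its `count` parameter, and the string format of the primary number are hypothetical; the sketch simply shuffles the 99 alternative two-digit endings and is not a record of how the original numbers were drawn.

```python
import random


def cluster_numbers(primary, rng, count=None):
    """Return other numbers in the cluster by replacing the last two digits.

    `primary` must end in two digits, e.g. "(415) 642-6578".
    """
    stem, last_two = primary[:-2], primary[-2:]
    # All two-digit endings except the one used by the primary number.
    suffixes = [f"{i:02d}" for i in range(100) if f"{i:02d}" != last_two]
    rng.shuffle(suffixes)  # random order, without repeats
    if count is not None:
        suffixes = suffixes[:count]
    return [stem + s for s in suffixes]


rng = random.Random(0)  # seeded only so the example is repeatable
numbers = cluster_numbers("(415) 642-6578", rng, count=10)
all_numbers = cluster_numbers("(415) 642-6578", rng)
```

Drawing without replacement from the shuffled list also makes it easy to take the next unused number when a generated number turns out to be non-residential and has to be replaced.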
Under the clustered sampling procedure, a set of telephone numbers is prepared for interviewing from each cluster. If a telephone number turns out to be non-residential, it is replaced. The total number of residences in each cluster, consequently, remains fixed. The probability of selecting a household is constant across clusters (within major strata), provided that the same number of residential telephone numbers has been set up for interviewing. For this study, most clusters had 11 residential numbers, although a few clusters had a different number. A weight to adjust for this variation could be used in data analysis, although its effect would be negligible in this case. Weights to adjust for major differences in selection probabilities are discussed next.