Planning and conducting a survey
Epidemiological surveys use various study designs and range widely in size. At one extreme a case-control investigation may include fewer than 50 subjects, while at the other, some large longitudinal studies follow up many thousands of people for several decades. The main study designs will be described in later chapters, but we here discuss important features that are common to the planning and execution of surveys, whatever their specific design.
Early planning
Every study needs a primary purpose. It is easy to argue "While we have the subjects there, let's also measure..."; but overloading, whether of investigators or subjects, must be avoided if it in any way threatens the primary purpose. Sometimes subsidiary objectives may be pursued in subsamples (every nth subject, or in a particular age group) or by recalling some subjects for a second examination: when their initial contact has been favourable then response to recall is usually good.
Background reading
Choice of examination methods
Information abstracted from existing records
The design of the abstraction form or of the computer program for inputting data should take into account the layout of the source material. Having to flick repeatedly backwards and forwards through the source record is not only tedious and time consuming, but may also increase the chance of error. Each abstracted record should be identified by a serial number, and should include sufficient information to permit easy access back to the source material for checking and to obtain additional data if required. When data are not abstracted direct on to computer, later transfer to computer will often be facilitated by numerical coding, in which case coding boxes can be provided on the right hand side of the abstraction form. Some items of data (for example, dates of birth) can easily be written direct into the coding boxes. Others, such as occupation, may need to be recorded in words and coded later as a separate exercise.
Time spent writing is minimised if non-numerical information is, when possible, ringed or ticked rather than having to be written out. To minimise the chance of error, any reformulation of numerical data (for example, derivation of age at hospital admission from date of birth and date of admission) should be carried out by the computer after data entry, and not as part of the abstraction process. When coding data, allowance must be made for the possibility of missing information.
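The derivation of age at admission by computer, with allowance for missing information, might look something like the following sketch. The function name and the use of None as the code for a missing date are assumptions made for illustration, not part of any particular survey system.

```python
from datetime import date
from typing import Optional


def age_at_admission(date_of_birth: Optional[date],
                     date_of_admission: Optional[date]) -> Optional[int]:
    """Return age in completed years at admission, or None if a date is missing.

    Hypothetical helper: None stands for "information missing".
    """
    if date_of_birth is None or date_of_admission is None:
        return None
    if date_of_admission < date_of_birth:
        raise ValueError("date of admission precedes date of birth")
    years = date_of_admission.year - date_of_birth.year
    # Subtract one year if the birthday had not yet been reached at admission.
    if (date_of_admission.month, date_of_admission.day) < (
            date_of_birth.month, date_of_birth.day):
        years -= 1
    return years
```

Only the two raw dates are abstracted from the record; the age is derived afterwards, so an arithmetic slip by the abstractor cannot enter the data set.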
Questionnaires
Good design of questionnaires requires skill. The language used should be clear and simple. Two short questions, each covering one point, are better than one longer question which covers two points at once. A question that has been used successfully in a previous study has obvious advantages. The order of questions should take into account the sensitivities of the person to whom they are addressed - it is better to start with "What is your date of birth?" than launch straight into "Have you ever been treated for gonorrhoea?" - and should be designed to facilitate recall. For example, all questions relating to one phase of the person's life might be grouped together. As a check on the reliability of information, it may sometimes be helpful to include overlapping questions. In a study of risk factors for back pain, some people reported that their jobs entailed driving for more than four hours a day but did not involve more than two hours sitting. This suggests that they had not properly understood the questions. An important consideration is whether to use closed or open ended questions. Closed ended questions, with one box for each possible answer (including "don't know"), are more readily answered and classified, but cannot always collect information in the detail that is required. When interviewers are used, the wording with which they ask questions should be standardised as far as is compatible with the need to obtain useful information. As in abstracting existing records, the forms used to record answers to questions should be designed for ease and accuracy of completion and to simplify subsequent coding and analysis.
Physical examination and clinical investigations
Whatever method of data collection is adopted, it is usually worth trying it out in a pilot survey before embarking on the main study. Identification of practical snags at this stage can save much difficulty later.
In large studies the questionnaire or record design should be discussed with the statistician who will later be concerned in the analysis.
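One check that could be run on pilot questionnaires is the comparison of overlapping questions described above for the back pain study. The sketch below is illustrative only: the field names (serial, drives_over_4h, sits_over_2h) are hypothetical, and None is assumed to mean that the question was left unanswered.

```python
def inconsistent_sitting_reports(responses):
    """Return serial numbers of subjects whose overlapping answers conflict.

    Each response is a dict with hypothetical keys "serial",
    "drives_over_4h" and "sits_over_2h" (True, False, or None if missing).
    """
    flagged = []
    for response in responses:
        drives = response.get("drives_over_4h")
        sits = response.get("sits_over_2h")
        # Driving is done seated, so more than four hours of driving
        # cannot go with no more than two hours of sitting.
        if drives is True and sits is False:
            flagged.append(response["serial"])
    return flagged
```

A high proportion of flagged records in the pilot would suggest that the questions need rewording before the main study.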
Staff and training
The figure shows how two observers had distinct but opposite time trends in their performances during the early stages of a survey of skinfold thickness. Such training effects, which are common, should have been completed before the start of the main study: new staff need supervised practice under realistic field conditions followed by pre-survey testing.
Trend in mean values for triceps skinfold thickness obtained by two observers in the same survey
Despite all precautions, observer differences may persist. Observers should therefore be allocated to subjects in a more or less random way: if, for example, one person examined most of the men, and another most of the women, then observer differences would be confounded with true sex differences. To maintain quality control throughout the survey each examiner's identity should be entered on the record, and results for different examiners may then be compared.
Sampling
Sample size
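As one illustration of a sample size calculation, a widely used textbook approximation gives the number of subjects needed to estimate a prevalence to within a chosen margin. The sketch below assumes simple random sampling from a large population, a normal approximation, and a prior guess at the prevalence; the function name and its parameters are hypothetical, and the result is no substitute for a statistician's advice.

```python
import math


def sample_size_for_prevalence(expected_prevalence, margin, z=1.96):
    """Approximate sample size to estimate a prevalence within +/- margin.

    Uses n = z^2 * p * (1 - p) / margin^2, where z = 1.96 corresponds to
    a 95% confidence interval under the normal approximation.
    """
    if not 0 < expected_prevalence < 1:
        raise ValueError("expected_prevalence must lie between 0 and 1")
    if margin <= 0:
        raise ValueError("margin must be positive")
    n = (z ** 2) * expected_prevalence * (1 - expected_prevalence) / margin ** 2
    # Round up, since a fraction of a subject cannot be recruited.
    return math.ceil(n)
```

For example, an expected prevalence of 10% and a margin of 2 percentage points gives 865 subjects, before any allowance for non-response.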
Techniques also exist for calculating sample sizes required for estimating, with specified precision, the mean value of a variable, or for identifying a given difference in prevalence or mean values between two populations. These techniques may be found in textbooks or (better) by consulting a statistician; but either way the investigators must first know exactly what they want to achieve.
Sampling methods
It may be that an investigator wishes to choose a sample in which certain subgroups (particular ages, for instance, or high risk categories) are relatively overrepresented. To achieve this he may divide the study population into subgroups (strata) and then draw a separate random sample from each, while adjusting the various sample sizes to suit the investigation's requirements. This is a stratified random sample. The study population may be large and widely scattered - for example, all the general practices in a city - but for the sake of convenience the investigator may wish to concentrate his survey in a few areas only. This can be done by drawing first a random sample of practices, and then, within these practices, drawing a random sample of individuals. Such two stage sampling works well, but there is some loss of statistical efficiency, especially if only a few units are selected at the first stage.
Recruiting subjects
Response must be made as easy as possible. If attendance at a centre is required, it is better to send everyone a provisional appointment than to expect them to reply to a letter asking whether they are willing to attend. Provision of transport may be welcomed. Often the difference between a mediocre response and a good one is tactful persistence, including second invitations (perhaps by recorded delivery), telephone calls, identifying the reasons for non-attendance, and home visits.
Response rates
What matters is how unrepresentative non-responders are in relation to the study question.
It is not important whether they are atypical in other respects. In a survey to evaluate the association between serum IgE concentrations and ventilatory function it would not matter if non-responders had an unusually high frequency of respiratory disease, provided that the relation of their ventilatory function to IgE was not unrepresentative. Assessment of the likely bias resulting from incomplete response is ultimately a matter of judgement. However, two approaches may help the assessment. Firstly, a small random sample can be drawn from the non-responders, and particularly vigorous efforts made to encourage their participation, including home visits. The findings for this subsample will then indicate the extent of bias among non-responders as a whole. Secondly, some information is generally available for all people listed in the study population. From this it will be possible to contrast responders and non-responders with respect to characteristics such as age, sex, and residence. Differences will alert the investigator to the possibility of bias. In addition, it may help to put absolute bounds on the uncertainty arising from non-response by making extreme assumptions about the non-responders. For example, if the aim of a survey were to estimate a disease prevalence, what would be the prevalence if all of the non-responders had the disease, or none of them?
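The extreme assumptions just described amount to simple arithmetic, which can be sketched as follows. The function name and the figures in the usage note are invented for illustration.

```python
def prevalence_bounds(cases_among_responders, responders, non_responders):
    """Return (lower, upper) bounds on prevalence in the whole study population.

    lower: assumes no non-responder has the disease.
    upper: assumes every non-responder has the disease.
    """
    if responders <= 0 or non_responders < 0:
        raise ValueError("responders must be positive, non_responders non-negative")
    if not 0 <= cases_among_responders <= responders:
        raise ValueError("cases must lie between 0 and the number of responders")
    total = responders + non_responders
    lower = cases_among_responders / total
    upper = (cases_among_responders + non_responders) / total
    return lower, upper
```

With 40 cases among 800 responders and 200 non-responders, the observed prevalence among responders is 5%, but the true prevalence in the study population could lie anywhere between 4% and 24%. The width of that interval shows how much depends on the judgement about non-responders.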
Analysis
The starting point for analysis by computer is the coding and entry of data. These procedures should be checked, usually by carrying them out in duplicate. In addition, once the data have been entered, further checks should be made to ensure that all codes are valid (for example, nobody should have 31 February as a birth date) and to look for any internal inconsistencies (such as a date of admission to hospital being earlier than the subject's date of birth). Statistical analysis should only begin when the data set is as "clean" as possible. With the ready availability of software packages, it is tempting for medical investigators to embark on analyses they do not fully understand, and in the process they may use inappropriate statistical techniques. For this reason it is preferable to obtain advice from a statistician when carrying out all but the simplest analyses. As with the earlier stages of data processing, statistical calculations should all be checked.
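The validity and consistency checks mentioned above for entered data could be automated along the following lines. This is a minimal sketch: the record layout (dates held as year, month, day triples under the hypothetical keys "birth" and "admission") is an assumption, and a real survey would need many more checks.

```python
from datetime import date


def parse_date(year, month, day):
    """Return a date, or None if the values do not form a real calendar date."""
    try:
        return date(year, month, day)
    except ValueError:
        return None


def check_record(record):
    """Return a list of problems found in one entered record."""
    problems = []
    birth = parse_date(*record["birth"])
    admission = parse_date(*record["admission"])
    if birth is None:
        problems.append("invalid date of birth")
    if admission is None:
        problems.append("invalid date of admission")
    # Internal consistency can only be judged when both dates are valid.
    if birth is not None and admission is not None and admission < birth:
        problems.append("admission precedes birth")
    return problems
```

Records returning a non-empty list would be referred back to the source material, using the serial number, before statistical analysis begins.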