A Preliminary Attempt to Use Climate Data and Satellite Imagery to Model the Abundance and Distribution of Culicoides imicola (Diptera: Ceratopogonidae) in Southern Africa

INTRODUCTION
The biting midge Culicoides imicola is widely distributed in sub-Saharan Africa, parts of North Africa and southern Europe, and southern Asia 13. Within that wide geographical range, however, its distribution is patchy and its abundance varies dramatically from one location to another. For example, recent studies over 2 years at 49 sites in Morocco and Iberia identified 11 sites from which C. imicola appears to be absent, and a 2000-fold range in its abundance at sites where it is present 4.
In southern Africa C. imicola is believed to be the major or only vector of the viruses that cause several economically important diseases of livestock, including bluetongue, African horse sickness, equine encephalosis, bovine ephemeral fever and Akabane virus infection 14. The considerable geographical variation in the abundance of C. imicola has significant implications for the risk of these diseases to livestock, and greater knowledge of the smaller-scale distribution of this insect may help with more focused and effective vaccination, surveillance and control measures.
For many reasons, however, detailed mapping of the distribution and abundance of C. imicola (or other insects) countrywide is impractical. A better approach is to attempt to understand the causes of the geographical variation in abundance. For vector-borne diseases, a common approach here exploits the strong correlation between certain climatic factors and the distribution or abundance of most terrestrial arthropods 21. Armed with a knowledge of climate, it is possible, in theory, to derive 'expected' distributions of vectors after identification of the most significant climatic correlates of their distribution or abundance. An added advantage is that such maps can be experimentally altered to anticipate the effects of climate change although, it must be noted, such predictions may be flawed if they do not consider other determinants of distribution 6.
For climatic modelling of insect distributions, a particularly useful technique is the use of satellite imagery. Several earth-viewing satellite sensors record images that may act as surrogates for climatic variables; that is, the images are correlated to a greater or lesser degree with certain climatic variables recorded on the earth's surface 8. One such surrogate climatic variable is the normalised difference vegetation index (NDVI), a measure of the photosynthetic activity of living green vegetation, which is correlated with functions of moisture such as soil moisture 7,17, saturation deficit 22 and rainfall 22,24. Others include the land surface temperature (LST), which is correlated with temperature 19,26, and cold cloud duration (CCD), which is correlated with rainfall 25. The use of satellite imagery has several advantages. First, in some cases satellite images have proved more effective than ground-measured climatic variables at modelling arthropod distributions 2 3, perhaps because of advantages of scale: satellite images are averages over large areas, while weather stations record at very specific point locations that may not be representative of the general area. Other benefits include global coverage (data are available for the whole of the earth's surface, so that interpolation between weather station sites is not necessary), a usually high temporal frequency of imagery (dekadal, i.e. 10-day, images are widely used) and, for historical, less-detailed images at least, availability at low or no cost.
C. imicola abundance estimates for 33 sites in South Africa and 1 in Lesotho, together with basic climate data, were recently published 29. Combined with access to NDVI and LST data for the same region, this prompted a preliminary investigation into the use of both climate and satellite data for modelling the distribution and abundance of this vector in southern Africa.

Light trap collections and abundances of C. imicola
Venter et al. 29 made light-trap collections at 34 sites in southern Africa between January 1984 and September 1986. At most sites (31) the light traps were positioned in the vicinity of livestock (usually cattle, sheep, goats or horses) and it follows that predictions about the abundance of C. imicola elsewhere in southern Africa apply only to similar situations. The sites studied by Venter et al. 29 covered much of the region, although they were not evenly distributed (Fig. 1); for example, there were 7 sites in the immediate vicinity of Onderstepoort but only 5 in the entire Northern and Western Cape provinces.
Average abundances of C. imicola were calculated from data presented by Venter et al. 29 by first calculating a mean Culicoides total per catch per site (i.e. total number of Culicoides divided by the number of collections; Table 3 in Venter et al. 29) and then multiplying by the proportion of the Culicoides catch per site that were C. imicola (Table 4 in Venter et al. 29). There was a large range in sampling effort: at 8 sites only 1 collection was made and at 14 sites fewer than 5 collections were made, while at 1 site there were 146 collections (median = 15). As the abundance of C. imicola is highly seasonal, and there is large daily variation in its activity rate 2, estimates of abundance based on 1 or very few light-trap catches are likely to be inaccurate.
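The two-step calculation described above can be sketched as follows. This is a minimal illustration only: the function name and the example figures are hypothetical and are not taken from Venter et al. 29.

```python
def mean_imicola_per_catch(total_culicoides, n_collections, imicola_proportion):
    """Mean number of C. imicola per light-trap catch at one site.

    Step 1: mean Culicoides per catch = total Culicoides / number of collections.
    Step 2: multiply by the proportion of the catch that was C. imicola.
    """
    if n_collections <= 0:
        raise ValueError("n_collections must be positive")
    if not 0.0 <= imicola_proportion <= 1.0:
        raise ValueError("imicola_proportion must lie between 0 and 1")
    mean_culicoides = total_culicoides / n_collections
    return mean_culicoides * imicola_proportion


# Hypothetical site: 12 000 Culicoides in 15 collections, 25 % of them C. imicola.
example = mean_imicola_per_catch(12000, 15, 0.25)
```

Note that the estimate carries no measure of its own reliability: a site with 1 collection and a site with 146 collections both yield a single number.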

Climate data
Venter et al. 29 presented long-term average climate data, obtained from the South African Weather Bureau (SAWB), for the 34 sites. Distances between weather stations and trap sites are not given but were, in some cases, several kilometres. The 4 sites nearest to Onderstepoort share the same climate data. The climatic variables given by Venter et al. 29 are temperature (annual mean daily maximum and minimum, and annual minimum), number of days with temperature <0 °C, October-March rainfall, April-September rainfall and total annual rainfall. We calculated an 8th variable (annual mean daily average temperature) by averaging the annual mean daily maximum and minimum.

Satellite images
Satellite images of southern Africa were derived from NASA's Pathfinder advanced very high resolution radiometer (AVHRR) land (PAL) dataset 10 and were kindly supplied by the TALA Research Group, Oxford University. Images were of 2 kinds, normalised difference vegetation index (NDVI) and land surface temperature (LST).
NDVIs were calculated from data recorded by channels 1 and 2 of the AVHRR using the following equation: NDVI = (channel 2 - channel 1) / (channel 2 + channel 1), a ratio bounded by the limits -1 and +1. The utility of this ratio as a measure of photosynthetic activity arises from the absorption of light in the visible red wavelength (corresponding to channel 1) by plant tissues for photosynthesis, and the reflection of the near-infrared (corresponding to channel 2) that would otherwise damage plant cells 8,9.
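As a small illustration of the index, the sketch below computes the standard NDVI ratio, (channel 2 - channel 1) / (channel 2 + channel 1), for a single pixel. The function name and the reflectance values are hypothetical; the PAL processing chain itself is not reproduced here.

```python
def ndvi(channel_1, channel_2):
    """Normalised difference vegetation index for one pixel.

    channel_1: visible red reflectance (absorbed by photosynthesising tissue).
    channel_2: near-infrared reflectance (reflected by healthy vegetation).
    Returns NaN when both channels are zero and the ratio is undefined.
    """
    total = channel_2 + channel_1
    if total == 0:
        return float("nan")
    return (channel_2 - channel_1) / total


# Hypothetical reflectances: dense vegetation versus bare ground.
vegetated = ndvi(0.1, 0.5)   # strongly positive
bare = ndvi(0.3, 0.3)        # zero: no red/near-infrared contrast
```

For non-negative reflectances the result always falls between -1 and +1, which is what makes values comparable across pixels and dates.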
Land surface temperature estimates (in degrees Kelvin) were calculated from data recorded by channels 4 and 5 of the AVHRR using an equation derived by Price 19: LST = channel 4 + 3.33 (channel 5 - channel 4). In this equation, channel 4 gives a brightness temperature estimate (based on Planck's law); the 2nd part of the equation modifies the estimate to allow for attenuation of the signal by the atmosphere 8,9.
All images were supplied as monthly maximum composites (i.e. the largest values per pixel per month, to allow for lost data caused by cloud cover etc.) from January 1988 to December 1990 (36 NDVI and 36 LST images) in the Goode Homolosine projection. Pixel sizes were approximately 7.6 × 7.6 km. Values for each of the 34 sites (using the coordinates given by Venter et al. 29) were obtained from the 36 monthly images; from these, annual maxima, means and minima were calculated for 1988, 1989 and 1990. The 3 maxima, 3 means and 3 minima were then averaged to give, for each site, an average annual maximum, mean and minimum NDVI and LST. Note that identical values were obtained for sites situated close to one another (e.g. at Onderstepoort).
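The reduction of 36 monthly values per site to 3 summary statistics can be sketched as below. The function name and the synthetic series are assumptions made for illustration; real input would be the monthly composite values extracted at a site's pixel.

```python
def average_annual_summaries(monthly_values, months_per_year=12):
    """Average the annual maximum, mean and minimum over all complete years.

    monthly_values: flat sequence of monthly composites in time order,
    e.g. 36 values for January 1988 to December 1990.
    """
    if len(monthly_values) == 0 or len(monthly_values) % months_per_year != 0:
        raise ValueError("expected a whole number of years of monthly values")
    years = [
        monthly_values[start:start + months_per_year]
        for start in range(0, len(monthly_values), months_per_year)
    ]
    n_years = len(years)
    return {
        "max": sum(max(year) for year in years) / n_years,
        "mean": sum(sum(year) / len(year) for year in years) / n_years,
        "min": sum(min(year) for year in years) / n_years,
    }


# Synthetic 3-year series: months 1..12, shifted upwards by 10 each year.
demo_series = [month + 10 * year for year in range(3) for month in range(1, 13)]
demo_summary = average_annual_summaries(demo_series)
```

Because nearby sites fall in the same 7.6 × 7.6 km pixel, they receive identical input series and therefore identical summaries.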

Data analysis
Before analysis, the estimates of C. imicola abundance were subjected to a ln(n + 1) transformation to normalise the distribution of the data, which was otherwise strongly skewed towards low values. In the 1st stage of the analysis, the 15 predictor variables (8 climatic, altitude, 3 NDVI and 3 LST) were correlated with the C. imicola abundances. Thereafter, 2 models were developed. The 1st (model I) was permitted to contain only climatic variables and altitude as predictors; the 2nd (model II) was permitted to contain any variables as predictors. The objective was to compare models in which satellite imagery can be incorporated with those in which it is not.
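The transformation and the 1st-stage correlation can be illustrated with a short sketch. The catch values echo figures quoted in this paper (0, 4, the median of 221, and 40 000), but their pairing with the minimum LST values is entirely hypothetical, and a plain Pearson coefficient is assumed since the type of correlation is not specified.

```python
import math


def log_transform(catches):
    """Apply the ln(n + 1) transformation; a catch of zero maps to zero."""
    return [math.log1p(n) for n in catches]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


catches = [0, 4, 221, 40000]              # mean C. imicola per catch
min_lst = [272.0, 276.0, 281.0, 288.0]    # hypothetical minimum LST (K)

transformed = log_transform(catches)
r_raw = pearson_r(min_lst, catches)
r_log = pearson_r(min_lst, transformed)
```

With data spanning 4 orders of magnitude, the untransformed correlation is dominated by the single largest catch; on the log scale every site contributes.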
Each of the 2 models was developed manually using established procedures 12 that test for a significant increase in the fit of a model from the inclusion of an additional x-variable, with the objective of deriving the most effective model from the fewest variables. First, we constructed a 'full' model in which all permitted x-variables were included in a regression on the y-variable (the transformed abundance of C. imicola). The mean square error (= residual SS/DF) of this model was taken to be the best estimate of random variability obtainable from the available data and was then used to test the x-variables individually and in combination. For this procedure, we began by testing the x-variables individually and obtained the best obtainable fit (in terms of model SS) for a 1-variable model. We then derived all possible 2-variable models and examined whether the increase in fit from the best 1-variable model was significant. If this was the case, we proceeded with adding 3rd variables. The procedure continued until no additional variables led to a significant increase in the model SS. In all cases, the test for a significant increase in the model SS was an F-test based on the difference in SS divided by the mean square error of the full model, the change in d.f. (= 1) and the error d.f. of the full model.
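The stepwise procedure can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the software used for the analysis: the function names are hypothetical, the data are synthetic and deliberately orthogonal so that the outcome is easy to verify, and significance is judged against a critical F value supplied by the caller (7.71 is the tabulated 5 % point for 1 and 4 d.f.).

```python
from itertools import combinations


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.
    Fails with a zero pivot if the predictors are exactly collinear."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def residual_ss(columns, y):
    """Residual SS of a least-squares fit of y on the columns plus an intercept."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def select_model(predictors, y, f_critical):
    """Best subset of each size, kept while the gain in fit is significant."""
    names = list(predictors)

    def rss(subset):
        return residual_ss([predictors[k] for k in subset], y)

    error_df = len(y) - len(names) - 1      # error d.f. of the full model
    mse_full = rss(names) / error_df        # full-model mean square error
    chosen, current_rss = (), rss(())       # start from the intercept only
    for size in range(1, len(names) + 1):
        best = min(combinations(names, size), key=rss)
        # Gain in model SS equals the drop in residual SS; change in d.f. = 1.
        f_stat = (current_rss - rss(best)) / mse_full
        if f_stat <= f_critical:
            break
        chosen, current_rss = best, rss(best)
    return chosen


# Synthetic orthogonal predictors; y depends on x1 and x2 but not on x3.
x1 = [1, -1, 1, -1, 1, -1, 1, -1]
x2 = [1, 1, -1, -1, 1, 1, -1, -1]
x3 = [1, -1, -1, 1, 1, -1, -1, 1]
noise = [0.5, 0.5, 0.5, 0.5, -0.5, -0.5, -0.5, -0.5]
y = [2 + 3 * a + 2 * b + e for a, b, e in zip(x1, x2, noise)]
selected = select_model({"x1": x1, "x2": x2, "x3": x3}, y, f_critical=7.71)
```

As in the text, the best model of each size is found from all possible combinations, so the best 2-variable model need not contain the best single variable.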
Since total annual rainfall is the sum of 2 other predictor variables, and annual mean temperature is the average of 2 others, there were problems of multicollinearity such that models including all variables (i.e. the full models) could not be fitted using standard software. Therefore, these data were adjusted by a small (maximum 0.3 %) random deviation. Furthermore, at some sites there were incomplete climate data: these sites were excluded from the analysis until appropriate models were developed and then included for the derivation of regression equations.
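One way such an adjustment might be made is sketched below. The text specifies only the maximum size of the deviation; the uniform, multiplicative form, the function name, the seed and the rainfall figures are all assumptions for illustration.

```python
import random


def jitter(values, max_fraction=0.003, seed=0):
    """Perturb each value by a random relative deviation of at most max_fraction.

    Assumption: deviations are uniform on [-max_fraction, +max_fraction].
    """
    rng = random.Random(seed)
    return [v * (1.0 + rng.uniform(-max_fraction, max_fraction)) for v in values]


# Hypothetical rainfall (mm): the total is an exact sum of the two seasons,
# so the three columns are perfectly collinear before adjustment.
summer = [450.0, 300.0, 120.0]
winter = [50.0, 200.0, 380.0]
total = [s + w for s, w in zip(summer, winter)]

adjusted_total = jitter(total)
still_exact = all(t == s + w for t, s, w in zip(adjusted_total, summer, winter))
```

The perturbation breaks the exact linear dependence so that a full model can be fitted, while changing each value by far less than its likely measurement error.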

RESULTS
Table 1 gives the values of maximum, mean and minimum NDVI and LST used in the analyses here. Images of these satellite-derived variables for the southern Africa region are shown in Fig. 2A-F. The coldest area in the region (in terms of minimum LST) appears to be parts of the Karoo and areas of altitude in the Eastern Cape and Lesotho (Fig. 2A); the least cold areas are in the northern and eastern Northern Province, in the Kruger National Park in particular (Fig. 2A). The least hot area (in terms of maximum LST) is the south coast and the eastern third of the region; the hottest areas are the Northern Cape, and parts of Karoo, Northern Province, and Kruger National Park in Mpumalanga (Fig. 2C). The most barren area (in terms of minimum NDVI) is the Karoo and parts of the Kruger National Park; the least barren is the south and east coasts and the Drakensberg in Mpumalanga and the Northern Province (Fig. 2D). The least highly vegetated area (in terms of maximum NDVI) is the Karoo; the most highly vegetated area is the southwest of Western Cape, the south coast and the eastern parts of the region (Fig. 2F).

Correlations among predictor variables
There were strong correlations among many of the predictor variables (Table 2), and in particular among those of each type (e.g. the temperature variables derived from SAWB data, the LST variables and the NDVI variables). Winter and summer rainfall levels were not significantly correlated with each other, and total annual rainfall was strongly correlated with summer, but not winter, rainfall. The LST variables were strongly positively correlated with temperature while correlations with rainfall were weaker and negative. Conversely, correlations between NDVI and temperature were mostly weak while correlations with rainfall (especially the annual total) were strong and positive. The NDVI variables were negatively correlated with the LST variables.

Correlations of predictor variables with C. imicola abundance
The average abundances of C. imicola are shown in Table 1. At one site (Rhodes) no C. imicola were caught; at the remaining sites mean catches of C. imicola ranged by 4 orders of magnitude from 4 to over 40 000 (median = 221).
Of the 15 predictor variables, the most strongly correlated with the abundance of C. imicola was the minimum LST (Table 3). There were also strong, positive correlations between abundance and the annual mean daily maximum, average and minimum temperatures. There were significant, negative correlations between abundance and the number of days <0 °C and with altitude (i.e. the more the days of frost, or the higher the altitude, the fewer the C. imicola). There were no significant correlations between abundance and either rainfall or NDVI.

Model I - climate data only
As individual predictor variables, altitude and all temperature variables were significant. The rain variables were not significant. The best 1-variable model was that of mean daily minimum temperature on C. imicola abundance (F1,19 = 13.4, P = 0.003; R² = 33.8 %). The best 2-variable model combined mean daily average temperature and total annual rainfall (R² = 44.7 %). However, the increase in fit from 33.8 to 44.7 % approached, but did not reach, significance. We conclude, therefore, that of the climatic variables used here the best model is of mean daily minimum temperature as a predictor of C. imicola abundance. The regression equation for this model is ln(n + 1) = 0.76 + 0.397T, where n is the abundance of C. imicola and T is the mean daily minimum temperature. The relationship between observed and predicted abundances is shown in Fig. 3A.

Model II - climate and satellite data
As in Model I, altitude and all temperature variables were significant as individual predictors and the rain variables were not significant. Five of the 6 satellite variables were also significant predictors of C. imicola abundance, the exception being the maximum LST. The best 1-variable model was that of minimum LST on C. imicola abundance (F1,13 = 32.2, P = 0.0002; R² = 38.2 %). The best 2-variable model combined minimum LST and minimum NDVI (R² = 66.9 %) and the increase in fit was significant (F1,13 = 24.1, P = 0.0006). The best 3-variable model combined minimum LST, minimum NDVI and summer rainfall (R² = 71.3 %). However, the increase in fit over the best 2-variable model was not significant (F1,13 = 3.7, P > 0.1). We conclude, therefore, that of all variables used here the best model combines the minimum LST and minimum NDVI as predictors of C. imicola abundance. The regression equation for this model is ln(n + 1) = -94.0 + 0.323 LST + 19.7 NDVI, where n is the abundance of C. imicola, LST is the minimum LST and NDVI is the minimum NDVI. The relationship between observed and predicted abundances is shown in Fig. 3B.
Images of the minimum LST and minimum NDVI (Fig. 2A, D) were combined according to the above regression equation to generate a map of predicted C. imicola abundance in the southern Africa region (Fig. 4). For the purposes of interpretation it must be noted that the predicted abundances in Fig. 4 are for where light trapping occurs in the presence of domestic livestock (i.e. under the conditions used by Venter et al. 29). The high abundances predicted for the northern Kruger National Park, for example, are not those actually expected in the wildlife reserve but are, rather, predictions for the theoretical case that the area be farmland instead.
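The pixel-by-pixel combination of the two images can be sketched as below, using the Model II equation ln(n + 1) = -94.0 + 0.323 LST + 19.7 NDVI. Several points are assumptions rather than details from the analysis: the function names and grid values are hypothetical, LST is taken to be in kelvin (as stated for the satellite estimates), and negative back-transformed predictions are clipped to zero.

```python
import math


def predict_abundance(min_lst_k, min_ndvi):
    """Predicted mean C. imicola catch from minimum LST (K) and minimum NDVI."""
    log_abundance = -94.0 + 0.323 * min_lst_k + 19.7 * min_ndvi
    # Back-transform ln(n + 1); clipping at zero is an assumption made here.
    return max(0.0, math.expm1(log_abundance))


def predict_map(lst_grid, ndvi_grid):
    """Apply the regression to two co-registered grids of equal shape."""
    return [
        [predict_abundance(lst, ndvi) for lst, ndvi in zip(lst_row, ndvi_row)]
        for lst_row, ndvi_row in zip(lst_grid, ndvi_grid)
    ]


# Hypothetical 2 x 2 grids of minimum LST (K) and minimum NDVI.
lst_grid = [[290.0, 280.0], [285.0, 275.0]]
ndvi_grid = [[0.20, 0.10], [0.15, 0.05]]
risk_map = predict_map(lst_grid, ndvi_grid)
```

Because the equation is exponential on the abundance scale, a difference of a few kelvin in minimum LST changes the predicted catch several-fold, which is why the map shows such sharp contrasts between neighbouring areas.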

DISCUSSION
The entomological data on which our study was based suffered from the limitation of very unequal sampling effort across sites. In particular, at 14 of the 34 sites our estimate of the 'abundance' of C. imicola was based on fewer than 5 light-trap catches. Given the large day-to-day variation in the activity rate of C. imicola reported elsewhere 2, and the strong seasonal trend in C. imicola abundance in South Africa 30, such estimates must be prone to error. Factors not directly considered in our models, but which might be expected to affect the abundance of C. imicola on farms, are the types and number of livestock in the vicinity of the light-trap, farming practices (e.g. methods and extent of irrigation, use of insecticides, storage conditions of animal dung), soil type and moisture (i.e. suitability for breeding sites) and other, unmeasured climatic factors (e.g. wind speed). In the light of these reservations, it seems remarkable that simple 2-variable models comprising satellite imagery (LST, NDVI) can account for two-thirds of the variation in the abundance estimates. These findings underline the usefulness of satellite imagery as the basis for mapping the distribution of insect vectors.
Our results suggest at least 3 major advantages of using satellite imagery for climatic modelling.
Firstly, variables derived from satellite images performed better than the climatic variables in the predictive modelling. The reasons for this are unclear. The climate data, obtained from the SAWB, were from synoptic weather stations at some distance from the trap sites and it is likely that the modelling would have been more effective had weather data been available for the trap sites themselves. It is also possible that the satellite images recorded more biologically relevant data than the weather stations. LST, a measure of the temperature at the earth's surface, may be more equivalent to soil temperature than air temperature in areas of little vegetation cover (and hence the very high values in some areas) 19, while NDVI, an index of vegetational activity, is a particularly good measure of soil moisture 7,17. It has been suggested recently that soil conditions are particularly important in determining the distribution and abundance of C. imicola 3,15,16 through effects on the juvenile stages, and it is possible, therefore, that satellite images performed better than the climatic variables because of their more direct measure of these conditions. On that basis, one might expect climatic modelling to be improved were soil temperature and soil moisture to be recorded directly, in place of air temperature and rainfall. Such data are relatively difficult and expensive to obtain and, anyway, suffer the disadvantage that both can vary considerably over small areas of ground 1,19. Indeed, the main reason that the satellite imagery performs well in modelling of this type may be that, as a result of its limited resolution, it records averages over relatively large areas that may better reflect the conditions experienced by insects than the very location-specific data recorded by weather stations.
The 2nd major advantage of using satellite imagery is the relative ease and low cost of obtaining appropriate data. In the study of Venter et al. 29, climate data were obtained easily and cheaply from the SAWB but, in many countries, synoptic weather stations may be very few and the available data may have to be purchased. An alternative approach, used in the studies of Rawlings et al. 20 and Baylis et al. 3, was to install and operate weather stations as part of the scientific project, but this was at a cost of several tens of thousands of pounds sterling, and imposed very heavy demands on the running of the project. By contrast, the satellite images used here can be readily obtained over the internet, free of charge. Such images are obtained in a crude form and require extensive processing, which in turn demands appropriate computer software and expertise, before they can be used. Processing need be done once only, however, and after processing, data can be obtained for any site in Africa.
The 3rd major advantage of using satellite imagery in predictive modelling is its global coverage. A limitation of models based on climate data is that, once a model has been developed on the basis of observations at certain study sites, it is not straightforward to make predictions about other sites for which climate data are not available. In theory, we could have proceeded to obtain temperature and rainfall data for all 116 sites monitored by the SAWB, and to have then used an interpolation routine to generate a predictive map. Such a map would have been based on 116 sites in an area of approximately 1 220 000 km², or over 10 000 km² per site. By contrast, satellite images generally provide information about entire regions and, as we have demonstrated here, they can be readily combined using GIS software to generate a predictive 'risk map'. In our map, which uses relatively low-resolution imagery, each pixel has an area of about 60 km², but it is also possible to produce such maps with a pixel size of about 1.2 km². Data from other satellite platforms can be used for even greater resolutions 8.
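The difference in spatial density can be checked with simple arithmetic, using only figures quoted in this paper (the variable names are illustrative).

```python
region_area_km2 = 1_220_000      # approximate area covered by the SAWB sites
n_weather_sites = 116            # sites monitored by the SAWB
pixel_side_km = 7.6              # approximate pixel size of the imagery used

area_per_site_km2 = region_area_km2 / n_weather_sites   # just over 10 000 km2
pixel_area_km2 = pixel_side_km ** 2                      # about 58, i.e. roughly 60 km2
pixels_per_site = area_per_site_km2 / pixel_area_km2     # pixels per weather site
```

On these figures each weather station would have to represent an area covered by well over 100 satellite pixels, even at the relatively coarse resolution used here.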
The map of predicted abundances of C. imicola presented here is the first of its kind for any Culicoides. As suggested earlier, its statistical basis suffers from an uneven distribution of sites in southern Africa, with particularly poor representation in the western half of the region. Furthermore, the statistical model failed to explain about 33 % of the observed variation in C. imicola abundance, this percentage being attributable, presumably, to a combination of chance and the aforementioned factors not included in our model. To what extent, then, does the map agree or disagree with our knowledge of the distribution and abundance of C. imicola in southern Africa, and what are its limitations? A recently published study declared the virtual absence of C. imicola in the colder, high-lying area of the eastern Free State 27. The study area was, roughly, just to the north of the northern tip of Lesotho; in Fig. 4 predicted abundances in this area are in the range 4-54. Single night catches ranged, in fact, from 0-128, with a detransformed mean (after log-transforming the catches) of 6.4. An earlier study at 2 localities in the southwestern Free State 11 reported only 44 C. imicola in 40 nights of light-trapping (albeit with a less efficient light trap than that of Venter et al. 29, i.e. not blacklight); in Fig. 4 the predicted abundances at these sites are in the range 1-24. For these studies, therefore, the predictions are of reasonable accuracy. Two other studies in South Africa 18,28 examined C. imicola abundances in areas (Stellenbosch and Onderstepoort) that were also examined by Venter et al. 29 and hence should not be compared, since data for those areas helped develop the predictive model. Considering unpublished data, Fig. 4 predicts a very low abundance, or even absence, of C. imicola in the Great Karoo and Namaqualand and at the eastern tip of Lesotho, an intermediate abundance in the southwest of the Western Cape and the northeast of Northern Cape, and a high abundance just to the north of Swaziland (all areas not studied in detail by Venter et al. 29); these predictions largely concur with the findings of recent studies (R M, unpubl. data, 1998).
However, discrepancies have also been found between the actual abundances of C. imicola on the ground and those predicted here. For example, at opposite ends of a 350 km transect stretching from Colesberg (in the light blue area just north of Middelburg (EC) in Fig. 4) to Port Elizabeth on the southeastern coast, hundreds to thousands of C. imicola were captured at various sites around Colesberg, whereas a total of only 1 individual was captured at 7 sites around Port Elizabeth 16. These data are the inverse of what is predicted in Fig. 4.
Finally, but of importance, the current model has been generated from data collected during years of average rainfall. As shown by Meiswinkel 16, C. imicola is a species that can increase 200-fold in seasons of above-average precipitation. The extent to which C. imicola penetrates areas peripheral to its normal distribution during these periods is unknown.
A C. imicola-free region of South Africa (an approximately 300 km² area at Port Elizabeth) was recently reported 15. Subsequently, individual specimens of C. imicola have been caught in the region (R M, unpubl. data, 1998), demonstrating that although the area may not actually be C. imicola-free, the vector is nevertheless exceedingly rare. Fig. 4, however, suggests that C. imicola should be present at Port Elizabeth and the green pixels indicate an annual mean abundance in the range of 24-54. One reason for this disparity, which underlines an important limitation of satellite-derived maps like Fig. 4, is that coastal pixels are generally inaccurate: the inclusion of a large amount of water in an area being measured for a terrestrial variable (such as NDVI) inevitably leads to error. A 2nd factor to be considered is the importance of sandy versus clayey soils. The absence or near-absence of C. imicola at Port Elizabeth has been attributed to the sandiness of the soil, leading to low soil moisture and poor microorganism content 15. Subsequently, soil sandiness has been negatively related to C. imicola catches from a 1996 survey at 47 sites in South Africa 16. NDVI must be, in part, dependent on soil sandiness via the response of vegetation to drainage 7, but this example indicates that soil sandiness, in combination with rainfall, may be a better predictor of the abundance of C. imicola than NDVI alone. It is worth noting, however, that there may be other factors that contribute to the near absence of C. imicola from Port Elizabeth apart from sandiness. Recent studies have suggested that higher wind speeds increase the mortality rates of adult C. imicola 5 and that this may lead to lower population sizes 3. Wind speeds at Port Elizabeth are the highest of 34 sites currently under study in South Africa (M B, unpubl. data, 1998).
The value of Fig. 4 lies in its predictive capabilities. To this end, it is noteworthy that roughly in the centre of Fig. 4 (corresponding approximately to Jan Kempdorp, north of Kimberley) is an area of high predicted abundance of C. imicola, surrounded by significantly lower predicted abundances. This area may merit surveying for C. imicola, both for epidemiological purposes (to identify a possible area of higher disease risk) and as a partial test of the validity of the predictive map.
The best model of C. imicola abundance combined the minima of LST and NDVI while the means and maxima of these variables were less effective as predictors. Baylis et al. 3 also found the minimum, as opposed to the mean or maximum, NDVI to be the more effective predictor of C. imicola abundance in Morocco. They suggested that this arises from the concurrence in late summer of peak numbers of C. imicola (and, hence, breeding site demand) with the most barren time of year (and, hence, the minimum NDVI). Soil that is sufficiently moist to act as a breeding site in late summer or autumn is also soil capable of supporting some vegetation at that time of year, and this relationship is manifested in a correlation between abundance and minimum NDVI. The significance of the minimum LST is, at present, unclear. The minimum must correspond to winter temperatures, however, and it may reflect the ability of larvae to successfully overwinter.

Fig. 1: Distribution of the 34 sites sampled for Culicoides by Venter et al. 29. The location of Port Elizabeth, which is nearly C. imicola-free, is also shown.

Fig. 2: Satellite-derived variables investigated as predictors of Culicoides imicola abundance. LST: land surface temperature, a measure of soil temperature. Scale is in degrees Celsius. NDVI: normalised difference vegetation index, a measure of vegetation levels and, indirectly, soil moisture.

Fig. 3: Predicted versus observed abundances of Culicoides imicola. A: Model I, predicted abundances derived from 1-variable model using mean daily minimum temperature. B: Model II, predicted abundances derived from 2-variable model combining the minimum LST and minimum NDVI.

Fig. 4: Predicted abundances of Culicoides imicola in southern Africa based on the 2-variable model combining minimum LST and minimum NDVI. Values are the predicted annual mean light-trap catch of the vector when following the methods of Venter et al. 29.

Table 1: Average catches of Culicoides imicola at the 34 sites studied by Venter et al. 29, and the annual maximum, mean and minimum NDVI and LST at the same sites.
a Normalised difference vegetation index. b Land surface temperature.

Table 3: Descriptive statistics of the predictor variables obtained for the 34 sites studied by Venter et al. 29.
a Number of sites. b Mean and range from among the 34 sites. c Correlation coefficient with the annual mean abundance of C. imicola.

Such anomalies not only indicate that the current data set lacks detail but suggest too that the ecological requirements of C. imicola may not yet be fully understood. Two important parameters omitted in the present model are soil type and wind speed; these are discussed further below. It should also be borne in mind that escalation in numbers of C. imicola at specific sites can be artificial, that is, C. imicola can become superabundant where livestock are kept on irrigated pastures, especially if these pastures occur on clayey, moisture-retentive soils.