GIS Scripts For Plotting CSP Data and Spatial Analysis
The following description is reproduced in whole and with permission from Jim Keron's MA thesis, Iroquoian Chert Acquisition: Changing Patterns in the Late Woodland of Southwestern Ontario (2003), on file at the University of Western Ontario. One chapter of this thesis was published in KEWA 03-4/5, and the following description is the detailed documentation of the methodology used to conduct the internal spatial analysis in that article. It is best read alongside the KEWA article, which contains five examples of the application of the GIS procedures. Back issues of KEWA can be ordered from the London chapter; ordering information is available elsewhere on this web site.
The following reproduces Appendix B (pages 160-167) of Keron's thesis.
Appendix B: Spatial Analysis with GIS
This appendix provides the detailed description of how the intra-site spatial analysis was done using a Geographic Information System (GIS), MFWorks, by Keigan Systems of London, Ontario. The mathematics used and the GIS scripts are included. This discussion assumes some knowledge of MFWorks. There are basically three steps to this analysis.
1. Assignment of Village Space
The ideal situation for conducting internal spatial analysis would be where we are dealing with fully excavated sites with complete settlement pattern data. Unfortunately, there is only one fully excavated village site in the London area, the Calvert Site (Timmins 1997). While there are several fully excavated agricultural cabin sites, these are not suitable for the purpose of determining intra-village differential access to chert sources since they usually consist of one or two longhouses with an associated midden. Furthermore, these sites would be best interpreted as being occupied by a single lineage. Partial excavations, such as the two midden samples from Harrietsville that initiated this investigation, can be indicative of internal patterning but only tell a partial story and are highly dependent on the areas actually excavated. The only other source of data, then, is the controlled surface pick-up (CSP) of village sites. While a CSP fails to identify the internal house structure, it defines the site boundaries and can generally define the midden areas. The middens can generally be related to nearby longhouses and consequently to the occupants of those houses. Thus, the midden areas, at least, could be used to define discrete spatial units. The areas between the middens are more problematic, with the difficulty coming from not knowing the house orientation. However, if the analysis is restricted to the middens, much other information located in the non-midden areas of the site would be discarded. Thus, we are faced with the problem of assigning space within the village without knowing the underlying settlement patterns.
One possible technique for assigning spatial categories to non-midden areas would be to assign the non-midden area within the site boundary to the nearest midden. With this assignment, various categories of artefacts could then be analyzed using this division of the village as the spatial control. This procedure involves carving up the internal village space and assigning it to the nearest midden by constructing a Voronoi network (Chrisman 1997: 152) around the middens with the MFWorks operation "Fence". Another term for this carving up of space is Thiessen polygons (see Hodder and Orton for another archaeological use of this technique). The script that creates these areas follows.
AssignedSpace = Fence NumberedMiddens;
HighMiddens = NumberedMiddens +100;
AAS = Cover AssignedSpace with HighMiddens;
IAA = Cover AAS with Sitemask;
CA = Recode IAA assigning void to 9999 carryover;
CulturalAreas = Trunc (CA);
Briefly, this script breaks all the area within the village into a number of sub-areas. Each midden is a sub-area and all of the interior space within the village is assigned to the nearest midden giving twice as many sub-areas as there are middens. The script requires two input maps, NumberedMiddens and SiteMask. The first map shows only the midden locations and was created from a map that plots all artefacts recovered with the CSP by tracing out the midden areas. Everything inside the midden boundaries is numbered preferably with the assigned midden numbers (e.g. Midden 1 etc.) and everything else is "void". The second map defines the site boundary and has a value of “1” outside the village and "void" inside. This map was created by tracing the site boundary as was evident in the original CSP. The result of this process is a map labeled CulturalAreas, and it defines the divisions used in this analysis. This map is used as input to the internal spatial analysis.
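The nearest-midden assignment described above can also be sketched outside the GIS. The following is a minimal Python illustration, not the MFWorks "Fence" operation itself. It assumes a small raster held as nested lists, with `None` standing in for "void", and it assumes a boolean mask that is true inside the village (the reverse of the SiteMask convention above). The function name and the data layout are hypothetical.

```python
import math

def assign_to_nearest_midden(middens, inside_site):
    """Assign every in-site cell to a cultural area.

    middens     -- grid of midden numbers, None where there is no midden
    inside_site -- grid of booleans, True inside the village (assumed layout)

    Midden cells get their number + 100 (as HighMiddens does in the script);
    other in-site cells get the number of the nearest midden cell;
    cells outside the site stay None.
    """
    rows, cols = len(middens), len(middens[0])
    # Every midden cell acts as a seed for the nearest-neighbour search.
    seeds = [(r, c, middens[r][c])
             for r in range(rows) for c in range(cols)
             if middens[r][c] is not None]
    areas = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not inside_site[r][c]:
                continue
            if middens[r][c] is not None:
                areas[r][c] = middens[r][c] + 100
            else:
                # Straight-line distance; ties go to the lower midden number.
                _, nearest = min((math.hypot(r - sr, c - sc), label)
                                 for sr, sc, label in seeds)
                areas[r][c] = nearest
    return areas
```

With two middens this yields four sub-areas (1, 2, 101 and 102), which matches the "twice as many sub-areas as there are middens" noted above. The brute-force search is adequate for a sketch but would be slow on a large raster.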
2. XYZ Input Files
In order to prepare the data required for the analysis, the chert types of the debitage were determined and a spreadsheet constructed showing the artefact type, chert type, and the location in Cartesian co-ordinates. As normal CSP practices involve recording transit readings (distance and direction from a known point) or compass readings (two directions from each of two known points), it is first necessary to convert the transit or compass readings into Cartesian co-ordinates, as most mapping programs require this format to define position and MFWorks is no exception. This calculation is done simply using spreadsheets (Keron and Prowse 2001) available on the London Chapter, OAS web site. A sample set of spreadsheet data follows:
The data are then imported into the GIS by importing an "XYZ file". The X and Y values are the two Cartesian co-ordinates from the above spreadsheet and the Z value is the actual count of artefacts at each spot. In the analysis conducted here, there is only one item in each row of the spreadsheet so the value is set to "1".
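The conversion from a transit reading to Cartesian co-ordinates, and the layout of one XYZ row per artefact, can be sketched as follows. This is a hedged Python illustration, not the Keron and Prowse spreadsheet. It assumes bearings are in degrees clockwise from grid north, and it assumes a tab-separated XYZ layout; neither detail is stated in the text, and the function names are hypothetical.

```python
import math

def transit_to_xy(station_x, station_y, distance, bearing_deg):
    """Convert a distance and bearing from a known station to X, Y.

    Assumes the bearing is measured clockwise from grid north, in degrees.
    """
    theta = math.radians(bearing_deg)
    x = station_x + distance * math.sin(theta)  # easting
    y = station_y + distance * math.cos(theta)  # northing
    return x, y

def xyz_rows(readings, station=(0.0, 0.0)):
    """Build one 'X<TAB>Y<TAB>Z' text row per reading, with Z fixed at 1.

    readings -- list of (distance, bearing_deg) pairs, one per artefact
    The tab-separated layout is an assumption about the XYZ format.
    """
    rows = []
    for distance, bearing in readings:
        x, y = transit_to_xy(station[0], station[1], distance, bearing)
        rows.append(f"{x:.2f}\t{y:.2f}\t1")
    return rows
```

A reading of 10 m due east of a station at the origin, for example, maps to X = 10, Y = 0, with a Z value of 1 for the single artefact.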
A second issue arising from CSP field methodology occurs when several items are cataloged at the same pick-up point. While this procedure greatly expedites the time required in the field to collect the original data, it complicates plotting as several items map to the same physical location. The GIS script is capable of dealing with multiple finds at the same point since the "Score" function can accept the totals at each find spot. However, it is desirable visually to see the actual distributions used. In order to break up multiple finds at the same location, a short GIS script was run after each XYZ file was imported into the GIS. This script will take the value of a specific point and create the same number of individual points within a couple of meters (the GIS provides the scaling) of it, thus, creating a visual representation of the original density of recovered material. The script has one minor drawback in that a recurring pattern is created rather than a random pattern that would be visually more appealing.
The script follows:
/* CSPMAP SCRIPT
This script will break up multiple occurrences at the same plot point and create one dot per artefact. */
AP1 = recode MapLayer1 assigning 0.0 to void, carryover;
AP2 = Filter AP1 Mask ScatterFilter LowPass;
Flakes = recode AP2 assigning 1 to 1...400;
MapLayer1 is the default name of the imported XYZ file. It will contain the number of artefacts recorded at each find spot after the import of the XYZ file. The result of the filter operation will be one dot on the map for each artefact recorded at that spot. These dots will be within two metres of the point recorded as the find spot. This scattering is accomplished with a set of calculated values in the filter map that will lead to a series of numbers in the resulting map that vary above and below the value "1" at predefined points as defined in the filter. The script can accommodate up to twenty items at the same point and the result will have exactly the same number of points equal to or greater than "1" as is represented at the particular find point. For example, if the value is "4", there will be four points equal to or greater than "1" and sixteen less than "1". All points less than "1" are dropped and all points greater than or equal to "1" are changed to "1" in the final "Recode" operation. Thus the value of "4" in the example becomes four individual points with the value "1".
The preceding discussion assumes a knowledge of MFWorks in general and of the "Filter" operation in particular; without that knowledge, the preceding paragraph will not make sense. It has been included here as documentation for how the script works.
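For readers without MFWorks, the effect of the scattering step can be illustrated with a short Python sketch. It does not reproduce the "Filter" operation or the ScatterFilter map. Instead it substitutes a fixed spiral of offsets, which is an assumption made for illustration, while keeping the behaviour described above: at most twenty items per find spot, every dot within two metres, and a recurring rather than random pattern. All names are hypothetical.

```python
import math

GOLDEN_ANGLE = math.pi * (3.0 - math.sqrt(5.0))

def scatter_offsets(max_items=20, radius=2.0):
    """Return a fixed, repeatable list of (dx, dy) offsets within `radius`."""
    steps = max(max_items - 1, 1)
    offsets = []
    for i in range(max_items):
        r = radius * math.sqrt(i / steps)  # first offset is the find spot itself
        angle = i * GOLDEN_ANGLE
        offsets.append((r * math.cos(angle), r * math.sin(angle)))
    return offsets

def scatter(finds, max_items=20, radius=2.0):
    """Expand (x, y, count) find spots into one point per artefact."""
    offsets = scatter_offsets(max_items, radius)
    points = []
    for x, y, count in finds:
        if count > max_items:
            raise ValueError("more items at one spot than the pattern allows")
        # Use the first `count` offsets, so the pattern repeats at every spot.
        for dx, dy in offsets[:count]:
            points.append((x + dx, y + dy))
    return points
```

A find spot with a count of 4 becomes four separate points, each carrying an implied value of 1, in the same way that the final "Recode" turns a value of "4" into four dots.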
3. Spatial Analysis
With an assignment of the internal space of the site in place and the artefacts being analyzed imported, the next step is to examine the artefact distribution over these areas looking for patterns. This process simply involves counting the number of flakes of each type in each sub-area and then calculating the percentage of each source type by sub-area. As the flakes found within each area can be considered a sample, in the statistical sense, from that area, it is necessary to allow for sampling error to determine whether or not the differences are statistically significant. To do these calculations, more complex statistics are not required and simple confidence intervals can demonstrate non-random variation. The use of confidence intervals brings some assumptions about the nature of the data being used as the confidence interval is a parametric measure. The primary issue from the statistical perspective is that of the randomness of the sample. In the case of a CSP, if the entire site is clearly covered, that is we are not dealing with part of the site being inaccessible due to different crop cover or a bush lot or the use of different methods such as a CSP in a ploughed field combined with test pitting in a bush lot, and the entire CSP has been executed at the same point in time, then it is reasonable to assume that the sample is representative of the entire site. A CSP should meet the requirements of the confidence interval statistic.
The data from the CSP as described above are plotted against the various spatial units defined in the map CulturalAreas by selecting the various types and entering each into the GIS as an individual map (i.e. one for each kind of chert). An analysis can then be run showing summaries of total type and percentage by each zone. Once the total of each type has been calculated for each spatial unit, a confidence interval for the percentage of each category, such as Kettle Point chert, is calculated using the following formula (Wonnacott and Wonnacott 1990: 5):
$$\pi = P \pm 1.96\sqrt{\frac{P(1-P)}{n}}$$
Where the Greek letter pi ($\pi$) is the actual proportion of the total population.
P is the observed proportion in the sample
n is the size of the sample
What the confidence interval means is that the real value of the entire population being measured falls within the range of the specified confidence interval 95% of the time. The calculated range is similar to the range established for a radio-carbon date except that radio-carbon dates are expressed as one standard deviation and thus the real date lies within the range only about 68% of the time. The width of the confidence interval is inversely proportional to the square root of the sample size, so bigger samples result in a narrower range. Thus, each spatial unit has a confidence interval assigned and it is then necessary to compare the ranges of the intervals against each other. In the simple case, if two middens have confidence intervals that do not overlap, then there is a statistically significant difference in the distributions. For example, if one area of a site has 60 flakes of Onondaga chert out of a total of 396, the confidence interval is approximately 15 +/- 3 %. If another area has 15 flakes out of 617, the confidence interval is approximately 2 +/- 1 %. The two intervals do not overlap and the difference between the middens is statistically significant.
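The worked example in the preceding paragraph can be checked with a few lines of Python. This is a minimal sketch of the 95% confidence interval for a proportion, using the same 1.96 multiplier as the GIS script further on. The function names are hypothetical.

```python
import math

def proportion_ci(count, n, z=1.96):
    """Return (P, half_width) for the confidence interval P +/- half_width."""
    p = count / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p, half_width

def intervals_overlap(ci_a, ci_b):
    """True if two (P, half_width) intervals share any part of their range."""
    low_a, high_a = ci_a[0] - ci_a[1], ci_a[0] + ci_a[1]
    low_b, high_b = ci_b[0] - ci_b[1], ci_b[0] + ci_b[1]
    return low_a <= high_b and low_b <= high_a

area_one = proportion_ci(60, 396)   # 60 Onondaga flakes out of 396
area_two = proportion_ci(15, 617)   # 15 Onondaga flakes out of 617
print(intervals_overlap(area_one, area_two))
```

To one decimal place the two areas come out at roughly 15.2 +/- 3.5 % and 2.4 +/- 1.2 %, close to the rounded figures quoted above, and the intervals do not overlap.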
This process is hypothesis testing with H0, the null hypothesis, stating that the percentages in each spatial unit are similar to each other and that any observed differences are simply the result of sampling error. The hypothesis being tested is that there are significant internal differences in the distribution of material over the site.
Implementing the calculations in MFWorks requires the use of a number of mathematical functions of the GIS. The site data described above are entered into MFWorks using an "XYZ" file that allows a surface scatter to be plotted. These maps must be aligned properly with the CulturalAreas map that allocated the village space. The amount of each chert type is then counted by running a "Score" operation against each area, which totals the number of each type per sub-area. These numbers are then used to calculate the confidence interval values for each sub-area of the village. The final "Combine" is simply used to create a single legend with all of the pertinent data. The script to do these calculations follows:
/* Kettle Point Chert Distribution Analysis Script
Input - four maps, one for each chert type
- the CulturalAreas map to be used.
Count the occurrence of each type within each sub-area of the site and calculate the total of all types */
KPScore = Score CulturalAreas by KettlePointChert total;
LTCScore = Score CulturalAreas by LTChert total;
OnScore = Score CulturalAreas by OnondagaChert total;
OtScore = Score CulturalAreas by OtherChert total;
TotalScore = KPScore + LTCScore + OnScore + OtScore;
/* Calculate the frequency and the confidence interval of one type for each area and turn the results into percentages */
Freq = (KPScore * 1.0) / TotalScore;
StandardError = (1.96 * (Freq * (1-Freq) / TotalScore )^ .5);
Percent = (Trunc (Freq * 1000 + 0.5))/10.0;
SEPercent = (Trunc (StandardError * 1000 +0.5))/10.0;
/* Combine the results to create a single legend */
FA1 = Combine CulturalAreas with TotalScore with SEPercent with Percent;
/* Plot the individual points for this execution on the resulting map. */
BA1 = spread KettlePointChert to 2.5;
BA2 = recode BA1 assigning 100 to 0...3;
KPAnalysis = Cover FA1 with BA2;
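The counting and percentage steps of the script above can be mirrored in Python for checking legend values by hand. This is a sketch under stated assumptions, not a translation of the MFWorks operations. It assumes each flake has already been tagged with the cultural area it falls in, which is what "Score" achieves spatially, and it copies the script's rounding rule of truncating after adding 0.5. The names are hypothetical.

```python
import math
from collections import Counter

def area_summary(flakes, chert_type):
    """Summarise one chert type by cultural area.

    flakes -- iterable of (area_id, chert_type) pairs, one per flake
    Returns {area_id: (total, percent, ci_percent)}, with the percentages
    rounded to one decimal place the way the script does it.
    """
    flakes = list(flakes)
    totals = Counter(area for area, _ in flakes)
    hits = Counter(area for area, kind in flakes if kind == chert_type)
    summary = {}
    for area, n in totals.items():
        freq = hits[area] / n
        half_width = 1.96 * math.sqrt(freq * (1.0 - freq) / n)
        percent = math.trunc(freq * 1000 + 0.5) / 10.0
        ci_percent = math.trunc(half_width * 1000 + 0.5) / 10.0
        summary[area] = (n, percent, ci_percent)
    return summary
```

Note that the quantity the script stores in StandardError and SEPercent already includes the 1.96 multiplier, so it is the half-width of the 95% confidence interval rather than the standard error alone. The sketch names it `ci_percent` for that reason.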
As noted above, when the confidence intervals do not overlap the determination of statistical significance is easy and can be made directly from a review of the legend. However, a problem arises when there is partial overlap. In this case, more statistical calculations are necessary to determine whether or not the differences are statistically significant. It was not possible to implement these calculations in the GIS as they involve comparison of each area of the site with all other areas of the site. In order to calculate whether the differences between areas were statistically significant, the data from the legend produced by the preceding script were entered into an Excel spreadsheet that performed the calculations using the following formula to compare each pair of areas. The formula is taken from Wonnacott and Wonnacott (1990):
$$(\pi_1 - \pi_2) = (P_1 - P_2) \pm 1.96\sqrt{\frac{P_1(1-P_1)}{n_1} + \frac{P_2(1-P_2)}{n_2}}$$
Where the Greek letters pi-sub1 ($\pi_1$) and pi-sub2 ($\pi_2$) are the real proportions in the populations from which the samples are drawn.
P1 and P2 are the frequency of the particular item (e.g. Kettle Point chert) in each of the two areas being compared.
n1 and n2 are the total number of flakes in each area.
Interpreting the results of this calculation is simply answering the question, "Is zero included within the resulting confidence interval?" If the answer is "Yes", the differences are not statistically significant. If the answer is "No", the differences are statistically significant. The results of these calculations on the individual sites are included in Appendix C: Chapter Five Tables and the differences that are significant are highlighted.
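The pairwise comparison can be sketched in Python as an alternative to the spreadsheet. The sketch assumes the standard 95% confidence interval for a difference between two independent proportions, which fits the variable definitions given above. It is not a copy of the original Excel formulas, and the function names are hypothetical.

```python
import math

def difference_ci(count1, n1, count2, n2, z=1.96):
    """Return (low, high) for the confidence interval of P1 - P2."""
    p1 = count1 / n1
    p2 = count2 / n2
    half_width = z * math.sqrt(p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2)
    diff = p1 - p2
    return diff - half_width, diff + half_width

def is_significant(count1, n1, count2, n2, z=1.96):
    """True when zero lies outside the interval, i.e. the areas differ."""
    low, high = difference_ci(count1, n1, count2, n2, z)
    return not (low <= 0.0 <= high)
```

Calling `is_significant(60, 396, 15, 617)` on the two areas from the earlier example gives a significant difference, in agreement with the non-overlapping intervals. The function is most useful for pairs whose individual intervals partly overlap.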
The result of this analysis is that the differences in percentage of various site areas can be quickly calculated and compared. Once the initial maps of artefact distributions are prepared it is relatively simple to run a number of iterations on the analysis simply by creating different maps defining the cultural areas.
Copyright 2003 - James R. Keron