Proportional Split Analysis Using ESRI’s ModelBuilder
Thursday, 12 January 2012 17:05, GISEC Staff
GIS analysts who work with vector data, particularly demographic vector data, are often interested in the proportion of a variable that falls within a defined distance of a particular feature or set of features. One example is the number of households within ½ mile of a park, estimated from census tracts. Depending on the data scale, such an analysis may not be as simple as a spatial join, in which case a proportional split model can be used to account for dissimilar polygon shapes and partial overlap.
Specifically, a buffered feature may intersect multiple polygons, with a different proportion of each polygon falling within the buffer. A proportional split method corrects for these differences: multiply each polygon’s overlap proportion by the variable of interest (z), then sum the products to obtain the final statistic. Please note that the final statistic may still be biased, because it typically cannot be assumed that a variable (e.g. households) is uniformly distributed throughout a polygon.
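As a rough illustration of the arithmetic described above, here is a minimal Python sketch. The function name, the tract values, and the overlap fractions are all hypothetical and were chosen only to make the calculation easy to follow; they are not taken from the San Francisco analysis.

```python
def proportional_split(pieces):
    """Estimate a buffer total from (overlap_fraction, polygon_value) pairs.

    overlap_fraction is the share of a polygon's area that lies inside
    the buffer; polygon_value is the variable of interest (z) for the
    whole polygon, e.g. a household count.
    """
    total = 0.0
    for overlap_fraction, polygon_value in pieces:
        if not 0.0 <= overlap_fraction <= 1.0:
            raise ValueError("overlap fraction must be between 0 and 1")
        total += overlap_fraction * polygon_value
    return total


# Hypothetical buffer around a park that touches three census tracts.
tracts = [
    (0.5, 1200),    # half of a tract with 1,200 households
    (0.25, 800),    # a quarter of a tract with 800 households
    (0.125, 2000),  # one eighth of a tract with 2,000 households
]
estimate = proportional_split(tracts)  # 600 + 200 + 250 households
```

The estimate assumes households are spread evenly within each tract, which is the source of the possible bias noted above.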
Unfortunately, ESRI’s ArcMap 10 does not have a proportional split tool in any of its toolboxes or extension licenses. Creating the tool in ESRI’s ModelBuilder is easy as long as one knows the basics of the tools and analyses to be performed. For those not familiar with it, ModelBuilder combines tools from the ArcGIS toolbox into a single tool that runs them consecutively, which saves time and prevents errors. What’s more, it can iterate over subsets of a large data set that would normally overwhelm ArcMap. ModelBuilder is not necessary for the analysis when dealing with only a small set of polygons that does not overwhelm the tools, unlike in the following example.
Example
As part of the City and County of San Francisco’s Pedestrian Safety Task Force, geographic data is being used to understand the correlation between pedestrian injury and demographic, built environment, and collision characteristics. One of the variables used in the overall analysis is the proportion of land zoned for residential, commercial, and other uses within ¼ mile of each of San Francisco’s 9,340 intersections. A proportional split tool was needed because the land use zoning shapefile was available only at the block level (each residential lot was dissolved within each block by zoning code). Because the data was available only at this coarser scale, it was necessary to proportionally account for the percentage of each block that lies within the buffered distance (¼ mile) of the intersections. If the data had been available at the residential lot level, the ‘Spatial Join’ tool would have been an easy and nearly accurate alternative. However, as you can see in Image 1, buffered intersections do not uniformly overlap the census block groups, which vary in size.
Data Sources and Tools
Data Sources: Point shapefile of San Francisco’s intersections (San Francisco Department of Public Works, 2010), Polygon shapefile of the city’s land use classifications (San Francisco Planning Department, 2011)
Tools: ‘Buffer’ (Toolbox/Analysis Tools/Proximity/Buffer), ‘Intersect’ (Toolbox/Analysis Tools/Overlay/Intersect), ‘Add Field’ (Toolbox/Data Management Tools/Fields/Add Field), ‘Calculate Areas’ (Toolbox/Spatial Statistics Tools/Utilities/Calculate Areas), ‘Calculate Field’ (Toolbox/Data Management Tools/Fields/Calculate Field), ‘Summary Statistics’ (Toolbox/Analysis Tools/Statistics/Summary Statistics), and the ‘Feature Classes’ iterator (ModelBuilder/Insert/Iterators/Feature Classes)
Methods
1. Preparing the Data: The polygon data was first simplified from 58 zoning codes into 8 discretionary general land use categories: residential, mixed-use, residential mixed-use, commercial, neighborhood-commercial, redevelopment, industrial, and public-use (see image). The polygon data was then split by the new land use category for use in 8 separate proportional split models. Next, the ‘Buffer’ tool was used to buffer the intersections by ¼ mile, and a field was added to the buffers to store their area (in square feet).
2. Build the Model:
A new toolbox was created and a new model was added to it (right click -> edit opens ModelBuilder in design view). For more information on the basics of using ModelBuilder, consult the ESRI online help. An iterator was added so that the model could process a series of shapefile or table inputs from a folder in one run, creating multiple outputs. For the zoning proportional split, the buffered intersections had to be split into nine 1,000 record shapefiles for the model to run. Tip: Make sure to read the ESRI online help about iterators. Use the naming convention “…._%Name%.ext” for each output so that it gets a unique name that includes the original “Name” of the input. Otherwise the model will return an error, because the iterator would be attempting to overwrite a file of the same name (see image above).
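To see why the %Name% placeholder matters, the following Python sketch imitates the substitution. ModelBuilder performs this substitution itself; the helper function and the input names below are hypothetical and only illustrate the effect on output names.

```python
def expand_output_name(template, input_name):
    """Replace the %Name% placeholder with the current input's name."""
    return template.replace("%Name%", input_name)


# Hypothetical names for three of the split buffer shapefiles.
inputs = ["Buffer_Part1", "Buffer_Part2", "Buffer_Part3"]

# With the placeholder, every iteration writes to its own file.
with_placeholder = [
    expand_output_name("Intersection_Zoning_Intersect_%Name%.shp", name)
    for name in inputs
]

# Without it, every iteration targets the same file name, so the
# second iteration would try to overwrite the first output.
without_placeholder = [
    expand_output_name("Intersection_Zoning_Intersect.shp", name)
    for name in inputs
]

unique_with = len(set(with_placeholder))        # one name per input
unique_without = len(set(without_placeholder))  # a single shared name
```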
3. Building the Proportional Split Model:
a. ‘Intersect’ the buffered Intersection polygon and zoning polygon shapefiles
-> Intersection_Zoning_Intersect_%Name%.shp
b. Run ‘Calculate Areas’ on Intersection_Zoning_Intersect_%Name%.shp
-> Intersection_Zoning_Area_%Name%.shp
c. Use ‘Add Field’ to add a field named ‘ZoneProp’ to Intersection_Zoning_Area_%Name%.shp
d. Use ‘Calculate Field’ to compute ‘ZoneProp’ by dividing the area of each intersected polygon by the area of the original buffered intersection. ‘ZoneProp’ is the proportion of the buffered intersection covered by that intersected sliver of the zoning category.
e. Run ‘Summary Statistics’ on Intersection_Zoning_Area_%Name%.shp, summing ‘ZoneProp’ and using the intersection ID (the unique identifier from the intersection point shapefile) as the case field.
-> Intersection_Zoning_Summary_%Name%.dbf
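The calculations in steps d and e can be sketched in plain Python as follows. This is not the ModelBuilder model itself, only an illustration of the arithmetic it performs; the function name, the intersection IDs, and the areas are hypothetical.

```python
from collections import defaultdict


def zone_proportions(slivers):
    """Sum ZoneProp per intersection ID.

    Each sliver is (intersection_id, sliver_area, buffer_area), where
    sliver_area is the area of one intersected polygon and buffer_area
    is the area of the original buffered intersection (same units).
    """
    totals = defaultdict(float)
    for intersection_id, sliver_area, buffer_area in slivers:
        zone_prop = sliver_area / buffer_area   # step d: ZoneProp
        totals[intersection_id] += zone_prop    # step e: sum by case field
    return dict(totals)


# Hypothetical slivers of one zoning category, areas in square feet.
slivers = [
    (101, 250.0, 1000.0),  # two slivers fall inside buffer 101
    (101, 125.0, 1000.0),
    (102, 500.0, 1000.0),  # one sliver falls inside buffer 102
]
summary = zone_proportions(slivers)
```

Here intersection 101 ends up with a summed proportion of 0.375 and intersection 102 with 0.5, which corresponds to one row per intersection in the summary table.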
4. Final steps: For each of the eight zoning categories being analyzed (e.g. commercial zoning), all of the summary .dbf files were merged into one summary table containing the intersection ID field and the proportion of land within ¼ mile zoned for that use. Once all of the zones had been run, the tables were joined back to a San Francisco intersection shapefile based on the intersection ID.
Please note that if all of the land use proportions are summed for each intersection, they will rarely add to 1. This is because typically about 30% of the land is roads and sidewalks rather than zoned land.
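The merge and the check on the summed proportions might look like this in plain Python. The table contents, intersection IDs, and function names are hypothetical, and only two of the eight categories are shown to keep the sketch short.

```python
def merge_summaries(summaries):
    """Combine per-category {intersection_id: proportion} tables
    into one row of category proportions per intersection."""
    merged = {}
    for category, table in summaries.items():
        for intersection_id, proportion in table.items():
            merged.setdefault(intersection_id, {})[category] = proportion
    return merged


def unzoned_share(row):
    """Share of the buffer not covered by any zoning category
    in this row (for example, roads and sidewalks)."""
    return 1.0 - sum(row.values())


# Hypothetical summary tables for two zoning categories.
summaries = {
    "commercial": {101: 0.25, 102: 0.125},
    "residential": {101: 0.5},
}
rows = merge_summaries(summaries)
remainder_101 = unzoned_share(rows[101])  # 1 - (0.25 + 0.5)
remainder_102 = unzoned_share(rows[102])  # 1 - 0.125
```

An intersection that is missing from a category's table simply has no entry for that category, which is equivalent to a proportion of zero.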
Image: the final commercial zoning proportional split model used for the analysis discussed above.
Written By:
Sarah A. Bergquist, Research Associate
Program on Health, Equity and Sustainability
San Francisco Department of Public Health
www.sfphes.org
Sarah Bergquist is a research associate with the City and County of San Francisco, Department of Public Health (SFDPH), Program on Health, Equity and Sustainability. Her background includes a B.A. in Human Geography and Planning from California State University, Chico. Her additional education includes a Geographic Information Systems (GIS) Certificate, a Rural and Town Planning Certificate, and a minor in statistics. Her current work at SFDPH includes GIS and data analysis for pedestrian injury prevention in San Francisco.







