Modeling Document Counts in Indian Trust Archive Box Searches
Edward Mulrow, NORC at the University of Chicago 
Santanu Pramanik, NORC at the University of Chicago 

Keywords: zero-inflated model, mixture model, archive search

Zero-inflated models, which are a mixture of separate data generation processes, can be effective for modeling count data that contain an excess of zeros and are overdispersed. We found this type of model to be useful in helping the Department of the Interior (DOI) Office of Historical Trust Accounting (OHTA) consider alternative ways to proceed with a court-ordered search of over 34,000 archive boxes that might contain Indian Trust documents related to a litigating tribe. Data from an initial search were used to fit a set of zero-inflated models. Details related to fitting these models and criteria for choosing the most appropriate model are provided. Based on the selected model, we provide point and interval estimates for the number of documents likely to be found in the remaining unsearched boxes and for the number of boxes containing no documents. We also recommend a revised box search order that could enable OHTA to find most of the remaining documents in a shorter amount of time.
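
To make the mixture idea concrete, the following is a minimal sketch of a zero-inflated Poisson, one common member of the zero-inflated family. It is not necessarily the model selected in this work: the abstract does not name the count distribution or any covariates. The function names and the parameters `pi` (probability that a box is a structural zero) and `lam` (Poisson mean for the remaining boxes) are assumptions introduced for illustration only.

```python
import math

def zip_pmf(k, pi, lam):
    """P(Y = k) for a zero-inflated Poisson (illustrative, hypothetical helper).

    With probability pi the count is a structural zero; otherwise it is
    drawn from a Poisson distribution with mean lam.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros come from both mixture components.
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson

def expected_summary(n_boxes, pi, lam):
    """Expected total documents and expected empty boxes among n_boxes boxes,
    assuming every box shares the same pi and lam (a simplification)."""
    expected_documents = n_boxes * (1 - pi) * lam
    expected_empty_boxes = n_boxes * zip_pmf(0, pi, lam)
    return expected_documents, expected_empty_boxes
```

Under these assumptions, point estimates of the kind described above follow from plugging fitted values of `pi` and `lam` into `expected_summary`; interval estimates would additionally need to reflect uncertainty in the fitted parameters, which this sketch does not attempt.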