Statistical Challenges in Analyzing Government R&D Project Data for the Effective Budget Compilation
Yongjung Kim, KISTEP
Heunggwon Lee, KISTEP
*Hyunsook Lee, KISTEP
Keywords: Government R&D project data, budget compilation, multivariate data analysis, benchmarking, machine learning, econometrics
There are about forty thousand government-financed R&D projects in Korea. For a decade, KISTEP has collected diverse input and output information on these projects to help the government oversee their fast-growing number. The attributes of this data set range from project budget size to the number of patents or papers, and variables are added and modified annually. Analyzing this complex, evolving, multidimensional data has been a key element of efficient budget compilation for the following year. However, the analysis strategy has been confined to exploring the data variable by variable, or to simple exploratory data analysis on selected individual variables. Rigorous statistical inference and data mining techniques have rarely been applied to this huge data set. We began such attempts in 2011, and results from applying methods from benchmarking, machine learning, and econometrics have been provided to government officials for the 2012 budget compilation. In this presentation, we will show these preliminary results from the standpoint of statistical methodology.