Online Program

Saturday, February 22
CS19 Problems of Size and Variety Sat, Feb 22, 9:00 AM - 10:30 AM
Bayshore VI

LP Comoment Multivariate Mixed Data Modeling and Application to Big Data (302698)

*Subhadeep (Deep) Mukhopadhyay, Temple University, Fox Business School 

Keywords: Big Data, Mixed Data Problem, LP Comoment, specially designed orthonormal score functions

What are the unique challenges of Big Data mining? The "3 Vs": variety, velocity, and volume. In the past decade, tremendous progress has been made in the last two categories, producing tools such as NoSQL, Hadoop, and MapReduce. The problem of variety is that of finding relations and patterns in a large pool of complex, heterogeneous data sets that can be used for better business value, decision-making, customer service, or risk analytics. In this article, we propose a solution to the problem of variety, namely the Mixed Data Problem. Our algorithm can process different types of data simultaneously to extract knowledge. We call it the United Statistical Algorithm for BigData Analysis. Our proposed statistical solution to the multivariate mixed data problem is based on a modern tool, LP Comoment. A fundamental role is played by our specially designed (data-adaptive) orthonormal polynomials, the LP Score functions, which give a parsimonious representation of complex data. We develop a novel correlation measure called LPINFOR for finding nonlinear patterns in large data sets in a completely nonparametric and robust way; it can be interpreted from an information-theoretic point of view.
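
The abstract does not spell out how the score functions or the correlation measure are computed. As a rough illustration only, the Python sketch below shows one plausible reading: data-adaptive orthonormal scores obtained by orthonormalizing powers of the mid-rank transform of each variable, comoments taken as cross-products of those scores, and the dependence measure taken as the sum of squared comoments. The construction itself, the function names (mid_rank, lp_scores, lp_comoments, lpinfor), and the default order m=4 are all assumptions made for this example and are not taken from the abstract.

```python
import numpy as np

def mid_rank(x):
    """Empirical mid-distribution transform: F(x) - 0.5 * p(x)."""
    x = np.asarray(x)
    _, inv, counts = np.unique(x, return_inverse=True, return_counts=True)
    p = counts / x.size          # empirical probability of each distinct value
    F = np.cumsum(p)             # empirical CDF at each distinct value
    return (F - 0.5 * p)[inv]

def lp_scores(x, m=4):
    """Assumed construction: orthonormalize powers of the centered mid-ranks.

    Returns an (n, k) array whose columns have sample mean 0, sample
    variance 1, and are mutually uncorrelated. k is capped at the number
    of distinct values minus one, so a binary variable gets one score.
    """
    x = np.asarray(x)
    u = mid_rank(x)
    u = u - u.mean()
    k = min(m, np.unique(x).size - 1)
    powers = np.column_stack([u ** j for j in range(1, k + 1)])
    powers = powers - powers.mean(axis=0)
    q, r = np.linalg.qr(powers)
    q = q * np.sign(np.diag(r))  # fix signs to match Gram-Schmidt
    return q * np.sqrt(x.size)

def lp_comoments(x, y, m=4):
    """Matrix of sample cross-moments between the scores of x and of y."""
    tx, ty = lp_scores(x, m), lp_scores(y, m)
    return tx.T @ ty / tx.shape[0]

def lpinfor(x, y, m=4):
    """Assumed dependence measure: sum of squared comoments."""
    return float(np.sum(lp_comoments(x, y, m) ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=500)
group = (x ** 2 > 1).astype(int)     # binary variable, nonlinear in x
print(lp_comoments(x, group).shape)  # continuous vs. binary: a 4 x 1 matrix
print(lpinfor(x, group))
```

Because the scores depend on each variable only through its ranks, the same code accepts continuous, discrete, or binary inputs, which is the sense in which this sketch handles mixed data.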