Saturday, February 22
CS19 Problems of Size and Variety
Sat, Feb 22, 9:00 AM - 10:30 AM
LP Comoment Multivariate Mixed Data Modeling and Application to BigData (302698)
*Subhadeep (Deep) Mukhopadhyay, Temple University, Fox Business School
Keywords: BigData, Mixed Data Problem, LP Comoment, specially designed orthonormal score functions
What are the unique challenges of BigData mining? The ‘3 V’s’: Variety, Velocity and Volume. In the past decade, tremendous progress has been made on the latter two, producing tools such as NoSQL, Hadoop and MapReduce. The problem of ‘variety’ is finding relations and patterns in large pools of complex, heterogeneous data sets that can be used for better business value, decision making, customer service or risk analytics. In this article, we propose a solution to the problem of ‘variety’: the Mixed Data Problem. Our algorithm can tackle and process different types of data simultaneously to extract knowledge; we call it the United Statistical Algorithm for BigData Analysis. Our proposed statistical solution to the Multivariate Mixed Data Problem is based on a modern tool, the LP Comoment. A fundamental role is played by our specially designed (data-adaptive) orthonormal polynomials, the LP score functions, which provide a parsimonious representation of complex data. We also develop a novel correlation measure, called ‘LPINFOR’, for finding nonlinear patterns in large data sets in a completely nonparametric and robust way; it can be interpreted from an information-theoretic point of view.
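The abstract does not spell out how the LP score functions, the LP Comoment or LPINFOR are computed. The following is only a minimal sketch under stated assumptions, not the author's implementation. It assumes that the empirical LP score functions T_1..T_m are obtained by orthonormalizing (Gram-Schmidt, done here via a QR decomposition) the powers of the standardized mid-distribution transform F_mid(x) = F(x) - p(x)/2. It also assumes that the LP Comoment matrix collects the cross-moments E[T_j(X) T_k(Y)] and that LPINFOR is the sum of squared LP comoments. The function names, the default order m = 4 and the use of numpy are illustrative choices. Because the mid-distribution splits ties at their midpoint, the same code accepts binary, discrete and continuous variables, which is the "mixed data" point of the abstract.

```python
import numpy as np


def mid_distribution(x):
    """Empirical mid-distribution F_mid(x) = F(x) - 0.5 * p(x).

    Ties are split at the midpoint, so discrete (including binary) and
    continuous variables go through the same transform.
    """
    x = np.asarray(x, dtype=float)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    p = counts / len(x)   # empirical pmf at each distinct value
    F = np.cumsum(p)      # empirical CDF at each distinct value
    return (F - 0.5 * p)[inverse]


def lp_score_functions(x, m=4):
    """Empirical LP score functions T_1..T_m evaluated at the sample points.

    Assumed construction: orthonormalize 1, u, u^2, ..., u^m under the
    empirical measure, where u is the standardized mid-distribution transform.
    """
    x = np.asarray(x, dtype=float)
    k = len(np.unique(x))
    if k < 2:
        raise ValueError("a constant variable has no LP score functions")
    m = min(m, k - 1)     # k distinct values support at most k - 1 scores
    u = mid_distribution(x)
    u = (u - u.mean()) / u.std()
    V = np.vander(u, m + 1, increasing=True)   # columns 1, u, ..., u^m
    Q, R = np.linalg.qr(V)                     # Gram-Schmidt via QR
    # Drop the constant column, rescale to unit variance under the empirical
    # measure, and flip signs so each T_j has a positive leading coefficient.
    return Q[:, 1:] * np.sqrt(len(u)) * np.sign(np.diag(R)[1:])


def lp_comoment(x, y, m=4):
    """LP comoment matrix: entry [j, k] is the sample mean of T_{j+1}(X) T_{k+1}(Y)."""
    tx, ty = lp_score_functions(x, m), lp_score_functions(y, m)
    return tx.T @ ty / len(tx)


def lpinfor(x, y, m=4):
    """Assumed LPINFOR statistic: sum of squared LP comoments."""
    return float(np.sum(lp_comoment(x, y, m) ** 2))


# Nonlinear, non-monotone dependence: y = x^2 on a symmetric integer grid.
x = np.arange(-100, 101)
y = x ** 2
lp = lp_comoment(x, y)
print(round(float(lp[0, 0]), 6))   # rank (Spearman) correlation, zero by symmetry
print(round(lpinfor(x, y), 3))
```

In this sketch the first score T_1 is just the standardized mid-rank, so lp[0, 0] is the (tie-corrected) Spearman correlation. For the U-shaped example it vanishes by symmetry. LPINFOR stays large because the higher-order entry lp[1, 0] (the quadratic score of X against the first score of Y) picks up the dependence that a single rank correlation misses.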