Friday, February 21
CS07 Big Data in the Real World
Fri, Feb 21, 11:00 AM - 12:30 PM
Working with Complex Sizeable (i.e. Gigabyte) Data on a PC: A Case Study (302718)
*Pete Michael Sherick, Lubrizol Corporation
Keywords: large data, applied statistics, statistical engineering, SAS, R, Lubrizol, regression
At the grandest scale, machine learning, data mining, and cloud/distributed computing techniques are changing the way we gather and process information. This is the Big Data revolution that is transforming marketing, health care, banking, industry, manufacturing, government, and countless other sectors. On a smaller scale, as the computational power and storage capacity of personal computers have increased over the past decade, so has the potential for ever-larger statistical analyses. However, analyzing even a few gigabytes of data raises a number of obstacles to extracting meaningful and usable information. This talk will feature an example analysis of kinematic viscosity results for over 100,000 automotive lubricant formulations encompassing more than 10,000 components. I will discuss our current process, including retrieving, formatting, cleaning, and analyzing the data, and ultimately uploading the resulting model into user-accessible applications. I will describe issues that statisticians may run into with data of this size and suggest possible solutions. The SAS and R software packages will be the primary focus, and their respective benefits and limitations will be discussed.