Sunday, November 15, 2015

CP7003 DATA ANALYSIS AND BUSINESS INTELLIGENCE

UNIT I            LINEAR REGRESSION

Introduction to data analysis – Statistical processes – statistical models – statistical inference – review of random variables and probability distributions – linear regression – one predictor – multiple predictors – prediction and validation – linear transformations – centering and standardizing – correlation – logarithmic transformations – other transformations – building regression models – fitting a series of regressions    

UNIT II        LOGISTIC AND GENERALIZED LINEAR MODELS 

Logistic regression – logistic regression coefficients – latent-data formulation – building a logistic regression model – logistic regression with interactions – evaluating, checking, and comparing fitted logistic regressions – identifiability and separation – Poisson regression – logistic-binomial model – Probit regression – multinomial regression – robust regression using t model – building complex generalized linear models – constructive choice models

UNIT III      SIMULATION AND CAUSAL INFERENCE

Simulation of probability models – summarizing linear regressions – simulation of non-linear predictions – predictive simulation for generalized linear models – fake-data simulation – simulating and comparing to actual data – predictive simulation to check the fit of a time-series model – causal inference – randomized experiments – observational studies – causal inference using advanced models – matching – instrumental variables

UNIT IV       MULTILEVEL REGRESSION

Multilevel structures – clustered data – multilevel linear models – partial pooling – group-level predictors – model building and statistical significance – varying intercepts and slopes – scaled inverse-Wishart distribution – non-nested models – multilevel logistic regression – multilevel generalized linear models

UNIT V       DATA COLLECTION AND MODEL UNDERSTANDING

Design of data collection – classical power calculations – multilevel power calculations – power calculation using fake-data simulation – understanding and summarizing fitted models – uncertainty and variability – variances – R² and explained variance – multiple comparisons and statistical significance – analysis of variance – ANOVA and multilevel linear and generalized linear models – missing data imputation

REFERENCES: 

1. Andrew Gelman and Jennifer Hill, "Data Analysis Using Regression and Multilevel/Hierarchical Models", Cambridge University Press, 2006.
2. Philipp K. Janert, "Data Analysis with Open Source Tools", O'Reilly, 2010.
3. Wes McKinney, "Python for Data Analysis", O'Reilly, 2012.
4. Davinderjit Sivia and John Skilling, "Data Analysis: A Bayesian Tutorial", Second Edition, Oxford University Press, 2006.  
5. Robert Nisbet, John Elder, and Gary Miner, "Handbook of Statistical Analysis and Data Mining Applications", Academic Press, 2009.
6. Michael Minelli, Michelle Chambers, and Ambiga Dhiraj, "Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Businesses", Wiley, 2013. 
7. John Maindonald and W. John Braun, "Data Analysis and Graphics Using R: An Example-Based Approach", Third Edition, Cambridge University Press, 2010.
8. David Ruppert, "Statistics and Data Analysis for Financial Engineering", Springer, 2011. 

