CP7003 DATA ANALYSIS AND BUSINESS INTELLIGENCE
UNIT I LINEAR REGRESSION
Introduction to data analysis – Statistical processes – statistical models – statistical inference – review of random variables and probability distributions – linear regression – one predictor – multiple predictors – prediction and validation – linear transformations – centering and standardizing – correlation – logarithmic transformations – other transformations – building regression models – fitting a series of regressions
UNIT II LOGISTIC AND GENERALIZED LINEAR MODELS
Logistic regression – logistic regression coefficients – latent-data formulation – building a logistic regression model – logistic regression with interactions – evaluating, checking, and comparing fitted logistic regressions – identifiability and separation – Poisson regression – logistic-binomial model – probit regression – multinomial regression – robust regression using the t model – building complex generalized linear models – constructive choice models
UNIT III SIMULATION AND CAUSAL INFERENCE
Simulation of probability models – summarizing linear regressions – simulation of non-linear predictions – predictive simulation for generalized linear models – fake-data simulation – simulating and comparing to actual data – predictive simulation to check the fit of a time-series model – causal inference – randomized experiments – observational studies – causal inference using advanced models – matching – instrumental variables
UNIT IV MULTILEVEL REGRESSION
Multilevel structures – clustered data – multilevel linear models – partial pooling – group-level predictors – model building and statistical significance – varying intercepts and slopes – scaled inverse-Wishart distribution – non-nested models – multilevel logistic regression – multilevel generalized linear models
UNIT V DATA COLLECTION AND MODEL UNDERSTANDING
Design of data collection – classical power calculations – multilevel power calculations – power calculation using fake-data simulation – understanding and summarizing fitted models – uncertainty and variability – variances – R² and explained variance – multiple comparisons and statistical significance – analysis of variance – ANOVA and multilevel linear and generalized linear models – missing-data imputation
REFERENCES:
1. Andrew Gelman and Jennifer Hill, "Data Analysis Using Regression and Multilevel/Hierarchical Models", Cambridge University Press, 2006.
2. Philipp K. Janert, "Data Analysis with Open Source Tools", O'Reilly, 2010.
3. Wes McKinney, "Python for Data Analysis", O'Reilly, 2012.
4. Devinderjit Sivia and John Skilling, "Data Analysis: A Bayesian Tutorial", Second Edition, Oxford University Press, 2006.
5. Robert Nisbet, John Elder, and Gary Miner, "Handbook of Statistical Analysis and Data Mining Applications", Academic Press, 2009.
6. Michael Minelli, Michelle Chambers, and Ambiga Dhiraj, "Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Businesses", Wiley, 2013.
7. John Maindonald and W. John Braun, "Data Analysis and Graphics Using R: An Example-Based Approach", Third Edition, Cambridge University Press, 2010.
8. David Ruppert, "Statistics and Data Analysis for Financial Engineering", Springer, 2011.