Friday, December 11, 2015

CP7019 MANAGING BIG DATA

CP7019         MANAGING BIG DATA

UNIT I            UNDERSTANDING BIG DATA

What is big data – why big data – convergence of key trends – unstructured data – industry examples of big data – web analytics – big data and marketing – fraud and big data – risk and big data – credit risk management – big data and algorithmic trading – big data and healthcare – big data in medicine – advertising and big data – big data technologies – introduction to Hadoop – open source technologies – cloud and big data – mobile business intelligence – Crowd sourcing analytics – inter and trans firewall analytics 
  
UNIT II         NOSQL DATA MANAGEMENT

Introduction to NoSQL – aggregate data models – aggregates – key-value and document data models – relationships – graph databases – schemaless databases – materialized views – distribution models – sharding – master-slave replication – peer-peer replication – sharding and replication – consistency – relaxing consistency – version stamps – map-reduce – partitioning and combining – composing map-reduce calculations   

UNIT III         BASICS OF HADOOP

Data format – analyzing data with Hadoop – scaling out – Hadoop streaming – Hadoop pipes – design of Hadoop distributed file system (HDFS) – HDFS concepts – Java interface – data flow – Hadoop I/O – data integrity – compression – serialization – Avro – file-based data structures   

UNIT IV       MAPREDUCE APPLICATIONS 

 MapReduce workflows – unit tests with MRUnit – test data and local tests – anatomy of MapReduce job run – classic Map-reduce – YARN – failures in classic Map-reduce and YARN – job scheduling – shuffle and sort – task execution – MapReduce types – input formats – output formats 
  
UNIT V        HADOOP RELATED TOOLS

Hbase – data model and implementations – Hbase clients – Hbase examples – praxis.Cassandra – cassandra data model – cassandra examples – cassandra clients – Hadoop integration. Pig – Grunt – pig data model – Pig Latin – developing and testing Pig Latin scripts. Hive – data types and file formats – HiveQL data definition – HiveQL data manipulation – HiveQL queries. 

REFERENCES: 

1. Michael Minelli, Michelle Chambers, and Ambiga Dhiraj, "Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Businesses", Wiley, 2013. 
2. P. J. Sadalage and M. Fowler, "NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence", Addison-Wesley Professional, 2012. 
3. Tom White, "Hadoop: The Definitive Guide", Third Edition, O'Reilley, 2012. 
4. Eric Sammer, "Hadoop Operations", O'Reilley, 2012. 
5. E. Capriolo, D. Wampler, and J. Rutherglen, "Programming Hive", O'Reilley, 2012. 
6. Lars George, "HBase: The Definitive Guide", O'Reilley, 2011. 
7. Eben Hewitt, "Cassandra: The Definitive Guide", O'Reilley, 2010. 8. Alan Gates, "Programming Pig", O'Reilley, 2011.

No comments:

Post a Comment