BIG DATA ANALYTICS

Department: AI&DS, Class: B.Tech, Semester: 6, Year: 3


HADOOP

Experiments : 5, 6


BDA LAB: Syllabus

UNIT I

Introduction to Big Data: Types of Digital Data, Classification of Digital Data, Characteristics of Data, Evolution of Big Data, Definition of Big Data, Challenges with Big Data, What is Big Data?, Other Characteristics of Data Which are not Definitional Traits of Big Data, Why Big Data?, Analyzing Data with Unix Tools, Analyzing Data with Hadoop, Hadoop Streaming, Hadoop Ecosystem.

UNIT II

Hadoop Distributed File System: The Design of HDFS, HDFS Concepts, Command Line Interface, Hadoop File System Interfaces, Dataflow, Data Ingestion with Sqoop and Hadoop Archives, Hadoop I/O: Compression, Serialization, Avro and File-Based Data Structures.

UNIT III

Map Reduce Technique: How Map Reduce Works, Anatomy of a Map Reduce Job Run, Failures, Job Scheduling, Shuffle and Sort, Task Execution, Map Reduce Types and Formats, Map Reduce Features.

UNIT IV

Structured Data Processing Tools: Hive: Installation, Running Hive, HiveQL, Tables, Querying Data, User-Defined Functions. Sqoop: Introduction, Generating Code, Database Import, Working with Imported Data, Importing Large Objects, Performing an Export.

UNIT V

Semi-structured and Unstructured Data Processing Tools: Pig: Introduction to Pig, Execution Modes of Pig, Comparison of Pig with Databases, Grunt, Pig Latin, User-Defined Functions, Data Processing Operators. HBase: Basics, Concepts, Clients, Example, HBase Versus RDBMS.

TEXT BOOKS:

1. Tom White, "Hadoop: The Definitive Guide", Third Edition, O'Reilly Media, 2012.

2. Seema Acharya, Subhashini Chellappan, "Big Data and Analytics", 2nd ed., Wiley, 2015.


REFERENCE BOOKS:

1. Michael Berthold, David J. Hand, "Intelligent Data Analysis", Springer, 2007.

2. Jay Liebowitz, "Big Data and Business Analytics", Auerbach Publications, CRC Press (2013).

3. Tom Plunkett, Mark Hornick, "Using R to Unlock the Value of Big Data: Big Data Analytics with Oracle R Enterprise and Oracle R Connector for Hadoop", McGraw-Hill/Osborne Media (2013), Oracle Press.

4. Anand Rajaraman and Jeffrey David Ullman, "Mining of Massive Datasets", Cambridge University Press, 2012.

BDA: Syllabus

Pre-requisite: Knowledge of one programming language (preferably Java), practice with SQL (queries and subqueries), and exposure to the Linux environment.

Course Educational Objective: The objective of the course is to provide practical, foundation-level training that enables immediate and effective participation in Big Data and other analytics projects using Hadoop, and in data visualization using Tableau.

Course Outcomes (CO): At the end of this course, the student will be able to:


CO1: Demonstrate the installation of Big Data analytic tools. (Understand–L2)

CO2: Apply data modeling techniques to large datasets. (Apply–L3)

CO3: Conduct exploratory data analysis using visualization. (Understand–L2)

CO4: Improve individual and teamwork skills, communication and report-writing skills, with ethical values.


List of Experiments

1. Refreshing Linux Commands and Installation of Hadoop.

2. Implementation of a basic Word Count Map Reduce program.

3. Implementation of Matrix Multiplication with Hadoop Map Reduce.

4. Implementation of Weather Mining on a weather dataset using Map Reduce.

5. Installation of Hive along with practice examples.

6. Installation of Sqoop along with practice examples.

7. Downloading and installing Tableau; understanding how to import data and how to save, open, and share workbooks.

8. Data Preparation with Tableau.

9. Charts: Bar Charts, Legends, Filters, and Hierarchies, Step Charts, Line Charts.

10. Maps: Symbol Maps, Filled Maps, Density Maps, Maps with Pie Charts.

11. Interactive Dashboards.

TEXTBOOKS

1. Seema Acharya, Subhashini Chellappan, "Big Data Analytics", Wiley, 2015.

2. Alexander Loth, "Visual Analytics with Tableau", ISBN: 978-1-119-56020-3, Wiley, 2019.

REFERENCES

1. Tom White, "Hadoop: The Definitive Guide", Third Edition, O'Reilly Media, 2012.

2. Michael Berthold, David J. Hand, "Intelligent Data Analysis", Springer, 2007.

3. Jay Liebowitz, "Big Data and Business Analytics", Auerbach Publications, CRC Press (2013).

4. Anand Rajaraman and Jeffrey David Ullman, "Mining of Massive Datasets",