605.788.81 - Big Data Processing Using Hadoop

Computer Science
Spring 2024


Organizations today are generating massive amounts of data that are too large and too unwieldy to fit in relational databases. Therefore, organizations and enterprises are turning to massively parallel computing solutions such as Hadoop for help. The Apache Hadoop platform, with the Hadoop Distributed File System (HDFS) and the MapReduce (M/R) framework at its core, allows for distributed processing of large data sets across clusters of computers using the map and reduce programming model. It is designed to scale up from a single server to thousands of machines, each offering local computation and storage. The Hadoop ecosystem is sizable and includes many subprojects such as Hive and Pig for big data analytics, HBase for real-time access to big data, ZooKeeper for distributed transaction process management, and Oozie for workflow. This course breaks down the walls of complexity of distributed processing of big data by providing a practical approach to developing applications on top of the Hadoop platform. By completing this course, students will gain an in-depth understanding of how MapReduce and distributed file systems work. In addition, they will be able to author Hadoop-based MapReduce applications in Java and leverage Hadoop subprojects to build powerful data processing applications.

Course Note(s): This course may be counted toward a three-course track in Data Science and Cloud Computing.
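As a rough illustration of the map and reduce programming model mentioned above, the following is a minimal single-machine sketch in plain Java. It does not use the Hadoop APIs; the class name `WordCountSketch` and its methods are hypothetical and exist only for this example. The map step emits a (word, 1) pair for each word, a grouping step collects the pairs by word, and the reduce step sums the counts for each word.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, single-machine illustration of map -> group -> reduce.
// This is NOT Hadoop code; a real job would use the Hadoop Java APIs.
final class WordCountSketch {

    // Map step: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Reduce step: sum all counts that were emitted for one word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: run the map step on every line, group values by key,
    // then run the reduce step once per key.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("big data", "Big Hadoop data");
        System.out.println(run(lines)); // prints {big=2, data=2, hadoop=1}
    }
}
```

In a cluster setting the map and reduce steps would run in parallel on many machines, with the framework handling the grouping between them; this sketch only shows the shape of the computation.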


Course Structure

The course materials are divided into modules, which can be accessed by clicking Course Modules on the course menu. A module will have several sections, including the overview, content, readings, discussions, and assignments. You are encouraged to preview all sections of the module before starting. Most modules run for a period of seven (7) days; exceptions are noted in the Course Outline. You should regularly check the Calendar and Announcements for assignment due dates.

Course Topics

Hadoop Architecture and Ecosystem
Setting up Hadoop
Hadoop Distributed File System (HDFS) Architecture
HDFS Programming
YARN and MapReduce Architecture
MapReduce Programming
Data Analysis using Hive
Data Analysis using Pig
Hadoop NoSQL Database HBase
Spark Overview
Collaboration and Big Data
Hadoop In Production
Miscellaneous Hadoop Topics

Course Goals

Students should be able to describe the Apache Hadoop architecture, including HDFS and MapReduce; apply the HDFS and MapReduce programming models using the Hadoop Java APIs; use Pig and Hive for data analysis; implement big table schemas using HBase; and build Spark applications that can be deployed on YARN.

Course Learning Outcomes (CLOs)


not required

Required Software

Oracle VirtualBox (free download)

Student Coursework Requirements

Each module will open on Wednesday and will close on Tuesday at 11:59pm ET. Students are required to participate in discussions, submit assignments, take quizzes, and complete a semester project. There will be one or two discussion topics for each module, and students are required to make an initial post and at least one reply to another student's post for each discussion topic. There will be 10 assignments and 8 quizzes given during the 14 weeks. Students will provide a project proposal during module 7 and will submit a recorded presentation at the end of module 14.

Coursework     Weighting
Discussions    11%
Assignments    50%
Quizzes        24%
Project        15%
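To make the weighting concrete, here is a small sketch of how a weighted score could be computed from the four category percentages above. The class name `WeightedScore` and the sample category scores are hypothetical, and the sketch assumes each category is scored on a 0 to 100 scale; the instructors' actual gradebook calculation may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: combines per-category scores (assumed 0-100)
// using the weights listed in the coursework table.
final class WeightedScore {

    // Weights in percent, copied from the coursework table; they sum to 100.
    static final Map<String, Integer> WEIGHTS = new LinkedHashMap<>();
    static {
        WEIGHTS.put("Discussions", 11);
        WEIGHTS.put("Assignments", 50);
        WEIGHTS.put("Quizzes", 24);
        WEIGHTS.put("Project", 15);
    }

    // Returns the weighted score on a 0-100 scale.
    static double compute(Map<String, Double> scores) {
        double total = 0.0;
        for (Map.Entry<String, Integer> weight : WEIGHTS.entrySet()) {
            Double score = scores.get(weight.getKey());
            if (score == null) {
                throw new IllegalArgumentException("Missing score for " + weight.getKey());
            }
            total += weight.getValue() * score;
        }
        return total / 100.0;
    }

    public static void main(String[] args) {
        // Made-up sample scores, for illustration only.
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("Discussions", 90.0);
        scores.put("Assignments", 85.0);
        scores.put("Quizzes", 80.0);
        scores.put("Project", 95.0);
        System.out.printf("%.2f%n", compute(scores)); // prints 85.85
    }
}
```

With these sample scores the calculation is (11 × 90 + 50 × 85 + 24 × 80 + 15 × 95) / 100 = 85.85.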

Grading Policy

Discussions will be graded on timeliness and quality. Students should make their initial post to each topic during the first half of the module week so that other students have time to read and respond. A student's post should be informative, concise, and add to the discussion.

All assignments are due according to the dates in the Course Outline. Late submissions will be reduced by 10 points for each week late (no exceptions without prior coordination with the instructors).

Each quiz must be taken by the due date. Only one attempt is allowed, so students should study the material before attempting the quiz.

I use the EP standard grading policy based upon the student's weighted score (as described above).

Score Range    Letter Grade
100-97         A+
96-93          A
92-90          A-
89-87          B+
86-83          B
82-80          B-
79-77          C+
76-73          C
72-70          D+
69-67          D
<63            F

Academic Policies

Deadlines for Adding, Dropping and Withdrawing from Courses

Students may add a course up to one week after the start of the term for that particular course. Students may drop courses according to the drop deadlines outlined in the EP academic calendar (https://ep.jhu.edu/student-services/academic-calendar/). Between the 6th week of the class and prior to the final withdrawal deadline, a student may withdraw from a course with a W on their academic record. A record of the course will remain on the academic record with a W appearing in the grade column to indicate that the student registered and withdrew from the course.

Academic Misconduct Policy

All students are required to read, know, and comply with the Johns Hopkins University Krieger School of Arts and Sciences (KSAS) / Whiting School of Engineering (WSE) Procedures for Handling Allegations of Misconduct by Full-Time and Part-Time Graduate Students.

This policy prohibits academic misconduct, including but not limited to the following: cheating or facilitating cheating; plagiarism; reuse of assignments; unauthorized collaboration; alteration of graded assignments; and unfair competition. Course materials (old assignments, texts, or examinations, etc.) should not be shared unless authorized by the course instructor. Any questions related to this policy should be directed to EP’s academic integrity officer at ep-academic-integrity@jhu.edu.

Students with Disabilities - Accommodations and Accessibility

Johns Hopkins University values diversity and inclusion. We are committed to providing welcoming, equitable, and accessible educational experiences for all students. Students with disabilities (including those with psychological conditions, medical conditions and temporary disabilities) can request accommodations for this course by providing an Accommodation Letter issued by Student Disability Services (SDS). Please request accommodations for this course as early as possible to provide time for effective communication and arrangements.

For further information or to start the process of requesting accommodations, please contact Student Disability Services at Engineering for Professionals, ep-disability-svcs@jhu.edu.

Student Conduct Code

The fundamental purpose of the JHU regulation of student conduct is to promote and to protect the health, safety, welfare, property, and rights of all members of the University community as well as to promote the orderly operation of the University and to safeguard its property and facilities. As members of the University community, students accept certain responsibilities which support the educational mission and create an environment in which all students are afforded the same opportunity to succeed academically. 

For a full description of the code please visit the following website: https://studentaffairs.jhu.edu/policies-guidelines/student-code/

Classroom Climate

JHU is committed to creating a classroom environment that values the diversity of experiences and perspectives that all students bring. Everyone has the right to be treated with dignity and respect. Fostering an inclusive climate is important. Research and experience show that students who interact with peers who are different from themselves learn new things and experience tangible educational outcomes. At no time in this learning process should someone be singled out or treated unequally on the basis of any seen or unseen part of their identity. 
If you have concerns in this course about harassment, discrimination, or any unequal treatment, or if you seek accommodations or resources, please reach out to the course instructor directly. Reporting will never impact your course grade. You may also share concerns with your program chair, the Assistant Dean for Diversity and Inclusion, or the Office of Institutional Equity. In handling reports, people will protect your privacy as much as possible, but faculty and staff are required to officially report information for some cases (e.g. sexual harassment).

Course Auditing

When a student enrolls in an EP course with “audit” status, the student must reach an understanding with the instructor as to what is required to earn the “audit.” If the student does not meet those expectations, the instructor must notify the EP Registration Team [EP-Registration@exchange.johnshopkins.edu] in order for the student to be retroactively dropped or withdrawn from the course (depending on when the "audit" was requested and in accordance with EP registration deadlines). All lecture content will remain accessible to auditing students, but access to all other course material is left to the discretion of the instructor.