685.652.8VL - Data Engineering Principles and Practice

Data Science
Summer 2024

Description

Data Engineering is the ingestion, transformation, storage and serving of data in ways that enable data scientists or applications to use and derive insights from data. In this course, we will look at various file-based data formats, data collection, data cleansing, data transformation, and data modeling for both relational and NoSQL databases. The course will also cover movement of data into data warehouses and/or data lakes using pipelines and workflow automation. Finally, we will discuss data security, governance, and compliance. The format of this course will be a mix of lectures, hands-on demos, and labs. Upon completing this course, students will have a deeper understanding of what a data engineer does and the various technologies that make up data engineering, along with hands-on experience working with various tools and processes.

Instructors

Eli Schultz

schultzelib@gmail.com

James Mosko

Course Structure

The course materials are divided into modules, which can be accessed by clicking Modules on the course menu. Each module has several sections, including the overview, content, readings, discussions, and assignments. You are encouraged to preview all sections of a module before starting it. Most modules run for seven (7) days; exceptions are noted in the Course Outline. You should regularly check the Calendar and Announcements for assignment due dates.

Course Topics

  1. Data Engineering Overview
  2. Data and File Formats
  3. Data Processing using Unix Tools
  4. RDBMS and Data Modeling
  5. RDBMS Select, Data Definition, Manipulation and Control
  6. Data Processing using Python Pandas
  7. Workflow Management with Cron and Airflow
  8. Web Scraping and RESTful Services
  9. Blob Storage and NoSQL Databases
  10. Streaming and Containerization
  11. Data Security, Data Privacy and Data Governance / Data Lake and Data Warehouse
  12. Cloud and Big Data

Course Goals

To identify and describe the responsibilities and tools of data engineering, and then apply that knowledge to demonstrate the skills learned across the different steps in the data engineering lifecycle.

Textbooks

Not required.

Required Software

Docker
You will need to install Docker Desktop. Information on how to install and configure it will be provided in Module 1. Please do not install it now; wait until then.

pgAdmin4
You will need to install pgAdmin4. Information on how to install and configure it will be provided in Module 1. Please do not install it now; wait until then.

Student Coursework Requirements

This course will consist of four basic student requirements:

Requirement 1: Participation (Class Attendance & Discussions) (10% of Final Grade Calculation)


Class attendance is mandatory. Occasionally, health, family or personal matters may interfere with your ability to attend class. In this situation, you are expected to notify your professors as soon as you’re able about missing class and discuss how to make up missed class time or assignments.

In the Discussions area of the course, you can interact with your instructors and classmates to explore questions and comments related to the course content. Discussions always close on Tuesday at 11:59 PM of that week.

 
Requirement 2: Assignments (39% of Final Grade Calculation)

There will be 12 assignments over the 12-week term. Assignment details are listed in the assignment section of the respective modules. All assignments are due at the end of each module, Tuesday at 11:59 PM, according to the dates in the Course Outline. Late submissions will be reduced by one letter grade for each day late (no exceptions without prior coordination with the instructors).

Requirement 3: Quizzes (39% of Final Grade Calculation)

There will be 12 quizzes over the 12-week term. Quizzes may combine true/false, multiple-choice, fill-in-the-blank, and similar question types. Check the Course Outline for the quiz due dates. All quizzes are due at the end of each module, Tuesday at 11:59 PM.


Requirement 4: Class Project (12% of Final Grade Calculation)

A class project will be assigned in the fourth module and more details will be given in that module.

Grading Policy

Score Range = Letter Grade
100-97 = A+
96-93 = A
92-90 = A−
89-87 = B+
86-83 = B
82-80 = B−
79-77 = C+
76-73 = C
72-70 = C−
69-67 = D+
66-63 = D
<63 = F
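
As an illustration only, the requirement weights (10%, 39%, 39%, 12%) and the letter scale above can be combined as in the following Python sketch. The function and variable names are hypothetical, and the sketch assumes each component is first reduced to a single 0-100 score and that a fractional score falls into the band whose lower bound it meets. The syllabus does not specify how scores within a component are averaged or how fractional scores are rounded, so treat this as a rough estimate rather than the instructors' official calculation.

```python
# Weights taken from the four coursework requirements (fractions of the final grade).
WEIGHTS = {
    "participation": 0.10,
    "assignments": 0.39,
    "quizzes": 0.39,
    "project": 0.12,
}

# Lower bound of each letter band from the grading table, highest first.
# Assumption: a fractional score such as 92.5 belongs to the band whose
# lower bound it meets (here A-); the syllabus does not state a rounding rule.
LETTER_BANDS = [
    (97, "A+"), (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"),
]


def final_score(component_scores):
    """Return the weighted final score (0-100) from per-component scores (0-100)."""
    return sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)


def letter_grade(score):
    """Map a numeric score to a letter grade; anything below 63 is an F."""
    for lower_bound, letter in LETTER_BANDS:
        if score >= lower_bound:
            return letter
    return "F"


if __name__ == "__main__":
    # Hypothetical component scores, for illustration only.
    example = {"participation": 100, "assignments": 90, "quizzes": 85, "project": 95}
    score = final_score(example)
    print(f"{score:.2f}", letter_grade(score))
```

With the hypothetical scores shown, the weighted total is about 89.65, which falls in the B+ band under these assumptions.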

Course Policies

All discussions, assignments, and quizzes are due at the end of each module. Late assignments will lose 10% per day.

Students are expected to submit all deliverables to receive a grade for the course. Any deliverable that is not submitted will receive a grade of 0 for that activity, be it a quiz, discussion, assignment, or project.

Academic Policies

Deadlines for Adding, Dropping and Withdrawing from Courses

Students may add a course up to one week after the start of the term for that particular course. Students may drop courses according to the drop deadlines outlined in the EP academic calendar (https://ep.jhu.edu/student-services/academic-calendar/). Between the 6th week of the class and prior to the final withdrawal deadline, a student may withdraw from a course with a W on their academic record. A record of the course will remain on the academic record with a W appearing in the grade column to indicate that the student registered and withdrew from the course.

Academic Misconduct Policy

All students are required to read, know, and comply with the Johns Hopkins University Krieger School of Arts and Sciences (KSAS) / Whiting School of Engineering (WSE) Procedures for Handling Allegations of Misconduct by Full-Time and Part-Time Graduate Students.

This policy prohibits academic misconduct, including but not limited to the following: cheating or facilitating cheating; plagiarism; reuse of assignments; unauthorized collaboration; alteration of graded assignments; and unfair competition. Course materials (old assignments, texts, or examinations, etc.) should not be shared unless authorized by the course instructor. Any questions related to this policy should be directed to EP’s academic integrity officer at ep-academic-integrity@jhu.edu.

Students with Disabilities - Accommodations and Accessibility

Johns Hopkins University values diversity and inclusion. We are committed to providing welcoming, equitable, and accessible educational experiences for all students. Students with disabilities (including those with psychological conditions, medical conditions and temporary disabilities) can request accommodations for this course by providing an Accommodation Letter issued by Student Disability Services (SDS). Please request accommodations for this course as early as possible to provide time for effective communication and arrangements.

For further information or to start the process of requesting accommodations, please contact Student Disability Services at Engineering for Professionals, ep-disability-svcs@jhu.edu.

Student Conduct Code

The fundamental purpose of the JHU regulation of student conduct is to promote and to protect the health, safety, welfare, property, and rights of all members of the University community as well as to promote the orderly operation of the University and to safeguard its property and facilities. As members of the University community, students accept certain responsibilities which support the educational mission and create an environment in which all students are afforded the same opportunity to succeed academically. 

For a full description of the code please visit the following website: https://studentaffairs.jhu.edu/policies-guidelines/student-code/

Classroom Climate

JHU is committed to creating a classroom environment that values the diversity of experiences and perspectives that all students bring. Everyone has the right to be treated with dignity and respect. Fostering an inclusive climate is important. Research and experience show that students who interact with peers who are different from themselves learn new things and experience tangible educational outcomes. At no time in this learning process should someone be singled out or treated unequally on the basis of any seen or unseen part of their identity. 
 
If you have concerns in this course about harassment, discrimination, or any unequal treatment, or if you seek accommodations or resources, please reach out to the course instructor directly. Reporting will never impact your course grade. You may also share concerns with your program chair, the Assistant Dean for Diversity and Inclusion, or the Office of Institutional Equity. In handling reports, people will protect your privacy as much as possible, but faculty and staff are required to officially report information for some cases (e.g. sexual harassment).

Course Auditing

When a student enrolls in an EP course with “audit” status, the student must reach an understanding with the instructor as to what is required to earn the “audit.” If the student does not meet those expectations, the instructor must notify the EP Registration Team [EP-Registration@exchange.johnshopkins.edu] in order for the student to be retroactively dropped or withdrawn from the course (depending on when the "audit" was requested and in accordance with EP registration deadlines). All lecture content will remain accessible to auditing students, but access to all other course material is left to the discretion of the instructor.