685.648.82 - Data Science

Data Science
Summer 2026

Description

This course will cover the core concepts and skills in the interdisciplinary field of data science. These include problem identification and communication, probability, statistical inference, visualization, extract/transform/load (ETL), exploratory data analysis (EDA), linear and logistic regression, model evaluation and various machine learning algorithms such as random forests, k-means clustering, and association rules. The course recognizes that although data science uses machine learning techniques, it is not synonymous with machine learning. The course emphasizes an understanding of both data (through the use of systems theory, probability, and simulation) and algorithms (through the use of synthetic and real data sets). The guiding principles throughout are communication and reproducibility. The course is geared towards giving students direct experience in solving the programming and analytical challenges associated with data science. The assignments weight conceptual (assessments) and practical (labs, problem sets) understanding equally. Prerequisite(s): A working knowledge of Python scripting and SQL is assumed as all assignments are completed in Python.

Instructor

Andrew Stewart

andrew.stewart@jhu.edu

Course Structure

The course materials are divided into modules which can be accessed by clicking Modules on the menu in Canvas. A module will have several sections including the overview, content, readings, discussions, and assignments. You are encouraged to preview all sections of the module before starting. All modules run for a period of seven (7) days, Wednesday to Tuesday.

You should check Canvas for announcements every day. Additionally, make sure that the email on file with the Registrar and associated with your student account is either an email you check every day or is forwarded to one you check every day.

Online courses are much more challenging than in-person courses. We both know this from personal experience. Additionally, text communication lacks nuance so the opportunities for misunderstanding or misinterpretation are increased. Try to be on your best behavior and take an extra minute to think “is this really the way I want to phrase this?” before posting. Be magnanimous in your interpretations of others’ intentions. Or, as they say, never attribute to malice what can be attributed to an honest mistake.

We are a class of students and colleagues interested in learning to do data science, and we should try to help each other the best we can (although see the specific restrictions below). In that vein, use of unofficial channels outside the official Canvas and Teams channels will be considered a violation of the EP JHU Academic Misconduct Policy.

We send an Announcement at the start of a Module. You should also check the Calendar for due dates. There is also important information in the Course Outline of the Syllabus.

Course Topics

Course Goals

The goal of this course is to introduce you to the foundational topics and skills of a Data Scientist. At the end of the course, you should be able to determine if a problem is appropriate to Data Science, produce a plan for solving the problem and execute on that plan following the Data Science Process.

Course Learning Outcomes (CLOs)

Textbooks

There is no required textbook for this course because no such textbook exists. There are quite a few “popular” books on Data Science published by O’Reilly but none are suitable for a graduate level course in Data Science.

Instead, there are course notes in the form of a book entitled, Fundamentals of Data Science or Fundamentals, for short. The book is in a constant state of editing and updating and is always more current than the recorded lectures. You should focus on the readings.

Required Software

You can find the software instructions here:

https://jhep-datascience.github.io/EN.685.648.82/

Student Coursework Requirements

The suggested course load at JHU EP is one course per semester. Additionally, this is a graduate course, not an undergraduate course, and therefore the expectations are higher. Note that full-time students in graduate programs at Homewood take 2-3 courses per semester. If you are taking that many as a part-time student with a full-time job, you can easily overextend yourself.

You must make sure you have completed the respective Core Foundation courses and that you meet the technical requirements of the course (competency with Python, SQL, etc.).

There is the additional challenge that completing assignments that involve both data and programming is fairly non-deterministic. One mistake can set you back an hour or more. It is advised to work more slowly and methodically to finish faster.

This course is highly structured and there are different kinds of assignments due every week, all with different learning goals.

Lectures and Readings

Each week starts with the lectures and readings. The goal of the lectures and readings is to introduce the topics and concepts of the module and give you examples of their application using various frameworks, guidelines, and processes. Pay special attention to frameworks (you will need to be able to apply them) and examples (they are there as references). The Lectures are recorded. The readings are in the corresponding chapters of Fundamentals, both the PDF and the corresponding notebooks. The PDF/notebooks are always more up-to-date than the Lectures because the text Fundamentals is under constant revision. 

Some modules may also include supplemental readings which can enhance your understanding of the materials but which are not covered in corresponding assignments.

Assessments

Each Module will include an Assessment. The purpose of Assessments is to gauge your conceptual understanding of the topics covered. "Quizzing" also helps you retain information better. These Assessments are cumulative. They will contain 10 questions from the current Module and 5 questions from any of the previous Modules (15 total). The Assessment is time limited to 30 minutes. It will automatically submit after that time. You have only one attempt. You will be shown one question at a time with no backtracking.

The Assessment is due on Tuesday (you can always do it earlier). Assessments are individual effort. They are not group projects (yes, we have to say this).

Suggestion: Fundamentals contains review questions at the end of each chapter.  You are highly encouraged to think through them, both by yourself and in group discussion.

Labs

Each week there is a Lab that asks you to apply the methods and concepts covered in the Module. The goal of the Lab is to demonstrate the ability to do data science, or at least that part of data science covered in the Module. The Lab is simultaneously a group and individual effort, as explicitly noted in the sections below. You should actually review the Lab (and the Review questions in Fundamentals!) before you start the lectures and readings for the week so that you get a sense of where the topic is going.

The entirety of the process below factors into your Lab Participation grade. While this may seem like a lot of content, the process essentially amounts to a structured set of discussion posts.

Lab Groups Discussion

You will undoubtedly have questions about the concepts and the Lab. At the start of the semester, you will be assigned to a Lab Group of about 3-6 other students. If you have any questions about the Module materials, readings, concepts, and especially the Lab, you should ask questions in your Lab Group. You should not post a question about the Module content to the instructor that you did not already ask in your Lab Group.

Here's the basic weekly discussion schedule...

If other members of your Lab Group have questions, you should endeavor to answer them. Class participation depends on both asking and answering questions. However, you should not give away the answers to any assignment. Do not just paste code. You should understand what you’re doing and why; you should always understand what the code is doing.

The pre-submission discussion will take place on the group discussion medium selected for the semester (Canvas/Teams/Slack). We hope that this will lead to better and more immediate discussions.

Lab Notebook Submissions (Individual)

On Sunday you will submit your individual Lab notebook to Canvas as either a PDF or HTML export of your Jupyter notebook.

Labs will be assigned a pass/fail grade based on timely submission and best effort, despite not counting towards your final grade calculation.  Note that Labs are part of the learning process in this course, and it is entirely okay if there are parts that you are unable to accurately complete by the Sunday due date - it is far more important that you explain your approach the best that you can and submit your notebook on time so that you can participate in the remedial discussions than it is to submit a perfect notebook that is late.  With that said, do not leave your notebook blank either. You must complete all labs in the course.

Lab Solutions, Self-Checks and Peer Reviews

On Monday, a solution to the Lab will be released, and you may freely share and discuss your Lab with the rest of your Lab Group. You should compare your submission against the provided Solution, taking note of any errors, misconceptions or other discrepancies in your attempt.  You are encouraged to leave notes in the comments of your own submission.

For most Labs, we will use Peer Reviews in Canvas to facilitate discussion. You will be assigned submissions from two other students in the class that you can review (for your own edification) and provide feedback based on comparison with the posted Lab Solution.  For Peer Reviews, the goal isn't really to conduct a comprehensive grading, but rather to provide feedback as you would if it were a discussion board comment.  Your feedback should be substantive - don't just comment with "Good job!"

The Peer Review is due by Tuesday. Evaluations and Peer Reviews count as part of your own Lab score so if you complete a Lab and fail to participate in the feedback discussions, you get a 0 for the Lab.

Lab Reports

By the end of the module, each lab group will submit a brief Lab Report summarizing the group’s collective understanding of the lab. The purpose of the Lab Report is not to redo the lab or provide a polished write-up, but to synthesize results, reflect on methods, and surface unresolved questions to guide discussion during office hours.  Realistically, we are talking about as much content as a discussion board post or maybe 2-3 slides worth of content.  

The Lab Report is submitted by a rotating Lab Report Lead on behalf of the group and should be written directly in the Canvas submission page. Lab Groups may decide this rotation however they choose to, and are encouraged to write down a schedule at the start of the semester.

Lab Reports should follow a short IMRaD-style structure:

Lab Reports should be concise and focused. They are intended to support collective reflection and in-class discussion, not formal publication.  They are due by Wednesday's Office Hours where they will be used to guide feedback and discussion. Group members are encouraged to briefly present or discuss their report (including their individual Lab Notebooks) during office hours.

Also by Wednesday, you should have reviewed all the Lab Group discussions and any Peer Reviews to see if there's anything you may have missed in your own submission.  If you have any remaining questions or points of confusion, these are typically good topics to bring to Office Hours.

The quality of the course and learning experience depends on the quality of the group interactions, which depends on:

1. Asking and answering questions in a skillful way.
2. Posting your Lab on time.
3. Posting a substantive evaluation of your peer(s) on time.
4. Reviewing all group discussions on time.

Exams / Problem Sets

The course builds from lectures and readings, through Labs and discussion and assessments on concepts, and culminates in Exams consisting of Problem Sets. Each problem set covers the materials of a designated module. These are the real exams of the course, and are tests of your ability to do data science.

There will be a Midterm exam and a Final exam, each consisting of three problem sets.

You should always refer to the corresponding Module Labs, Lab Solutions, and Examples in Fundamentals when completing a Problem Set. As with exams, we give you less direction. This is why the Labs are so important. If, on a Problem Set, we give you a data set and say, “Do EDA”, you should be able to do it according to the framework presented in the course. Problem Sets are exams and should be treated as such. They are individual effort. They are not group projects.

Grading Policy

Assignments are due according to the dates posted on Canvas or in the Course Outline. You may check these due dates in the Course Calendar or the Assignments in the corresponding modules. We will try to post grades one week after assignment due dates but no later than two weeks.

It’s worth noting that the due date is not the do date. You can and should start much earlier in the week on your assignments.

Grading Standards


“A” – Excellent. You completed the assignment in a timely manner, demonstrating a thorough understanding of the technique, tool or concept and conducted an excellent exploration of its use. If it is a discussion, your post was substantive, did not just quote other materials, and contributed to the on-going discussion. You went above what was required, asked for or expected. Over the course of the semester, this means consistent excellence and distinction throughout the course—that is, conspicuous excellence in all aspects of assignments and discussion in every week.

“B” – Satisfactory. You completed the assignment in a timely manner, you did exactly what was requested, demonstrating a sufficient understanding of the technique, tool, or concept. There may have been minor deficiencies. If it was a discussion post, the post contributed to the discussion but it may have been a reference to other materials or perhaps even slightly off topic. You may have done too much in the hopes that something was correct. Verbosity is often a sign of some confusion. Over the course of the semester, this means work that meets all course requirements on a level appropriate for graduate academic work.

“C”, “D” – Unsatisfactory. You either did not complete the assignment, it was not timely, or you did only what was minimally required. There are significant areas of confusion and a lack of exploration or curiosity about the concept, tool or technique. If it was a discussion post, it may have been off topic. Listing many things, hoping that one is correct, is often a sign of confusion.

“F” - Oops. You did not submit the assignment on time or post on time, or no bona fide effort was evident.

We cannot stress this enough: merely working hard is not grounds for an A. You have to do the right thing in the right way. Writing too much is often a sign of confusion.

We generally do not directly grade spelling and grammar. However, egregious violations of the rules of the English language will be noted without comment. Consistently poor performance in either spelling or grammar is taken as an indication of poor written communication ability that may detract from your grade.

However, communication is very important in Data Science so we will tend to be a bit more picky about formatting, grammar and spelling. If your submissions look like a ransom note, however correct they might otherwise be, they will be counted as wrong.

Grading System

We use a classic letter grading system in this course. You must have sufficient mastery over the topics to get a B, or an A.

Here is the breakdown for the final grade calculations:


Lab Participation: 10%
Assessments: 10%
Midterm Exam: 40%
Final Exam: 40%
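
As a rough illustration of how these weights combine, here is a minimal Python sketch. The weights come from the breakdown above. The 0-100 numeric scale, the function name `weighted_score`, and the example scores are assumptions made for illustration only; this syllabus does not state how letter grades convert to numbers, so treat this as arithmetic practice and not as the official calculation.

```python
# Weights copied from the final grade breakdown; they sum to 1.0.
WEIGHTS = {
    "Lab Participation": 0.10,
    "Assessments": 0.10,
    "Midterm Exam": 0.40,
    "Final Exam": 0.40,
}


def weighted_score(scores):
    """Combine per-component scores using the syllabus weights.

    Assumption: each score is on a 0-100 scale (not stated in the syllabus).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {
    "Lab Participation": 100.0,
    "Assessments": 80.0,
    "Midterm Exam": 85.0,
    "Final Exam": 90.0,
}
print(round(weighted_score(example), 2))
```

Note that the two exams together carry 80% of the weight, so under this sketch a strong Lab Participation score cannot offset weak Problem Sets.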

A few assignments are binary (pass/fail). We merely note if you turned them in or if they had an acceptable level of effort (an incomplete Lab might be a 0, for example).

Course Policies

Grade Revision Policy

If you score a C or D on any problem set within the Midterm Exam, you may be offered the opportunity to revise your submission for a potential improvement of up to one letter grade on that problem set. Eligibility for revisions will depend on your lab and class participation regarding the relevant course module(s).

As the semester unfolds, we may find it necessary to adjust the assignments, the criteria, or both. We may award "pluses and minuses" at our discretion. When issued, these are generally used to indicate which direction the letter grade "leans".

Late Policy

We do not accept late submissions for a grade without prior consultation, except in the case of extreme emergencies (the birth of a child, incapacitating illness, etc.). The following are not legitimate reasons: work, taking other classes, weddings, family reunions, holidays, anniversaries, vacations, etc. However, emergencies of all stripes do arise. The key here is prior consultation.

The main issue is that lateness on Module 5’s assignments snowballs into Module 6’s assignments, etc. To fall behind may mean never being able to catch up. Additionally, with things like the Lab Groups, other students are counting on your participation.

Code Attribution and Citation Policy

This course is not a programming or software engineering course. Most assignment code is provided to students, and students are not evaluated on their ability to invent novel algorithms or write large amounts of original code. Instead, the emphasis is on understanding, applying, and correctly interpreting methods introduced in the course.

Accordingly, all submitted code must be clearly commented to indicate which course methods, concepts, or materials are being applied. When a solution relies on ideas from lectures, readings, labs, or provided examples, students must include brief comments citing the relevant source (for example, module name, lab number, or section title).

Any use of external references (for example, Stack Overflow, documentation sites, or blog posts) must also be explicitly cited in code comments with a link or clear reference. Such external references should be rare and used only when appropriate; students are expected to rely primarily on course materials.

Failure to provide appropriate attribution in code comments will be treated as a documentation and academic integrity issue, not merely a style concern. Clear attribution is a required component of professional data science practice and is part of what is being assessed in this course.
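
A minimal sketch of what such attribution comments might look like is shown below. The function `summarize` and the course citation are placeholders invented for illustration (replace the bracketed parts with the actual module, lab number, or section title you relied on). The exact comment format is an assumption, since this syllabus does not prescribe one.

```python
import statistics


def summarize(values):
    """Return the mean and sample standard deviation of a list of numbers."""
    # Course source (placeholder): Module <N>, Lab <N>, "<section title>".
    # Replace with the actual course material this step applies.
    mean = statistics.fmean(values)

    # External reference: Python standard library documentation for the
    # statistics module, https://docs.python.org/3/library/statistics.html
    # (statistics.stdev computes the sample standard deviation).
    spread = statistics.stdev(values)
    return mean, spread


print(summarize([2.0, 4.0, 4.0, 6.0]))
```

Each comment sits next to the line it supports, so a reader can tell which step came from course materials and which relied on an outside source.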

Artificial Intelligence (AI) Policy

The use of generative AI tools (including, but not limited to, ChatGPT, Claude, Gemini, Copilot, or similar systems) is not permitted on any course assignments. All submitted work must be the student’s own original thinking, analysis, and writing.

Students are advised that generative AI tools frequently produce outputs that are factually incorrect, logically inconsistent, or misapplied to the specific context of this course. Submissions that show evidence of AI-generated content will be treated as academic dishonesty and handled in accordance with university policy.

That said, students are

When in doubt, assume that any use of AI beyond passive learning or clarification is prohibited. Students are responsible for understanding and complying with this policy.

Academic Policies

Deadlines for Adding, Dropping, and Withdrawing from Courses

Students may add a course up to one week after the start of the term for that particular course. Students may drop courses according to the drop deadlines outlined in the EP academic calendar. Between the 6th week of the class and prior to the final withdrawal deadline, a student may withdraw from a course with a W on their academic record. A record of the course will remain on the academic record with a W appearing in the grade column to indicate that the student registered and withdrew from the course. 

Academic Misconduct Policy

All students are required to read, know, and comply with the Johns Hopkins University Krieger School of Arts and Sciences (KSAS) / Whiting School of Engineering (WSE) Procedures for Handling Allegations of Misconduct by Full-Time and Part-Time Graduate Students. This policy prohibits academic misconduct, including but not limited to the following: cheating or facilitating cheating; plagiarism; reuse of assignments; unauthorized collaboration; alteration of graded assignments; and unfair competition. Course materials (old assignments, texts, or examinations, etc.) should not be shared unless authorized by the course instructor. Any questions related to this policy should be directed to EP’s academic integrity officer at ep-academic-integrity@jhu.edu.

Students with Disabilities - Accommodations and Accessibility

Johns Hopkins University values diversity and inclusion. We are committed to providing welcoming, equitable, and accessible educational experiences for all students. Our courses are designed with a proactive approach to accessibility to minimize the need for disability disclosure and accommodation requests, but we recognize that you may need additional support. Students with disabilities (including those with psychological conditions, medical conditions, and temporary disabilities) can request accommodations for this course by providing an Accommodation Letter issued by Student Disability Services (SDS). Please request accommodations for this course as early as possible to provide time for effective communication and arrangements. For further information or to start the process of requesting accommodations, please contact EP Student Disability Services at ep-disability-svcs@jhu.edu.

Student Conduct Code

The fundamental purpose of the JHU regulation of student conduct is to promote and to protect the health, safety, welfare, property, and rights of all members of the University community as well as to promote the orderly operation of the University and to safeguard its property and facilities. As members of the University community, students accept certain responsibilities which support the educational mission and create an environment in which all students are afforded the same opportunity to succeed academically. For a full description of the code please visit the Student Conduct Code website.

Classroom Climate

JHU is committed to creating a classroom environment that values the diversity of experiences and perspectives that all students bring. Everyone has the right to be treated with dignity and respect. Fostering an inclusive climate is important. Research and experience show that students who interact with peers who are different from themselves learn new things and experience tangible educational outcomes. At no time in this learning process should someone be singled out or treated unequally on the basis of any seen or unseen part of their identity. If you have concerns in this course about harassment, discrimination, or any unequal treatment, or if you seek accommodations or resources, please reach out to the course instructor directly. Reporting will never impact your course grade. You may also share concerns with your program chair, the Assistant Dean for Diversity and Inclusion, or the Office of Institutional Equity. In handling reports, people will protect your privacy as much as possible, but faculty and staff are required to officially report information for some cases (e.g. sexual harassment).

Course Auditing

When a student enrolls in an EP course with “audit” status, the student must reach an understanding with the instructor as to what is required to earn the “audit.” If the student does not meet those expectations, the instructor must notify the EP Registration Team (EP-Registration@exchange.johnshopkins.edu) in order for the student to be retroactively dropped or withdrawn from the course (depending on when the "audit" was requested and in accordance with EP registration deadlines). All lecture content will remain accessible to auditing students, but access to all other course material is left to the discretion of the instructor.