This course will cover the core concepts and skills in the interdisciplinary field of data science. These include problem identification and communication, probability, statistical inference, visualization, extract/transform/load (ETL), exploratory data analysis (EDA), linear and logistic regression, model evaluation and various machine learning algorithms such as random forests, k-means clustering, and association rules. The course recognizes that although data science uses machine learning techniques, it is not synonymous with machine learning. The course emphasizes an understanding of both data (through the use of systems theory, probability, and simulation) and algorithms (through the use of synthetic and real data sets). The guiding principles throughout are communication and reproducibility. The course is geared towards giving students direct experience in solving the programming and analytical challenges associated with data science. The assignments weight conceptual (assessments) and practical (labs, problem sets) understanding equally. Prerequisite(s): A working knowledge of Python scripting and SQL is assumed as all assignments are completed in Python.
The course materials are divided into modules which can be accessed by clicking Modules on the menu in Canvas. A module will have several sections including the overview, content, readings, discussions, and assignments. You are encouraged to preview all sections of the module before starting. All modules run for a period of seven (7) days, Wednesday to Tuesday.
You should check Canvas for announcements every day. Additionally, make sure that the email on file with the Registrar and associated with your student account is either an email you check every day or is forwarded to one.
Online courses are much more challenging than in-person courses. We both know this from personal experience. Additionally, text communication lacks nuance, so the opportunities for misunderstanding or misinterpretation are increased. Try to be on your best behavior, and take an extra minute to think “is this really the way I want to phrase this?” before posting. Be magnanimous in interpreting others' intentions. Or, as they say, never attribute to malice what can be attributed to an honest mistake.
We are a class of students and colleagues interested in learning to do data science, and we should try to help each other as best we can (although see the specific restrictions below). In that vein, use of unofficial channels outside the official Canvas and Teams channels will be considered a violation of the EP JHU Academic Misconduct Policy.
We send an Announcement at the start of a Module. You should also check the Calendar for due dates. There is also important information in the Course Outline of the Syllabus.
The goal of this course is to introduce you to the foundational topics and skills of a Data Scientist. At the end of the course, you should be able to determine if a problem is appropriate to Data Science, produce a plan for solving the problem and execute on that plan following the Data Science Process.
There is no required textbook for this course because no such textbook exists. There are quite a few “popular” books on Data Science published by O’Reilly but none are suitable for a graduate level course in Data Science.
Instead, there are course notes in the form of a book entitled, Fundamentals of Data Science or Fundamentals, for short. The book is in a constant state of editing and updating and is always more current than the recorded lectures. You should focus on the readings.
You can find the software instructions here:
The suggested course load at JHU EP is one course per semester. Additionally, this is a graduate course, not an undergraduate course, and therefore the expectations are higher. Note that full-time students in graduate programs at Homewood take 2-3 courses per semester. If you are taking that many as a part-time student with a full-time job, you can easily overextend yourself.
You must make sure you have completed the respective Core Foundation courses and that you meet the technical requirements of the course (competency with Python, SQL, etc).
There is the additional challenge that completing assignments that involve both data and programming is fairly non-deterministic. One mistake can set you back an hour or more. It is advised to work more slowly and methodically to finish faster.
This course is highly structured and there are different kinds of assignments due every week, all with different learning goals.
Lectures and Readings
Each week starts with the lectures and readings. The goal of the lectures and readings is to introduce the topics and concepts of the module and give you examples of their application using various frameworks, guidelines, and processes. Pay special attention to frameworks (you will need to be able to apply them) and examples (they are there as references). The Lectures are recorded. The readings are in the corresponding chapters of Fundamentals, both the PDF and the corresponding notebooks. The PDF/notebooks are always more up-to-date than the Lectures because the text Fundamentals is under constant revision. Some modules may also include supplemental readings which can enhance your understanding of the materials but which are not covered in corresponding assignments.
Assessments
Each Module will include an Assessment. The purpose of Assessments is to gauge your conceptual understanding of the topics covered. "Quizzing" also helps you retain information better. These Assessments are cumulative. They will contain 10 questions from the current Module and 5 questions from any of the previous Modules (15 total). The Assessment is time limited to 30 minutes. It will automatically submit after that time. You have only one attempt. You will be shown one question at a time with no backtracking.
The Assessment is due on Tuesday (you can always do it earlier). Assessments are individual effort. They are not group projects (yes, we have to say this).
Exceptions... Fall/Spring: the first Assessment has 10 questions and each is worth 1.5 points. Summer: There may be two assessments during a week with combined modules.
Suggestions... Fundamentals contains review questions at the end of each chapter. You are highly encouraged to think through them, both by yourself and in group discussion.
Each week there is a Lab that asks you to apply the methods and concepts covered in the Module. The goal of the Lab is to demonstrate the ability to do data science, or at least that part of data science covered in the Module. The Lab is to be individual effort, unless explicitly noted otherwise. You should actually review the Lab before you start the lectures and readings for the week so that you get a sense of where the topic is going (and the Review questions in Fundamentals!).
Labs will be assigned a letter grade, despite not counting towards your final grade calculation. The purpose is to provide you with frequent feedback on what constitutes an A, B, C, etc when it comes to the Exams. You must complete all labs in the course.
You will undoubtedly have questions about the concepts and the Lab. At the start of the semester, you will be assigned to a Lab Group of about 3-6 other students. If you have any questions about the Module materials, readings, concepts, and especially the Lab, you should ask questions in your Lab Group. You should not post a question about the Module content to the instructor that you did not already ask in your Lab Group.
Here's the basic weekly discussion schedule...
* Discussions should start as students work through the material at the start of the new Module week.
* Labs are due in Canvas by Sunday.
* Lab solutions are released Monday, at which point you may openly discuss with your Lab Group.
* If Peer Reviews in Canvas are required for a lab, they will be due by end of the module.
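Since modules run Wednesday to Tuesday, the weekly deadlines described in this syllabus can be laid out from a module's start date. The following is only a sketch in Python: the function name `module_schedule` and the example start date are hypothetical, and the dates posted in Canvas always take precedence.

```python
from datetime import date, timedelta


def module_schedule(module_start: date) -> dict[str, date]:
    """Map each weekly milestone to a calendar date for one module."""
    # Monday is 0 in Python's weekday numbering, so Wednesday is 2.
    if module_start.weekday() != 2:
        raise ValueError("modules are assumed to start on a Wednesday")
    return {
        "module opens": module_start,
        "lab due": module_start + timedelta(days=4),  # Sunday
        "lab solutions released": module_start + timedelta(days=5),  # Monday
        "assessment and peer reviews due": module_start + timedelta(days=6),  # Tuesday
    }


# Hypothetical start date, chosen only because it falls on a Wednesday.
for label, day in module_schedule(date(2025, 1, 22)).items():
    print(f"{day:%a %Y-%m-%d}  {label}")
```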
If other members of your Lab Group have questions, you should endeavor to answer them. Class participation depends on both asking and answering questions. However, you should not give away the answers to any assignment. Do not just paste code. You should understand what you’re doing and why; you should always understand what the code is doing.
The pre-submission discussion will take place on the group discussion medium selected for the semester (Canvas/Teams/Slack). We hope that this will lead to better and more immediate discussions.
On Monday, you may share and discuss your Lab with the rest of your Lab Group. For some Labs, we may use Peer Reviews in Canvas to facilitate discussion. For Peer Reviews, the goal isn't really to assign a grade but rather to provide feedback based on the provided Lab Solution (if applicable). The Peer Review is due by Tuesday. Evaluations count as part of your own Lab score so if you complete a Lab and fail to complete an Evaluation, you get a 0 for the assignment.
On Wednesday, you should have reviewed all the Lab Group discussions and any Peer Reviews to see if there's anything you may have missed in your own submission. If you have any remaining questions or points of confusion, these are typically good topics to bring to Office Hours.
The quality of the course and learning experience depends on the quality of the group interactions, which depends on:
1. Asking and answering questions in a skillful way.
2. Posting your Lab on time.
3. Posting a substantive evaluation of your peer(s) on time.
4. Reviewing all group discussions on time.
Exams / Problem Sets
The course builds from lectures and readings, through Labs and discussion and assessments on concepts, and culminates in Exams consisting of Problem Sets. Each problem set covers the materials of a designated module. These are the real exams of the course, and are tests of your ability to do data science.
There will be a Midterm exam and a Final exam, each consisting of three problem sets.
You should always refer to the corresponding Module Labs, Lab Solutions, and Examples in Fundamentals when completing a Problem Set. As with exams, we give you less direction. This is why the Labs are so important. If, on a Problem Set, we give you a data set and say, “Do EDA”, you should be able to do it according to the framework presented in the course. Problem Sets are exams and should be treated as such. They are individual effort. They are not group projects.
Assignments are due according to the dates posted on Canvas or in the Course Outline. You may check these due dates in the Course Calendar or the Assignments in the corresponding modules. We will try to post grades within one week of the assignment due date, and no later than two weeks after it.
It’s worth noting that the due date is not the do date. You can and should start much earlier in the week on your assignments.
Grading Standards
“A” – Excellent. You completed the assignment in a timely manner, demonstrating a thorough understanding of the technique, tool or concept and conducted an excellent exploration of its use. If it is a discussion, your post was substantive, did not just quote other materials, and contributed to the on-going discussion. You went above what was required, asked for or expected. Over the course of the semester, this means consistent excellence and distinction throughout the course—that is, conspicuous excellence in all aspects of assignments and discussion in every week.
“B” – Satisfactory. You completed the assignment in a timely manner, you did exactly what was requested, demonstrating a sufficient understanding of the technique, tool, or concept. There may have been minor deficiencies. If it was a discussion post, the post contributed to the discussion but it may have been a reference to other materials or perhaps even slightly off topic. You may have done too much in the hopes that something was correct. Verbosity is often a sign of some confusion. Over the course of the semester, this means work that meets all course requirements on a level appropriate for graduate academic work.
“C”, “D” – Unsatisfactory. You either did not complete the assignment, it was not timely, or you did only what was minimally required. There are significant areas of confusion. There is a lack of exploration or curiosity about the concept, tool, or technique. If it was a discussion post, it may have been off topic. Listing many things, hoping that one is correct, is often a sign of confusion.
“F” – Oops. You did not submit the assignment or post on time, or no bona fide effort was evident.
We cannot stress this enough: merely working hard is not grounds for an A. You have to do the right thing in the right way. Writing too much is as often a sign of confusion as writing too little.
We generally do not directly grade spelling and grammar. However, egregious violations of the rules of the English language will be noted without comment. Consistently poor performance in either spelling or grammar is taken as an indication of poor written communication ability that may detract from your grade.
However, communication is very important in Data Science so we will tend to be a bit more picky about formatting, grammar and spelling. If your submissions look like a ransom note, however correct they might otherwise be, they will be counted as wrong.
Grading System
We use a classic letter grading system in this course. You must have sufficient mastery over the topics to get a B, or an A.
Here is the breakdown for the final grade calculations:
Labs | |
Assessments | 20% |
Midterm | 40% |
Final | 40% |
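As an illustration of how these weights combine, here is a minimal Python sketch. It assumes each component is scored on a 0-100 scale, which this syllabus does not state, and it uses only the three weights listed in the breakdown (no weight is listed for Labs). The function name `final_score` and the example scores are hypothetical, and no letter-grade thresholds are implied.

```python
# Weights taken from the breakdown above; Labs have no listed weight.
WEIGHTS = {"assessments": 0.20, "midterm": 0.40, "final": 0.40}


def final_score(scores: dict[str, float]) -> float:
    """Weighted average of component scores (assumed to be on a 0-100 scale)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


# Hypothetical scores, for illustration only.
example = {"assessments": 90.0, "midterm": 80.0, "final": 85.0}
print(round(final_score(example), 2))
```

Because the Midterm and Final together carry 80% of the weight, a change on either exam moves the result twice as much as the same change in the Assessment average.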
If you score a C or lower on the Midterm, you may be asked to revise your submission for a potential improvement of up to one letter grade. Eligibility for revisions will depend on your lab and class participation regarding the relevant course module(s).
As the semester unfolds, we may find it necessary to adjust the assignments, the criteria, or both. We may award "pluses and minuses" at our discretion. We may, at our discretion, change the thresholds down.
Late Policy
We do not accept late submissions for a grade without prior consultation, except in the case of extreme emergencies (the birth of a child, incapacitating illness, etc). The following are not legitimate reasons: work, taking other classes, weddings, family reunions, holidays, anniversaries, vacations, etc. However, emergencies of all stripes do arise. The key here is prior consultation.
The main issue is that lateness on Module 5’s programming assignment snowballs into Module 6’s programming assignment. To fall behind may mean never being able to catch up. Additionally, with things like the Lab Groups, other students are counting on your participation.
Deadlines for Adding, Dropping and Withdrawing from Courses
Students may add a course up to one week after the start of the term for that particular course. Students may drop courses according to the drop deadlines outlined in the EP academic calendar (https://ep.jhu.edu/student-services/academic-calendar/). Between the 6th week of the class and prior to the final withdrawal deadline, a student may withdraw from a course with a W on their academic record. A record of the course will remain on the academic record with a W appearing in the grade column to indicate that the student registered and withdrew from the course.
Academic Misconduct Policy
All students are required to read, know, and comply with the Johns Hopkins University Krieger School of Arts and Sciences (KSAS) / Whiting School of Engineering (WSE) Procedures for Handling Allegations of Misconduct by Full-Time and Part-Time Graduate Students. This policy prohibits academic misconduct, including but not limited to the following: cheating or facilitating cheating; plagiarism; reuse of assignments; unauthorized collaboration; alteration of graded assignments; and unfair competition. Course materials (old assignments, texts, or examinations, etc.) should not be shared unless authorized by the course instructor. Any questions related to this policy should be directed to EP’s academic integrity officer at ep-academic-integrity@jhu.edu.
Students with Disabilities - Accommodations and Accessibility
Johns Hopkins University is committed to providing welcoming, equitable, and accessible educational experiences for all students. If disability accommodations are needed for this course, students should request accommodations through Student Disability Services (SDS) as early as possible to provide time for effective communication and arrangements. For further information about this process, please refer to the SDS Website.
Student Conduct Code
The fundamental purpose of the JHU regulation of student conduct is to promote and to protect the health, safety, welfare, property, and rights of all members of the University community as well as to promote the orderly operation of the University and to safeguard its property and facilities. As members of the University community, students accept certain responsibilities which support the educational mission and create an environment in which all students are afforded the same opportunity to succeed academically. For a full description of the code please visit the following website: https://studentaffairs.jhu.edu/policies-guidelines/student-code/
Classroom Climate
JHU is committed to creating a classroom environment that values the diversity of experiences and perspectives that all students bring. Everyone has the right to be treated with dignity and respect. Fostering an inclusive climate is important. Research and experience show that students who interact with peers who are different from themselves learn new things and experience tangible educational outcomes. At no time in this learning process should someone be singled out or treated unequally on the basis of any seen or unseen part of their identity. If you have concerns in this course about harassment, discrimination, or any unequal treatment, or if you seek accommodations or resources, please reach out to the course instructor directly. Reporting will never impact your course grade. You may also share concerns with your program chair, the Assistant Dean for Diversity and Inclusion, or the Office of Institutional Equity. In handling reports, people will protect your privacy as much as possible, but faculty and staff are required to officially report information for some cases (e.g. sexual harassment).
Course Auditing
When a student enrolls in an EP course with “audit” status, the student must reach an understanding with the instructor as to what is required to earn the “audit.” If the student does not meet those expectations, the instructor must notify the EP Registration Team [EP-Registration@exchange.johnshopkins.edu] in order for the student to be retroactively dropped or withdrawn from the course (depending on when the "audit" was requested and in accordance with EP registration deadlines). All lecture content will remain accessible to auditing students, but access to all other course material is left to the discretion of the instructor.