Analyzing large data sets (“Big Data”) is an increasingly important skill. One of the disciplines being relied upon for such analysis is machine learning. In this course, we will approach machine learning from a practitioner’s perspective. We will examine the issues that impact our ability to learn good models (e.g., inductive bias, the curse of dimensionality, the bias-variance dilemma, and no free lunch). We will then examine a variety of approaches to learning models, covering the spectrum from unsupervised to supervised learning, as well as parametric versus non-parametric methods. Students will explore and implement several learning algorithms, including logistic regression, nearest neighbor, decision trees, and feed-forward neural networks, and will incorporate strategies for addressing the issues impacting performance (e.g., regularization, clustering, and dimensionality reduction). In addition, students will engage in online discussions, focusing on the key questions in developing learning systems. At the end of this course, students will be able to implement and apply a variety of machine learning methods to real-world problems, as well as assess the performance of these algorithms on different types of data sets. Prerequisite(s): EN.605.202 – Data Structures or equivalent.
Details on the course structure can be found in the Course Outline. Each course module runs for a period of seven (7) days, i.e., one week. Due dates for readings and other assignments are referred to by the day of the module week in which they are due. For example, if a reading assignment is to be completed by Day 3 and the module started on Monday, then the reading assignment should be completed by Wednesday, the 3rd day of the module. Please refer to the Course Outline for the specific start and end dates for each module in this course.
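The day-numbering convention above amounts to simple date arithmetic: Day 1 is the module's start date, so Day N falls N − 1 days later. The sketch below illustrates this; the function name `module_due_date` and the start date are hypothetical, and actual dates always come from the Course Outline.

```python
from datetime import date, timedelta


def module_due_date(module_start: date, day: int) -> date:
    """Return the calendar date of 'Day N' of a seven-day module week.

    Assumes Day 1 is the module's start date, so Day N is N - 1 days later.
    """
    if not 1 <= day <= 7:
        raise ValueError("a module week has Days 1 through 7")
    return module_start + timedelta(days=day - 1)


# The example from the text: a module that starts on a Monday.
start = date(2024, 9, 2)  # hypothetical start date; 2 Sep 2024 is a Monday
due = module_due_date(start, 3)
print(due, due.strftime("%A"))  # 2024-09-04 Wednesday
```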
IMPORTANT: The course design has recently been updated to provide more opportunities for students to manage their time. In particular, all programming assignments will be allotted three weeks for completion; however, you will be required to submit your code by the end of the second week. The third week is designed to be used for creating your video and writing your paper. That said, if you need to update your code, you may do so; however, whether you update or not, resubmit your code at the end of the third week.
To develop a broad understanding of the issues in developing and implementing machine learning algorithms and systems, especially as they relate to modern, data-intensive problems.
Required
Alpaydin, E. (2020). Introduction to machine learning (4th ed.). Cambridge, MA: The MIT Press. ISBN 9780262043793.
Prior editions of this book are strongly discouraged and may be used only at your own risk.
Mitchell, T. M. (1997). Machine learning. New York, NY: McGraw-Hill. ISBN-10: 0070428077; ISBN-13: 9780070428072.
Bishop, C. M. (2006). Pattern recognition and machine learning. New York, NY: Springer. ISBN-10: 0387310738; ISBN-13: 9780387310732.
Speakers/audio output (or headsets) and microphones are required for this course. A webcam is required for use during office hours.
Course Expectations
This is a very high-workload course, designed as a graduate-level computer science course. It is also designed to prepare students who want a more technical, research-oriented emphasis in their degree programs. All students, and especially those in programs other than the computer science MS or postgraduate certificate program, should therefore be aware of the following expectations.
The programming assignments are designed to give you experience implementing key machine learning algorithms from scratch. You will implement the algorithms and then test their performance on several data sets from the UCI Machine Learning Repository. For these assignments, you are required to submit source code, short videos that demonstrate proper functioning of the code, and a brief paper describing the results of your experiments.
You may use any of the following high-level programming languages: Java, Python, or C#. However, you are not permitted to use available machine learning libraries such as the Matlab toolboxes, WEKA, RapidMiner, scikit-learn, TensorFlow, etc. You are also not permitted to seek out and use code from online code repositories that appears similar or identical to the programs assigned in this course. You may not use the following languages to implement the algorithms, but you may use them to support analysis of the results: SQL, Matlab, Maple, Mathematica, and R. All algorithms must be implemented from scratch by you. Basic libraries for managing data structures and fundamental math operations (e.g., NumPy or Pandas) are permitted.
To facilitate the grading of programming assignments, please adhere to the following:
Your report should be prepared with a word processor or LaTeX. We recommend using Overleaf, but this is not required. Use the equation and pseudocode editing capabilities of whichever tool you use. As this is a graduate-level course, you are expected to follow appropriate conventions for incorporating mathematics, pseudocode, and figures into the paper. Make sure you submit a PDF of your report; MS Word, OpenOffice/LibreOffice, and LaTeX source files will not be accepted. Submit your report separately, i.e., do NOT include it in your zip file.
Your report must include the following: [1]
[1] A sample report is provided in Canvas. Note that the sample report specifies requirements for a more formal report than what is specified here. You are only required to satisfy the requirements above.
Participation Grading Criteria
Active student participation is an essential part of any online course. Therefore, part of the student's grade (30%) will be based on class participation. There are two components to the class participation grade: muddy posts/responses and small group discussions.
As a reminder, the use of generative AI (e.g., Gemini or ChatGPT) is not permitted when participating in any of the discussion exercises. The point is for you to express what you know, not what an online AI tool knows.
CAUTION: Be advised that you will be assigned to two different groups – one for muddy point exercises and one for small group assignments. Members of these groups are not the same, and either or both may change as the semester progresses.
Muddy Point/Response
At various points in the class, students will be required to post a "Muddiest Point" message to the associated muddy point discussion forum. Specifically, the student shall post a comment that identifies a part of the module that was particularly confusing and therefore needs clarification. This must be done by Day 3 of the week when the discussion takes place but can cover anything since the previous muddy point exercise. Note, however, that only one point/question is to be posted. Students will be paired with one or two other students (called muddy buddies), and one of the partners will post a clarifying response as a reply to the muddy point within two days of the initial posting (i.e., by Day 5). This response must constitute a serious, substantive attempt to answer the question posed in the muddy point and will be graded accordingly. Simply referring to an external website (e.g., Wikipedia) is not sufficient. The responder must demonstrate that they have attempted to gain a solid understanding of the answer. Thus, students will be evaluated based on timeliness in posting the Muddiest Point as well as their ability to provide clarifying responses to their partner’s Muddiest Point. Later, the instructor will add additional clarifying information if necessary.
Types of questions that are not accepted for muddy points include the following:
For grading, each muddy response will be scored based on timeliness, completeness, and correctness (1: on time, complete, and correct; 0.5: late, incorrect, or answers a question not asked; 0: unacceptable). Incomplete, incorrect, or deficient answers will be graded using a 0.6–0.9 multiplier; however, once the instructor responds with an answer, no credit will be provided for any subsequent responses. Each muddy post will also receive one of three scores (1: on time and substantive; 0.5: late or not substantive; 0: no post or unacceptable). The response is weighted 50% more than the initial post: each week’s muddy score is the post score plus 1.5 times the response score, with the total divided by 2.5 to obtain a percentage. This score (out of 100) is what will be posted in Canvas. All of the muddy scores are averaged for the final muddy point grade. As mentioned above, a muddy point may contain one, and only one, target question to be addressed. If multiple questions are posted, a penalty will be applied to the muddy point part of the grade. Furthermore, the muddy buddy responding may then choose which question to answer and is not required to address all of them.
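Written as a formula, the weekly score is S = 100 × (p + 1.5r) / 2.5, where p is the post score and r is the response score, both between 0 and 1. The sketch below is an illustration of that arithmetic only, not the Canvas implementation; the function names are hypothetical, and it assumes any 0.6–0.9 multiplier has already been folded into the response score passed in.

```python
def muddy_week_score(post: float, response: float) -> float:
    """Weekly muddy point score on a 0-100 scale.

    post and response are 0-1 scores (e.g., 1, 0.5, or 0). The response
    counts 1.5 times as much as the post, so the best raw total is 2.5.
    """
    for name, value in (("post", post), ("response", response)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score must be between 0 and 1")
    return 100.0 * (post + 1.5 * response) / 2.5


def final_muddy_grade(weekly_scores: list[float]) -> float:
    """Average of all weekly muddy scores (each already out of 100)."""
    if not weekly_scores:
        raise ValueError("no muddy point exercises recorded")
    return sum(weekly_scores) / len(weekly_scores)


weeks = [muddy_week_score(1, 1), muddy_week_score(1, 0.5), muddy_week_score(0.5, 1)]
print(weeks)  # [100.0, 70.0, 80.0]
print(final_muddy_grade(weeks))
```

Because the response carries more weight, a late post with a strong response (80) scores higher than an on-time post with a weak response (70).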
When there are groups of three people, a “round robin” response policy is enforced. This means that everyone needs to be the primary responder to one other person in the group. The specific order of response (i.e., who responds to whom) will be indicated during each muddy point exercise.
As a reminder, generative AI tools such as ChatGPT or Gemini are not permitted to be used when answering a muddy post. The response must be written entirely by the person responding to the muddy point.
The muddy point/response part of the course is worth 15% of the final grade.
Small Group Discussions
In addition to the muddy point exercises, at various points in the class, an open discussion question will be posted in the main discussion forums. Each student will be grouped with two or three other students and assigned to a "Group" within Canvas. During the week, the group is to engage in an ongoing discussion on the question posted. Each student is required to post substantively at least five times, and these posts must occur on at least three different days. Non-substantive posts (e.g., "I agree," or "I need to think about that more") will not be counted. Substantive posts must address either the original prompt or a response to it. Note that a single response should not address the entire prompt.
During each discussion, each substantive post will receive 1 point (up to 5 points total), and each day posted will receive 1 point (up to 3 points total). Thus, full credit will be 8 points. The score posted in Canvas will be a percentage (a score out of 100) based on this 8-point total.
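The point caps above can be checked with a short sketch. The function name is hypothetical, and it assumes `days_posted` counts the distinct days on which the student made a substantive post.

```python
def discussion_score(substantive_posts: int, days_posted: int) -> float:
    """Small group discussion score as a percentage of the 8-point total.

    Each substantive post earns 1 point (capped at 5), and each distinct
    day with a post earns 1 point (capped at 3).
    """
    if substantive_posts < 0 or days_posted < 0:
        raise ValueError("counts cannot be negative")
    points = min(substantive_posts, 5) + min(days_posted, 3)
    return 100.0 * points / 8


print(discussion_score(5, 3))  # 100.0
print(discussion_score(4, 2))  # 75.0
print(discussion_score(7, 1))  # 75.0
```

The last case shows why spreading posts across days matters: extra posts beyond five earn nothing, so seven posts on a single day score the same as four posts on two days.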
As a reminder, generative AI tools such as ChatGPT or Gemini are not permitted to be used when posting in the small group discussion. The response must be written entirely by the person making the contribution.
The small group discussion part of the course is worth 15% of the final grade.
Short Quiz Requirements
At various points during the course, students will be required to complete a short, 15-question objective-style quiz. The point of the quiz is to provide a "formative assessment" through which both the student and the instructor can gain a sense of how well students are learning the material. Because the quizzes are formative, they account for only 10% of the final grade.
Each student will have 30 minutes to complete each quiz, and two attempts will be permitted. All 15 questions will be objective-style (multiple choice or true-false). No mechanism will be employed to detect whether a student is using outside resources (e.g., the book, notes, or the web) to take the quiz; however, students are asked to take the quiz without any such resources. This is the best way for the instructor to gauge each student's level of knowledge.
Remember to answer all questions on both attempts. The system is set up to take your latest attempt, not your best attempt. It also does not average attempts.
The quizzes are designed to display all 15 questions at once. Answers will be provided once the student submits their answers. Although the quiz time is set to 30 minutes, it is possible to take longer; even so, students should strive to complete the quiz within the 30-minute timeframe.
Quiz feedback will be released automatically after the due date passes.
Grading will be based on programming assignments, muddy point discussions, small group discussions, and short quizzes. Final grades will be determined by the following weighting:
| Item | % of Grade |
| --- | --- |
| Muddy Post Discussions (4) | 15% |
| Small Group Discussions (4) | 15% |
| Short Quizzes (4) | 10% |
| Programming Projects (4) | 60% |

| Score Range | Grade |
| --- | --- |
| [93,100] | A |
| [90,93) | A- |
| [87,90) | B+ |
| [83,87) | B |
| [80,83) | B- |
| [70,80) | C |
| [0,70) | F |
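The two tables combine as a weighted average followed by a lookup against half-open score ranges. The sketch below illustrates that arithmetic under stated assumptions: each category score is already the average of that category's items on a 0–100 scale, and no rounding is applied before the letter lookup (the syllabus does not say either way). The names `WEIGHTS`, `CUTOFFS`, `final_score`, and `letter_grade` are hypothetical.

```python
# Category weights from the grading table, in percent (they sum to 100).
WEIGHTS = {"muddy": 15, "small_group": 15, "quizzes": 10, "projects": 60}

# Lower bound of each letter grade, highest first; anything below 70 is F.
CUTOFFS = [(93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"), (70, "C")]


def final_score(category_scores: dict[str, float]) -> float:
    """Weighted course score (0-100) from per-category averages (each 0-100)."""
    if set(category_scores) != set(WEIGHTS):
        raise ValueError(f"expected scores for {sorted(WEIGHTS)}")
    return sum(WEIGHTS[k] * category_scores[k] for k in WEIGHTS) / 100


def letter_grade(score: float) -> str:
    """Map a 0-100 score to a letter using the ranges in the grade table.

    Assumes no rounding: 92.99 falls in [90,93) and earns an A-.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower, letter in CUTOFFS:
        if score >= lower:
            return letter
    return "F"


scores = {"muddy": 90, "small_group": 80, "quizzes": 70, "projects": 95}
total = final_score(scores)
print(total, letter_grade(total))  # 89.5 B+
```

Keeping the weights as integer percentages means integer category scores produce an exact total, so boundary cases such as 90.0 are not disturbed by floating-point error.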
Late Submission Policy
Because we are all working professionals and time management is critically important, the purpose of this policy statement is to lay out the course policy with respect to completing course assignments.
The policy of this course is that no late submissions will be accepted.
Note that I recognize exceptional circumstances may arise, and I am willing to work with students when they do. Therefore, the following additional requirements are put in place:
Questions about this policy should be directed to the instructor.
Deadlines for Adding, Dropping, and Withdrawing from Courses
Students may add a course up to one week after the start of the term for that particular course. Students may drop courses according to the drop deadlines outlined in the EP academic calendar. Between the 6th week of the class and prior to the final withdrawal deadline, a student may withdraw from a course with a W on their academic record. A record of the course will remain on the academic record with a W appearing in the grade column to indicate that the student registered and withdrew from the course.
Academic Misconduct Policy
Students with Disabilities - Accommodations and Accessibility
Student Conduct Code
Classroom Climate
JHU is committed to creating a classroom environment that values the diversity of experiences and perspectives that all students bring. Everyone has the right to be treated with dignity and respect. Fostering an inclusive climate is important. Research and experience show that students who interact with peers who are different from themselves learn new things and experience tangible educational outcomes. At no time in this learning process should someone be singled out or treated unequally on the basis of any seen or unseen part of their identity. If you have concerns in this course about harassment, discrimination, or any unequal treatment, or if you seek accommodations or resources, please reach out to the course instructor directly. Reporting will never impact your course grade. You may also share concerns with your program chair, the Assistant Dean for Diversity and Inclusion, or the Office of Institutional Equity. In handling reports, those involved will protect your privacy as much as possible, but faculty and staff are required to officially report information in some cases (e.g., sexual harassment).
Course Auditing
When a student enrolls in an EP course with “audit” status, the student must reach an understanding with the instructor as to what is required to earn the “audit.” If the student does not meet those expectations, the instructor must notify the EP Registration Team (EP-Registration@exchange.johnshopkins.edu) in order for the student to be retroactively dropped or withdrawn from the course (depending on when the "audit" was requested and in accordance with EP registration deadlines). All lecture content will remain accessible to auditing students, but access to all other course material is left to the discretion of the instructor.