Analyzing large data sets (“Big Data”) is an increasingly important skill. One of the disciplines being relied upon for such analysis is machine learning. In this course, we will approach machine learning from a practitioner’s perspective. We will examine the issues that impact our ability to learn good models (e.g., inductive bias, the curse of dimensionality, the bias-variance dilemma, and no free lunch). We will then examine a variety of approaches to learning models, covering the spectrum from unsupervised to supervised learning, as well as parametric versus non-parametric methods. Students will explore and implement several learning algorithms, including logistic regression, nearest neighbor, decision trees, and feed-forward neural networks, and will incorporate strategies for addressing the issues impacting performance (e.g., regularization, clustering, and dimensionality reduction). In addition, students will engage in online discussions, focusing on the key questions in developing learning systems. At the end of this course, students will be able to implement and apply a variety of machine learning methods to real-world problems, as well as be able to assess the performance of these algorithms on different types of data sets. Prerequisite(s): EN.605.202 – Data Structures or equivalent.
Details on the course structure can be found in the Course Outline. Each course module runs for a period of seven (7) days, i.e., one week. Due dates for readings and other assignments are referred to by the day of the module week in which they are due. For example, if a reading assignment is to be completed by Day 3 and the module started on Monday, then the reading assignment should be completed by Wednesday or the 3rd day of the module. Please refer to the Course Outline for the specific start and end dates for each module in this course.
IMPORTANT: The course design has recently been updated to provide more opportunities for students to manage their time. In particular, all programming assignments will be allotted four weeks for completion; however, there will be a one-week overlap between programming assignment #1 and programming assignment #2. There will also be a one-week overlap between programming assignment #3 and programming assignment #4. It is up to the student to decide how to use that overlapped time, whether to finish the prior assignment or to get a start on the next assignment.
To develop a broad understanding of the issues in developing and implementing machine learning algorithms and systems, especially as they relate to modern, data-intensive problems.
Speakers/audio output (or headsets), webcam, and microphones are required for this course.
This is a very high-workload course and is designed as a graduate-level computer science course. It has also been designed to prepare students interested in a more technical/research-based experience in the design of their degree programs. All students, but especially those in programs other than the computer science MS or postgraduate certificate program, should therefore be aware of the following expectations.
The programming assignments are designed to give you experience implementing key machine learning algorithms from scratch. You will implement the algorithms and then test their performance on several data sets from the UCI Machine Learning Repository. For these assignments, you are required to submit source code, short videos that demonstrate proper functioning of the code, and a brief paper describing the results of your experiments.
You may use any of the following higher-order programming languages: Java, Python, or C#. However, you are not permitted to use available machine learning libraries such as Matlab toolboxes, WEKA, RapidMiner, scikit-learn, or TensorFlow. You are not permitted to seek out or use any code found in online code repositories that appears to be similar to or the same as the programs assigned in this course. You are not permitted to use the following languages for implementing the algorithms, but you may use them to support analysis of the results: SQL, Matlab, Maple, Mathematica, R. All algorithms must be implemented from scratch by you. Basic libraries for managing data structures and fundamental math operations (e.g., NumPy or Pandas) are permitted.
To facilitate the grading of programming assignments, please adhere to the following:
Your report must include the following:
Active student participation is an essential part of any online course. Therefore, part of the student's grade (30%) will be based on class participation. There are two components to the class participation grade – muddy posts/responses and small group discussions.
CAUTION: Be advised that you will be assigned to two different groups – one for muddy point exercises and one for small group assignments. Members of these groups are not the same, and either or both may change as the semester progresses.
At various points in the class, students will be required to post a "Muddiest Point" message to the associated muddy point discussion forum. Specifically, the student shall post a comment that identifies a part of the module that was particularly confusing, thus needing clarification. This must be done by Day 3 of the week when the discussion takes place but can cover anything since the previous muddy point exercise. Note, however, that only one point/question is to be posted. Students will be paired with one or two other students (called muddy buddies), and one of the partners will post a clarifying response as a reply to the muddy point within two days of the initial posting (i.e., by Day 5). This response must constitute a serious, substantive attempt to answer the question posed in the muddy point and will be graded accordingly. Simply referring to an external website (e.g., Wikipedia) is not sufficient. The responder must demonstrate that they have attempted to gain a solid understanding of the answer. Thus, students will be evaluated based on timeliness in posting the Muddiest Point as well as their ability to provide clarifying responses to their partner’s Muddiest Point. Later, the instructor will add clarifying information if necessary.
Types of questions that are not accepted for muddy points include the following:
For grading, each muddy response will be scored based on timeliness, completeness, and correctness (1: on time, complete, and correct; 0.5: late, incorrect, or answers a question not asked; 0: unacceptable). Incomplete, incorrect, or deficient answers will be graded using a 0.6–0.9 multiplier; however, once the instructor responds with an answer, no credit will be provided for any subsequent responses. Each muddy post will also receive one of three scores (1: on time and substantive; 0.5: late or not substantive; 0: no post or unacceptable). The response is weighted 50% more than the initial post. Each week’s muddy score is calculated as the post score plus 1.5 times the response score, and this total is divided by 2.5 to obtain a percentage. This score (out of 100) is what will be posted in Canvas. All of the muddy scores are averaged for the final muddy point grade.
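As an illustration only, the weekly muddy score calculation can be sketched in Python. The function name `weekly_muddy_score` is hypothetical, and the sketch assumes both inputs are already final scores between 0 and 1 (that is, after any multiplier has been applied).

```python
def weekly_muddy_score(post_score: float, response_score: float) -> float:
    """Return the weekly muddy score as a percentage (0-100).

    Hypothetical helper: both inputs are assumed to lie in [0, 1].
    """
    for name, value in (("post_score", post_score),
                        ("response_score", response_score)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    # The response counts 1.5 times as much as the post,
    # so the maximum possible total is 1 + 1.5 = 2.5.
    return 100.0 * (post_score + 1.5 * response_score) / 2.5
```

For example, an on-time, substantive post (1) paired with a half-credit response (0.5) gives (1 + 0.75) / 2.5 = 0.70, or 70 out of 100.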
As mentioned above, a muddy point may contain one, and only one, target question to be addressed. If multiple questions are posted, a penalty will be applied to the muddy point part of the grade. Furthermore, the muddy buddy responding may then choose which question to answer and is not required to address all of them.
When there are groups of three people, a “round robin” response policy is enforced. This means that everyone needs to be the primary responder to one other person in the group. Suppose a group is made up of Alice, John, and Bob. Two scenarios are possible: 1) Alice responds to Bob, who responds to John, who responds to Alice, or 2) Alice responds to John, who responds to Bob, who responds to Alice. The order chosen is entirely up to the group and may change from week to week if so desired. But anyone who violates round robin will automatically receive a zero on the response part of the assignment.
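A group that wants to confirm its response order is a valid round robin could use a check like the sketch below. The function name `is_round_robin` and the dictionary format (responder mapped to the person answered) are assumptions made for illustration. The sketch reads the policy as requiring that nobody answers their own post and that every member receives exactly one response.

```python
def is_round_robin(responses: dict[str, str]) -> bool:
    """Check a response assignment against the round-robin policy.

    `responses` maps each responder to the person they answered
    (a hypothetical format chosen for this sketch).
    """
    members = sorted(responses)
    targets = sorted(responses.values())
    # Nobody may be the primary responder to their own muddy point.
    no_self_reply = all(who != to for who, to in responses.items())
    # Every member must receive exactly one primary response.
    everyone_answered_once = targets == members
    return no_self_reply and everyone_answered_once
```

With the group above, `{"Alice": "Bob", "Bob": "John", "John": "Alice"}` passes the check, while an assignment in which both Bob and John respond to Alice does not.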
The muddy point/response part of the course is worth 15% of the final grade.
In addition to the muddy point exercises, at various points in the class, an open discussion question will be posted in the main discussion forums. Each student will be grouped with two or three other students and assigned to a "Group" within Canvas. During the week, the group is to engage in an ongoing discussion on the question posted. Each student is required to post substantively at least five times, and these posts must occur on at least three different days. Non-substantive posts (e.g., "I agree," or "I need to think about that more") will not be counted. Substantive posts must address the original prompt or a response to the initial prompt. Note that a single response should not address the entire prompt.
During each discussion, each substantive post will receive 1 point (up to 5 points total), and each day posted will receive 1 point (up to 3 points total). Thus full credit will be 8 points. The score posted in Canvas will be a percentage (a score out of 100) based on this 8-point total.
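The 8-point scheme can be sketched as follows. This is a minimal illustration, not the instructor's grading tool: the function name `discussion_score` is hypothetical, and the input is assumed to contain only the dates of posts already judged substantive.

```python
from datetime import date


def discussion_score(post_dates: list[date]) -> float:
    """Return the discussion score as a percentage (0-100).

    `post_dates` holds one date per substantive post (assumed input format).
    """
    post_points = min(len(post_dates), 5)      # 1 point per post, capped at 5
    day_points = min(len(set(post_dates)), 3)  # 1 point per distinct day, capped at 3
    return 100.0 * (post_points + day_points) / 8
```

For example, four substantive posts spread over two days earn 4 + 2 = 6 of 8 points, which is posted as 75.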
The small group discussion part of the course is worth 15% of the final grade.
At various points during the course, students will be required to complete a short, 15-question objective-style quiz. The point of the quiz is to provide a "formative assessment" where both the student and the instructor can gain a sense of how well students are learning the material. Because the quizzes are formative, they only account for 10% of the final grade.
Each student will have 30 minutes to complete each quiz, and two attempts will be permitted. All of the questions will be objective-style (multiple choice or true-false), and there will only be 15 questions. No mechanism will be employed to determine if the student is using outside resources to take the quiz (e.g., the book, notes, the web, etc.); however, students are asked to take the quiz with no such resources. This is the best way for the instructor to gain a sense of the level of knowledge of the student.
Remember to answer all questions on both attempts. The system is set up to take your latest attempt, not your best attempt. It also does not average attempts.
The quizzes are designed to show all 15 questions at once. Answers will be provided once the student submits their answers. While the time for the quiz is set for 30 minutes, it is possible to take longer. Even so, the student should strive to complete the quiz within the 30-minute timeframe.
Quiz feedback will be released automatically after the due date passes.
Grading will be based on biweekly programming assignments, muddy point discussions, small group discussions, and short quizzes. Final grades will be determined by the following weighting:
|Item|% of Grade|
|Muddy Post Discussions (4)|15%|
|Small Group Discussions (4)|15%|
|Short Quizzes (4)|10%|
|Programming Projects (4)|60%|
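To see how the weighting combines category averages into a final percentage, consider the sketch below. The weights for muddy points (15%), small group discussions (15%), and quizzes (10%) are the ones stated in this syllabus. The 60% weight for programming projects is an assumption, inferred as the remainder needed to reach 100%. The names `WEIGHTS` and `final_percentage` are hypothetical.

```python
# Category weights as fractions of the final grade.
# The "programming" weight is an assumed remainder (100% - 40%).
WEIGHTS = {
    "muddy": 0.15,
    "discussions": 0.15,
    "quizzes": 0.10,
    "programming": 0.60,
}


def final_percentage(averages: dict[str, float]) -> float:
    """Combine per-category averages (each 0-100) into a weighted total."""
    missing = set(WEIGHTS) - set(averages)
    if missing:
        raise KeyError(f"missing category averages: {sorted(missing)}")
    return sum(WEIGHTS[name] * averages[name] for name in WEIGHTS)
```

For instance, averages of 90 (muddy), 100 (discussions), 80 (quizzes), and 85 (programming) combine to 13.5 + 15 + 8 + 51 = 87.5 under these assumed weights.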
Each programming assignment will outline the specific requirements and steps to be taken to complete the assignment with associated weights.
In terms of assigning a final letter grade, the following is provided as the default scheme. If deemed appropriate, the instructor may decide to curve final grades, but this is not guaranteed.
Students at risk of receiving a C or lower in the class are identified at midterm through the registrar’s office as part of the course roster verification. If you do not receive notice from the university or the professor following midterm, you are on track to receive at least a B- in the class. If you are concerned about your grade at any point during the semester, feel free to reach out to the instructor.
Because we are all working professionals and time management is of critical importance, the purpose of this document is to lay out the course policy with respect to completing course assignments.
The default policy of this course is that no late submissions will be accepted.
Note that I recognize exceptional circumstances may arise, and I am willing to work with students when they do. Therefore, the following modifications to the default policy are put in place:
This class will strive to create an environment that fosters learning, critical thinking, effective communication, and technology development. To achieve these goals, the use of AI-based tools such as ChatGPT, Copilot, or similar is prohibited in this course.
While ChatGPT and other large language models can be powerful and useful tools in certain contexts, relying on them for this course undermines the learning objectives. You are being trained to understand the fundamentals of machine learning, rather than to become a user of AI or ML tools. This approach involves developing skills in independent thinking, problem-solving, and engagement with the subject matter. By restricting the use of large language models, your knowledge, creativity, and critical analysis will be used to complete all assignments and actively participate in class discussions.
It is important to note that this requirement applies to all aspects of the course, including programming assignments, writing assignments, discussion assignments, quizzes, and any form of communication related to the course content. Any use of generative AI tools during these activities will be considered a violation of the student code of conduct and will be reported as academic misconduct.
For purposes of project management and version control, code repositories such as GitHub and GitLab have demonstrated tremendous benefit. Because of this, students are strongly encouraged to use such repositories for their projects. Be that as it may, there is evidence that such code repositories can also be abused with respect to making previous assignments available to the public, thus fostering academic misconduct. For these reasons the following policies with respect to code repositories are put in place.
Questions about these policies should be directed to the instructor.
Deadlines for Adding, Dropping and Withdrawing from Courses
Students may add a course up to one week after the start of the term for that particular course. Students may drop courses according to the drop deadlines outlined in the EP academic calendar (https://ep.jhu.edu/student-services/academic-calendar/). Between the 6th week of the class and prior to the final withdrawal deadline, a student may withdraw from a course with a W on their academic record. A record of the course will remain on the academic record with a W appearing in the grade column to indicate that the student registered and withdrew from the course.
Academic Misconduct Policy
All students are required to read, know, and comply with the Johns Hopkins University Krieger School of Arts and Sciences (KSAS) / Whiting School of Engineering (WSE) Procedures for Handling Allegations of Misconduct by Full-Time and Part-Time Graduate Students.
This policy prohibits academic misconduct, including but not limited to the following: cheating or facilitating cheating; plagiarism; reuse of assignments; unauthorized collaboration; alteration of graded assignments; and unfair competition. Course materials (old assignments, texts, or examinations, etc.) should not be shared unless authorized by the course instructor. Any questions related to this policy should be directed to EP’s academic integrity officer at firstname.lastname@example.org.
Students with Disabilities - Accommodations and Accessibility
Johns Hopkins University values diversity and inclusion. We are committed to providing welcoming, equitable, and accessible educational experiences for all students. Students with disabilities (including those with psychological conditions, medical conditions and temporary disabilities) can request accommodations for this course by providing an Accommodation Letter issued by Student Disability Services (SDS). Please request accommodations for this course as early as possible to provide time for effective communication and arrangements.
For further information or to start the process of requesting accommodations, please contact Student Disability Services at Engineering for Professionals, email@example.com.
Student Conduct Code
The fundamental purpose of the JHU regulation of student conduct is to promote and to protect the health, safety, welfare, property, and rights of all members of the University community as well as to promote the orderly operation of the University and to safeguard its property and facilities. As members of the University community, students accept certain responsibilities which support the educational mission and create an environment in which all students are afforded the same opportunity to succeed academically.
For a full description of the code please visit the following website: https://studentaffairs.jhu.edu/policies-guidelines/student-code/
JHU is committed to creating a classroom environment that values the diversity of experiences and perspectives that all students bring. Everyone has the right to be treated with dignity and respect. Fostering an inclusive climate is important. Research and experience show that students who interact with peers who are different from themselves learn new things and experience tangible educational outcomes. At no time in this learning process should someone be singled out or treated unequally on the basis of any seen or unseen part of their identity.
If you have concerns in this course about harassment, discrimination, or any unequal treatment, or if you seek accommodations or resources, please reach out to the course instructor directly. Reporting will never impact your course grade. You may also share concerns with your program chair, the Assistant Dean for Diversity and Inclusion, or the Office of Institutional Equity. In handling reports, people will protect your privacy as much as possible, but faculty and staff are required to officially report information for some cases (e.g. sexual harassment).
When a student enrolls in an EP course with “audit” status, the student must reach an understanding with the instructor as to what is required to earn the “audit.” If the student does not meet those expectations, the instructor must notify the EP Registration Team [EP-Registration@exchange.johnshopkins.edu] in order for the student to be retroactively dropped or withdrawn from the course (depending on when the "audit" was requested and in accordance with EP registration deadlines). All lecture content will remain accessible to auditing students, but access to all other course material is left to the discretion of the instructor.