605.649.81 - Introduction to Machine Learning

Computer Science
Spring 2024

Description

Analyzing large data sets (“Big Data”) is an increasingly important skill set. One of the disciplines being relied upon for such analysis is machine learning. In this course, we will approach machine learning from a practitioner’s perspective. We will examine the issues that impact our ability to learn good models (e.g., inductive bias, the curse of dimensionality, the bias-variance dilemma, and no free lunch). We will then examine a variety of approaches to learning models, covering the spectrum from unsupervised to supervised learning, as well as parametric versus non-parametric methods. Students will explore and implement several learning algorithms, including logistic regression, nearest neighbor, decision trees, and feed-forward neural networks, and will incorporate strategies for addressing the issues impacting performance (e.g., regularization, clustering, and dimensionality reduction). In addition, students will engage in online discussions, focusing on the key questions in developing learning systems. At the end of this course, students will be able to implement and apply a variety of machine learning methods to real-world problems and to assess the performance of these algorithms on different types of data sets. Prerequisite(s): EN.605.202 – Data Structures or equivalent.

Instructor

Course Structure

Details on the course structure can be found in the Course Outline. Each course module runs for a period of seven (7) days, i.e., one week. Due dates for readings and other assignments are referred to by the day of the module week on which they are due. For example, if a reading assignment is to be completed by Day 3 and the module started on Monday, then the reading assignment should be completed by Wednesday, the 3rd day of the module. Please refer to the Course Outline for the specific start and end dates for each module in this course.
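As a small illustration of the Day-N convention, the Python sketch below converts a module start date and a day number into a calendar date. The helper name is hypothetical and not part of any course tooling, it assumes Day 1 is the module's start date itself, and the start date used in the example is arbitrary.

```python
from datetime import date, timedelta


def due_date(module_start: date, day: int) -> date:
    """Return the calendar date of 'Day <day>' in a seven-day module.

    Hypothetical helper: assumes Day 1 falls on the module's start date.
    """
    if not 1 <= day <= 7:
        raise ValueError(f"day must be between 1 and 7, got {day}")
    # Day 1 is the start date, so Day N is N - 1 days later.
    return module_start + timedelta(days=day - 1)


# A module starting Monday, January 22, 2024: Day 3 is Wednesday, January 24.
print(due_date(date(2024, 1, 22), 3))  # 2024-01-24
```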

IMPORTANT: The course design has recently been updated to provide more opportunities for students to manage their time. In particular, all programming assignments will be allotted four weeks for completion; however, there will be a one-week overlap between programming assignment #1 and programming assignment #2. There will also be a one-week overlap between programming assignment #3 and programming assignment #4. It is up to the student to decide how to use that overlapped time, whether to finish the prior assignment or to get a start on the next assignment.

Course Topics

  1. Non-parametric Learning
  2. Clustering
  3. Bayesian Learning
  4. Decision Trees
  5. Ensembles
  6. Dimensionality Reduction
  7. Rule Learning
  8. Linear Models
  9. Linear Networks
  10. Multi-Layer Networks
  11. Deep Learning
  12. Reinforcement Learning
  13. Temporal Difference Learning
  14. Deep Reinforcement Learning

Course Goals

To develop a broad understanding of the issues in developing and implementing machine learning algorithms and systems, especially as they relate to modern, data-intensive problems.

Course Learning Outcomes (CLOs)

Textbooks

Required:

Optional:

Other Materials & Online Resources

Speakers/audio output (or headsets), webcam, and microphones are required for this course.

Student Coursework Requirements

Course Expectations

This is a very high-workload course and is designed as a graduate-level computer science course. It has also been designed to prepare students who are interested in a more technical, research-based experience in the design of their degree programs. All students, but especially those in programs other than the computer science MS or postgraduate certificate program, should therefore be aware of the following expectations.

Programming Assignments Guidelines

The programming assignments are designed to give you experience implementing key machine learning algorithms from scratch. You will implement the algorithms and then test their performance on several data sets from the UCI Machine Learning Repository. For these assignments, you are required to submit source code, short videos that demonstrate proper functioning of the code, and a brief paper describing the results of your experiments.

You may use any of the following higher-order programming languages: Java, Python, or C#. However, you are not permitted to use available machine learning libraries such as the Matlab toolboxes, WEKA, RapidMiner, scikit-learn, TensorFlow, etc. You are also not permitted to seek out and use any code found in online code repositories that appears similar or identical to the programs assigned in this course. The following languages may not be used to implement the algorithms, but you may use them to support analysis of the results: SQL, Matlab, Maple, Mathematica, R. All algorithms must be implemented from scratch by you. Basic libraries for managing data structures and fundamental math operations (e.g., NumPy or Pandas) are permitted.

To facilitate the grading of programming assignments, please adhere to the following:

The report you provide should be prepared with a word processor or LaTeX. We recommend using Overleaf, but this is not required. Use the equation and pseudocode editing capabilities of whatever tool you use. As this is a graduate-level course, you are expected to follow appropriate conventions for incorporating mathematics, pseudocode, and figures into the paper. Make sure you submit a PDF of your report; MS Word, Open Office/LibreOffice, and LaTeX source files will not be accepted. Submit your report separately, i.e., do NOT include it in your zip file.

Your report must include the following:[1]

[1] A sample report is provided in Canvas. Note that the sample report specifies requirements for a more formal report than what is specified here. You are only required to satisfy the requirements above.

Participation Grading Criteria

Active student participation is an essential part of any online course. Therefore, part of the student's grade (30%) will be based on class participation. There are two components to the class participation grade – muddy posts/responses and small group discussions.

CAUTION: Be advised that you will be assigned to two different groups – one for muddy point exercises and one for small group assignments. The members of these groups are not the same, and either or both groups may change as the semester progresses.

Muddy Point/Response

At various points in the class, students will be required to post a "Muddiest Point" message to the associated muddy point discussion forum. Specifically, the student shall post a comment that identifies a part of the module that was particularly confusing and therefore needs clarification. This must be done by Day 3 of the week in which the discussion takes place but can cover anything since the previous muddy point exercise. Note, however, that only one point/question is to be posted. Students will be paired with one or two other students (called muddy buddies), and one of the partners will post a clarifying response as a reply to the muddy point within two days of the initial posting (i.e., by Day 5). This response must constitute a serious, substantive attempt to answer the question posed in the muddy point and will be graded accordingly. Simply referring to an external website (e.g., Wikipedia) is not sufficient. The responder must demonstrate that they have attempted to gain a solid understanding of the answer. Thus, students will be evaluated based on timeliness in posting the Muddiest Point as well as their ability to provide clarifying responses to their partner’s Muddiest Point. Later, the instructor will add additional clarifying information if necessary.

Types of questions that are not accepted for muddy points include the following:

For grading, each muddy response will be scored based on timeliness, completeness, and correctness (1: on time, complete, and correct; 0.5: late, incorrect, or answers a question not asked; 0: unacceptable). Incomplete, incorrect, or deficient answers will be graded using a 0.6–0.9 multiplier; however, once the instructor responds with an answer, no credit will be provided for any subsequent responses. Each muddy post will also receive one of three scores (1: on time and substantive; 0.5: late or not substantive; 0: no post or unacceptable). The response is weighted 50% more than the initial post. Each week’s muddy score is calculated as the post score plus 1.5 times the response score, with this total divided by 2.5 to obtain a percentage. This score (out of 100) is what will be posted in Canvas. All of the muddy scores are averaged for the final muddy point grade.
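The weekly muddy score arithmetic can be written out as a short Python sketch. The function names are hypothetical, and the sketch assumes the response score passed in is already the graded value between 0 and 1; it does not attempt to model how the 0.6–0.9 multiplier is applied.

```python
def muddy_week_score(post_score: float, response_score: float) -> float:
    """Weekly muddy score out of 100: (post + 1.5 * response) / 2.5, as a percentage.

    Hypothetical helper. post_score is 1, 0.5, or 0; response_score is the
    graded response value, assumed to lie between 0 and 1.
    """
    for name, value in (("post_score", post_score), ("response_score", response_score)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    # The response counts 1.5 times as much as the post; 2.5 is the maximum total.
    return 100.0 * (post_score + 1.5 * response_score) / 2.5


def muddy_final_grade(weekly_scores: list[float]) -> float:
    """Final muddy point grade: the plain average of all weekly scores."""
    if not weekly_scores:
        raise ValueError("at least one weekly score is required")
    return sum(weekly_scores) / len(weekly_scores)


# On-time, substantive post (1) with a late response (0.5): (1 + 0.75) / 2.5
print(muddy_week_score(1.0, 0.5))  # 70.0
```

Because the divisor 2.5 is the largest possible weighted total (1 + 1.5 × 1), a perfect week maps to exactly 100.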

As mentioned above, a muddy point may contain one, and only one, target question to be addressed. If multiple questions are posted, a penalty will be applied to the muddy point part of the grade. Furthermore, the muddy buddy responding may then choose which question to answer and is not required to address all of them.

When there are groups of three people, a “round robin” response policy is enforced. This means that everyone needs to be the primary responder to exactly one other person in the group. Suppose a group is made up of Alice, John, and Bob. Two scenarios are possible: 1) Alice responds to Bob, who responds to John, who responds to Alice, or 2) Alice responds to John, who responds to Bob, who responds to Alice. The order chosen is entirely up to the group and may change from week to week if desired. However, anyone who violates the round robin will automatically receive a zero on the response part of the assignment.
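The round robin requirement amounts to the response assignments forming a single cycle through the whole group. The Python sketch below checks that property; the function name and the dictionary representation (each responder mapped to the person whose muddy point they answer) are assumptions chosen for illustration.

```python
def is_round_robin(group: set[str], responds_to: dict[str, str]) -> bool:
    """Check that responder assignments form one cycle through the group.

    Hypothetical representation: responds_to maps each responder to the member
    whose muddy point they answer.
    """
    if len(group) < 2:
        return False
    # Every member responds exactly once and is responded to exactly once.
    if set(responds_to) != group or set(responds_to.values()) != group:
        return False
    start = min(group)
    current = responds_to[start]
    steps = 1
    while current != start:
        current = responds_to[current]
        steps += 1
    # A single cycle returns to the start only after visiting every member;
    # a self-response or a shorter sub-cycle returns sooner.
    return steps == len(group)


scenario_1 = {"Alice": "Bob", "Bob": "John", "John": "Alice"}
broken = {"Alice": "Bob", "Bob": "Alice", "John": "Alice"}
print(is_round_robin({"Alice", "Bob", "John"}, scenario_1))  # True
print(is_round_robin({"Alice", "Bob", "John"}, broken))      # False
```

In the broken assignment, John receives no response at all, which the permutation check rejects before the cycle walk runs.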

The muddy point/response part of the course is worth 15% of the final grade.

Small Group Discussions

In addition to the muddy point exercises, at various points in the class, an open discussion question will be posted in the main discussion forums. Each student will be grouped with two or three other students and assigned to a "Group" within Canvas. During the week, the group is to engage in an ongoing discussion on the question posted. Each student is required to post substantively at least five times, and these posts must occur on at least three different days. Non-substantive posts (e.g., "I agree," or "I need to think about that more") will not be counted. Substantive posts must address the original prompt or a response to the initial prompt. Note that a single response should not address the entire prompt.

During each discussion, each substantive post will receive 1 point (up to 5 points total), and each day posted will receive 1 point (up to 3 points total). Thus full credit will be 8 points. The score posted in Canvas will be a percentage (a score out of 100) based on this 8-point total.
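The 8-point discussion score can be sketched in Python as follows. The helper name is hypothetical, and it takes the number of substantive posts and the number of distinct days posted as already-counted inputs rather than deciding which posts qualify.

```python
def discussion_score(substantive_posts: int, days_posted: int) -> float:
    """Percentage score for one small group discussion (hypothetical helper).

    Each substantive post earns 1 point (capped at 5) and each distinct day
    with a post earns 1 point (capped at 3), for an 8-point maximum.
    """
    if substantive_posts < 0 or days_posted < 0:
        raise ValueError("counts must be non-negative")
    points = min(substantive_posts, 5) + min(days_posted, 3)
    return 100.0 * points / 8


# Five substantive posts spread over only two days: 7 of 8 points.
print(discussion_score(5, 2))  # 87.5
```

The caps mean extra posts beyond five, or extra days beyond three, do not raise the score above 100.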

The small group discussion part of the course is worth 15% of the final grade.

Short Quiz Guidelines

At various points during the course, students will be required to complete a short, 15-question objective-style quiz. The point of the quiz is to provide a "formative assessment" where both the student and the instructor can gain a sense for how well students are learning the material. Because the quizzes are formative, they only account for 10% of the final grade.

Each student will have 30 minutes to complete each quiz, and two attempts will be permitted. All of the questions will be objective-style (multiple choice or true-false), and there will only be 15 questions. No mechanism will be employed to determine if the student is using outside resources to take the quiz (e.g., the book, notes, the web, etc.); however, students are asked to take the quiz with no such resources. This is the best way for the instructor to gain a sense of the level of knowledge of the student.

Remember to answer all questions on both attempts. The system is set up to take your latest attempt, not your best attempt. It also does not average attempts.

The quizzes are designed to show all 15 questions at once. Answers will be provided once the student submits their answers. While the time for the quiz is set to 30 minutes, it is possible to take longer. Even so, the student should strive to complete the quiz within the 30-minute time frame.

Quiz feedback will be released automatically after the due date passes.

Grading Policy

Grading will be based on programming assignments, muddy point exercises, small group discussions, and short quizzes. Final grades will be determined by the following weighting:

Item                              % of Grade
Muddy Post Discussions (4)        15%
Small Group Discussions (4)       15%
Short Quizzes (4)                 10%
Programming Projects (4)          60%
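The weighting above can be applied with a short Python sketch. The category keys and the helper name are hypothetical, and the sketch assumes each category has already been reduced to a single average on a 0–100 scale; how items within a category are combined is not modeled here.

```python
# Category weights from the grading table, in percent (they sum to 100).
WEIGHTS_PERCENT = {
    "muddy_point": 15,
    "small_group": 15,
    "quizzes": 10,
    "programming": 60,
}


def final_score(category_averages: dict[str, float]) -> float:
    """Weighted final score (0-100) from per-category averages on a 0-100 scale.

    Hypothetical helper: assumes each category average is already computed.
    """
    missing = WEIGHTS_PERCENT.keys() - category_averages.keys()
    if missing:
        raise ValueError(f"missing category averages: {sorted(missing)}")
    total = sum(w * category_averages[name] for name, w in WEIGHTS_PERCENT.items())
    return total / 100


print(final_score({"muddy_point": 90, "small_group": 80, "quizzes": 70, "programming": 95}))  # 89.5
```

Keeping the weights as integer percentages makes it easy to confirm they sum to 100 and keeps the arithmetic exact for whole-number averages.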

Each programming assignment will outline the specific requirements and steps to be taken to complete the assignment with associated weights.

In terms of assigning a final letter grade, the following is provided as the default scheme. If deemed appropriate, the instructor may decide to curve final grades, but this is not guaranteed.

Score Range    Grade
[93,100]       A
[90,93)        A-
[87,90)        B+
[83,87)        B
[80,83)        B-
[70,80)        C
[0,70)         F
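The default scheme uses half-open ranges, so a score exactly on a boundary earns the higher grade. A minimal Python sketch of that mapping is shown below; the helper name is hypothetical, and it assumes the score is used as-is, with no rounding and no curve applied.

```python
# Lower bounds of the default letter-grade ranges, highest first.
GRADE_CUTOFFS = [
    (93.0, "A"),
    (90.0, "A-"),
    (87.0, "B+"),
    (83.0, "B"),
    (80.0, "B-"),
    (70.0, "C"),
    (0.0, "F"),
]


def letter_grade(score: float) -> str:
    """Map a final score in [0, 100] to a letter using half-open ranges."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"score must be in [0, 100], got {score}")
    # The first lower bound the score meets or exceeds determines the grade.
    for lower_bound, letter in GRADE_CUTOFFS:
        if score >= lower_bound:
            return letter
    raise AssertionError("unreachable: the last cutoff is 0")


print(letter_grade(93.0))   # A
print(letter_grade(89.5))   # B+
print(letter_grade(69.99))  # F
```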

Students at risk of receiving a C or lower in the class are identified at midterm through the registrar’s office as part of the course roster verification. If you do not receive notice from the university or the professor following midterm, you are on track to receive at least a B- in the class. If you are concerned about your grade at any point during the semester, feel free to reach out to the instructor.

Course Policies

Late Submission Policy

Because we are all working professionals and time management is of critical importance, this section lays out the course policy with respect to completing course assignments.

The default policy of this course is that no late submissions will be accepted.

Note that I recognize exceptional circumstances may arise, and I am willing to work with students when they do. Therefore, the following modifications to the default policy are put in place:

  1. If you must travel for business and will have limited Internet connectivity, then you must notify the instructor at least one week prior to travel to make arrangements for handling assignments. Failure to provide this advance notice will result in all of the original due dates being enforced.
  2. If you are traveling on vacation, then all of the original due dates remain enforced. Personal, recreational travel is not an excuse to relax the due dates.
  3. If there is a family emergency (e.g., a death in the family or a serious illness), then you must notify the instructor as soon as possible to make arrangements for handling the assignments. Note that, under such circumstances, considerable flexibility is possible.
  4. If you become personally ill, then it is important for you to take care of your health; however, since we are not meeting in person in a classroom, meaning that spread of disease is not an issue, only illnesses or injuries that require professional medical attention will receive special handling. Otherwise, all of the original due dates will be enforced.
  5. Under no circumstances will time management issues result in a relaxation of the due dates. Poor time management is never an acceptable excuse.
  6. Special accommodations are available for students who register a disability with the university. Those accommodations will be worked out with SDS.

Policy on Using AI Large Language Models and Generative AI

This class will strive to create an environment that fosters learning, critical thinking, effective communication, and technology development. To achieve these goals, the use of AI-based tools such as ChatGPT, Copilot, or similar tools is prohibited in this course.

While ChatGPT and other large language models can be powerful and useful tools in certain contexts, relying on them for this course undermines the learning objectives. You are being trained to understand the fundamentals of machine learning, rather than to become a user of AI or ML tools. This approach involves developing skills in independent thinking, problem-solving, and engagement with the subject matter. By restricting the use of large language models, your knowledge, creativity, and critical analysis will be used to complete all assignments and actively participate in class discussions.

It is important to note that this requirement applies to all aspects of the course, including programming assignments, writing assignments, discussion assignments, quizzes, and any form of communication related to the course content. Any use of generative AI tools during these activities will be considered a violation of the student code of conduct and will be reported as academic misconduct.

Policy on the Use of Code Repositories

For purposes of project management and version control, code repositories such as GitHub and GitLab have demonstrated tremendous benefit. Because of this, students are strongly encouraged to use such repositories for their projects. Be that as it may, there is evidence that such code repositories can also be abused by making previous assignments available to the public, thus fostering academic misconduct. For these reasons, the following policies with respect to code repositories are put in place.

  1. Should a student decide to use a code repository to manage their programming projects, those repositories must be private and must remain private beyond the end of the course.
  2. Re-emphasizing the above, under no circumstances may code from this course be made public, either through a public repository or through posting to another site that collects assignments or code.
  3. Under no circumstances may the assignments themselves be posted to a public repository or public site.
  4. Should a student discover the assignments have been posted somewhere on the web, the student is asked to report the site to the instructor as soon as possible.
  5. Should a student discover the assignments in a public place, under no circumstances shall the student view, download, or otherwise use the information contained on that site.



Questions about these policies should be directed to the instructor.

Academic Policies

Deadlines for Adding, Dropping and Withdrawing from Courses

Students may add a course up to one week after the start of the term for that particular course. Students may drop courses according to the drop deadlines outlined in the EP academic calendar (https://ep.jhu.edu/student-services/academic-calendar/). Between the 6th week of the class and prior to the final withdrawal deadline, a student may withdraw from a course with a W on their academic record. A record of the course will remain on the academic record with a W appearing in the grade column to indicate that the student registered and withdrew from the course.

Academic Misconduct Policy

All students are required to read, know, and comply with the Johns Hopkins University Krieger School of Arts and Sciences (KSAS) / Whiting School of Engineering (WSE) Procedures for Handling Allegations of Misconduct by Full-Time and Part-Time Graduate Students.

This policy prohibits academic misconduct, including but not limited to the following: cheating or facilitating cheating; plagiarism; reuse of assignments; unauthorized collaboration; alteration of graded assignments; and unfair competition. Course materials (old assignments, texts, or examinations, etc.) should not be shared unless authorized by the course instructor. Any questions related to this policy should be directed to EP’s academic integrity officer at ep-academic-integrity@jhu.edu.

Students with Disabilities - Accommodations and Accessibility

Johns Hopkins University values diversity and inclusion. We are committed to providing welcoming, equitable, and accessible educational experiences for all students. Students with disabilities (including those with psychological conditions, medical conditions and temporary disabilities) can request accommodations for this course by providing an Accommodation Letter issued by Student Disability Services (SDS). Please request accommodations for this course as early as possible to provide time for effective communication and arrangements.

For further information or to start the process of requesting accommodations, please contact Student Disability Services at Engineering for Professionals, ep-disability-svcs@jhu.edu.

Student Conduct Code

The fundamental purpose of the JHU regulation of student conduct is to promote and to protect the health, safety, welfare, property, and rights of all members of the University community as well as to promote the orderly operation of the University and to safeguard its property and facilities. As members of the University community, students accept certain responsibilities which support the educational mission and create an environment in which all students are afforded the same opportunity to succeed academically. 

For a full description of the code please visit the following website: https://studentaffairs.jhu.edu/policies-guidelines/student-code/

Classroom Climate

JHU is committed to creating a classroom environment that values the diversity of experiences and perspectives that all students bring. Everyone has the right to be treated with dignity and respect. Fostering an inclusive climate is important. Research and experience show that students who interact with peers who are different from themselves learn new things and experience tangible educational outcomes. At no time in this learning process should someone be singled out or treated unequally on the basis of any seen or unseen part of their identity. 
 
If you have concerns in this course about harassment, discrimination, or any unequal treatment, or if you seek accommodations or resources, please reach out to the course instructor directly. Reporting will never impact your course grade. You may also share concerns with your program chair, the Assistant Dean for Diversity and Inclusion, or the Office of Institutional Equity. In handling reports, people will protect your privacy as much as possible, but faculty and staff are required to officially report information for some cases (e.g. sexual harassment).

Course Auditing

When a student enrolls in an EP course with “audit” status, the student must reach an understanding with the instructor as to what is required to earn the “audit.” If the student does not meet those expectations, the instructor must notify the EP Registration Team [EP-Registration@exchange.johnshopkins.edu] in order for the student to be retroactively dropped or withdrawn from the course (depending on when the "audit" was requested and in accordance with EP registration deadlines). All lecture content will remain accessible to auditing students, but access to all other course material is left to the discretion of the instructor.