Analyzing large data sets (“Big Data”) is an increasingly important skill. One of the disciplines relied upon for such analysis is machine learning. In this course, we will approach machine learning from a practitioner’s perspective. We will examine the issues that impact our ability to learn good models (e.g., inductive bias, the curse of dimensionality, the bias-variance dilemma, and no free lunch). We will then examine a variety of approaches to learning models, covering the spectrum from unsupervised to supervised learning, as well as parametric versus non-parametric methods. Students will explore and implement several learning algorithms, including logistic regression, nearest neighbor, decision trees, and feed-forward neural networks, and will incorporate strategies for addressing the issues impacting performance (e.g., regularization, clustering, and dimensionality reduction). In addition, students will engage in online discussions, focusing on the key questions in developing learning systems. At the end of this course, students will be able to implement and apply a variety of machine learning methods to real-world problems, as well as be able to assess the performance of these algorithms on different types of data sets. Prerequisite(s): EN.605.202 – Data Structures or equivalent.
Details on the course structure can be found in the Course Outline. Each course module runs for a period of seven (7) days, i.e., one week. Due dates for readings and other assignments are referred to by the day of the module week in which they are due. For example, if a reading assignment is to be completed by Day 3 and the module started on Monday, then the reading assignment should be completed by Wednesday, the 3rd day of the module. Please refer to the Course Outline for the specific start and end dates for each module in this course.
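The "Day N" convention can be sketched as a small date calculation. This is only an illustration: the function name `due_date` and the start date below are hypothetical, and the Course Outline remains the authority for actual module dates.

```python
from datetime import date, timedelta


def due_date(module_start: date, day_number: int) -> date:
    """Return the calendar date of 'Day N' in a seven-day module."""
    if not 1 <= day_number <= 7:
        raise ValueError("day_number must be between 1 and 7")
    # Day 1 is the start date itself, so Day N falls N - 1 days later.
    return module_start + timedelta(days=day_number - 1)


# Hypothetical module that starts on a Monday (1 January 2024).
start = date(2024, 1, 1)
print(due_date(start, 3))  # 2024-01-03, a Wednesday
```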
All students must complete a quiz on the Syllabus and Course Information and receive a score of at least 90% before the content modules will be released. Questions about the syllabus and course information should be asked of the instructor prior to taking the quiz. Once the quiz is passed, the instructor will assume all students understand the course expectations.
Non-parametric Learning
Temporal Difference Learning
Deep Reinforcement Learning
To develop a broad understanding of the issues in developing and implementing machine learning algorithms and systems, especially as they relate to modern, data-intensive problems.
Alpaydin, E. (2020). Introduction to machine learning (4th ed.). Cambridge, MA: The MIT Press. ISBN 9780262043793.
Prior editions of this book are strongly discouraged and may be used only at your own risk.
Mitchell, T. M. (1997). Machine learning. New York, NY: McGraw-Hill. ISBN-10: 0070428077; ISBN-13: 9780070428072.
Bishop, C. M. (2006). Pattern recognition and machine learning. New York, NY: Springer. ISBN-10: 0387310738; ISBN-13: 9780387310732.
Speakers/audio output (or headsets), webcam, and microphones are required for this course.
This is a very high-workload course designed as a graduate-level computer science course. It has also been designed to prepare students interested in a more technical/research-based experience in the design of their degree programs. All students, but especially those in programs other than the computer science MS or postgraduate certificate program, should therefore be aware of the following expectations.
The programming assignments are designed to give you experience implementing key machine learning algorithms from scratch. You will implement the algorithms to test their performance on several data sets from the UCI Machine Learning Repository. For these assignments, you are required to submit source code, short videos that demonstrate proper functioning of the code, and a brief paper describing the results of your experiments.
You may use any high-level programming language you wish (e.g., Java, Python, C#); however, you are not permitted to use available machine learning libraries such as Matlab toolboxes, WEKA, RapidMiner, scikit-learn, TensorFlow, etc. You are not permitted to use the following languages for implementing the algorithms, but you may use them to support analysis of the results: SQL, Matlab, Maple, Mathematica, R. All algorithms must be implemented from scratch by you. Basic libraries for managing data structures and fundamental math operations (e.g., NumPy or Pandas) are permitted. You are not permitted to use Stack Overflow or any of the Stack Exchange forums under any circumstances.
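As a rough illustration of what "from scratch" can look like, the sketch below writes a nearest-neighbor classifier using only the Python standard library. It is not a solution template for any assignment; the function names, the toy data, and the choice of Euclidean distance with majority voting are all assumptions made for the example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=1):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Rank training indices by their distance to the query point.
    ranked = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))
    # Count the labels of the k closest training points.
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated classes in two dimensions.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
labels = ["a", "a", "b", "b"]
print(knn_predict(points, labels, (0.2, 0.4), k=3))  # a
```

Everything here, including the distance function and the vote, is written by hand; only data-structure and math helpers are imported.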
To facilitate the grading of programming assignments, please adhere to the following:
The report you provide should be prepared with a word processor or LaTeX. Use the equation and pseudo-code editing capabilities of whatever tool you use. Make sure you submit a PDF of your report. MS Word, Open Office/Libre Office, and LaTeX files will not be accepted. Submit your report separately, i.e., do NOT include it in your zip file.
Your report must include the following:
Active student participation is an essential part of any online course. Therefore, part of the student's grade (30%) will be based on class participation. There are two components to the class participation grade – muddy posts/responses and small group discussions.
CAUTION: Be advised that you will be assigned to two different groups – one for muddy point exercises and one for small group assignments. Members of these groups are not the same and either or both may change as the semester progresses.
During the last module of every module group, students will be required to post a "Muddiest Point" message to the open discussion forum associated with the topic. Specifically, the student shall post a comment that identifies a part of the module that was particularly confusing, thus needing clarification. This must be done by Day 3 of the week when the module is presented. Students will be paired with one or two other students, and one of the partners will post a clarifying response in the same open discussion forum within two days of the initial posting (i.e., by Day 5). This response must constitute a serious, substantive attempt to answer the question posed in the muddy point and will be graded accordingly. Simply referring to an external website (e.g., Wikipedia) is not sufficient. The responder must demonstrate that they have attempted to gain a solid understanding of the answer. Thus students will be evaluated based on timeliness in posting the Muddiest Point as well as their ability to provide clarifying responses to their partners' Muddiest Points. Later, the instructor will add additional clarifying information if necessary.
Types of questions that are not accepted for muddy points include the following:
For grading, each muddy response will be scored based on timeliness, completeness, and correctness (1: on time, complete, and correct; 0.6–0.9: late or on time but with some deficiencies; 0.5: late, incorrect, or answers a question not asked; 0: unacceptable). Each muddy post will also receive one of three scores (1: on time and substantive; 0.5: late or not substantive; 0: no post or unacceptable). The response is weighted 50% more than the initial post. Each week’s muddy score is calculated as the post score plus 1.5 times the response score, with this total divided by 2.5 and expressed as a percentage. This score (out of 100) is what will be posted in Canvas. All of the muddy scores are averaged for the final muddy point grade.
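The weekly calculation described above can be written out as a short sketch. The function names are hypothetical, and the only rule encoded is the one stated in the text: the post score plus 1.5 times the response score, divided by 2.5, reported out of 100.

```python
def weekly_muddy_score(post: float, response: float) -> float:
    """Combine a post score and a response score (each in [0, 1]) into a percentage."""
    if not (0 <= post <= 1 and 0 <= response <= 1):
        raise ValueError("post and response scores must lie between 0 and 1")
    # The response counts 1.5 times as much as the post; 2.5 is the maximum total.
    return 100 * (post + 1.5 * response) / 2.5


def final_muddy_grade(weekly_scores: list) -> float:
    """Average the weekly percentages to obtain the final muddy point grade."""
    return sum(weekly_scores) / len(weekly_scores)


print(weekly_muddy_score(1.0, 1.0))  # 100.0
print(weekly_muddy_score(0.5, 1.0))  # 80.0 (late post, full-credit response)
print(weekly_muddy_score(1.0, 0.5))  # 70.0 (on-time post, half-credit response)
```

Because of the 1.5 weighting, a weak response costs more than a weak post: in the last two calls, the same half-point deduction lowers the weekly score by 20 points when applied to the post and by 30 points when applied to the response.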
A muddy point may contain one, and only one, target question to be addressed. If multiple questions are posted, a penalty will be applied to the muddy point part of the grade. Furthermore, the muddy buddy responding may then choose which question to answer and is not required to address all of them.
When there are groups of three people, a “round robin” response policy is enforced. This means that everyone needs to be the primary responder to one other person in the group. Suppose a group is made up of Alice, John, and Bob. Two scenarios are possible: 1) Alice responds to Bob who responds to John who responds to Alice, or 2) Alice responds to John who responds to Bob who responds to Alice. The order chosen is entirely up to the group and may change from week to week if so desired. But anyone who violates round robin will automatically receive a zero on the response part of the assignment.
The muddy point/response part of the course is worth 15% of the final grade.
In addition to the muddy point exercises, during the first module in each module group, an open discussion question will be posted in the main discussion forums (except for the first module pair). Each student will be grouped with two or three other students and assigned to a “Group” within Canvas. During the week, the group is to engage in an ongoing discussion on the question posted. You are asked to keep the discussion in one thread per assignment. Each student is required to post at least five times on at least three different days. Non-substantive posts (e.g., “I agree,” or “I need to think about that more”) will not be counted.
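One way to check a proposed response order against the round robin policy is sketched below. The function name and the dictionary representation (responder mapped to the person they answer) are assumptions made for illustration; the check encodes only the stated rule that everyone is the primary responder to one other person, so that everyone also receives one response.

```python
from typing import Dict


def is_round_robin(responds_to: Dict[str, str]) -> bool:
    """Return True if each member answers one other member and each is answered once."""
    members = set(responds_to)
    # Nobody may act as the primary responder to their own muddy point.
    if any(responder == target for responder, target in responds_to.items()):
        return False
    # Every member must receive exactly one primary response.
    return set(responds_to.values()) == members


# Scenario 1 from the text: Alice -> Bob -> John -> Alice.
print(is_round_robin({"Alice": "Bob", "Bob": "John", "John": "Alice"}))   # True
# Scenario 2 from the text: Alice -> John -> Bob -> Alice.
print(is_round_robin({"Alice": "John", "John": "Bob", "Bob": "Alice"}))   # True
# Violation: Alice and Bob answer each other, so John receives no response.
print(is_round_robin({"Alice": "Bob", "Bob": "Alice", "John": "Alice"}))  # False
```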
During each discussion, each substantive post will receive 1 point (up to 5 points total), and each day posted will receive 1 point (up to 3 points total). Thus full credit will be 8 points. The score posted in Canvas will be a percentage (a score out of 100) based on this 8-point total. Up to two additional points may be assigned at the instructor’s discretion for particularly lively discussions. These additional points will not be “extra credit” but can be used to “make up” points lost elsewhere.
The small group discussion part of the course is worth 15% of the final grade.
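The 8-point discussion rubric can be sketched as follows. The function name and input format (a count of substantive posts for each day of the week) are hypothetical, and the sketch assumes that only days with at least one substantive post earn a day point. The discretionary points mentioned above are not modeled.

```python
from typing import Sequence

MAX_POST_POINTS = 5  # one point per substantive post, capped at 5
MAX_DAY_POINTS = 3   # one point per distinct day posted, capped at 3


def discussion_score(posts_per_day: Sequence[int]) -> float:
    """Return the discussion score as a percentage of the 8-point total."""
    if any(count < 0 for count in posts_per_day):
        raise ValueError("post counts cannot be negative")
    post_points = min(sum(posts_per_day), MAX_POST_POINTS)
    # Assumption: a day counts only if it has at least one substantive post.
    day_points = min(sum(1 for count in posts_per_day if count > 0), MAX_DAY_POINTS)
    return 100 * (post_points + day_points) / (MAX_POST_POINTS + MAX_DAY_POINTS)


print(discussion_score([2, 0, 2, 0, 1, 0, 0]))  # 100.0 (5 posts on 3 days)
print(discussion_score([5, 0, 0, 0, 0, 0, 0]))  # 75.0 (5 posts, but only 1 day)
```

The second call shows why spreading posts across the week matters: five posts on a single day earn all 5 post points but only 1 of the 3 day points.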
During the second module in each module pair (01-02, 03-04, etc.), students will be required to complete a short, 10-question objective-style quiz. The point of the quiz is to provide a “formative assessment” where both the student and the instructor can gain a sense for how well students are learning the material. Because the quizzes are formative, they only account for 10% of the final grade.
Each student will have 30 minutes to complete each quiz, and two attempts will be permitted. All of the questions will be objective-style (multiple choice or true-false), and there will only be 10 questions. No mechanism will be employed to determine if the student is using outside resources to take the quiz (e.g., the book, notes, the web, etc.); however, students are asked to take the quiz with no such resources. This is the best way for the instructor to gain a sense of the level of knowledge of the student.
Remember to answer all questions on both attempts. The system is set up to take your best attempt. It also does not average attempts.
The quizzes are designed to show all 10 questions at once. Answers will be provided once the student submits their answers. While the time for the quiz is set for 30 minutes, it is possible to take longer. Even so, the student should strive to complete the quiz within the 30-minute time frame.
Quiz feedback will be released automatically after the due date passes.
Grading will be based on biweekly programming assignments, muddy point exercises, small group discussions, and short quizzes. Final grades will be determined by the following weighting:
|Item|% of Grade
|Muddy Post Discussions (6)|15%
|Small Group Discussions (5)|15%
|Short Quizzes (6)|10%
|Programming Projects (5)|60%
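The weighting can be sketched as a simple weighted sum. The 15%, 15%, and 10% weights are stated elsewhere in this syllabus; the 60% weight for programming projects is an assumption here, taken as the remainder needed to reach 100%. The names and component scores below are hypothetical.

```python
# Weights as fractions of the final grade. The programming weight is an
# assumption: it is the remainder after the three weights stated in the text.
WEIGHTS = {
    "muddy": 0.15,
    "small_group": 0.15,
    "quizzes": 0.10,
    "programming": 0.60,
}


def final_score(component_scores: dict) -> float:
    """Combine component percentages (0-100) into a weighted final percentage."""
    missing = set(WEIGHTS) - set(component_scores)
    if missing:
        raise KeyError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)


# Hypothetical component averages for one student.
example = {"muddy": 90.0, "small_group": 80.0, "quizzes": 70.0, "programming": 85.0}
print(round(final_score(example), 2))  # 83.5
```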
Each programming assignment will outline the specific requirements and steps to be taken to complete the assignment with associated weights. In terms of assigning a final letter grade, the following is provided as the default scheme. If deemed appropriate, the instructor may adjust these grades downward, for example, to achieve a target of 20% A’s.
For purposes of project management and version control, code repositories such as GitHub and GitLab have demonstrated tremendous benefit. Because of this, students are strongly encouraged to use such repositories for their projects. Be that as it may, there is evidence that such code repositories can also be abused with respect to making previous assignments available to the public, thus fostering academic misconduct. For these reasons the following policies with respect to code repositories are put in place.
1. Should a student decide to use a code repository to manage their programming projects, those repositories must be private and must remain private beyond the end of the course.
2. Re-emphasizing the above, under no circumstances may code from this course be made public, either through a public repository or through posting to another site that collects assignments or code.
3. Under no circumstances may the assignments themselves be posted to a public repository or public site.
4. Should a student discover the assignments have been posted somewhere on the web, the student is asked to report the site to the instructor as soon as possible.
5. Should a student discover the assignments in a public place, under no circumstances shall the student view, download, or otherwise use the information contained on that site.
Questions about these policies should be directed to the instructor.
Deadlines for Adding, Dropping and Withdrawing from Courses
Students may add a course up to one week after the start of the term for that particular course. Students may drop courses according to the drop deadlines outlined in the EP academic calendar (https://ep.jhu.edu/student-services/academic-calendar/). Between the 6th week of the class and prior to the final withdrawal deadline, a student may withdraw from a course with a W on their academic record. A record of the course will remain on the academic record with a W appearing in the grade column to indicate that the student registered and withdrew from the course.
Academic Misconduct Policy
All students are required to read, know, and comply with the Johns Hopkins University Krieger School of Arts and Sciences (KSAS) / Whiting School of Engineering (WSE) Procedures for Handling Allegations of Misconduct by Full-Time and Part-Time Graduate Students.
This policy prohibits academic misconduct, including but not limited to the following: cheating or facilitating cheating; plagiarism; reuse of assignments; unauthorized collaboration; alteration of graded assignments; and unfair competition. Course materials (old assignments, texts, or examinations, etc.) should not be shared unless authorized by the course instructor. Any questions related to this policy should be directed to EP’s academic integrity officer at email@example.com.
Students with Disabilities - Accommodations and Accessibility
Johns Hopkins University values diversity and inclusion. We are committed to providing welcoming, equitable, and accessible educational experiences for all students. Students with disabilities (including those with psychological conditions, medical conditions and temporary disabilities) can request accommodations for this course by providing an Accommodation Letter issued by Student Disability Services (SDS). Please request accommodations for this course as early as possible to provide time for effective communication and arrangements.
For further information or to start the process of requesting accommodations, please contact Student Disability Services at Engineering for Professionals, firstname.lastname@example.org.
Student Conduct Code
The fundamental purpose of the JHU regulation of student conduct is to promote and to protect the health, safety, welfare, property, and rights of all members of the University community as well as to promote the orderly operation of the University and to safeguard its property and facilities. As members of the University community, students accept certain responsibilities which support the educational mission and create an environment in which all students are afforded the same opportunity to succeed academically.
For a full description of the code please visit the following website: https://studentaffairs.jhu.edu/policies-guidelines/student-code/
JHU is committed to creating a classroom environment that values the diversity of experiences and perspectives that all students bring. Everyone has the right to be treated with dignity and respect. Fostering an inclusive climate is important. Research and experience show that students who interact with peers who are different from themselves learn new things and experience tangible educational outcomes. At no time in this learning process should someone be singled out or treated unequally on the basis of any seen or unseen part of their identity.
If you have concerns in this course about harassment, discrimination, or any unequal treatment, or if you seek accommodations or resources, please reach out to the course instructor directly. Reporting will never impact your course grade. You may also share concerns with your program chair, the Assistant Dean for Diversity and Inclusion, or the Office of Institutional Equity. In handling reports, people will protect your privacy as much as possible, but faculty and staff are required to officially report information for some cases (e.g., sexual harassment).
When a student enrolls in an EP course with “audit” status, the student must reach an understanding with the instructor as to what is required to earn the “audit.” If the student does not meet those expectations, the instructor must notify the EP Registration Team [EP-Registration@exchange.johnshopkins.edu] in order for the student to be retroactively dropped or withdrawn from the course (depending on when the "audit" was requested and in accordance with EP registration deadlines). All lecture content will remain accessible to auditing students, but access to all other course material is left to the discretion of the instructor.