In recent times, Large Language Models (LLMs) have earned the attention of the world. OpenAI’s well-known generative LLM, ChatGPT, became the fastest-growing consumer application in history in only two months, and the feverish interest around LLMs continues to grow. Large self-supervised (pre-trained) models (such as LLMs) have transformed various data-driven fields, such as natural language processing (NLP). In this course, students will gain a thorough introduction to self-supervised learning techniques for NLP applications. Through lectures, assignments, and a final project, students will learn the necessary skills to design, implement, and analyze their own self-supervised neural network models using the PyTorch framework.
Although there are no formal prerequisites, students should be comfortable with:
The course materials are divided into modules which can be accessed in Canvas. A module will have several sections including the overview, content, readings, discussions, and assignments. You are encouraged to preview all sections of the module before starting. All modules run for a period of seven (7) days. You should regularly check Canvas for assignment due dates.
In this course, students will gain a thorough introduction to self-supervised learning techniques for NLP applications. Through lectures, assignments, and a final project, students will learn the necessary skills to design, implement, and understand their own self-supervised neural network models using the PyTorch framework. Students will not only develop self-supervised models, they will also gain the expertise necessary to critique other existing implementations through their understanding of modern research trends.
There is no single textbook that we will follow consistently. Rather, sections of the relevant texts will be provided with each new module. All readings will be freely available online.
Optional readings will be provided as needed throughout the semester. All readings will be freely available online.
In terms of software for delivery of the course material, we will be using GitHub Classroom to host assignments, Discord to host discussions, and Canvas to host midterm quizzes.
For the software component of our assignments, a requirements.txt file is included to help set up a software environment. You should use an environment manager, like the Anaconda distribution, to manage these environments.
Each module is expected to take approximately 7–10 hours per week to complete. Here is an approximate breakdown: reading the assigned sections of the texts, as well as some outside reading (approximately 2–3 hours per week); listening to the audio-annotated slide presentations (approximately 2 hours per week); and software assignments (approximately 3–5 hours per week).
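One way to confirm that an active environment matches an assignment's requirements.txt file is to check each listed package from Python. The following is a minimal sketch, not part of the course tooling: the helper names `requirement_names`, `missing_packages`, and `check_file` are hypothetical, and it assumes simple requirement lines such as `name==version` (no URLs or editable installs). It only checks that a package is installed, not that the version matches.

```python
import re
from importlib import metadata


def requirement_names(text):
    """Extract package names from the text of a requirements file."""
    names = []
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        # The package name is the leading run of name characters.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


def check_file(path):
    """Read a requirements file and list packages that are not installed."""
    with open(path, encoding="utf-8") as handle:
        return missing_packages(requirement_names(handle.read()))
```

Calling `check_file("requirements.txt")` from the assignment directory returns an empty list when every listed package is installed in the active environment.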
Be sure to refer to the Course Policies section of the Syllabus.
This course will consist of the following basic student requirements:
Assignments (35% of Final Grade Calculation)
Assignments will include a mix of theoretical and hands-on software exercises. All assignments are due according to the dates in the Calendar. Assignments will be submitted through GitHub Classroom.
For Assignment 1, all submissions will be completed individually. For the remaining homework assignments, you are welcome to work in groups of 2-3 students. All the assignments are designed to be completed individually in their given timeframes, although you may benefit from explaining concepts to others and learning alongside your classmates.
That said, all the work you submit must be your own. Collaboration explicitly does not include copying work from other students. It is acceptable to study together and collaborate on general approaches to homework problems, but it is unacceptable to copy solutions from other students or from online sources. If you decide to work with other students on the homework, list the names of all your collaborators alongside your own. If you’re unsure, ask the instructor.
As this is an NLP course, it is not lost on the instructor that LLMs can assist you in completing your work in profound ways. The habits you adopt in using generative models are yours to choose, but you should strive to work with these technologies responsibly as a study aid, not a fast-track to an answer. For instance, there is incredible benefit to reasoning carefully through the assignments: the learning is not in choosing the final answer but in the process that brings you there. If you use large language models in any part of your work, you must provide citations for the code that was generated by these models.
A single one-week extension is automatically applied to the first assignment that you submit after a deadline. Otherwise, extensions to due dates should be discussed with the instructor on a case-by-case basis.
Assignments are evaluated by the following grading elements:
Assignments are graded as follows:
A Course Project will be assigned in Module 7, which you will complete in groups of 3-4 students. The final week will be devoted to Course Project presentations.
The final project is broken into three deliverables:
Refer to the Course Project Overview for more information.
There are two exams to be completed individually, one in Module 3 and the other in Module 8. Each exam will be released when its module begins and is due at the end of that same module, giving you a 7-day window to complete it.
You will have three hours to complete each exam once you open the file on Canvas. You may use the course resources (slides, recordings, notes, texts) to complete the exams but no other resources.
Assignments are due according to the dates posted on the Canvas site. I will post grades one week after assignment due dates.
EP uses a +/- grading system (see “Grading System”, Graduate Programs catalog, p. 10).
Score Range | Letter Grade |
---|---|
100-97 | A+ |
96-93 | A |
92-90 | A− |
89-87 | B+ |
86-83 | B |
82-80 | B− |
79-77 | C+ |
76-73 | C |
72-70 | C− |
69-67 | D+ |
66-63 | D |
<63 | F |
Final grades will be determined by the following weighting:
Item | % of Grade |
---|---|
Assignments | 35% |
Course Project | 35% |
Exams | 30% (15% + 15%) |
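To see how the weighting and the letter-grade ranges above combine, here is a minimal illustrative sketch. It is not an official grade calculator: the names `WEIGHTS`, `LETTER_FLOORS`, `final_score`, and `letter_grade` are hypothetical, the component scores are made up, and it assumes that each range's lower bound acts as a cutoff with no rounding, which the syllabus does not specify.

```python
# Weights from the table above, as fractions of the final grade.
WEIGHTS = {
    "assignments": 0.35,
    "project": 0.35,
    "exam_1": 0.15,
    "exam_2": 0.15,
}

# Lower bound of each letter-grade range, highest first.
# Assumption: a score at or above the bound earns that letter (no rounding).
LETTER_FLOORS = [
    (97, "A+"), (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"),
]


def final_score(scores):
    """Weighted average of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[item] * scores[item] for item in WEIGHTS)


def letter_grade(score):
    """Map a 0-100 score to a letter using the range lower bounds."""
    for floor, letter in LETTER_FLOORS:
        if score >= floor:
            return letter
    return "F"


# Made-up example scores for illustration only.
example = {"assignments": 92, "project": 88, "exam_1": 85, "exam_2": 90}
total = final_score(example)  # 0.35*92 + 0.35*88 + 0.15*85 + 0.15*90
print(round(total, 2), letter_grade(total))
```

With these example scores the weighted total is 89.25, which falls in the B+ range.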
Canvas will be used to host postings of assignments and other deliverables. Assignments will be submitted through GitHub Classroom. A Discord server will be used for any questions about lecture topics and homework. The purpose of Discord is for the students to have a forum to collaborate with one another. Other questions (i.e., sensitive or private matters) should be emailed directly to the instructor. Do not post comments on Canvas assignments; please send an email instead if you’d like to discuss feedback.
In this class, you are encouraged to work with fellow students on assignments; in fact, for the project, you must work in a group. That said, all the work you submit must be your own. Collaboration explicitly does not include copying work from other students. It is acceptable to study together and collaborate on general approaches to homework problems, but it is unacceptable to copy solutions from other students or from online sources. If you decide to work with other students on the homework, list the names of all your collaborators alongside your own. If you’re unsure, ask the instructor.
As this is an NLP course, we recognize that LLMs can assist you in completing your work in profound ways. The habits you adopt in using generative models are yours to choose, but you should strive to work with these technologies responsibly as a study aid, not a fast-track to an answer. If you use large language models in any of your work, you must provide citations for the code that was generated by these models.
Deadlines for Adding, Dropping and Withdrawing from Courses
Students may add a course up to one week after the start of the term for that particular course. Students may drop courses according to the drop deadlines outlined in the EP academic calendar (https://ep.jhu.edu/student-services/academic-calendar/). Between the 6th week of the class and prior to the final withdrawal deadline, a student may withdraw from a course with a W on their academic record. A record of the course will remain on the academic record with a W appearing in the grade column to indicate that the student registered and withdrew from the course.
Academic Misconduct Policy
All students are required to read, know, and comply with the Johns Hopkins University Krieger School of Arts and Sciences (KSAS) / Whiting School of Engineering (WSE) Procedures for Handling Allegations of Misconduct by Full-Time and Part-Time Graduate Students. This policy prohibits academic misconduct, including but not limited to the following: cheating or facilitating cheating; plagiarism; reuse of assignments; unauthorized collaboration; alteration of graded assignments; and unfair competition. Course materials (old assignments, texts, or examinations, etc.) should not be shared unless authorized by the course instructor. Any questions related to this policy should be directed to EP’s academic integrity officer at ep-academic-integrity@jhu.edu.
Students with Disabilities - Accommodations and Accessibility
Johns Hopkins University is committed to providing welcoming, equitable, and accessible educational experiences for all students. If disability accommodations are needed for this course, students should request accommodations through Student Disability Services (SDS) as early as possible to provide time for effective communication and arrangements. For further information about this process, please refer to the SDS Website.
Student Conduct Code
The fundamental purpose of the JHU regulation of student conduct is to promote and to protect the health, safety, welfare, property, and rights of all members of the University community as well as to promote the orderly operation of the University and to safeguard its property and facilities. As members of the University community, students accept certain responsibilities which support the educational mission and create an environment in which all students are afforded the same opportunity to succeed academically. For a full description of the code please visit the following website: https://studentaffairs.jhu.edu/policies-guidelines/student-code/
Classroom Climate
JHU is committed to creating a classroom environment that values the diversity of experiences and perspectives that all students bring. Everyone has the right to be treated with dignity and respect. Fostering an inclusive climate is important. Research and experience show that students who interact with peers who are different from themselves learn new things and experience tangible educational outcomes. At no time in this learning process should someone be singled out or treated unequally on the basis of any seen or unseen part of their identity. If you have concerns in this course about harassment, discrimination, or any unequal treatment, or if you seek accommodations or resources, please reach out to the course instructor directly. Reporting will never impact your course grade. You may also share concerns with your program chair, the Assistant Dean for Diversity and Inclusion, or the Office of Institutional Equity. In handling reports, people will protect your privacy as much as possible, but faculty and staff are required to officially report information for some cases (e.g. sexual harassment).
Course Auditing
When a student enrolls in an EP course with “audit” status, the student must reach an understanding with the instructor as to what is required to earn the “audit.” If the student does not meet those expectations, the instructor must notify the EP Registration Team [EP-Registration@exchange.johnshopkins.edu] in order for the student to be retroactively dropped or withdrawn from the course (depending on when the "audit" was requested and in accordance with EP registration deadlines). All lecture content will remain accessible to auditing students, but access to all other course material is left to the discretion of the instructor.