705.743.8VL - ChatGPT from Scratch: Building and Training Large Language Models

Artificial Intelligence
Summer 2024

Description

Large language models (LLMs) like ChatGPT have ushered in a new wave of virtual assistants, chatbots, and text generators. Many see them as a paradigm shift in how humans interact with machines. Huge development ecosystems have arisen around LLMs, often abstracting away how they work to make them accessible to more people. While the democratization of this technology is important, LLMs cannot be fully harnessed and improved without understanding their inner workings at a fine level. In this course, students will build a small version of a text generation model like GPT-3 over the course of several weeks. They will learn about the details of the GPT architecture from bottom to top, how the GPT architecture came about, and how it is used today in applications like ChatGPT. Once these fundamentals are established, students will build their own research experiment on top of their home-grown language models. Completing this course will prepare students to build and modify language models for further LLM research or novel applications.

Instructor

Course Structure

The course is 12 weeks long (summer) and divided into two parts:

The first seven weeks of the course will focus on the details of the language model architecture and implementing language models in PyTorch. During this period, weekly modules will be used to focus on one component of language models at a time. Each module will have (1) a programming assignment (implementing the discussed component), and (2) a short answer component which will test your understanding of lecture materials and weekly readings (if any). Programming and short answers will be due the following week (one week after they are assigned).

You can access each module by navigating to Course Modules on the course menu. A module may have several sections, and you are encouraged to preview all sections of the module before starting. You should regularly check the Calendar and Announcements for assignment due dates.

The remaining five weeks of the course will focus on advanced topics in language models, which are used to build applications on top of the core technology discussed during the first part of the course. This portion of the course will be project-based and will begin with students writing a brief project proposal. During these weeks, students will develop a language model variation or research experiment, with both the code and a final write-up due in the final week of the course. During this time, weekly assignments will consist only of the short answer responses, as the majority of time outside of class should be spent working on the final project.

This course is focused on building hands-on experience with programming LLMs and understanding how they work. The assignments will reflect this goal. There is no midterm or final exam.

If the course is later given in fall or spring, two additional weeks will be added during which students present their final projects to the class. The final project would, in that case, include a presentation requirement and grade.

Course Topics

Each week (module), the course will focus on a different topic.

Language Model Implementation:

Advanced Topics in Language Models:


Course Goals

The goals of this course are to understand the inner workings of large language models (LLMs) and gain first-hand experience implementing them, albeit at a small scale; to understand the additional technologies used to turn a pretrained language model into a powerful piece of software like ChatGPT, and how this process could be applied to specific applications of interest; and finally, to explore one of these additional technologies or a specific LLM research question in a final project.

Course Learning Outcomes (CLOs)

Textbooks

There is no required textbook for this course. Readings, when assigned, will be from research publications or web content that are publicly available.

Other Materials & Online Resources

Readings will be made available with each module on Canvas.

Required Software

This course will make use of PyTorch (a deep learning library for Python) for implementing components of language models, and assumes prior familiarity with using PyTorch for deep learning.

PyTorch is available for free from the PyTorch website: https://pytorch.org/get-started/locally/

We will use a few other publicly available Python packages, all of which are free. Any package that is not already standard among general Python users will be introduced during the course.

Specific versions of Python and PyTorch will be announced during the first week of class to make sure everyone is using similar software.

Student Coursework Requirements

It is expected that each module will take approximately 8-10 hours per week to complete. The weekly programming assignments will take 6-8 hours per week, with readings taking an additional 1-2 hours. In the latter half of the course (final projects), expect to spend about 4-6 hours on programming each week and 2-4 hours on your final report.

This course will consist of the following basic student requirements:


Readings and Participation (15% of Final Grade Calculation)

You are responsible for carefully reading any assigned material and responding to short answer questions each week, which will be based on the readings and lecture material. Your responses will be graded for accuracy, critical thinking, and clarity.

Readings and participation are graded as follows:


Programming Assignments (50% of Final Grade Calculation)

The majority of coursework will consist of implementing components of LLMs in Python / PyTorch. Program interfaces (i.e. empty class or method definitions) will be provided and should be used to structure your programs. Assignments will be submitted as .py files and graded on two criteria: clarity (the code should be readable) and functionality (the code should execute correctly).

Clarity (30%): 

Your code should be well-organized and include helpful comments where appropriate. There is no specific formatting standard required or a need to include unit tests, but as this is a 700-level course there is an expectation of quality code. Clarity grading guidelines include:

Functionality (70%):

Submitted programs will be tested against a reference implementation and be expected to produce similar results. Any edge cases or data formatting specifics will be made clear in the assignment instructions. You are encouraged to create your own test cases to validate that your code runs correctly.

Submissions will be expected to run without errors and in a reasonable amount of time*. If your submission fails to run, you will receive zero credit for functionality.

The quantitative score for functionality will be taken directly from the number of test cases that are passed. Specific details will accompany each assignment as to the type and weight of test cases, if not uniform.

* reasonable amount of time: An amount of time such that the practical grading of assignments is not impeded. There is no need to prove a specific theoretical runtime. Runtime will generally not be a significant concern unless your program gets stuck in an infinite loop or is orders of magnitude slower than the reference implementation.
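As a rough illustration of how the clarity and functionality components described above might combine, here is a minimal sketch in plain Python. The function name `assignment_score`, the 0-100 scale for clarity, and the uniform weighting of test cases are all assumptions made for this example; individual assignments may weight test cases differently, as noted above.

```python
def assignment_score(clarity, tests_passed, tests_total, runs=True):
    """Sketch of a programming assignment score on a 0-100 scale.

    clarity      -- clarity score from 0 to 100 (assumed scale)
    tests_passed -- number of test cases passed
    tests_total  -- number of test cases run (assumed uniformly weighted)
    runs         -- False if the submission fails to run at all
    """
    if tests_total <= 0:
        raise ValueError("tests_total must be positive")
    if not 0 <= tests_passed <= tests_total:
        raise ValueError("tests_passed must be between 0 and tests_total")

    # A submission that fails to run receives zero credit for functionality.
    functionality = 100.0 * tests_passed / tests_total if runs else 0.0

    # Clarity counts for 30% and functionality for 70% of the assignment.
    return 0.30 * clarity + 0.70 * functionality


if __name__ == "__main__":
    # Readable code that passes 8 of 10 test cases.
    print(round(assignment_score(clarity=90, tests_passed=8, tests_total=10), 2))
    # The same code, but it crashes on startup: only clarity credit remains.
    print(round(assignment_score(clarity=90, tests_passed=0, tests_total=10, runs=False), 2))
```

Under these assumptions, a submission that does not run can earn at most the 30% clarity share of the assignment.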

Final Project (35% of Final Grade Calculation)

A course project will be assigned several weeks into the course. In lieu of weekly programming assignments, the last few weeks of the course should be spent on the final project. Weekly readings with short answers will continue.

The final project will consist of making a variation to the GPT transformer architecture, training loop, or inference loop, and testing how this impacts the model. You may invent your own variation of interest or implement something from existing literature. You should test your variation through experiments that show how it relates to other hyperparameters: for example, what is the relationship between your design choice (and any parameters therein) and sequence length, training time, or model size? Specific examples will be given with the assignment. If you are re-implementing something from the literature, aim to go beyond simply reproducing its results.

Note: You do not need to improve the existing GPT architecture (which has been optimized by the research community). You are simply tasked with understanding a specific design choice of GPT models and how it influences overall performance. A null outcome will not be penalized if it is reasonably motivated.

You may form groups of up to 3 students if you wish, but this is not required; you may also work individually. Groups may submit together, but will be held to a higher standard than individuals.

The final project will have the following components, which are each graded separately:

  1. Project Proposal (20%): A written proposal including your intended variation of the GPT model, experiments to evaluate it, and a hypothesis as to the impact of the variation. Include all team members’ names. The proposal must show critical thinking as to the variation introduced, why it may be significant, and how it will be evaluated.

The proposal is also an opportunity to get feedback on your ideas before you begin. You will be expected to incorporate proposal feedback into your final submission.

  2. Programming Component (40%): A working implementation of your variation and code to reproduce your experiments, with instructions to run them. This will be graded similarly to other programming assignments (clarity, functionality, ease of use, etc.). Your results should be reproducible by running your provided code. Specific requirements will be given when the course project is assigned.
  3. Final Report (40%): A multi-page report detailing the typical workings of the GPT model, your proposed change and its motivation, your experiments, the results of your experiments, and conclusions that can be drawn from these results. This will follow a typical research paper / lab report structure as you may have seen in previous courses. Specific requirements (sections and content) will be given when the course project is assigned.

Final Project is graded as follows:

Grading Policy

Assignments are due according to the dates posted in your Canvas course site. You may check these due dates in the Course Calendar or the Assignments in the corresponding modules. I will post grades one week after assignment due dates.

We generally do not directly grade spelling and grammar. However, egregious violations of the rules of the English language will be noted without comment. Consistently poor performance in either spelling or grammar is taken as an indication of poor written communication ability that may detract from your grade.

A grade of A indicates achievement of consistent excellence and distinction throughout the course—that is, conspicuous excellence in all aspects of assignments and the final project.

A grade of B indicates work that meets all course requirements on a level appropriate for graduate academic work.

EP uses a +/- grading system (see “Grading System”, Graduate Programs catalog, p. 10).

100-98 = A+
97-94 = A
93-90 = A−
89-87 = B+
86-83 = B
82-80 = B−
79-77 = C+
76-73 = C
72-70 = C−
69-67 = D+
66-63 = D
<63 = F

Final grades will be determined by the following weighting:
- Readings and Participation (15%)
- Programming Assignments (50%)
- Final Project (35%)
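
For illustration only, the weighting and letter scale above can be sketched in a few lines of Python. The function names, the 0-100 scale for each component, and the treatment of fractional scores are assumptions for this example: the scale above lists whole-number bands, so the sketch treats each band's lower bound as a cutoff (for instance, 93.5 maps to A-), which may differ from how grades are actually rounded.

```python
# Component weights for the final grade, as listed in this syllabus.
WEIGHTS = {"readings": 0.15, "programming": 0.50, "project": 0.35}

# Final project sub-weights: proposal, programming component, final report.
PROJECT_WEIGHTS = {"proposal": 0.20, "code": 0.40, "report": 0.40}

# Lower bound of each letter band, highest first. Anything below 63 is an F.
# ASSUMPTION: fractional scores fall into the band whose lower bound they meet.
LETTER_CUTOFFS = [
    (98, "A+"), (94, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"),
]


def project_score(proposal, code, report):
    """Weighted final project score from its three components (each 0-100)."""
    return (PROJECT_WEIGHTS["proposal"] * proposal
            + PROJECT_WEIGHTS["code"] * code
            + PROJECT_WEIGHTS["report"] * report)


def final_score(readings, programming, project):
    """Weighted course score from the three graded areas (each 0-100)."""
    return (WEIGHTS["readings"] * readings
            + WEIGHTS["programming"] * programming
            + WEIGHTS["project"] * project)


def letter_grade(score):
    """Map a numeric course score to a letter using the cutoffs above."""
    for cutoff, letter in LETTER_CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    project = project_score(proposal=100, code=80, report=90)
    total = final_score(readings=90, programming=85, project=project)
    print(round(total, 2), letter_grade(total))
```

Because programming assignments carry half of the total weight, they move the final score more than any other single area.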

Course Policies

External Code Resources

The subject of this course, building a language model in PyTorch, is a popular one. There are many tutorials, blog posts, videos, and code repositories online which will be very similar to the programming assignments in this course. Note that these vary widely in quality. The most important thing to me is that students understand the material, so I will not (and practically could not) prevent students from accessing this content. In fact, I encourage it if it is used in a beneficial way that increases understanding. My guidelines would be:


Generative AI

Do not use generative AI to author anything you turn in. Yes, it would be very ironic to do so for this course. Please do not do this.

You can of course use LLMs to help you understand how LLMs work (e.g. by probing a model and observing the response). We will even be doing this in lectures. However, there is a clear line between observing LLMs and having LLMs do your work for you.

Special Considerations for Due Dates

If you are unable to meet a specific deadline, due to either the demands of life or because you are falling behind in the course, please reach out to me. I am happy to grant extensions where appropriate and point you to additional resources. I will have regular office hours and check my email frequently. However, if extensions or exceptions become a regular practice, we may need to reconsider your enrollment in the course.

Academic Policies

Deadlines for Adding, Dropping and Withdrawing from Courses

Students may add a course up to one week after the start of the term for that particular course. Students may drop courses according to the drop deadlines outlined in the EP academic calendar (https://ep.jhu.edu/student-services/academic-calendar/). Between the 6th week of the class and prior to the final withdrawal deadline, a student may withdraw from a course with a W on their academic record. A record of the course will remain on the academic record with a W appearing in the grade column to indicate that the student registered and withdrew from the course.

Academic Misconduct Policy

All students are required to read, know, and comply with the Johns Hopkins University Krieger School of Arts and Sciences (KSAS) / Whiting School of Engineering (WSE) Procedures for Handling Allegations of Misconduct by Full-Time and Part-Time Graduate Students.

This policy prohibits academic misconduct, including but not limited to the following: cheating or facilitating cheating; plagiarism; reuse of assignments; unauthorized collaboration; alteration of graded assignments; and unfair competition. Course materials (old assignments, texts, or examinations, etc.) should not be shared unless authorized by the course instructor. Any questions related to this policy should be directed to EP’s academic integrity officer at ep-academic-integrity@jhu.edu.

Students with Disabilities - Accommodations and Accessibility

Johns Hopkins University values diversity and inclusion. We are committed to providing welcoming, equitable, and accessible educational experiences for all students. Students with disabilities (including those with psychological conditions, medical conditions and temporary disabilities) can request accommodations for this course by providing an Accommodation Letter issued by Student Disability Services (SDS). Please request accommodations for this course as early as possible to provide time for effective communication and arrangements.

For further information or to start the process of requesting accommodations, please contact Student Disability Services at Engineering for Professionals, ep-disability-svcs@jhu.edu.

Student Conduct Code

The fundamental purpose of the JHU regulation of student conduct is to promote and to protect the health, safety, welfare, property, and rights of all members of the University community as well as to promote the orderly operation of the University and to safeguard its property and facilities. As members of the University community, students accept certain responsibilities which support the educational mission and create an environment in which all students are afforded the same opportunity to succeed academically. 

For a full description of the code please visit the following website: https://studentaffairs.jhu.edu/policies-guidelines/student-code/

Classroom Climate

JHU is committed to creating a classroom environment that values the diversity of experiences and perspectives that all students bring. Everyone has the right to be treated with dignity and respect. Fostering an inclusive climate is important. Research and experience show that students who interact with peers who are different from themselves learn new things and experience tangible educational outcomes. At no time in this learning process should someone be singled out or treated unequally on the basis of any seen or unseen part of their identity. 
 
If you have concerns in this course about harassment, discrimination, or any unequal treatment, or if you seek accommodations or resources, please reach out to the course instructor directly. Reporting will never impact your course grade. You may also share concerns with your program chair, the Assistant Dean for Diversity and Inclusion, or the Office of Institutional Equity. In handling reports, people will protect your privacy as much as possible, but faculty and staff are required to officially report information for some cases (e.g. sexual harassment).

Course Auditing

When a student enrolls in an EP course with “audit” status, the student must reach an understanding with the instructor as to what is required to earn the “audit.” If the student does not meet those expectations, the instructor must notify the EP Registration Team (EP-Registration@exchange.johnshopkins.edu) in order for the student to be retroactively dropped or withdrawn from the course (depending on when the “audit” was requested and in accordance with EP registration deadlines). All lecture content will remain accessible to auditing students, but access to all other course material is left to the discretion of the instructor.