The abridged syllabus you’re looking for has not been entered yet. We’ve provided a similar abridged syllabus as an example. Check back later for that specific abridged syllabus. The complete syllabus will be available in your Canvas course.

705.743.8VL - ChatGPT from Scratch: Building and Training Large Language Models

Artificial Intelligence

Fall 2025

Description

Large language models (LLMs) like ChatGPT have ushered in a new wave of virtual assistants, chatbots, and text generators. Many see them as a paradigm shift in how humans interact with machines. Huge development ecosystems have arisen around LLMs, often abstracting away how they work to make them accessible to more people. While the democratization of this technology is important, LLMs cannot be fully harnessed and improved without a detailed understanding of their inner workings. In this course, students will build a small version of a text generation model like GPT-3 over the course of several weeks. They will learn about the details of the GPT architecture from bottom to top, how the GPT architecture came about, and how it is used today in applications like ChatGPT. Once these fundamentals are established, students will build their own research experiment on top of their home-grown language models. Completing this course will prepare students to build and modify language models for further LLM research or novel applications.

Instructors

Ted Staley

edward.staley@jhuapl.edu

Erhan Guven

eguven2@jhu.edu

Course Structure

The course materials are divided into modules, one for each week of the course. All course materials and assignments will be housed in Canvas and Microsoft Teams. The module content can be accessed by clicking Course Modules on the left menu. A module will have several sections, including the overview, content, readings, discussions, and assignments. You are encouraged to preview all sections of the module before starting. Most modules run for a period of seven (7) days; exceptions are noted in the Course Outline. You should regularly check the Calendar and Announcements for assignment due dates.

Course Topics

Introduction
Tokenization
Embeddings
Multi-Head Attention
Transformers
Training LLMs
Sampling and Inference
Practical Considerations
Scaling Laws
Fine-Tuning and RAG
Parameter-Efficient Tuning
Quantization
Reinforcement Learning from Human Feedback
Multimodality

Course Goals

Implement tokenization, embeddings, and attention mechanisms in building language models.
Construct and train transformer architectures while analyzing scaling behaviors.
Design inference routines using sampling strategies and efficient generation techniques.
Adapt models through fine-tuning, parameter-efficient tuning, and quantization.
Extend models with reinforcement learning from human feedback, retrieval augmentation, and multimodality.

Textbooks

There is no required textbook for this course. Readings, when assigned, will be from research publications or web content that is publicly available.

Student Coursework Requirements

It is expected that each module will take approximately 8–10 hours per week to complete. The weekly programming assignments will take 6–8 hours per week, with readings taking an additional 1–2 hours. In the latter half of the course (final projects), expect to spend about 4–6 hours on programming each week and 2–4 hours on your final report.
This course will consist of the following basic student requirements:
Participation (10% of Final Grade Calculation)
You are responsible for coming to lectures every week. Please let me know if you are unable to attend. I am flexible with this requirement on a case-by-case basis (illness, work conflicts, life), but if a pattern emerges then you will lose points from this portion of your grade, to be determined at the end of the semester.
Readings (10% of Final Grade Calculation)
You are responsible for carefully reading any assigned material and responding to short answer questions each week, which will be based on the readings and lecture material. Your responses will be graded for accuracy, critical thinking, and clarity.
Readings are graded as follows:
● 100–90 = A— Short answer responses include accurate references to lecture or reading material, and show personal insight or thought into the mechanisms discussed, including an understanding of how the material fits into the larger LLM paradigm. Writing is clear and concise.
● 89–80 = B— Short answer responses contain mostly accurate references to lecture or reading material, and show understanding of the mechanisms discussed, perhaps only in isolation. Writing is generally easy to follow.
● 79–70 = C— Short answer responses contain statements of mixed accuracy or few factual references, and may only present a partial understanding of the given material. Writing is cumbersome or only partially organized.
● <70 = F— Short answer responses do not adequately reflect an understanding of the material or the student having spent time with the material. Writing is hard to understand or lacking in structure and content.
Programming Assignments (50% of Final Grade Calculation)
The majority of coursework will consist of implementing components of LLMs in Python / PyTorch. Program interfaces (i.e., empty class or method definitions) will be provided and should be used to structure your programs. Assignments will be submitted as .py files and graded for clarity (code should be readable) and functionality (code should execute correctly).
Clarity (30%):
Your code should be well-organized and include helpful comments where appropriate. No specific formatting standard is required, and unit tests need not be included, but as this is a 700-level course there is an expectation of quality code. Clarity grading guidelines include:
● 100–90 = A— Code is well formatted and easy to understand. Logical variable and method names are used, and common routines are broken into separate classes or methods where appropriate. Subtleties or complex components are clearly explained with comments.
● 89–80 = B— Code is reasonably formatted and relatively easy to decipher. Comments are generally helpful and are included at critical points of potential confusion. Some common routines are separated into their own classes or methods.
● 79–70 = C— Code takes some effort to understand and comments are either infrequent or not helpful. Code routines are somewhat bloated or not well organized. Variable naming is poor.
● <70 = F— Code is very hard to understand and takes considerable effort to trace through. Code organization is non-existent or counterproductive. Comments are either not present or not helpful.
Functionality (70%):
Submitted programs will be tested against a reference implementation and be expected to produce similar results. Any edge cases or data formatting specifics will be made clear in the assignment instructions. You are encouraged to create your own test cases to validate that your code runs correctly.
Submissions will be expected to run without errors and in a reasonable amount of time*. If your submission fails to run you will receive zero credit for functionality.
The quantitative score for functionality will be taken directly from the number of test cases that are passed. Specific details will accompany each assignment as to the type and weight of test cases, if not uniform.
* reasonable amount of time: An amount of time such that the practical grading of assignments is not impeded. There is no need to prove a specific theoretical runtime. Runtime will generally not be a significant concern unless your program gets stuck in an infinite loop or is orders of magnitude slower than the reference implementation.
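As a rough illustration of how the clarity and functionality weights above might combine into one assignment score, the sketch below assumes uniformly weighted test cases and a clarity score on a 0 to 100 scale. The function name assignment_score and its parameters are hypothetical and are not part of any provided course interface; individual assignments may weight test cases differently.

```python
def assignment_score(clarity: float, tests_passed: int, tests_total: int,
                     runs: bool = True) -> float:
    """Combine clarity (30%) and functionality (70%) into a 0-100 score.

    Assumption: every test case carries the same weight.
    """
    if tests_total <= 0:
        raise ValueError("tests_total must be positive")
    if not 0 <= tests_passed <= tests_total:
        raise ValueError("tests_passed must be between 0 and tests_total")

    # A submission that fails to run earns zero credit for functionality.
    functionality = 100.0 * tests_passed / tests_total if runs else 0.0

    # Weights are written as whole percentages to keep the arithmetic simple.
    return (30 * clarity + 70 * functionality) / 100


# Example: clarity graded at 90, with 8 of 10 test cases passed.
print(assignment_score(clarity=90, tests_passed=8, tests_total=10))  # 83.0
```

Under these assumptions, a submission that does not run keeps only its clarity portion, so a perfectly readable but non-running program would score 30.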
Final Project (30% of Final Grade Calculation)
A course project will be assigned several weeks into the course. In lieu of weekly programming assignments, the last few weeks of the course should be spent on the final project. Weekly readings with short answers will continue.
The final project will consist of making a variation to the GPT transformer architecture, training loop, or inference loop, and testing how this impacts the model. You may invent your own variation of interest or implement something from existing literature. You should test your variation through experiments that show how it interacts with other hyperparameters (e.g., what is the relationship between your design choice, including any parameters it introduces, and sequence length, training time, or model size?). Specific examples will be given with the assignment. If you are re-implementing something from the literature, aim to go beyond simply reproducing the published results.
Note: You do not need to improve the existing GPT architecture (which has been optimized by the research community). You are simply tasked with understanding a specific design choice of GPT models and how it influences overall performance. A null outcome will not be penalized if it is reasonably motivated.
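One way to organize such experiments is to enumerate a small grid of runs in which the baseline and the variation are trained under matching hyperparameters. The sketch below is only an illustration of that idea: the setting names (use_variation, seq_len, d_model) and all values are hypothetical, and the actual assignment may prescribe a different setup.

```python
from itertools import product

# Hypothetical values, chosen only for illustration.
VARIATION_SETTINGS = [False, True]   # baseline vs. your design choice
SEQUENCE_LENGTHS = [64, 128, 256]
MODEL_WIDTHS = [128, 256]


def build_runs() -> list[dict]:
    """Return one config per combination, so every variation run has a
    baseline run with the same sequence length and model width."""
    return [
        {"use_variation": variation, "seq_len": seq_len, "d_model": d_model}
        for variation, seq_len, d_model in product(
            VARIATION_SETTINGS, SEQUENCE_LENGTHS, MODEL_WIDTHS
        )
    ]


runs = build_runs()
print(len(runs))  # 12 (2 settings x 3 sequence lengths x 2 widths)
```

Pairing each variation run with a matching baseline makes it easier to tell whether an observed difference comes from the design choice or from the other hyperparameters, which also helps motivate a null outcome.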
You may form groups of up to three students, but this is not required; individual projects are welcome. Groups can submit together, but will be held to a higher standard than individuals.
The final project will have the following components, which are each graded separately:
1. Project Proposal (10%): A written proposal including your intended variation of the GPT model, experiments to evaluate it, and a hypothesis as to the impact of the variation. Include all team members’ names. The proposal must show critical

Grading Policy

EP uses a +/- grading system (see “Grading System”, Graduate Programs catalog, p. 10).

Score Range	Letter Grade
100-98	= A+
97-94	= A
93-90	= A−
89-87	= B+
86-83	= B
82-80	= B−
79-77	= C+
76-73	= C
72-70	= C−
69-67	= D+
66-63	= D
<63	= F
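To make the arithmetic concrete, the sketch below combines the component weights stated under Student Coursework Requirements (participation 10%, readings 10%, programming assignments 50%, final project 30%) with the score ranges in the table above. The function names are hypothetical. The table lists whole-number ranges only, so treating each lower bound as a cutoff for fractional scores, with no rounding, is an assumption rather than stated policy.

```python
# Component weights from the syllabus, as percent of the final grade.
WEIGHTS = {"participation": 10, "readings": 10,
           "programming": 50, "final_project": 30}

# Lower bound of each letter-grade band, highest first.
CUTOFFS = [(98, "A+"), (94, "A"), (90, "A-"), (87, "B+"), (83, "B"),
           (80, "B-"), (77, "C+"), (73, "C"), (70, "C-"), (67, "D+"),
           (63, "D")]


def final_score(scores: dict[str, float]) -> float:
    """Weighted average of the four component scores (each 0-100)."""
    if set(scores) != set(WEIGHTS):
        raise ValueError(f"expected exactly these components: {sorted(WEIGHTS)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(score: float) -> str:
    """Map a 0-100 score to a letter grade.

    Assumption: a fractional score falls in the band whose lower bound
    it meets or exceeds; no rounding is applied.
    """
    for lower_bound, letter in CUTOFFS:
        if score >= lower_bound:
            return letter
    return "F"


example = {"participation": 100, "readings": 90,
           "programming": 85, "final_project": 92}
total = final_score(example)   # (1000 + 900 + 4250 + 2760) / 100
print(letter_grade(total))     # B+
```

In this example the weighted total is about 89.1, which falls in the 89-87 band.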

Academic Policies

Deadlines for Adding, Dropping and Withdrawing from Courses

Students may add a course up to one week after the start of the term for that particular course. Students may drop courses according to the drop deadlines outlined in the EP academic calendar (https://ep.jhu.edu/student-services/academic-calendar/). Between the 6th week of the class and the final withdrawal deadline, a student may withdraw from a course with a W on their academic record. A record of the course will remain on the academic record with a W appearing in the grade column to indicate that the student registered and withdrew from the course.

Academic Misconduct Policy

All students are required to read, know, and comply with the Johns Hopkins University Krieger School of Arts and Sciences (KSAS) / Whiting School of Engineering (WSE) Procedures for Handling Allegations of Misconduct by Full-Time and Part-Time Graduate Students. This policy prohibits academic misconduct, including but not limited to the following: cheating or facilitating cheating; plagiarism; reuse of assignments; unauthorized collaboration; alteration of graded assignments; and unfair competition. Course materials (old assignments, texts, or examinations, etc.) should not be shared unless authorized by the course instructor. Any questions related to this policy should be directed to EP’s academic integrity officer at ep-academic-integrity@jhu.edu.

Students with Disabilities - Accommodations and Accessibility

Johns Hopkins University is committed to providing welcoming, equitable, and accessible educational experiences for all students. If disability accommodations are needed for this course, students should request accommodations through Student Disability Services (SDS) as early as possible to provide time for effective communication and arrangements. For further information about this process, please refer to the SDS Website.

Student Conduct Code

The fundamental purpose of the JHU regulation of student conduct is to promote and to protect the health, safety, welfare, property, and rights of all members of the University community as well as to promote the orderly operation of the University and to safeguard its property and facilities. As members of the University community, students accept certain responsibilities which support the educational mission and create an environment in which all students are afforded the same opportunity to succeed academically. For a full description of the code please visit the following website: https://studentaffairs.jhu.edu/policies-guidelines/student-code/

Classroom Climate

JHU is committed to creating a classroom environment that values the diversity of experiences and perspectives that all students bring. Everyone has the right to be treated with dignity and respect. Fostering an inclusive climate is important. Research and experience show that students who interact with peers who are different from themselves learn new things and experience tangible educational outcomes. At no time in this learning process should someone be singled out or treated unequally on the basis of any seen or unseen part of their identity. If you have concerns in this course about harassment, discrimination, or any unequal treatment, or if you seek accommodations or resources, please reach out to the course instructor directly. Reporting will never impact your course grade. You may also share concerns with your program chair, the Assistant Dean for Diversity and Inclusion, or the Office of Institutional Equity. In handling reports, people will protect your privacy as much as possible, but faculty and staff are required to officially report information for some cases (e.g. sexual harassment).

Course Auditing

When a student enrolls in an EP course with “audit” status, the student must reach an understanding with the instructor as to what is required to earn the “audit.” If the student does not meet those expectations, the instructor must notify the EP Registration Team [EP-Registration@exchange.johnshopkins.edu] in order for the student to be retroactively dropped or withdrawn from the course (depending on when the "audit" was requested and in accordance with EP registration deadlines). All lecture content will remain accessible to auditing students, but access to all other course material is left to the discretion of the instructor.