705.651.8VL - Large Language Models: Theory and Practice

Artificial Intelligence
Fall 2023

Description

An apparently new breed of neural network -- the large language model (LLM) -- figures increasingly in today's news: ChatGPT and Microsoft's new chatbot-like Bing Chat interface seem to garner headlines daily. This course constitutes a thorough introduction to this technology, tracing the historical threads in computational linguistics and language modeling that led to it, and exploring the design patterns that underpin its application in modern AI systems. In between, students will learn about language modeling, the attention mechanism, prompt and instruction tuning, composability, quantization, low-rank adaptation, and the wealth of software and hardware optimizations that enable LLMs to be used at scale and with acceptable latencies.

Instructor

Samuel Barham

Course Structure

The course materials are divided into modules which can be accessed by clicking Course Modules on the course menu. A module will have several sections including the overview, content, readings, and discussions. You are encouraged to preview all sections of the module before starting. Most modules run for a period of seven (7) days; exceptions are noted in the Course Outline. You should regularly check the Calendar and Announcements for assignment due dates.

This course will largely be conducted as a seminar. This is done mostly out of a desire to ensure the curriculum remains flexible in the face of a constantly changing field – unlike other fields, such as Ancient Roman archaeology, which may remain mostly stable for years until a new breakthrough discovery is made, the field of generative AI (and LLMs in particular) is liable to tectonic shifts on a monthly – or even weekly – basis.

After the first three weeks, which will consist of introductory lectures by the instructor, students will begin to present papers to the class for discussion. At the start of each lesson, the instructor will introduce the topic at hand; after this introduction, one by one, students will present the papers they were assigned and, after this brief presentation, will help lead the class in a thorough discussion of the paper. It is expected that three students will present at each lesson. Presenters will be graded on their understanding of the material, their thoroughness, the quality of their presentation, and their ability to lead the class in an interesting and informative discussion; but the rest of the class will also be graded on the quality of their active participation in that discussion.

 

Towards the beginning of the semester, students will also form into groups for the purpose of completing a semester-long project. This project is (intentionally) left quite open-ended – its purpose, to put it crudely, is "to build something cool using large language models." The intent is to have students engage hands-on with the technology under discussion – to gain practical experience applying it to real-world domains, as well as to become familiar with the technology stacks that are growing up around it (tuning libraries, fast inference frameworks, prompt hubs, prompt chaining toolkits, front-end libraries, etc.). Students will check in regularly with the instructor throughout the semester; milestones are marked clearly on the schedule below. The project will culminate in a 20-minute end-of-semester presentation, given as a group. The structure and content of the presentation are entirely at the discretion of the students involved, but the presentation must be (1) clear, (2) engaging, and (3) compelling. Example content may include background information, related work, introductions to the technologies and software packages involved, discussions of the software architecture employed, demonstrations of the developed application, discussions of future work or publishing opportunities, etc.

There will be no final exam, and no regular graded homework assignments beyond the preparation of papers for discussion; there will, however, be recommended (open-ended) exercises designed to acquaint students with LLM-based technologies. The goal of this course is, after all, not to pass on a static and well-defined body of knowledge – for such does not yet exist in the burgeoning field of generative AI – but rather to equip students with the tools to grapple with, learn, and explore the field on their own, and with their own peculiar purposes in mind.

Course Topics


Course Goals

To acquaint students with the broad history of NLP and language representation prior to the deep learning revolution; to familiarize students with the transformer architecture – which underpins modern large language models – and the ways in which it builds upon earlier attention mechanisms, while employing these in an entirely novel and paradigm-shifting way. To engage students in thinking about the consequences of scale, both in model parameter count and in training dataset size, upon a model’s capabilities. To steep students in the modern paradigm of composable LLM development – from prompt chaining, to ReAct-style agentive systems – giving them practical experience with the software libraries used to build such applications.

Course Learning Outcomes (CLOs)

Textbooks

This is a seminar course, so there are no required texts; readings will be drawn directly from the literature and discussed on a weekly basis. Since the literature represents a living and dynamic canon, and the field is changing at such a rapid pace that even experts struggle at times to keep up, scheduled readings – fixed only loosely in advance – are subject to change at any moment up until the preceding week; that is, the readings pertaining to a particular Module will be considered fixed at the end of the preceding Module’s discussion.

Other Materials & Online Resources

Textbooks on Syntax and Semantics:

 

Textbooks on Pragmatics:

 

Textbooks on Traditional Computational Linguistics and NLP:

 

Textbooks on Deep Learning:

 

More Hands-on Books on All of the Above:

 

Texts on GAI/LLMs:

Please understand – this list is by no means exhaustive. Learning is additive. No time spent practicing a discipline – any discipline – is wasted. Treat these books as etudes: they are to be consumed with hands around a pencil, or on a keyboard. “Doing is learning,” to echo the philosophy of John Dewey.

Required Software

Python Development Environment

Just as LISP served as the language of logic-based AI in the past, so Python is the de facto language of modern, stochastic AI. Over the course of the semester, you will be expected to engage seriously with some subset of the suggested (optional) homework assignments; in most cases, it will be most convenient to interface with the relevant software libraries in Python.

In general, we find as software engineers that our jobs are made much easier – are relieved of many otherwise persistent headaches – when conducted primarily in a UNIX-based environment. If at all possible, do your work in UNIX – either on a physical Linux machine, or a Linux VM, or on a Mac.

This being said, your choice of development environment is just that – your choice. Nonetheless, certain best practices obtain. The community has largely settled around:

These are no more than general guidelines, however; while many developers rely daily on these tools, any number of other combinations of open-source tools are not only common, but entirely legitimate and effective. Explore the world of available tools, assess them carefully and intelligently, and select your toolkit wisely.

Student Coursework Requirements

Final grades will be calculated as a 30/30/40 weighting of the following three components:

Paper Presentations/Discussion Leadership (30% of Final Grade Calculation)

As this is a seminar course, most of our learning will come directly from the reading of real and current scholarly papers in the field.

During most weeks, the papers will be presented by students. These students will have thoroughly read one of the designated papers for the upcoming module, and will have prepared a short PowerPoint presentation summarizing the paper’s key points, and potentially elaborating on some of the thornier or more confusing aspects of the paper. They will present these slides, and – once finished – will proceed to lead the class in a close discussion of the paper.

You – as all AI practitioners – may at first struggle to understand a given paper. In cases like these, the first recourse (even for professionals!) is to blog posts and YouTube videos that attempt to explain or contextualize the paper or relevant concept; if you still find the paper difficult to understand, reach out to me, either via e-mail or – preferably – during office hours.

Students will sign up for the paper of their choice using a Google Sheet, to be posted in the Announcements section of the course sometime during the second week of class.

These paper presentations, and leadership of the subsequent class discussion, will constitute a significant portion of your grade in this class. The following rubric will be used to assess student paper presentations:

  1. Understanding and Clarity (10 points)
  2. Elaboration and Critical Analysis (10 points)
  3. Presentation Skills (5 points)
  4. Leading Discussion (10 points)
  5. Overall Impact (5 points)

Total Points: 40

Active and respectful participation in discussions led by peers is an important aspect of this seminar-style course. Engaging with the material and contributing constructively will enhance the overall learning experience for everyone.

NB: Because of the odd number of students in the class, it may be necessary for some students to present a second, additional paper towards the end of the semester; this additional presentation will be entirely voluntary and will not be graded.

Preparation and Participation (30% of Final Grade Calculation)

You are responsible for carefully reading all assigned material and being prepared for discussion. As noted above, the majority of readings will be drawn from the current literature, and the schedule of readings will be added to the calendar by no later than the second week of class. You are welcome to read ahead, of course.

Participation in weekly paper discussions will contribute substantially to your final grade. I will assess this in the following way: at least once per discussion, you must make a substantive contribution to the conversation, in the form of an interesting comment, reflection, or question material to the talk at hand. You will be graded each week, quite simply, on a binary scale as to whether or not you have done so.

A discussion board will also be opened in which students are invited to post their latest interesting LLM-related finds – new papers, technical blogs or blog posts, tutorials, YouTube videos, etc. I would like students to cultivate on this discussion board a lively exchange of ideas and materials; to gently encourage this, you will be graded, each week, on a binary scale as to whether or not you have contributed in some way to the discussion.

Course Project (40% of Final Grade Calculation)

A course project will be assigned several weeks into the course. Students will form into groups of 3 (because enrollment is capped at 19, there will be one group of 4) to complete the project.

Students are asked to imagine, design, and build a composable LLM-based application. The application’s purpose is up to students – modulo the instructor’s approval. The scope of the project is left intentionally vague; its only goal is to give students a chance to work, hands-on, with large language models and the technology stack the community has developed around them.

There are few bad ideas here, so long as the scale of the proposed project is commensurate with the amount of time students are given to work on it – over half a semester.

Project milestones are noted in the course outline, and have also been added to the calendar. I’ll summarize them here:

Projects will be graded according to the following rubric:

  1. Project Concept and Innovation (20 points)
  2. Technical Implementation (30 points)
  3. Progress and Milestones (15 points)
  4. Presentation and Communication (10 points)
  5. Group Collaboration (20 points)
  6. Overall Quality and Completion (5 points)

Total Points: 100

Submission Protocol:

On the day before the final presentation, students will e-mail the instructor (1) a link to the project’s repository, and (2) the finished PowerPoint presentation (or similar visual medium) to be used in the following day’s presentation.

Late Submission Policy: Late submissions will be subject to a deduction of 10% of the project's total points per day, up to a maximum of 3 days. Submissions more than 3 days late will not be accepted except under exceptional circumstances, as determined by the instructor.
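
As a rough illustration of the late policy, the Python sketch below computes the deduction on a 100-point project. The function name `apply_late_penalty` and the constants are hypothetical, and two details are assumptions the policy does not spell out: lateness is counted in whole days, and exceptional circumstances (which are left to the instructor) are not modeled.

```python
from typing import Optional

PROJECT_TOTAL_POINTS = 100    # total from the project rubric
PENALTY_PERCENT_PER_DAY = 10  # 10% of the project's total points per day late
MAX_LATE_DAYS = 3             # later submissions are not accepted


def apply_late_penalty(raw_points: float, days_late: int) -> Optional[float]:
    """Return the score after the late deduction, or None if not accepted.

    Assumes whole days late; instructor-granted exceptions are not modeled.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > MAX_LATE_DAYS:
        return None  # more than 3 days late: not accepted
    # The deduction is a share of the project's TOTAL points, not of the score.
    deduction = PROJECT_TOTAL_POINTS * PENALTY_PERCENT_PER_DAY * days_late / 100
    return max(0.0, raw_points - deduction)
```

Under these assumptions, a project that earns 90 rubric points and arrives 2 days late would receive 70 points, since 20 of the 100 total points are deducted.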

NB: Plagiarism will result in severe academic consequences, as outlined in the university's policies, further below.

Grading Policy

A grade of A indicates achievement of consistent excellence and distinction throughout the course—that is, conspicuous excellence in all aspects of assignments and discussion in every week.

A grade of B indicates work that meets all course requirements on a level appropriate for graduate academic work. These criteria apply to both undergraduates and graduate students taking the course.

EP uses a +/- grading system (see “Grading System”, Graduate Programs catalog, p. 10).

Score Range = Letter Grade
100-97= A+
96-93= A
92-90= A−
89-87= B+
86-83= B
82-80= B−
79-77= C+
76-73= C
72-70= C−
69-67= D+
66-63= D
<63= F

In this course, final grades will be determined by the following weighting:

Item                                           % of Grade
Participation in Weekly Seminar Discussions    30%
Paper Presentations/Discussion Leadership      30%
Course Project                                 40%
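
For students who want to estimate their standing, the weighting can be sketched in Python as follows. This is only an illustration, not the official calculation: the function names are hypothetical, each component is assumed to be already expressed as a percentage (for example, rubric points earned divided by rubric points possible), and because the syllabus does not say how fractional scores are handled, the sketch treats the lower number of each score range as an inclusive lower bound.

```python
# Component weights in percent, from the weighting above (they sum to 100).
WEIGHTS = {
    "participation": 30,
    "presentation": 30,
    "project": 40,
}

# Lower bound of each band in the +/- scale, highest first.
# ASCII "-" is used in the letter names for convenience.
GRADE_BANDS = [
    (97, "A+"), (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"),
]


def final_score(percentages: dict) -> float:
    """Weighted average of component percentages (each on a 0-100 scale)."""
    return sum(WEIGHTS[name] * percentages[name] for name in WEIGHTS) / 100


def letter_grade(score: float) -> str:
    """Map a 0-100 score to a letter (assumption: bounds are inclusive)."""
    for lower_bound, letter in GRADE_BANDS:
        if score >= lower_bound:
            return letter
    return "F"
```

For example, a student with 100% participation, 34 of 40 presentation points (85%), and 90% on the project would have a weighted score of 0.30 × 100 + 0.30 × 85 + 0.40 × 90 = 91.5, which falls in the A- band under these assumptions.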


Course Policies

Attendance and active participation are essential components of this seminar course. However, I understand that circumstances might arise that could prevent you from attending a lecture. If you find yourself needing to miss a lecture, please adhere to the following policy:

1. Advance Notice: Whenever possible, provide advance notice of your absence. You can do this by sending an email to the instructor at least 24 hours before the scheduled lecture. This notice will help the instructor better accommodate your absence and plan accordingly.

2. Catch Up on Material: It is your responsibility to catch up on any missed material. Check the course syllabus and any announcements or resources shared on the course platform to find out what was covered during the lecture you missed. If there are any assigned readings or materials related to the lecture, make sure to review them. 

3. Classmate Collaboration: You are encouraged to reach out to a classmate to get notes or summaries from the lecture you missed. Engaging in peer-to-peer collaboration can provide valuable insights and help you stay up to date with the course content.

4. Participation Points: If the missed lecture included a presentation or discussion led by you, please notify the instructor as soon as possible. Depending on the circumstances, the paper content, and the course structure, there might be an opportunity to reschedule your presentation or fulfill your discussion leadership responsibilities in an alternative manner.

5. Recording and Materials: Lectures will generally be recorded and links to the Zoom recordings posted on the Calendar. Of course, relying solely on recordings is not a substitute for active participation and engagement during the live lectures.

Remember that attendance and participation contribute to your overall learning experience and understanding of the course material. Regular engagement with the course content through lively discussion with your peers will only enhance your understanding of the material.

Please note that repeated or unexplained absences might impact your overall participation grade for the course. If you encounter persistent challenges attending lectures, consider reaching out to the instructor to discuss potential solutions.

Academic Policies

Deadlines for Adding, Dropping and Withdrawing from Courses

Students may add a course up to one week after the start of the term for that particular course. Students may drop courses according to the drop deadlines outlined in the EP academic calendar (https://ep.jhu.edu/student-services/academic-calendar/). Between the 6th week of the class and the final withdrawal deadline, a student may withdraw from a course with a W on their academic record. A record of the course will remain on the academic record with a W appearing in the grade column to indicate that the student registered and withdrew from the course.

Academic Misconduct Policy

All students are required to read, know, and comply with the Johns Hopkins University Krieger School of Arts and Sciences (KSAS) / Whiting School of Engineering (WSE) Procedures for Handling Allegations of Misconduct by Full-Time and Part-Time Graduate Students.

This policy prohibits academic misconduct, including but not limited to the following: cheating or facilitating cheating; plagiarism; reuse of assignments; unauthorized collaboration; alteration of graded assignments; and unfair competition. Course materials (old assignments, texts, or examinations, etc.) should not be shared unless authorized by the course instructor. Any questions related to this policy should be directed to EP’s academic integrity officer at ep-academic-integrity@jhu.edu.

Students with Disabilities - Accommodations and Accessibility

Johns Hopkins University values diversity and inclusion. We are committed to providing welcoming, equitable, and accessible educational experiences for all students. Students with disabilities (including those with psychological conditions, medical conditions and temporary disabilities) can request accommodations for this course by providing an Accommodation Letter issued by Student Disability Services (SDS). Please request accommodations for this course as early as possible to provide time for effective communication and arrangements.

For further information or to start the process of requesting accommodations, please contact Student Disability Services at Engineering for Professionals, ep-disability-svcs@jhu.edu.

Student Conduct Code

The fundamental purpose of the JHU regulation of student conduct is to promote and to protect the health, safety, welfare, property, and rights of all members of the University community as well as to promote the orderly operation of the University and to safeguard its property and facilities. As members of the University community, students accept certain responsibilities which support the educational mission and create an environment in which all students are afforded the same opportunity to succeed academically. 

For a full description of the code please visit the following website: https://studentaffairs.jhu.edu/policies-guidelines/student-code/

Classroom Climate

JHU is committed to creating a classroom environment that values the diversity of experiences and perspectives that all students bring. Everyone has the right to be treated with dignity and respect. Fostering an inclusive climate is important. Research and experience show that students who interact with peers who are different from themselves learn new things and experience tangible educational outcomes. At no time in this learning process should someone be singled out or treated unequally on the basis of any seen or unseen part of their identity. 
 
If you have concerns in this course about harassment, discrimination, or any unequal treatment, or if you seek accommodations or resources, please reach out to the course instructor directly. Reporting will never impact your course grade. You may also share concerns with your program chair, the Assistant Dean for Diversity and Inclusion, or the Office of Institutional Equity. In handling reports, people will protect your privacy as much as possible, but faculty and staff are required to officially report information for some cases (e.g. sexual harassment).

Course Auditing

When a student enrolls in an EP course with “audit” status, the student must reach an understanding with the instructor as to what is required to earn the “audit.” If the student does not meet those expectations, the instructor must notify the EP Registration Team [EP-Registration@exchange.johnshopkins.edu] in order for the student to be retroactively dropped or withdrawn from the course (depending on when the "audit" was requested and in accordance with EP registration deadlines). All lecture content will remain accessible to auditing students, but access to all other course material is left to the discretion of the instructor.