705.744.8VL - Deep Learning Using Transformers

Artificial Intelligence
Fall 2024


Transformer networks are a new trend in deep learning. In the last decade, transformer models have dominated natural language processing (NLP) and have become the conventional model in almost all NLP tasks. However, the development of transformers in computer vision lagged behind, and only in recent years have applications of transformers started to accelerate. This course introduces the attention mechanism and transformer networks and examines the pros and cons of this architecture. It also covers the importance of unsupervised or semi-supervised pre-training for transformer architectures, as well as its impact on the development of large-scale foundation models. This paves the way to introducing transformers in computer vision.

Additionally, the course aims to extend the attention idea into the 2D spatial domain for image datasets; investigate how convolution can be generalized using self-attention within the encoder-decoder meta-architecture; analyze how this generic architecture is almost the same for images as for text, which makes transformers a generic function approximator; and discuss channel and spatial attention and local vs. global attention, among other topics.

Furthermore, we will study different neural architectures designed for several fundamental tasks in computer vision, namely classification, object detection, and semantic and instance segmentation. In particular, the Vision Transformer, Pyramid Vision Transformer, Shifted Window Transformer (Swin), Detection Transformer (DETR), Segmentation Transformer (SETR), and many others will be explored. The course also examines the application of transformers in video understanding, with a focus on action recognition and instance segmentation, and emphasizes recent developments of transformers in large-scale pre-training and multimodal learning, covering self-supervised learning, contrastive learning with masked image modeling, multimodal learning, and vision foundation models.

Expanded Course Description

EN.705.643 or equivalent PyTorch experience. A solid understanding of deep learning basics will be very helpful but is not required.



Vince Zhu


Course Structure

The course materials are divided into modules which can be accessed by clicking Modules on the course menu. A module will have several sections including the overview, content, readings, discussions, and assignments. You are encouraged to preview all sections of the module before starting. Most modules run for a period of seven (7) days, exceptions are noted in the Course Outline. You should regularly check the Calendar and Announcements for assignment due dates.

Course Topics

Course Goals

To examine the details of how transformers work and dive deep into the various designs of transformers for different benchmark tasks in computer vision; to develop theoretical understanding and hands-on experience with PyTorch-based implementations; and to adopt a research perspective that exposes students to state-of-the-art domain solutions using transformers.

Course Learning Outcomes (CLOs)


There is no single textbook to be used for this course. Each module contains self-contained slides, papers, or other reading materials.


Additionally, any of the following texts or other resources that you may have from previous courses may be useful for this course if you find yourself struggling with specific skills:

Required Software


You will need access to a recent version of PyTorch, which is open source and free to download.

There are several different ways to install PyTorch. We briefly provide two installation methods below. Feel free to use other installation alternatives that you may find via different sources.
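Whichever installation route is used, a quick check along the following lines can confirm that PyTorch is importable and whether a GPU is visible. This is only a minimal sketch and not part of the official installation instructions; the function name `describe_pytorch_install` is hypothetical and chosen for illustration.

```python
import importlib.util


def describe_pytorch_install() -> str:
    """Return a one-line summary of the local PyTorch installation, if any."""
    # find_spec lets us test for the package without raising ImportError.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment."

    import torch  # imported lazily so the script still runs without PyTorch

    # torch.cuda.is_available() reports whether a CUDA-capable GPU can be used.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return f"PyTorch {torch.__version__} found; computations would run on '{device}'."


if __name__ == "__main__":
    print(describe_pytorch_install())
```

On Google Colab, the same check can be pasted into a notebook cell; a result of 'cpu' there usually means a GPU runtime has not been selected.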

All implementations (i.e., coding) are expected to be completed using Python and a Google Colab notebook (or Jupyter notebook). For those with access to local machines or remote servers with GPUs, I strongly recommend that students modularize their code and execute it from the command line. Specifically, students are encouraged to develop intermediate experience with the following topics:

Insufficient preparation in some categories may demand an extra time commitment from students. If you either took these courses recently or retain roughly 70% or more of these concepts, you should consider yourself well prepared. Solid PyTorch experience is extremely important.
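As one illustration of the recommendation to modularize code and run it from the command line, the sketch below separates argument parsing, configuration, and the training loop into small functions. It is a hypothetical skeleton rather than a required template: the file name `train.py`, the flags, and the placeholder `train` function are all assumptions made for this example, and a real project would replace the loop body with PyTorch model and data code.

```python
import argparse
from dataclasses import dataclass


@dataclass
class TrainConfig:
    """Hyperparameters collected from the command line (illustrative only)."""
    epochs: int
    batch_size: int
    lr: float


def parse_args(argv=None) -> TrainConfig:
    """Parse command-line flags; argv=None means read from sys.argv."""
    parser = argparse.ArgumentParser(description="Hypothetical training entry point")
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--lr", type=float, default=1e-3)
    args = parser.parse_args(argv)
    return TrainConfig(epochs=args.epochs, batch_size=args.batch_size, lr=args.lr)


def train(config: TrainConfig) -> list:
    """Placeholder loop: records one log line per epoch instead of training."""
    log = []
    for epoch in range(1, config.epochs + 1):
        log.append(
            f"epoch {epoch}/{config.epochs} "
            f"(batch_size={config.batch_size}, lr={config.lr})"
        )
    return log


def main(argv=None) -> None:
    config = parse_args(argv)
    for line in train(config):
        print(line)


if __name__ == "__main__":
    main()
```

Saved as `train.py`, this could be run as `python train.py --epochs 3 --batch-size 64`. Keeping `parse_args` and `train` as separate functions also makes them easy to import and call from a notebook.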

Student Coursework Requirements

It is expected that each module will take approximately 7–10 hours per week to complete. Here is an approximate breakdown: reading the assigned sections of the texts as well as some outside reading (approximately 3–4 hours per week), listening to the audio-annotated slide presentations (approximately 2–3 hours per week), and writing assignments (approximately 2–3 hours per week).

This course will consist of the following basic student requirements:

Assignments (60% of Final Grade Calculation)

Assignments will include a mix of qualitative assignments (e.g., literature reviews, model summaries), quantitative problem sets, and case study updates. Include a cover sheet with your name and assignment identifier. Also include your name and a page number indicator (i.e., page x of y) on each page of your submissions. Each problem should have the problem statement, assumptions, computations, and conclusions/discussion delineated. All figures and tables should be captioned and labeled appropriately.

All assignments are due according to the dates in the Calendar.

The grade for a late submission will be reduced by one letter grade for each week it is late (no exceptions without prior coordination with the instructors).

If, after submitting a written assignment, you are not satisfied with the grade received, you are encouraged to redo the assignment and resubmit it. If the resubmission results in a better grade, that grade will be substituted for the previous grade.

Qualitative assignments are evaluated by the following grading elements:

  1. Each part of the question is answered (20%)
  2. Writing quality and technical accuracy (30%) (Writing is expected to meet or exceed accepted graduate-level English and scholarship standards. That is, all assignments will be graded on grammar and style as well as content.)
  3. Rationale for the answer is provided (20%)
  4. Examples are included to illustrate the rationale (15%) (If you do not have direct experience related to a particular question, you are to provide analogies instead of examples.)
  5. Outside references are included (15%)

Qualitative assignments are graded as follows:

Quantitative assignments are evaluated by the following grading elements:

  1. Each part of the question is answered (20%)
  2. Assumptions are clearly stated (20%)
  3. Intermediate derivations and calculations are provided (25%)
  4. The answer is technically correct and is clearly indicated (25%)
  5. Answer precision and units are appropriate (10%)

Quantitative assignments are graded as follows:

Course Project (40% of Final Grade Calculation)

A course project will be assigned several weeks into the course. The next-to-the-last week will be devoted to the course project.

The course project is evaluated by the following grading elements:

  1. Student preparation and participation (as described in the Course Project Description) (40%)
  2. Student technical understanding of the course project topic (as related to the individual role that the student assumes, as described in the Course Project Description) (20%)
  3. Team preparation and participation (as described in the Course Project Description) (20%)
  4. Team technical understanding of the course project topic (as related to the Customer Team and Seller Team roles assumed by the students, as described in the Course Project Description) (20%)

The course project is graded as follows:

Grading Policy

Assignments are due according to the dates posted on your Canvas course site. You may check these due dates in the Course Calendar or the Assignments in the corresponding modules. I will post grades one or two weeks after assignment due dates.

We generally do not directly grade spelling and grammar. However, egregious violations of the rules of the English language will be noted without comment. Consistently poor performance in either spelling or grammar is taken as an indication of poor written communication ability that may detract from your grade.

A grade of A indicates achievement of consistent excellence and distinction throughout the course—that is, conspicuous excellence in all aspects of assignments and discussion every week.

A grade of B indicates work that meets all course requirements at a level appropriate for graduate academic work. These criteria apply to both undergraduate and graduate students taking the course.

100-90 = A
89-80 = B
79-70 = C
<70 = F

Final grades will be determined by the following weighting. The grade cut-offs provided above are simply guidelines. I reserve the right to curve the final letter grades based upon the actual class performance.


Item = % of Grade
Assignments = 60%
Course Project = 40%


EP uses a +/- grading system (see “Grading System”, Graduate Programs catalog, p. 10).

Score Range = Letter Grade
100-97 = A+
96-93 = A
92-90 = A−
89-87 = B+
86-83 = B
82-80 = B−
79-77 = C+
76-73 = C
72-70 = C−
69-67 = D+
66-63 = D
<63 = F
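
The weighting and the +/- scale above can be combined into a short worked calculation. The sketch below is illustrative only: it assumes each band's lower bound is inclusive, applies the bands directly to fractional scores, and ignores any curve the instructor may apply. The function names `weighted_score` and `letter_grade` are hypothetical.

```python
# Weights stated in this syllabus: assignments 60%, course project 40%.
WEIGHTS = {"assignments": 0.60, "project": 0.40}

# Lower bound of each band in the +/- scale, highest first.
# Assumption: a score at or above a bound earns that letter.
LETTER_BANDS = [
    (97, "A+"), (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"),
]


def weighted_score(assignments: float, project: float) -> float:
    """Combine component scores (each on a 0-100 scale) using the weights."""
    return WEIGHTS["assignments"] * assignments + WEIGHTS["project"] * project


def letter_grade(score: float) -> str:
    """Map a 0-100 score to a letter; anything below 63 is an F."""
    for lower_bound, letter in LETTER_BANDS:
        if score >= lower_bound:
            return letter
    return "F"


if __name__ == "__main__":
    # Example: 92 on assignments and 85 on the project.
    # 0.60 * 92 + 0.40 * 85 = 55.2 + 34.0 = 89.2
    score = weighted_score(assignments=92, project=85)
    print(f"{score:.1f} -> {letter_grade(score)}")  # prints "89.2 -> B+"
```

Because the published bands are given in whole numbers, how a score such as 89.6 would be rounded is not specified here; the sketch simply compares the unrounded score against each lower bound.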

Academic Policies

Deadlines for Adding, Dropping and Withdrawing from Courses

Students may add a course up to one week after the start of the term for that particular course. Students may drop courses according to the drop deadlines outlined in the EP academic calendar (https://ep.jhu.edu/student-services/academic-calendar/). Between the 6th week of the class and the final withdrawal deadline, a student may withdraw from a course with a W on their academic record. A record of the course will remain on the academic record with a W appearing in the grade column to indicate that the student registered and withdrew from the course.

Academic Misconduct Policy

All students are required to read, know, and comply with the Johns Hopkins University Krieger School of Arts and Sciences (KSAS) / Whiting School of Engineering (WSE) Procedures for Handling Allegations of Misconduct by Full-Time and Part-Time Graduate Students.

This policy prohibits academic misconduct, including but not limited to the following: cheating or facilitating cheating; plagiarism; reuse of assignments; unauthorized collaboration; alteration of graded assignments; and unfair competition. Course materials (old assignments, texts, or examinations, etc.) should not be shared unless authorized by the course instructor. Any questions related to this policy should be directed to EP’s academic integrity officer at ep-academic-integrity@jhu.edu.

Students with Disabilities - Accommodations and Accessibility

Johns Hopkins University values diversity and inclusion. We are committed to providing welcoming, equitable, and accessible educational experiences for all students. Students with disabilities (including those with psychological conditions, medical conditions and temporary disabilities) can request accommodations for this course by providing an Accommodation Letter issued by Student Disability Services (SDS). Please request accommodations for this course as early as possible to provide time for effective communication and arrangements.

For further information or to start the process of requesting accommodations, please contact Student Disability Services at Engineering for Professionals, ep-disability-svcs@jhu.edu.

Student Conduct Code

The fundamental purpose of the JHU regulation of student conduct is to promote and to protect the health, safety, welfare, property, and rights of all members of the University community as well as to promote the orderly operation of the University and to safeguard its property and facilities. As members of the University community, students accept certain responsibilities which support the educational mission and create an environment in which all students are afforded the same opportunity to succeed academically. 

For a full description of the code please visit the following website: https://studentaffairs.jhu.edu/policies-guidelines/student-code/

Classroom Climate

JHU is committed to creating a classroom environment that values the diversity of experiences and perspectives that all students bring. Everyone has the right to be treated with dignity and respect. Fostering an inclusive climate is important. Research and experience show that students who interact with peers who are different from themselves learn new things and experience tangible educational outcomes. At no time in this learning process should someone be singled out or treated unequally on the basis of any seen or unseen part of their identity.

If you have concerns in this course about harassment, discrimination, or any unequal treatment, or if you seek accommodations or resources, please reach out to the course instructor directly. Reporting will never impact your course grade. You may also share concerns with your program chair, the Assistant Dean for Diversity and Inclusion, or the Office of Institutional Equity. In handling reports, people will protect your privacy as much as possible, but faculty and staff are required to officially report information for some cases (e.g. sexual harassment).

Course Auditing

When a student enrolls in an EP course with “audit” status, the student must reach an understanding with the instructor as to what is required to earn the “audit.” If the student does not meet those expectations, the instructor must notify the EP Registration Team [EP-Registration@exchange.johnshopkins.edu] in order for the student to be retroactively dropped or withdrawn from the course (depending on when the "audit" was requested and in accordance with EP registration deadlines). All lecture content will remain accessible to auditing students, but access to all other course material is left to the discretion of the instructor.