535.741.81 - Optimal Control and Reinforcement Learning

Mechanical Engineering
Spring 2024

Description

This course will explore advanced topics in nonlinear systems and optimal control theory, culminating in a foundational understanding of the mathematical principles behind reinforcement learning techniques popularized in the current literature on artificial intelligence, machine learning, and the design of intelligent agents like AlphaGo and AlphaStar. Students will first learn how to simulate and analyze deterministic and stochastic nonlinear systems using well-known simulation tools and techniques such as Simulink and standalone C++ Monte Carlo methods. Students will then be introduced to the foundations of optimization and optimal control theory for both continuous- and discrete-time systems. Closed-form solutions and numerical techniques like collocation methods will be explored so that students have a firm grasp of how to formulate and solve deterministic optimal control problems of varying complexity. Discrete-time systems and dynamic programming methods will be used to introduce students to the challenges of stochastic optimal control and the curse of dimensionality. Supervised learning and maximum likelihood estimation techniques will be used to introduce students to the basic principles of machine learning, neural networks, and back-propagation training methods. The class will conclude with an introduction to approximation methods for stochastic optimal control, like neural dynamic programming, followed by a rigorous introduction to the field of reinforcement learning and the deep Q-learning techniques used to develop intelligent agents like DeepMind's AlphaGo.

Instructor

Course Structure

The course materials are divided into modules, which can be accessed by clicking Course Modules on the left menu. A module will have several sections, including the overview, content, readings, discussions, and assignments. You are encouraged to preview all sections of the module before starting. Most modules run for a period of seven (7) days; exceptions are noted in the Course Outline. You should regularly check the Calendar and Announcements for assignment due dates.

Course Topics

Course Goals

The goal of this course is to introduce students to new methods in optimal and intelligent control theory and artificial intelligence, and to teach them how to apply these techniques to solve complex deterministic and stochastic nonlinear system problems. Building from first principles in optimization and optimal control theory, students will learn how to develop closed-form and numerical solutions to complex linear and nonlinear optimization problems. Students will be able to leverage their knowledge of optimization to understand and apply machine learning techniques to problems like maximum likelihood estimation. Students will integrate their knowledge of dynamic programming for discrete-time systems with their knowledge of machine learning and neural networks to develop the framework and principles behind advanced techniques like neural dynamic programming, deep Q-learning, and the general field of reinforcement learning.

Course Learning Outcomes (CLOs)

Textbooks

This course spans many different graduate-level topics, so no single book covers the material. The lecture material and handouts are self-contained for the first ten modules. Modules 11-14 will be based, in part, on Richard Sutton and Andrew Barto’s introductory book on reinforcement learning, which is available free online as a PDF. The link to this book and suggestions for additional reference material (not required) are provided below:

Required Software

MATLAB

You will need access to a recent version of MATLAB with Simulink. The MATLAB Total Academic Headcount (TAH) license is now in effect and is provided at no cost to you. Send an email to software@jhu.edu to request your license file/code, and indicate that you need a standalone file/code. You will need to provide your first and last name, as well as your Hopkins email address. You will then receive an email from MathWorks with instructions for creating a MathWorks account. The MATLAB software will be available for download from the MathWorks site.

C++ Compiler and IDE

You will need access to a C++ compiler and an associated IDE. For those who do not have one, MinGW-w64 (mingw-w64.org) is an excellent free C++ compiler. Additionally, the Eclipse IDE (Oxygen for C++) is an excellent free IDE.

Student Coursework Requirements

It is expected that each module will take approximately 7-10 hours per week to complete. Here is an approximate breakdown: listening to the audio-annotated slide presentations (approximately 2 hours per week), reading associated reference material (approximately 1-2 hours per week, as desired), and completing assignments (approximately 3-4 hours per week). Additionally, three projects will be assigned, each to be completed over the course of several weeks (1-2 hours per week). There will be no exams.

You are responsible for carefully reviewing all assigned material and being prepared for class discussions. The majority of readings are from the course text. Additional reading may be assigned to supplement text readings.

This course will consist of the following basic student requirements:

Discussions (5% of Final Grade Calculation)

Post your initial response to the discussion questions for each module week by the evening of day 3 for the first question and day 5 for the second question. Posting a response to the discussion question is part one of your grade for module discussions (i.e., Timeliness).

Part two of your grade for module discussions is your interaction (i.e., responding to classmate postings with thoughtful responses) with at least two classmates (i.e., Critical Thinking). You are expected to respond by day 5 and day 7 for questions 1 and 2, respectively. Just posting your response to a discussion question is not sufficient; we want you to interact with your classmates. Be detailed in your postings and in your responses to your classmates' postings. Feel free to agree or disagree with your classmates. Please ensure that your postings are civil and constructive.

I will monitor module discussions and will respond to some of the discussions as discussions are posted. In some instances, I will summarize the overall discussions and post the summary for the module.

Evaluation of preparation and participation is based on contribution to discussions. Preparation and participation are evaluated by the following grading elements:

  1. Participation and Timeliness (50%) 
  2. Critical Thinking (50%)

Each discussion is worth 0.2% of the final grade (25 discussions x 0.2% = 5% of final grade). Grading for the discussions will follow this rubric:

Assignments (60% of Final Grade Calculation)

There will be 12 Problem Sets assigned over the semester (Modules 1-12). Submissions should include a cover sheet with your name and assignment identifier. Also include your name and a page number indicator (i.e., page x of y) on each page of your submissions. Each problem should have the problem statement, assumptions, computations, and conclusions/discussion delineated. All Figures and Tables should be captioned and labeled appropriately. For assignments involving code, you should submit a copy of the code file that can be executed for analysis of correctness. All assignment submissions should be zipped into a single archive file and submitted. Your file name should be Module#_Name.zip.

All assignments are due according to the dates in the Calendar. An assignment is not considered late as long as it is submitted by 6 am on the following Saturday. Late submissions will be accepted (for a maximum grade of 90%) up to 1 week late (no exceptions without prior coordination with the instructors), as solutions are auto-released 1 week after the due date.

Problem sets are graded based on technical accuracy. Each problem's value is indicated in the assignment. Minor mistakes typically result in 1-2 points off while major technical mistakes in understanding or application will result in a more significant reduction in score. Partial credit is provided assuming a strong attempt at the solution is made.  

Course Projects (35% of Final Grade Calculation)

Three course projects will be assigned over the semester:

  1. Project 1 (5% of Final Grade) 
  2. Project 2 (10% of Final Grade)
  3. Project 3 (20% of Final Grade)

Each course project is evaluated by the following grading elements:

  1. Technical professionalism and presentation (20%)
  2. Thoroughness and demonstrated critical thinking (40%)
  3. Technical accuracy and achievement of desired results (40%)

Course projects are graded as follows:

Grading Policy

Student assignments are due according to the dates identified in each Module. Late homework is accepted up to one week after the due date; anything later requires prior arrangement with the instructor, because homework solutions and grades are posted one week after the assignment’s due date. There will be a total of 12 homework assignments.

Your total score will be based on the following criteria weighting:

Item                                                  % of Grade
Preparation and Participation (Module Discussions)    5
12 Assignments                                        60
Project 1                                             5
Project 2                                             10
Project 3                                             20


At the end of the semester, each student's total score will be calculated. Final letter grades will be assigned based on the Grading System as specified in the Graduate Programs catalog, p. 10.

Score Range    Letter Grade
100-97         A+
96-93          A
92-90          A-
89-87          B+
86-83          B
82-80          B-
79-77          C+
76-73          C
72-70          C-
69-67          D+
66-63          D
<63            F
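
The weighting and letter-grade tables above can be combined into a quick self-check of where a set of scores would land. The Python sketch below is only an illustration, not the official grade calculation: the function names and the example scores are hypothetical, and it assumes that a fractional total is compared against the lower bound of each range without rounding, which the tables do not specify.

```python
# Weights from the criteria weighting table (percent of final grade).
WEIGHTS = {
    "discussions": 5,
    "assignments": 60,
    "project_1": 5,
    "project_2": 10,
    "project_3": 20,
}

# Lower bounds from the letter-grade table, highest first.
# Assumption: fractional totals are compared to these bounds without
# rounding; the syllabus does not say how fractions are handled.
CUTOFFS = [
    (97, "A+"), (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"),
]


def total_score(component_scores):
    """Weighted total (0-100) from per-component scores, each on a 0-100 scale."""
    weighted = sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)
    return weighted / 100.0


def letter_grade(total):
    """Return the first letter whose lower bound the total meets, else F."""
    for lower_bound, letter in CUTOFFS:
        if total >= lower_bound:
            return letter
    return "F"


# Hypothetical component scores, for illustration only.
example = {
    "discussions": 100,
    "assignments": 90,
    "project_1": 80,
    "project_2": 85,
    "project_3": 95,
}
total = total_score(example)
print(f"{total:.1f} -> {letter_grade(total)}")  # prints "90.5 -> A-"
```

Because the assignments carry 60% of the weight, a few points there move the total more than the same change in any single project.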

Academic Policies

Deadlines for Adding, Dropping and Withdrawing from Courses

Students may add a course up to one week after the start of the term for that particular course. Students may drop courses according to the drop deadlines outlined in the EP academic calendar (https://ep.jhu.edu/student-services/academic-calendar/). Between the 6th week of the class and prior to the final withdrawal deadline, a student may withdraw from a course with a W on their academic record. A record of the course will remain on the academic record with a W appearing in the grade column to indicate that the student registered and withdrew from the course.

Academic Misconduct Policy

All students are required to read, know, and comply with the Johns Hopkins University Krieger School of Arts and Sciences (KSAS) / Whiting School of Engineering (WSE) Procedures for Handling Allegations of Misconduct by Full-Time and Part-Time Graduate Students.

This policy prohibits academic misconduct, including but not limited to the following: cheating or facilitating cheating; plagiarism; reuse of assignments; unauthorized collaboration; alteration of graded assignments; and unfair competition. Course materials (old assignments, texts, or examinations, etc.) should not be shared unless authorized by the course instructor. Any questions related to this policy should be directed to EP’s academic integrity officer at ep-academic-integrity@jhu.edu.

Students with Disabilities - Accommodations and Accessibility

Johns Hopkins University values diversity and inclusion. We are committed to providing welcoming, equitable, and accessible educational experiences for all students. Students with disabilities (including those with psychological conditions, medical conditions and temporary disabilities) can request accommodations for this course by providing an Accommodation Letter issued by Student Disability Services (SDS). Please request accommodations for this course as early as possible to provide time for effective communication and arrangements.

For further information or to start the process of requesting accommodations, please contact Student Disability Services at Engineering for Professionals, ep-disability-svcs@jhu.edu.

Student Conduct Code

The fundamental purpose of the JHU regulation of student conduct is to promote and to protect the health, safety, welfare, property, and rights of all members of the University community as well as to promote the orderly operation of the University and to safeguard its property and facilities. As members of the University community, students accept certain responsibilities which support the educational mission and create an environment in which all students are afforded the same opportunity to succeed academically. 

For a full description of the code please visit the following website: https://studentaffairs.jhu.edu/policies-guidelines/student-code/

Classroom Climate

JHU is committed to creating a classroom environment that values the diversity of experiences and perspectives that all students bring. Everyone has the right to be treated with dignity and respect. Fostering an inclusive climate is important. Research and experience show that students who interact with peers who are different from themselves learn new things and experience tangible educational outcomes. At no time in this learning process should someone be singled out or treated unequally on the basis of any seen or unseen part of their identity. 
 
If you have concerns in this course about harassment, discrimination, or any unequal treatment, or if you seek accommodations or resources, please reach out to the course instructor directly. Reporting will never impact your course grade. You may also share concerns with your program chair, the Assistant Dean for Diversity and Inclusion, or the Office of Institutional Equity. In handling reports, people will protect your privacy as much as possible, but faculty and staff are required to officially report information for some cases (e.g. sexual harassment).

Course Auditing

When a student enrolls in an EP course with “audit” status, the student must reach an understanding with the instructor as to what is required to earn the “audit.” If the student does not meet those expectations, the instructor must notify the EP Registration Team [EP-Registration@exchange.johnshopkins.edu] in order for the student to be retroactively dropped or withdrawn from the course (depending on when the "audit" was requested and in accordance with EP registration deadlines). All lecture content will remain accessible to auditing students, but access to all other course material is left to the discretion of the instructor.