605.646.01 - Natural Language Processing

Computer Science
Fall 2023

Description

This course surveys the principal difficulties of working with written language data, the fundamental techniques that are used in processing natural language, and the core applications of NLP technology. Topics covered in the course include language modeling, text classification, labeling sequential data (tagging), parsing, information extraction, question answering, machine translation, and semantics. The dominant paradigm in contemporary NLP uses supervised machine learning to train models based on either probability theory or deep neural networks. Both formalisms will be covered. A practical approach is emphasized in the course, and students will write programs and use open source toolkits to solve a variety of problems. Course prerequisite(s): There are no formal prerequisite courses, although having taken any of EN.605.649 Introduction to Machine Learning, EN.605.744 Information Retrieval, or EN.605.645 Artificial Intelligence is helpful. Course note(s): A working knowledge of Python is assumed. While some of the assigned exercises can be done in any programming language, we will sometimes provide example code in Python, and many of the labs are best solved in Python.

Instructors

James Mayfield

Paul McNamee

mcnamee@jhu.edu

Course Structure

The course materials are divided into modules which can be accessed in Canvas by clicking Modules on the left menu. A module will have several sections including the overview, content, readings, and assignments.  Lectures will be held weekly, and will usually focus on a specific topic in NLP.  There will be a quiz in most lectures covering material from the previous week.  Assignments will be due before lectures, and should be submitted in Canvas.

Course Topics

Note: these topics or their order may change as needed.

Course Goals

The course introduces a broad array of techniques for processing digital text. Lab assignments will give students hands-on experience using these methods and open-source software packages on realistic problems. Supervised machine learning is a cornerstone of contemporary NLP – students will use a variety of tools based on both statistical and neural approaches.

Course Learning Outcomes (CLOs)

Textbooks

We will mainly use readings from Jurafsky and Martin, Speech and Language Processing (3rd edition draft).  For Fall 2023 the book is not available in print, but the chapters we will read are available free online at: https://web.stanford.edu/~jurafsky/slp3/. There will also be supplemental readings and videos.  Note: there is a 2nd edition of the text in print, but it is now substantially dated, and would not be of much help in the course.

Student Coursework Requirements

605.646 is a graduate computer science course, and completing the work for each week will likely take at least 12 hours on average, depending on the material and your background. An approximate breakdown of the main components is: (a) reading the assigned materials (1.5 to 2 hours per week); (b) attending lectures (2.5 hours per week); (c) assigned labs (4 to 7 hours, some labs more); (d) working on the class project (variable).

Grading Policy

 Course grades are based on the following components:

  • (60%) Lab Assignments
  • (20%) Class Project
  • (20%) Quiz Scores
Course grades will be assigned using letter grades with plus/minus modifiers (see below). The project is optional, but submitting one is required to earn a grade of A− or higher: students who do not submit a project will be graded on their other course work and will not be eligible for a grade above B+. A grade of A indicates achievement of consistent excellence and distinction throughout the course, that is, conspicuous excellence in all aspects of assigned work. A grade of B indicates work that meets all course requirements on a level appropriate for graduate academic work.

100-98 = A+

97-94 = A

93-90 = A−

89-87 = B+

86-83 = B

82-80 = B−

79-70 = C

< 70 = F
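
As a rough illustration of how the component weights and the letter scale above combine, here is a minimal Python sketch. It is not an official grade calculator. The function names are hypothetical, and two behaviors are assumptions the syllabus does not specify: fractional scores are compared against each band's lower bound without rounding, and the lab and quiz weights are rescaled to sum to 1 when no project is submitted.

```python
# Component weights from the Grading Policy section.
WEIGHTS = {"labs": 0.60, "project": 0.20, "quizzes": 0.20}

# Lower bound of each letter band, highest first, as listed in the scale.
LETTER_CUTOFFS = [
    (98, "A+"), (94, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (70, "C"),
]


def weighted_score(labs, quizzes, project=None):
    """Combine component percentages (0-100) into one course percentage.

    Assumption: when no project is submitted (project is None), the lab
    and quiz weights are rescaled so that they sum to 1.
    """
    if project is None:
        total = WEIGHTS["labs"] + WEIGHTS["quizzes"]
        return (WEIGHTS["labs"] * labs + WEIGHTS["quizzes"] * quizzes) / total
    return (WEIGHTS["labs"] * labs
            + WEIGHTS["project"] * project
            + WEIGHTS["quizzes"] * quizzes)


def letter_grade(score, project_submitted=True):
    """Map a course percentage to a letter using the scale above.

    Assumption: no rounding is applied before comparing to a cutoff.
    """
    letter = "F"
    for cutoff, candidate in LETTER_CUTOFFS:
        if score >= cutoff:
            letter = candidate
            break
    # Without a project, the grade cannot exceed B+.
    if not project_submitted and letter in ("A+", "A", "A-"):
        letter = "B+"
    return letter


if __name__ == "__main__":
    score = weighted_score(labs=90, quizzes=80, project=100)
    print(round(score, 2), letter_grade(score))
```

For example, a student with a high lab and quiz average who skips the project would call `letter_grade(weighted_score(labs, quizzes), project_submitted=False)`, which caps the result at B+ regardless of the numeric score.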

Course Policies

Submitting Individual Assignments

Your name, the course number, and a title (e.g., "Lab #4") must be present on the first page of each submission. Work for the class, such as Lab Assignments (including source code), must be submitted in Canvas as a single PDF file. However, on some labs we may also ask for separate file submissions to provide test results. We generally do not directly grade spelling and grammar. However, violations of the rules of the English language may be noted without comment. Consistently poor spelling or grammar that detracts from the understandability of the submission may detract from your grade. A PDF file generated from a Jupyter notebook is a reasonable way to submit an assignment written in Python. However, if you do so, be sure to examine the output to ensure that it is readable and does not contain truncated lines.

Policy on Late Work

Lab assignments must be submitted in Canvas. A late assignment will be accepted up to one week late with an automatic 20% deduction. No assignment will be accepted more than one week late; the assignment will be given a grade of zero instead. Generally speaking, it is better to submit something slightly incomplete or imperfect on time than to submit it late. Remember, the lowest grade will be dropped when computing your lab assignment average. In extraordinary circumstances you should contact the instructors. Reasonable accommodation will be made for an extended hospitalization or other serious situations. However, documentation is expected (e.g., a signed note on letterhead with printed contact information of the physician).

In some situations, withdrawing from the course (no permission needed) or taking an incomplete (permission required) are appropriate. You are encouraged to speak with the instructors and/or your academic advisor if you are considering pursuing either course of action.

Additional Comments on Academic Honesty

Discussions among students are an important part of learning and are key to success in a graduate course. It is permissible, and often even desirable, for you to discuss the general nature of course content and assignments with your peers. However, the line between collaboration and cheating needs to be carefully delineated. You should not discuss or reveal solutions to assigned work with others, or share any unpublished source code. When you submit work with your name on it for evaluation, it must represent an original, individual effort by you alone.

This course requires you to write computer programs, and unless explicitly prohibited on an assignment, it is perfectly acceptable to make use of published examples and source code from the literature or public domain, but only if attribution is given. You must provide a citation for source code or other material that you do not write yourself (e.g., URLs to websites, pointers to GitHub repos, Numerical Recipes in C, Stack Overflow). Contact the instructors if you have any questions about this policy.

Academic Policies

Deadlines for Adding, Dropping and Withdrawing from Courses

Students may add a course up to one week after the start of the term for that particular course. Students may drop courses according to the drop deadlines outlined in the EP academic calendar (https://ep.jhu.edu/student-services/academic-calendar/). From the 6th week of the class until the final withdrawal deadline, a student may withdraw from a course. A record of the course will remain on the academic record with a W appearing in the grade column to indicate that the student registered and withdrew from the course.

Academic Misconduct Policy

All students are required to read, know, and comply with the Johns Hopkins University Krieger School of Arts and Sciences (KSAS) / Whiting School of Engineering (WSE) Procedures for Handling Allegations of Misconduct by Full-Time and Part-Time Graduate Students.

This policy prohibits academic misconduct, including but not limited to the following: cheating or facilitating cheating; plagiarism; reuse of assignments; unauthorized collaboration; alteration of graded assignments; and unfair competition. Course materials (old assignments, texts, or examinations, etc.) should not be shared unless authorized by the course instructor. Any questions related to this policy should be directed to EP’s academic integrity officer at ep-academic-integrity@jhu.edu.

Students with Disabilities - Accommodations and Accessibility

Johns Hopkins University values diversity and inclusion. We are committed to providing welcoming, equitable, and accessible educational experiences for all students. Students with disabilities (including those with psychological conditions, medical conditions and temporary disabilities) can request accommodations for this course by providing an Accommodation Letter issued by Student Disability Services (SDS). Please request accommodations for this course as early as possible to provide time for effective communication and arrangements.

For further information or to start the process of requesting accommodations, please contact Student Disability Services at Engineering for Professionals, ep-disability-svcs@jhu.edu.

Student Conduct Code

The fundamental purpose of the JHU regulation of student conduct is to promote and to protect the health, safety, welfare, property, and rights of all members of the University community as well as to promote the orderly operation of the University and to safeguard its property and facilities. As members of the University community, students accept certain responsibilities which support the educational mission and create an environment in which all students are afforded the same opportunity to succeed academically. 

For a full description of the code please visit the following website: https://studentaffairs.jhu.edu/policies-guidelines/student-code/

Classroom Climate

JHU is committed to creating a classroom environment that values the diversity of experiences and perspectives that all students bring. Everyone has the right to be treated with dignity and respect. Fostering an inclusive climate is important. Research and experience show that students who interact with peers who are different from themselves learn new things and experience tangible educational outcomes. At no time in this learning process should someone be singled out or treated unequally on the basis of any seen or unseen part of their identity. 
 
If you have concerns in this course about harassment, discrimination, or any unequal treatment, or if you seek accommodations or resources, please reach out to the course instructor directly. Reporting will never impact your course grade. You may also share concerns with your program chair, the Assistant Dean for Diversity and Inclusion, or the Office of Institutional Equity. In handling reports, people will protect your privacy as much as possible, but faculty and staff are required to officially report information in some cases (e.g., sexual harassment).

Course Auditing

When a student enrolls in an EP course with “audit” status, the student must reach an understanding with the instructor as to what is required to earn the “audit.” If the student does not meet those expectations, the instructor must notify the EP Registration Team [EP-Registration@exchange.johnshopkins.edu] in order for the student to be retroactively dropped or withdrawn from the course (depending on when the "audit" was requested and in accordance with EP registration deadlines). All lecture content will remain accessible to auditing students, but access to all other course material is left to the discretion of the instructor.