Instructor Information
Paul McNamee
Dr. McNamee has been active in information retrieval research for over 20 years. He developed the HAIRCUT retrieval engine and has participated in over 25 international community evaluations of language technologies. Dr. McNamee is active in the ACM SIGIR and ACL communities, and he currently serves on the NIST TREC program committee. He maintains an information retrieval resource page at: http://pmcnamee.net/ir.html
Course Information
Course Description
A multibillion-dollar industry has grown to address the problem of finding information. Commercial search engines are based on information retrieval: the efficient storage, organization, and retrieval of text. This course covers both the theory and practice of text retrieval technology. Topics include automatic index construction, formal models of retrieval, Internet search, text classification, multilingual retrieval, question answering, and related topics in NLP and computational linguistics. A practical approach is emphasized, and students will complete several programming projects to implement components of a retrieval engine. Students will also give a class presentation based on an independent project or a research topic from the IR literature.
Prerequisite(s): EN.605.202 Data Structures or permission of the instructor.
Course Goal
To gain a detailed understanding of automated indexing and retrieval and to survey various subfields of information retrieval and closely related topics in language processing.
Course Objectives
- Learn the underlying technology of search engines.
- Gain practical experience building simple but true-to-practice retrieval software.
- Appreciate topics in the broad area of information retrieval, including evaluation, classification, cross-language retrieval, and computational linguistics.
When This Course is Typically Offered
Offered at least once a year online, usually in the Fall semester.
Syllabus
- search engines
- retrieval models
- inverted files
- query processing
- evaluation
- term similarity; relevance feedback
- web retrieval
- text classification
- multilingual retrieval
- distributed retrieval
- retrieval of spoken archives
- linguistically-motivated IR
- information extraction
- question answering
Student Assessment Criteria
| Component | Weight |
| --- | --- |
| Programming assignments (typically 5) | 30% |
| Problem sets | 20% |
| Exams | 15% |
| Scholarly engagement / student participation | 15% |
| Individual class project | 20% |
The grading criteria listed above are typical but may vary from term to term. The class project is optional; however, a completed project must be submitted to be eligible for a grade higher than "B+".
Computer and Technical Requirements
Any programming language can be supported (e.g., Python, Java, C++, Perl). Students with weak programming backgrounds will find the first few assignments challenging. Important programming skills include dictionaries and trees, string manipulation, sorting, and binary file I/O. No graphical or GUI programming is required in the class. A good understanding of discrete math and data structures is assumed.
Participation Expectations
Work is individual. Programming assignments are due about every two weeks, with more of them falling in the first part of the term. Problem sets are frequent (about 10) and are good preparation for the final exam. The class project lets students explore a specialty topic in detail; deliverables include a written report and a video or oral presentation.
Textbooks
Textbook information for this course is available online through the MBS Direct Virtual Bookstore.
Course Notes
There are notes for this course.
Final Words from the Instructor
The lecture notes and textbook are the primary sources of instructional material. Additional readings and handouts supplement the assigned text. A course webpage is maintained at http://pmcnamee.net/ir.html.
Term Specific Course Website
(Last Modified: 08/25/2023 02:47:19 PM)