Dr. McNamee has been active in information retrieval research for over 20 years. He developed the HAIRCUT retrieval engine and has participated in over 25 international community evaluations of language technologies. Dr. McNamee is active in the ACM SIGIR and ACL communities, and he currently serves on the NIST TREC program committee. He maintains an information retrieval resource page at: http://pmcnamee.net/ir.html
A multibillion-dollar industry has grown to address the problem of finding information. Commercial search engines are based on information retrieval: the efficient storage, organization, and retrieval of text. This course covers both the theory and practice of text retrieval technology. Topics include automatic index construction, formal models of retrieval, Internet search, text classification, multilingual retrieval, question answering, and related topics in NLP and computational linguistics. A practical approach is emphasized and students will complete several programming projects to implement components of a retrieval engine. Students will also give a class presentation based on an independent project or a research topic from the IR literature.
605.202 Data Structures.
To gain a detailed understanding of automated indexing and retrieval and to survey various subfields of information retrieval and closely related topics in language processing.
- Learn the underlying technology of search engines.
- Gain practical experience building simple but true-to-practice retrieval software.
- Appreciate topics in the broad area of information retrieval, including evaluation, classification, cross-language retrieval, and computational linguistics.
When This Course is Typically Offered
Typically offered in person during the spring semester of odd-numbered years at the APL campus. Starting in 2016, offered online at least once a year.
- search engines
- retrieval models
- inverted files
- query processing
- term similarity; relevance feedback
- web retrieval
- text classification
- multilingual retrieval
- distributed retrieval
- retrieval of spoken archives
- linguistically motivated IR
- information extraction
- question answering
Student Assessment Criteria
Homework assignments (typically 6): 40%
The grading criteria listed above are typical for in-person courses but may vary from term to term. Homework assignments, particularly the first few, involve programming. In recent semesters I have made the class project optional; however, a project must be submitted to be eligible for a grade higher than "B+".
Computer and Technical Requirements
Java is helpful, but other languages can be supported (Python, Clojure, C++, Perl). Students with weak programming backgrounds will find the first few assignments challenging. Important programming skills include dictionaries and trees, string manipulation, sorting, and binary file I/O. No graphical or GUI programming is required in the class. A good understanding of discrete math and data structures is assumed.
Work is individual. Programming assignments occur about every two weeks, with slightly more of them scheduled early in the term. The research project lets students explore a specialty topic in detail; deliverables include a written report and an oral presentation. Exams are held in class. Students should normally attend class. Participation is measured through occasional quizzes, in-class discussion, and short oral presentations.
Textbook information for this course is available online through the MBS Direct Virtual Bookstore.
There are notes for this course.
Final Words from the Instructor
The lecture notes are the primary source of instructional material. Additional readings and handouts supplement the assigned text. A course webpage is maintained at http://pmcnamee.net/ir.html.
Term Specific Course Website
(Last Modified: 11/08/2016 04:32:11 PM)