Instructor Information
Mike Weisman
Mike Weisman received a BE in electrical engineering from the Cooper Union for the Advancement of Science and Art and a PhD in applied mathematics from Harvard University. Dr. Weisman is a mathematician in the Computational and Information Sciences Directorate at the U.S. Army Research Laboratory. He has previously been a member of the senior technical staff at the Johns Hopkins University Applied Physics Laboratory and of the technical staff at MIT Lincoln Laboratory.
Current Courses:
625.601 - Real Analysis (ACM)
625.703 - Complex Analysis (ACM)
625.740 - Data Mining (ACM)
Previously Taught Course:
605.716 - Modeling and Simulation of Complex Systems (CS)
Course Information
Course Description
The field of data science is emerging to make sense of the growing availability and exponentially increasing size of typical data sets. Central to this unfolding field is the area of data mining, an interdisciplinary subject incorporating elements of statistics, machine learning, artificial intelligence, and data processing. In this course, we will explore methods for preprocessing, visualizing, and making sense of data, focusing not only on the methods but also on the mathematical foundations of many of the algorithms of statistics and machine learning. We will learn about approaches to classification, including traditional methods such as Bayes Decision Theory and more modern approaches such as Support Vector Machines. We will also cover unsupervised learning techniques, including clustering algorithms that apply when labels for the training data are not provided or are unknown. We will introduce and use open-source statistics and data-mining software such as R. Students will have an opportunity to see how data mining algorithms work together by reviewing case studies and applying the techniques they learn in hands-on projects.
Prerequisites
Multivariate calculus, linear algebra and matrix theory (e.g., EN.625.609 Matrix Theory), and a course in probability and statistics (e.g., EN.625.603 Statistical Methods and Data Analysis). This course also assumes familiarity with multiple linear regression and a basic ability to program.
Course Goal
This course will introduce the student to data mining, which contains many of the key algorithms and ideas of the recently emerging, still developing, and highly popular field known as data science. We will study and apply algorithms that are mathematically interesting and that have often been used in machine learning, machine vision, and pattern recognition.
Topics include: Supervised Learning, Bayes Decision Theory, Parametric and Non-Parametric Methods, Linear Discrimination, K-Means Clustering, Expectation-Maximization, and Support Vector Machines.
Course Objectives
Upon successful completion of the course, students will be able to
- Implement a variety of algorithms to parse and evaluate data,
- Run data mining procedures available in existing software and write short computer programs to implement algorithms and process data sets.
Students will be able to understand
- Various procedures and methods for data classification,
- How competing algorithms differ, and how to set the various associated parameters,
- The underlying statistical and mathematical theories from which the algorithms are built.
When This Course is Typically Offered
Data Mining is typically offered each year in the fall.
Syllabus
- Bayesian decision theory
- Classification
- Supervised learning
- Unsupervised learning
- Clustering
- Principal components analysis
- Support vector machines
Student Assessment Criteria
Homework | 20%
Class Project | 30%
Exams | 50%
Students will be expected to
- Understand the mathematical underpinnings of algorithms,
- Apply the algorithms to data.
Computer and Technical Requirements
Intermediate experience programming in a high-level programming language is assumed.
Participation Expectations
The exams will test theory and will also have applied problems to be solved in 'take-home' form on a computer.
Exams and homework should reflect only each student's own work.
Although the class project is due near the end of the semester, previous experience shows that beginning early makes for a much more manageable and enjoyable experience!
Textbooks
Textbook information for this course is available online through the MBS Direct Virtual Bookstore.
Course Notes
There are notes for this course.
Final Words from the Instructor
Please feel free to email the instructors with any questions about the course.
Term Specific Course Website
https://piazza.com/jhu/fall2022/625740/home
(Last Modified: 06/23/2022 06:33:19 PM)