Instructor Information

David Silberberg

Home Phone: 410-358-4164
Work Phone: 443-778-6231

Dr. David Silberberg is a Principal Professional Staff member at the Johns Hopkins University Applied Physics Laboratory (APL) and is the Research Director of the Johns Hopkins Institute for Assured Autonomy. Dr. Silberberg has conducted extensive research and development in the areas of leading-edge AI and machine learning algorithms, including graph analytics, distributed and large-scale architectures, intelligent access to distributed and heterogeneous database systems, and semantic graph query languages. Dr. Silberberg led the Large Scale Analytics group at the APL, which applies machine learning and AI-based algorithms to perform descriptive, predictive, and prescriptive analytics on large and complex data. Dr. Silberberg also served as chief architect for the deep archive of NASA mission data and for the Hubble Space Telescope data archive. Dr. Silberberg received Master’s and Bachelor’s degrees in Computer Science from the Massachusetts Institute of Technology and a Ph.D. in Computer Science from the University of Maryland, College Park. He teaches a Computer Science graduate course on Large-Scale Database Systems at The Johns Hopkins University School of Engineering for Professionals.

Course Information

Course Description

This course investigates the theory and practice of modern large-scale database systems. Large-scale approaches include distributed relational databases; data warehouses; and non-relational databases including HDFS, Hadoop, Accumulo for query and graph algorithms, and Mahout bound to Spark for machine learning algorithms. Topics discussed include data design and architecture; database security, integrity, query processing, query optimization, transaction management, concurrency control, and fault tolerance; and query formulation, graph algorithms, and machine learning algorithms on large-scale distributed data systems. At the end of the course, students will understand the principles of several common large-scale data systems including their architectures, performance, and costs. Students will also gain a sense of which approach is recommended for different requirements and circumstances.

Prerequisites

EN.605.202 Data Structures; EN.605.641 Principles of Database Systems or equivalent. Familiarity with “big-O” concepts and notation is recommended.

Course Goal

The goal of this course is to teach distributed database management system theory and apply it to traditional distributed relational database systems, cloud databases (Hadoop, Accumulo), large-scale machine learning systems, and data warehouses.

Course Objectives

  • To provide an understanding of architecture and design tradeoffs of all aspects of distributed database management systems.
  • To apply heuristics to design high-performing distributed database schemas, to create optimized distributed query execution plans, and to understand the underpinnings of transaction management and fault tolerance.
  • To characterize algorithms that are optimally solved by MapReduce, to design and query large-scale databases, and to understand tradeoffs among distributed databases, cloud databases, and data warehouses.
  • To create machine learning algorithms on large-scale data platforms. Machine learning algorithms include collaborative filtering, clustering, and classification.

When This Course is Typically Offered

This course is typically offered in the Spring and Summer semesters online.

Syllabus

  • Relational database management systems review
  • Computer networks and distributed database architectures
  • Horizontal and vertical partitioning
  • Semantic data control
  • Query processing
  • Query decomposition and data localization
  • Optimization of distributed queries
  • Transaction management
  • Distributed concurrency control
  • Distributed reliability protocols
  • Data warehouses
  • Cloud computing and MapReduce
  • Massive structured databases
  • Analytics for massive database systems

Student Assessment Criteria

Weekly quizzes and homework assignments 80%
Participation in weekly online discussion groups 20%

All work must be completed on your own.

Computer and Technical Requirements

Background and/or experience in the Principles of Database Systems and Java Programming are strongly recommended, but not required.

Textbooks

Textbook information for this course is available online through the MBS Direct Virtual Bookstore.

Course Notes

There are notes for this course.

Term Specific Course Website

http://blackboard.jhu.edu

(Last Modified: 05/26/2016 11:24:58 AM)