605.652.81 - Biological Databases and Database Tools

Computer Science
Fall 2023

Description

The sequencing of thousands of genomes, including those related to disease states, together with interest in proteomics, epigenetics, and variation, has resulted in explosive growth in the number of biological databases, as well as the need to develop new databases to handle the diverse new content being generated. The course focuses on the design of biological databases and examines issues such as those related to data modeling, heterogeneity, interoperability, evidence, and tool integration. It also surveys a wide range of biological databases and their access tools and enables students to develop proficiency in their use. Databases introduced include genome and sequence databases such as GenBank and Ensembl, as well as protein databases such as PDB and UniProt. Databases related to RNA, sequence variation, pathways and interactions, metagenomics, and epigenomics are also presented. Tools for accessing and manipulating data from databases, such as BLAST, genome browsers, multiple sequence alignment, gene finding, and protein tools, are reviewed. The programming language Perl is introduced, along with the use of Perl in obtaining data via web services and in storing data in an SQLite database. Students will use Perl (Python may be used for some assignments), biological databases, and database tools to complete homework assignments. They will also design a database and write code in the language of their choice to create their own database as a course project.

Expanded Course Description

The course focuses on using online resources’ RESTful APIs (web services) to programmatically obtain and parse information and store it in a database. It also focuses on database issues, data models, data integration, and designing a database.
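
As a minimal sketch of that workflow (not taken from the course materials), the Python example below parses a JSON record of the kind a RESTful web service might return and stores selected fields in an SQLite database. The record layout, the field names, the function names, and the `genes` table are all hypothetical and chosen only for illustration; a real resource such as Ensembl or UniProt defines its own response format, and the course homework itself is written mainly in Perl.

```python
import json
import sqlite3

# Hypothetical JSON payload standing in for a web service response.
# In practice the text would come from an HTTP GET request to a
# resource's REST endpoint; a canned string keeps the sketch offline.
SAMPLE_RESPONSE = """
[
  {"id": "GENE0001", "symbol": "ABC1", "species": "homo_sapiens", "length": 1520},
  {"id": "GENE0002", "symbol": "XYZ2", "species": "mus_musculus", "length": 987}
]
"""


def parse_records(text):
    """Parse the JSON text and keep only the fields we want to store."""
    records = json.loads(text)
    return [(r["id"], r["symbol"], r["species"], r["length"]) for r in records]


def store_records(conn, rows):
    """Create the (hypothetical) genes table if needed and insert the rows."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS genes (
               id      TEXT PRIMARY KEY,
               symbol  TEXT NOT NULL,
               species TEXT NOT NULL,
               length  INTEGER
           )"""
    )
    # Parameterized placeholders avoid building SQL by string concatenation.
    conn.executemany("INSERT OR REPLACE INTO genes VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def load_sample(conn):
    """Parse the canned response, store it, and return the number of rows."""
    rows = parse_records(SAMPLE_RESPONSE)
    store_records(conn, rows)
    return len(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # in-memory database for the demo
    load_sample(connection)
    for row in connection.execute("SELECT symbol, species FROM genes ORDER BY id"):
        print(row)
```

Separating parsing from storage keeps each step easy to test on its own, which matters when a service changes its response format.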

There will be examples showing the use of an AI program (ChatGPT) if you are interested in working with such a tool. However, see "Using an AI program in this course" under Course Policies below for the rules. If you choose to use such a program, you must follow these rules; otherwise the assignment(s) will be given 0 points and the use will be considered academic misconduct.

Instructor

Elizabeth Hobbs

ehobbs1@jhu.edu

Course Structure

The course content is divided into modules. Course Modules can be accessed by clicking Course Content on the left menu. A module will have several sections including the overview, content, readings, discussions, and assignments. Students are encouraged to preview all sections of the module before starting. Each module runs for a period of seven (7) days – Monday through Sunday, except the last module.

There will be four homework assignments (due at the end of modules 2, 5, 8, and 10) and the course project (due on the last day of Module 13). In addition, discussion posts are due in modules 1, 2, 3, 4, 6, 7, 9, 12, and 14. In Module 13 as a discussion post, students will also post a short presentation regarding their project and comment on other students’ projects (this is part of the course project). A project topic email is due at the end of module 7, and a short project proposal at the end of module 8. Students should use the Calendar to view assignment due dates and times. 

IMPORTANT: All assignments are due at 11 pm Eastern on the date due. If you are in another time zone, please contact the instructor if this time is problematic for you. The date/time used will be that given by Canvas upon assignment upload. See Course Policies below for information regarding lateness penalties.

Course Topics

Course Goals

To become familiar with the vast array of available biological databases/resources and tools and obtain data from/work with a selection of these, to be able to define the data model for a resource, to design a database that integrates diverse data from multiple sources, to understand various issues related to biological databases/resources, and to apply knowledge learned about data modeling, data integration, and database issues to develop databases for biological data.

Course Learning Outcomes (CLOs)

Textbooks

The Lecture Content PDFs for each module are the textbook for this course.

Other Materials & Online Resources

Each module contains a Lecture Content PDF, considered to be the course "textbook". Students are expected to read these PDFs (not simply view the course videos). The PDFs contain links to websites and additional information for students to visit and view.

Required Software

The use of Perl 5 is required for Homework 1 and part of Homework 3. Optionally, you may use Python 3 for Homework 2 and part of Homework 3. For your course project you may use any programming language you wish.

One highly recommended Perl distribution for Windows is Strawberry Perl (http://strawberryperl.com/). Unix-based systems will already have Perl available, but you must upgrade to Perl 5 if the installed version is older. You are expected to have a machine on which you can install Perl 5 or a machine that already has Perl 5.

See "Using an AI program in this course" under Course Policies below for the rules. If you choose to use such a program, you must follow these rules; otherwise the assignment(s) will be given 0 points and the use will be considered academic misconduct.

Student Coursework Requirements

Important: Before taking this course, students are expected to have taken, at the college level or above, a molecular biology course, a course on databases, and at least an introductory programming class. They are also expected to be comfortable using the command line (Windows-based or Linux-based). The time estimates below are based on a student who is comfortable with molecular biology, relational databases, and programming. If this doesn't describe you, you're strongly encouraged to gain that comfort before taking this class in order to have a successful outcome.

It is expected that each module will take approximately 15-20 hours per week to complete. Here is an approximate breakdown: reading the assigned sections of the texts, as well as some outside reading (approximately 2 hours per week); listening to the audio-annotated slide presentations and working on ungraded exercises (approximately 1 hour per week); answering discussion questions (approximately 2 hours per week); and working on homework assignments or the course project (10-15 hours per week, if you are already comfortable with programming).

This course will consist of three basic student requirements (please see Course Policies below for the late policy for all requirements):

  1. Preparation and Participation (Module Discussions) (20% of Final Grade Calculation)

You are responsible for carefully reading all assigned material and being prepared for discussion. The majority of readings are from the course text. Additional reading may be assigned to supplement text readings.

Post your initial response to the discussion questions by the night of day 4 for that module week. Posting a response to the discussion question is part one of your grade for module discussions (i.e., Timeliness).

Part two of your grade for module discussion is your interaction (i.e., responding to classmate postings with thoughtful responses) with at least two classmates (i.e., Critical Thinking). Be detailed in your postings and in your responses to your classmates' postings. Feel free to agree or disagree with your classmates. Please ensure that your postings are civil and constructive. Responses are all due by the night of day 7.

I will monitor module discussions and will respond to some of the discussions as discussions are posted. In some instances, I may summarize the overall discussions and post the summary for the module.

Evaluation of preparation and participation is based on contribution to discussions.

Preparation and participation is evaluated by the following grading elements:

    1. Timeliness (20%)
    2. Critical Thinking (80%) (this includes the analysis if you are using an AI program)

Preparation and participation is graded as follows:

100–90% = A- to A+: Timeliness [regularly participates; all required postings; early in discussion; throughout the discussion]; Critical Thinking [rich in content; full of thoughts, insight, and analysis].

89–80% = B- to B+: Timeliness [frequently participates; all required postings; some not in time for others to read and respond]; Critical Thinking [substantial information; thought, insight, and analysis has taken place].

79–70% = C- to C+: Timeliness [infrequently participates; all required postings; most at the last minute without allowing for response time]; Critical Thinking [generally competent; information is thin and commonplace].

<70% = F: Timeliness [rarely participates; some, or all required postings missing]; Critical Thinking [rudimentary and superficial; no analysis or insight is displayed].


  2. Homework Assignments (40% of Final Grade Calculation)

Homework assignments will include a mix of qualitative assignments (for example, design) and programming in Perl (Python may optionally be used for some parts of the homework assignments).

All assignments are due according to the dates in the Calendar.

Qualitative assignments are evaluated by the following grading elements. If an AI program is used, the required information must also be supplied, including all interactions with the program and the student's analysis of the program's responses.

    1. All posts are completed (for discussion questions); for homework 4 and the final report all required information is supplied (20%)
    2. Writing quality and technical accuracy (10%) (Although writing won’t be graded per se, it is expected to meet accepted English and scholarship standards, and significant deviations from this will be penalized.)
    3. Rationale for answer is provided (30%) (would be part of student analysis section if using an AI program)
    4. Examples are included to illustrate rationale (25%) (would be part of student analysis section if using an AI program)
    5. Outside references are included (15%) (when not required, more points will be added to #3.)

Qualitative assignments are graded as follows. If an AI program is used, the required information (prompts, responses, and analysis) must also be supplied.

100–90% = A- to A+: All parts of the question are addressed; Writing Quality/Rationale/Examples/Outside References [rich in content; full of thought, insight, and analysis].

89–80% = B- to B+: All parts of the question are addressed; Writing Quality/Rationale/Examples/Outside References [substantial information; thought, insight, and analysis has taken place].

79–70% = C- to C+: Majority of parts of the question are addressed; Writing Quality/Rationale/Examples/Outside References [generally competent; information is thin and commonplace].

<70% = F: Some parts of the question are addressed; Writing Quality/Rationale/Examples/Outside References [rudimentary and superficial; no analysis or insight displayed].

Quantitative assignments are evaluated by the following grading elements:

    1. If you are not using an AI program:
      1. For each script in the assignment where an AI program was not used, the script performs the specified requirements (60%)
      2. For each script in the assignment where an AI program was not used, if identified in the assignment, appropriate testing has occurred and screenshots given (20%)
      3. For each script in the assignment where an AI program was not used, the code is well structured, documented, and easy to follow (20%)
    2. If you use an AI program, the following:
      1. For each script in the assignment where an AI program was used: the code contains all the necessary steps to complete the task in a clear, easy to understand manner (10%)
      2. For each script in the assignment where an AI program was used: all required information for the interaction with the program was given (10%)
      3. For each script in the assignment where an AI program was used, if identified in the assignment, appropriate testing has occurred and screenshots given (20%)
      4. For each script in the assignment where an AI program was used, independent verification of the response (key points will be listed in the assignment) and an analysis of the response (60%)

Quantitative assignments are graded as follows. If an AI program is used, the required information (prompts, responses, and analysis) must also be supplied.

100–90% = A- to A+: For each script in the assignment: The script performs all required functionality. Code is well written, documented, and clear.

89–80% = B- to B+: For each script in the assignment: The script lacks a minor capability and/or code is not well structured/documented/clear in one place. Or, one script is lacking a moderate capability or is not well structured/documented/clear in one place.

79–70% = C- to C+: For each script in the assignment: The script lacks a moderate capability and/or code is not well structured/documented/clear in several places. Or, one script in the assignment is lacking a major capability or is not well structured/documented/clear.

<70% = F: More than one script in the assignment is lacking a major capability or is not well structured/documented/clear.

 

  3. Course (Individual) Project (40% of Final Grade Calculation)

A course project will be assigned several weeks into the course. The last week will be devoted to the course project.

The course project (400 points total) is evaluated by the following grading elements:

    1. Project topic, 1-3 sentences (5 points)
    2. Short project proposal (15 points)
    3. Project report (must follow the instructor-supplied template, don't neglect the related work and references sections) (200 points)
    4. Database prototype, including all code written to populate the database, search it, and obtain/display data from it. (150 points)
    5. Short presentation posted about the project and comments on others’ (30 points)

The course project is graded as follows (this also covers what to do if you are using an AI program):

Further refinements for the database prototype:

100–90% = A- to A+: Student Preparation and Participation [work product(s) as agreed to, well prepared, complete, demonstrable]; Student Understanding [the project is rich in content; its implementation shows full thought, insight, and analysis].

89–80% = B- to B+: Student Preparation and Participation [work product(s) as agreed to and generally well prepared, although some may be less well developed]; Student Understanding [the project has good content; its implementation shows thought, insight, and analysis has taken place].

79–70% = C- to C+: Student Preparation and Participation [some individual/team work product(s) partially completed]; Student Understanding [the project content is thin, and its implementation shows some, but not detailed or in-depth, thought, insight, and analysis].

<70% = F: Student Preparation and Participation [many individual work product(s) partially prepared or missing]; Student Understanding [project content is minimal and its implementation is rudimentary and superficial; no analysis or insight displayed].


Grading Policy

Assignments are due according to the dates posted in your Canvas course site. You may check these due dates in the Course Calendar or the Assignments in the corresponding modules. 

I generally do not directly grade spelling and grammar. However, egregious violations of the rules of the English language will be noted without comment. Consistently poor performance in either spelling or grammar is taken as an indication of poor written communication ability and will detract from your grade.

A grade of A indicates achievement of consistent excellence and distinction throughout the course—that is, conspicuous excellence in all aspects of assignments and discussion in every week.

A grade of B indicates work that meets all course requirements on a level appropriate for graduate academic work. These criteria apply to both undergraduates and graduate students taking the course.

This course is based on a 1000 point total.

Final grades will be determined by the following weighting:

  Item                                                 Percent of Grade
  Preparation and Participation (Module Discussions)   20% (200 points)
  Assignments                                          40% (400 points)
  Course Project                                       40% (400 points)
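
As a worked illustration of this weighting, the Python sketch below converts percentage scores in each category into points on the 1000-point scale. The category maxima come from the weighting listed above; the example scores, the dictionary keys, and the function name are invented for illustration and do not reflect any actual grading tool used in the course. Lateness penalties are not modeled.

```python
# Maximum points per category, taken from the course weighting (1000 total).
MAX_POINTS = {
    "discussions": 200,
    "assignments": 400,
    "project": 400,
}


def total_points(percent_scores):
    """Convert per-category percentages (0-100) into points out of 1000."""
    return sum(MAX_POINTS[name] * percent_scores[name] / 100 for name in MAX_POINTS)


if __name__ == "__main__":
    # Hypothetical student: 95% on discussions, 88% on homework, 90% on the project.
    example = {"discussions": 95, "assignments": 88, "project": 90}
    points = total_points(example)
    print(f"{points:.0f} of 1000 points ({points / 10:.1f}%)")
```

For the hypothetical scores shown, the categories contribute 190, 352, and 360 points, for a total of 902 of 1000.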

See below for grading penalties for lateness for module discussion posts, assignments, and the course project.

Course Policies

Grading Penalties:

These next rules apply to all assignments except the course project (for that, see below).

Late homework and discussion questions may be accepted according to the following guidelines.

Please don't hesitate to ask for help if you have questions that need answering to complete an assignment, although do ask questions in a timely fashion, at least 48 hours BEFORE the assignment is due.

 

Grading penalties for the course project: the project is due at 11 pm on Sunday, the last day of Module 13. Guidelines:

NOTE: Although students are allowed to discuss homework assignments and the project with others via the Canvas forums only, they are expected to turn in their own work. For the discussion questions, homework assignments, and project, you may use information found on the Internet or in papers or textbooks provided you cite your sources (and quote if you are using the exact text from a source). You are expected to cite any sources used for the discussion posts, homework assignments, and course project (the project report includes a bibliography section). Note, however, that if you plan to use an AI program for non-coding assignments, you must follow the rules given below under "Using an AI program in this course".

In your code, you are welcome to use any online or hardcopy resource you find (textbook, website), but state in comments in your code where you obtained your information. (Such comments are also good reference material for you.) Note, however, that if you plan to use an AI program, you must follow the rules given below under "Using an AI program in this course".

Again: do not ask (in any manner) current or past students for any assignment. You may ask questions of each other using the course discussion board Questions only.

 

Incomplete. If you wish to take an incomplete, you must 1) have a documented medical emergency for you or a close family member, 2) email me no later than 8 am on the FIRST day of Module 13 (Monday, November 27) with the reason and the documentation, and 3) have completed at least 70% of the course, which means a good portion of the course project must be completed for my review when you make the request. Approval of a request to take an incomplete is not guaranteed. If the request is approved, the remaining work must be completed before the start of the spring semester. If the work is not completed by that deadline, the grade will automatically convert to an F.


Using an AI program in this course

Many of you, if not all of you, have heard of ChatGPT (or another AI program), and some of you may have tried one or more of these.

In this course you WILL be allowed to use ChatGPT or another AI program IF YOU FOLLOW THE RULES BELOW (1-12). Other courses may not allow the use of AI programs, so be sure to check with your other instructors.

Note that you do not have to use an AI program if you don't want to.

Students are generally encouraged to use a variety of tools to assist with their learning; however, presenting the output of these tools as one's own work is plagiarism, and using these tools on assignments where they are forbidden is cheating.

Notice that last part: use of these tools (AI or any tool) and passing the responses off as your own work or using them where they are forbidden is academic misconduct -- this can result in dismissal from your program.

So -- don't do that.

Instead follow these rules in this class for all assignments (or parts of assignments) where I allow an AI program to be used:

  1. State which AI program you used.
  2. Give all the prompts that you presented to the AI program in such a way that I can tell what you asked it to do.
  3. Give all the responses that it returned to you, associated with each prompt.
  4. For coding assignments (Homeworks 1, 2, and 3, and the coding portion of your course project), show via a screenshot (or more than one if necessary) how you tested what the AI program gave you. Be thorough in your testing: try inputs that should work and inputs that should give errors, and think about edge cases (I'll say more about testing in Module 2, and the coding assignments will state expectations).
  5. For the coding assignments (Homeworks 1, 2, and 3, and the coding portion of your course project), include an analysis for each key point (given) in the assignment by
    1. finding an independent source/reference that verifies the AI program's code is correct, and citing the reference for each requirement. There will be an example in Module 2 to show you what to do.
    2. providing your assessment or additional information (such as alternatives or other uses) about the key points.
    3. For the course project, I will assign you key points after we settle on what you'll be doing.
  6. For non-coding assignments (discussion questions, Homework 4, and the project report portion of the course project), write an assessment/analysis of the response and independently verify its advice for the key points of the assignment (where they are given). Make sure it is clear that it is your assessment/analysis and not the output of the AI program. For example: What are the pros/cons of the response? Do you have a different answer, and if so, what would you do/say differently? Did the response contain any mistakes, and if so, what were they and what is the correct response? Write more than Yes/No answers here. Depending on the assignment (discussion post, Homework 4, course project report), I expect differing levels of analysis: a few sentences (or more) for a discussion post, and more for Homework 4 and your course project report. Verify the AI program's response by checking other references, and give those references.
  7. It will be acceptable to write the code yourself first and then use it as a prompt and ask for fixes or improvements. All the prompts, including the initial code, and all responses need to be provided as with any other use of an AI program. Remember you'll still need to analyze what the AI program recommended and verify with outside citations and testing that the code is running properly.
  8. There will be some parts of assignments, including the course project coding and the report, or some discussion questions, where I say NOT to use an AI program. Do not use one in these cases; if you do, that part will score 0 points. If you are unsure, ask.
  9. Do not upload or prompt any private, sensitive, proprietary or other information you should not share or are not permitted to share into any AI program. Current AI tools such as ChatGPT don't guarantee privacy or security.
  10. Do not prompt with any inappropriate inputs. Only prompt with inputs relevant for the subject matter of this course.
  11. Before you use an AI program for an assignment, send me an email that you've read this "Using an AI program in this course" and agree to follow these rules. Use the following in your email:
    1. I have read the information in Using an AI program in this course and I agree to follow the rules specified there.
  12. You'll receive a reply from me that I received your email. If you don't receive my reply, re-send your email. Failure to email me your agreement to follow these rules (and get my reply) will mean a 0 for any assignment turned in using an AI program.

Repeat: Failure to follow the above rules will result in 0 points being given for the assignment.

You'll see some examples in the course of how I interacted with ChatGPT to show you what I mean about giving prompts, responses, and assessments/analyses.

Also remember that AI programs should "assist with learning." Such a tool is NOT intended to be a SUBSTITUTE for learning. And you'll see that AI programs make mistakes. You'll still need to understand what you are trying to accomplish and what needs to happen to do so correctly in order to use an AI program effectively. The examples I have will show you mistakes or possible shortcomings.

To repeat: if you plan to use an AI program, you must send me the email acknowledging that you've read and understood the information in "Using an AI program in this course". Please feel free to ask me any questions about this topic.

One final comment: Some companies are banning the use of ChatGPT and similar tools for many reasons, including lack of privacy and security. Be sure to understand and follow any rules adopted by your workplace. This is another reason to consider an AI program an aid to learning, not a substitute for actual learning and understanding.

 

Academic Policies

Deadlines for Adding, Dropping and Withdrawing from Courses

Students may add a course up to one week after the start of the term for that particular course. Students may drop courses according to the drop deadlines outlined in the EP academic calendar (https://ep.jhu.edu/student-services/academic-calendar/). Between the 6th week of the class and prior to the final withdrawal deadline, a student may withdraw from a course with a W on their academic record. A record of the course will remain on the academic record with a W appearing in the grade column to indicate that the student registered and withdrew from the course.

Academic Misconduct Policy

All students are required to read, know, and comply with the Johns Hopkins University Krieger School of Arts and Sciences (KSAS) / Whiting School of Engineering (WSE) Procedures for Handling Allegations of Misconduct by Full-Time and Part-Time Graduate Students.

This policy prohibits academic misconduct, including but not limited to the following: cheating or facilitating cheating; plagiarism; reuse of assignments; unauthorized collaboration; alteration of graded assignments; and unfair competition. Course materials (old assignments, texts, or examinations, etc.) should not be shared unless authorized by the course instructor. Any questions related to this policy should be directed to EP’s academic integrity officer at ep-academic-integrity@jhu.edu.

Students with Disabilities - Accommodations and Accessibility

Johns Hopkins University values diversity and inclusion. We are committed to providing welcoming, equitable, and accessible educational experiences for all students. Students with disabilities (including those with psychological conditions, medical conditions and temporary disabilities) can request accommodations for this course by providing an Accommodation Letter issued by Student Disability Services (SDS). Please request accommodations for this course as early as possible to provide time for effective communication and arrangements.

For further information or to start the process of requesting accommodations, please contact Student Disability Services at Engineering for Professionals, ep-disability-svcs@jhu.edu.

Student Conduct Code

The fundamental purpose of the JHU regulation of student conduct is to promote and to protect the health, safety, welfare, property, and rights of all members of the University community as well as to promote the orderly operation of the University and to safeguard its property and facilities. As members of the University community, students accept certain responsibilities which support the educational mission and create an environment in which all students are afforded the same opportunity to succeed academically. 

For a full description of the code please visit the following website: https://studentaffairs.jhu.edu/policies-guidelines/student-code/

Classroom Climate

JHU is committed to creating a classroom environment that values the diversity of experiences and perspectives that all students bring. Everyone has the right to be treated with dignity and respect. Fostering an inclusive climate is important. Research and experience show that students who interact with peers who are different from themselves learn new things and experience tangible educational outcomes. At no time in this learning process should someone be singled out or treated unequally on the basis of any seen or unseen part of their identity. 
 
If you have concerns in this course about harassment, discrimination, or any unequal treatment, or if you seek accommodations or resources, please reach out to the course instructor directly. Reporting will never impact your course grade. You may also share concerns with your program chair, the Assistant Dean for Diversity and Inclusion, or the Office of Institutional Equity. In handling reports, people will protect your privacy as much as possible, but faculty and staff are required to officially report information for some cases (e.g. sexual harassment).

Course Auditing

When a student enrolls in an EP course with “audit” status, the student must reach an understanding with the instructor as to what is required to earn the “audit.” If the student does not meet those expectations, the instructor must notify the EP Registration Team [EP-Registration@exchange.johnshopkins.edu] in order for the student to be retroactively dropped or withdrawn from the course (depending on when the "audit" was requested and in accordance with EP registration deadlines). All lecture content will remain accessible to auditing students, but access to all other course material is left to the discretion of the instructor.