Monkeypox sequence submission using BankIt

GenBank also accepts partial Monkeypox genomes, including single genes or gene fragments, as long as the total length is over 50 nucleotides. You may submit your assembled Monkeypox sequences using the web-based BankIt submission tool or the command-line table2asn program. This document focuses on submitting multiple sequences using BankIt. If you are submitting only a single sequence, you may still use BankIt, but some of the options for applying information to all sequences will look slightly different.

Table of Contents

  1. Prepare your files
  2. Select submission tool
  3. BankIt forms
  4. What to expect

Prepare your files

GenBank submission tools (BankIt, table2asn and Submission Portal) accept the same FASTA and source files to prepare your submission. This section discusses the specific information encouraged for Monkeypox submissions.

FASTA sequence including BioProject link

BankIt allows the submission of multiple sequences at one time, rather than submitting each sequence individually. Submitting multiple sequences at once is highly encouraged to ease the submission process and ensure your sequences are processed efficiently.

  1. Put all of your Monkeypox sequences into a single FASTA file.
  2. Ensure that each sequence has a unique Sequence Identifier. This identifier is not retained in the final GenBank flatfile; it only links the sequence to the correct source information during submission. Sequence Identifiers should be less than 25 characters and can contain only the following characters: letters, digits, hyphens (-), underscores (_), periods (.), colons (:), asterisks (*), number signs (#), and slashes (/).
  3. If you have registered a BioProject or BioSample, or submitted the raw reads to SRA, you can include links to that data in the FASTA definition line. This information is not required, but it is highly encouraged because it links the reads to the assembled sequence. The BioSample registration also gives you an opportunity to provide rich sample data that fully characterizes your submission.
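
The identifier rules in step 2 can be checked before you upload the file. The following is a minimal Python sketch, not an NCBI tool: the function names `read_fasta` and `check_records` are hypothetical, and the minimum length comes from the 50-nucleotide rule stated in the introduction. BankIt performs its own validation, so treat this only as a local pre-check.

```python
import re

# Allowed characters from step 2: letters, digits, - _ . : * # /
_ID_PATTERN = re.compile(r"[A-Za-z0-9\-_.:*#/]+")
MAX_ID_LENGTH = 24   # "less than 25 characters"
MIN_SEQ_LENGTH = 51  # "over 50 nucleotides"


def read_fasta(text):
    """Parse FASTA text into a list of (identifier, rest_of_defline, sequence)."""
    records = []
    seq_id, rest, chunks = None, "", []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                records.append((seq_id, rest, "".join(chunks)))
            header = line[1:].split(None, 1)
            seq_id = header[0] if header else ""
            rest = header[1] if len(header) > 1 else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        records.append((seq_id, rest, "".join(chunks)))
    return records


def check_records(records):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    seen = set()
    for seq_id, _rest, sequence in records:
        if seq_id in seen:
            problems.append(f"{seq_id}: duplicate Sequence Identifier")
        seen.add(seq_id)
        if len(seq_id) > MAX_ID_LENGTH:
            problems.append(f"{seq_id}: identifier is 25 characters or longer")
        if not _ID_PATTERN.fullmatch(seq_id):
            problems.append(f"{seq_id}: contains characters outside the allowed set")
        if len(sequence) < MIN_SEQ_LENGTH:
            problems.append(f"{seq_id}: sequence is not over 50 nucleotides")
    return problems
```

To use it, read your FASTA file into a string, pass it to `read_fasta`, and print whatever `check_records` returns.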

Example FASTA file for two sequences without BioProject/BioSample/SRA link

>MPXV_USA_2022_NY0013
TAATTAATTTAATTTTACTATTTTATTTAGTGTCTAGAAAAAAATGTGTGACCCACGACCGTAGGAAACT....
>MPXV_USA_2022_WA0002
TAAATAATTTAATTTTACTATTTTATTTAGTGTCTAGAAAAAAGTGTGTGACCCACGACCGTAGGAAACT....

Example FASTA file for two sequences with BioProject/BioSample/SRA link

>MPXV_USA_2022_NY0013 [BioProject=PRJNAxxxxx] [BioSample=SAMNxxxxx1] [SRA=SRRxxxxxx1]
TAATTAATTTAATTTTACTATTTTATTTAGTGTCTAGAAAAAAATGTGTGACCCACGACCGTAGGAAACT....
>MPXV_USA_2022_WA0002 [BioProject=PRJNAxxxxx] [BioSample=SAMNxxxxx2] [SRA=SRRxxxxxx2]
TAAATAATTTAATTTTACTATTTTATTTAGTGTCTAGAAAAAAGTGTGTGACCCACGACCGTAGGAAACT....
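
If you generate the FASTA file from a script, the definition lines shown above can be assembled programmatically. This is a minimal Python sketch; `make_defline` and `format_record` are hypothetical helper names, and the 70-character line width is only an assumption chosen for readability. The bracketed modifier names are the ones used in the example above.

```python
def make_defline(seq_id, bioproject=None, biosample=None, sra=None):
    """Build a FASTA definition line, adding only the links that were supplied."""
    parts = [">" + seq_id]
    for key, value in (("BioProject", bioproject),
                       ("BioSample", biosample),
                       ("SRA", sra)):
        if value:
            parts.append(f"[{key}={value}]")
    return " ".join(parts)


def format_record(seq_id, sequence, width=70, **links):
    """Return one FASTA record with the sequence wrapped at `width` characters."""
    lines = [make_defline(seq_id, **links)]
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"
```

Calling `make_defline` with no links produces a plain definition line such as the one in the first example, so the same helper covers both cases.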

Source metadata

BankIt provides two methods for entering information about the source of your Monkeypox virus submission. You can enter the data in forms or by a tab-delimited table. We recommend using a table for submissions containing multiple sequences.

The following information is required for Monkeypox submissions:

  • unique isolate name
  • collection date including month and day if known; this is the date that the virus sample was collected in the field. Examples: 2022-01-30, Oct-2002.
  • geographic location name (geo_loc_name) where sample was collected. See INSDC geographic location name list for allowed names and format. Use the approved geographic location name first, followed by a colon and then additional information separated by commas, in larger to smaller order, i.e. geographic location name: state, city. Example: "USA: Maryland, Bethesda".
  • host organism; if the virus was not isolated from a host organism, enter "environment". To include additional information about the host, enter the host name, followed by a semicolon and the additional information. Example: Homo sapiens; age 43
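
The date and location formats above can be checked with a short script before you build the source table. The following Python sketch is illustrative only: the function names are hypothetical, and the accepted date shapes (YYYY-MM-DD, YYYY-MM, DD-Mmm-YYYY, Mmm-YYYY, YYYY) are an assumption based on the examples in this document and on common INSDC usage. It does not check the country against the INSDC geographic location name list.

```python
import re
from datetime import date

_MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
_MONTH_NUMBER = {name: index + 1 for index, name in enumerate(_MONTHS)}

# Assumed date shapes, most specific first.
_DATE_PATTERNS = [
    re.compile(r"(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})"),           # 2022-01-30
    re.compile(r"(?P<y>\d{4})-(?P<m>\d{2})"),                        # 2022-06
    re.compile(r"(?P<d>\d{2})-(?P<mon>[A-Z][a-z]{2})-(?P<y>\d{4})"),  # 15-May-2022
    re.compile(r"(?P<mon>[A-Z][a-z]{2})-(?P<y>\d{4})"),              # Oct-2002
    re.compile(r"(?P<y>\d{4})"),                                     # 2022
]


def is_valid_collection_date(text):
    """Return True if text matches an assumed format and is a real calendar date."""
    for pattern in _DATE_PATTERNS:
        match = pattern.fullmatch(text)
        if not match:
            continue
        parts = match.groupdict()
        if "mon" in parts:
            month = _MONTH_NUMBER.get(parts["mon"])
            if month is None:
                return False
        else:
            month = int(parts.get("m") or 1)
        day = int(parts.get("d") or 1)
        try:
            date(int(parts["y"]), month, day)  # rejects e.g. 2022-02-30
        except ValueError:
            return False
        return True
    return False


def split_geo_loc_name(value):
    """Split 'Country: region, locality' into (country, [region, locality])."""
    country, colon, detail = value.partition(":")
    places = [p.strip() for p in detail.split(",") if p.strip()] if colon else []
    return country.strip(), places
```

A caller can compare the first element returned by `split_geo_loc_name` against a locally saved copy of the INSDC list if stricter checking is needed.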

Optional information encouraged for Monkeypox submissions:

  • isolation_source; the physical environment where the virus was collected. Example: skin lesion
  • passage details can be included in a source note

The source table should contain a single row for each sequence in the FASTA file prepared above. Column headers must match the name of the modifier in the INSDC Feature Table. The table can be prepared in Excel and saved as a tab-delimited file. Because Excel can automatically reformat dates, please check the collection dates in any file saved from Excel.

Example source table for two sequences

Note spacing in table is for display purposes; the individual columns must be separated by a tab.

Sequence_ID           isolate               geo_loc_name      collection_date  host                  note                     isolation_source
MPXV_USA_2022_NY0013  MPXV_USA_2022_NY0013  USA: NY           2022-06          Homo sapiens          passage details: Vero 2  skin
MPXV_USA_2022_WA0002  MPXV_USA_2022_WA002   USA: WA, Seattle  15-May-2022      Homo sapiens; age 56
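
As an alternative to a spreadsheet, the source table can be written directly as a tab-delimited file, which keeps dates exactly as you typed them. The following Python sketch is one possible approach, not an NCBI utility: `write_source_table` and `compare_ids` are hypothetical names, and the column list simply copies the headers from the example table above. The second function checks that every Sequence Identifier in the FASTA file has exactly one row in the table.

```python
import csv
from collections import Counter

# Column headers copied from the example source table.
COLUMNS = ["Sequence_ID", "isolate", "geo_loc_name", "collection_date",
           "host", "note", "isolation_source"]


def write_source_table(rows, handle):
    """Write a list of dicts as a tab-delimited table; missing values are left empty."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n", restval="")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


def compare_ids(fasta_ids, rows):
    """Return (missing, extra, duplicated) Sequence_ID lists for the table."""
    counts = Counter(row["Sequence_ID"] for row in rows)
    missing = sorted(set(fasta_ids) - set(counts))      # in FASTA, not in table
    extra = sorted(set(counts) - set(fasta_ids))        # in table, not in FASTA
    duplicated = sorted(i for i, n in counts.items() if n > 1)
    return missing, extra, duplicated
```

Open the output file with `open(path, "w", newline="")` before passing it to `write_source_table`, and confirm that all three lists from `compare_ids` are empty before uploading.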

Select submission tool

All submissions via BankIt or the Submission Portal require an NCBI submission account. If you do not yet have an NCBI Account, please follow the onscreen directions to set up your account.

  1. Go to the submit page
  2. If you are not logged into your NCBI Account, please do so using the Log in button in the upper right corner
  3. Select 'Sequence data not listed above', the last option in the list of sequence data types to be submitted and click on the Start button
  4. Click 'Start BankIt Submission' and you can begin your submission

BankIt forms

The BankIt submission program is a short series of forms designed to prompt the user for all necessary submission information. Each submission is given a temporary BankIt ID number which can be used to track and identify your submission before an accession number is provided. The program is designed to save intermediary steps in your submission so you can pause your submission and return to it later if need be. Preparing your input files before beginning your submission will greatly speed the submission process. Some validation will occur during submission. If a Warning appears at the top of the page, verify the information is correct as provided and click the Continue button at the bottom. If an Error appears, changes must be made to the provided information before proceeding, as outlined in the Error text.

Contact

Please fill in the contact information form with the details of the person who should be contacted if there are issues with your submission. The institution and address will appear in your public GenBank record so that the data are attributed to the proper institution. The phone number and email address will not appear in the record. Fax numbers are optional. The email address associated with your NCBI login is populated in the Email field automatically. Please provide an alternate email address to facilitate communication. Do not uncheck the User profile update selection; leaving it checked ensures that this information is retained for future submissions. You only need to fill out this form in your first submission; for subsequent submissions, you will be asked only to verify that the retained information is still correct.

Press Continue at the bottom of the page.

Reference

The Reference form allows you to provide both sequence authors for data attribution and publication information. At least one name must be provided in the Sequence Authors field; you can use the Add button to add more authors. While a reference is not required, it is encouraged to facilitate the linking of your record to your current or future publication. If the sequence authors are the same as the reference authors, you can copy the list using the Same As Sequence Authors selection.

Press Continue at the bottom of the page.

Sequencing Technology

We encourage you to provide the sequencing technology and assembly program used to generate your sequence. Technologies not listed can be added by selecting Other. Indicate that the sequences are assembled sequences. All technologies other than Sanger require the Assembly Program and the Version/Date of that program in order to continue. Assembly Name and Coverage are not required.

Press Continue at the bottom of the page.

Nucleotide

On the Nucleotide page, you will be asked about the release date for your submission. We encourage you to choose Immediately After Processing to support scientific discovery during this public health emergency.

In the Sequence and Definition Line section,

  • Select genomic DNA in the Molecule Type
  • Maintain Linear for Topology
  • Answer Yes or No to the question regarding the complete nature of your sequences
  • Select FASTA sequences (not alignment)
  • Choose File to upload the FASTA file prepared above
  • Press Continue at the bottom of the page.

Organism

Type Monkeypox virus in the Organism Name box. This will apply the correct organism name to all sequences in your submission.

Press Continue at the bottom of the page.

A preview table will appear for your review.

Press Continue in the middle of the page.

Set/Batch

We recommend that you choose the Batch submission option.

Press Continue at the bottom of the page.

Submission Category

If you or your collaborating group are responsible for sequencing the data in your submission, choose Original.

Press Continue at the bottom of the page.

Source Modifiers

  1. Leave the Organelle/Location field blank.
  2. In Source Modifiers, choose Upload source modifiers Table File.
  3. Click Choose File to upload the Source metadata file prepared above.

Press Continue at the bottom of the page.

A preview table will appear for your review.

Press Continue in the middle of the page.

Features

To expedite release, we do not currently require features (gene and coding region) for Monkeypox sequences. If you have this information, it can be supplied using a 5-column feature table. When your sequence submission is received, GenBank staff will attempt to add gene and coding region features using VADR. If you would like to check your sequences before submission, you can run VADR yourself and check for issues.

If you are not supplying features, press Continue in the middle of the page without selecting any option in the radio buttons.

You will receive a Warning that features have not been provided. Within the blue warning box, select No.

Review and Correct

This page provides a preview of your submission for your review. Press Finish Submission to complete your submission. A window will appear notifying you that your submission is completed and containing your temporary BankIt ID number.

What to expect

Upon completion of your submission, you will receive an email confirming receipt and listing the temporary BankIt ID number.

Your submission will then be reviewed by the GenBank staff. We will attempt to add gene and coding region features using VADR. If you would like to check your sequences before submission, you can run VADR yourself and check for issues. If any issues are detected with the sequence (e.g., misassembly or stop codons in essential genes), you will be contacted via email and asked to review the sequences in the identified region and reply to staff. You should be contacted within one working day.

If there are no issues with your submission, you will receive an email notification listing your GenBank accession numbers and the data will be immediately released.

Please contact us if you have any questions at: gb-admin@ncbi.nlm.nih.gov

Last updated: 2024-06-13T18:14:47Z