Introduction
|
Instrumentation
|
List of Services
|
Retention Policies
|
Sample Quality Control
|
Sample Requirements
|
Online Sample Submission
|
Results Files
|
Boilerplate Text for Methods Sections
|
Managing your Account
The CCHMC Genomics Sequencing Facility offers many different services enabling investigators to perform experiments with next generation sequencing (NGS) data.
- Find out exactly at which site(s) in the genome a specific transcription factor binds
- Obtain the complete profile of microRNAs expressed in a tissue from any organism
- Sequence an entire cDNA library to get gene expression levels from every isoform
- Access the complete variation profile of every coding region in the human genome
Instrument | Sequencing Workflow | Flow Cell Type | Number of Lanes per Flow Cell |
Available kit sizes (Total Cycles) |
Expected Output (M clusters) per Lane |
Expected Output (M clusters) Total |
---|---|---|---|---|---|---|
iSeq 100 | Reverse Complement | i1 | 1 | 300 | 4 | 4 |
MiSeq | Forward | MiSeq | 1 | 300, 500, 600 | 15 | 15 |
NovaSeq 6000 | Forward (v1.0 flow cells) Reverse Complement (v1.5) | SP (S prime) | 2 | 100, 300, 500 | 360 | 720 |
NovaSeq 6000 | S1 | 2 | 100, 200 | 720 | 1440 | |
NovaSeq 6000 | S2 | 2 | 100, 200, 300 | 1600 | 3200 | |
NovaSeq 6000 | S4 | 4 | 200, 300 | 2000 | 8000 | |
NovaSeq X Plus | Reverse Complement | 1.5B | 2 | 100, 200, 300 | 720 | 1440 |
NovaSeq X Plus | 10B | 8 | 100, 200, 300 | 1200 | 9600 | |
NovaSeq X Plus | 25B | 8 | 300 | 3000 | 24000 |
Starting from DNA or RNA we can prepare libraries for specific nextgen sequencing applications. Our libraries are accurately sized and quantified to yield an optimal cluster density for the platform being used.
- mRNA-seq
- polyA mRNA capture enrichment
- ribosomal RNA reduction enrichment
- microRNA-seq
- Whole Genome (WGS)
- Whole Exome (WES) - available for both human and mouse using capture libraries from Twist Bioscience
- Custom enrichment using specifically-designed capture libraries from Twist
- ChIP-Seq/Cut&Run
- Small Genome Sequencing (for organisms with genome sizes < 20 kb)
- RRBS (Reduced Representation Bisulfite Sequencing)
- Whole Genome Methylation Sequencing
- 16S Metagenomics Sequencing
Enrichment modes for Twist products are optimized for single-plex or 8-plex enrichments. Pricing levels are set for these two enrichment modes. Optimal per sample pricing occurs when submissions contain even multiples of 8 samples.
Sequencing of externally-made library pools
We regularly plan and execute sequencing runs for libary pools that are made outside of our laboratory. We perform an extensive library quality control procedure for pools that are given to us to be sequenced so that we can ensure that these library pools are sized and quantified to yield an optimal cluster density for the platform being used, just like the libraries that we create in our lab.
Note: For pools containing a relatively small number of libraries, special care must be taken to ensure base diversity during the sequencing of index reads.
Please refer to the Illumina Knowledge article on index color balancing for the NovaSeq X/XPlus for details.
Careful consideration of the lines When sequencing on a NovaSeq X/X Plus, combine index sequences so that signal is present in both channels for every cycle whenever possible. It is acceptable to have signal only in the green channel from the T or C bases if needed.
Illumina recommends avoiding index combinations which only have signal in the blue channel from A or A+G in any given cycle.
is important.
For each library, we will need to obtain the sequence of the 'indexes' or 'barcodes' that will uniquely identify that library in the pool. This information can usually be found in the documentation of the library preparation kit that you have used to create your NGS libraries. We have included some commonly used, commercially-available kits above. If you have used another kit to create your libraries, you will need to use the custom option and enter the bases manually.
Please understand that we require information for both index1 (i7) and index2 (i5) for each library, even if your kit is only a 'single index' kit. We will almost uniformly sequence in a dual-indexed fashion, and the demultiplexing program will require the sequence generated by the index2 primer, even if it is uniform for all of the libraries.
The i7 index is always read in the 'forward' orientation on the P7 end of the library molecule.
5'-[INSERT]AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG
If you are unsure of how to find the location of the i5 index location inside of a 'universal' adapter, we have provided a few examples below as a guide.
The position of the barcode region is highlighted(NNNNNNNN), and the bases necessary for the sample sheet will be what is found in the adapter.
Even though the sequencing will be performed in the reverse complement direction, your index2 (i5) sequence provided to the sequencer for demultiplexing
will be the 'forward' direction of the (usually) 8 nucleotides following the p5 flowcell-binding region and immediately preceding the read 1 primer region,
which is typically either TruSeq (ACACTCTTTCCCTAC
) or Nextera (TCGTCGGCAGCGTC
).
- Illumina TruSeq - P5
5'-AATGATACGGCGACCACCGAGATCTACACNNNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCT[INSERT]
- Illumina Nextera - P5
5'-AATGATACGGCGACCACCGAGATCTACACNNNNNNNNTCGTCGGCAGCGTC[INSERT]
We can enrich genomic libraries using the following commercial products:
- Twist Bioscience Comprehensive Human Exome, resequence all the coding regions in the human genome using bead-bound DNA capture probes.
- Twist Bioscience Mouse Exome, resequence all the coding regions in the mouse genome using bead bound DNA capture probes. Enrichment modes for Twist products are optimized for single-plex or 8-plex enrichments. Pricing levels are set for these two enrichments modes. Optimal per sample pricing occurs when submissions contain even multiples of 8 samples.
Original Sample vials containing excess total RNA for RNA-seq (mRNA) and miR-seq (micro RNA) and DNA for Whole Genome Sequencing and Exome sequencing will be retained by the CCHMC Genomics Sequencing Facility for no more than 90 days past the submission date. After this time, all sample vials (regardless of presence of leftover sample volume) will be discarded. Any investigator who wishes to reclaim leftover sample volume must indicate this in the online request by checking the "Keep Leftover" checkbox for each sample. There is a minimum $15 per sample charge for retention and return. Arrangements to retrieve the sample must be made once the initial library preparation is complete and notification has been given that the sample is available. Samples will be kept for a maximum of three years after the submission date.
Data Retention Policy
NGS data (FASTQ, BAM, VCF files, etc...) will be kept on our servers for no more than 90 days after data generation and made available for investigators to download. After that period has passed the raw data used to generate these files will be archived for 365 days. Because of the low reliability of affordable archival systems we cannot guarantee that we will be able to retrieve all archived files. NGS data files that can be regenerated from this raw archive will be made available to the original investigator at a charge. Due the massive amounts of data generated for NGS runs, files from runs older than 365 days will be purged from the archival system and will become permanently unavailable. Please ensure that all of the data needed for your projects is downloaded and stored as soon as it is available. Because of the way raw data is organized we cannot always send individual notifications when we purge data out of the archive.
Library Retention Policy
The excess volume for all individual NGS libraries and library pools made by the CCHMC Genomics Sequencing Facility and pools of submitted libraries remaining after injection on the sequencer will be kept for up to three months following the intial sequencing of the library. Retrieval (if possible) will need to be requested and scheduled with a member of our staff. Occasionally, the entire library volume of some low-yield libraries or library pools may be exhausted in the initial sequencing run, for these libraries and pools retrieval is impossible. At the end of the three month window, libraries and library pools will be discarded.
All samples for NGS library creation and sequencing will undergo a quality control process by the CCHMC Genomics Sequencing Facility. Quality Control procedures will depend upon the sample type and desired library type. Investigators may be notified by email that individual samples have failed QC, and be offered three options for their NGS submission.
- Process the sample anyway with all the materials available, there is no guarantee that the data generated will be usable. You will be charged the full amount of the service cost for the sample.
- Get the sample back and remove it from the rest of the samples in this submission. You will be charged a minimum of $17.60 for QC and return of the sample. Please note that if you want to resubmit a more concentrated sample you will have to make a new request online.
- Discard the sample and remove it from the rest of the samples in this submission. You will be charged a minimum of $2.60 for QC of the sample.
The amount of starting material (Genomic DNA for whole genomes, exomes or custom enrichments, immunoprecipitated dsDNA for ChIP-seq or total RNA for mRNA-seq and microRNA-seq libraries) for NGS library creation is variable depending upon the library preparation type requested. The following table contains the minimal yield and maximum volume of each input for the major library preparation types that we offer. For all sample types or library preparation processes, we customarily request at least 2x the minimum yield without increasing the maximum volume at the time of submission.
Service | Input Material | Minimum Yield (ng) | Recommended Volume (uL) | Diluent | Minimum Concentration (ng/uL) | |
---|---|---|---|---|---|---|
ChIP-Seq | IP dsDNA | 1* | 15 | 10 mM Tris-HCl | 0.0667 | |
Cut&Run | dsDNA | 1* | 15 | 10 mM Tris-HCl | 0.0667 | |
Large Genome (WGS) | High MW genomic dsDNA | 300 | 20 | 10 mM Tris-HCl | 15 | |
Small Genome | High MW genomic dsDNA | 20 | 10 | 10 mM Tris-HCl | 2.0 | |
Exome | High MW genomic dsDNA | 300 | 20 | 10 mM Tris-HCl | 15 | |
Custom Enrichment | High MW genomic dsDNA | varies | varies | 10 mM Tris-HCl | varies | |
DNA-Seq (Classic TruSeq-like library prep; end repair, A-tail, adapter ligation) | High MW genomic dsDNA | 50 | 20 | 10 mM Tris-HCl | 2.5 | |
DNA-Seq (Nextera) | High MW genomic dsDNA | 20 | 10 | 10 mM Tris-HCl | 2.0 | |
RRBS-Seq | High MW genomic dsDNA | 200 | 20 | 10 mM Tris-HCl | 10 | |
RNA-seq | polyA stranded RNA-seq | total RNA | 300 | 30 | RNAse-free Water | 10 |
RiboZero RNA-Seq | total RNA | 400 | 15 | RNAse-free Water | 27 | |
miR-seq | total RNA | 1200 | 10 | RNAse-free Water | 120 | |
Pacific BioSciences Services | PacBio HiFi Genomes | High MW genomic dsDNA | 5000 | 50 | 10 mM Tris-HCl | 100 |
PacBio Iso-Seq | total RNA | 300 | 5 | RNAse-free Water | 60 | |
Oxford Nanopore gDNA | High MW genomic dsDNA | 1500 | 50 | 10 mM Tris-HCl | 30 | |
Externally made library pools | iSeq 100 | Illumina-compatible library pool | ** | 30 | 10 mM Tris-HCl | 5 nM |
MiSeq | Illumina-compatible library pool | ** | 30 | 10 mM Tris-HCl | 5 nM | |
NovaSeq SP, S1, S2 | Illumina-compatible library pool | ** | 40 | 10 mM Tris-HCl | 5 nM | |
NovaSeq S4 | Illumina-compatible library pool | ** | 80 | 10 mM Tris-HCl | 5 nM | |
NovaSeqX 1.5B, 10B | Illumina-compatible library pool | ** | 30 | 10 mM Tris-HCl | 5 nM | |
NovaSeqX 25B | Illumina-compatible library pool | ** | 60 | 10 mM Tris-HCl | 5 nM |
* For ChIP-seq and Cut&Run samples a yield of 5 ng - 30 ng of post-IP/enriched DNA is recommended. Lower-yield input amounts may result in a successful library preparation, but the chance for a library preparation failure increases. Fragment sizes between 100 bp and 300 bp are optimal and should make up the bulk of the yield for the highest chance of library preparation success.
** Completed library pools should be at least 5 nM with a minimum volume of 30 uL (see table). The minimum concentration is of the pool, not the individual libraries in the pool
All samples submitted for NGS library preparation and/or NGS sequencing must be submitted in specific containers (tubes). These tubes are supplied free of charge by the facility and can be obtained from the barcoding/dropoff location on R1. These tubes have an integrated barcode embedded in the tube itself and should not be affixed with an additional barcode label. The adhesive on the barcode stickers does not hold up well at -80 degrees, and may detach from the tube.
The use of these tubes is especially important for RNA samples which should be stored at -80 degrees and minimize the number of freeze/thaw cycles. If you need more than a few of these tubes, please inquire inside the lab.
**Note: The exception to the use of the Wheaton 1.0 mL E-Z Microtube for sample submission for NGS services is the 16S Metagenomics Sequencing service. Samples for this service should be submitted in 96-well pcr microplates and be accompanied by a file which details the plate positions and sample names for each sample. These 96-well plates may be requested from the CCHMC Genomics Sequencing Facility as well.
Sample submission to the CCHMC DNA Sequencing and Genotyping Core is a four-step process once you have registered an account (CCHMC Network ID and password) with the facility. Each step has to be completed before you can go on to the next one.
Step 1: Sample preparation
Depending on the service you are requesting, there are specific instructions to follow for successful sample preparation. Please refer to the list above of services for details. In most cases you will prepare only one tube per sample. For best results use the tubes or plates provided at no charge at one of our drop-off locations for all your sample submissions.Step 2: Electronic form submission of sample information
- Use your CCHMC network ID and password to log in to the system from any page of the website. If you need assistance, please see the Managing your account section, below.
- Select the button in the MyDNACore section and choose the NEXTGEN SEQUENCING submission category.
- The prompts will guide you through the process. You will need to have the following information to fill the electronic form: name of submission, billing account to be used, names and number of samples that you are submitting. For samples that we will perform and alignment to a reference genome or transcriptome, we will need to know the organism, reference genome version and possibly the transcript annotation to be used. For pools of completed libraries, we will require the indexes (molecular barcodes) that have been used in the library preparation for the demultiplexing process.
- Make sure to hit the button at the end of the process to complete your submission.
- A copy of the submission detail (Similar to what will be seen on an Invoice) are mailed to the PI and the submitter.
Step 3: Delivery of samples to one of our locations
After making the online submission, bring the samples to our 24-hour drop-off locations:- CCHMC, Research Building (Location R), 1st floor, in front of room R1031
- Samples can also be sent by mail to the following address:
CCHMC Genomics Sequencing Facility
TCHRF R1045A
3333 Burnet Avenue
Cincinnati, OH 45229
Step 4: Sample registration and barcoding
This step does not apply to samples sent by mail. In this case, the barcoding is done upon sample receipt by the facility personnel.The registration process is the assignment of a barcode identifier to a specific tube, strip or plate containing samples submitted for sequencing. It is done by using the computers with a barcode scanner at each of our drop-off locations.
There is a shortcut on the desktop of these computers called BARCODE REGISTRATION which will take you to the online registration form.
- Use the same CCHMC network ID and password used to make the online submission to log in to the system.
- Select the submission name for which you are bringing the samples to register from the list of submissions.
- A complete display of the tubes, strips or plates used in your submission and the name of samples they contain will be shown on screen.
- Proceeding in the order they are displayed scan the embedded 2D barcode for each tube using the barcode scanner.
- Make sure that each tube, strip or plate in your submission has been assigned the correct barcode before pressing the button. An alert will confirm that barcodes have been saved.
- Once the all samples are barcoded and registered you can place them inside the -20 freezer on any rack available. RNA samples have a special container inside the freezer that contains dry ice.
Most Next Generation Sequencing Data generated in our facility may be downloaded directly through our sftp interface by following the instructions found on our DOWNLOADING RESULTS page on the menu section of the site. Sequencing data is kept here for a minimum of 90 days prior to being removed or archived. See the NGS Data Retention Policy above. The sftp accounts (username, password) have a default expiration of 30 days after creation.
NGS results file naming conventions
- FASTQ files: We utilize the standard/default Illumina naming convention for the FASTQ files that we generate:
where:[sample_name]_S#_L00#_[R/I]#_001.fastq.gz
- [sample_name] is the name of the sample/library as it is submitted to us in our online submission interface.
- S# is the number of the sample in the sequencing run. Each unique sample_name will be assigned an individual sample number for that run. If the same sample is present on a subsequent flow cell run, this sample may or may not have the same sample number assigned to it.
- L00# is the lane number of the flow cell.
- [R/I]# is the read number. Libraries that are sequenced with in a 'Single End' specification will not contain a Read 2 file. If fastq files are created for the index reads, these will be indicated by 'I1' for the i7 index or 'I2' for the i5 index.
- 001 The last segment is always 001.
- fastq.gz Each file has been compressed with the gzip compression algorithm to conserve space.
- Alignment files: We will generate and make available for download .bam and .bai files for any library that we create in our laboratory. These files will be named with the following naming conventions:
and[pipeline_name]~[sample_name]~[reference].bam
where:[pipeline_name]~[sample_name]~[reference].bai
- [pipeline_name] is the name assigned to the alignment/analysis pipeline that was performed for this library. Pipeline names should contain descriptive information about the sample types and processes performed in the pipeline. For example, our default RNA-Seq alignment pipeline is named "mRNA-PE" because it generates files specific to RNA-Seq experiments on paired-end fastq files.
- [sample_name] is the name of the sample/library as it is submitted to us in our online submission interface.
- [reference] is the name of the reference genome used for the alignment pipeline.
The CCHMC Genomics Sequencing Facility no longer concatenates fastq files across lanes of a flow cell, nor across different flow cells for samples that are sequenced on more than one run. We used to do this prior to the implemenation of our sftp file server to allow for fewer downloads of files. This is no longer necessary, and had drawbacks in terms of bioinformatics analysis of the files, so we have discontinued this practice. Investigators will receive one fastq file for each read (R1/R2 and maybe I1/I2) of each lane (L00#) of each flow cell that the library was sequenced on. Files from different sequencing runs will be organized into subdirectories named by the flow cell ID (barcode) that the pool was run on.
We have created some documents that contain text that can be used in a methods section of a publication for some of our more common NGS services. Feel free to use some or all of the information in these documents when publishing the results of your experiments.
The CCHMC Genomics Sequencing Facility does not offer standalone bioinformatics services. We work in close conjunction with the Bioinformatics Collaborative Services (BCS) group within IS4R to facilitate informatic analysis of NextGeneration Sequencing sample sets. Information about the BCS can be found here.
User accounts are now linked to your CCHMC network ID and password. For investigators outside of CCHMC, a request for a CCHMC network ID will need to be made by contacting help-cores@bmi.cchmc.org. All IDs must be provisioned for use with our interface by facility administrators. If you are having difficulty in registering, or have an existing account that needs to be inactivated, please contact us at dnacore@cchmc.org or by phone at (513) 636-2801. Currently, user accounts are associated to a single principal investigator (PI), and linked to that PIs available PeopleSoft budget numbers. If you work for more than one PI, please contact us.