I am analysing Nextera paired-end sequencing data and have discovered that ~25% of the reads align in an RF orientation, but I'm not sure why. According to the biochemistry of the sample preparation, there should only be FR orientated reads.
-
There are tools out there that will incorrectly call read orientation if the insert size is shorter than read length. How, specifically, did you do preprocessing (such as adapter-trimming), mapping, and calculate the orientations?
Also, is this a Nextera fragment or LMP library? The LMP libraries can produce output in either orientation.
Last edited by Brian Bushnell; 02-27-2015, 04:18 PM.
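To illustrate the first point, here is a minimal sketch (not the logic of any particular tool) of how two ways of deciding orientation can disagree once the insert is shorter than the read length. The helper name `classify_pair` and both rules are assumptions made up for illustration: the "leftmost" rule only compares alignment start coordinates, while the "fiveprime" rule compares the 5' end of the forward-strand mate with the 5' end of the reverse-strand mate. The coordinates in the calls are hypothetical.

```shell
# classify_pair RULE FWD_START FWD_END REV_START REV_END
# Coordinates are 1-based, inclusive alignment coordinates of the
# forward-strand mate and the reverse-strand mate (hypothetical helper).
classify_pair() {
  awk -v rule="$1" -v fstart="$2" -v fend="$3" -v rstart="$4" -v rend="$5" 'BEGIN {
    if (rule == "leftmost") {
      # Naive rule: the mate that starts further left is treated as first.
      print ((fstart <= rstart) ? "FR" : "RF")
    } else {
      # 5-prime rule: the forward mate 5-prime end is its start,
      # the reverse mate 5-prime end is its alignment end.
      print ((fstart < rend) ? "FR" : "RF")
    }
  }'
}

# Hypothetical 50 bp fragment at 1000..1049, fully covered by both mates,
# where two bases were trimmed from the 5' end of the forward mate.
classify_pair leftmost  1002 1049 1000 1049   # prints RF
classify_pair fiveprime 1002 1049 1000 1049   # prints FR
```

Under these assumptions the same pair is called RF by one rule and FR by the other, which is why the exact trimming and orientation-calling steps matter.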
-
I trimmed the adapters using Trimmomatic in PE mode using the Nextera-PE adapters provided. I then mapped the trimmed reads using Bowtie2, with max insert size set to 2000, and removed PCR duplicates using Picard's MarkDuplicates. I then wanted to look at the insert size distribution, and that's when Picard's CollectInsertSizeMetrics reported that I had both RF and FR orientated reads.
If I plot a graph of the RF insert sizes, most of them are the size of my reads (~125 bp), but I still get a lot which are significantly bigger (~500-900 bp). It's a bit odd that 75% map in the correct FR orientation and 25% map in the RF orientation.
At first I thought this might be to do with the tagmentation process used by the transposomes. The transposomes are dimers made up of two monomers, each of which has a single primer attached. The monomers are made by mixing them in a solution with 50% forward primers and 50% reverse primers. The monomers then dimerise and are used for tagmentation. That means that 25% of the transposome dimers have two forward primers, 25% have two reverse primers and 50% have both a forward and a reverse primer.
The sample is from an ATAC-seq experiment. It does not have a circularised fragment step, so I don't think it is an LMP library, but a Nextera fragment library.
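The 25% / 25% / 50% split of dimers described above is what you get if monomers pair at random. A small sketch of that arithmetic, assuming independent pairing and nothing else about the chemistry; `dimer_fractions` is a made-up helper name:

```shell
# dimer_fractions P_FORWARD
# P_FORWARD: fraction of monomers loaded with the forward primer
# (0.5 in the description above). Hypothetical helper for illustration.
dimer_fractions() {
  awk -v p="$1" 'BEGIN {
    q = 1 - p                                   # fraction with the reverse primer
    printf "forward/forward %.2f\n", p * p
    printf "reverse/reverse %.2f\n", q * q
    printf "forward/reverse %.2f\n", 2 * p * q  # either order of the two monomers
  }'
}

dimer_fractions 0.5
# forward/forward 0.25
# reverse/reverse 0.25
# forward/reverse 0.50
```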
Commands used:
# Trimming
java -jar Trimmomatic.jar PE \
-threads 1 \
sample_1.fastq \
sample_2.fastq \
sample_1P.fastq \
sample_1U.fastq \
sample_2P.fastq \
sample_2U.fastq \
ILLUMINACLIP:NexteraPE-PE.fa:2:30:10:1:true \
LEADING:20 \
TRAILING:20 \
SLIDINGWINDOW:4:15 \
MINLEN:15
# Mapping
bowtie2 -X 2000 \
-p 4 \
-x genome \
-1 sample_1P.fastq \
-2 sample_2P.fastq \
-S sample.sam
# Remove PCR duplicates
java -Xmx2g -XX:ParallelGCThreads=4 -jar Picard.jar MarkDuplicates \
I=sample.sorted.bam \
O=sample.sorted.rmdup.bam \
M=sample.sorted.rmdup.pcrMetrics \
REMOVE_DUPLICATES=true \
ASSUME_SORTED=true \
VALIDATION_STRINGENCY=LENIENT
# Collect insert sizes
java -jar Picard.jar CollectInsertSizeMetrics \
I=sample.sorted.rmdup.bam \
O=sample_insertsize.metrics \
H=sample_insertsize.pdf \
VALIDATION_STRINGENCY=LENIENT
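For reference, the per-orientation split can be read straight out of the metrics file written by the last command. This is only a sketch: it assumes the metrics table is tab-separated, has columns named PAIR_ORIENTATION and READ_PAIRS, and ends at the first blank line. `orientation_fractions` is a made-up helper name, not part of Picard.

```shell
# orientation_fractions METRICS_FILE
# Prints each PAIR_ORIENTATION value with its share of READ_PAIRS.
# Columns are located by header name rather than by position.
orientation_fractions() {
  awk -F'\t' '
    # Until the header row is found, scan each line for the column names.
    !po {
      for (i = 1; i <= NF; i++) {
        if ($i == "PAIR_ORIENTATION") po = i
        if ($i == "READ_PAIRS") rp = i
      }
      next
    }
    # Data rows follow the header until the first blank line.
    NF == 0 { exit }
    { pairs[$po] += $rp; total += $rp }
    END {
      if (total > 0)
        for (o in pairs) printf "%s\t%.1f%%\n", o, 100 * pairs[o] / total
    }
  ' "$1"
}
```

It would be called as `orientation_fractions sample_insertsize.metrics`. The order of the output lines is not fixed, so pipe through `sort` if that matters.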
-
wrong RF orientated reads in atac-seq bam file
Originally posted by jmacrm91: I am analysing Nextera paired-end sequencing data and have discovered that ~25% of the reads align in an RF orientation, but I'm not sure why. According to the biochemistry of the sample preparation, there should only be FR orientated reads.
Karen