Thanks, Brian. I am sure this is where I show my ignorance, but how did the reads become so short? Looking at what I pulled out of the SAM file, the first few matches are full-length (300 bp) reads, but after that they turn into those little buggers as well.
BWA-mem produces 'chimeric alignments'. This is actually a really neat feature in some cases, and a big pain in other cases - in my opinion, it should be disabled by default.
If you look at the SAM lines you posted, most of them have a bitflag (the second column) of 2048 or higher. That means the 0x800 bit is set, which the SAM spec calls a "supplementary alignment", i.e. one piece of a chimeric alignment. BWA-mem appears to do multiple local alignments on each read: if there is a really good match for the first 20% somewhere, that is reported as one line in the SAM file, and if there is a really good match for the middle 40% somewhere else, that is reported as a different line, and so on. So a single read can generate a large number of lines in the SAM file. The goal is to correctly map reads that really are chimeric (such as reads from a cancer sample with two chromosomes randomly fused together). But apparently it does not work well on extreme-GC genomes. Most mappers are designed for human and mouse genomes, which have moderate GC content (around 40%), because those make up the majority of genetic research. Since I work at a place that deals strictly with microbial, plant, and fungal genomes, BBMap (which was originally designed for human) is now developed for and tested on a much wider array of organisms than most.
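To make the flag check concrete, here is a minimal Python sketch of how the bitflag can be tested. It only assumes the standard SAM layout (tab-separated fields, flag in the second column, header lines starting with "@"); the function names are made up for illustration and are not part of BWA, samtools, or BBMap.

```python
# Sketch only: helper names are hypothetical, not from any real tool.
SUPPLEMENTARY = 0x800  # decimal 2048: supplementary (chimeric) alignment bit


def is_supplementary(flag: int) -> bool:
    """Return True if the 0x800 bit is set in a SAM bitflag."""
    return bool(flag & SUPPLEMENTARY)


def drop_supplementary(sam_lines):
    """Yield header lines and every alignment line without the 0x800 bit."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
            continue
        flag = int(line.split("\t")[1])  # second column is the bitflag
        if not is_supplementary(flag):
            yield line
```

For example, a flag of 2064 is 2048 + 16, so it is a supplementary alignment on the reverse strand, while a flag of 16 alone is an ordinary reverse-strand alignment.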
BWA's chimeric alignments are local and hard-clipped. For example, the CIGAR string from the second line you posted, "221H79M", means that the first 221 bases were hard-clipped (dropped from the record) and only the last 79 bases are included in the alignment. Of course, this will wreak havoc with something like FastQC, where all reads are weighted equally regardless of length. Rather than a length filter (which would unnecessarily exclude reads that had been adapter- or quality-trimmed), I think you should simply use samtools to filter out reads with the chimeric flag set (for example, samtools view -F 2048).
Last edited by Brian Bushnell; 08-02-2014, 12:00 PM.
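As a rough illustration of how a CIGAR string like "221H79M" breaks down, the following Python sketch counts aligned versus clipped read bases. The function name is hypothetical, and the grouping of operations (M, I, =, X as aligned read bases; S and H as clipped) follows my reading of the SAM spec rather than any particular tool's code.

```python
import re

# One CIGAR element: a length followed by a single operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_lengths(cigar: str):
    """Return (aligned_read_bases, clipped_bases) for a CIGAR string."""
    aligned = 0
    clipped = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "MI=X":   # read bases that are part of the alignment
            aligned += n
        elif op in "SH":   # soft- or hard-clipped read bases
            clipped += n
        # D, N, and P consume no read bases, so they are ignored here
    return aligned, clipped
```

Calling cigar_lengths("221H79M") gives 79 aligned and 221 clipped bases, which is why that record shows up as a 79 bp "read" even though the original read was 300 bp.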
Originally posted by Genomics101:
Thanks very much, GenoMax. Indeed, it is MiSeq data, but I never had this problem with MiSeq before (that was with 250 bp PE reads; these are 300s). Can you tell me more about the particular pathology with MiSeq? Is this a problem with library construction? And, goodness, what is an adapter lawn?