When I trim sequences for RNA-Seq, can I trim with a sliding window on quality score, or will the fact that the Illumina reads then have different lengths affect the FPKM values (using Cufflinks)? Should I instead trim a fixed number of bases from the 3' and 5' ends and keep all reads the same length?
-
Not sure if there are any papers on this yet, but I don't think so.
I tried the window-based trimmer from ea-utils on my RNA-Seq datasets and realigned; the results were extremely similar to those from my untrimmed datasets.
I should mention that this was very short read data (35-40 bp).
For longer reads with low-quality ends, trimming is of course more of an issue.
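To make "window-based trimming" concrete, here is a minimal sketch in Python. It is not the ea-utils algorithm. It assumes Phred+33 quality strings and cuts the read at the start of the first window whose mean quality drops below a threshold. The function names, window size and threshold are all hypothetical choices for illustration.

```python
def phred33_scores(quality_string):
    """Convert a Phred+33 encoded quality string to a list of integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def sliding_window_trim(seq, qual, window=4, min_mean_q=20):
    """Trim the 3' end of a read at the first low-quality window.

    Scans 5' -> 3'. As soon as a window of `window` bases has a mean
    quality below `min_mean_q`, the read is cut at the start of that
    window. Reads shorter than the window are returned unchanged.
    (Illustrative defaults, not taken from any particular tool.)
    """
    scores = phred33_scores(qual)
    if len(seq) != len(scores):
        raise ValueError("sequence and quality string differ in length")

    cut = len(seq)  # default: keep the whole read
    for start in range(0, len(scores) - window + 1):
        win = scores[start:start + window]
        if sum(win) / window < min_mean_q:
            cut = start
            break
    return seq[:cut], qual[:cut]


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
    print(sliding_window_trim("ACGTACGTAC", "IIIIII####"))
```

Note that reads come out with different lengths, which is exactly the situation the original question asks about.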
-
The point of trimming low-quality base calls is to remove bases that might be erroneous. A naive aligner would be deterred from a valid mapping by mismatches that are due only to sequencing errors. A quality-aware aligner, however, knows that a mismatch at a low-quality base is no reason to reject an otherwise good mapping, and will still report it. Many of the currently popular aligners work this way, i.e., they pay less attention to the low-quality ends of reads. By trimming, you basically take away the aligner's chance to make use of this feature. Moreover, trimming makes a hard-cut decision (everything below a quality threshold is removed), while a well-designed quality-aware aligner may make a more sophisticated, gradual decision.
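As a rough illustration of the "gradual decision" idea, the sketch below weights each mismatch by the Phred quality of the mismatched base, so a mismatch at a Q2 base costs far less than one at a Q40 base. It is loosely modelled on the Maq-style idea of summing quality values at mismatched positions. It is not the scoring code of any particular aligner, and the function names are hypothetical.

```python
def error_probability(phred_q):
    """Probability that a base call with Phred score `phred_q` is wrong."""
    return 10 ** (-phred_q / 10)


def mismatch_quality_sum(read, reference, qual):
    """Sum the Phred+33 qualities at positions where read and reference differ.

    A low total means the mismatches sit on unreliable bases and the
    alignment can still be trusted; a high total means confident base
    calls disagree with the reference. (Ungapped comparison only.)
    """
    return sum(
        ord(q_char) - 33
        for r_base, g_base, q_char in zip(read, reference, qual)
        if r_base != g_base
    )


if __name__ == "__main__":
    # Same read, quality string ends in a Q2 base ('#'); 'I' is Q40.
    print(mismatch_quality_sum("ACGT", "ACGA", "III#"))  # mismatch on the Q2 base
    print(mismatch_quality_sum("ACGT", "TCGT", "III#"))  # mismatch on a Q40 base
```

A hard trim would simply have deleted the last base in the first case; the weighted score keeps it but lets it count for very little.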
-
Thanks for the answer. Do you know which aligners work in this way? (I am working with Bowtie and Cufflinks.)
The reason we were thinking of trimming is that we are working with mixed communities (co-cultures), so we may have two bacteria in the same Illumina run. We then align all the data to each of the two genomes, allowing zero mismatches.
Given this, would your answer still be not to trim?
And just for my general knowledge: do you know whether reads of different lengths (if we did trim) would affect the final FPKM calculations?
Thanks!
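For reference, the textbook FPKM definition (fragments per kilobase of transcript per million mapped fragments) can be sketched as below. This is only the basic formula, not Cufflinks' actual estimator, which as far as I know also models effective transcript length and multi-mapped fragments. The function and parameter names are hypothetical.

```python
def fpkm(fragments_on_transcript, transcript_length_bp, total_mapped_fragments):
    """Basic FPKM: fragments per kilobase of transcript per million mapped.

    FPKM = 1e9 * C / (L * N), where C is the number of fragments assigned
    to the transcript, L its length in bases, and N the total number of
    mapped fragments in the sample.
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total fragment count must be positive")
    return fragments_on_transcript * 1e9 / (
        transcript_length_bp * total_mapped_fragments
    )


if __name__ == "__main__":
    # 500 fragments on a 2 kb transcript, 10 million mapped fragments in total.
    print(fpkm(500, 2000, 10_000_000))
```

In this basic form the length of the individual reads does not appear at all; only fragment counts and transcript length do. Trimming would therefore change the value mainly through which reads end up mapping.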