SEQanswers

Old 10-18-2014, 12:21 AM   #1
sulicon
Member
 
Location: Los Angeles

Join Date: Aug 2010
Posts: 41
Default Weird results at the 51st nucleotide

Hi all,

We just got some bisulfite sequencing data from a set of patients. After running FastQC on a few samples, we observed an extremely low quality score at position #51 in two of them.

Attached are the distribution of base quality scores, % of Ns, and the nucleotide composition at each position.

My question is, what is the possible reason for this? Should we trim all the nucleotides after 50?

Thanks in advance!
Attached Images
File Type: png per_base_quality.png (12.9 KB, 20 views)
File Type: png per_base_n_content.png (18.8 KB, 11 views)
File Type: png per_base_sequence_content.png (29.9 KB, 10 views)
Old 10-18-2014, 01:21 AM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

There was probably a bubble that floated through the flowcell. That's a pretty common cause of random dips in quality. I wouldn't bother trimming that off, since the dip in quality only affects a few bases. Any of the common aligners should still be able to align the reads accurately. Just choose a tool that lets you set a minimum Phred score during methylation extraction (i.e., one that will ignore methylation calls at those crappy bases).
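To make the minimum-Phred idea concrete, here is a minimal sketch of what such a filter does. It is not the code of any particular methylation extractor: the function names, the `(position, call)` tuple layout, and the "M"/"U" labels are all hypothetical, and Phred+33 quality encoding is assumed.

```python
def phred33(qual_char):
    """Convert one Phred+33 encoded quality character to an integer score."""
    return ord(qual_char) - 33


def filter_calls_by_quality(calls, qualities, min_phred=20):
    """Keep only calls whose underlying base has quality >= min_phred.

    calls     -- hypothetical list of (0-based position in read, label) tuples
    qualities -- the read's quality string (Phred+33 assumed)
    """
    return [(pos, label) for pos, label in calls
            if phred33(qualities[pos]) >= min_phred]


# Toy read: position 2 has quality '#' (Phred 2), the rest 'I' (Phred 40).
quals = "II#II"
calls = [(0, "M"), (2, "U"), (4, "M")]
kept = filter_calls_by_quality(calls, quals, min_phred=20)
print(kept)
```

The call at the low-quality position is dropped while the read itself stays aligned, which is why trimming is not strictly needed for an isolated dip.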
Old 10-18-2014, 09:51 PM   #3
sulicon
Member
 
Location: Los Angeles

Join Date: Aug 2010
Posts: 41
Default

Thank you Devon for the explanation and suggestion.
Old 10-19-2014, 04:44 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,975
Default

Were these two samples on the same lane? Were there other samples besides these two in the lane that are not showing this problem?

You should at least request your sequence provider to re-run the sample(s) at no cost (unless there were other samples in that lane that do not have this problem).
Old 10-20-2014, 12:17 PM   #5
sulicon
Member
 
Location: Los Angeles

Join Date: Aug 2010
Posts: 41
Default

We got the data from another group. No barcodes were used, so I assume each sample was sequenced on its own lane (right?).

We have checked about 10 samples. The two samples mentioned above have the problem at exactly the same position (#51). Both are in the control group.

There are also other samples with problems at different positions. Three of the case samples we checked have low-quality base calls at position #63 (see attached).

It seems that the position of the abnormal base-calling differs from run to run.
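A quick way to check this across many files without opening each FastQC report is to compute the mean quality per cycle and report the worst one. The sketch below is only an illustration: it assumes uncompressed 4-line FASTQ records with Phred+33 qualities, and the function names are made up for this example.

```python
import io


def mean_quality_per_cycle(fastq_handle):
    """Return a list of mean Phred scores, one per read position (cycle)."""
    sums, counts = [], []
    for line_no, line in enumerate(fastq_handle):
        if line_no % 4 != 3:          # the quality string is every 4th line
            continue
        for i, ch in enumerate(line.rstrip("\n")):
            if i == len(sums):        # first read that reaches this cycle
                sums.append(0)
                counts.append(0)
            sums[i] += ord(ch) - 33   # Phred+33 assumed
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


def worst_cycle(means):
    """Return the 1-based cycle with the lowest mean quality."""
    return min(range(len(means)), key=means.__getitem__) + 1


# Toy data standing in for a real FASTQ file.
toy = io.StringIO(
    "@r1\nACGTA\n+\nII#II\n"
    "@r2\nACGTA\n+\nII#II\n"
)
means = mean_quality_per_cycle(toy)
print(worst_cycle(means))
```

For a real file, pass an open file handle instead of the `io.StringIO` object; samples from the same run should then report the same worst cycle.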

Quote:
Originally Posted by GenoMax
Were these two samples on the same lane? Were there other samples besides these two in the lane that are not showing this problem?

You should at least request your sequence provider to re-run the sample(s) at no cost (unless there were other samples in that lane that do not have this problem).
Attached Images
File Type: png per_base_quality.png (12.4 KB, 7 views)
File Type: png per_base_n_content.png (17.3 KB, 3 views)
File Type: png per_base_gc_content.png (17.3 KB, 3 views)
File Type: png per_base_sequence_content.png (30.5 KB, 4 views)
Old 10-20-2014, 05:21 PM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,975
Default

You are correct that your samples must have been run in separate lanes if there are no barcodes. It is possible that one or more lanes had a bubble flow through that affected the basecalls. You can look at the tile representation from FastQC to see if you can identify bad tiles across cycles.
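If the FastQC tile plot is not available, the same question can be asked directly of the FASTQ file. The sketch below groups quality at one cycle by tile. It assumes CASAVA 1.8+ style read names (`@instrument:run:flowcell:lane:tile:x:y ...`), 4-line records and Phred+33 qualities; older read-name formats put the tile in a different field, and the function name is hypothetical.

```python
import io
from collections import defaultdict


def mean_quality_by_tile(fastq_handle, cycle):
    """Mean Phred score at one 1-based cycle, grouped by flowcell tile."""
    totals = defaultdict(lambda: [0, 0])   # tile -> [sum of scores, read count]
    tile = None
    for line_no, line in enumerate(fastq_handle):
        line = line.rstrip("\n")
        if line_no % 4 == 0:
            # Tile is the 5th colon-separated field of the read name
            # (assumption: CASAVA 1.8+ header layout).
            tile = line.split(" ")[0].split(":")[4]
        elif line_no % 4 == 3 and len(line) >= cycle:
            totals[tile][0] += ord(line[cycle - 1]) - 33
            totals[tile][1] += 1
    return {t: s / n for t, (s, n) in totals.items()}


# Toy data: tile 1101 has a bad base at cycle 2, tile 1102 does not.
toy = io.StringIO(
    "@M1:1:FC:1:1101:10:20 1:N:0:1\nACG\n+\nI#I\n"
    "@M1:1:FC:1:1102:11:21 1:N:0:1\nACG\n+\nIII\n"
)
by_tile = mean_quality_by_tile(toy, cycle=2)
print(by_tile)
```

A bubble would be expected to hit some tiles much harder than others at the affected cycle, whereas a uniform drop across all tiles points to something else.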

In general, Q-scores take a nose dive when the nucleotide diversity no longer exists (or is significantly reduced). It appears that the majority of the data in the example above consists of N's beyond cycle #63 and would likely need to be trimmed before analysis.
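Trimming to a fixed length is simple enough to sketch by hand, although in practice a dedicated trimming tool would normally be used. This example assumes uncompressed 4-line FASTQ records; `trim_fastq` and its `keep` parameter are names invented for the illustration.

```python
import io


def trim_fastq(in_handle, out_handle, keep):
    """Copy FASTQ records, cutting sequence and quality to the first `keep` bases."""
    for line_no, line in enumerate(in_handle):
        line = line.rstrip("\n")
        if line_no % 4 in (1, 3):     # sequence line and quality line
            line = line[:keep]
        out_handle.write(line + "\n")


# Toy record whose last two bases are N's with very low quality.
src = io.StringIO("@r1\nACGTNN\n+\nIIII##\n")
dst = io.StringIO()
trim_fastq(src, dst, keep=4)
trimmed = dst.getvalue()
print(trimmed)
```

For the samples discussed here, `keep=62` would retain everything before the problematic cycle #63.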
Old 10-24-2014, 10:38 AM   #7
sulicon
Member
 
Location: Los Angeles

Join Date: Aug 2010
Posts: 41
Default

Thank you GenoMax. I will trim the low quality bases as you suggested.

Tags
bisulfite sequencing, fastqc, pre-processing, quality score
