SEQanswers

Old 05-12-2011, 05:32 AM   #1
xquan
Junior Member
 
Location: london

Join Date: Nov 2010
Posts: 7
Default Very Bad Mapping Results with several mapping softwares

I am new to NGS data analysis and am trying to map my genome reads from the Illumina platform (paired-end reads with read lengths of 100 and 150 bp and an insert length of 350 bp, and mate pair reads with a read length of 38 bp and insert lengths of 2 kb and 5 kb) to a reference genome (not the same species, but in the same family). I have tried to align these reads to the reference genome with BOWTIE and BWA.
No reads are mapped by BOWTIE at all with the following setting:
./bowtie REF --fr -I 320 -X 420 -q -1 GAIIx_150bp_1.fastq -2 GAIIx_150bp_2.fastq --fr -I 320 -X 420 -q -1 HiSeq_100bp_1.fastq -2 HiSeq_100bp_2.fastq --ff -I 1300 -X 3500 -q -1 HiSeq_s_1_1_QuaControled.fastq -2 HiSeq_s_1_2_QuaControled.fastq --ff -I 3300 -X 7600 -q -1 HiSeq_s_2_1_QuaControled.fastq -2 HiSeq_s_2_2_QuaControled.fastq -k 1 -m 1 -v 3 --al Palm_Hits_Bowtie --un Palm_NoHits_Bowtie Palm_Bowtie.sam -S --tryhard > bowtie.output


What's wrong with my setting?
I also tried BWA on one lane of 150 bp reads. With the default settings only 0.26% of the reads were mapped, and 0.36% were mapped with the following commands:
bwa aln -n 7 -o 3 REF.fsa GAIIx_s_1_1.fastq > GAIIx_s_1_1_bwa.sai
bwa aln -n 7 -o 3 REF.fsa GAIIx_s_1_2.fastq > GAIIx_s_1_2_bwa.sai
bwa sampe REF.fsa GAIIx_s_1_1_bwa.sai GAIIx_s_1_2_bwa.sai GAIIx_s_1_1.fastq GAIIx_s_1_2.fastq > GAIIx_s_1_bwa.sam

All of these reads went through the following quality control steps:
1. Convert base quality scores from Phred+64 to Phred+33 encoding
2. Adapter trimming:
adapter sequence for read 1: GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG;
adapter sequence for read 2: ACACTCTTTCCCTACACGACGCTCTTCCGATCT;
sequence to trim from read 1: AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCG
sequence to trim from read 2: AGAAAGGGATGTGCTGCGAGAAGGCTAGA
minimum adapter alignment length is set to 10;
base quality score threshold is set to 20;
minimum read length after adapter trimming and quality score filtering is set to 120 for GAIIx 150 bp reads, 70 for HiSeq 100 bp reads, and 30 for HiSeq 38 bp reads.
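
For reference, step 1 above can be sketched in Python roughly as follows. This is only an illustration of the re-encoding, not the tool that was actually used; the function name `phred64_to_phred33` is made up for this example. It refuses characters that cannot be Phred+64, which helps catch a file that was already Phred+33 and would otherwise be converted twice.

```python
def phred64_to_phred33(quality: str) -> str:
    """Re-encode one FASTQ quality string from Phred+64 to Phred+33."""
    converted = []
    for ch in quality:
        score = ord(ch) - 64  # Phred score under the +64 offset
        if score < 0:
            # A character below '@' cannot be Phred+64, so the input is
            # probably Phred+33 already (or Solexa-scaled): stop here.
            raise ValueError(f"{ch!r} is not a valid Phred+64 character")
        converted.append(chr(score + 33))  # same score, +33 offset
    return "".join(converted)
```

Only the fourth line of each FASTQ record should be passed through this function; the sequence and header lines stay unchanged.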

Is it possible that my quality control went wrong?

Any help will be greatly appreciated
Old 05-12-2011, 06:07 AM   #2
xquan
Junior Member
 
Location: london

Join Date: Nov 2010
Posts: 7
Default

The query genome (3 Gb) is much larger than the reference genome (500 Mb), and the reference consists of de novo assembly contigs. Even so, the fraction of mapped reads should not be this low.
Old 05-19-2011, 12:22 AM   #3
blackjimmy
Junior Member
 
Location: shanghai

Join Date: Mar 2009
Posts: 4
Default

We have run into the same problem, except that our tag size was 36 bp.
21M reads passed filtering, but when aligned with bowtie only about 10k reads mapped to the reference genome. We also found a huge number of duplicate reads in our FASTQ file.
Does Illumina provide official quality control results that tell us whether our sequencing run is OK? Thanks a lot!
Old 05-19-2011, 03:23 AM   #4
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249
Default

I highly recommend doing your own QC with a program like FASTQC or FASTX and analyzing the quality metrics in each lane.
Old 05-19-2011, 07:15 AM   #5
xquan
Junior Member
 
Location: london

Join Date: Nov 2010
Posts: 7
Default

Quote:
Originally Posted by blackjimmy View Post
We have run into the same problem, except that our tag size was 36 bp.
21M reads passed filtering, but when aligned with bowtie only about 10k reads mapped to the reference genome. We also found a huge number of duplicate reads in our FASTQ file.
Does Illumina provide official quality control results that tell us whether our sequencing run is OK? Thanks a lot!
The only Illumina quality control I know of is the chastity filter. Two lanes of our data completely failed to pass it, and I did not use any of the reads that failed the chastity filter. Does anyone know of other Illumina quality controls?
Old 05-19-2011, 07:17 AM   #6
xquan
Junior Member
 
Location: london

Join Date: Nov 2010
Posts: 7
Default

Quote:
Originally Posted by zee View Post
I highly recommend doing your own QC with a program like FASTQC or FASTX and analyzing the quality metrics in each lane.
I have run a QC check with FASTQC, and after my QC the reads from all lanes got good results except for the k-mer analysis (which gave a yellow warning sign).
Old 05-19-2011, 07:33 AM   #7
glacerda
Member
 
Location: Brazil

Join Date: Aug 2008
Posts: 27
Default

The Illumina mate pair libraries used to be in reverse-forward orientation ( --rf parameter ). Unless something has changed in the mate pair protocol, this could be the cause of the bad mapping.
Old 05-19-2011, 10:01 AM   #8
xquan
Junior Member
 
Location: london

Join Date: Nov 2010
Posts: 7
Default

Quote:
Originally Posted by glacerda View Post
The Illumina mate pair libraries used to be in reverse-forward orientation ( --rf parameter ). Unless something has changed in the mate pair protocol, this could be the cause of the bad mapping.
Do you mean that I should use --rf instead of --fr for the paired-end reads? I thought mate pair reads should be in forward-forward orientation and use --ff.
Old 05-19-2011, 10:35 AM   #9
glacerda
Member
 
Location: Brazil

Join Date: Aug 2008
Posts: 27
Default

Hi xquan,

Illumina mate pair libraries are supposed to contain outward-facing reads ( <-- --> ) and we should use --rf in bowtie. Illumina mate pair libraries are usually used for long insert lengths, greater than 2 kbp.

Illumina paired-end libraries are supposed to contain inward-facing reads ( --> <-- ) and we should use --fr in bowtie. Illumina paired-end libraries are usually used for short insert lengths (at most 500 bp).

As far as I can remember, 454 and SOLiD use forward-forward ( --> --> )
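
If the library orientation is in doubt, one way to check it is to map each mate separately as single-end reads and then look at how the two mates of each pair landed. A minimal Python sketch, assuming you already have the leftmost mapping position and strand of both mates on the same reference sequence (the function names are invented for this example, and the labels follow the arrow diagrams above rather than any aligner's internal definition):

```python
from collections import Counter


def pair_orientation(pos1: int, reverse1: bool, pos2: int, reverse2: bool) -> str:
    """Classify one pair from each mate's leftmost position and strand.

    Returns "fr" for inward-facing mates ( --> <-- ), "rf" for
    outward-facing mates ( <-- --> ) and "ff" when both mates map to
    the same strand ( --> --> ).
    """
    if reverse1 == reverse2:
        return "ff"
    # Pick out which mate is on the forward strand and which is reversed.
    forward_pos = pos2 if reverse1 else pos1
    reverse_pos = pos1 if reverse1 else pos2
    # Forward mate upstream of the reverse mate means they face each other.
    return "fr" if forward_pos <= reverse_pos else "rf"


def tally_orientations(pairs) -> Counter:
    """Count orientations over an iterable of (pos1, rev1, pos2, rev2) tuples."""
    return Counter(pair_orientation(*pair) for pair in pairs)
```

If most uniquely mapped pairs from a lane come out as "rf", that lane should be run with --rf; a majority of "ff" would support the sequencing provider's description instead.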
Old 05-19-2011, 12:05 PM   #10
xquan
Junior Member
 
Location: london

Join Date: Nov 2010
Posts: 7
Default

Quote:
Originally Posted by glacerda View Post
Hi xquan,

Illumina mate pair libraries are supposed to contain outward-facing reads ( <-- --> ) and we should use --rf in bowtie. Illumina mate pair libraries are usually used for long insert lengths, greater than 2 kbp.

Illumina paired-end libraries are supposed to contain inward-facing reads ( --> <-- ) and we should use --fr in bowtie. Illumina paired-end libraries are usually used for short insert lengths (at most 500 bp).

As far as I can remember, 454 and SOLiD use forward-forward ( --> --> )

Thanks very much! I will confirm this with the sequencing company (who told me that their library preparation for mate pair is forward-forward) and try to run bowtie with --rf again.
Old 05-22-2011, 06:28 AM   #11
DZhang
Senior Member
 
Location: East Coast, US

Join Date: Jun 2010
Posts: 177
Default

Hi,

In this case, I usually try aligning the data from one end to the reference as single-fragment to see what percentage of reads are mapped.
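
A rough Python sketch of the "what percentage mapped" check, assuming the single-end alignment was written as a SAM file. It relies only on the standard SAM FLAG bits (0x4 for unmapped, 0x100 and 0x800 for secondary and supplementary lines); the function name is made up for this example.

```python
def percent_mapped(sam_lines) -> float:
    """Return the percentage of reads in a SAM stream that are mapped."""
    total = mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if flag & 0x900:
            continue  # secondary/supplementary line: not a new read
        total += 1
        if not flag & 0x4:  # bit 0x4 set means the read is unmapped
            mapped += 1
    return 100.0 * mapped / total if total else 0.0
```

It can be fed an open file handle, for example `percent_mapped(open("one_end.sam"))`, where the file name is just a placeholder.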

Douglas
www.contigexpress.com
Old 05-22-2011, 01:01 PM   #12
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392
Default

Have you tried Blasting some of the reads? You will sometimes be surprised by what you find when doing this.
Old 05-22-2011, 01:19 PM   #13
DZhang
Senior Member
 
Location: East Coast, US

Join Date: Jun 2010
Posts: 177
Default

Hi chadn737,

That's a great point. Oftentimes the simplest approach is the best one. In one project, I randomly chose 10 reads and BLASTed them. They all came back mapping to an rRNA gene. No other approach is faster than BLAST for finding this out.
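
Picking the random reads can be done in one pass over the FASTQ file. The Python sketch below uses reservoir sampling and returns FASTA text that can be pasted into a BLAST query box. It assumes plain four-line FASTQ records; the function name and the default of 10 reads are choices made for this example.

```python
import random


def sample_reads_as_fasta(fastq_lines, n=10, seed=None) -> str:
    """Pick up to n reads uniformly at random and return them as FASTA text."""
    rng = random.Random(seed)
    sample = []  # the reservoir of (name, sequence) tuples
    record = []  # lines of the FASTQ record currently being read
    seen = 0     # number of complete records read so far
    for line in fastq_lines:
        record.append(line.rstrip("\n"))
        if len(record) < 4:
            continue
        name, seq = record[0][1:], record[1]  # drop the leading '@'
        record = []
        seen += 1
        if len(sample) < n:
            sample.append((name, seq))
        else:
            # Keep the new read with probability n / seen.
            j = rng.randrange(seen)
            if j < n:
                sample[j] = (name, seq)
    return "".join(f">{name}\n{seq}\n" for name, seq in sample)
```

Because the reservoir never holds more than n reads, this works on a whole lane without loading the file into memory.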

Douglas
www.contigexpress.com
Old 05-22-2011, 11:31 PM   #14
stoker
Member
 
Location: Poland

Join Date: Oct 2010
Posts: 17
Default

I have observed that Illumina instruments have different filter configurations. If your filters have been mounted incorrectly, in the wrong positions (this can happen with a new device or after a service repair), then you may need to swap bases in your reads: A to C, G to T, and vice versa. We have met this problem in our lab.
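
A minimal Python sketch of that base swap, assuming the mix-up is exactly the A/C and G/T exchange described above (the table and function names are made up for this example; a different filter fault would need a different table):

```python
# A <-> C and G <-> T; any other character (e.g. N) is left untouched.
SWAP_TABLE = str.maketrans("ACGTacgt", "CATGcatg")


def swap_filter_bases(seq: str) -> str:
    """Undo an A/C and G/T base-call swap in one read sequence."""
    return seq.translate(SWAP_TABLE)
```

Only the sequence line of each FASTQ record should be translated; the quality string stays as it is. Re-mapping a small subset of swapped reads is a cheap way to see whether the mapping rate jumps.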
__________________
Tomasz Stokowy
www.sequencing.io.gliwice.pl