  • Very Bad Mapping Results with Several Mapping Programs

    I am new to NGS data analysis and am trying to map genome reads from the Illumina platform (paired-end reads with read lengths of 100 and 150 bp and an insert length of 350 bp, plus mate-pair reads with a read length of 38 bp and insert lengths of 2 kb and 5 kb) to a reference genome from a different species in the same family. I have tried aligning these reads to the reference genome with both BOWTIE and BWA.
    No reads are mapped by BOWTIE at all with the following setting:
    ./bowtie REF --fr -I 320 -X 420 -q -1 GAIIx_150bp_1.fastq -2 GAIIx_150bp_2.fastq --fr -I 320 -X 420 -q -1 HiSeq_100bp_1.fastq -2 HiSeq_100bp_2.fastq --ff -I 1300 -X 3500 -q -1 HiSeq_s_1_1_QuaControled.fastq -2 HiSeq_s_1_2_QuaControled.fastq --ff -I 3300 -X 7600 -q -1 HiSeq_s_2_1_QuaControled.fastq -2 HiSeq_s_2_2_QuaControled.fastq -k 1 -m 1 -v 3 --al Palm_Hits_Bowtie --un Palm_NoHits_Bowtie Palm_Bowtie.sam -S --tryhard > bowtie.output


    What's wrong with my setting?
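One cautious option is to run each library on its own, in case the repeated option groups in the single command above are not applied per library as intended. The sketch below only assembles one bowtie command per library, reusing the flags and insert-size limits from the post; the output SAM file names are hypothetical, and the commands are printed rather than executed.

```python
import shlex

# (name, orientation flag, min insert, max insert, mate 1 file, mate 2 file)
# All values are copied from the original command line.
LIBRARIES = [
    ("GAIIx_150bp", "--fr", 320, 420,
     "GAIIx_150bp_1.fastq", "GAIIx_150bp_2.fastq"),
    ("HiSeq_100bp", "--fr", 320, 420,
     "HiSeq_100bp_1.fastq", "HiSeq_100bp_2.fastq"),
    ("HiSeq_s_1", "--ff", 1300, 3500,
     "HiSeq_s_1_1_QuaControled.fastq", "HiSeq_s_1_2_QuaControled.fastq"),
    ("HiSeq_s_2", "--ff", 3300, 7600,
     "HiSeq_s_2_1_QuaControled.fastq", "HiSeq_s_2_2_QuaControled.fastq"),
]


def bowtie_command(ref, lib):
    """Return the argument list for one bowtie run on a single library."""
    name, orient, min_ins, max_ins, mate1, mate2 = lib
    return [
        "./bowtie", "-q", orient,
        "-I", str(min_ins), "-X", str(max_ins),
        "-k", "1", "-m", "1", "-v", "3", "--tryhard", "-S",
        ref,
        "-1", mate1, "-2", mate2,
        name + "_bowtie.sam",  # hypothetical output name
    ]


if __name__ == "__main__":
    for lib in LIBRARIES:
        print(shlex.join(bowtie_command("REF", lib)))
```

Running the libraries separately also makes it easy to change the orientation flag for the mate-pair lanes only and compare the mapping rates.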
    I also tried BWA on one lane of 150 bp reads. With the default settings, only 0.26% of reads mapped; with the following commands, 0.36% mapped:
    bwa aln -n 7 -o 3 REF.fsa GAIIx_s_1_1.fastq > GAIIx_s_1_1_bwa.sai
    bwa aln -n 7 -o 3 REF.fsa GAIIx_s_1_2.fastq > GAIIx_s_1_2_bwa.sai
    bwa sampe REF.fsa GAIIx_s_1_1_bwa.sai GAIIx_s_1_2_bwa.sai GAIIx_s_1_1.fastq GAIIx_s_1_2.fastq > GAIIx_s_1_bwa.sam

    All these reads went through the following quality control steps:
    1. Convert base quality scores from Phred+64 to Phred+33.
    2. Adapter trimming:
    adapter sequence for read 1: GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG;
    adapter sequence for read 2: ACACTCTTTCCCTACACGACGCTCTTCCGATCT;
    sequence to trim from read 1: AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCG
    sequence to trim from read 2: AGAAAGGGATGTGCTGCGAGAAGGCTAGA
    minimum adapter alignment length set to 10;
    base quality score threshold set to 20;
    minimum read length after adapter trimming and quality filtering set to 120 for GAIIx 150 bp reads, 70 for HiSeq 100 bp reads, and 30 for HiSeq 38 bp reads.
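As a sanity check on step 1, here is a minimal sketch of the Phred+64 to Phred+33 conversion for a single quality string. The function name is made up for illustration. It raises an error when a character falls below the Phred+64 range, which is what would happen if the input were already Phred+33 and got converted a second time.

```python
def phred64_to_phred33(qual):
    """Convert one Phred+64 quality string to Phred+33."""
    converted = []
    for ch in qual:
        score = ord(ch) - 64  # decode with the +64 offset
        if score < 0:
            # Characters below '@' cannot be Phred+64 scores.
            raise ValueError(
                "character %r is below the Phred+64 range; "
                "the input may already be Phred+33" % ch
            )
        converted.append(chr(score + 33))  # re-encode with the +33 offset
    return "".join(converted)


if __name__ == "__main__":
    print(phred64_to_phred33("hhhgfB"))
```

If the conversion was applied to data that was already Phred+33, the resulting qualities would look artificially low and a quality threshold of 20 would discard most bases.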

    Is it possible that my quality control went wrong?

    Any help will be greatly appreciated

  • #2
    The query genome (3 Gb) is much larger than the reference genome (500 Mb), and the reference consists of de novo assembled contigs. Even so, the fraction of mapped reads should not be this low.



    • #3
      We have run into the same problem, except that our tag size was 36 bp.
      21M reads passed filtering, but when aligned with bowtie only about 10k reads mapped to the reference genome. We also found a very large number of duplicate reads in our FASTQ file.
      Does Illumina provide official quality control results that would tell us whether our sequencing run is OK? Thanks a lot!
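To put a number on the duplicates, a rough sketch like the following counts exact duplicate sequences in a FASTQ file. It assumes plain four-line FASTQ records and holds all sequences in memory, so for 21M reads it may be better to run it on a subset; the function name is illustrative only.

```python
from collections import Counter


def duplicate_fraction(fastq_lines):
    """Return the fraction of reads whose sequence was already seen.

    Assumes four-line FASTQ records (header, sequence, '+', quality).
    """
    # The sequence is the second line of every four-line record.
    seqs = [line.strip() for i, line in enumerate(fastq_lines) if i % 4 == 1]
    if not seqs:
        return 0.0
    unique = len(Counter(seqs))
    return 1.0 - unique / len(seqs)


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        print("duplicate fraction: %.3f" % duplicate_fraction(handle))
```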



      • #4
        I highly recommend doing your own QC with a program like FASTQC or FASTX and analyzing the quality metrics for each lane.



        • #5
          Originally posted by blackjimmy
          We have run into the same problem, except that our tag size was 36 bp.
          21M reads passed filtering, but when aligned with bowtie only about 10k reads mapped to the reference genome. We also found a very large number of duplicate reads in our FASTQ file.
          Does Illumina provide official quality control results that would tell us whether our sequencing run is OK? Thanks a lot!
          The only quality control by Illumina that I know of is the chastity filter. Two lanes of our data failed the filter completely, and I did not use any reads that failed it. Does anyone know of other Illumina quality control steps?



          • #6
            Originally posted by zee
            I highly recommend doing your own QC with a program like FASTQC or FASTX and analyzing the quality metrics for each lane.
            I have checked the reads with FASTQC. After my QC, reads from all lanes gave good results except for the k-mer analysis, which raised a yellow warning.



            • #7
              Illumina mate pair libraries used to be in reverse-forward orientation (the --rf parameter). Unless something has changed in the mate pair protocol, this could be the cause of the bad mapping.



              • #8
                Originally posted by glacerda
                Illumina mate pair libraries used to be in reverse-forward orientation (the --rf parameter). Unless something has changed in the mate pair protocol, this could be the cause of the bad mapping.
                Do you mean that I should use --rf instead of --fr for the paired-end reads? I thought mate pairs were in forward-forward orientation and should use --ff.



                • #9
                  Hi xquan,

                  Illumina mate pair libraries are supposed to contain outward-facing reads ( <-- --> ), so we should use --rf in bowtie. Mate pair libraries are used for long insert lengths, usually greater than 2 kbp.

                  Illumina paired-end libraries are supposed to contain inward-facing reads ( --> <-- ), so we should use --fr in bowtie. Paired-end libraries are used for short insert lengths, usually at most 500 bp.

                  As far as I can remember, 454 and SOLiD use forward-forward ( --> --> ).
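When the library orientation is in doubt, it can be checked empirically: map each mate as a single read, then look at the strand and position of mates that land on the same contig. The sketch below classifies one pair under simplifying assumptions (leftmost mapping positions, no handling of overlapping or nested reads); the function name and labels are illustrative, not part of bowtie.

```python
def classify_pair(pos1, reverse1, pos2, reverse2):
    """Classify a mate pair as 'fr' (inward), 'rf' (outward) or 'ff'.

    pos1/pos2 are leftmost mapping positions on the same contig;
    reverse1/reverse2 are True when that mate maps to the reverse strand.
    """
    if reverse1 == reverse2:
        return "ff"  # both mates on the same strand ( --> --> )
    # Identify which mate is on the forward strand.
    if reverse1:
        fwd_pos, rev_pos = pos2, pos1
    else:
        fwd_pos, rev_pos = pos1, pos2
    # Forward mate upstream of the reverse mate means the reads face inward.
    return "fr" if fwd_pos <= rev_pos else "rf"


if __name__ == "__main__":
    print(classify_pair(100, False, 400, True))
    print(classify_pair(3000, False, 100, True))
```

Tallying the labels over a few thousand pairs should show which orientation dominates in each library.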



                  • #10
                    Originally posted by glacerda
                    Hi xquan,

                    Illumina mate pair libraries are supposed to contain outward-facing reads ( <-- --> ), so we should use --rf in bowtie. Mate pair libraries are used for long insert lengths, usually greater than 2 kbp.

                    Illumina paired-end libraries are supposed to contain inward-facing reads ( --> <-- ), so we should use --fr in bowtie. Paired-end libraries are used for short insert lengths, usually at most 500 bp.

                    As far as I can remember, 454 and SOLiD use forward-forward ( --> --> ).

                    Thanks very much! I will confirm this with the sequencing company (who told me that their library preparation for mate pair is forward-forward) and try to run bowtie with --rf again.



                    • #11
                      Hi,

                      In this case, I usually try aligning the data from one end to the reference as single-fragment to see what percentage of reads are mapped.

                      Douglas
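After a single-end run like that, the mapped percentage can be read directly from the SAM output. This is a small sketch that relies only on the SAM FLAG field: bit 0x4 marks an unmapped read, and bits 0x100 and 0x800 mark secondary and supplementary records, which are skipped so each read is counted once. The function name is made up for illustration.

```python
def mapped_fraction(sam_lines):
    """Return the fraction of reads in a SAM stream that are mapped."""
    total = 0
    mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if flag & 0x900:
            continue  # skip secondary and supplementary records
        total += 1
        if not flag & 0x4:
            mapped += 1  # the unmapped bit is not set
    return mapped / total if total else 0.0


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        print("mapped: %.2f%%" % (100 * mapped_fraction(handle)))
```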



                      • #12
                        Have you tried BLASTing some of the reads? You will sometimes be surprised by what you find when doing this.



                        • #13
                          Hi chadn737,

                          That's a great point. Oftentimes the simplest approach is the best one. In one project, I randomly chose 10 reads and BLASTed them. They all came back matching an rRNA gene. No other approach finds this out faster than BLAST.

                          Douglas
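For picking the reads, a minimal sketch along these lines draws a random sample from a FASTQ file and writes it as FASTA text that can be pasted into a BLAST search. It assumes four-line FASTQ records and uses reservoir sampling so the whole file never has to fit in memory; the function name and default seed are arbitrary.

```python
import random


def sample_reads_as_fasta(fastq_lines, n, seed=0):
    """Return FASTA text for up to n randomly chosen reads."""
    rng = random.Random(seed)
    reservoir = []
    it = iter(fastq_lines)
    # zip(it, it, it, it) walks the stream one four-line record at a time.
    for index, (header, seq, _plus, _qual) in enumerate(zip(it, it, it, it)):
        record = (header[1:].strip(), seq.strip())
        if len(reservoir) < n:
            reservoir.append(record)
        else:
            # Replace an existing entry with probability n / (index + 1).
            j = rng.randint(0, index)
            if j < n:
                reservoir[j] = record
    return "".join(">%s\n%s\n" % (name, seq) for name, seq in reservoir)


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        sys.stdout.write(sample_reads_as_fasta(handle, 10))
```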



                          • #14
                            I have observed that Illumina instruments can have different filter configurations. If the filters have been mounted incorrectly, in the wrong positions (which can happen with a new instrument or after a service repair), you may need to swap bases in your reads: A to C, G to T, and vice versa. We have run into this problem in our lab.
                            Tomasz Stokowy
                            www.sequencing.io.gliwice.pl
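The base swap described above (A with C, G with T) is easy to try on a small subset of reads before remapping. This is only a sketch of that substitution; the function name is invented here, and any other character such as N is left unchanged.

```python
# A <-> C and G <-> T, for both upper and lower case.
_SWAP_TABLE = str.maketrans("ACGTacgt", "CATGcatg")


def swap_bases(seq):
    """Swap A with C and G with T in a read sequence."""
    return seq.translate(_SWAP_TABLE)


if __name__ == "__main__":
    print(swap_bases("AACGTTN"))
```

If the swapped subset maps far better than the original reads, that would support the filter explanation; if not, the cause is more likely elsewhere.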
