  • Exome sequencing alignment

    Hi,

    I used bowtie to align exome sequencing reads (Illumina GA), and this is what I got:


    # reads processed: 37205349
    # reads with at least one reported alignment: 141065 (0.38%)
    # reads that failed to align: 37064284 (99.62%)
    Reported 141065 alignments to 1 output stream(s)

    I am wondering what went wrong.

    By the way, here is what the .fastq file looks like:

    @SRR350953.3 MENDEL_0047_FC62MN8AAXX:1:1:1488:946 length=152TTTTTTTT
    NTCCCATTATCTCAAGCAGCCATATGTTTCTCATTCACTTGATACACTGTTTCTTTTCAACCCCCACATCCTCACCGTGCTCAA
    ACAAAGAAACAGGTGGTGAGGATGTGGGGGTTGAAAAGAAACAGTGTATCAAGTGAATGAGAAACATA########B@<7:EFE
    +SRR350953.3 MENDEL_0047_FC62MN8AAXX:1:1:1488:946 length=152EEB@@>>D
    ############################################################################FAD=FCCC
    CCFDD;E??@@FD?BD>F?FB=BFAEFDGGDGGDG@BB=5=/;?8B=DDDGDGBDGDACAE@B@G88;CTCCTTCCAGGACCCA


    Thanks!

  • #2
    Exome sequencing alignment

    Can anybody help?

    Thanks!



    • #3
      "#" is a Q-score of 2 (>60% error). There's your problem.

      Actually, after that the error rate improves with still plenty of sequence left in the read. You can try trimming your reads using fastq-trimmer and re-aligning.
      Last edited by TonyBrooks; 12-13-2012, 08:22 AM.
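To make the Q-score remark concrete, here is a minimal sketch of how a Phred+33 quality character maps to an error probability, and how one might count the run of low-quality bases at the start of a read before trimming. The function names and the `min_q` threshold are illustrative assumptions, not part of any tool mentioned in this thread.

```python
def phred33_error_probability(symbol: str) -> float:
    """Error probability for one Phred+33 quality character."""
    q = ord(symbol) - 33          # '#' is ASCII 35, so Q = 2
    return 10 ** (-q / 10)        # Q = -10 * log10(p)


def count_leading_low_quality(qual: str, min_q: int = 3) -> int:
    """Number of bases at the start of the read with Q below min_q."""
    for i, symbol in enumerate(qual):
        if ord(symbol) - 33 >= min_q:
            return i
    return len(qual)


if __name__ == "__main__":
    # Q2 corresponds to roughly a 63% chance that the base call is wrong.
    print(round(phred33_error_probability("#"), 3))
    # A quality string shaped like the one in this thread: 76 '#' then better scores.
    quality = "#" * 76 + "B@<7"
    print(count_leading_low_quality(quality))
```

The count returned by the second function is the number of bases one would want to drop from the start of the read, under the assumption that everything at Q2 is unusable.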



      • #4
        Thanks! I will give it a try.



        • #5
          fastx_trimmer gave me an error (see below). Is there any way to make it work?

          Thanks!

          fastx_trimmer: Invalid quality score value (char '#' ord 35 quality value -29) on line 4.

          @SRR350953.1 MENDEL_0047_FC62MN8AAXX:1:1:1206:930 length=152
          NTGATTTAGCTGCATAGTTTTCTTCTTTTTAATCCATAATGTATACATTTTAGACTTTGTATTTTAACTGCTGACATTCC
          AGTCTAAGTCGGAAGCCACATCTTCTAAACCAAATGTCTCTTCATCCCTTATGTCAGGAACCTATTTTTTTT
          +SRR350953.1 MENDEL_0047_FC62MN8AAXX:1:1:1206:930 length=152
          ############################################################################B@<7
          :EFEEBF?8B?3=;@9GGGG?;:C7CBABA=DG><GGB>DGE>3<EADGEC=DDB8GGD3<CE-EEB@@>>D



          • #6
            Are you sure your data isn't paired end? Whenever I've gotten reads that long from Illumina, they have always been paired end.
            If it is paired, you must separate your file into two files before aligning.

            Comment


            • #7
              Many thanks!

              I am now trying to use "grep" to separate the original file into two.

              grep -A 1 "\.1 " originalfile.fastq > newfile_1.fastq
              grep -A 1 "\.2 " originalfile.fastq > newfile_2.fastq
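One caveat on the grep approach: in the headers shown in this thread (`SRR350953.1`, `SRR350953.3`), the number after the accession looks like a per-record index rather than a mate number, so a pattern such as `\.1 ` may not separate mates. As an alternative, here is a minimal sketch that cuts each record in half, assuming every 152-base record is two equal-length mates joined end to end (the thread does not confirm this layout) and that each record has already been read as a header, one sequence string, and one quality string. The function name and the `/1`, `/2` suffixes are illustrative choices.

```python
def split_concatenated_record(header: str, seq: str, qual: str):
    """Split one FASTQ record holding two joined mates into two records.

    Assumes both mates have the same length, so the cut is at the midpoint.
    Returns two (name, sequence, quality) tuples.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    if len(seq) % 2 != 0:
        raise ValueError("odd read length; cannot split into equal mates")
    half = len(seq) // 2
    name = header.split()[0]      # e.g. '@SRR350953.1'
    mate1 = (f"{name}/1", seq[:half], qual[:half])
    mate2 = (f"{name}/2", seq[half:], qual[half:])
    return mate1, mate2


def format_record(record) -> str:
    """Render a (name, sequence, quality) tuple as four FASTQ lines."""
    name, seq, qual = record
    return f"{name}\n{seq}\n+\n{qual}\n"


if __name__ == "__main__":
    m1, m2 = split_concatenated_record(
        "@SRR350953.1 MENDEL_0047_FC62MN8AAXX:1:1:1206:930 length=152",
        "A" * 76 + "C" * 76,
        "#" * 76 + "G" * 76,
    )
    print(format_record(m1), end="")
    print(format_record(m2), end="")
```

Writing the first tuple of every record to one file and the second to another keeps the mates in the same order in both files, which paired-end aligners generally expect.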



              • #8
                I separate them from the original .sra file with:
                fastq-dump --split-3 originalFile.sra



                • #9
                  Thanks a lot!

                  I will try it.



                  • #10
                    Originally posted by Jackken
                    fastx_trimmer gave me an error (see below). Is there any way to make it work?

                    Thanks!

                    fastx_trimmer: Invalid quality score value (char '#' ord 35 quality value -29) on line 4.

                    @SRR350953.1 MENDEL_0047_FC62MN8AAXX:1:1:1206:930 length=152
                    NTGATTTAGCTGCATAGTTTTCTTCTTTTTAATCCATAATGTATACATTTTAGACTTTGTATTTTAACTGCTGACATTCC
                    AGTCTAAGTCGGAAGCCACATCTTCTAAACCAAATGTCTCTTCATCCCTTATGTCAGGAACCTATTTTTTTT
                    +SRR350953.1 MENDEL_0047_FC62MN8AAXX:1:1:1206:930 length=152
                    ############################################################################B@<7
                    :EFEEBF?8B?3=;@9GGGG?;:C7CBABA=DG><GGB>DGE>3<EADGEC=DDB8GGD3<CE-EEB@@>>D

                    The FASTX toolkit still assumes by default that all FASTQ files use the older Solexa/Illumina encoding, in which quality scores are offset by 64. Your file uses the (now standard) Phred+33 encoding, which is why '#' (ASCII 35) is reported as quality 35 - 64 = -29. You have to explicitly tell fastx_trimmer that your file is Phred+33 by adding the parameter "-Q33" to your command line.
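The arithmetic behind that error message can be reproduced in a few lines. This is only a sketch: `decode_quality` and `guess_offset` are hypothetical helper names, and the offset guess is a simple heuristic (characters below ASCII 59 cannot occur in either +64 encoding), not how FASTX itself decides.

```python
def decode_quality(qual: str, offset: int) -> list[int]:
    """Convert a FASTQ quality string to integer scores for a given offset."""
    return [ord(symbol) - offset for symbol in qual]


def guess_offset(qual: str) -> int:
    """Heuristic guess at the quality offset of one quality string.

    The +64 encodings never go below ASCII 59 (';'), so any lower
    character implies Phred+33. Otherwise the string is ambiguous and
    this sketch falls back to 64.
    """
    return 33 if min(ord(symbol) for symbol in qual) < 59 else 64


if __name__ == "__main__":
    qual = "####B@<7"
    print(decode_quality("#", 64))   # the value fastx_trimmer complained about
    print(decode_quality("#", 33))   # the same character read as Phred+33
    print(guess_offset(qual))
```

Read with offset 64, '#' decodes to -29, matching the error above; read with offset 33 it decodes to 2, matching the Q-score quoted earlier in the thread.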



                    • #11
                      Thanks, kmcarr.

                      I hadn't realized that the data were paired end, so quique_vzquez is right. I am now separating the original .fastq file into two, and I think it's working.

                      quique_vzquez, thanks a lot!
