Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • analysis after mapping

    Hi,I'm a beginner in RNA-Seq analysis,now I've mapped the short reads with Bowtie,TopHat and other softwares,I'm trying to analyze the mapping efficiency by counting the uniquely mapped reads,spanning junction reads and unmappable reads in different mapping results.But how can I get the information about which one is uniquely mapped,which one is mutiple reads from the SAM format file I get with the softwares?
    Thank you for your attention.

  • #2
    Originally posted by Huijuan View Post
    Hi,I'm a beginner in RNA-Seq analysis,now I've mapped the short reads with Bowtie,TopHat and other softwares,I'm trying to analyze the mapping efficiency by counting the uniquely mapped reads,spanning junction reads and unmappable reads in different mapping results.But how can I get the information about which one is uniquely mapped,which one is mutiple reads from the SAM format file I get with the softwares?
    Thank you for your attention.
    First you have to separate the Aligned and unaligned reads in the SAM file. For this you can use Picard:
    For Aligned reads :
    java -jar /usr/local/bin/ViewSam.jar INPUT=your_mapped_reads.sam ALIGNMENT_STATUS=Aligned PF_STATUS=All > your_Aligned_reads.sam
    For Unaligned reads:
    java -jar /usr/local/bin/ViewSam.jar INPUT=your_mapped_reads.sam ALIGNMENT_STATUS=Unaligned PF_STATUS=All > your_Unaligned_reads.sam

    You can then count the rows in each file that don't start with @ with:
    grep -cv "^@" your_Unaligned_reads.sam
    Should give you the number of unaligned reads.

    For the uniquely mapped reads: there should have this tag in the sam file: XT:A:U, so to count them:
    grep -c "XT:A:U" your_Aligned_reads.sam
    The reads that map to more than one position should be have "XT:A:R", so:
    grep -c "XT:A:R" your_Aligned_reads.sam

    Hope this helps.....

    Comment


    • #3
      Originally posted by upendra_35 View Post
      First you have to separate the Aligned and unaligned reads in the SAM file. For this you can use Picard:
      For Aligned reads :
      java -jar /usr/local/bin/ViewSam.jar INPUT=your_mapped_reads.sam ALIGNMENT_STATUS=Aligned PF_STATUS=All > your_Aligned_reads.sam
      For Unaligned reads:
      java -jar /usr/local/bin/ViewSam.jar INPUT=your_mapped_reads.sam ALIGNMENT_STATUS=Unaligned PF_STATUS=All > your_Unaligned_reads.sam

      You can then count the rows in each file that don't start with @ with:
      grep -cv "^@" your_Unaligned_reads.sam
      Should give you the number of unaligned reads.

      For the uniquely mapped reads: there should have this tag in the sam file: XT:A:U, so to count them:
      grep -c "XT:A:U" your_Aligned_reads.sam
      The reads that map to more than one position should be have "XT:A:R", so:
      grep -c "XT:A:R" your_Aligned_reads.sam

      Hope this helps.....
      Many thanks for helping.I did as what you said and got what I want.Thank you again
      Huijuan

      Comment


      • #4
        Here's the problem
        I use the way you told me with the mapping results with BWA. And the total number of unaligned reads, uniquely mapped reads and multireads is not equal to the total number of short reads and far less than it. What's the problem ?
        Thank you again
        Huijuan

        Comment


        • #5
          Originally posted by Huijuan View Post
          Here's the problem
          I use the way you told me with the mapping results with BWA. And the total number of unaligned reads, uniquely mapped reads and multireads is not equal to the total number of short reads and far less than it. What's the problem ?
          Thank you again
          Huijuan
          Do you get any error when you run the way i mentioned before? If yes then try running the modified way which is below. It should fix the problem. Good luck

          #To separate Aligned and unaligned reads I am using Picard now:
          #Aligned:
          java -jar /usr/local/bin/ViewSam.jar VALIDATION_STRINGENCY=LENIENT INPUT=your_mapped_reads.sam ALIGNMENT_STATUS=Aligned PF_STATUS=All > your_Aligned_reads.sam
          #Unaligned:
          java -jar /usr/local/bin/ViewSam.jar VALIDATION_STRINGENCY=LENIENT INPUT=your_mapped_reads.sam ALIGNMENT_STATUS=Unaligned PF_STATUS=All > your_Unaligned_reads.sam

          #You can then count the rows in each file that don't start with @ should give you the number of unaligned reads.:
          grep -cv "^@" your_Unaligned_reads.sam

          #For the uniquely mapped reads: they should have this tag in the sam file: XT:A:U, so to count them:
          grep -c "XT:A:U" your_Aligned_reads.sam
          #The reads that map to more than one position should be have "XT:A:R", so:
          grep -c "XT:A:R" your_Aligned_reads.sam

          Comment


          • #6
            Originally posted by upendra_35 View Post
            Do you get any error when you run the way i mentioned before? If yes then try running the modified way which is below. It should fix the problem. Good luck

            #To separate Aligned and unaligned reads I am using Picard now:
            #Aligned:
            java -jar /usr/local/bin/ViewSam.jar VALIDATION_STRINGENCY=LENIENT INPUT=your_mapped_reads.sam ALIGNMENT_STATUS=Aligned PF_STATUS=All > your_Aligned_reads.sam
            #Unaligned:
            java -jar /usr/local/bin/ViewSam.jar VALIDATION_STRINGENCY=LENIENT INPUT=your_mapped_reads.sam ALIGNMENT_STATUS=Unaligned PF_STATUS=All > your_Unaligned_reads.sam

            #You can then count the rows in each file that don't start with @ should give you the number of unaligned reads.:
            grep -cv "^@" your_Unaligned_reads.sam

            #For the uniquely mapped reads: they should have this tag in the sam file: XT:A:U, so to count them:
            grep -c "XT:A:U" your_Aligned_reads.sam
            #The reads that map to more than one position should be have "XT:A:R", so:
            grep -c "XT:A:R" your_Aligned_reads.sam
            I‘ve figured out the reason.Thanks so much for helping.
            Last edited by Huijuan; 04-26-2011, 05:00 PM.

            Comment


            • #7
              Originally posted by Huijuan View Post
              I‘ve figure out the reason.Thanks so much for helping.
              Can you share with me how you fixed the problem? Thanks

              Comment


              • #8
                This time I didn't use viewSam.jar and just used awk and grep to get them apart and then did counting

                Comment


                • #9
                  #For the uniquely mapped reads: they should have this tag in the sam file: XT:A:U, so to count them:
                  grep -c "XT:A:U" your_Aligned_reads.sam
                  #The reads that map to more than one position should be have "XT:A:R", so:
                  grep -c "XT:A:R" your_Aligned_reads.sam
                  Both of these grep's return 0 for me.


                  Are you doing any other steps between tophat and changing the file from bam to sam?

                  Comment


                  • #10
                    I used awk and grep for this task - just need to base on the 'read name' and CIGAR in sam file.

                    Comment


                    • #11
                      Originally posted by brpet View Post
                      Both of these grep's return 0 for me.


                      Are you doing any other steps between tophat and changing the file from bam to sam?
                      Yes i do the following to convert the bam to sam

                      samtools view accepted_hits.bam > accepted_hits.sam
                      samtools view -H accepted_hits.bam > header.txt
                      cat header.txt accepted_hits.sam > accepted_hits2.sam

                      But you can't use this sam file for counting the number of reads that are mapped to the reference or not as this sam is made from bam and bam file is supposed to contain only mapped reads.

                      I am still to figure out how to use tophat generated bam file for counting the number of reads that are mapped to the reference or not. I will let you know once i know it.

                      Comment


                      • #12
                        Originally posted by upendra_35 View Post
                        Yes i do the following to convert the bam to sam

                        samtools view accepted_hits.bam > accepted_hits.sam
                        samtools view -H accepted_hits.bam > header.txt
                        cat header.txt accepted_hits.sam > accepted_hits2.sam

                        But you can't use this sam file for counting the number of reads that are mapped to the reference or not as this sam is made from bam and bam file is supposed to contain only mapped reads.

                        I am still to figure out how to use tophat generated bam file for counting the number of reads that are mapped to the reference or not. I will let you know once i know it.
                        there‘s a method saying that you can keep the tmp files when you run Tophat by the parameter --keep-tmp then you will get both sam and bam files

                        Comment


                        • #13
                          Originally posted by upendra_35 View Post
                          Yes i do the following to convert the bam to sam

                          samtools view accepted_hits.bam > accepted_hits.sam
                          samtools view -H accepted_hits.bam > header.txt
                          cat header.txt accepted_hits.sam > accepted_hits2.sam
                          You can output the header with the alignments to your SAM file in one step, avoiding long cat job. The '-h' option for samtools view will output the header in addition to the alignments.

                          Code:
                          % samtools view -h accepted_hits.bam -o accepted_hits.sam
                          The -o option is used to give the output file name (or redirect stdout as shown in upendra's example).

                          Comment

                          Latest Articles

                          Collapse

                          • seqadmin
                            Current Approaches to Protein Sequencing
                            by seqadmin


                            Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                            04-04-2024, 04:25 PM
                          • seqadmin
                            Strategies for Sequencing Challenging Samples
                            by seqadmin


                            Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                            03-22-2024, 06:39 AM

                          ad_right_rmr

                          Collapse

                          News

                          Collapse

                          Topics Statistics Last Post
                          Started by seqadmin, 04-11-2024, 12:08 PM
                          0 responses
                          30 views
                          0 likes
                          Last Post seqadmin  
                          Started by seqadmin, 04-10-2024, 10:19 PM
                          0 responses
                          32 views
                          0 likes
                          Last Post seqadmin  
                          Started by seqadmin, 04-10-2024, 09:21 AM
                          0 responses
                          28 views
                          0 likes
                          Last Post seqadmin  
                          Started by seqadmin, 04-04-2024, 09:00 AM
                          0 responses
                          53 views
                          0 likes
                          Last Post seqadmin  
                          Working...
                          X