Seqanswers

  • TopHat/Bowtie - number of reads aligned

    Sorry if this is a really obvious question, but I am new to the analysis of sequencing data (and sorry if this is posted in the wrong forum). We are aligning an Illumina paired-end RNA-Seq run to the hg19 human genome. I want to know how many of the reads aligned to the genome, and I am not sure I am looking in the right place. For each lane, there are 5 files in the logs directory that I think might be helpful:

    bowtie.left_kept_reads.fixmap.log
    # reads processed: 51079805
    # reads with at least one reported alignment: 29895367 (58.53%)
    # reads that failed to align: 21144050 (41.39%)
    # reads with alignments suppressed due to -m: 40388 (0.08%)
    Reported 38111199 alignments to 1 output stream(s)

    bowtie.left_kept_reads_seg1.fixmap.log
    # reads processed: 21144050
    # reads with at least one reported alignment: 4892206 (23.14%)
    # reads that failed to align: 16209674 (76.66%)
    # reads with alignments suppressed due to -m: 42170 (0.20%)
    Reported 8921307 alignments to 1 output stream(s)

    bowtie.left_kept_reads_seg2.fixmap.log
    # reads processed: 21144050
    # reads with at least one reported alignment: 5032085 (23.80%)
    # reads that failed to align: 16048266 (75.90%)
    # reads with alignments suppressed due to -m: 63699 (0.30%)
    Reported 9308464 alignments to 1 output stream(s)

    bowtie.left_kept_reads_seg3.fixmap.log
    # reads processed: 21144050
    # reads with at least one reported alignment: 4938783 (23.36%)
    # reads that failed to align: 16146855 (76.37%)
    # reads with alignments suppressed due to -m: 58412 (0.28%)
    Reported 9092214 alignments to 1 output stream(s)

    bowtie.left_kept_reads_seg4.fixmap.log
    # reads processed: 21144050
    # reads with at least one reported alignment: 3500454 (16.56%)
    # reads that failed to align: 17621660 (83.34%)
    # reads with alignments suppressed due to -m: 21936 (0.10%)
    Reported 5527529 alignments to 1 output stream(s)

    (There are duplicate files for the right kept reads, which I know should be dealt with in the same way.)

    Obviously the first file is the initial alignment. The next 4 seem to be mapping the reads that were unmapped during the first pass (the number of reads processed in each equals the number of reads that failed to align in the first file). From the run data, I am also assuming that these originally unmapped reads are mapped to junctions?
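The consistency check described above (seg-file "reads processed" equals first-pass "failed to align") can be scripted. This is a minimal Python sketch that assumes the bowtie 1 log wording shown in these files; `PATTERNS` and `parse_bowtie_log` are hypothetical helper names, not part of TopHat or bowtie.

```python
import re

# Hypothetical helper: regexes for the summary lines exactly as they appear in
# the bowtie 1 logs quoted above (other versions may word them differently).
PATTERNS = {
    "processed": r"# reads processed: (\d+)",
    "aligned": r"# reads with at least one reported alignment: (\d+)",
    "failed": r"# reads that failed to align: (\d+)",
    "suppressed": r"# reads with alignments suppressed due to -m: (\d+)",
}


def parse_bowtie_log(text: str) -> dict:
    """Return the integer counts found in a bowtie summary log."""
    counts = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            counts[key] = int(match.group(1))
    return counts


first_pass = parse_bowtie_log("""\
# reads processed: 51079805
# reads with at least one reported alignment: 29895367 (58.53%)
# reads that failed to align: 21144050 (41.39%)
# reads with alignments suppressed due to -m: 40388 (0.08%)
""")
seg1 = parse_bowtie_log("# reads processed: 21144050\n")

# The three categories partition the processed reads ...
print(first_pass["aligned"] + first_pass["failed"] + first_pass["suppressed"]
      == first_pass["processed"])  # True
# ... and the seg log processed exactly as many reads as failed the first pass.
print(seg1["processed"] == first_pass["failed"])  # True
```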

    A 58% alignment rate isn't very good, but if I add the reads aligned in the 4 seg files, the total alignment rate is 94%. Is this actually a correct thing to do?
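A sketch of why simply adding the seg counts can overstate the total. It assumes (inferred from the matching "reads processed" counts, not taken from TopHat documentation) that seg file k holds segment k of every read that failed the first pass, so a single read can be counted in up to four seg files. Under that assumption the logs only bound the number of rescued reads: at least the largest single seg count, at most the smaller of the summed seg counts and the number of failed reads. `segment_rescue_bounds` is a hypothetical helper.

```python
def segment_rescue_bounds(processed, first_pass_aligned, failed, seg_aligned):
    """Return (naive_rate, lower_rate, upper_rate) as fractions of processed reads.

    Assumes seg file k holds segment k of every read that failed the first
    pass, so the same read can be counted in several seg files.
    """
    naive = (first_pass_aligned + sum(seg_aligned)) / processed
    # At least as many distinct reads as the best single segment file ...
    low = max(seg_aligned)
    # ... and at most the summed counts, which cannot exceed the failed reads.
    high = min(sum(seg_aligned), failed)
    return (naive,
            (first_pass_aligned + low) / processed,
            (first_pass_aligned + high) / processed)


naive, lower, upper = segment_rescue_bounds(
    processed=51_079_805,
    first_pass_aligned=29_895_367,
    failed=21_144_050,
    seg_aligned=[4_892_206, 5_032_085, 4_938_783, 3_500_454],
)
print(f"naive sum: {naive:.1%}")
print(f"bounds:    {lower:.1%} to {upper:.1%}")
```

With the left-read numbers from these logs, the upper bound coincides with the naive 94% figure, while the lower bound is roughly 68%. An aligned segment also need not end up as part of a reported spliced alignment, so counting reads directly in the final BAM file is more reliable than summing log lines.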

    I also want to know how many of the reads map to junction sites. Am I correct in thinking the 4 seg files are mapping reads to junctions? If so, that seems like a very high proportion of reads mapping to junction sites (35%). If not, is there somewhere else I can find this data?

    Thanks for any help/advice you all can give me!

  • #2
    A simple way to get the data you want, and a bit more, would be to use bamtools or Picard. bamtools has the stats command, and Picard has several different commands for extracting various metrics from your BAM/SAM files.
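bamtools stats and Picard (for example its CollectAlignmentSummaryMetrics tool) report these numbers directly. To illustrate the distinction such reports draw between reads and alignment records, here is a minimal pure-Python sketch that counts both from SAM text using only the standard FLAG bits (0x4 unmapped, 0x40/0x80 first/second mate). `count_mapped_reads` is a hypothetical name, and this is not how either tool is implemented.

```python
def count_mapped_reads(sam_lines):
    """Return (distinct mapped reads, mapped alignment records) from SAM text."""
    mapped_reads = set()
    mapped_records = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:
            continue  # FLAG 0x4: this record is unmapped
        mapped_records += 1
        # FLAG 0x40 / 0x80 mark the first / second mate of a pair.
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        mapped_reads.add((qname, mate))
    return len(mapped_reads), mapped_records


sam_example = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t50\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r1\t256\tchr2\t500\t0\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",  # secondary hit
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
print(count_mapped_reads(sam_example))  # (1, 2)
```

Because multi-mapping reads produce several records, the record count can exceed the read count, which is also why bowtie reports 38111199 alignments for 29895367 aligned reads in the first log.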


    • #3
      How can I calculate the number of reads mapped to junctions with bamtools or Picard?

      A simple way I can think of is to count the split reads, but I am not sure it is the right way.

      Thank you

      Originally posted by pbluescript
      A simple way to get the data you want and a bit more would be to use bamtools or Picard. bamtools has the stats command and Picard has several different commands for extracting various metrics from your bam/sam files.


      • #4
        Originally posted by townway
        How can I calculate the number of reads mapped to junctions with bamtools or Picard?

        A simple way I can think of is to count the split reads, but I am not sure it is the right way.

        Thank you
        You can use the bamtools filter command to get all reads with an "N" in the CIGAR string. The N operation (a skipped region of the reference) indicates a split read.
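The same CIGAR test can be sketched in Python for a quick count straight from SAM text. This is a hypothetical helper, not bamtools' filter: it treats any mapped alignment whose CIGAR contains an N operation as spliced, and counts each read (name plus the mate bits 0x40/0x80 from FLAG) once, even if it has several spliced alignments.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def is_spliced(cigar: str) -> bool:
    """True if the CIGAR contains an N (skipped reference region) operation."""
    return any(op == "N" for _, op in CIGAR_OP.findall(cigar))


def count_spliced_reads(sam_lines):
    """Count distinct reads (name + mate bits) with at least one spliced alignment."""
    spliced = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, cigar = fields[0], int(fields[1]), fields[5]
        if flag & 0x4 or cigar == "*":
            continue  # unmapped, or no CIGAR available
        if is_spliced(cigar):
            spliced.add((qname, flag & 0xC0))  # 0xC0 = first/second mate bits
    return len(spliced)


sam_example = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t50\t40M2000N60M\t*\t0\t0\t*\t*",
    "r1\t256\tchr1\t900\t0\t30M500N70M\t*\t0\t0\t*\t*",  # same read, second hit
    "r2\t0\tchr1\t5000\t50\t100M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
print(count_spliced_reads(sam_example))  # 1
```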


        • #5
          Would you mind telling me which version of bamtools you used?
          The current one does not have a CIGAR filter.
          Thank you

          General Filters:
          -alignmentFlag <int>     keep reads with this *exact* alignment flag (for more detailed queries, see below)
          -insertSize <int>        keep reads with insert size that mathces pattern
          -mapQuality <[0-255]>    keep reads with map quality that matches pattern
          -name <string>           keep reads with name that matches pattern
          -queryBases <string>     keep reads with motif that mathces pattern
          -tag <TAG:VALUE>         keep reads with this key=>value pair

          Originally posted by pbluescript
          You can use the Bamtools filter option to get all reads with an "N" in the cigar string. This indicates a split read.


          • #6
            The current version of bamtools does have the option to filter by CIGAR string.
            You can read a manual here:

            Get the current version here:


            Your bamtools command will be something like:

            Code:
            bamtools filter -in reads.bam -out split_reads.bam -script cigarN.script
            Your cigarN.script should look like this:
            Code:
            {
                    "cigar" : "*N*"
            }
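If you prefer to generate the filter script from code, something like the following Python sketch produces the same JSON. It also uses `fnmatch.fnmatchcase` to show what the `*N*` wildcard is meant to match; that is only an analogy, since bamtools applies its own pattern matching. The names `filter_script`, `script_text` and `matches_wildcard` are hypothetical.

```python
import fnmatch
import json

# The filter from the post: keep alignments whose CIGAR string contains "N".
filter_script = {"cigar": "*N*"}
script_text = json.dumps(filter_script, indent=4)
print(script_text)  # save this text as cigarN.script


def matches_wildcard(cigar: str, pattern: str = "*N*") -> bool:
    """Rough stand-in for the wildcard test; bamtools uses its own matcher."""
    return fnmatch.fnmatchcase(cigar, pattern)


for cigar in ["76M", "30M500N46M", "*"]:
    print(cigar, matches_wildcard(cigar))
```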


            • #7
              Unmapped reads with TopHat and mismatches

              We also work with RNA-Seq, on the plant genome Arabidopsis, and for us the majority of unmapped reads are not due to intron junctions but to errors at the ends of the sequences (we have 100-base reads, and the last ten or twenty bases contain mismatches). We know that the bowtie parameters are very strict: bowtie allows no more than 3 mismatches. So are you sure that the majority of unmapped reads were due to exon/intron junctions?
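If mismatches cluster in the last 10 to 20 bases, one common remedy is to trim the 3' ends before alignment (bowtie itself can remove a fixed number of 3' bases with its -3/--trim3 option). Below is a minimal Python sketch of quality-based 3' trimming instead; it assumes Phred+33 quality encoding and assumes the error-prone tail is also low quality, and the threshold of 20 is an arbitrary example value. `trim_3prime` is a hypothetical helper.

```python
def trim_3prime(seq: str, qual: str, min_q: int = 20, offset: int = 33):
    """Remove trailing bases whose Phred quality is below min_q.

    Assumes Phred+33 encoded qualities (offset=33); min_q=20 is only an example.
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


# '#' encodes Phred 2 and 'I' encodes Phred 40 in Phred+33.
print(trim_3prime("ACGTACGTAC", "IIIIIIII##"))  # ('ACGTACGT', 'IIIIIIII')
```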

              bye, VB

              Originally posted by mgibson


              • #8
                Hi,

                I have finished a de novo assembly of a eukaryotic transcriptome (Illumina platform, 30 million paired-end reads). Now I would like to check its accuracy, so I decided to map the reads against it using bowtie. When I use a mapping quality of 100 (--mapq 100), I get the following results:

                reads processed: 47 million
                reads with at least one reported alignment: 4 million (9%)
                reads that failed to align: 21%
                reads with alignments suppressed due to -m: 69%
                reported: 4 million

                When I don't use --mapq 100, I get the following values:

                reads processed: 47 million
                reads with at least one reported alignment: 37 million (78%)
                reads that failed to align: 21%
                reads with alignments suppressed due to -m: 69%
                reported: 37 million

                I am a bit confused about how to set the mapping quality when mapping RNA-seq reads against a transcriptome assembly. Can anyone advise me? Thanks in advance!
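One way to see how a mapping-quality cutoff affects the counts is to apply it after alignment, on the SAM output. The Python sketch below does that; it is a hypothetical helper, not a description of bowtie's --mapq option (in bowtie 1, --mapq sets the MAPQ value written to SAM records rather than filtering them, but check the manual for your version). It follows the SAM specification in treating MAPQ 255 as "not available" rather than as a high score, and it counts alignment records, not reads.

```python
def split_by_mapq(sam_lines, min_mapq):
    """Count mapped SAM records as (passing, below_cutoff, mapq_unavailable)."""
    passing = below = unavailable = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & 0x4:
            continue  # unmapped record
        if mapq == 255:
            unavailable += 1  # SAM spec: 255 means "mapping quality not available"
        elif mapq >= min_mapq:
            passing += 1
        else:
            below += 1
    return passing, below, unavailable


sam_example = [
    "a\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "b\t0\tchr1\t200\t3\t50M\t*\t0\t0\t*\t*",
    "c\t16\tchr2\t300\t255\t50M\t*\t0\t0\t*\t*",
    "d\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
print(split_by_mapq(sam_example, min_mapq=30))  # (1, 1, 1)
```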
