  • Read quality in SAM files from tophat

    I just noticed that the read quality field in the SAM file (output from the TopHat pipeline) is zero 75% of the time. I read that 0 means a poor-quality read.
    Does this mean the data is bad? Should I filter all reads with a quality score of 0 from the SAM file? I am no longer sure whether the differential expression results are reliable.
    Thanks.

  • #2
    Can you provide a piece of your sam file for us to look at?


    • #3
      The fifth column, which indicates quality, is zero most of the time.

      HWUSI-EAS598:17:FC-RNAseq:8:95:12261:1462 16 chr1 171130 0 36M * 0 0 GACATTAAGTTATTCTTCAACCTTCGTTTGTGTGTG AHHHHHGGDGGDEGGGGGBGGGGDBGAHEGHHHDHD NM:i:0 NH:i:14 CC:Z:= CP:i:422067
      HWUSI-EAS598:17:FC-RNAseq:8:45:16542:13964 16 chr1 179698 0 36M * 0 0 TTGTCCTGCTGGGTTTCTTGCTTTGGTCCAATCTTC ;HIIIIIIIHIIIIIIIHIIIIIHIIIIIIIIGIII NM:i:0 NH:i:6 CC:Z:= CP:i:617466
      HWUSI-EAS598:17:FC-RNAseq:8:47:19786:10094 0 chr1 188055 0 36M * 0 0 CGAGGATGCATTGACCTATGATGATGTACATGTGAA IIIIIIIIIIIIIIIIIIIIFHIIGFHIIIHIHII6 NM:i:2 NH:i:15 CC:Z:= CP:i:432104
      HWUSI-EAS598:17:FC-RNAseq:8:21:3206:4808 16 chr1 188066 0 36M * 0 0 TGACCTATGATGATGTACATGTGAACTTCACTCGAG 4CAEE<EEG>EEFEBE5=;>7DCF@CEDFEFE>CGG NM:i:0 NH:i:15 CC:Z:= CP:i:392265
      HWUSI-EAS598:17:FC-RNAseq:8:27:9023:4090 16 chr1 188066 0 36M * 0 0 TGACCTATGATGATGTACATGTGAACTTCACTCGAG <IFEIFIIIIGIIEIIIHIIGIIHIIIIIHIDIHII NM:i:0 NH:i:15 CC:Z:= CP:i:392265
      HWUSI-EAS598:17:FC-RNAseq:8:75:1211:20901 16 chr1 188066 0 36M * 0 0 TGACCTATGATGATGTACATGTGAACTTCACTCGAG ###################??:9=>5.==6?8BBCC NM:i:0 NH:i:15 CC:Z:= CP:i:392265
      HWUSI-EAS598:17:FC-RNAseq:8:66:7143:12901 16 chr1 188071 0 36M * 0 0 TATGATGATGTACATGTGAACTTCACTCGAGAAGAA 5EDDEE:>EEEBBDBBEEDEEE?DD>DDDB?GG=G= NM:i:0 NH:i:17 CC:Z:= CP:i:392270
      HWUSI-EAS598:17:FC-RNAseq:8:49:13479:17135 0 chr1 188073 0 36M * 0 0 TGATGATGTACATGTGAACTTCACTCGAGAAGAATG IIIIIIIIIIIIIIHIIIIIIIIIIIIIIIHIIGI< NM:i:0 NH:i:17 CC:Z:= CP:i:392272
      HWUSI-EAS598:17:FC-RNAseq:8:55:13510:18188 0 chr1 188073 0 36M * 0 0 TGATGATGTACATGTGAACTTCACTCGAGAAGAATG HHHHDHHHFHGGGGFHGDHHHHHHFGGDGFEGGGG9 NM:i:0 NH:i:17 CC:Z:= CP:i:392272
      HWUSI-EAS598:17:FC-RNAseq:8:116:9033:18901 16 chr1 188073 0 36M * 0 0 TGATGATGTACATGTGAACTTCACTCGAGAAGAATG ?IIHIIIIIIIIIIIIIIIIIIIGIIIIIIIIEIII NM:i:0 NH:i:17 CC:Z:= CP:i:392272
      HWUSI-EAS598:17:FC-RNAseq:8:67:15838:6920 16 chr1 188076 0 36M * 0 0 TGATGTACATGTGAACGTCACTCGAGAAGAATGGGC ############################??6??2(6 NM:i:1 NH:i:17 CC:Z:= CP:i:392275
      HWUSI-EAS598:17:FC-RNAseq:8:41:12593:14024 16 chr1 188077 0 36M * 0 0 GATGTACATGTGAACTTCACTCGAGAAGAATGGGCG <HHHHHHHHHHGHHHHGDEGEHHHHHBDDEB:;@9B NM:i:1 NH:i:17 CC:Z:= CP:i:392276
      HWUSI-EAS598:17:FC-RNAseq:8:76:6547:18104 16 chr1 188077 0 36M * 0 0 GATGTACATGTGAACTTCACTCGAGAAGAATGGGCG BGGGDDFEEFD?GEGGDD?E@GDEEGHHHHGHHHHH NM:i:1 NH:i:17 CC:Z:= CP:i:392276
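In the records above, every read with a 0 in column 5 also carries an NH tag well above 1. In the SAM specification, NH is the optional tag for the number of reported alignments of the query, so these reads appear to map to many locations. The following is a minimal sketch for pulling both values out of a pasted record. The function name `mapq_and_hits` is hypothetical, and it splits on any whitespace only because the records pasted here lost their tabs; real SAM files are tab-delimited.

```python
def mapq_and_hits(sam_line):
    """Return (MAPQ, NH) for one alignment line; NH is None if the tag is absent."""
    # Assumption: fields are separated by whitespace, as in the pasted records.
    # In a real SAM file, split on "\t" instead.
    fields = sam_line.split()
    mapq = int(fields[4])          # column 5 is MAPQ
    nh = None
    for tag in fields[11:]:        # optional tags start at column 12
        if tag.startswith("NH:i:"):
            nh = int(tag[len("NH:i:"):])
    return mapq, nh


record = (
    "HWUSI-EAS598:17:FC-RNAseq:8:95:12261:1462 16 chr1 171130 0 36M * 0 0 "
    "GACATTAAGTTATTCTTCAACCTTCGTTTGTGTGTG AHHHHHGGDGGDEGGGGGBGGGGDBGAHEGHHHDHD "
    "NM:i:0 NH:i:14 CC:Z:= CP:i:422067"
)
print(mapq_and_hits(record))  # (0, 14)
```

Running this over the whole file and tabulating MAPQ against NH would show whether the zeros are confined to multi-mapped reads.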


      • #4
        Originally posted by biofreak
        I just noticed that the read quality field in the SAM file (output from tophat pipeline) is zero 75% of the times.. I read that 0 means poor quality read.
        Does this mean the data is bad? Should I filter all reads with quality score 0 from the SAM file? because I am not sure if the results on differential expression analysis are reliable anymore.
        thanks.
        Column 5 is not read quality; it is mapping quality (MAPQ), a measure of the likelihood that the mapping position is correct. It is the responsibility of the mapping algorithm to calculate and report this value. According to its author, Bowtie, the aligner underlying TopHat, does not calculate mapping qualities (see post #2 in this thread). Some things have changed since that post was written. Have a look at the Bowtie manual, particularly the sections describing the options '-M' and '--mapq', for uses of the mapping quality field.
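For reference, the SAM specification defines MAPQ as -10 log10 of the probability that the mapping position is wrong, rounded to the nearest integer, with 255 meaning the value is not available. The sketch below converts a MAPQ value back to that probability. The function name is hypothetical, and aligners may assign MAPQ by their own rules, so this conversion is only an assumption for TopHat or Bowtie output.

```python
def mapq_to_error_probability(mapq):
    """Convert a MAPQ value to P(mapping position is wrong) per the SAM definition.

    Returns None for 255, which the SAM specification reserves for
    "mapping quality not available".
    """
    if mapq == 255:
        return None
    return 10 ** (-mapq / 10)


for q in (0, 10, 30):
    print(q, mapq_to_error_probability(q))
```

Under this definition a MAPQ of 0 corresponds to no confidence in the reported position, which is a statement about where the read maps and not about the base qualities in column 11.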


        • #5
          kmcarr,
          Thanks for the clarification.
          I want to filter out all reads with mapping quality = 0.
          I use the readGappedAlignments function, which takes a BAM file and discards the MAPQ field altogether, so the filtering has to be done before calling readGappedAlignments.
          I could just load the SAM file, filter it myself, and convert it to BAM, but I was hoping it could be done in a more elegant way.
          I was also looking at the srFilter function in the ShortRead package, but could not figure it out.
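The "filter it myself" step described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `filter_sam_by_mapq` and the default threshold are hypothetical, the input is assumed to be a tab-delimited SAM file, and the result would still need to be converted to BAM before being passed to readGappedAlignments.

```python
def filter_sam_by_mapq(lines, min_mapq=1):
    """Yield SAM header lines unchanged, plus alignment lines with MAPQ >= min_mapq."""
    for line in lines:
        if line.startswith("@"):
            # Header lines are kept so the output is still a valid SAM file.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            # Not a complete alignment record (e.g. a blank line); drop it.
            continue
        if int(fields[4]) >= min_mapq:  # column 5 is MAPQ
            yield line


example = [
    "@HD\tVN:1.0\n",
    "r1\t16\tchr1\t171130\t0\t36M\t*\t0\t0\tACGT\tIIII\tNH:i:14\n",
    "r2\t0\tchr1\t200\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1\n",
]
for kept in filter_sam_by_mapq(example):
    print(kept, end="")
```

If samtools is installed, `samtools view -b -q 1 in.bam > out.bam` does the same thing directly on a BAM file, since `-q` skips alignments with MAPQ below the given value; `in.bam` and `out.bam` are placeholder file names.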
