Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • samtools mpileup

    Hi,

    When I use samtools mpileup, I get bases with base qualities lower than the -Q cutoff. For example my command

    $samtoolsdir/samtools mpileup -f $resdir/human_g1k_v37.fasta my_reads.recal.bam > junk.pileup

    should have the default cutoff of 13, but I see bases with quality 2=# like
    1 10257 A 1 . #

    Setting -Q to something stupid like 200 seems to have no effect. I am using samtools-0.1.12a . Is there a way to filter output using quality in mpileup?

    Thanks,

    Vince

  • #2
    any reason not to update to the latest version of samtools.. you're using a pretty old version. That alone may solve your problem

    Also, you can use awk/sed/grep to filter results based on quality..

    for instance:
    Code:
    awk '$6<200{next}1' snp.hits > snp.filtered.hits

    Comment


    • #3
      samtools mpileup

      The latest samtools-0.1.18 also does the same thing. I also used different flags

      -B -Q 43

      to disable base recalibration, and -Q43 which should be a more sensible cutoff--the default is maybe phred not sanger based:
      -Q INT Minimum base quality for a base to be considered [13]

      None of these changes fixed the problem. I still wind up with lines like

      1 610235 G 10 ,$......... ?HHHHHI#JK

      and it will not be so easy to fix this using awk/sed/grep to something like

      1 610235 G 9 ,$......... ?HHHHHIJK

      Thanks,

      Vince

      Comment


      • #4
        samtools mpileup

        For the record, problem solved by using latest git version of samtools, see


        Vince

        Comment


        • #5
          I was running into the same problem and was trying to download the git version. However I do not see a repository.



          The source forge page shows a last modified date of 2011-09-02 for 1.18 version.

          What is the solution to filter reads based on base quality?

          Comment


          • #6
            Tools (written in C using htslib) for manipulating next-generation sequencing data - samtools/samtools

            Comment


            • #7
              Thank you.That solved the -Q problem but I see another problem.I do not see all the pileup reads in some cases

              Code:
              samtools mpileup -A -B -C 0 -r 1:14600-14600 -q 0 -Q 0 -M 256 sorted309.bam
              [mpileup] 1 samples in 1 input files
              <mpileup> Set max per-file depth to 8000
              1	14600	N	4	CcCC	@.IJ
              While I see 4 reads in mpileup, viewing it through IGV has 11 reads piled up at that particular position.
              When I use pysam "fetch" the pileup has 11 rows too(see output below). I tried looking for patterns (strand direction, spliced reads, etc..) but I do not see any specific pattern.

              Code:
              python sampy.py sorted309.bam 1 14600
              HWI-ST388-W7D:269:B020WACXX:1:2307:17129:160814	417	0	14520	3	[(0, 101)]	0	14755	101	CGCCCCGGCTGTGTGGCCTCAAGCCAGCCTTCCGCTCCTTGAAGCTAGTCTCCACACAGTGCTGGTTCCGTCACCCCCTCCCAAGGAAGTAGGTCTGAGCA	:?@:=?)<8:8C;CDH3):C@EHHGGE=GG?FBH8GHHG)=FFEEH)BDEH?E@>@DDF@@@CAC;>AC8@<@8A9@<0<8<CD<<2><((4(:(:?@A>:	[('NM', 2), ('NH', 2), ('CC', '15'), ('CP', 102516544.0), ('HI', 0)]
              HWI-ST388-W7D:269:B020WACXX:4:1101:9762:35190	355	0	14537	3	[(0, 101)]	0	14572	101	CTCAAGCCAGCCTTCCGCTCCTTGAAGCTGGTCTCCACACAGTGCTGGTTCCGTCACCCCCTCCCAAGGAAGTAGGTCTGAGCAGCTTGTCCTGGCTGTGT	@C@FFDDFDHHHHJJIGGIIJIJJJIJDHGGJJJJJJJGIJJB@FHII?FGGHGIIIIJDHGFEF@@BBCCD-;AC>>A?CCC3<2>:CB>@>>98?BA34	[('NM', 0), ('NH', 2), ('CC', '15'), ('CP', 102516528.0), ('HI', 0)]
              HWI-ST388-W7D:269:B020WACXX:4:2302:13220:150064	329	0	14544	3	[(0, 101)]	-1	-1	101	CAGCCTTCCGCTCCTTGAAGCTGGTCTCCACACAGTGCTGGTTCCGTCACCCCCTCCCAAGGAAGTAGGTCTGAGCAGCTTGTCCTGGCTGTGTCCATGTC	@@@FFFFFHHHHHJJJJJJJJJJJIIJJJJJIIJJFGHJJJIJJJIIJJJFGIJJJJHHHHE@@BBECCACAC@CDDDDDCC?C>;>><52?C@4>ABC34	[('NM', 0), ('NH', 2), ('CC', '15'), ('CP', 102516520.0), ('HI', 0)]
              HWI-ST388-W7D:269:B020WACXX:1:1306:10005:27456	97	0	14551	3	[(0, 101)]	0	14808	101	CCGCTCCTTGAAGCTGGTCTCCACACAGTGCTGGTTCCGTCACCCCCTCCCAAGGAAGTAGGTCTGAGCAGCTTGTCCTGGCTGTGTCCACGTCAGAGCAA	@;?DADDDDHDFFI@DEF<DBA<?<CBG49:9DA:C@3@9?@G<D:@;@;@CHIIG>=77=C=?CC@C?BABCCC?CCAC3(<?BCCCCC###########	[('NM', 1), ('NH', 2), ('CC', '15'), ('CP', 102516512.0), ('HI', 0)]
              HWI-ST388-W7D:269:B020WACXX:4:2204:12310:69039	153	0	14564	3	[(0, 101)]	-1	-1	101	CTGGTCTCCACACCGTGCGGGTTCCGTCACCCCCTCCCAAGGAAGTAGGTCTGAGCAGCTTGTCCTGGCTGTGTCCATGTCAGAGCAACGGCCCAAGTCTG	#####################B>50C<5'/83356.;?AEAGAE;A@F=3=)49BEIGF?BF?B8DFB?D:11*C?*>C9FA9EHFAGHB?DH>DD=:<B@	[('NM', 2), ('NH', 2), ('CC', '15'), ('CP', 102516496.0), ('HI', 0)]
              HWI-ST388-W7D:269:B020WACXX:4:1304:12406:19285	99	0	14568	3	[(0, 101)]	0	14731	101	TCTCCACACAGTGCTGGTTCCGTCACCCCCTCCCAAGGAAGTAGGTCTGAGCAGCTTGTCCTGGCTGTGTCCATTTCAGAGCAACGGCCCACGTCTGGTTC	@@@?BA++AD>DB+ACBEHBA@?CBGHCFEIIFF>?DBDEF?DBFDBFEC@3@F@AEE@DEE;7A);.7;@@@############################	[('NM', 3), ('NH', 2), ('CC', '15'), ('CP', 102516496.0), ('HI', 0)]
              HWI-ST388-W7D:269:B020WACXX:1:2103:16843:14609	355	0	14569	3	[(0, 101)]	0	14726	101	CTCCACACAGTGCTGGTTCCGTCACCCCCTCCCAAGGAAGTAGGTCTGAGCAGCTTGTCCTGGCTGTGTCCATGTCAGAGCAACGGCCCAAGTCTGGGTCT	@@<DDEFDDDHHFHCGFGHEHI9B@FBGA@AGHE<FHEGF9BDB?B=CC@FCCDFFECDGHECC>;(77?@;BCE@C:5@5>3(862;@B?BCCCACD?##	[('NM', 0), ('NH', 2), ('CC', '15'), ('CP', 102516496.0), ('HI', 0)]
              HWI-ST388-W7D:269:B020WACXX:4:2305:7247:113327	329	0	14570	3	[(0, 101)]	-1	-1	101	TCCACACAGTGCTGGTTCCGTCACCCCCTCCCAAGGAAGTAGTGCTGAGCAGCTTGTCCTGGCTGCGTCCATGGCAGAGCAACGGCGCAAGTCTGGGTCTG	11?;;B;D+2ADDC+<C+<?EE<B++:??D)???BD9*?##############################################################	[('NM', 5), ('NH', 2), ('CC', '15'), ('CP', 102516496.0), ('HI', 0)]
              HWI-ST388-W7D:269:B020WACXX:4:1104:5276:115086	97	0	14571	3	[(0, 101)]	0	14789	101	CCACACAGTGCTGGTTCCGTCACCCCCTCCCAAGGAAGTAGGTCTGAGCAGCTTCTCCTGGCTGTGTCCATGTCAGAGCAACGGCCCAAGTCTGGGTCTGG	@B@FFFFFHFHHHIJJJJJIJIJJIEIIJIJDEGGCAEFHEHDGHIJ@DEHGIIIIJIIJACHGHHEFFFFFFDE@@>65ABD;=B9?BBDCEDCC93<A9	[('NM', 1), ('NH', 2), ('CC', '15'), ('CP', 102516496.0), ('HI', 0)]
              HWI-ST388-W7D:269:B020WACXX:4:1101:9762:35190	403	0	14572	3	[(0, 101)]	0	14537	101	CACACAGTGCTGGTTCCGTCACCCCCTCCCAAGGAAGTAGGTCTGAGCAGCTTGTCCTGGCTGTGTCCATGTCAGAGCAACGGCCCAAGTCTGGGTCTGGC	?ADCCCDDBBA?-@B<DCB><BB@BBADCEECEFFFFFFHHHFFGIEGIIHGIJIHEGF;CGCDDCJJJIJJJJHIIIIIJJJJIIJIHFHDFFFFFFC@@	[('NM', 1), ('NH', 2), ('CC', '15'), ('CP', 102516496.0), ('HI', 0)]
              HWI-ST388-W7D:269:B020WACXX:4:2205:10161:121730	339	0	14587	3	[(0, 101)]	0	14460	101	CCGTCACCCCCTCCCAAGGAAGTAGGTCTGAGNAGCTTGTCCTGGCTGTGTCCATGTCAGAGCAACGGCCCAAGTCTGGGTCTGGGGGGGAAGGTGTCATG	9@>A5)>9BB?<B?CCCDDCCDC@<:(28<(,#DDB<5(A?B??A>ACEFFFFFGFHHEEA<D;GIHHGGHFDIEECFJIGIIEGHGIHHGHDFFEFF@CC	[('NM', 1), ('NH', 2), ('CC', '15'), ('CP', 102516480.0), ('HI', 0)]
              Last edited by vyellapa; 06-04-2012, 11:17 PM.

              Comment


              • #8
                GATK :Failed to load reference dictionary

                Hi,
                When I use DepthOfCoverage of GATK, the error is 'Failed to load reference dictionary' .
                I can't get why.
                My command as below:
                $java -jar GenomeAnalysisTK.jar \ -T DepthOfCoverage \ -R refall/allRef.fa \ -I mapping/DNAnew.sorted.bam \ -o mapping/dcDNA.targetCoverage

                Thanks

                Comment


                • #9
                  Do you have the reference ".dict" file and all the indicies for the reference you are using at the same location where your reference is?

                  Comment


                  • #10
                    Hi,
                    I try the CreatSequenceDictionary to get .dict file of reference,but unsuccessful. My reference is some DNA sequence get from NCBI not hg19 or chromosome reference. I just wanna get the coverage of reads on the reference . I don't know how to use CoverageBySample of GATK.
                    Please give some hints, thank you so much!

                    cheng

                    Comment

                    Latest Articles

                    Collapse

                    • seqadmin
                      Essential Discoveries and Tools in Epitranscriptomics
                      by seqadmin




                      The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                      04-22-2024, 07:01 AM
                    • seqadmin
                      Current Approaches to Protein Sequencing
                      by seqadmin


                      Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                      04-04-2024, 04:25 PM

                    ad_right_rmr

                    Collapse

                    News

                    Collapse

                    Topics Statistics Last Post
                    Started by seqadmin, Yesterday, 11:49 AM
                    0 responses
                    15 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 04-24-2024, 08:47 AM
                    0 responses
                    16 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 04-11-2024, 12:08 PM
                    0 responses
                    61 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 04-10-2024, 10:19 PM
                    0 responses
                    60 views
                    0 likes
                    Last Post seqadmin  
                    Working...
                    X