  • Lien
    Member
    • Dec 2009
    • 47

    Invalid format for mpileup use with VarScan

    Dear all,

    I'm trying to call variants on RNA-seq data.
    I aligned the paired-end reads with Bowtie/TopHat and then generated an mpileup file with SAMtools version 0.1.16 using the following command:
    samtools mpileup -q 1 -Q 13 -f human_g1k_v37.fasta test.sorted.bam > test.sorted.mpileup

    The mpileup file looks OK. I then ran VarScan version 2.3.2 with the following command:
    java -jar VarScan.v2.3.2.jar mpileup2snp /test.sorted.mpileup --min-var-freq 0.08 --p-value 0.01 > /test.sorted.mpileup.varscan

    Only SNPs will be reported
    Min coverage: 8
    Min reads2: 2
    Min var freq: 0.08
    Min avg qual: 15
    P-value thresh: 0.01
    Reading input from /test.sorted.mpileup

    Initially everything seems fine, but then VarScan throws this error:

    Error: Invalid format for pileup at line 419813325
    22 31372123 A 1018 ...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,......,.....,,,,,,,,,,,,,,............,,,,,,,,,,,,,,.....................,,,,,,,,,,,,,,,,,,,,,,,,,,,,,.......................,,,,,,,,,,.........t,,..........................................,,,,,,,,,,,,,,,,,,,,,,,,,,,..........,,,,,,,,,,,,,,,,,,,,,,..,,,,,,,,,.,,,,.................,,...........................,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,..,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,....,,,,,,,,,,,...,,..,......................................................,,,........,,,,.....................................,,,,,,,............,,................................................,.........,.....,..............,.,,,,,........,.,,,,,,,..,,,,,,.,,,,,,,,.,,..,,,,,,,,,,,,,,,,,,,,,,,,,,,,.......,,,,,,,,,,,,,,,,...........,,,,.,,,,,,,,,,,,,,,,,,,,,,,,,......,,,,,,,..,,,,..,,,,,,,,,,,,,,,,,.............,,,,,.,,,,,,,,.....,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,.GHJHHIJH@JIEECJI39<IG:9JI<DCDCD?JJJJIFJFGJ1JJIIHC)3DAHCC9G)?JJEEJHJBJ@:JIIHIJJGFAIIJIH@IDI@H?IIIJJIGEJJGJH1IHGJEJC?@EIIIJHJ@JIJJJFEJJHIIJFJEGJJFGHJGJHJIIIJHCGJJ?GJHJ?F<ACJIGEEFJBD9DACCC?FJHJJCIJEJJIGJJJIJJG@EJAJCJHEFJID?DCDC?#CDCCDC?DGJCDGIEFEHFBJH?GJIJJGGIC?<JD?IG@JAIFIIDIJHCD<CAIIGJJJJDJ;JEJIIIJGIEJEDHJ;GJJJIJGJIGG>@IJJIJCJJJGGJJC?HAHGFHADGCGHH?3<?DDH<<DGCFFDFDF?FDF#C???D@D#FFDDCCD?(3CDC<CCCCCC<D@+#DBDC#DDDA?C#9C?D@=CD@=9

    The VarScan output generated up to that line seems fine, but I don't know how to work around the corrupt line. Would it be easiest to just remove it? If so, how can I do that in the mpileup file? Or are there other options?

    Many thanks,
    Lien
  • dkoboldt
    Member
    • Mar 2009
    • 62

    #2
    It's strange to suddenly encounter an invalid-format line in SAMtools mpileup output. If I take it *exactly* as you pasted it, there's no delimiter (tab) between the last base call in column 5 and the first base quality value ("G"). However, if I put a tab between those, VarScan reads the line just fine.
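
One way to look for lines like this is to count tab-separated fields. A single-sample mpileup line normally has six (chromosome, position, reference base, depth, base calls, base qualities), so a line with the tab missing between the last two columns shows up with five. The sketch below is only an illustration and is not part of VarScan or SAMtools; the function name `find_malformed_lines` is made up, and it assumes a single-sample file.

```shell
# Report every line that does not have exactly six tab-separated fields.
# Assumes a single-sample mpileup; find_malformed_lines is a hypothetical name.
find_malformed_lines() {
    awk -F'\t' 'NF != 6 { print NR ": " NF " fields" }' "$1"
}

# Usage (file name taken from the first post):
# find_malformed_lines test.sorted.mpileup
```

Each reported line number can then be inspected or removed before running VarScan.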

    In terms of immediate action, you could simply remove the line (99.9% of reads show no variant anyway) with grep or vi or another command-line tool. You might also want to send me lines 419813320-419813330 of your pileup file and I'll take a look.
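
For a file this large, sed can do both jobs without opening an editor. This is a minimal sketch; the helper names `show_lines` and `drop_line` and the output file names in the usage comments are assumptions, not anything from the thread.

```shell
# Print lines FIRST..LAST of FILE, quitting at LAST so the rest is not scanned.
# Arguments: FILE FIRST LAST
show_lines() {
    sed -n "${2},${3}p;${3}q" "$1"
}

# Print FILE with line N removed; redirect the output to a new file.
# Arguments: FILE N
drop_line() {
    sed "${2}d" "$1"
}

# Usage (line numbers taken from the error message above):
# show_lines test.sorted.mpileup 419813320 419813330 > context.txt
# drop_line  test.sorted.mpileup 419813325 > test.sorted.fixed.mpileup
```

Writing to a new file rather than editing in place keeps the original mpileup available for comparison.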


    • Lien
      Member
      • Dec 2009
      • 47

      #3
      Hi Dan,

      I also don't know how this strange line is formed. I performed the exact same commands on similar files, and they seem to work fine.
      I just removed this line, so hopefully everything will work out now. I just wasn't sure whether I could delete this line without consequence.

      Thanks for your help,
      Lien


      • vyellapa
        Member
        • Oct 2011
        • 59

        #4
        I have a similar error and am curious whether the cause has been found. I'm running this in a pipeline, and it would be easier if I could add a remedial step rather than having to check for the error messages.

        Thank you,
        Teja

        Code:
        Error: Invalid format for pileup at line 71
        1       10277   C       0


        • dkoboldt
          Member
          • Mar 2009
          • 62

          #5
          Vyellapa, can you send me the first 75 lines of your pileup file? Send it to dkoboldt (at) genome [dot] wustl [dot] edu


          • dkoboldt
            Member
            • Mar 2009
            • 62

            #6
            Hello all,

            We have just released VarScan v2.3.5 which should correct the invalid mpileup warning:

            VarScan files. Full list of files for VarScan, Variant detection in next-generation sequencing data


            • wdemos
              Member
              • Jun 2012
              • 31

              #7
              v2.3.5 similar format error

              I am also trying to call SNPs and indels using VarScan. I have the latest release (v2.3.5) installed so that I can output VCF. I am using the following command and getting the invalid format error:

              -bash-3.2$ java -jar VarScan.v2.3.5.jar mpileup2cns /research/sample.mpileup -min-coverage 8 --min-reads2 2 --min-var-freq 0.01 --min-avg-qual 15 --p-value 0.01 --strand-filter 0 --output-vcf 1 --variants 0 > /research/sample.vcf
              Only variants will be reported
              Min coverage: 8
              Min reads2: 2
              Min var freq: 0.01
              Min avg qual: 15
              P-value thresh: 0.01
              Reading input from /research/sample.mpileup
              Error: Invalid format for pileup at line 1
              ï¿ï¿½BC��<�BCFYc1c2c3c4c5c6c7c8c9c10c11c12c13c14c15c16c17c18c19c20c21c22cXcYcMt5/research/Wdemos_work/sample_reordered_sorted.bam%##samtoolsVersion=0.1.17 (r973:277)

              Can anyone please help me figure out why it isn't working? Also, I do not understand why the sorted .bam file in another directory is referred to in the error message.

              This is how I generated my pileup file:
              samtools mpileup -q 1 -C50 -DSuf /ref/human_v37.fa /research/sample_reordered_sorted.bam > /research/sample.pileup

              thanks
              Last edited by wdemos; 03-18-2014, 05:57 AM.


              • wdemos
                Member
                • Jun 2012
                • 31

                #8
                The -u option was causing the issue. I was running it in a wrapper, not actually piping the output into VarScan.
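
In SAMtools 0.1.x, `-u` asks mpileup for uncompressed BCF output instead of text pileup, which is consistent with the `BCF` and `##samtoolsVersion` strings in the error above. A pipeline could guard against this before calling VarScan by checking whether the input starts with the gzip/BGZF magic bytes `1f 8b`. This is only a heuristic sketch: it assumes the binary stream is BGZF-wrapped, and `looks_gzipped` is a made-up helper name.

```shell
# Succeed (exit status 0) if FILE starts with the gzip/BGZF magic bytes 1f 8b.
# looks_gzipped is a hypothetical helper name; this is a heuristic, not a validator.
looks_gzipped() {
    gz_magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \t\n')
    [ "$gz_magic" = "1f8b" ]
}

# Usage (path taken from the post above):
# if looks_gzipped /research/sample.mpileup; then
#     echo "input looks binary, not a text pileup" >&2
# fi
```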


                • dkoboldt
                  Member
                  • Mar 2009
                  • 62

                  #9
                  Thanks for letting me know!


                  • hugorody
                    Junior Member
                    • May 2013
                    • 9

                    #10
                    short solution

                    Try running this command before you call variants:

                    sed -n '/\t0\t/!p' file.mpileup > file.mpileup2

                    Then use file.mpileup2 to call variants.
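
An alternative sketch filters on the depth column itself instead of matching a tab-0-tab pattern anywhere in the line. That also catches lines like the one in post #4, where nothing appears to follow the 0, and it avoids relying on `\t` inside a sed pattern, which GNU sed accepts but some other sed implementations may not. It assumes a single-sample mpileup with depth in column 4; `drop_zero_depth` is a made-up name.

```shell
# Keep only lines whose read depth (column 4) is greater than zero.
# Assumes a single-sample mpileup; drop_zero_depth is a hypothetical name.
drop_zero_depth() {
    awk -F'\t' '$4 > 0' "$1"
}

# Usage (file names follow the sed example above):
# drop_zero_depth file.mpileup > file.mpileup2
```

With multi-sample mpileup files each sample has its own depth column, so the condition would need to be adapted.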


                    • bioliyezhang
                      Member
                      • Mar 2011
                      • 19

                      #11
                      Originally posted by vyellapa
                      I have a similar error and am curious whether the cause has been found. I'm running this in a pipeline, and it would be easier if I could add a remedial step rather than having to check for the error messages.

                      Thank you,
                      Teja

                      Code:
                      Error: Invalid format for pileup at line 71
                      1       10277   C       0

                      Hi, Teja:

                      I wonder whether you solved the problem with mpileup. If so, would you mind giving me some suggestions? Thanks.

                      Best,
                      Liye


                      • vyellapa
                        Member
                        • Oct 2011
                        • 59

                        #12
                        Looks like Dan Koboldt fixed it in the next release of VarScan. If you're still getting an error with the new release, I'm not sure what else could help.
