Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • r.rosati
    Member
    • Aug 2015
    • 95

    BBmap on Ion data

    Hello,
    I tried to run BBmap to call variants on four BAM files (2 parents, 2 children) generated from exome sequencing on an Ion Proton machine (TMAP software).

    The bbmap command was:
    Code:
    /opt/bbmap/callvariants.sh in=IonXpress_009_rawlib.bam,IonXpress_010_rawlib.bam,IonXpress_011_rawlib.bam,IonXpress_012_rawlib.bam ref=./hg19/hg19.fasta out=vars.vcf multisample ploidy=2 prefilter 1> err.txt 2> out.txt &
    I got an error as following (I'm including also the last lines of output):

    Code:
    Finished second pass.
    Writing output.
    Merging [(IonXpress_009_rawlib, individual_IonXpress_009_rawlib.vcf.gz), (IonXpress_010_rawlib, individual_IonXpress_010_rawlib.vcf.gz), (IonXpress_011_rawlib, individual_IonXpress_011_rawlib.vcf.gz), (IonXpress_012_rawlib, individual_IonXpress_012_rawlib.vcf.gz)]
    Exception in thread "main" java.lang.NumberFormatException: For input string: "162,590"
    	at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
    	at sun.misc.FloatingDecimal.parseFloat(FloatingDecimal.java:122)
    	at java.lang.Float.parseFloat(Float.java:451)
    	at var2.MergeSamples.processHeader(MergeSamples.java:233)
    	at var2.MergeSamples.processRow(MergeSamples.java:177)
    	at var2.MergeSamples.mergeFiles(MergeSamples.java:151)
    	at var2.MergeSamples.mergeSamples(MergeSamples.java:117)
    	at var2.CallVariants2.process(CallVariants2.java:380)
    	at var2.CallVariants2.main(CallVariants2.java:52)
    	at var2.CallVariants.main(CallVariants.java:48)
    I unzipped the vcf files and used grep to find "162,590". I got a match at line 9 of the vcf for one individual:
    Code:
    ##readLengthAvg=162,590
    The other individuals also had similar formatting:
    (IonXpress_010)
    Code:
    ##readLengthAvg=160,055
    (IonXpress_011)
    Code:
    ##readLengthAvg=167,107
    (IonXpress_012)
    Code:
    ##readLengthAvg=169,228

    This is the full header for barcode 009:
    Code:
    ##fileformat=VCFv4.2
    ##BBMapVersion=36.92
    ##ploidy=2
    ##rarity=1,00000
    ##minallelefraction=0,10000
    ##reads=43874924
    ##pairedReads=0
    ##properlyPairedReads=0
    ##readLengthAvg=162,590
    ##properPairRate=0,00000
    ##totalQualityAvg=23,525
    ##mapqAvg=77,583
    ##reference=./hg19/hg19.fasta
    ##contig=<ID=chr1,length=249250621>
    ##contig=<ID=chr2,length=243199373>
    ##contig=<ID=chr3,length=198022430>
    ##contig=<ID=chr4,length=191154276>
    ##contig=<ID=chr5,length=180915260>
    ##contig=<ID=chr6,length=171115067>
    ##contig=<ID=chr7,length=159138663>
    ##contig=<ID=chr8,length=146364022>
    ##contig=<ID=chr9,length=141213431>
    ##contig=<ID=chr10,length=135534747>
    ##contig=<ID=chr11,length=135006516>
    ##contig=<ID=chr12,length=133851895>
    ##contig=<ID=chr13,length=115169878>
    ##contig=<ID=chr14,length=107349540>
    ##contig=<ID=chr15,length=102531392>
    ##contig=<ID=chr16,length=90354753>
    ##contig=<ID=chr17,length=81195210>
    ##contig=<ID=chr18,length=78077248>
    ##contig=<ID=chr19,length=59128983>
    ##contig=<ID=chr20,length=63025520>
    ##contig=<ID=chr21,length=48129895>
    ##contig=<ID=chr22,length=51304566>
    ##contig=<ID=chrX,length=155270560>
    ##contig=<ID=chrY,length=59373566>
    ##contig=<ID=chrM,length=16569>
    ##FORMAT=<ID=FAIL,Description="Fail">
    ##FORMAT=<ID=PASS,Description="Pass">
    ##INFO=<ID=SN,Number=1,Type=Integer,Description="Scaffold Number">
    ##INFO=<ID=STA,Number=1,Type=Integer,Description="Start">
    ##INFO=<ID=STO,Number=1,Type=Integer,Description="Stop">
    ##INFO=<ID=TYP,Number=1,Type=String,Description="Type">
    ##INFO=<ID=R1P,Number=1,Type=Integer,Description="Read1 Plus Count">
    ##INFO=<ID=R1M,Number=1,Type=Integer,Description="Read1 Minus Count">
    ##INFO=<ID=R2P,Number=1,Type=Integer,Description="Read2 Plus Count">
    ##INFO=<ID=R2M,Number=1,Type=Integer,Description="Read2 Minus Count">
    ##INFO=<ID=PPC,Number=1,Type=Integer,Description="Paired Count">
    ##INFO=<ID=LS,Number=1,Type=Integer,Description="Length Sum">
    ##INFO=<ID=MQS,Number=1,Type=Integer,Description="MAPQ Sum">
    ##INFO=<ID=MQM,Number=1,Type=Integer,Description="MAPQ Max">
    ##INFO=<ID=BQS,Number=1,Type=Integer,Description="Base Quality Sum">
    ##INFO=<ID=BQM,Number=1,Type=Integer,Description="Base Quality Max">
    ##INFO=<ID=EDS,Number=1,Type=Integer,Description="End Distance Sum">
    ##INFO=<ID=EDM,Number=1,Type=Integer,Description="End Distance Max">
    ##INFO=<ID=IDS,Number=1,Type=Integer,Description="Identity Sum">
    ##INFO=<ID=IDM,Number=1,Type=Integer,Description="Identity Max">
    ##INFO=<ID=COV,Number=1,Type=Integer,Description="Coverage">
    ##INFO=<ID=MCOV,Number=1,Type=Integer,Description="Minus Coverage">
    ##INFO=<ID=CED,Number=1,Type=Integer,Description="Contig End Distance">
    ##INFO=<ID=HMP,Number=1,Type=Integer,Description="Homopolymer Count">
    ##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
    ##INFO=<ID=AF,Number=1,Type=Float,Description="Allele Fraction">
    ##INFO=<ID=RAF,Number=1,Type=Float,Description="Revised Allele Fraction">
    ##INFO=<ID=DP4,Number=4,Type=Integer,Description="Ref+, Ref-, Alt+, Alt-">
    ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
    ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
    ##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Allele Depth">
    ##FORMAT=<ID=AF,Number=1,Type=Float,Description="Allele Fraction">
    ##FORMAT=<ID=SC,Number=1,Type=Float,Description="Score">
    ##FORMAT=<ID=PF,Number=1,Type=String,Description="Pass Filter">
    #CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	IonXpress_009_rawlib
    Anything I should have done differently to not cause the error?
    Thank you!

    Comment

    • Brian Bushnell
      Super Moderator
      • Jan 2014
      • 2709

      Oh, that's really strange. It's due to your locale settings, I guess. In the US, decimals are printed like "##readLengthAvg=162.590" rather than "##readLengthAvg=162,590". For some reason Java is using commas as the delimiter when printing the floating point numbers, but not when trying to read floating point numbers, which is extremely irritating.

      This will take a little research to fix (other than you changing your computer's locale to US or something like that); I'll get back to you.

      Comment

      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        @r.rosati: See if this helps. https://docs.oracle.com/cd/E23824_01...033/glmha.html

        Comment

        • r.rosati
          Member
          • Aug 2015
          • 95

          Originally posted by GenoMax View Post
          Thank you... Indeed I ran it on a ssh session from a computer that had pt_BR as LC_NUMERIC.

          I've read Zhou's comment on indels and I thought I'd first run the callvariants script on a test exome from CEPH 1347-02 DNA.
          Should I realign the reads (I'm reading it's advised for non-BBmap aligned reads?) i.e.
          Code:
          callvariants.sh in=IonXpress_014_rawlib.bam ref=./hg19/hg19.fasta out=CEPH_BBmap_variants_TMAP_BAM_realigned.vcf ploidy=2 realign threads=16 1> err.txt 2> out.txt &

          Comment

          • Brian Bushnell
            Super Moderator
            • Jan 2014
            • 2709

            Yes, I do recommend "realign" for non-BBMap-aligned reads.

            Comment

            • Marmottontaine
              Junior Member
              • Feb 2017
              • 3

              BBSplit / BBMap error

              Hello,
              I'm using BBSplit to remove mouse reads in human samples. My command is the following :
              bbsplit.sh ambiguous=best ambiguous2=toss path=$8 maxindel=900000 refstats=stat_alignment_$samplename.txt threads=4 qtrim=f untrim=f in1=$8/trim_R1_$samplename.fq in2=$8/trim_R2_$samplename.fq ref=$2/$1.fa,$2/Grcm38.fa basename=decontamin_$samplename.%_#.fq outu1=unmapped1_$samplename.fq outu2=unmapped2_$samplename.fq

              Which give me the following message where the parameters used are not what I specified :

              java -Djava.library.path=/home/audrey/bbmap/jni/ -ea -Xmx48113m -cp /home/audrey/bbmap/current/ align2.BBSplitter ow=t fastareadlen=500 minhits=1 minratio=0.56 maxindel=20 qtrim=rl untrim=t trimq=6 ambiguous=best ambiguous2=toss path=. maxindel=900000 refstats=stat_alignment_L6.txt threads=4 qtrim=f untrim=f in1=./trim_R1_L6.fq in2=./trim_R2_L6.fq ref=/home/audrey/reference_genomes/GATK/hg38.fa,/home/audrey/reference_genomes/GATK/Grcm38.fa basename=decontamin_L6.%_#.fq outu1=unmapped1_L6.fq outu2=unmapped2_L6.fq
              Executing align2.BBSplitter [ow=t, fastareadlen=500, minhits=1, minratio=0.56, maxindel=20, qtrim=rl, untrim=t, trimq=6, ambiguous=best, ambiguous2=toss, path=., maxindel=900000, refstats=stat_alignment_L6.txt, threads=4, qtrim=f, untrim=f, in1=./trim_R1_L6.fq, in2=./trim_R2_L6.fq, ref=/home/audrey/reference_genomes/GATK/hg38.fa,/home/audrey/reference_genomes/GATK/Grcm38.fa, basename=decontamin_L6.%_#.fq, outu1=unmapped1_L6.fq, outu2=unmapped2_L6.fq]

              Converted arguments to [ow=t, fastareadlen=500, minhits=1, minratio=0.56, maxindel=20, qtrim=rl, untrim=t, trimq=6, ambiguous=best, ambiguous2=toss, path=., maxindel=900000, refstats=stat_alignment_L6.txt, threads=4, qtrim=f, untrim=f, in1=./trim_R1_L6.fq, in2=./trim_R2_L6.fq, basename=decontamin_L6.%_#.fq, outu1=unmapped1_L6.fq, outu2=unmapped2_L6.fq, ref_hg38=/home/audrey/reference_genomes/GATK/hg38.fa, ref_Grcm38=/home/audrey/reference_genomes/GATK/Grcm38.fa]
              Creating merged reference file ref/genome/1/merged_ref_6706777893850.fa.gz
              Ref merge time: 112.084 seconds.
              Executing align2.BBMap [ow=t, fastareadlen=500, minhits=1, minratio=0.56, maxindel=20, qtrim=rl, untrim=t, trimq=6, ambiguous=best, ambiguous2=toss, maxindel=900000, refstats=stat_alignment_L6.txt, threads=4, qtrim=f, untrim=f, in1=./trim_R1_L6.fq, in2=./trim_R2_L6.fq, outu1=unmapped1_L6.fq, outu2=unmapped2_L6.fq, ref=ref/genome/1/merged_ref_6706777893850.fa.gz, out_hg38=decontamin_L6.hg38_#.fq, out_Grcm38=decontamin_L6.Grcm38_#.fq]

              BBMap version 36.92
              Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.560
              Reference set statistics will be written to stat_alignment_L6.txt
              Set threads to 4
              Retaining first best site only for ambiguous mappings.
              NOTE: Deleting contents of ref/genome/1 because reference is specified and overwrite=true
              Writing reference.
              Executing dna.FastaToChromArrays2 [ref/genome/1/merged_ref_6706777893850.fa.gz, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=true, minscaf=1, midpad=300, startpad=8000, stoppad=8000, nodisk=false]

              Set genScaffoldInfo=true
              Writing chunk 1
              Writing chunk 2
              Writing chunk 3
              Writing chunk 4
              Writing chunk 5
              Writing chunk 6
              Writing chunk 7
              Writing chunk 8
              Writing chunk 9
              Writing chunk 10
              Writing chunk 11
              Writing chunk 12
              Writing chunk 13
              Set genome to 1

              Loaded Reference: 0.005 seconds.
              Loading index for chunk 1-13, build 1
              No index available; generating from reference genome: /media/audrey/26a8cc89-53d7-4f9b-984c-f8d2c0d519fd/Ievgenia/PDX/Exome/test/ref/index/1/chr1-3_index_k13_c2_b1.block
              Indexing threads started for block 0-3
              Indexing threads finished for block 0-3
              No index available; generating from reference genome: /media/audrey/26a8cc89-53d7-4f9b-984c-f8d2c0d519fd/Ievgenia/PDX/Exome/test/ref/index/1/chr4-7_index_k13_c2_b1.block
              Indexing threads started for block 4-7
              Indexing threads finished for block 4-7
              No index available; generating from reference genome: /media/audrey/26a8cc89-53d7-4f9b-984c-f8d2c0d519fd/Ievgenia/PDX/Exome/test/ref/index/1/chr8-11_index_k13_c2_b1.block
              Indexing threads started for block 8-11
              Indexing threads finished for block 8-11
              No index available; generating from reference genome: /media/audrey/26a8cc89-53d7-4f9b-984c-f8d2c0d519fd/Ievgenia/PDX/Exome/test/ref/index/1/chr12-13_index_k13_c2_b1.block
              Indexing threads started for block 12-13
              Indexing threads finished for block 12-13
              Generated Index: 571.883 seconds.
              Analyzed Index: 10.807 seconds.
              Reads that map to multiple references will be considered unmapped.
              Started output stream: 6.057 seconds.
              Creating ref-set statistics table: 0.012 seconds.
              Cleared Memory: 4.121 seconds.
              Processing reads in paired-ended mode.
              Started read stream.
              Started 4 mapping threads.


              Could you please explain me how to set and force the parameters?
              Many thanks in advance.

              Comment

              • GenoMax
                Senior Member
                • Feb 2008
                • 7142

                Which exact parameters are you referring to?

                Comment

                • Marmottontaine
                  Junior Member
                  • Feb 2017
                  • 3

                  Thank you GenoMax for your answer.
                  I let minratio at the default value. So it should be 0.65 as said in bbsplit help. here it's 0.56.
                  And for the parameters : maxindel, qtrim and untrim, they are both set as the default then as I asked in the same command line. Will it be problematic?
                  Many thanks.

                  Comment

                  • GenoMax
                    Senior Member
                    • Feb 2008
                    • 7142

                    I see. @Brian hopefully will be along later today with an official answer but my guess is that it probably has to do with ambigous2=toss option that you have selected.

                    Comment

                    • Brian Bushnell
                      Super Moderator
                      • Jan 2014
                      • 2709

                      And for the parameters : maxindel, qtrim and untrim, they are both set as the default then as I asked in the same command line. Will it be problematic?
                      Later parameters override earlier parameters, so that's not a problem.

                      As for 0.65 versus 0.56, that looks like either a typo, or else I changed the default at some point but forgot to update the help info; I'll fix that. Thanks! In the meantime you can always manually add "minratio=0.65" to override it.

                      Edit: To clarify, 0.56 is the intended behavior, and the description is in error.
                      Last edited by Brian Bushnell; 02-13-2017, 10:18 AM.

                      Comment

                      • Marmottontaine
                        Junior Member
                        • Feb 2017
                        • 3

                        Thank you very much GenoMax and Brian for your helpful answers.

                        Comment

                        • shimingt
                          Junior Member
                          • May 2016
                          • 9

                          Converting a coverage file into a csv format for mmgenome

                          Hello there,

                          Is there a way I could convert the mapping files that bbmap produced into csv coverage file that can be used for the mmgenome software (link: http://madsalbertsen.github.io/mmgen...ad_data.html)?

                          Thanks!

                          Comment

                          • Brian Bushnell
                            Super Moderator
                            • Jan 2014
                            • 2709

                            BBMap can directly output coverage using the "covstats" or "basecov" flags, or it can be generated from sam files with pileup.sh. Can you post a few lines from one of these mmgenome csv files so I can see what the format looks like?

                            Comment

                            • shimingt
                              Junior Member
                              • May 2016
                              • 9

                              Dear Brian,

                              Thanks for your reply.

                              This is from one of the mapping files (.csv) for mmgenome.

                              tanshiming@S620100019205:~/Downloads$ head C13.11.14.csv
                              "Name","Consensus length","Total read count","Single reads","Reads in pairs","Average coverage","Reference sequence","Reference length","Reference common name","Reference Latin name"
                              "1 mapping","5621","103","19","84","1.443","1","8264","",""
                              "2 mapping","370","6","2","4","0.625","2","1027","",""
                              "3 mapping","1665","198","52","146","13.536","3","1665","",""
                              "4 mapping","94","1","1","0","0.01","4","9056","",""
                              "5 mapping","3042","95","19","76","3.205","5","3343","",""
                              "6 mapping","901","8","2","6","9.663E-3","6","98207","",""
                              "7 mapping","5788","147","11","136","2.612","7","6480","",""
                              "8 mapping","14602","381","47","334","2.782","8","15790","",""
                              "9 mapping","1403","1056","282","774","85.135","9","1403","",""
                              tanshiming@S620100019205:~/Downloads$

                              Comment

                              • sebastian_1
                                Junior Member
                                • Mar 2017
                                • 3

                                Hello,

                                I am new to this forum so I apologize if I posted in the wrong place.

                                Currently I am trying to assemble a single cell genome from some Illumina Hiseq 2x250 bp reads.
                                One of my first steps is to do normalization of the data. Previously I was using bbnorm.sh because it's fast and very good. However, with this data I cannot use bbnorm.sh on our server due to memory issues. (I have around 100 million PE reads).
                                So I tried to use bbnorm on a cluster where I allocated 16 cores and 700GB of RAM.

                                I run the software using this command line:
                                bbnorm.sh in1=st1c_R1.fastq.gz in2=st1c_R2.fastq.gz out1=st1c_R1_norm.fastq.gz out2=st1c_R2_norm.fastq.gz threads=12 target=60 mindepth=2

                                The software runs fine, but the process will be killed by the scheduler because of high CPU usage.

                                =>> PBS: job killed: ncpus 59.52 exceeded limit 16 (sum)
                                I tried to increase the number of threads requested to 30, lower the number of threads in the command line to 10, but still I have the same problem.

                                The cluster has installed BBMap version 36.92. It looks like for some reason the number of threads which I define is not taken into account, even if during the run it tells me that the number of threads was set to 12.

                                Settings:
                                threads: 12
                                k: 31
                                deterministic: true
                                toss error reads: false
                                passes: 1
                                bits per cell: 16
                                cells: 145.37B
                                hashes: 3
                                base min quality: 5
                                kmer min prob: 0.5

                                target depth: 240
                                min depth: 2
                                max depth: 300
                                min good kmers: 15
                                depth percentile: 64.8
                                ignore dupe kmers: true
                                fix spikes: false
                                Any suggestions? I really cannot ask for the entire node just for me.
                                I understand that due to other components like pigz, it can use more than 12 cores, but here it looks like uses 60 cores, or all available cores.

                                Thank you,
                                Sebastian

                                Comment

                                Latest Articles

                                Collapse

                                • GATTACAT
                                  Reply to Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                                  by GATTACAT
                                  Love this - good data definitely starts from good input, and poor input can only give relatively poor data. I particularly like the mention of Nanodrop/absorbance based methods for quantification. It's such a toss up if you'll get an accurate reading or what amounts to a randomly generated number, and a lot of library/sequencing related issues can be traced back to poor quant.
                                  07-01-2026, 11:43 AM
                                • SEQadmin2
                                  Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                                  by SEQadmin2


                                  I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.

                                  Here are nine questions we think about, in roughly the order they matter, before...
                                  06-18-2026, 07:11 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by SEQadmin2, 07-02-2026, 11:08 AM
                                0 responses
                                9 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, 06-30-2026, 05:37 AM
                                0 responses
                                13 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, 06-26-2026, 11:10 AM
                                0 responses
                                20 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, 06-17-2026, 06:09 AM
                                0 responses
                                54 views
                                0 reactions
                                Last Post SEQadmin2  
                                Working...