#21 |
Member
Location: South Africa Join Date: Sep 2013
Posts: 12
Hi,
Can I use reformat or any other BBTools script to split my FASTA file into sub-files, e.g. X.fa (100 sequences) -> X01.fa, X02.fa, ... X10.fa (each with 10 sequences)? I don't mind whether I select the number of sequences per file or the total number of files, and the order of the sequences doesn't matter as long as no sequence is duplicated.
Cheers, Dave
#23 |
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
Reformat won't do that, but you can use partition.sh:
Code:
partition.sh in=X.fa out=X%.fa ways=10
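For anyone who wants to see what an N-way split amounts to, here is a minimal Python sketch. It is not partition.sh's implementation: the round-robin assignment and the helper names `read_fasta` and `partition_records` are assumptions made for illustration. The point it shows is that every record lands in exactly one bin, so nothing is duplicated.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) tuples from an open FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def partition_records(records, ways):
    """Deal records round-robin into `ways` bins (assumed scheme, hypothetical helper)."""
    bins = [[] for _ in range(ways)]
    for i, record in enumerate(records):
        bins[i % ways].append(record)
    return bins

# Demo: 100 dummy sequences split 10 ways.
demo = io.StringIO("".join(f">seq{i}\nACGT\n" for i in range(100)))
bins = partition_records(read_fasta(demo), ways=10)
print([len(b) for b in bins])
```

Writing each bin to its own file is then a plain loop over `bins`.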
#24 |
Member
Location: Canada Join Date: Apr 2013
Posts: 17
Hi Brian Bushnell,
When I used mapPacBio.sh to map PacBio reads, I got the following error:
Exception in thread "Thread-23" java.lang.AssertionError: Read 20, length 10550, exceeds the limit of 6019
You can map the reads in chunks by reformatting to fasta, then mapping with the setting 'fastareadlen=6019'
at align2.AbstractMapThread.run(AbstractMapThread.java:480)
But I could not find out how to reformat the reads. Could you help me figure out this issue?
Thanks, Fuyou
#25 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,083
You can use
Code:
reformat.sh in=your_file.fastq out=newfile.fa
That said, I think mapPacBio.sh should automatically split reads longer than 6k when it does mapping. Is that not working?
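To make "mapping in chunks" concrete, here is a rough Python sketch of cutting a long read into consecutive pieces no longer than the limit from the error message (6019). How BBTools actually names or splits the pieces is not stated in this thread; the `_part_N` suffix and the function name `split_read` are hypothetical.

```python
def split_read(name, seq, max_len=6019):
    """Split one read into consecutive pieces of at most max_len bases."""
    if len(seq) <= max_len:
        return [(name, seq)]
    pieces = []
    for part, start in enumerate(range(0, len(seq), max_len), start=1):
        # The "_part_N" suffix is an illustrative naming choice only.
        pieces.append((f"{name}_part_{part}", seq[start:start + max_len]))
    return pieces

# Demo with the length from the error message (10550 bases).
pieces = split_read("read20", "A" * 10550)
print([(n, len(s)) for n, s in pieces])
```

The pieces concatenate back to the original read, so no bases are lost by splitting.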
#26 |
Member
Location: Canada Join Date: Apr 2013
Posts: 17
Quote:
Thanks, Fuyou
#27 |
Junior Member
Location: Canada Join Date: Jul 2014
Posts: 4
Hello folks, I am trying to process a FASTQ file with reformat.sh. Although I have correctly installed Java and tested it on the command line, I still can't get it to work. It seems the problem is that the FASTQ file is not in the same directory as the BBMap folder. Could that be an issue?
#28 |
Registered Vendor
Location: Eugene, OR Join Date: May 2013
Posts: 521
pepe84, do you provide a path to the file? Please copy your command as tried, and then copy the error message.
__________________
Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
#29 |
Junior Member
Location: Canada Join Date: Jul 2014
Posts: 4
Here is the command:
java -cp C:\BBMap\current\jgi.ReformatReads in=“C:\BBMap\resources\SRRXXXXX.fastq” out1=EFB_R1.fq out2=EFB_R2.fq
And here is the error:
Error: Could not find or load main class in=C:\BBMap\resources\SRRXXXXX.fastq
Just an FYI, I am using the command line on Windows.
Thanks, I appreciate any help
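A note on reading that error: `java -cp` takes the classpath as its own argument, and the next argument is treated as the main class. In the command above the class name is glued onto the classpath, so Java takes `in=...` as the class to load. The Python sketch below only builds the argument list to show the intended separation. The paths are the ones from the post, the helper name `build_reformat_command` is hypothetical, and whether this resolves the problem on that particular Windows setup is not confirmed in the thread.

```python
def build_reformat_command(classpath, main_class, tool_args):
    """Return argv for: java -cp <classpath> <main_class> <tool_args...>."""
    # Classpath and main class must be two separate arguments.
    return ["java", "-cp", classpath, main_class, *tool_args]

cmd = build_reformat_command(
    r"C:\BBMap\current",
    "jgi.ReformatReads",
    [r"in=C:\BBMap\resources\SRRXXXXX.fastq", "out1=EFB_R1.fq", "out2=EFB_R2.fq"],
)
print(" ".join(cmd))
```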
#30 |
Junior Member
Location: Germany Join Date: Jan 2012
Posts: 3
Hi!
I have an interleaved FASTQ containing unmapped reads produced by segemehl -u. I want to deinterleave it into the two mate-pair files and also remove/save the singletons into a separate file. Currently, reformat.sh cannot deal with it, even if I give outsingle= as a parameter. The header contains the strand information (i.e. 2:N:0:2). Is there some way to at least extract the paired reads without singletons in between?
--
Kind regards, Mathias
#31 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,083
You could use `repair.sh` to separate the singletons out afterwards.
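Conceptually, separating singletons means grouping reads by name and checking whether both mates are present. The Python sketch below illustrates that idea only; it is not how repair.sh is implemented. It assumes Illumina-style headers where the mate number is the first character after the space (as in `2:N:0:2`), and it holds all reads in memory, which would not suit large files.

```python
def split_pairs_and_singletons(records):
    """records: iterable of (header, seq, qual). Returns (pairs, singletons)."""
    by_name, order = {}, []
    for header, seq, qual in records:
        name, _, comment = header.partition(" ")
        mate = comment[:1]  # assumed to be "1" or "2"
        if name not in by_name:
            by_name[name] = {}
            order.append(name)
        by_name[name][mate] = (header, seq, qual)
    pairs, singletons = [], []
    for name in order:
        mates = by_name[name]
        if "1" in mates and "2" in mates:
            pairs.append((mates["1"], mates["2"]))
        else:
            singletons.extend(mates.values())
    return pairs, singletons

# Demo: r2 has lost its mate and sits between the two mates of r1.
records = [
    ("r1 1:N:0:2", "ACGT", "IIII"),
    ("r2 2:N:0:2", "GGCC", "IIII"),
    ("r1 2:N:0:2", "TTAA", "IIII"),
    ("r3 1:N:0:2", "CCGG", "IIII"),
    ("r3 2:N:0:2", "AATT", "IIII"),
]
pairs, singletons = split_pairs_and_singletons(records)
print(len(pairs), len(singletons))
```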
#32 |
Junior Member
Location: Germany Join Date: Jan 2012
Posts: 3
Quote:
#33 |
Director NGS Services, Lucigen
Location: Madison WI USA Join Date: Dec 2013
Posts: 12
In version 37.52, the parameters under "Sam and bam processing options" are confusing to me:
Sam and bam processing options:
mappedonly=f Toss unmapped reads.
unmappedonly=f Toss mapped reads.
pairedonly=f Toss reads that are not mapped as proper pairs.
unpairedonly=f Toss reads that are mapped as proper pairs.
primaryonly=f Toss secondary alignments. Set this to true for sam to fastq conversion.
If 'mappedonly' is false, shouldn't that mean to KEEP unmapped plus mapped reads? Likewise, 'pairedonly' false (to me) means KEEP unpaired and paired.
In the end, I want my bam to contain only paired reads, so I've been running it with 'pairedonly=t', but reformat.sh says 'input is being processed as unpaired' for my bam file.
Last edited by milw; 02-26-2019 at 11:31 AM.
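One way to read that help text is that each description says what happens when the flag is set to true, and a false flag applies no filter at all. The Python sketch below models that reading on SAM flag bits. It is an interpretation, not reformat.sh's code: in particular, equating "proper pair" with SAM bit 0x2 is an assumption, and the function name `keep_record` is hypothetical.

```python
PROPER_PAIR = 0x2   # SAM flag: each segment properly aligned
UNMAPPED = 0x4      # SAM flag: segment unmapped
SECONDARY = 0x100   # SAM flag: secondary alignment

def keep_record(flag, mappedonly=False, unmappedonly=False,
                pairedonly=False, unpairedonly=False, primaryonly=False):
    """Return True if a record with this SAM flag survives the filters."""
    mapped = not (flag & UNMAPPED)
    proper = bool(flag & PROPER_PAIR)
    secondary = bool(flag & SECONDARY)
    if mappedonly and not mapped:
        return False
    if unmappedonly and mapped:
        return False
    if pairedonly and not proper:
        return False
    if unpairedonly and proper:
        return False
    if primaryonly and secondary:
        return False
    return True

flags = [99, 97, 4, 355]  # proper pair, paired but not proper, unmapped, secondary
print([keep_record(f) for f in flags])
print([keep_record(f, pairedonly=True) for f in flags])
```

Under this reading, leaving every flag at `f` keeps all records, which matches the "KEEP unmapped plus mapped" expectation in the post.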
#34 |
Junior Member
Location: Scotland Join Date: Sep 2019
Posts: 1
Hi,
First-time poster, so my apologies for any horrid faux pas, and also for the thread necromancy!
I'm trying to convert FASTQs to an unmapped SAM (ultimately to a CRAM, to test space saving and to check that no data is lost when it's converted back to FASTQs), and I saw a thread suggesting BBMap. However, I'm having the same issue as pepe84. The command:
reformat.sh in=/opt/science/blah.fastq.gz out=/opt/science/blah.sam
results in:
"java -ea -Xm200m -cp /opt/science/BBMap/sh/current/ jgi.ReformatReads in=/opt/science/blah.fastq.gz out=/opt/science/blah.sam
Error: Could not find or load main class jgi.ReformatReads"
When I copy/paste the full java line quoted in the error but remove the space between "/current/" and "jgi.ReformatReads", I instead get:
"Error: Could not find or load main class in=.opt.science.blah.fastq.gz"
I've tried it with the FASTQ file both in and out of the BBMap directory to see if it would help, but got the same error. Any advice would be gratefully accepted.
Roy.
Quote:
#35 |
Member
Location: oz Join Date: Apr 2010
Posts: 12
Hi,
I'm trying to use BBMap version 38.08 to retrieve fastq sequences from a bam file. However, I keep getting a problem where the quality output is merely:
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
Here are some lines from the bam file:
Code:
HISEQ:378:C7F64ANXX:3:1207:13039:83924 97 smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues 15 60 125M = 644 754 GGGGGAGTGATAAAAATATATTTATTTCATCTAACTGATGAAATAACGTTTTTGCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGA bbbbbfffffffffffffffffffffffffffffffffffffbffffffffef_ebffffffffffffffdfcefffffbfffffffbbfffffffdfefd_\ebOdefOWZW_bWefffdWce[ NM:i:0 MD:Z:125 MC:Z:125M AS:i:125 XS:i:0
HISEQ:378:C7F64ANXX:3:1106:10647:86342 97 smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues 30 60 125M = 669 764 ATATATTTATTTCATCTAACTGATGAAATAACGTTTTTGCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAG aabbbbffffffffffffffbbeffffffffffePaaebcfeefffffffeffYeffff\efPe]PePPPbc\bbPedP^PeaP]\dYbc]edcfOPOYd_bfeOcfOYOZ\\OdefffNObeOf NM:i:0 MD:Z:125 MC:Z:125M AS:i:125 XS:i:0
HISEQ:378:C7F64ANXX:3:1208:17933:95359 97 smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues 32 60 125M = 620 713 ATATTTATTTCATCCAATTGATGAAATGATGTTTTTGCTCTTACAACTAATAGCTAAATACAGTAGAACTTGGATAATGCGTATGTGTTTGAGTTTTTAAAATATTGAGAGTGGAAGTTTGAGAA aab_`dedbefe]ZPPP^PePdZbbPPY_cbffbfffdfPePPbPP[d][effdfffcebbffcbYPbP\P[PYdb\d]Pd\NdP]P]\eaP[aeePec_OYYOOOYea\O]_O_^dW]edcfef NM:i:11 MD:Z:14T2C9A1C23T8A0C11G3G23G10G10 MC:Z:125M AS:i:70 XS:i:0
HISEQ:378:C7F64ANXX:3:2305:17850:4846 97 smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues 52 60 125M = 625 690 ATGAAATGATGTTTTTGCTCTTACAACTAATAGCTAAATACAGTAGAACTTGGATAATGCGTATGTGTTTGAGTTTTTAAAATATTGAGAGTGGAAGTTTGAGAATGCATCAAACCTTGGGAAGG abbbbffffffffffffffffffffffffffffeffffffffffffeffffff]db\aeffffffffffffffP]ecaeffff]efffeeff_fffffffffffeffffffffffffffffffef NM:i:9 MD:Z:7A1C23T8A0C11G3G23G10G30 MC:Z:117M8S AS:i:80 XS:i:0
HISEQ:378:C7F64ANXX:3:2205:15096:32122 97 smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues 59 60 125M = 675 741 AACGTTTTTGCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGGTTGAGGATGCATCAAACCTTGGGAAGGAATAAGT `aa_`ffaefP^Z\bPdP^_cebPPYbadfffbfbecac_a_Pef]P\Y[PbedPP[e\ed_facYPefff_efePbYYbP]\PP[deO\NN]e[aOOZbOOYaeef]bcb_OeOWb]ZWbOObO NM:i:2 MD:Z:90T5A28 MC:Z:125M AS:i:115 XS:i:0
HISEQ:378:C7F64ANXX:3:1307:17567:99979 97 smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues 68 60 125M = 718 775 GCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAGAATGCATCAAACCTTGGGAAGGAATAAGTCTTTTGGCC ababbfffffffffffffffffffffffffffffffffffcffdfeffff]efffffefcffffffffefffffffecfffaefffffffffffffffffffeffffffffffffffb]e]fa]e NM:i:0 MD:Z:125 MC:Z:125M AS:i:125 XS:i:0
HISEQ:378:C7F64ANXX:3:2211:20485:88833 97 smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues 68 60 125M = 654 711 GCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAGAATGCATCAAACCTTGGGAAGGAATAAGTCTTTTGGCC aabaaecfffffffffffffffffffffffffffffdefffffefffffffdeffffdefffffffffdffffefffffffefffffffff]cfdfdffffffffffcffffffffbeffffeff NM:i:0 MD:Z:125 MC:Z:125M AS:i:125 XS:i:0
HISEQ:378:C7F64ANXX:3:1212:7300:28109 97 smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues 71 60 125M = 636 690 CTTACAACTAATAGCTAAATACAGTAGAACTTGGATAATGCGTATGTGTTTGAGTTTTTAAAATATTGAGAGTGGAAGTTTGAGAATGCATCAAACCTTGGGAAGGAATAAGTCTTTTGGCCTTC bbbbbfffffffffffffffffffffffffffffffffffffaffffeefffff^efffffffffcffPeff]efffffffffffefffffffffffffffdfffffdfffff]bfdffffffff NM:i:7 MD:Z:14T8A0C11G3G23G10G49 MC:Z:125M AS:i:90 XS:i:0
HISEQ:378:C7F64ANXX:3:2303:16430:40702 97 smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues 72 60 125M = 694 747 TTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAGAATGCATCAAACCTTGGGAAGGAATAAGTCTTTTGGCCTTCC abbbaffffffffffffdffffffeffffffffffeefffeffffefffefffcffffffffffdffffff_fffffaefffffff]edfffffffff]fffffffffdffcefffffff]ae_b NM:i:0 MD:Z:125 MC:Z:125M AS:i:125 XS:i:0
HISEQ:378:C7F64ANXX:3:2116:16496:33002 97 smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues 95 60 125M = 722 752 CAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAGAATGCATCAAACCTTGGGAAGGAATAAGTCTTTTGGCCTTCCAAAACTATATAGATAGATAGAGC bbbbbffffffffffffdffffeeffffffdffffffeffffffffffdfffff]ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff NM:i:0 MD:Z:125 MC:Z:125M AS:i:125 XS:i:0
Code:
/home/xub/host/opt/bbmap/bbmap/reformat.sh qin=64 qout=33 requiredbits=16 overwrite=t in=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.1.bam out=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.reverse.1.fq.gz
java -ea -Xmx200m -cp /home/xub/host/opt/bbmap/bbmap/current/ jgi.ReformatReads qin=64 qout=33 requiredbits=16 overwrite=t in=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.1.bam out=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.reverse.1.fq.gz
Executing jgi.ReformatReads [qin=64, qout=33, requiredbits=16, overwrite=t, in=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.1.bam, out=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.reverse.1.fq.gz]
Could not find sambamba.
Found samtools 1.8
Input is being processed as unpaired
Input: 464 reads 58000 bases
Output: 230 reads (49.57%) 28750 bases (49.57%)
Time: 0.634 seconds.
Reads Processed: 464 0.73k reads/sec
Bases Processed: 58000 0.09m bases/sec
Here's some of the output:
Code:
@HISEQ:378:C7F64ANXX:3:2205:16922:87749
CAAATGACAACCTAAATTGTAAACTGTTTTTTTAAAATCTACTAACCCAAACTGAATCATTTTATAAACCAAATCAAACTATAATTTTTAAATGGTTTGGTCCGATTTTATAATTTGAGCCTATT
+
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
@HISEQ:378:C7F64ANXX:3:1210:20568:23121
AAATTTATCCAAATGACAACCTAAATTGTAAACTGTTTTTTTAAAATCTACTAACCCAAACTGAATCATTTTATAAACCAAATCAAACTATAATTTTTAAATGGTTTGGTCCGATTTTATAATTT
+
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
@HISEQ:378:C7F64ANXX:3:2212:11893:40357
GGTTTATGGTTTGACTTGGTTTGAAATTTATCCAAATGACAACCTAAATTGTAAACTGTTTTTTTAAAATCTACTAACCCAAACTGAATCATTTTATAAACCAAATCAAACTATAATTTTTAAAT
+
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
@HISEQ:378:C7F64ANXX:3:2210:7117:7877
GGTTTGACTTGGTTTGAAATTTATCCAAATGACAACCTAAATTGTAAACTGTTTTTTTAAAATCTACTAACCCAAACTGAATCATTTTATAAACCAAATCAAACTATAATTTTTAAATGGTTTGG
+
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
Thanks for any advice. Cheers.
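For reference, the arithmetic behind `qin=64 qout=33` can be sketched in a few lines of Python. This is not reformat.sh's code, and the function names are hypothetical. It shows that a Phred+64 character becomes a Phred+33 character by subtracting 31 from its ASCII code, that `J` is Q41 in Phred+33, and that characters such as `b` and `f` would decode to 65 and 69 if they were read as Phred+33. Whether that is what happened in the run above is not established in this thread.

```python
def phred64_to_phred33(qual):
    """Re-encode a quality string from ASCII offset 64 to ASCII offset 33."""
    return "".join(chr(ord(c) - 64 + 33) for c in qual)

def scores(qual, offset):
    """Decode a quality string into integer Phred scores for a given offset."""
    return [ord(c) - offset for c in qual]

sample = "bbbbbfff"  # first characters of a quality string from the post
print(phred64_to_phred33(sample))
print(scores(sample, 64))   # scores if the string is Phred+64
print(scores(sample, 33))   # scores if the same string is read as Phred+33
print(scores("J", 33))      # score of 'J' in Phred+33
```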
#36 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,083
Did you move any of the bbmap folder contents after you downloaded and uncompressed the bbmap code?
Make sure the top-level directory with BBMap is in your $PATH. Something like
Code:
export PATH=$PATH:/opt/science/BBMap
Quote:
#37 |
Junior Member
Location: USA Join Date: Jun 2020
Posts: 1
Hello,
I have been using the reformat.sh script for a while (nice stuff!) but am running into an issue. I need to get a specific number of reads from a file and am using the `--samplerate` option to do that. For example, if a file has 100 reads, and I need 10, I set the sample rate to 0.1. Unfortunately, it seems that for large files with very specific sample rates, the actual number of reads returned is not the product of the total reads and the sample rate. Here is an example output:
Code:
Executing jgi.ReformatReads [samplerate=0.5582187961, in1=SRR2976833.fastq.gz, in2=, out=tempFile1.fastq.gz]
Input is being processed as unpaired
Input: 509774 reads 122293939 bases
Processed: 284893 reads 68330214 bases
Output: 284893 reads (55.89%) 68330214 bases (55.87%)
Time: 2.295 seconds.
Reads Processed: 284k 124.12k reads/sec
Bases Processed: 68330k 29.77m bases/sec
Please let me know why this is happening and if there is a solution.
Thanks!
PJ
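One possible explanation, offered as an assumption rather than a statement about reformat.sh's internals: if each read is kept independently with probability `samplerate`, the output count is a random draw and not an exact product. With 509774 reads and a rate of 0.5582187961 the expected count is about 284565 with a standard deviation of roughly 355, so the observed 284893 would be within one standard deviation. The Python sketch below contrasts that kind of per-read sampling with reservoir sampling, which returns exactly `k` items; both function names are hypothetical. The reformat.sh help text also lists a `samplereadstarget` option for requesting an exact number of output reads, which may be worth checking in your version.

```python
import random

def bernoulli_sample(items, rate, rng):
    """Keep each item independently with probability `rate` (count varies)."""
    return [x for x in items if rng.random() < rate]

def reservoir_sample(items, k, rng):
    """Return exactly k items chosen uniformly at random (Algorithm R)."""
    reservoir = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir

rng = random.Random(0)
total_reads = 509774
target = round(total_reads * 0.5582187961)

approx = bernoulli_sample(range(total_reads), 0.5582187961, rng)
exact = reservoir_sample(range(total_reads), target, rng)
print(target, len(approx), len(exact))
```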
#38 |
Junior Member
Location: lahore Join Date: Apr 2020
Posts: 3
Quote:
It's really helpful information, thanks.
Tags |
ascii33, ascii64, bbduk, bbmap, bbtools, fasta, fastq, interleavei33, quality trim, reformat, scarf, subsample |