SEQanswers

Old 03-20-2018, 01:17 AM   #21
DrYak
Member
 
Location: South Africa

Join Date: Sep 2013
Posts: 12

Hi,

Can I use reformat.sh or any other BBTools script to split my fasta file into sub-files?

E.g. X.fa (100 sequences) -> X01.fa, X02.fa, ..., X10.fa (each with 10 sequences)?

I don't mind whether I specify the number of sequences per file or the total number of files, and the order of the sequences doesn't matter, as long as no sequence is duplicated.

Cheers,
Dave
Old 03-20-2018, 03:58 AM   #22
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,881

faSplit from Jim Kent's utilities is a much better option for splitting fasta files.

Run faSplit without arguments to see its inline help, which lists the available options.
Old 03-20-2018, 09:07 AM   #23
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Reformat won't do that, but you can use partition.sh:

Code:
partition.sh in=X.fa out=X%.fa ways=10
That will produce 10 output files with an equal number of sequences and no duplication.
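
To check a split like this for completeness, one can compare record counts before and after. The following is a minimal sketch, not part of BBTools: it builds a toy 100-record FASTA, splits it with a hypothetical awk round-robin stand-in (so the example runs without BBTools installed, and it makes no claim about how partition.sh distributes records internally), and then confirms that no sequence was lost or duplicated. The output names assume that % in X%.fa expands to a chunk number.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Toy input: 100 FASTA records standing in for X.fa.
i=1
while [ "$i" -le 100 ]; do
    printf '>seq%d\nACGT\n' "$i"
    i=$((i + 1))
done > X.fa

# Stand-in splitter (NOT partition.sh): deal records round-robin into 10 files.
awk -v ways=10 '
    /^>/ { n++; out = sprintf("X%d.fa", (n - 1) % ways) }
         { print > out }
' X.fa

# Count headers before and after, and look for repeated headers.
total_in=$(grep -c '^>' X.fa)
total_out=$(cat X[0-9].fa | grep -c '^>')
dupes=$(cat X[0-9].fa | grep '^>' | sort | uniq -d | wc -l | tr -d ' ')
echo "input=$total_in output=$total_out duplicated=$dupes"
```

With real data, the three counting lines can be run against whatever files partition.sh wrote, adjusting the glob to match their names.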
Old 08-01-2018, 05:47 PM   #24
sunnycqcn
Member
 
Location: Canada

Join Date: Apr 2013
Posts: 17

Hi Brian Bushnell,
When I used mapPacBio.sh to map PacBio reads, I got the following error:
Exception in thread "Thread-23" java.lang.AssertionError: Read 20, length 10550, exceeds the limit of 6019
You can map the reads in chunks by reformatting to fasta, then mapping with the setting 'fastareadlen=6019'
at align2.AbstractMapThread.run(AbstractMapThread.java:480)

But I could not find out how to reformat the reads.
Could you help me resolve this issue?
Thanks,
Fuyou
Old 08-02-2018, 05:08 AM   #25
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,881

You can use
Code:
reformat.sh in=your_file.fastq out=newfile.fa
to convert the reads to fasta format.

That said, I think mapPacBio.sh should automatically split reads longer than 6k when it maps them. Is that not working?
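
The "map the reads in chunks" advice in the error message can be pictured as cutting each long read into pieces no longer than the limit. A rough sketch under stated assumptions follows: the awk script and the `_partN` naming are hypothetical, the FASTA is assumed to have one sequence line per record, and nothing here reproduces how mapPacBio.sh or `fastareadlen` behaves internally.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

maxlen=6019   # the limit reported in the error message

# Toy FASTA with one 10550-base read (the length from the error message).
{
    printf '>read20\n'
    head -c 10550 /dev/zero | tr '\0' 'A'
    printf '\n'
} > long.fa

# Cut every sequence line into consecutive pieces of at most maxlen bases.
awk -v max="$maxlen" '
    /^>/ { name = substr($0, 2); next }
    {
        part = 0
        for (i = 1; i <= length($0); i += max)
            printf(">%s_part%d\n%s\n", name, ++part, substr($0, i, max))
    }
' long.fa > chunks.fa

# Lengths of the resulting pieces: 6019, then 4531.
awk '!/^>/ { print length($0) }' chunks.fa
```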
Old 08-02-2018, 05:29 AM   #26
sunnycqcn
Member
 
Location: Canada

Join Date: Apr 2013
Posts: 17

Quote:
Originally Posted by GenoMax View Post
You can use
Code:
reformat.sh in=your_file.fastq out=newfile.fa
to convert the reads to fasta format.

That said, I think mapPacBio.sh should automatically split reads longer than 6k when it maps them. Is that not working?
It is not working. I used fasta format.
Thanks,
Fuyou
Old 01-30-2019, 12:54 PM   #27
pepe84
Junior Member
 
Location: Canada

Join Date: Jul 2014
Posts: 4

Hello folks, I am trying to process a FASTQ file with reformat.sh. Although I have installed Java correctly and tested it on the command line, I still can't get it to work. The FASTQ file is not in the same directory as the BBMap folder; could that be the issue?
Old 01-30-2019, 02:20 PM   #28
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 464

pepe84, do you provide a path to the file? Please copy your command as tried, and then copy the error message.
__________________
Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
Old 01-31-2019, 04:26 AM   #29
pepe84
Junior Member
 
Location: Canada

Join Date: Jul 2014
Posts: 4

Here is the command:
java -cp C:\BBMap\current\jgi.ReformatReads in=C:\BBMap\resources\SRRXXXXX.fastq out1=EFB_R1.fq out2=EFB_R2.fq

And here is the error:
Error: Could not find or load main class in=C:\BBMap\resources\SRRXXXXX.fastq

Just an FYI: I am using the command line on Windows.

Thanks, I appreciate any help.


Quote:
Originally Posted by SNPsaurus View Post
pepe84, do you provide a path to the file? Please copy your command as tried, and then copy the error message.
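
One reading of this error, offered as an interpretation rather than a confirmed diagnosis: the java launcher takes the argument after -cp as the class path and the next non-option argument as the main class. In the command above, the directory and the class name are joined by a backslash, so the whole string becomes the class path and the `in=` argument is treated as the class name. The shell function below is a deliberately simplified, hypothetical imitation of that argument scan (it is not the real launcher) that reproduces the effect.

```shell
#!/bin/sh
# Simplified imitation of how the java launcher picks the main class:
# skip options (and the value that follows -cp), then take the first
# remaining argument. Hypothetical helper, for illustration only.
main_class() {
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -cp|-classpath) shift; [ "$#" -gt 0 ] && shift ;;
            -*)             shift ;;
            *)              printf '%s\n' "$1"; return 0 ;;
        esac
    done
    return 1
}

# As posted: class path and class name fused into one argument.
as_posted=$(main_class -cp 'C:\BBMap\current\jgi.ReformatReads' \
    'in=C:\BBMap\resources\SRRXXXXX.fastq' out1=EFB_R1.fq out2=EFB_R2.fq)

# With a space between the class path and the class name.
separated=$(main_class -cp 'C:\BBMap\current' jgi.ReformatReads \
    'in=C:\BBMap\resources\SRRXXXXX.fastq' out1=EFB_R1.fq out2=EFB_R2.fq)

printf 'as posted: %s\nseparated: %s\n' "$as_posted" "$separated"
```

On that reading, the Windows command would become `java -cp C:\BBMap\current jgi.ReformatReads in=C:\BBMap\resources\SRRXXXXX.fastq out1=EFB_R1.fq out2=EFB_R2.fq`, assuming C:\BBMap\current is the directory that holds the compiled classes.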
Old 02-13-2019, 11:14 PM   #30
tolot27
Junior Member
 
Location: Germany

Join Date: Jan 2012
Posts: 3
Deinterleave with singletons

Hi!

I have an interleaved FASTQ file containing unmapped reads produced by segemehl -u. I want to deinterleave it into the two mate-pair files and save the singletons into a separate file.

Currently, reformat.sh cannot handle this, even if I give outsingle= as a parameter. The header contains the strand information (e.g. 2:N:0:2).

Is there some way to extract at least the paired reads, without the singletons in between?

--
Kind regards,
Mathias
Old 02-14-2019, 05:50 AM   #31
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,881

You could use `repair.sh` to separate the singletons out afterwards.
Old 02-14-2019, 11:45 PM   #32
tolot27
Junior Member
 
Location: Germany

Join Date: Jan 2012
Posts: 3

Quote:
Originally Posted by GenoMax View Post
You could use `repair.sh` to separate the singletons out afterwards.
Thanks for pointing me in this direction. Unfortunately, repair.sh did not produce well-ordered files. Fortunately, bbsplitpairs.sh could be used instead of the reformat.sh/repair.sh combination: it extracted the correctly paired reads and wrote the singletons to a separate file.
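
For readers who want to see what separating pairs from singletons amounts to, here is a minimal awk sketch. It is not how bbsplitpairs.sh or repair.sh work internally. It assumes 4-line FASTQ records, Illumina-style headers in which the read name is the first space-separated field (as in the 2:N:0:2 example), and mates that are adjacent when both are present; two consecutive records with the same name are simply treated as a pair. The output file names are made up for the example.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Toy interleaved FASTQ: pairs a and c, plus a lone read b in between.
cat > interleaved.fq <<'EOF'
@a 1:N:0:2
ACGT
+
IIII
@a 2:N:0:2
TTTT
+
IIII
@b 2:N:0:2
GGGG
+
IIII
@c 1:N:0:2
CCCC
+
IIII
@c 2:N:0:2
AAAA
+
IIII
EOF

awk '
function flush_single() {
    # Write the held record out as a singleton, if there is one.
    if (have) { printf "%s", held > "singletons.fq"; have = 0 }
}
{
    name = $1                      # read name without the 1:N:0:2 part
    rec = $0 "\n"
    for (k = 0; k < 3; k++) { getline line; rec = rec line "\n" }
    if (have && name == held_name) {
        # Same name as the held record: emit both as a pair.
        printf "%s", held > "R1.fq"
        printf "%s", rec  > "R2.fq"
        have = 0
    } else {
        # Different name: the held record had no mate.
        flush_single()
        held = rec; held_name = name; have = 1
    }
}
END { flush_single() }
' interleaved.fq

wc -l R1.fq R2.fq singletons.fq
```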
Old 02-26-2019, 10:29 AM   #33
milw
Director NGS Services, Lucigen
 
Location: Madison WI USA

Join Date: Dec 2013
Posts: 12
I'm confused about sam/bam options

In version 37.52, the parameters under "Sam and bam processing options" are confusing to me:
Sam and bam processing options:

mappedonly=f      Toss unmapped reads.
unmappedonly=f    Toss mapped reads.
pairedonly=f      Toss reads that are not mapped as proper pairs.
unpairedonly=f    Toss reads that are mapped as proper pairs.
primaryonly=f     Toss secondary alignments. Set this to true for sam to fastq conversion.

If 'mappedonly' is false, shouldn't that mean KEEP both unmapped and mapped reads?
Likewise, 'pairedonly' false (to me) means KEEP both unpaired and paired reads.

In the end, I want my bam to contain only paired reads, so I've been running it with 'pairedonly=t', but reformat.sh says 'input is being processed as unpaired' for my bam file.
__________________
Scott Monsma
Sr Scientist at Lucigen

Last edited by milw; 02-26-2019 at 10:31 AM.
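
As a point of reference for what "toss reads that are not mapped as proper pairs" would mean at the record level, here is a small sketch based on the SAM FLAG field, where bit 0x2 marks a read mapped in a proper pair. It is an illustration on a toy SAM file, not reformat.sh's implementation, and it rests on one reading of the help text that the thread does not confirm: each option is a filter that is switched off when set to f, so the description says what happens when it is set to t.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Toy SAM: r1 is a proper pair (flags 99 and 147), r2 has an unmapped
# mate (flag 73), r3 is unmapped (flag 4).
{
    printf '@HD\tVN:1.6\n'
    printf 'r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII\n'
    printf 'r1\t147\tchr1\t200\t60\t4M\t=\t100\t-104\tACGT\tIIII\n'
    printf 'r2\t73\tchr1\t300\t60\t4M\t=\t300\t0\tACGT\tIIII\n'
    printf 'r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n'
} > toy.sam

# Keep header lines, plus records whose FLAG (column 2) has bit 0x2 set.
# POSIX awk has no bitwise AND, so the bit is tested arithmetically.
awk -F '\t' '
    /^@/ { print; next }
    int($2 / 2) % 2 == 1
' toy.sam > proper_pairs.sam

kept=$(grep -vc '^@' proper_pairs.sam)
echo "records kept: $kept"
```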
Tags
ascii33, ascii64, bbduk, bbmap, bbtools, fasta, fastq, interleavei33, quality trim, reformat, scarf, subsample