SEQanswers





Old 03-20-2018, 02:17 AM   #21
DrYak
Member
 
Location: South Africa

Join Date: Sep 2013
Posts: 12

Hi,

Can I use reformat or any other bbtools script to split my fasta file into sub-files?

E.g. X.fa (100 sequences) -> X01.fa, X02.fa, ... X10.fa (each with 10 sequences)?

I don't mind whether I have to specify the number of sequences per file or the total number of files, and the order of the sequences doesn't really matter, as long as no sequence is duplicated.

Cheers,
Dave
Old 03-20-2018, 04:58 AM   #22
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,845

faSplit from Jim Kent's utilities is a much better option for splitting FASTA files.

Run faSplit without arguments to see its inline help, which lists the available options.
Old 03-20-2018, 10:07 AM   #23
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Reformat won't do that, but you can use partition.sh:

Code:
partition.sh in=X.fa out=X%.fa ways=10
That will produce 10 output files with an equal number of sequences and no duplication.
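
For readers who want to see the idea behind such a split, here is a minimal Python sketch that deals FASTA records into a fixed number of bins in round-robin order. It is only an illustration of the technique and not how partition.sh is implemented; the function names `read_fasta` and `partition_round_robin` are made up for this example.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) records from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            # Sequence may be wrapped over several lines; collect them all.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def partition_round_robin(records: Iterable[Record], ways: int) -> List[List[Record]]:
    """Deal records into `ways` bins so that each record lands in exactly one bin."""
    if ways < 1:
        raise ValueError("ways must be at least 1")
    bins: List[List[Record]] = [[] for _ in range(ways)]
    for i, record in enumerate(records):
        bins[i % ways].append(record)
    return bins
```

With 100 input sequences and `ways=10`, each bin receives 10 records. If the number of sequences is not a multiple of `ways`, the bin sizes in this sketch differ by at most one.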
Old 08-01-2018, 06:47 PM   #24
sunnycqcn
Member
 
Location: Canada

Join Date: Apr 2013
Posts: 17

Hi Brian Bushnell,
When I used mapPacBio.sh to map PacBio reads, I got the following error:
Exception in thread "Thread-23" java.lang.AssertionError: Read 20, length 10550, exceeds the limit of 6019
You can map the reads in chunks by reformatting to fasta, then mapping with the setting 'fastareadlen=6019'
at align2.AbstractMapThread.run(AbstractMapThread.java:480)

But I could not find out how to reformat the reads.
Could you help me figure out this issue?
Thanks,
Fuyou
Old 08-02-2018, 06:08 AM   #25
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,845

You can use
Code:
reformat.sh in=your_file.fastq out=newfile.fa
to convert the reads to fasta format.

That said, I think mapPacBio.sh should automatically split reads longer than 6k when it maps them. Is that not working?
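
To make "mapping the reads in chunks" concrete, here is a small Python sketch that cuts a read into consecutive pieces no longer than a given limit (6019 in the error message above). It only illustrates the idea; the `_partN` naming scheme and the function name `chunk_read` are assumptions for this example and are not what BBMap itself produces.

```python
from typing import Iterator, Tuple


def chunk_read(name: str, seq: str, max_len: int = 6019) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pieces of at most `max_len` bases each.

    Reads that already fit are returned unchanged. Longer reads are cut
    into consecutive, non-overlapping pieces with a hypothetical
    '_partN' suffix appended to the name.
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    if len(seq) <= max_len:
        yield name, seq
        return
    for part, start in enumerate(range(0, len(seq), max_len), start=1):
        yield f"{name}_part{part}", seq[start:start + max_len]
```

For the read in the error message (length 10550), this gives two pieces of 6019 and 4531 bases.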
Old 08-02-2018, 06:29 AM   #26
sunnycqcn
Member
 
Location: Canada

Join Date: Apr 2013
Posts: 17

Quote:
Originally Posted by GenoMax View Post
You can use
Code:
reformat.sh in=your_file.fastq out=newfile.fa
to convert the reads to fasta format.

That said, I think mapPacBio.sh should automatically split reads longer than 6k when it maps them. Is that not working?
It is not working. I used fasta format.
Thanks,
Fuyou
Old 01-30-2019, 01:54 PM   #27
pepe84
Junior Member
 
Location: Canada

Join Date: Jul 2014
Posts: 4

Hello folks, I am trying to process a FASTQ file with reformat.sh. Although I have installed Java correctly and tested it on the command line, I still can't get it to work. The problem seems to be that the FASTQ file is not in the same directory as the BBMap folder. Could that be an issue?
Old 01-30-2019, 03:20 PM   #28
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 464

pepe84, did you provide a path to the file? Please post the exact command you ran, followed by the error message.
Old 01-31-2019, 05:26 AM   #29
pepe84
Junior Member
 
Location: Canada

Join Date: Jul 2014
Posts: 4

Here is the command:
java -cp C:\BBMap\current\jgi.ReformatReads in=C:\BBMap\resources\SRRXXXXX.fastq out1=EFB_R1.fq out2=EFB_R2.fq

And here is the error:
Error: Could not find or load main class in=C:\BBMap\resources\SRRXXXXX.fastq

FYI, I am using the command line on Windows.

Thanks, I appreciate any help.


Quote:
Originally Posted by SNPsaurus View Post
pepe84, did you provide a path to the file? Please post the exact command you ran, followed by the error message.
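
On the error in post #29: the `java` launcher treats the token after `-cp` as the class path and the next non-option token as the main class. The Python sketch below is a deliberately simplified model of that rule (it knows only `-cp`/`-classpath` and single-token options, and `split_java_command` is a made-up name), but it shows why `in=...` ended up being reported as the main class.

```python
from typing import List, Optional, Tuple


def split_java_command(argv: List[str]) -> Tuple[Optional[str], Optional[str], List[str]]:
    """Return (classpath, main_class, program_args) for a `java ...` command.

    Simplified model: only -cp/-classpath take a value; any other token
    starting with '-' is treated as a single-token launcher option.
    """
    classpath = None
    i = 1  # skip the 'java' executable itself
    while i < len(argv) and argv[i].startswith("-"):
        if argv[i] in ("-cp", "-classpath"):
            if i + 1 >= len(argv):
                raise ValueError("-cp requires a value")
            classpath = argv[i + 1]
            i += 2
        else:
            i += 1
    main_class = argv[i] if i < len(argv) else None
    return classpath, main_class, argv[i + 1:]


# The command as posted: class path and class name are fused into one token.
posted = ["java", "-cp", r"C:\BBMap\current\jgi.ReformatReads",
          r"in=C:\BBMap\resources\SRRXXXXX.fastq", "out1=EFB_R1.fq", "out2=EFB_R2.fq"]

# With a space between the class path and the class name (assuming the
# compiled classes live under C:\BBMap\current, as the posted path suggests).
separated = ["java", "-cp", r"C:\BBMap\current", "jgi.ReformatReads",
             r"in=C:\BBMap\resources\SRRXXXXX.fastq", "out1=EFB_R1.fq", "out2=EFB_R2.fq"]
```

In the posted form, the model returns `in=C:\BBMap\resources\SRRXXXXX.fastq` as the main class, which matches the error message. In the separated form, `jgi.ReformatReads` is the main class and the `in=`/`out1=`/`out2=` tokens are passed to the program.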
Old 02-14-2019, 12:14 AM   #30
tolot27
Junior Member
 
Location: Germany

Join Date: Jan 2012
Posts: 3
Deinterleave with singletons

Hi!

I have an interleaved FASTQ file containing unmapped reads produced by segemehl -u. I want to deinterleave it into the two mate files and save the singletons to a separate file.

Currently, reformat.sh cannot deal with it, even if I pass outsingle= as a parameter. The headers contain the mate information (e.g. 2:N:0:2).

Is there some way to at least extract the paired reads without singletons in between?

--
Kind regards,
Mathias
Old 02-14-2019, 06:50 AM   #31
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,845

You could use `repair.sh` to separate the singletons out afterwards.
Old 02-15-2019, 12:45 AM   #32
tolot27
Junior Member
 
Location: Germany

Join Date: Jan 2012
Posts: 3

Quote:
Originally Posted by GenoMax View Post
You could use `repair.sh` to separate the singletons out afterwards.
Thanks for pointing me in this direction. Unfortunately, repair.sh did not produce well-ordered files. Fortunately, bbsplitpairs.sh could be used instead of the reformat.sh/repair.sh combination: it extracted the correctly paired reads and wrote the singletons to a separate file.
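
As an illustration of what separating pairs from singletons in an interleaved file involves, here is a minimal Python sketch. It is not how bbsplitpairs.sh or repair.sh work internally. It assumes that mates are adjacent in the input and that headers carry the mate number either as a `/1` or `/2` suffix or as a comment starting with `1:` or `2:` (as in `2:N:0:2`); the names `mate_key` and `split_pairs` are made up for this example.

```python
import re
from typing import Iterable, List, Tuple

Read = Tuple[str, str, str]  # (header without '@', sequence, quality)


def mate_key(header: str) -> Tuple[str, str]:
    """Return (base name, mate number) for a read header; mate is '' if unknown."""
    name, _, comment = header.partition(" ")
    m = re.match(r"^(.*)/([12])$", name)
    if m:
        return m.group(1), m.group(2)
    m = re.match(r"^([12]):", comment)
    return name, (m.group(1) if m else "")


def split_pairs(reads: Iterable[Read]) -> Tuple[List[Read], List[Read], List[Read]]:
    """Split interleaved reads into (mate 1 list, mate 2 list, singletons)."""
    r1: List[Read] = []
    r2: List[Read] = []
    singles: List[Read] = []
    pending = None  # read still waiting for its mate
    for read in reads:
        if pending is None:
            pending = read
            continue
        base_a, mate_a = mate_key(pending[0])
        base_b, mate_b = mate_key(read[0])
        if base_a == base_b and (mate_a, mate_b) == ("1", "2"):
            r1.append(pending)
            r2.append(read)
            pending = None
        else:
            # The waiting read has no mate next to it, so it is a singleton.
            singles.append(pending)
            pending = read
    if pending is not None:
        singles.append(pending)
    return r1, r2, singles
```

The two mate lists stay in the same order, so writing them out record by record gives two synchronized files plus a third list of singletons.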

Tags
ascii33, ascii64, bbduk, bbmap, bbtools, fasta, fastq, interleavei33, quality trim, reformat, scarf, subsample

All times are GMT -8.

