SEQanswers

03-20-2018, 01:17 AM   #21
DrYak
Member
 
Location: South Africa

Join Date: Sep 2013
Posts: 12

Hi,

Can I use reformat.sh or any other BBTools script to split my FASTA file into sub-files?

e.g. X.fa (100 sequences) -> X01.fa, X02.fa, ..., X10.fa (10 sequences each)?

I don't mind whether I have to set the number of sequences per file or the total number of files, and the order of the sequences doesn't matter, as long as no sequence is duplicated.

Cheers,
Dave
03-20-2018, 03:58 AM   #22
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,794

faSplit from Jim Kent's utilities is a much better option for splitting FASTA files.

Run faSplit without arguments to see its inline help and the many options available.
03-20-2018, 09:07 AM   #23
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Reformat won't do that, but you can use partition.sh:

Code:
partition.sh in=X.fa out=X%.fa ways=10

That will produce 10 output files (the % in the output name is replaced by the partition number), each with an equal number of sequences and no duplication.
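For readers without BBTools or faSplit at hand, the same round-robin idea can be sketched in plain awk. This is an independent sketch, not a reimplementation of partition.sh and no claim about how partition.sh works internally; `split_fasta` is a hypothetical shell helper, and it assumes a well-formed FASTA file in which every record starts with a `>` header line. Each record goes to exactly one output file, so nothing is duplicated, and the per-file counts differ by at most one.

```shell
#!/bin/sh
# split_fasta IN N PREFIX  (hypothetical helper, not a BBTools command)
# Deals FASTA records round-robin into N files named PREFIX01.fa, PREFIX02.fa, ...
# Every record is written to exactly one file, so no sequence is duplicated.
split_fasta() {
    awk -v n="$2" -v p="$3" '
        # A header line starts a new record: choose the next output file in turn.
        /^>/ { out = sprintf("%s%02d.fa", p, rec++ % n + 1) }
        # Header and sequence lines are appended to the current record file;
        # anything before the first header is skipped.
        out != "" { print > out }
    ' "$1"
}
```

Usage: `split_fasta X.fa 10 X` writes X01.fa through X10.fa, matching the naming in the question. awk keeps all N output files open at once, so for a very large N some awk implementations may run out of file handles.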
08-01-2018, 05:47 PM   #24
sunnycqcn
Member
 
Location: Canada

Join Date: Apr 2013
Posts: 17

Hi Brian Bushnell,
When I used mapPacBio.sh to map PacBio reads, I got the following error:
Exception in thread "Thread-23" java.lang.AssertionError: Read 20, length 10550, exceeds the limit of 6019
You can map the reads in chunks by reformatting to fasta, then mapping with the setting 'fastareadlen=6019'
at align2.AbstractMapThread.run(AbstractMapThread.java:480)

But I could not find out how to reformat the reads.
Could you help me figure out this issue?
Thanks,
Fuyou
08-02-2018, 05:08 AM   #25
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,794

You can use
Code:
reformat.sh in=your_file.fastq out=newfile.fa
to convert the reads to fasta format.

That said, I think mapPacBio.sh should automatically split reads longer than ~6 kb when mapping. Is that not working?
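If the automatic splitting does not kick in, one workaround in the spirit of the error message is to cut the long reads into pieces no longer than the limit yourself before mapping. The sketch below does that in plain awk; `chunk_fasta` is a hypothetical helper (not part of BBTools), it assumes FASTA input such as the output of the reformat.sh command above, keeps only the first word of each header, and names the pieces `<id>_1`, `<id>_2`, and so on. Each piece is then mapped independently, so any per-read conclusions have to be reassembled from the piece names afterwards.

```shell
#!/bin/sh
# chunk_fasta IN MAXLEN  (hypothetical helper, not a BBTools command)
# Splits every FASTA record into consecutive pieces of at most MAXLEN bases,
# written as single-line records named <id>_1, <id>_2, ...
# MAXLEN is assumed to be a positive integer.
chunk_fasta() {
    awk -v max="$2" '
        # Emit the buffered record as pieces of at most max bases.
        function flush(    i, k) {
            if (name == "") return
            k = 0
            for (i = 1; i <= length(seq); i += max)
                printf(">%s_%d\n%s\n", name, ++k, substr(seq, i, max))
        }
        # New header: flush the previous record, keep the first word as the id.
        /^>/ { flush(); name = substr($1, 2); seq = ""; next }
        # Sequence line: append to the buffer (handles wrapped FASTA).
        { seq = seq $0 }
        END { flush() }
    ' "$1"
}
```

Usage: `chunk_fasta newfile.fa 6019 > chunks.fa`, where 6019 is the limit reported in the error message above; a 10550-base read such as "Read 20" becomes two pieces of 6019 and 4531 bases.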
08-02-2018, 05:29 AM   #26
sunnycqcn
Member
 
Location: Canada

Join Date: Apr 2013
Posts: 17

Quote:
Originally Posted by GenoMax View Post
You can use
Code:
reformat.sh in=your_file.fastq out=newfile.fa
to convert the reads to fasta format.

That said I think mapPacBio.sh should automatically split reads longer than 6k when it does mapping. Is that not working?
It is not working. My reads were already in FASTA format.
Thanks,
Fuyou

Tags
ascii33, ascii64, bbduk, bbmap, bbtools, fasta, fastq, interleavei33, quality trim, reformat, scarf, subsample

All times are GMT -8.