SEQanswers

Old 10-06-2016, 10:06 AM   #1
zillur
Senior Member
 
Location: Puerto Rico

Join Date: Sep 2014
Posts: 106
Sorting fasta file according to header

Hi there,
I have a fasta file like this:
Code:
[zillur@genomics filter]$ head new_12.fasta 
>000000M00365:7:000000000-A48JK:1:1110:10044:9619
TACGGAGGGTGCAAGCGTTATCCGGAATCACTGGGTTTAAAGGGTGCGTAGGCGGATATATAAGTCAGAGGTGAAAGCTCGCAGCTTAACTGCGGAATTGCCTTTGATACTGTTTATCTTGAATTATGTTGAGGTTAGCGGAATGAGTCAT
>000000M00365:7:000000000-A48JK:1:2105:14983:8496
TACGGAGGGGGTTAGCGTTGTTCGGAATTACTGGGCGTAAAGCGTACGTAGGCGGATTGGAAAGTATGGGGTGAAATCCCAGGGCTCAACCCTGGAACTGCCCTGTAAACTATCAGTCTAGAGTTCTGGAGAGGTGAGTGGAATTGCTAGG
>000000M00365:7:000000000-A48JK:1:2113:12381:28279
TACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGCGGTTTGATAAGTCAGATGTGAAATCCCCGGGCTTAACCTGGGAACTGCATTTGATACTGTCAGACTAGAGTATGTTAGAGGAATGCGGAATTCCGGGT
>000001M00365:7:000000000-A48JK:1:1110:15899:9619
TACGAACTGTGCAAACGTTATTCGGAATCACTGGGCTTAAAGGGTGCGTAGGCGGGTTTGTAAGTCAGAGGTGAAAGTTTGCAGCTTAACTGTAAAATTGCCTTTGAAACTGTAGAACTTGAGTAGCGTTGAGGTCAGCGGAATGTGACAT
>000001M00365:7:000000000-A48JK:1:2105:15157:8497
TACGAAGGTCCCAAGCGTTATTCGGAATCACTGGGCGTAAAGGGAGCGTAGGCGGCGTGGAAAGTCAGATGTGAAATCTCAAGGCTCAACCTTGAAACTGCATCCGATACTTCCATGCTAGAGGACTGGAGAGGTGTTTGGAATTATCGGT
I want to sort this file according to the header information. How can I do this?

Best Regards
Zillur
Old 10-06-2016, 11:26 AM   #2
wdecoster
Member
 
Location: Antwerp, Belgium

Join Date: Oct 2015
Posts: 97

Can you be more specific about which header information? Alphabetical sorting?
Old 10-06-2016, 11:39 AM   #3
zillur
Senior Member
 
Location: Puerto Rico

Join Date: Sep 2014
Posts: 106

Thank you very much. Alphabetically or numerically, whichever is more convenient.

Best Regards
Zillur
Old 10-06-2016, 11:55 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,975

And the reason you want to do this, if I may ask?
Old 10-06-2016, 12:00 PM   #5
zillur
Senior Member
 
Location: Puerto Rico

Join Date: Sep 2014
Posts: 106

Thanks.
Quote:
And the reason you want to do this, if I may ask?
Yeah, sure. I wanted to create a fastq file from my .qual and fasta files using QIIME, but it gave me:
Code:
KeyError: 'QUAL header (M00365:7:000000000-A48JK:1:1101:14885:1320) does not match FASTA header (M00365:7:000000000-A48JK:1:1101:16466:1388)
In my qual file I have many other sequences including my fasta. So, I think sorting may resolve the issue. I appreciate your suggestions.

Best Regards
Zillur
Old 10-06-2016, 09:29 PM   #6
Persistent LABS
Member
 
Location: Pune, India

Join Date: Apr 2016
Posts: 20

I guess sort on Linux will work, as long as every record is exactly two lines (one header line, one sequence line):
Code:
cat file.fasta | paste - - | sort | sed 's/\t/\n/g'
Try this.
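If some sequences are wrapped over several lines, `paste - -` will pair up the wrong lines. A minimal sketch that joins each record first is below. The function name `sort_fasta` is made up for illustration, and it assumes headers contain no tab characters. The output has each sequence on a single line, so the original line wrapping is not preserved.

```shell
# sort_fasta (hypothetical helper): sort FASTA records by header line.
# Reads FASTA on stdin, writes sorted FASTA on stdout.
sort_fasta() {
  # 1) Join each record onto one line: header <TAB> sequence.
  awk '
    /^>/ { if (hdr != "") print hdr "\t" seq; hdr = $0; seq = ""; next }
         { seq = seq $0 }
    END  { if (hdr != "") print hdr "\t" seq }
  ' |
  # 2) Sort on the header column only; LC_ALL=C gives plain byte order.
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 |
  # 3) Put header and sequence back on separate lines.
  awk -F '\t' '{ print $1; print $2 }'
}
```

Usage would be something like `sort_fasta < file.fasta > file.sorted.fasta`.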
__________________
Persistent LABS
Old 10-07-2016, 03:22 AM   #7
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,975

The following is untested, but you could give it a try and see if it works. It may avoid the need to sort at all. You will find reformat.sh in the BBMap suite.

Code:
reformat.sh in=your_fasta_file.fa qfin=your_qual_file.qual out=fastq_format_file.fq
Old 10-07-2016, 11:56 AM   #8
zillur
Senior Member
 
Location: Puerto Rico

Join Date: Sep 2014
Posts: 106

Thank you very much. I have tried this:
Quote:
cat file.fasta|paste - -|sort|sed 's/\t/\n/g'
But it doesn't resolve everything:
Code:
(qiime191) [zillur@genomics final]$ head new_sorted_1.fasta 
>M00365:7:000000000-A48JK:1:1101:10000:14343
TACGGAGGGGGCTAGCGTTGTTCGGAATTACTGGGCGTAAAGCGTGCGTAGGCGGATTATTAAGTTAGGGGTGAAATCCCGAGGCTCAACCTCGGAACTGCCCTTAAAACTGTTGGTCTTGAGTTCTGGAGAGGTGAGTGGAATTGCTAGT
>M00365:7:000000000-A48JK:1:1101:10000:18084
TACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGCGGCTAGGTCAGTCAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATTTGATACTGCCTAGCTAGAGTATGTTAGAGGAATGCGGAATTCCAGGT
>M00365:7:000000000-A48JK:1:1101:10000:25105
TACGAAGGGGGCTAGCGTTGTTCGGAATTACTGGGCGTAAAGAGTTCGTAGGCGGGTTATTAAGTCAGATGTGAAATCCCAGGGCTCAACCTTGGAACTGCATTTGAAACTGGTAACCTAGAGACTAGGAGAGGTCAGTGGAATACCGAGT
>M00365:7:000000000-A48JK:1:1101:10000:5055
CACGTAGGGGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGGGCTCGTAGGCTGTTCAGTAAGTCAGGTGTGAAAATCCAAGGCTCAACCTTGGGACGCCACCTGATACCGCTGTGACTAGAGTCCGGTAGAGGAGATTGGAATTCCTGG
>M00365:7:000000000-A48JK:1:1101:10001:16084
TACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGCGGCTAGGTCAGTCAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATTTGATACTGCCTAGCTAGAGTATGTTAGAGGATTGCGGAATTCCAGGT
reformat.sh gives me:
Code:
[zillur@genomics final]$ ./bbmap/reformat.sh in=new_15.fasta qfin=qual_.1.qual out=f_nw_15_ql_.1.fq
java -ea -Xmx111g -cp /home/zillur/Desktop/zillur/yadira/study_1799_split_library_seqs_and_mapping/filter/final/bbmap/current/ jgi.ReformatReads in=new_15.fasta qfin=qual_.1.qual out=f_nw_15_ql_.1.fq
Executing jgi.ReformatReads [in=new_15.fasta, qfin=qual_.1.qual, out=f_nw_15_ql_.1.fq]

Input is being processed as unpaired
Exception in thread "Thread-1" java.lang.AssertionError: Quality and Base headers differ for read 0
	at stream.FastaQualReadInputStream.toReadList(FastaQualReadInputStream.java:128)
	at stream.FastaQualReadInputStream.toReads(FastaQualReadInputStream.java:110)
	at stream.FastaQualReadInputStream.fillBuffer(FastaQualReadInputStream.java:94)
	at stream.FastaQualReadInputStream.hasMore(FastaQualReadInputStream.java:54)
	at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:643)
	at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:635)
What should I do now?

Best Regards
Zillur
Old 10-07-2016, 12:27 PM   #9
khericlim
Junior Member
 
Location: Boston, MA

Join Date: Oct 2016
Posts: 1

When you sorted the fasta file, did you also sort the qual file?

Quote:
Originally Posted by zillur View Post
In my qual file I have many other sequences including my fasta.
What do you mean by having other sequences in your qual file?
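If the qual file really does hold records that are not in the fasta, sorting both files would still leave them out of step. One possible way around that, sketched here as an assumption and not something tested on the poster's data, is to pull out only the qual records whose headers appear in the fasta, in fasta order. `match_qual` is a made-up name, it expects header lines to be identical in both files, and it keeps the whole qual file in memory.

```shell
# match_qual FASTA QUAL (hypothetical helper, not part of QIIME or BBMap):
# print the QUAL records whose headers occur in FASTA, in FASTA order.
match_qual() {
  awk '
    # First file given to awk (the QUAL): store each record under its header.
    NR == FNR {
      if (/^>/) { id = $0; qual[id] = "" }
      else if (id != "") { qual[id] = qual[id] (qual[id] == "" ? "" : " ") $0 }
      next
    }
    # Second file (the FASTA): for every header, emit the stored QUAL record.
    /^>/ {
      if ($0 in qual) { print $0; print qual[$0] }
      else { print "missing in qual: " $0 > "/dev/stderr" }
    }
  ' "$2" "$1"
}
```

Something like `match_qual new_15.fasta qual_.1.qual > matched.qual` would then give a qual file lined up with the fasta. Any fasta header with no qual record is reported on stderr.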
Old 10-07-2016, 02:05 PM   #10
dgscofield
Member
 
Location: Uppsala, Sweden

Join Date: Nov 2010
Posts: 27

If you have BioPerl ≥ 1.6.922 and Sort::Naturally, then

https://github.com/douglasgscofield/...ipts/fastaSort

shows how to sort on sequence name, using natural sort as it seems you require.
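For anyone without BioPerl, a rough shell equivalent may be enough for headers like the ones in this thread. This is only a sketch: it assumes every header has seven colon-separated fields with lane, tile, x and y in fields 4 to 7, that each record is exactly two lines, and that there are no tabs or colons elsewhere on the line. `natural_sort_fasta` is a made-up name.

```shell
# natural_sort_fasta (hypothetical helper): sort two-line FASTA records so
# that the numeric header fields compare as numbers, e.g. 5055 before 14343.
natural_sort_fasta() {
  # Pair each header with its sequence on one tab-separated line.
  paste - - |
  # Fields 1-3 compare as text; lane, tile, x and y (fields 4-7) as numbers.
  LC_ALL=C sort -t: -k1,3 -k4,4n -k5,5n -k6,6n -k7,7n |
  # Put header and sequence back on separate lines.
  tr '\t' '\n'
}
```

With plain alphabetical sorting `:10000:14343` comes before `:10000:5055`; with the numeric keys above the order is reversed.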
Old 10-07-2016, 11:53 PM   #11
Persistent LABS
Member
 
Location: Pune, India

Join Date: Apr 2016
Posts: 20

Quote:
Originally Posted by zillur View Post
Thank you very much. I have tried this: But it doesn't resolve everything:
The sort example has sorted your data alphabetically. If you try to sort your qual file, I think you will get the same order of headers.
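Before running the conversion again, it may help to check whether the two files really list the same headers in the same order. The following is a small sketch for that. `first_header_mismatch` is a made-up name, and it compares whole header lines, so any difference in the text after `>` counts as a mismatch.

```shell
# first_header_mismatch FASTA QUAL (hypothetical helper): report the first
# record at which the header lines differ. Prints nothing and returns 0 when
# both files have the same headers in the same order; returns 1 otherwise.
first_header_mismatch() {
  awk '
    FNR == 1 { file++ }                  # count which input file we are in
    !/^>/    { next }                    # only header lines matter here
    file == 1 { fa[++n] = $0; next }     # remember FASTA headers in order
    {
      m++
      if (fa[m] != $0) {
        printf "record %d: fasta=%s qual=%s\n", m, fa[m], $0
        bad = 1
        exit
      }
    }
    END {
      if (!bad && m != n) {
        print "header counts differ: fasta=" n " qual=" m
        bad = 1
      }
      exit bad
    }
  ' "$1" "$2"
}
```

For example, `first_header_mismatch new_sorted_1.fasta sorted.qual` (file names here are only placeholders) would show where the first disagreement is, if there is one.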
__________________
Persistent LABS