SEQanswers




11-11-2015, 12:59 PM   #1
salamay
Member
 
Location: canada

Join Date: May 2014
Posts: 20
Paired-read analysis in MEGAN

I am attempting to use BLASTN output from paired-end files for MEGAN analysis. However, selecting the paired-read checkbox in the "Import from BLAST" window does not seem to work. I have tried renaming the files to follow an R1/R2 suffix scheme, but nothing seems to work. Has anyone been able to get this to work? Also, is there any way to do this through the MEGAN command line?

I am using MEGAN 5.1, but I also have MEGAN 5.0.3, and both behave the same way.

Thanks!
02-02-2016, 09:39 AM   #2
A_sapidissima
Member
 
Location: West Virginia, Eastern Panhandle

Join Date: Apr 2014
Posts: 11

Did you ever figure this out? I am having the same problem and it's driving me nuts. The first issue I ran into was that the read names had spaces in them. When I imported the reads and BLAST files, that led to an error saying the read names were not unique. I used awk to get rid of the spaces in the read names (replacing them with underscores).
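The poster's awk command isn't shown. As a rough equivalent, here is a minimal Python sketch that rewrites only the FASTA header lines, joining whitespace-separated tokens with underscores so the full Illumina ID survives. The function name sanitize_fasta_headers is hypothetical, and sequence lines are passed through unchanged.

```python
def sanitize_fasta_headers(lines):
    """Replace whitespace in FASTA header lines with underscores.

    Sequence lines are returned unchanged (minus the trailing newline).
    """
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # ">ID 1:N:0:6/1" -> ">ID_1:N:0:6/1": every whitespace run becomes "_"
            out.append("_".join(line.split()))
        else:
            out.append(line)
    return out
```

To use it on a file, feed it the lines of an opened FASTA file and write each returned line to a new file, followed by a newline.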

Read 1 names now look like this (no spaces):
Code:
>NS500476:25:HLLCNBGXX:1:11101:7199:4320_1:N:0:6/1
GTCGGAAACGACCGGGTGCTCGGAGTGCCGGTTCTGGTCATCCTCGCCGCGATCTGTTGCATCGTACTGC
ATTACATGCTGTCGCAGACCCGTTTCGGCCAGCACACCTATGCCATGGGCGCCAGCAAGGCCGCCGCAAG
CCGCGCCGGCA
Read 2 names now look like this (no spaces):
Code:
>NS500476:25:HLLCNBGXX:1:11101:7199:4320_1:N:0:6/2
CCAGCGGAAAATCGGCCTGTGTACAGAACCCCCGCGATACCCGCAATGACGGCAGAGAGAATGTAGATCT
TCAGAGTCAGAATCTTTATGTAGAAGCCTGAGCGACTTTAGGAGTACTTGCTGGCGCCGATGGCATAGGT
GTGCTGCCAGA
Tabular BLAST output for R1 (no spaces in the read names):
Code:
NS500476:25:HLLCNBGXX:1:11101:7199:4320_1:N:0:6/1 gi|652682316|ref|WP_027031178.1| 67.6 37 12 0 1 111 187 223 1.6e-05 56.2
Tabular BLAST output for R2 (no spaces in the read names):
Code:
NS500476:25:HLLCNBGXX:1:11101:18020:4335_1:N:0:6/2 gi|938913364|ref|WP_054696212.1| 68.1 47 15 0 1 141 101 147 2.8e-10 72.0
The suffixes for read 1 and read 2 look to me like they should simply be '1' and '2', but that does not work. I have also tried '/1' and '/2', and '_1:N:0:6/1' and '_1:N:0:6/2', to no avail. No reads are ever mapped to the BLAST hits. What is the correct suffix to enter? Do the read names need to be in a different format, with a different suffix?
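The question is essentially which suffix, once removed, leaves identical names for the two mates. The Python sketch below illustrates that check. It is a hypothetical helper (pair_by_suffix), not MEGAN's actual pairing code: it strips the given suffix from every read 1 and read 2 name and keeps the pairs whose remaining base names match.

```python
def pair_by_suffix(r1_names, r2_names, suffix1, suffix2):
    """Return {base_name: (r1_name, r2_name)} for mates whose names match once the suffix is removed."""

    def bases_by_suffix(names, suffix):
        bases = {}
        for name in names:
            if name.endswith(suffix):
                # Slice off exactly the suffix; names without it are ignored.
                bases[name[: len(name) - len(suffix)]] = name
        return bases

    b1 = bases_by_suffix(r1_names, suffix1)
    b2 = bases_by_suffix(r2_names, suffix2)
    return {base: (b1[base], b2[base]) for base in b1.keys() & b2.keys()}
```

With the cleaned names shown above, suffixes '/1' and '/2' leave identical base names. If a check like this finds matching pairs but MEGAN still pairs nothing, the problem probably lies elsewhere in the input files.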
02-03-2016, 11:04 AM   #3
A_sapidissima
Member
 
Location: West Virginia, Eastern Panhandle

Join Date: Apr 2014
Posts: 11

Well, I emailed Daniel Huson, and he was super helpful with this. It turns out I had made an error in my tabular BLAST output file: it was space-delimited instead of tab-delimited, which caused the problem.

My original reads were named like this (read 1 of the first pair):
NS500476:25:HLLCNBGXX:1:11101:7199:4320 1:N:0:6/1

Read 2 is similar, except it has /2 at the end. The space between the 4320 and the 1 should not be there. In the tabular BLAST output, each read name ends where the space is. Likewise, the read names in the FASTA files are not unique, because once the part after the space is dropped, both members of a pair end up with the same name.
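To see why the space breaks pairing, here is a minimal Python sketch that mimics the truncation described above. It assumes, as the post describes, that the read ID is everything after '>' up to the first whitespace. The function names are hypothetical.

```python
from collections import Counter


def blast_query_id(fasta_header):
    """Approximate the read ID: text after '>' up to the first whitespace (assumed behavior)."""
    return fasta_header.lstrip(">").split()[0]


def duplicate_ids(headers):
    """Return the sorted IDs that occur more than once after truncation."""
    counts = Counter(blast_query_id(h) for h in headers)
    return sorted(name for name, n in counts.items() if n > 1)
```

Applied to the two original headers for the first pair, this reports the single collapsed ID NS500476:25:HLLCNBGXX:1:11101:7199:4320. Once the spaces are replaced with underscores, it reports nothing.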

Instead of re-running BLAST, I replaced the spaces in the read names in the FASTA files with underscores. Then, in each tabular BLAST file, I appended '_1:N:0:6/1' to the read names for read 1 and '_1:N:0:6/2' to the read names for read 2. This is where I screwed up and replaced the tabs with spaces in the tabular BLAST file. Make sure those stay as tabs.
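The step that went wrong can be scripted so the tabs are never touched. This is a minimal Python sketch, assuming the query ID is the first tab-separated column of the tabular BLAST output, as in the examples above. It splits each line on tabs, appends the suffix to that column only, and rejoins with tabs. The function name is hypothetical.

```python
def add_suffix_to_blast_tab(lines, suffix):
    """Append `suffix` to the query ID (first column) of tabular BLAST lines, keeping tab delimiters."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            out.append(line)  # pass blank and comment lines through untouched
            continue
        fields = line.split("\t")
        fields[0] += suffix  # only the query ID changes
        out.append("\t".join(fields))  # rejoin with tabs, never spaces
    return out
```

You would call it once on the read 1 file with '_1:N:0:6/1' and once on the read 2 file with '_1:N:0:6/2'.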

Now, when I import my reads and specify the suffix as '/1' for read 1 and '/2' for read 2, it works great. If you run into a problem, make sure your read names have no spaces in them, and if you have mucked around in your files, make sure you haven't messed up the formatting.
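Before importing into MEGAN, you can check for formatting mistakes like this one. The following is a minimal Python sketch that flags lines that are not tab-delimited with the expected column count, and query IDs that still contain a space. It assumes the standard 12-column tabular layout shown in the examples above. The helper name and the default of 12 are assumptions, not a MEGAN requirement.

```python
def check_blast_tab(lines, expected_fields=12):
    """Return (line_number, problem) tuples for lines that don't look like tab-delimited BLAST tabular output."""
    problems = []
    for i, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        if len(fields) != expected_fields:
            # A space-delimited line collapses to a single "field" here.
            problems.append((i, f"expected {expected_fields} tab-separated fields, found {len(fields)}"))
        elif " " in fields[0]:
            problems.append((i, "query ID contains a space"))
    return problems
```

An empty result means every line has the expected number of tab-separated columns and space-free read names.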
02-05-2016, 07:28 AM   #4
salamay
Member
 
Location: canada

Join Date: May 2014
Posts: 20

I never did end up solving this with MEGAN, so thanks for posting your solution! I ended up analyzing each member of the read pairs separately and then wrote my own script to combine the results: if the two members of a pair had different assignments, I used the assignment of the read with the lower e-value for the whole pair. I'm sure the way MEGAN does it is a bit more sophisticated.
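The script itself wasn't posted. Here is a minimal Python sketch of the rule as described: if both mates agree, keep that assignment; if they disagree, the mate with the lower e-value wins. Three details are assumptions not stated in the post: when only one mate has a hit, its assignment is used; ties go to read 1; and the Assignment type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Assignment:
    taxon: str
    evalue: float


def combine_pair(r1: Optional[Assignment], r2: Optional[Assignment]) -> Optional[str]:
    """Choose one taxon for a read pair using the lower-e-value rule."""
    if r1 is None and r2 is None:
        return None
    if r1 is None:
        return r2.taxon  # assumption: a single assigned mate decides the pair
    if r2 is None:
        return r1.taxon
    if r1.taxon == r2.taxon:
        return r1.taxon
    # Differing assignments: lower e-value wins (ties go to read 1, an assumption).
    return r1.taxon if r1.evalue <= r2.evalue else r2.taxon


def combine_all(r1_map, r2_map):
    """Combine per-mate results keyed by the shared pair name (read name minus suffix)."""
    return {base: combine_pair(r1_map.get(base), r2_map.get(base))
            for base in r1_map.keys() | r2_map.keys()}
```

Unlike this per-read shortcut, MEGAN can in principle weigh both mates' hits together, which is presumably what the remark about MEGAN being more sophisticated refers to.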
All times are GMT -8.