SEQanswers

Old 07-25-2012, 08:16 AM   #1
pbrand
Member
 
Location: Bochum, Germany

Join Date: Feb 2012
Posts: 13
SOAPdenovo-Trans alternative splicing

Hi,
I am testing several assemblers to find the best one for my RNA-Seq data.
Besides Trinity and Oases, I used SOAPdenovo-Trans.

While Oases found a large number of sequences with possible alternative splice products, SOAPdenovo-Trans did not find a single one. I used 12 different k-mer sizes from 19 to 89, with e = 1, 3, 5 and d = 1, 3, 5 in all combinations. I allowed up to 10 alternative splicing products.
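
For reference, a sweep like the one described (12 k-mer sizes x 3 e values x 3 d values = 108 runs) could be enumerated with a small script. This is only a sketch: the k-mer list is a placeholder because the exact 12 values are not given, the output prefix is made up, and it assumes that "e" and "d" are passed as -e and -d and that the 10-variant limit is set with -t. The binary names are the two mentioned in this thread.

```python
from itertools import product


def build_commands(kmers, e_values, d_values, config="config", max_variants=10):
    """Return one SOAPdenovo-Trans command line per (K, e, d) combination."""
    commands = []
    for k, e, d in product(kmers, e_values, d_values):
        # Two binaries are mentioned in the thread: one for K <= 31, one 127mer build.
        binary = "SOAPdenovo-Trans-31kmer" if k <= 31 else "SOAPdenovo-Trans-127mer"
        prefix = f"asm_K{k}_e{e}_d{d}"  # hypothetical output prefix
        commands.append(
            f"{binary} all -s {config} -K {k} -e {e} -d {d} "
            f"-t {max_variants} -o {prefix}"
        )
    return commands


# Placeholder k-mer list; the 12 values actually used are not stated in the post.
example_kmers = [19, 31, 89]
cmds = build_commands(example_kmers, [1, 3, 5], [1, 3, 5])
print(len(cmds))  # 3 k-mers x 3 e x 3 d
print(cmds[0])
```

Writing the parameters into the output prefix keeps the results of each run apart, which makes it easier to compare them afterwards.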

Is this behavior normal for this program?

Cheers,
Philipp
Old 08-06-2012, 02:54 PM   #2
Kate.W
Member
 
Location: Canada

Join Date: Aug 2012
Posts: 10

Hello,

I have also noticed some odd behaviour with SOAPdenovo-Trans, though I can't answer your question. How were you able to try k-mer sizes from 19 to 89? I believed SOAPdenovo-Trans was limited to 31.

Cheers,

K8
Old 08-06-2012, 03:30 PM   #3
Kate.W
Member
 
Location: Canada

Join Date: Aug 2012
Posts: 10

Oops, I hadn't seen the SOAPdenovo-Trans-127mer file.
Old 10-09-2012, 01:48 AM   #4
Jeremy
Senior Member
 
Location: Pathum Thani, Thailand

Join Date: Nov 2009
Posts: 190

Which file are the splice variants supposed to be in? I don't see any file in my output that would contain that information.

This thread claims that the trans and regular SOAP are giving the same output.

I tried both the 31kmer version and the 127mer version, and in neither case do I get the sequences of all the contigs. The .readOnContig and .cnt2Read files show that every contig has reads, but the .contig file is missing the sequence data for many contigs, including several with a high read count.
Old 10-09-2012, 03:43 AM   #5
pbrand
Member
 
Location: Bochum, Germany

Join Date: Feb 2012
Posts: 13

Hi Jeremy,
the variants should be saved to the .scafSeq file, but with my data I didn't manage to get any. Maybe that is because I used single-end reads.
What kind of reads do you have?

Philipp
Old 10-09-2012, 06:12 PM   #6
Jeremy
Senior Member
 
Location: Pathum Thani, Thailand

Join Date: Nov 2009
Posts: 190

I have paired-end reads. Ah yes, I see them; it looks like I got a few splice variants. The locus numbering is not consecutive. Is it the same for you?
Code:
>scaffold1 Locus_0_0 5 891 COMPLEX
153823     0          -   175 
177411     154        -   246 
169783     410        +   212 
125249     614        +   133 
122882     760        +   131 
>scaffold2 Locus_0_1 2 478 COMPLEX
153823     0          -   175 
171731     260        +   218 
>scaffold3 Locus_0_2 4 783 COMPLEX
169195     0          +   210 
169783     302        +   212 
125249     506        +   133 
122882     652        +   131 
>scaffold4 Locus_1_0 2 406 LINEAR
122884     0          +   131 
154865     260        -   177 
>scaffold5 Locus_4_0 3 698 LINEAR
174798     0          +   230 
180285     272        -   274 
122890     598        +   131 
>scaffold6 Locus_5_0 3 490 LINEAR
158619     0          -   184 
164579     190        +   197 
122892     390        +   131 
>scaffold7 Locus_6_0 2 354 LINEAR
125953     0          +   134 
122894     254        +   131 
>scaffold8 Locus_8_0 2 428 LINEAR
122898     0          +   131 
168645     251        +   208
Is there some site or blog that gives all the details of the output files?
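
In the absence of documentation, the listing above can at least be summarised programmatically. The sketch below rests on an interpretation inferred only from this listing: that `Locus_X_Y` means locus X, variant Y, and that the number after it is the count of contig rows that follow (it matches in every record shown). Neither is confirmed by any official description. The code groups scaffold headers by locus, reports loci with more than one scaffold, and lists the locus numbers that are skipped.

```python
import re
from collections import defaultdict

# Header layout assumed from the listing: >name Locus_X_Y n_contigs length TYPE
HEADER = re.compile(r"^>(\S+)\s+Locus_(\d+)_(\d+)\s+(\d+)\s+(\d+)\s+(\S+)")


def group_by_locus(lines):
    """Map locus number -> list of (scaffold name, variant index, type)."""
    loci = defaultdict(list)
    for line in lines:
        m = HEADER.match(line)
        if m:  # contig rows and anything else are skipped
            name, locus, variant, _n_contigs, _length, kind = m.groups()
            loci[int(locus)].append((name, int(variant), kind))
    return dict(loci)


# Headers copied from the listing above (contig rows omitted for brevity).
sample = """\
>scaffold1 Locus_0_0 5 891 COMPLEX
>scaffold2 Locus_0_1 2 478 COMPLEX
>scaffold3 Locus_0_2 4 783 COMPLEX
>scaffold4 Locus_1_0 2 406 LINEAR
>scaffold5 Locus_4_0 3 698 LINEAR
""".splitlines()

loci = group_by_locus(sample)
multi = {k: v for k, v in loci.items() if len(v) > 1}
missing = sorted(set(range(max(loci) + 1)) - set(loci))
print(sorted(multi))  # loci with more than one scaffold
print(missing)        # locus numbers absent from the listing
```

On these five headers only locus 0 has several scaffolds, and locus numbers 2 and 3 are skipped, which matches the non-consecutive numbering noted above.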
Old 10-09-2012, 11:34 PM   #7
pbrand
Member
 
Location: Bochum, Germany

Join Date: Feb 2012
Posts: 13

As I said, I haven't managed to get any splice variants, so I can't say anything about that.
I also haven't found a site with useful information on the output files yet.
But there is an option that controls the number of splice variants: -t, which is 5 by default. Your results may change if you increase the -t value.

Could you post your configuration file? I am curious to see whether I made a mistake writing mine.

Philipp
Old 10-10-2012, 12:51 AM   #8
Jeremy
Senior Member
 
Location: Pathum Thani, Thailand

Join Date: Nov 2009
Posts: 190

Even without splice variants, you should still get locus information in the .scaff file, right? I only just started playing with the program, so for the moment almost everything is on default. I did change the -G option, though, since I noticed that my insert sizes have a wider spread than 50.

config
Code:
max_rd_len=150
[LIB]
avg_ins=320
asm_flags=3
reverse_seq=0
rank=1
q1=*file.fastq
q2=*file.fastq
commands
Code:
SOAPdenovo-Trans-31kmer all -s config -K 31 -G 100 -o *file
I think I just figured out how the contigs work. For something as strange as this, they really should provide some description of the output.

The .newcontigindex lists all contigs in consecutive order (no missing numbers). Both the .readoncontig and .cnt2read files show that reads were used to make all contigs, BUT the .contig file only has about half the contigs. The .newcontigindex has a 2 for contigs that I do get a sequence for and a 0 for contigs that are not in the output file. I think contigs with a 0 are assembled using the reverse reads, and the reverse complement is then integrated into the forward contigs.

But the confusing part for me was that not all of my contigs had a reverse complement. This information is in .contigindex, which lists how many reverse complements each of the forward contigs has. I have 49 forward contigs without a reverse complement, which makes the contig numbering in the .contig file appear random. I can't find the file that lists which contig is the reverse complement of which. Based on read count per contig, it often looks to be consecutively numbered contigs, but not always. Sigh.
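
A quick way to see which contig numbers are absent from a .contig file is to collect the IDs from its FASTA headers and list the gaps. This is only a sketch: it assumes that each header starts with an integer contig ID as its first token, which is not confirmed by any description in this thread, and the sample headers below are made up for illustration.

```python
def fasta_ids(lines):
    """Collect integer IDs from FASTA headers (first token after '>')."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            # Assumption: the contig ID is the first token and is a plain integer.
            if tokens and tokens[0].isdigit():
                ids.append(int(tokens[0]))
    return ids


def missing_ids(ids):
    """Return the numbers between min(ids) and max(ids) that do not occur."""
    if not ids:
        return []
    present = set(ids)
    return [i for i in range(min(ids), max(ids) + 1) if i not in present]


# Made-up headers for illustration only; real header text may differ.
example = [
    ">1 length 175",
    "ACGTACGT",
    ">3 length 246",
    "TTGACCAA",
    ">7 length 131",
    "GGCATTAC",
]

ids = fasta_ids(example)
print(ids)
print(missing_ids(ids))
```

For a real run, the lines of the .contig file would be passed to `fasta_ids` instead of the example list. Comparing the missing numbers with the entries marked 0 in .newcontigindex would then test the reverse-complement guess above.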

Would be nice to know what the headers are for the .links file too ...

I think I'll just try a few other programs; I have no idea what exactly this one did.

Last edited by Jeremy; 10-10-2012 at 12:53 AM.
Old 10-10-2012, 01:15 AM   #9
pbrand
Member
 
Location: Bochum, Germany

Join Date: Feb 2012
Posts: 13

Strangely, I do not have any entries in .ctg2read, .readONcontig and .links.
It must have something to do with single-end versus paired-end libraries, because my config file doesn't seem to be incomplete.

I also worked with Trinity and Oases and both did a better job than SOAP, anyway.
Maybe this thread helps http://seqanswers.com/forums/showthread.php?t=17959

Cheers

All times are GMT -8.