  • SOAPdenovo-trans alternative splicing

    Hi,
    I am working with several assemblers to find the best one for my RNA-Seq data.
    Besides Trinity and Oases, I used SOAPdenovo-Trans.

    While Oases found a large number of sequences with possible alternative splice products, SOAPdenovo-Trans did not find a single one. I tried 12 different k-mer sizes from 19 to 89, with e set to 1, 3, and 5 and d set to 1, 3, and 5, in all combinations. I allowed up to 10 alternative splicing products.

    Is this behavior normal for this program?

    Cheers,
    Philipp
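
A sweep like the one described above could be scripted roughly as follows. This is only a sketch that builds the command lines and does not run anything. It assumes the e and d values in the post map to the -e and -d options, that -t sets the maximum number of transcripts per locus, and that k-mer sizes above 31 need the 127mer binary. The k-mer list, the config file name, and the output prefix are placeholders, since the post does not list the exact 12 k-mer sizes.

```python
from itertools import product


def build_commands(config, prefix, kmers, edge_cutoffs, kmer_cutoffs, max_transcripts=10):
    """Return one SOAPdenovo-Trans argument list per (k, e, d) combination."""
    commands = []
    for k, e, d in product(kmers, edge_cutoffs, kmer_cutoffs):
        # Assumption: the 31kmer binary covers k <= 31, larger k needs the 127mer binary.
        binary = "SOAPdenovo-Trans-31kmer" if k <= 31 else "SOAPdenovo-Trans-127mer"
        # Give every run its own output prefix so results are not overwritten.
        out = f"{prefix}_K{k}_e{e}_d{d}"
        commands.append([
            binary, "all",
            "-s", config,
            "-K", str(k),
            "-e", str(e),
            "-d", str(d),
            "-t", str(max_transcripts),
            "-o", out,
        ])
    return commands


# Placeholder k-mer sizes (odd values between 19 and 89), not the poster's actual list.
kmers = [19, 25, 31, 41, 51, 61, 71, 89]
commands = build_commands("config", "asm", kmers, [1, 3, 5], [1, 3, 5])

for cmd in commands[:3]:
    print(" ".join(cmd))
```

Each argument list can be passed to `subprocess.run` once the options have been checked against the program's own help output.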

  • #2
    Hello,

    I have also noticed some odd behaviour with SOAPdenovo-Trans, but I can't answer your question. How could you try several k-mer sizes from 19 to 89, though? I believe SOAPdenovo-Trans is limited to 31.

    Cheers,

    K8


    • #3
      oops... didn't see the SOAPdenovo-Trans-127mer file...


      • #4
        Which file are the splice variants supposed to be in? I don't see any file in my output that would contain that information.

        This thread claims that the trans and regular SOAP are giving the same output.

        I tried both the 31kmer and the 127mer versions, and in neither case do I get the sequences of all the contigs. The .readOnContig and the .cnt2Read files show that all the contigs have reads, but the .contig file is missing the sequence data of many contigs, including several with a high read count.


        • #5
          Hi Jeremy,
          The variants should be saved to the .scafSeq file, but with my data I didn't manage to get any of them. Maybe that is because I used single-end reads.
          What kind of reads do you have?

          Philipp


          • #6
            I have paired-end reads. Ah yes, I see them; it looks like I got a few splice variants. The locus numbering is not consecutive. Is it the same for you?
            Code:
            >scaffold1 Locus_0_0 5 891 COMPLEX
            153823     0          -   175 
            177411     154        -   246 
            169783     410        +   212 
            125249     614        +   133 
            122882     760        +   131 
            >scaffold2 Locus_0_1 2 478 COMPLEX
            153823     0          -   175 
            171731     260        +   218 
            >scaffold3 Locus_0_2 4 783 COMPLEX
            169195     0          +   210 
            169783     302        +   212 
            125249     506        +   133 
            122882     652        +   131 
            >scaffold4 Locus_1_0 2 406 LINEAR
            122884     0          +   131 
            154865     260        -   177 
            >scaffold5 Locus_4_0 3 698 LINEAR
            174798     0          +   230 
            180285     272        -   274 
            122890     598        +   131 
            >scaffold6 Locus_5_0 3 490 LINEAR
            158619     0          -   184 
            164579     190        +   197 
            122892     390        +   131 
            >scaffold7 Locus_6_0 2 354 LINEAR
            125953     0          +   134 
            122894     254        +   131 
            >scaffold8 Locus_8_0 2 428 LINEAR
            122898     0          +   131 
            168645     251        +   208
            Is there some site or blog that gives all the details of the output files?
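
In the absence of documentation, the listing above can at least be summarised per locus with a short script. The sketch below is based only on what can be inferred from the excerpt: the header appears to be `>name Locus_<locus>_<variant> <contig count> <length?> <type>`, and each following line appears to hold a contig ID, an offset, a strand, and a contig length. Those field meanings are guesses, not documented facts. The contig count is the only field checked here, and it matches the number of lines under every header in the excerpt.

```python
import re
from collections import defaultdict

# Guessed header layout: >scaffold1 Locus_0_0 5 891 COMPLEX
HEADER = re.compile(r"^>(\S+)\s+Locus_(\d+)_(\d+)\s+(\d+)\s+(\d+)\s+(\S+)")


def parse_scaffolds(text):
    """Parse scaffold headers and the contig lines that follow each of them."""
    scaffolds = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            name, locus, variant, n_contigs, length, kind = match.groups()
            scaffolds.append({
                "name": name, "locus": int(locus), "variant": int(variant),
                "n_contigs": int(n_contigs), "length": int(length),
                "kind": kind, "contigs": [],
            })
        elif scaffolds:
            # Guessed columns: contig ID, offset, strand, contig length.
            contig_id, offset, strand, size = line.split()
            scaffolds[-1]["contigs"].append((int(contig_id), int(offset), strand, int(size)))
    return scaffolds


def variants_per_locus(scaffolds):
    """Group scaffold names by locus number."""
    by_locus = defaultdict(list)
    for scaffold in scaffolds:
        by_locus[scaffold["locus"]].append(scaffold["name"])
    return dict(by_locus)


sample = """\
>scaffold1 Locus_0_0 5 891 COMPLEX
153823     0          -   175
177411     154        -   246
169783     410        +   212
125249     614        +   133
122882     760        +   131
>scaffold2 Locus_0_1 2 478 COMPLEX
153823     0          -   175
171731     260        +   218
>scaffold3 Locus_0_2 4 783 COMPLEX
169195     0          +   210
169783     302        +   212
125249     506        +   133
122882     652        +   131
>scaffold4 Locus_1_0 2 406 LINEAR
122884     0          +   131
154865     260        -   177
"""

scaffolds = parse_scaffolds(sample)
for locus, names in variants_per_locus(scaffolds).items():
    print(locus, len(names), names)
```

On this excerpt, locus 0 comes out with three entries (scaffold1 to scaffold3), which would be the candidate splice variants, while locus 1 has a single one.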


            • #7
              As I said, I haven't managed to get any splice variants, so I can't say anything about that.
              I also haven't found a site with useful information on the output files yet.
              But there is an option that controls the number of splice variants: -t, which is 5 by default. Maybe your results change when you increase the -t value.

              Could you post your configuration file? I am curious to see whether I made a mistake writing mine.

              Philipp


              • #8
                Even without splice variants, you should still get locus information in the .scaff file, right? I only just started playing with the program, so for the moment almost everything is on default. I did change the -G option, though, since I noticed that my insert sizes have a wider spread than 50.

                config
                Code:
                max_rd_len=150
                [LIB]
                avg_ins=320
                asm_flags=3
                reverse_seq=0
                rank=1
                q1=*file.fastq
                q2=*file.fastq
                commands
                Code:
                SOAPdenovo-Trans-31kmer all -s config -K 31 -G 100 -o *file
                I think I just figured out how the contigs work. For something this strange, there really should be some description of the output files.

                The .newcontigindex lists all contigs in consecutive order (no missing numbers), and both the .readoncontig and .cnt2read files show that reads were used to make all contigs, BUT the .contig file only has about half the contigs. The .newcontigindex has a 2 for contigs that I do get a sequence for and a 0 for contigs that are not in the output file. I think contigs with a 0 are assembled from the reverse reads, and their reverse complement is then merged into the forward contigs.

                The confusing part for me was that not all of my contigs had a reverse complement. This information is in .contigindex, which lists how many reverse complements each of the forward contigs has. I have 49 forward contigs without a reverse complement, which makes the contig numbering in the .contig file appear random. I can't find the file that lists which contig is the reverse complement of which. Based on read count per contig, it often seems to be consecutively numbered contigs, but not always. sigh.

                Would be nice to know what the headers are for the .links file too ...

                I think I'll just try a few other programs, I have no idea exactly what this one did.
                Last edited by Jeremy; 10-10-2012, 12:53 AM.
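
To see which contig numbers are absent from the .contig file, the gaps in the header numbering can be listed with a few lines of Python. This sketch assumes that each FASTA header in the .contig file starts with an integer contig ID directly after the `>`. That is an assumption about the file format, not something confirmed in this thread, and the toy input below is made up for illustration.

```python
def contig_ids(fasta_text):
    """Collect the integer IDs from FASTA header lines (assumed to be the first token)."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            # Skip headers that do not start with a plain integer.
            if tokens and tokens[0].isdigit():
                ids.append(int(tokens[0]))
    return ids


def missing_ids(ids):
    """Return the IDs between the smallest and largest seen ID that never occur."""
    if not ids:
        return []
    present = set(ids)
    return [i for i in range(min(present), max(present) + 1) if i not in present]


# Hypothetical toy data, not taken from a real SOAPdenovo-Trans run.
toy_contig_file = """\
>1 length 40
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
>3 length 36
TTGACCGTAGGCTAACGTTAGCATCGGATCCATGCA
>5 length 34
GGCATTAGCCGATAGCTAGGCTAACGGATTCAGC
>6 length 33
CATGCCGATTAGGCATCGATCGGATTACGCATG
"""

ids = contig_ids(toy_contig_file)
print("present:", ids)
print("missing:", missing_ids(ids))
```

The missing IDs could then be compared with the entries marked 0 in the index file to test the reverse-complement idea described above.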


                • #9
                  Strangely, I do not have any entries in .ctg2read, .readONcontig and .links.
                  It must have something to do with single-end versus paired-end libraries, because my config file doesn't seem to be incomplete.

                  I also worked with Trinity and Oases and both did a better job than SOAP, anyway.
                  Maybe this thread helps http://seqanswers.com/forums/showthread.php?t=17959

                  Cheers
