  • SOAPdenovo-trans without paired reads

    Hi all,

    I'm trying to configure a transcriptome assembly using SOAPdenovo-Trans, but I can't find an appropriate manual. My problem is that I don't have paired-end reads. I'm using this assembly to test some other software, so I downloaded a data set from SRA and converted it to FASTQ. The idea is to apply the same workflow to an Illumina sequencing run that is coming soon.

    Which settings do I need to configure an assembly without paired reads? I think I have enough computing power for this.


    Best regards and thanks

  • #2
    I would like a detailed manual too. The site has the basic information required to get the program running (http://soap.genomics.org.cn/SOAPdenovo-Trans.html); it just lacks a proper description of the output files, leaving the user to guess how each one is made.

    The website states: "There are some other files that provide useful information for advanced users.", which is not actually very helpful. Granted, most of the files pretty much explain themselves if you are familiar with assembly programs, but some information would still be nice.

    The program automatically detects which format your data is in, so just directing it to a single file should be fine. Since the program does detect paired status, I think that means you can't just give it the first half of a paired-end run, because the naming system indicates that it is paired-end data.

    At which step do you run into problems, and what does the error report say?



    • #3
      Hi Jeremy, thanks for your answer.

      Yes, I tried to use this manual, but when I read it I didn't find much real help. I took the example configuration that appears on that site and changed the settings in two ways:


      #maximal read length
      max_rd_len=50
      [LIB]
      #average insert size
      avg_ins=200
      #if sequence needs to be reversed
      reverse_seq=0
      #in which part(s) the reads are used
      asm_flags=3
      #in which order the reads are used while scaffolding
      rank=1
      #fastq file for read 1
      q1=/home/oscar/software/SOAPdenovo/SOAPdenovo-trans/SOAPdenovo-Trans/DRR000031.fastq
      #fastq file for read 2 always follows fastq file for read 1
      q2=/home/oscar/software/SOAPdenovo/SOAPdenovo-trans/SOAPdenovo-Trans/DRR000031.fastq
      #fasta file for read 1
      f1=/home/oscar/software/SOAPdenovo/SOAPdenovo-trans/SOAPdenovo-Trans/DRR000031.fastq
      #fastq file for read 2 always follows fastq file for read 1
      f2=/home/oscar/software/SOAPdenovo/SOAPdenovo-trans/SOAPdenovo-Trans/DRR000031.fastq
      #fastq file for single reads
      q=/home/oscar/software/SOAPdenovo/SOAPdenovo-trans/SOAPdenovo-Trans/DRR000031.fastq
      #fasta file for single reads
      f=/home/oscar/software/SOAPdenovo/SOAPdenovo-trans/SOAPdenovo-Trans/DRR000031.fastq
      #a single fasta file for paired reads
      p=/home/oscar/software/SOAPdenovo/SOAPdenovo-trans/SOAPdenovo-Trans/DRR000031.fastq
      With this configuration SOAP seems to be working, but I don't know whether it is OK; that is my problem. When I run it on a workstation it doesn't finish, and when I check a couple of hours later the machine has apparently rebooted.


      #maximal read length
      max_rd_len=101
      [LIB]
      #average insert size
      avg_ins=300
      #if sequence needs to be reversed
      reverse_seq=0
      #use for contig building only
      asm_flags=3
      #in which order the reads are used while scaffolding
      rank=1
      #fastq files
      q1=/home/oscar/software/SOAPdenovo/SOAPdenovo-trans/SOAPdenovo-Trans/DRR000031.fastq
      When I use this other configuration, a "fragmentation error" appears and the assembly doesn't start.


      The command I use to run SOAP is:

      ./SOAPdenovo-Trans-127mer all -s example.config -o test -p 6

      I should mention that the reads were downloaded in SRA format (this is the file that I downloaded), and I converted the file to FASTQ with the SRA Toolkit. As it has 36-mer reads, I used SOAPdenovo-Trans-127mer.

      Is there a problem in my configuration files that you can spot? I would be really glad if you could help me. Regards,
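
      Before settling on max_rd_len, it may help to check the read length directly in the converted file. The following is a minimal sketch, assuming the FASTQ has exactly four lines per record (no wrapped sequences); the function name max_read_len and the file demo.fastq are made up for illustration and are not part of SOAPdenovo-Trans or the SRA Toolkit.

      ```shell
      #!/bin/sh
      # Report the longest read in a FASTQ file with four lines per record.
      # The sequence is the second line of every record (NR % 4 == 2).
      max_read_len() {
          awk 'NR % 4 == 2 { n = length($0); if (n > max) max = n } END { print max + 0 }' "$1"
      }

      # Tiny made-up FASTQ so the sketch runs without the real data.
      printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGTA\n+\nIIIII\n' > demo.fastq
      max_read_len demo.fastq
      ```

      On the demo file this prints 8. Pointing the function at the real FASTQ instead gives the value to compare against max_rd_len.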


      Oscar
      Last edited by rexxi; 10-09-2012, 06:16 AM.



      • #4
        Hi Oscar
        I can see a few problems. Firstly, since your reads are 36-mers, you will want to use the 31-mer build of SOAPdenovo-Trans. The 127mer version is for longer reads, and I'm pretty sure that is why you get a fragmentation error.

        As for the config file, you need to adjust it to suit your read type: max_rd_len should be 36, not 101, since you have 36-mer reads. The avg_ins option is for paired-end reads, so I would just leave that out.

        I read somewhere (but can't remember where) that the optimum k-mer size (that's the -K option) is about one third of your read length. I think it varies with library type, though, so try a few different k-mer sizes.
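
        The one-third rule of thumb can be turned into a small sweep. This is only a sketch: pick_k is a hypothetical helper, the step of 4 between candidates is arbitrary, the binary name SOAPdenovo-Trans-31mer is assumed by analogy with the 127mer binary used above, and the adjustment to an odd k reflects the usual expectation of de Bruijn graph assemblers rather than anything stated in this thread. The commands are printed, not run.

        ```shell
        #!/bin/sh
        # Hypothetical helper: integer third of the read length,
        # bumped up by one when the result is even so that k is odd.
        pick_k() {
            read_len=$1
            k=$(( read_len / 3 ))
            if [ $(( k % 2 )) -eq 0 ]; then
                k=$(( k + 1 ))
            fi
            echo "$k"
        }

        # Print one candidate command per k so several sizes can be compared.
        k_start=$(pick_k 36)
        for k in "$k_start" $(( k_start + 4 )) $(( k_start + 8 )); do
            echo "./SOAPdenovo-Trans-31mer all -s example.config -o test_k$k -K $k -p 6"
        done
        ```

        For 36-mer reads this gives candidate k values of 13, 17 and 21, each written to its own output prefix so the assemblies can be compared afterwards.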



        • #5
          I realize this was last active a while ago, but since it's still one of the top Google results, here are a few more answers I've found.

          For single-end reads, use one of the options below rather than q1:

          q=/path/to/input/fastq/file.fastq
          or
          f=/path/to/input/fasta/file.fasta

          Also, I was able to leave out the avg_ins option with no problems. These options are mentioned in the example.config file you (Oscar) quoted above, which can also be found at http://soap.genomics.org.cn/SOAPdenovo-Trans.html#comm7.
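
          Putting the suggestions in this thread together, a single-end config could be generated roughly as follows. This is a sketch, not a tested recipe: write_single_end_config and single_end.config are made-up names, the read path is the placeholder from above, and the reverse_seq, asm_flags and rank values are simply carried over from the example config quoted earlier.

          ```shell
          #!/bin/sh
          # Hypothetical helper: write a minimal single-end config file.
          # Arguments: output file, path to the FASTQ file, maximal read length.
          write_single_end_config() {
              out_file=$1
              fastq_path=$2
              max_len=$3
              {
                  printf '#maximal read length\n'
                  printf 'max_rd_len=%s\n' "$max_len"
                  printf '[LIB]\n'
                  printf '#avg_ins is left out: it applies to paired-end libraries\n'
                  printf '#if sequence needs to be reversed\n'
                  printf 'reverse_seq=0\n'
                  printf '#in which part(s) the reads are used\n'
                  printf 'asm_flags=3\n'
                  printf '#in which order the reads are used while scaffolding\n'
                  printf 'rank=1\n'
                  printf '#fastq file for single reads\n'
                  printf 'q=%s\n' "$fastq_path"
              } > "$out_file"
          }

          write_single_end_config single_end.config /path/to/input/fastq/file.fastq 36
          cat single_end.config
          ```

          The resulting file lists the reads once, under q=, instead of repeating the same file under q1, q2, f1, f2, f and p.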

          Cheers,
          Aaron
