  • Using Velvet options and the results, examine contigs

    Hi,

    I'm working on sequence analysis, and a few days ago I assembled short reads with Velvet for the first time.
    So I have only just begun.
    The reads are from Solexa: paired-end, 76 bases long.

    I ran several cases, varying options such as seed length, min contig length, cov cutoff, exp cov, etc.
    Of these, seed length and cov cutoff seemed to have a critical effect on the assembly result.
    Min contig length also had some effect, but exp cov did not look that notable, in my humble opinion.

    The effect of seed length was expected, but I have no idea what is going on with 'cov cutoff'.
    With the other options fixed, leaving it unset and setting it to 2, 5, and 10 gave considerably different results.
    The larger the value, the fewer contigs there were and the more dramatically their size increased.

    Therefore, I want to know how many reads contributed to each contig, that is, how many reads each one was built from.
    But I can't find which file or option to look at.

    Additionally, I wonder how the 'cov cutoff' option produced those huge contigs.
    With no value (the default) and with 5, the average contig size differed by more than ten times.
    The number of contigs dropped to less than ten percent. (The total contig length was not that different.)
    Finally, are the larger contigs actually reliable?

    Regards,
    Kim

  • #2
    You could try running velvetg with the options '-amos_file yes -read_trkg yes', which creates a velvet_asm.afg file containing some of that information...
    --
    bioinfosm


    • #3
      Options

      Thanks.

      I had already used the '-read_trkg yes' option, but I couldn't tell what it changed,
      and no new output file seemed to be created.

      I then added '-amos_file yes' as you suggested. That made a new file named 'velvet_asm.afg'.
      Below is the beginning of it:
      -------------velvet_asm.afg---------------
      {LIB
      iid:1
      {DST
      mea:181615104
      std:-9223372036854775808
      }
      }
      {LIB
      iid:2
      {DST
      mea:181615104
      std:-9223372036854775808
      }
      }
      {LIB
      iid:3
      {DST
      mea:96
      std:1504
      }
      }
      {RED
      iid:1
      eid:1
      seq:
      ATGAGCTTCCTTTGATCTCTAGCTTTCCCGATGGCTATGATCCCGTTCCAACACCTCGTG
      ATGATAATTCATATAC
      .
      qlt:
      ATGAGCTTCCTTTGATCTCTAGCTTTCCCGATGGCTATGATCCCGTTCCAACACCTCGTG
      ATGATAATTCATATAC
      .
      }
      (.............)
      --------------------------------

      Now I'm trying to use 'amos2ace' to generate an ACE file from the AMOS file (.afg).
      Then I should be able to find the reads in each contig.
      If anyone knows another way to get that information, please let me know.

      Anyway, thanks again to bioinfosm.


      • #4
        If you scroll further down in the afg file, there are lines beginning with
        src:
        These are the read IDs contained in that contig. The ID comes from velvet's Sequences file.
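
A minimal Python sketch of that idea. It assumes the usual AMOS message layout, in which each {CTG record nests one {TLE record per placed read and multi-line fields such as seq: and qlt: end with a lone "." line. The function name reads_per_contig and the tiny sample records are hypothetical, for illustration only:

```python
from collections import Counter

def reads_per_contig(lines):
    """Count the {TLE records nested inside each {CTG record of an AMOS .afg stream."""
    counts = Counter()
    contig = None     # iid of the CTG record currently being read
    in_ctg = False    # True while inside a top-level {CTG record
    in_text = False   # True while inside a multi-line field (seq:, qlt:, ...)
    depth = 0         # nesting depth of {XXX ... } records
    for raw in lines:
        line = raw.strip()
        if in_text:
            if line == ".":          # a lone "." closes a multi-line field
                in_text = False
            continue
        if line.endswith(":"):       # "seq:" or "qlt:" with the value on later lines
            in_text = True
        elif line.startswith("{"):
            depth += 1
            if line == "{CTG" and depth == 1:
                in_ctg, contig = True, None
            elif line == "{TLE" and in_ctg:
                counts[contig] += 1
        elif line == "}":
            depth -= 1
            if depth == 0:
                in_ctg = False
        elif in_ctg and depth == 1 and line.startswith("iid:"):
            contig = line[4:]
            counts[contig] += 0      # register contigs even if no read is tiled
    return dict(counts)

# Hypothetical two-read contig, only to show the call.
sample = """{CTG
iid:1
eid:1
seq:
ACGTACGT
.
qlt:
ACGTACGT
.
{TLE
src:1
off:0
clr:0,4
}
{TLE
src:2
off:4
clr:0,4
}
}
""".splitlines()

print(reads_per_contig(sample))
```

For a real assembly the function could be fed an open file handle on velvet_asm.afg instead of the sample lines.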

        Could you share how amos2ace helped you?
        --
        bioinfosm


        • #5
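          When I run kmer/cvCut permutations, I gzip the afg files and compile statistics on them later:

          Code:
          use PerlIO::gzip;  # provides the :gzip layer used below
          my ($reads, $tiles) = (0, 0);
          # $rootDir is assumed to be set earlier in the script
          open ASM,"<:gzip","$rootDir/velvet_asm.afg.gz" or die "can't open .afg file: $!";
          while(<ASM>){
              if(/\{RED/){$reads++}      # one RED record per input read
              elsif(/\{TLE/){$tiles++}   # one TLE record per read tiled into a contig
          }
          close ASM;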
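
The same bookkeeping can be sketched in Python. This is an illustration, not the poster's script: read_usage is a hypothetical helper, and it assumes each TLE record carries a src: line naming the read it places. Counting distinct src values rather than TLE records avoids counting a read twice if it is tiled more than once.

```python
import gzip

def read_usage(path):
    """Return (total_reads, placed_reads) for a gzip-compressed AMOS .afg file."""
    total = 0
    placed = set()        # distinct read IDs seen in TLE records
    in_tile = False
    with gzip.open(path, "rt") as handle:
        for raw in handle:
            line = raw.strip()
            if line == "{RED":
                total += 1
            elif line == "{TLE":
                in_tile = True
            elif in_tile and line.startswith("src:"):
                placed.add(line[4:])
                in_tile = False
    return total, len(placed)
```

Calling read_usage on a gzipped velvet_asm.afg returns the two counts, from which the fraction of reads that ended up in contigs follows directly.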
          --
          Jeremy Leipzig
          Bioinformatics Programmer


          • #6
            Hello,
            I am doing de novo transcriptome assembly on SOLiD data using the velvet and oases pipeline. Out of 31 million reads, only 1 million reads take part in the assembly! What is the reason for that?
            Your answers and help would be appreciated.


            • #7
              woah, old thread. ;-)

              How long are your reads? Did you trim before running velvet and oases? Which kmers did you choose? Take care not to choose a kmer higher than your minimum read length. Could it be that the read quality is messed up?


              • #8
                The read length is 50bp. Yes, I trimmed the reads before running velvet and oases.
                The kmer length that we prefer is 25-31. Even so, only 2806127 reads out of 31405877 filtered reads participate in the assembly process.
                As for the quality of the reads, we also performed quality filtering. I hope that covers everything you asked. But I am still surprised that I cannot assemble a larger share of the reads from my input data. Could you please elaborate on it?
                Last edited by waterboy; 03-15-2011, 09:05 PM.


                • #9
                  it is really hard to guess.

                  You trimmed all reads to the same length of 50bp?
                  What total coverage of your transcriptome do you expect with these 31405877 reads? The only option you have left so far is to try even lower kmers and a really low cov_cutoff (e.g. 2).
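
For reference when choosing these values: the Velvet manual relates nucleotide coverage C to k-mer coverage Ck by Ck = C * (L - k + 1) / L, for read length L and hash length k, and velvetg's coverage options are expressed in k-mer coverage. A small sketch of that conversion for 50bp reads; the function name and the nucleotide coverage of 20 are made up for illustration:

```python
def kmer_coverage(nucleotide_coverage, read_length, k):
    """Ck = C * (L - k + 1) / L; a read shorter than k contributes no k-mers."""
    kmers_per_read = max(0, read_length - k + 1)
    return nucleotide_coverage * kmers_per_read / read_length

# Assumed nucleotide coverage of 20x, 50bp reads, the two ends of the 25-31 range.
for k in (25, 31):
    print(k, kmer_coverage(20.0, 50, k))
```

The higher the kmer, the lower the k-mer coverage for the same data, so a fixed cov_cutoff removes more of the graph at k=31 than at k=25.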


                  • #10
                    Does anyone know how to allow for SNPs when assembling sequences with Velvet?

                    I have downloaded Velvet 1.1.05 and obtained some contigs using my reads, but they don't have SNPs and I am quite sure they should.

                    Thanks


                    • #11
                      Getting number of reads in each contig:

                      I used contrib/extractContigReads.pl in velvet to write out each contig and its corresponding reads. After making all the contig files (contig_1_reads.txt, contig_2_reads.txt, ...), I just ran

                      grep -c '>' contig_*.txt > Outputfile.txt

                      The output file contains each contig file name and its number of reads.
                      Last edited by ramadatta.88; 10-06-2011, 12:28 AM. Reason: assigning title


                      • #12
                        Getting number of reads in each contig:

                        Getting number of reads in each contig:

                        I used contrib/extractContigReads.pl in velvet to write out each contig and its corresponding reads. After making all the contig files (contig_1_reads.txt, contig_2_reads.txt, ...), I just ran

                        grep -c '>' contig_*.txt > Outputfile.txt
                        That would definitely take ages when you are dealing with millions of reads. The best thing would be to use the AMOS package.

                        Convert the afg file output by velvet to a bnk file:

                        ~/src/amos-3.0.0/src/Bank/bank-transact -m my.afg -b my.bnk -c

                        Run the depth-of-coverage analysis:

                        ~/src/amos-3.0.0/src/Validation/analyze-read-depth my.bnk -d -r > Outputfile

                        The output file contains the contig IDs, the corresponding reads, the depth of each contig, etc. Hope this helps someone.


                        • #13
                          Dear all,

                          Following is the velvet_asm.afg file generated by velvet, but I want to have my read IDs from the Sequences file here in this format. Which IDs should I change? Both iid and eid, or what?

                          Thanks,
                          Rahul

                          -------------velvet_asm.afg---------------
                          {LIB
                          iid:1
                          {DST
                          mea:181615104
                          std:-9223372036854775808
                          }
                          }
                          {LIB
                          iid:2
                          {DST
                          mea:181615104
                          std:-9223372036854775808
                          }
                          }
                          {LIB
                          iid:3
                          {DST
                          mea:96
                          std:1504
                          }
                          }
                          {RED
                          iid:1
                          eid:1
                          seq:
                          ATGAGCTTCCTTTGATCTCTAGCTTTCCCGATGGCTATGATCCCGTTCCAACACCTCGTG
                          ATGATAATTCATATAC
                          .
                          qlt:
                          ATGAGCTTCCTTTGATCTCTAGCTTTCCCGATGGCTATGATCCCGTTCCAACACCTCGTG
                          ATGATAATTCATATAC
                          .
                          }
                          (.............)
                          --------------------------------
                          Rahul Sharma,
                          Ph.D
                          Frankfurt am Main, Germany


                          • #14
                            Originally posted by rahularjun86
                            Dear all,

                            Following is the velvet_asm.afg file generated by velvet, but I want to have my read IDs from the Sequences file here in this format. Which IDs should I change? Both iid and eid, or what?

                            Thanks,
                            Rahul

                            -------------velvet_asm.afg---------------
                            {LIB
                            iid:1
                            {DST
                            mea:181615104
                            std:-9223372036854775808
                            }
                            }
                            {LIB
                            iid:2
                            {DST
                            mea:181615104
                            std:-9223372036854775808
                            }
                            }
                            {LIB
                            iid:3
                            {DST
                            mea:96
                            std:1504
                            }
                            }
                            {RED
                            iid:1
                            eid:1
                            seq:
                            ATGAGCTTCCTTTGATCTCTAGCTTTCCCGATGGCTATGATCCCGTTCCAACACCTCGTG
                            ATGATAATTCATATAC
                            .
                            qlt:
                            ATGAGCTTCCTTTGATCTCTAGCTTTCCCGATGGCTATGATCCCGTTCCAACACCTCGTG
                            ATGATAATTCATATAC
                            .
                            }
                            (.............)
                            --------------------------------
                            I would change the "eid" if you must have your own IDs in that file. The "iid" is an internal ID, and my guess is that changing the form of this might cause problems with software designed for reading afg files.


                            • #15
                              Velvet: how many reads into each contig

                              Hi, I guess this is the right place to ask...

                              I am trying to figure out how many reads made it into each contig after I run velvet. I used the flags -read_trkg yes and -amos_file yes.
                              I found some advice online on how to process the output, but I am a bit stuck.
                              So from the amos file I have tried this path:

                              $ /home/admin/amos-3.1.0/src/Bank/bank-transact -m velvet_asm.afg -b velvet_asm.bnk -c

                              and then
                              $ /home/admin/amos-3.1.0/src/Validation/analyze-read-depth velvet_asm.bnk -i -d -r > output

                              The problem is that the file output is something like this :
                              1 1646 1340 1340 108.275 108.275
                              2 1 97 97 1.03093 1.03093
                              3 2085 1589 1589 113.904 113.904
                              4 175 75 75 219.547 219.547
                              5 187 162 162 105.944 105.944
                              6 124 76 76 157.171 157.171
                              7 1023 768 768 112.914 112.914
                              8 9 99 99 9.09091 9.09091
                              9 151 168 168 72.8214 72.8214
                              10 2224 1602 1602 123.306 123.306
                              11 6582 5312 5312 107.116 107.116
                              12 2850 2487 2487 98.653 98.653
                              13 3 105 105 2.62857 2.62857
                              14 3 103 103 2.91262 2.91262
                              15 4350 3350 3350 115.28 115.28
                              16 1788 1268 1268 128.341 128.341
                              17 1746 1325 1325 116.024 116.024

                              Could anyone tell me what the fields are, or if there is a different way to get the information I need?

                              Thank you,

                              F.
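
The thread does not say what the columns are. One reading that is at least consistent with the numbers shown is: contig ID, number of reads, contig length (twice), and read depth (twice). This is a guess from the values, not something taken from the AMOS documentation. Under that assumption, depth times length divided by the number of reads should come out near the read length, which can be checked on a few of the rows above. The function name is hypothetical:

```python
# Three rows copied from the output above.
ROWS = """\
2 1 97 97 1.03093 1.03093
8 9 99 99 9.09091 9.09091
13 3 105 105 2.62857 2.62857
"""

def implied_read_length(line):
    """Assumed columns: id, reads, length, length, depth, depth.
    If that guess holds, depth * length / reads is the mean read length."""
    _contig_id, n_reads, length, _length2, depth, _depth2 = line.split()
    return float(depth) * int(length) / int(n_reads)

for row in ROWS.splitlines():
    print(row.split()[0], round(implied_read_length(row), 1))
```

If the implied read lengths match the length of the reads that went into the assembly, the second column is probably the per-contig read count being asked about.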
