  • #61
    Originally posted by VidJa
    Hi Boetsie,

    Do you plan to support SAM/BAM as output instead of the default Bowtie format? That way you could drop in all SAM/BAM-capable aligners, such as BWA (for longer reads with the SW option), Smalt http://www.sanger.ac.uk/resources/software/smalt/
    or Stampy http://www.well.ox.ac.uk/project-stampy
    and probably a whole bag of other aligners.
    After some testing with Bowtie, I came to the conclusion that using SAM/BAM output instead of the default Bowtie output is much slower.

    SAM output contains a line for every read, whether or not it maps to a contig, whereas Bowtie output contains only the reads that map. Since SSPACE maps reads against only a fraction of the contig sequence (the beginning and end of each contig), the number of reads that map is also low. SSPACE goes through all output lines, so the more output lines there are, the slower the program runs.

    Say there are 1M reads and only 10,000 of them map. Bowtie will generate 10,000 output lines, while SAM produces 1M output lines (100 times as many), making it about 100 times slower to read in the output file.

    Therefore, I'll make it possible to input tab-delimited files with information about paired reads, in the following format:

    <read1_tig> <read1_start> <read1_end> <read2_tig> <read2_start> <read2_end>

    I will provide a script that converts SAM output files to TAB files. This way, all SAM-capable aligners can be used.
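    The conversion script itself is not part of this thread. As a rough idea of what such a converter might do, here is a minimal Python sketch. It assumes mates can be paired either by a /1 and /2 suffix on the read name or by the SAM first/last-in-pair flag bits (0x40/0x80), skips unmapped, secondary and supplementary records, computes the end coordinate from the CIGAR string, and writes one line per pair in the six-column layout above. Swapping start and end for reverse-strand hits is an assumption about how orientation is encoded, not something confirmed here; check the SSPACE documentation before relying on it.

```python
import re

# CIGAR operations; only M, D, N, = and X consume reference bases.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def ref_span(cigar):
    """Number of reference bases covered by an alignment, from its CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")


def mate_of(qname, flag):
    """Return (pair_name, mate_number); mate_number is None if it cannot be told."""
    m = re.match(r"(.+)/([12])$", qname)
    if m:  # Bowtie-style /1 and /2 suffixes
        return m.group(1), int(m.group(2))
    if flag & 0x40:  # first segment in the template
        return qname, 1
    if flag & 0x80:  # last segment in the template
        return qname, 2
    return qname, None


def sam_to_tab(lines):
    """Yield tab-separated '<tig1> <start1> <end1> <tig2> <start2> <end2>' lines."""
    hits = {}
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue  # header or blank line
        f = line.rstrip("\n").split("\t")
        if len(f) < 11:
            continue  # not a complete SAM record
        qname, flag, rname, pos, cigar = f[0], int(f[1]), f[2], int(f[3]), f[5]
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & 0x904 or rname == "*" or cigar == "*":
            continue
        pair, mate = mate_of(qname, flag)
        if mate is None:
            continue
        start, end = pos, pos + ref_span(cigar) - 1
        if flag & 0x10:  # ASSUMPTION: reverse-strand hits are written with start > end
            start, end = end, start
        hits.setdefault(pair, {})[mate] = (rname, start, end)
    for mates in hits.values():
        if 1 in mates and 2 in mates:  # keep only pairs where both reads mapped
            yield "\t".join(str(x) for x in mates[1] + mates[2])
```

    To convert a file, something like `for row in sam_to_tab(open("lib200.sam")): print(row)` would do (the file name is hypothetical). Note that all hits are kept in memory until the end, which is fine for the small number of reads that map to contig ends.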

    In addition, multiple TAB files from different libraries can be given, as well as a combination of TAB files and normal paired-read files. For example, suppose you have a paired-end library of 200bp and one of 500bp. For both libraries you map the reads to the contigs, generating two SAM files, which you can convert to .tab files. Both can be given to SSPACE. First, SSPACE scaffolds the contigs using the 200bp library. Next, the positions of the contigs are updated by determining their new positions within the scaffolds. Then the 500bp library is used to scaffold the scaffolds generated with the 200bp library.
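    Updating the contig positions amounts to translating contig coordinates into scaffold coordinates. A minimal Python sketch, assuming a hypothetical layout table that records, for each contig, its scaffold, the number of bases preceding it in that scaffold, its length, and whether it was reverse-complemented when placed (SSPACE's actual internal representation may well differ):

```python
# Hypothetical layout: contig -> (scaffold, bases before contig, contig length, reversed?)


def lift(contig, pos, layout):
    """Map a 1-based position on `contig` to (scaffold, 1-based position on scaffold)."""
    scaffold, offset, length, reverse = layout[contig]
    if reverse:  # the contig was reverse-complemented when placed
        return scaffold, offset + (length - pos + 1)
    return scaffold, offset + pos


def lift_tab_row(row, layout):
    """Translate one six-column TAB row from contig to scaffold coordinates."""
    tig1, s1, e1, tig2, s2, e2 = row
    scaf1, new_s1 = lift(tig1, s1, layout)
    _, new_e1 = lift(tig1, e1, layout)
    scaf2, new_s2 = lift(tig2, s2, layout)
    _, new_e2 = lift(tig2, e2, layout)
    return scaf1, new_s1, new_e1, scaf2, new_s2, new_e2
```

    A read on a reversed contig automatically gets its start and end swapped on the scaffold. Mates that end up on the same scaffold provide no new link, so only pairs spanning two different scaffolds matter for the next library.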

    It's still in the testing phase, but the results so far look OK: I get similar results whether I input a paired-end FASTQ file or a .tab file.

    I'll keep you updated!

    Kind regards,
    Boetsie



    • #62
      Originally posted by boetsie

      SAM output contains a line for every read, whether or not it maps to a contig, whereas Bowtie output contains only the reads that map. Since SSPACE maps reads against only a fraction of the contig sequence (the beginning and end of each contig), the number of reads that map is also low. SSPACE goes through all output lines, so the more output lines there are, the slower the program runs.
      Boetsie
      Hi Boetsie,
      I don't know about all mappers that output SAM format, but BWA, for example, can output SAM to stdout if you don't give it the -o outfile.sam option. If you are only interested in reads that map, or reads where only one mate of the pair maps, you can either pipe the output through samtools and filter on the flag, or pipe it through a Perl or awk one-liner that filters the SAM output on the flag. The flag records whether the read maps and/or whether its mate maps.
      -John
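      As an illustration of filtering on the flag, here is a minimal Python sketch that drops records whose read is unmapped (flag bit 0x4) and, optionally, those whose mate is unmapped (bit 0x8), while passing header lines through. On the samtools side, `samtools view -F 4` excludes unmapped reads. The function name and the `require_mate` parameter are made up for this example.

```python
READ_UNMAPPED = 0x4  # SAM flag bit: this read is unmapped
MATE_UNMAPPED = 0x8  # SAM flag bit: the mate is unmapped


def filter_sam(lines, require_mate=False):
    """Yield header lines plus alignment lines whose read (and optionally mate) mapped."""
    for line in lines:
        if line.startswith("@"):
            yield line  # keep the header so the output is still valid SAM
            continue
        fields = line.split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        flag = int(fields[1])
        if flag & READ_UNMAPPED:
            continue
        if require_mate and flag & MATE_UNMAPPED:
            continue
        yield line
```

      At the end of a pipe, a small script calling `sys.stdout.writelines(filter_sam(sys.stdin))` would stream the filtered SAM without writing the full file to disk first.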



      • #63
        Originally posted by jstjohn
        BWA for example can output SAM to stdout if you don't give it the -o outfile.sam option.
        -John
        Actually, on most *nix systems, including Mac and every Linux I have worked on so far, you can use "/dev/fd/0" as the file name when you want to write something to standard out and the program doesn't give that option.

        For example:
        echo "hello world">/dev/fd/0



        • #64
          Originally posted by jstjohn
          Actually, on most *nix systems, including Mac and every Linux I have worked on so far, you can use "/dev/fd/0" as the file name when you want to write something to standard out and the program doesn't give that option.

          For example:
          echo "hello world">/dev/fd/0
          Shouldn't that be /dev/fd/1? File descriptor #0 is usually stdin. Granted, /0 works, but I think that this is a side effect and not something to be relied on.

          Besides, that example proves nothing, since you are taking stdout and (if you use /dev/fd/1) putting it into stdout. What you need for proof is a program like:

          Code:
          #!/usr/bin/perl
          open (TEST, '>', $ARGV[0]) or die "Can not open file $ARGV[0]\n";
          print TEST "This is a test\n";
          exit;
          If I name the above 't.pl', then I can run the following 6 examples:

          1) If no file name is given, nothing is output, neither to a file nor to stdout.

          Code:
          rm -f test.tmp; ls test.tmp; ./t.pl ; ls test.tmp
          ls: test.tmp: No such file or directory
          ls: test.tmp: No such file or directory
          2) If a file name is given, the file is created, but nothing goes to stdout.

          Code:
          rm -f test.tmp; ls test.tmp; ./t.pl test.tmp ; ls test.tmp
          ls: test.tmp: No such file or directory
          test.tmp
          3) If I give the stdout file descriptor as the file name then I get text on stdout but not in the file.

          Code:
           rm -f test.tmp; ls test.tmp; ./t.pl /dev/fd/1 ; ls test.tmp
          ls: test.tmp: No such file or directory
          This is a test
          ls: test.tmp: No such file or directory
          4) Likewise, I can use the stdin file descriptor:

          Code:
          rm -f test.tmp; ls test.tmp; ./t.pl /dev/fd/0 ; ls test.tmp
          ls: test.tmp: No such file or directory
          This is a test
          ls: test.tmp: No such file or directory
          5) But writing to descriptor #0, as above, is really a side effect, as can be shown by piping stdout to another program. In this case I'll use 'od' (octal dump). Using descriptor #1 (stdout, the recommended descriptor), I get 'od' output:

          Code:
           rm -f test.tmp; ls test.tmp; ./t.pl /dev/fd/1 | od -c; ls test.tmp
          ls: test.tmp: No such file or directory
          0000000   T   h   i   s       i   s       a       t   e   s   t  \n
          0000017
          ls: test.tmp: No such file or directory
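          6) Whereas if I use the incorrect descriptor #0 (stdin), 'od' receives nothing (it prints only the empty offset 0000000) and the text goes straight to the terminal instead, since in an interactive shell stdin is the terminal itself.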
          Code:
           rm -f test.tmp; ls test.tmp; ./t.pl /dev/fd/0 | od -c; ls test.tmp
          ls: test.tmp: No such file or directory
          This is a test
          0000000
          ls: test.tmp: No such file or directory
          Whew! Now back to what I was doing, which is running SSPACE in various configurations.
          Last edited by westerman; 06-15-2011, 07:48 AM. Reason: minor correction



          • #65
            Nice tests! Good to see how that /dev/fd stuff works.



            • #66
              Hi Boetsie,

              Great explanation; I expected SAM/BAM to be slower. Another nice aligner that could be considered is PASS: http://pass.cribi.unipd.it/cgi-bin/pass.pl
              It is very fast and has its own output format, or GFF, besides SAM/BAM. It has an option to attempt contig association and handles long reads.



              • #67
                Thank you for the suggestion, VidJa, I appreciate your help. However, right now I think including an independent script for converting .sam to .tab would be the best idea. SAM input could be possible, but I have to find a way to identify the paired reads. With Bowtie I do this by appending /1 and /2 to the read names, so I know which is the left and which is the right read (hence, I do single-read mapping, because Bowtie only finds read pairs on the same contig). Read names differ between platforms, so I should probably let the user specify the suffix of both reads. That's why I think .tab is the best way at the moment.
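                Letting the user specify the two suffixes could look roughly like the following Python sketch. The function names and default suffixes are hypothetical, and reads whose names end in neither suffix are simply ignored.

```python
def split_suffix(name, suffix1, suffix2):
    """Return (base_name, 1 or 2) if `name` ends with a mate suffix, else (name, None)."""
    if name.endswith(suffix1):
        return name[: -len(suffix1)], 1
    if name.endswith(suffix2):
        return name[: -len(suffix2)], 2
    return name, None


def pair_reads(names, suffix1="/1", suffix2="/2"):
    """Group read names into (left, right) pairs using user-supplied suffixes."""
    if not suffix1 or not suffix2:
        raise ValueError("both suffixes must be non-empty")
    mates = {}
    for name in names:
        base, mate = split_suffix(name, suffix1, suffix2)
        if mate is not None:
            mates.setdefault(base, {})[mate] = name
    # Keep only names for which both the left and the right read were seen.
    return [(m[1], m[2]) for m in mates.values() if 1 in m and 2 in m]
```

                For example, reads named `x.f` and `x.r` would be paired with `pair_reads(names, ".f", ".r")`.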

                Conclusion: I have to think about it.



                • #68
                  Hi Boetsie,

                  Just out of curiosity, is this script written for 32-bit Perl? I see an issue with the program not using memory beyond a certain level.



                  • #69
                    Problem with SSPACE

                    Hi

                    I have a problem using SSPACE: the process failed at bowtie-build. I can use Bowtie to build an index from my genome file myself, but it doesn't seem to work within SSPACE. What's the problem?

                    Thanks



                    • #70
                      Originally posted by christinawu2008
                      Hi

                      I have a problem using SSPACE: the process failed at bowtie-build. I can use Bowtie to build an index from my genome file myself, but it doesn't seem to work within SSPACE. What's the problem?

                      Thanks
                      I'll quote an older post in this thread:
                      The problem was mainly solved by going, on the command line, to the directory where the main SSPACE script (SSPACE_v1-x.pl) and its folders are stored, and then running one of the following:

                      chmod a+x bowtie/*

                      or

                      chmod 777 *


                      If this doesn't work, you can try downloading the newest Bowtie version at http://sourceforge.net/projects/bowt...bowtie/0.12.7/

                      Replace the files in the bowtie folder with the ones you've downloaded.
                      Kind regards,
                      Boetsie
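                      For those who prefer to do this from a script, here is a minimal Python sketch that roughly matches `chmod a+x bowtie/*` for regular files only and reports which files it changed. The function name is made up for this example, and it assumes a Unix-like system.

```python
import stat
from pathlib import Path

EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH  # the "a+x" bits


def make_executable(folder):
    """Add a+x to every regular file in `folder`; return the names that changed."""
    changed = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # unlike the shell glob, directories are left alone
        mode = stat.S_IMODE(path.stat().st_mode)
        if (mode & EXEC_BITS) != EXEC_BITS:
            path.chmod(mode | EXEC_BITS)
            changed.append(path.name)
    return changed
```

                      Run from the SSPACE directory, `make_executable("bowtie")` returns an empty list if the Bowtie binaries were already executable, which rules out permissions as the cause.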



                      • #71
                        Originally posted by KanyeDidIt
                        Hi Boetsie,

                        Just out of curiosity, is this script written for 32-bit Perl? I see an issue with the program not using memory beyond a certain level.
                        Yes, I think it is written for 32-bit.



                        • #72
                          Hi Boetsie,
                          I am trying to use SSPACE for scaffolding, without any success, and have the following questions for you:
                          1. Can it use contigs from an ABySS or CLC bio assembly, and add paired-end and mate-pair reads?
                          2. Why am I getting an empty result, with no improvement in N50 or anything else? I have tested every possible option (-k2, -k5).
                          Can you please give some suggestions?

                          Thank you,
                          with kind regards,
                          Ashu
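                          Since N50 is the yardstick here, a minimal Python sketch of how it is usually computed may help when comparing contig and scaffold files: sort the sequence lengths in decreasing order and report the length at which the running total first reaches half of the total assembly length.

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input
```

                          Comparing the N50 of the input contig lengths with that of the scaffold lengths shows directly whether any joins were made.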



                          • #73
                            Hi again Ashu,

                            1. Yes, this is possible. I've used both assemblies for scaffolding before, including paired-end and/or mate-pair reads.

                            2. Could you send me your library file and summary file as a private message, so I can check what is going on?

                            Regards,
                            Boetsie



                            • #74
                              Hi boetsie,

                              I tried SSPACE, and it is pretty good. However, I want to use sequencing technologies besides Illumina for scaffolding. Is the *.tab file meant for that? Also, I already have the assemblies in *.caf, *.maf and *.ace formats. Can I extract information on the mates from the first two of these (CAF and MAF) and use it in SSPACE? Do you have any scripts that can do that properly?

                              Sincerely yours
                              Nestor



                              • #75
                                Hello there,

                                Great tool! My organellar assembly improved quite nicely.

                                Now that I have the scaffolds, I wanted to ask how I can upload them to GenBank, which requires an AGP file for scaffolds listing the contigs and the number of Ns linking them.

                                Is there a way, or any suggestions on how I can generate these AGP files from the SSPACE output files? That would be great.

                                Thank you very much for any info,

                                Best Regards,

                                G
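                                One possible approach, sketched below in Python, is to derive AGP lines directly from each scaffold sequence by treating every run of Ns as a gap between two contigs. This is only a sketch under several assumptions: the component names (`<scaffold>_1`, `<scaffold>_2`, ...) are invented rather than taken from the SSPACE output, every component is placed in "+" orientation, Ns inside the original contigs would also be split into gaps, scaffolds are assumed not to start or end with Ns, and the gap columns use the AGP 2.0 values `scaffold`, `yes` and `paired-ends`. Check NCBI's AGP specification before submitting.

```python
import re


def scaffold_to_agp(name, seq, gap_type="scaffold", evidence="paired-ends"):
    """Return AGP rows (tab-separated strings) for one scaffold, splitting on runs of N."""
    rows = []
    contig_no = 0
    # Each match is either a run of Ns (a gap) or a run of other bases (a contig).
    for part_no, m in enumerate(re.finditer(r"[Nn]+|[^Nn]+", seq), start=1):
        start, end = m.start() + 1, m.end()  # 1-based, inclusive object coordinates
        length = end - start + 1
        if m.group()[0] in "Nn":
            # Gap: object, beg, end, part, N, gap_length, gap_type, linkage, evidence
            rows.append([name, start, end, part_no, "N", length, gap_type, "yes", evidence])
        else:
            contig_no += 1
            # Component: object, beg, end, part, W, component_id, beg, end, orientation
            rows.append([name, start, end, part_no, "W", f"{name}_{contig_no}", 1, length, "+"])
    return ["\t".join(str(x) for x in row) for row in rows]
```

                                The matching contig FASTA for submission would contain the non-N runs under the same invented component names.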

