  • ShoRAH

    I'm looking for people who have already used, or are trying to use, ShoRAH with very short reads.

  • #2
    Yes, I have used ShoRAH... but only with 454 reads. I am currently trying (for the first time) to work with some Illumina reads, but I am not sure what it assumes about Illumina files.

    If there is anybody out there with experience in using the sam/bam tools, could you please reply so I can ask you some questions? Thanks much in advance,

    Nash


    • #3
      Hi Naragram and Pedrolance

      I just found this post after submitting a thread about ShoRAH. I'm currently running ShoRAH for haplotype reconstruction from Illumina short read data. The analysis has been going for two days now - no results as of yet. If I manage to get results I will update on my progress.

      Naragram, since you have used ShoRAH for 454 reads, can you please help me understand what the parameters mean?

      I am running the analysis with input data of 90 000 short reads mapped to an 11 kb reference genome, with -j = 100, -t = 1000 and -K = 10. I left out the rest.

      What is the significance/function of -a (alpha), -K (start value for number of clusters), -k (number of reads per start cluster), -t (history time) and -R (randomseed)? How do they influence my analysis?
      I am a complete novice. It has taken me 4 months to get the program to work without any prior knowledge of Linux or the command line, and that's with lots of help from knowledgeable bioinformatics people. They have never used ShoRAH and don't know what the parameters mean in the context of the analysis I'm doing. I keep reading, but the literature has limited scope for someone whose business is purely biology.

      ck


      • #4
        ckseq,

        Okay, one of the first things I'd like to ask you is the version of ShoRAH you are trying to run....if it's 0.6 (the new one), a lot has CHANGED! You might want to go through the minimal documentation available on their website. If you have an older version (perhaps more stable) 0.5, then I can help you a bit...

        For v0.5, my run-shorah command is as follows:

        shorah.py -f $1 -r $2 -j 1000 -s 1 -w $3 -a 0.1 -k -t $4

        where $1 is the FASTA-formatted input file of 454 reads (Illumina reads are completely different!) and $2 is the reference (use the shortest reference that covers your reads, or else you won't get any decent haps!). $3 is a tricky window parameter: start with a very large number (say, 10000), then look at the dec.log output to find the window size ShoRAH suggests for covering your amplicon. Finally, $4 is the threshold parameter, which is set to a fairly high value of 0.7 by default; depending on your read quality and reference, you may have to play with it (I mean, reduce it to 0.5 or so) to get the haps you are looking for.
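For anyone who finds the positional placeholders hard to follow, here is a minimal Python sketch that assembles the same v0.5 command from named arguments. It assumes the flags exactly as quoted in this post and does not run anything; the function name `build_shorah_v05_command` and the example file names are hypothetical, not part of ShoRAH.

```python
import shlex


def build_shorah_v05_command(reads_fasta, reference_fasta, window, threshold=0.7):
    """Assemble the argument list for the v0.5 command quoted in this post.

    The flags and fixed values (-j 1000, -s 1, -a 0.1, -k) are copied from
    the post as written; only $1..$4 are turned into named arguments.
    """
    return [
        "shorah.py",
        "-f", reads_fasta,        # $1: FASTA file of 454 reads
        "-r", reference_fasta,    # $2: shortest reference covering the reads
        "-j", "1000",
        "-s", "1",
        "-w", str(window),        # $3: window size (start large, check dec.log)
        "-a", "0.1",
        "-k",
        "-t", str(threshold),     # $4: threshold as described in the post
    ]


if __name__ == "__main__":
    # Hypothetical file names, first pass with a deliberately large window.
    cmd = build_shorah_v05_command("reads.fasta", "ref.fasta", window=10000, threshold=0.5)
    print(shlex.join(cmd))
```

Printing the command first, rather than running it directly, makes it easy to record the exact parameters used for each run.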

        Phew...yes, it's a bit of a wild goose chase, but, you can work with 454 seqs pretty reliably using ShoRAH 0.5.

        For Illumina reads, I used ShoRAH 0.6, which ONLY works with *.sorted.bam files (NO, you can't use FASTA files here). There are a lot of changes, and I am still trying to make sense of my outputs so far...

        As far as your specific parameter question is concerned, I haven't used the alpha (-a) parameter, the -K parameter, or the -R parameter... I used the -k and -t parameters with default values, and I think -k is used just to keep the intermediate files.

        Hope this helps...if you still have any more specific questions, please let me know and I shall try to help. Also, Dr. Zagordi who is the principal developer of the ShoRAH s/w is very helpful and always replies to any of your emails directly. Good luck!

        Cheers, Nash


        • #5
          Hi Nash

          Thanks for your helpful reply! I'm not familiar with the global analysis or 454 data parameters. I will certainly refer back to your post whenever I need to work with 454 data! Thanks!

          I'm not sure about the version of ShoRAH, since I didn't install it on the server. I can say with certainty, though, that we had our fair share of sorted BAM file problems. We don't have a licence for Novoalign to start off with. If I understood the procedure correctly, we used bowtie in the end. It won't work with paired-end reads though, so we used both the "forward" and "reverse" datasets but ignored the paired-end option when we mapped. We still ended up with 90 000 mapped reads, and for an 11 kb genome that goes to show that paired-end reads would have been overkill in any case!
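Since the sorted BAM step caused trouble here, the following is a minimal Python sketch of how a mapper's SAM output is commonly turned into a coordinate-sorted, indexed BAM. It assumes samtools 1.x syntax (`samtools sort -o`, `samtools index`), which may differ from the samtools version used in this thread; the function names and file names are hypothetical, and nothing is executed unless `run_commands` is called.

```python
import subprocess


def sorted_bam_commands(sam_path, sorted_bam_path):
    """Return the samtools 1.x commands that sort and index an alignment."""
    return [
        # Coordinate-sort the alignments and write BAM (format inferred from -o).
        ["samtools", "sort", "-o", sorted_bam_path, sam_path],
        # Build the .bai index next to the sorted BAM.
        ["samtools", "index", sorted_bam_path],
    ]


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names; only prints the commands for inspection.
    for cmd in sorted_bam_commands("mapped.sam", "mapped.sorted.bam"):
        print(" ".join(cmd))
```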

          Do you perhaps know the default values for the -k and -t parameters you used? I'm not certain whether the values we set in my analysis are the defaults, and that's why I'm asking. I'm writing up a Masters thesis, so there's an inherent need to be meticulous about the values and parameters.

          Two and a half days and counting. I hope it's not due to the 100 iterations, or -t=1000 or -k=10. Then on the other side of the fence I realize that it's necessary to have enough iterations and hope that 100 might be sufficient.

          Oh, the joys of not being equipped with computational biology skills. Then again, one has to start somewhere!

          Thanks again for the help


          • #6
            ck,

            The "-k" parameter is just a binary flag (TRUE or FALSE; default is FALSE) that saves the intermediate files for you to look at. "-t" is the threshold I talked about, and I have used values as low as 0.5 at times for sequences that would not generate any haps at the high default value of 0.7. However, as I found later on, the really sensitive parameter is your reference sequence... the shorter the ref is and the better it overlaps your amplicon region, the better your haps!

            Good luck again...

            Nash


            • #7
              Naragram,

              I am indeed using version 0.6. Thanks for your input. I also contacted Dr Zagordi, and he informed me that -t should always be significantly less than -j (which was not the case in my first attempt). I restarted the analysis and it kept running and running forever, not writing anything to the .smp or .dbg file.
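The advice that -t should be significantly less than -j can be checked before launching a long run. A minimal Python sketch follows; no numeric cutoff was given in this thread, so the `max_fraction=0.1` default is purely an assumption for illustration, and the function name is hypothetical.

```python
def history_is_well_below_iterations(iterations, history, max_fraction=0.1):
    """Return True if -t (history) is at most max_fraction of -j (iterations).

    The 0.1 default is an illustrative assumption, not a ShoRAH rule.
    """
    if iterations <= 0:
        raise ValueError("iterations (-j) must be positive")
    if history < 0:
        raise ValueError("history (-t) must not be negative")
    return history <= max_fraction * iterations


if __name__ == "__main__":
    # Values from the first attempt described in this thread: -j 100, -t 1000.
    print(history_is_well_below_iterations(iterations=100, history=1000))
```

With the first-attempt values (-j = 100, -t = 1000) the check fails, because the history is ten times larger than the number of iterations.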

              We decided to kill it again and try a pilot analysis with very little data: 8k reads mapped to the 11 kb genome.

              I hope this one works so that we can maybe increase the amount of data. At least something is writing to the .smp and .dbg file now!
