  • lobSTR: a new tool for profiling STRs from short reads

    Hi everyone,

    We have developed a new tool, lobSTR, for profiling short tandem repeats (STRs) from whole-genome sequencing datasets. The tool takes raw reads in FASTA/FASTQ or BAM format and reports genotypes at hundreds of thousands of STR markers.

    You can download the tool and find complete usage instructions at our website (http://jura.wi.mit.edu/erlich/lobSTR/index.html).

    We would love for people to try it out. Feel free to contact me ([email protected]) with any feedback or questions.
    Thanks!
    Melissa

  • #2
    Hi Melissa,

    lobSTR worked well when I ran it on a dataset with 50bp reads. However, I wasn't able to run it on a set of shorter reads (36bp): it returned nothing, even when I set --min-read-length to 34. Could you please advise on the best parameter values for this dataset?

    Also, the help (lobSTR --help) says that "--min-read-length should be at least two times fft-window-size", but the default values for these two are 45 and 24, respectively. Since 45 is less than 2 x 24 = 48, the defaults do not satisfy the recommendation. Could you please comment on this?

    Many thanks,
    Minh



    • #3
      Hi Minh,

      Unfortunately, lobSTR is not able to process 36bp reads using the default settings. With 24bp windows, too few windows fit into a single read to distinguish the STR-containing region from the flanking regions. In addition, most STRs in the reference are around 40bp long, so most of them cannot be completely spanned by 36bp reads. To get reasonable results, reads should really be at least 45bp long. The longer the better.
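      To make the "not enough windows fit" point concrete, here is a minimal sketch of a generic sliding-window count. This assumes windows of a fixed size advance by a fixed step along the read and only full windows count; it is an illustration of the arithmetic, not lobSTR's actual implementation, and the function name windows_per_read is hypothetical.

```python
def windows_per_read(read_length: int, window_size: int, step: int) -> int:
    """Count full windows of `window_size` bp that fit in a read when the
    window advances by `step` bp each time (generic sliding-window count,
    assumed for illustration; not lobSTR's own code)."""
    if read_length < window_size:
        return 0  # the read is shorter than a single window
    return (read_length - window_size) // step + 1


# 20bp windows with a 10bp step (the settings suggested below for 36bp reads)
print(windows_per_read(36, 20, 10))  # 2 windows in a 36bp read
print(windows_per_read(50, 20, 10))  # 4 windows in a 50bp read
```

      Only a couple of windows fit in a 36bp read, which leaves little room to separate the repeat from its flanks; longer reads give more windows to work with.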

      If you still want to try aligning 36bp reads, the following settings will align some of them:

      lobSTR -f test.36bp.fq -q --index-prefix ../index_trf_hg19/lobSTR_ --out test --fft-window-size 20 --fft-window-step 10 --extend-flank 2 --entropy-threshold 0.2 --min-read-length 36 -m 0

      but again, this is not recommended and will not give very accurate alignments.
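      If you run these settings from a script, a minimal sketch is to build the argument list in Python. Every flag and value below is copied from the command above; the helper name build_lobstr_36bp_cmd is hypothetical, and the actual call to lobSTR is left commented out since it assumes lobSTR is installed and on your PATH.

```python
import shlex
from typing import List


def build_lobstr_36bp_cmd(fastq: str, index_prefix: str, out: str) -> List[str]:
    """Assemble the 36bp-read lobSTR command from this thread
    (hypothetical helper; flags are exactly those given in the post)."""
    return [
        "lobSTR",
        "-f", fastq,
        "-q",
        "--index-prefix", index_prefix,
        "--out", out,
        "--fft-window-size", "20",
        "--fft-window-step", "10",
        "--extend-flank", "2",
        "--entropy-threshold", "0.2",
        "--min-read-length", "36",
        "-m", "0",
    ]


cmd = build_lobstr_36bp_cmd("test.36bp.fq", "../index_trf_hg19/lobSTR_", "test")
print(shlex.join(cmd))  # prints the same command line as above
# import subprocess; subprocess.run(cmd, check=True)  # uncomment to run lobSTR
```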

      Sorry about the discrepancy in the help message. The default minimum read length was changed from 48bp to 45bp, and the help message should have been updated to reflect this. It is not technically required that the read length be exactly twice the window size, but it should be close to that; the message was meant as a guideline for choosing window sizes. Thanks for pointing this out, I will change the help message.
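      The help-message guideline can be checked with a couple of lines of arithmetic. This is a minimal sketch assuming only the rule quoted from lobSTR --help (min read length at least 2x fft-window-size); since the rule is a guideline rather than a hard requirement, the sketch also reports the ratio so you can see how close a setting comes. The function name check_window_guideline is hypothetical.

```python
from typing import Tuple


def check_window_guideline(min_read_length: int, fft_window_size: int) -> Tuple[float, bool]:
    """Return (read length / window size, whether the 2x guideline holds).
    The 2x rule is the guideline from the help message, not a hard limit."""
    ratio = min_read_length / fft_window_size
    return ratio, min_read_length >= 2 * fft_window_size


print(check_window_guideline(45, 24))  # (1.875, False): current defaults, close to 2x
print(check_window_guideline(48, 24))  # (2.0, True): the previous 48bp default
print(check_window_guideline(36, 20))  # (1.8, False): the 36bp settings above
```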

      Hope that helps! Let me know if anything is unclear.
      ~Melissa



      • #4
        Hi, mgymrek,

        How does lobSTR perform on exome data? Compared to whole-genome data, is changing the reference and index all I need to do?

        Thanks!



        • #5
          Hi guo,

          We have found that lobSTR works great on exome data. You do not need to change the reference or index you are using at all; just run it as usual.

          ~M



          • #6
            I would be cautious about genotyping repeats from exome data, as there could be significant issues with allelic bias. Also, I am not sure you get many repeats from exome capture: my understanding was that capture arrays try to avoid repeats because of their secondary structure.

            Anyone know more about this?



            • #7
              As for exome capture trying to avoid repeats, I don't know much about this, but would love to hear input from someone who does.

              We have seen a slight bias toward picking up shorter alleles, both because there is likely some bias in amplifying those alleles and because longer alleles are generally harder for lobSTR to detect given short read lengths. We can routinely type several hundred STRs from exonic regions. Most of them are trinucleotide repeats, but you will also pick up a lot of STRs near exonic regions. There are surely STRs in the exome that we miss, but the ones we have been able to type seem to give robust, high-coverage calls.



              • #8
                For those interested, today we released RepeatSeq for a similar purpose; thread here:

