  • Draft genome scaffolding with RNAseq paired-end reads

    Hello all,

    I used TopHat to map 100 bp paired-end Illumina transcriptome reads to a draft genome (133,062 contigs).
    Our main goal was SNP mining, but it has been suggested to me that the reads could also be used for scaffolding.

    I have no experience in genome assembly and scaffolding, but I assume that if I can find read pairs where the two reads map to different genomic contigs, those two contigs could then be connected.

    How can I search the BAM alignments for such read pairs?

    Alternatively, I could use an assembler that can combine different types of reads, such as MIRA, but I expect that would take longer, and the genomic reads are not available anyway.

    Thank you!

  • #2
    Look for reads where the RNAME field (column 3) and the RNEXT field (column 7) differ and RNEXT is not "=" or "*"; those are reads whose mate mapped to a different contig.
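As a rough sketch of that test, the shell function below reads SAM text (for example the output of `samtools view`) and prints the RNAME and RNEXT columns for every record whose mate is reported on a different contig. The function name `cross_contig_mates` is made up for illustration, and it assumes tab-separated SAM records. It also treats an RNEXT that literally repeats RNAME as the same contig, since SAM allows either that or "=".

```shell
# Hypothetical helper: print "RNAME<TAB>RNEXT" for SAM records whose mate is
# placed on a different reference sequence (contig). Reads SAM text on stdin.
cross_contig_mates() {
  awk -F '\t' -v OFS='\t' '
    /^@/ { next }                        # skip header lines, if any
    $7 != "=" && $7 != "*" && $7 != $3 { # "=" means same contig, "*" means unknown
      print $3, $7
    }
  '
}
```

Usage would be something like `samtools view accepted_hits.bam | cross_contig_mates > links.txt`.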



    • #3
      Thank you for the info, Brian;
      that's a good starting point and saved me a lot of reading and guesswork.

      I also thought of filtering for MAPQ = 50 (which should correspond to uniquely mapped reads)
      and for properly paired reads (FLAG = 83, 99, 147 or 163).
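For reference, the four FLAG values above can be decoded bit by bit. The sketch below (function names are hypothetical) shows that all four include bit 0x2, "properly paired". The SAM specification leaves the meaning of 0x2 to the aligner, and many aligners only set it when both mates lie on the same reference, so it may be worth checking whether a 0x2 filter keeps any cross-contig pairs in the TopHat output at all. The second function is a looser alternative that keeps any paired record where both the read and its mate are mapped, regardless of 0x2.

```shell
# Hypothetical helper: list the names of the low eight SAM FLAG bits set in $1.
describe_flag() {
  awk -v f="$1" 'BEGIN {
    n = split("paired proper_pair unmapped mate_unmapped reverse mate_reverse first_in_pair second_in_pair", name, " ")
    out = ""
    for (i = 1; i <= n; i++) {
      if (int(f / 2 ^ (i - 1)) % 2 == 1)   # is bit 2^(i-1) set?
        out = out (out == "" ? "" : ",") name[i]
    }
    print out
  }'
}

# Hypothetical looser filter: keep SAM records that are paired (0x1) and whose
# read and mate are both mapped (0x4 and 0x8 unset), ignoring the 0x2 bit.
keep_mapped_pairs() {
  awk -F '\t' '
    /^@/ { next }
    {
      f = $2 + 0
      if (f % 2 == 1 && int(f / 4) % 2 == 0 && int(f / 8) % 2 == 0) print
    }
  '
}
```

For example, `describe_flag 83` prints `paired,proper_pair,reverse,first_in_pair`, and a relaxed version of the extraction could look like `samtools view -q 50 accepted_hits.bam | keep_mapped_pairs | gawk '$7 !~ /[*=]/ {print $3, $7}' > output`.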

      The following command should then extract the alignments of interest:

      samtools view -q 50 accepted_hits.bam |gawk '($2 == 83 || $2 == 99 || $2 == 147 || $2 == 163) && $7 !~/[*=]/ {print $3, $7}' > output

      This should give a list of joined contig pairs.
      However, while it is possible to determine which contigs are joined, I assume the length of the N padding between them cannot be determined.
      Not only may the intervening region not be transcribed, but the insert size for paired reads whose mate is on a different contig always appears to be 0 (at least, that is what Tablet shows).

      Or do I have other options I'm unaware of?
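The list written to `output` by the command above will usually contain the same contig pair many times, once per supporting alignment and in both orders. One way to summarise it, sketched below with a hypothetical function name, is to count how often each unordered contig pair occurs and keep only pairs seen at least a minimum number of times; the default of 2 is an arbitrary illustration, not a recommended value. When both mates of a pair pass the filters, that pair contributes two lines (one per mate), so each pair is counted twice.

```shell
# Hypothetical helper: read "contigA contigB" lines (e.g. the output of the
# samtools/gawk command above) and print "count<TAB>contig1<TAB>contig2" for
# every unordered contig pair seen at least MIN times (first argument, default 2).
count_links() {
  awk -v min="${1:-2}" '
    NF >= 2 {
      # Order the two names so that A-B and B-A are counted as one link.
      if ($1 < $2) key = $1 "\t" $2; else key = $2 "\t" $1
      n[key]++
    }
    END { for (k in n) if (n[k] >= min) print n[k] "\t" k }
  ' | sort -k1,1nr -k2,2 -k3,3
}
```

Usage would be something like `count_links 4 < output`, which lists the best-supported links first.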



      • #4
        The insert size of reads mapped to different contigs is unknown. Scaffolding tools can use the distribution of insert sizes of pairs on the same contig, or user-supplied insert size numbers, to determine how many Ns to pad.
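As a rough illustration of that idea (not how any particular scaffolder does it), the sketch below takes the median TLEN of pairs whose mates sit on the same contig, then estimates the gap for one cross-contig pair. It assumes contig A comes first in forward orientation, the read is forward on A starting at 1-based position POS_A, and the mate is reverse on B with its alignment ending at 1-based position END_B. Then insert ≈ (LEN_A - POS_A + 1) + gap + END_B, so gap ≈ insert - (LEN_A - POS_A + 1) - END_B. The function names and argument order are hypothetical. With RNA-seq reads, pairs spanning an intron on the same contig will have inflated genomic TLEN values, which is one reason to prefer the median over the mean here, though even the median may be biased.

```shell
# Hypothetical helper: median of positive TLEN values (column 9) over SAM
# records whose mate is on the same contig (RNEXT "="). Counting only positive
# TLEN means each pair is counted once, via its leftmost mate.
median_insert() {
  awk -F '\t' '/^@/ { next } $7 == "=" && $9 > 0 { print $9 }' |
    sort -n |
    awk '{ v[NR] = $1 }
      END {
        if (NR == 0) exit 1                 # no usable pairs
        if (NR % 2 == 1) print v[(NR + 1) / 2]
        else print (v[NR / 2] + v[NR / 2 + 1]) / 2
      }'
}

# Hypothetical helper: estimated gap (number of Ns) between contig A and contig B.
# Arguments: INSERT LEN_A POS_A END_B (all in bases, positions 1-based).
# Assumes A precedes B, the read is forward on A and the mate is reverse on B.
# A negative result suggests the contigs overlap or the assumptions do not hold.
gap_estimate() {
  echo $(( $1 - ($2 - $3 + 1) - $4 ))
}
```

For example, with made-up numbers, `gap_estimate 300 1000 900 50` prints `149`: a 300 bp insert, 101 bases of it on contig A and 50 on contig B, leaves about 149 bases for the gap.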

        This might be easier if you just use a standalone scaffolding tool. There are various out there, but I don't have a recommendation. Here's a paper comparing some of them:



        • #5
          Thank you again for the input;
          I'll check the paper and see whether the filtered alignments above can be used.



          • #6
            Maybe this program will help:



            I have not used it myself though.



            • #7
              L-rna-scaffolder may also help.

              I have used it with varying success.



              • #8
                Thank you all for your answers.
                I'll try some of the suggested tools, probably starting with those that don't have too many dependencies.
