  • Program for aligning particular set of reads to an entire NGS dataset

    Hi,

    I have about 100 cDNA sequences (let's call them "ref.") for which I would like to know how many reads from the original Illumina dataset (10 million reads; let's call them "reads") align to them fully (i.e. the entire ref. sequence is contained in the read; see "read1" below) or partially (see reads 2, 3 and 4 below), without gaps.

    Example:
    Code:
    ref.                   AGTTCGGCCGCTCACCGCACCGTCACGCCATCCAGGCATC
    read1  ATGCGCTAGCTAGCATAGTTCGGCCGCTCACCGCACCGTCACGCCATCCAGGCATCTTGGACCGCATAGCATC
    read2              ATTAAGTTCGGCCGCTCACCGCACC
    read3                                CCGCACCGTCACGCCATCCAGGCATCATGCGCGATCTCAGC
    read4                        GCCGCTCACCGCACC
    Is there any "mapping" program to do that?

    Can I use Bowtie2 (although it seems a bit complicated to use, judging by its extensive list of options)? It seems I would have to input a single file containing all the sequences (ref. + reads), which would presumably align every sequence against every other and take ages?
    Also, should I use the raw paired-end reads or the merged + unmerged reads?

    Thanks for your help !
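
To make the "fully" versus "partially, without gaps" distinction concrete, here is a minimal brute-force sketch in Python using the sequences from the example above. The function name `classify` and the `min_overlap` threshold of 10 bases are assumptions made for illustration, not part of the question. It ignores reverse complements and mismatches, and it is far too slow for 10 million reads; it only pins down what a mapper would be asked to find.

```python
# Illustrative only: exact, ungapped overlap between one reference and one read.
# `classify` and `min_overlap` are hypothetical names chosen for this sketch.

REF = "AGTTCGGCCGCTCACCGCACCGTCACGCCATCCAGGCATC"

READS = {
    "read1": "ATGCGCTAGCTAGCATAGTTCGGCCGCTCACCGCACCGTCACGCCATCCAGGCATCTTGGACCGCATAGCATC",
    "read2": "ATTAAGTTCGGCCGCTCACCGCACC",
    "read3": "CCGCACCGTCACGCCATCCAGGCATCATGCGCGATCTCAGC",
    "read4": "GCCGCTCACCGCACC",
}


def classify(ref, read, min_overlap=10):
    """Return 'full', 'partial' or None for an exact, gap-free overlap."""
    best = 0
    # `off` is the position of the read's first base relative to the ref's first base.
    for off in range(-(len(read) - 1), len(ref)):
        start = max(0, off)                     # overlap start, in ref coordinates
        end = min(len(ref), off + len(read))    # overlap end, in ref coordinates
        n = end - start
        if n < min_overlap or n <= best:
            continue
        if ref[start:end] == read[start - off:end - off]:
            best = n
    if best == len(ref):
        return "full"       # the whole reference lies inside the read
    if best >= min_overlap:
        return "partial"    # only part of the reference is covered
    return None


for name, seq in READS.items():
    print(name, classify(REF, seq))
```

With the example sequences, read1 is reported as a full match and reads 2, 3 and 4 as partial matches.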

  • #2
    bowtie2 is good. Yes, there are a lot of arguments, but that is because different people want to do different things. For example, in your case you will want to use the non-default '--local' mapping.

    You will not input just one file. Instead, you will create an index for your reference(s) and then input the R1 and R2 read files separately.
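
The two steps described in this reply (build an index from the references, then align R1 and R2 in local mode) can be sketched as the Python helper below, which only assembles the command lines. The file names, the index base name and the helper `bowtie2_commands` are placeholders assumed for illustration; the flags used (`--local`, `-x`, `-1`, `-2`, `-S`, `-p`) are standard bowtie2 options. Note that local mode soft-clips read ends but can still report gapped alignments, so a strict "no gaps" requirement would need a separate filtering step.

```python
import shlex


def bowtie2_commands(ref_fasta, index_base, r1, r2, out_sam, threads=4):
    """Build (not run) the index and alignment commands as argument lists."""
    # Step 1: index the ~100 reference sequences once.
    build = ["bowtie2-build", ref_fasta, index_base]
    # Step 2: align the paired-end reads against that index in local mode.
    align = [
        "bowtie2", "--local",
        "-p", str(threads),
        "-x", index_base,
        "-1", r1,
        "-2", r2,
        "-S", out_sam,
    ]
    return build, align


# Hypothetical file names, for illustration only.
build_cmd, align_cmd = bowtie2_commands(
    "refs.fasta", "refs_index", "reads_R1.fastq", "reads_R2.fastq", "aligned.sam"
)
print(shlex.join(build_cmd))
print(shlex.join(align_cmd))
```

The printed lines can be pasted into a shell, or each list can be passed to `subprocess.run(cmd, check=True)` on a machine where bowtie2 is installed.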

    • #3
      Got it. Thanks westerman !

      • #4
        I did mapping using TopHat, where the reference lengths ranged from a minimum of 150 bp to a maximum of 50,000 bp (I worked on approximately 40,000 reference sequences separately). I mapped the paired-end reads together rather than separately. The two approaches could give slightly or substantially different results (this should matter most for short references, say shorter than 300 bp, though that is just a hypothetical statement). If R1 and R2 are mapped separately, the next step would be to select the reads that mapped in both runs, right? In that case, how do we account for the insert-size parameter? And how can I perform local mapping in TopHat? Is there any way to do so?

        • #5
          Originally posted by archana2287
          I did mapping using TopHat, where the reference lengths ranged from a minimum of 150 bp to a maximum of 50,000 bp (I worked on approximately 40,000 reference sequences separately). I mapped the paired-end reads together rather than separately. The two approaches could give slightly or substantially different results (this should matter most for short references, say shorter than 300 bp, though that is just a hypothetical statement). If R1 and R2 are mapped separately, the next step would be to select the reads that mapped in both runs, right? In that case, how do we account for the insert-size parameter? And how can I perform local mapping in TopHat? Is there any way to do so?
          This appears to have limited relevance to this thread, so I suggest you create a new thread to ask the question. And please take your time to phrase it clearly.
