  • How to open one gap in reads with bowtie2?

    Hi everyone.

    I'm trying to get bowtie2 to open a single gap in a few reads.

    To test this, I use 1×10⁶ reads from Illumina sequencing.
    I have modified my reference genome by inserting 2 exogenous sequences that simulate fake gaps (1 kb and 2 kb).

    As a first step, I used TopHat2 to confirm that my reads can span both sides of the fake gap. To make this work with that tool, I added canonical splice-site motifs to both ends of the fake gap to simulate an exon/exon junction.
    My reference genome now looks like this:
    Code:
    <reference_genom_seq>'GT'<fake_gap>'AG'<reference_genom_seq>
    And it works great. So I know that a few reads in my data set span the fake gap (one gap per read).
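
The reference modification described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure actually used in the post: the function names, the random stand-in sequences, the insertion position and the record name are all hypothetical. It only shows the layout <reference_genom_seq>'GT'<fake_gap>'AG'<reference_genom_seq>, and that a read spanning the junction has to skip the fake gap plus the 4 motif bases.

```python
import random


def random_dna(length: int, seed: int = 0) -> str:
    """Return a reproducible random A/C/G/T string (hypothetical stand-in sequence)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


def insert_fake_gap(reference: str, position: int, insert: str,
                    donor: str = "GT", acceptor: str = "AG") -> str:
    """Insert donor + insert + acceptor into reference at a 0-based position.

    Produces the layout <reference_genom_seq>'GT'<fake_gap>'AG'<reference_genom_seq>.
    """
    if not 0 <= position <= len(reference):
        raise ValueError(f"position {position} is outside the reference (length {len(reference)})")
    return reference[:position] + donor + insert + acceptor + reference[position:]


def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record with fixed-width lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    ref = random_dna(5000, seed=1)       # hypothetical stand-in for a real chromosome
    fake_gap = random_dna(1000, seed=2)  # hypothetical 1 kb exogenous sequence
    modified = insert_fake_gap(ref, 2500, fake_gap)
    # A read spanning the junction must skip len(fake_gap) + 4 bases (GT and AG included).
    print(len(modified) - len(ref))  # 1004
    print(to_fasta("chr_test_fake_gap", modified).splitlines()[0])  # >chr_test_fake_gap
```

Writing the result with to_fasta and indexing it (bowtie2-build, or the BBMap ref= step) would then give a test reference like the one described.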

    Next, I tried tuning the bowtie2 parameters (i.e. the scoring options).

    Here are the command lines I used for this experiment:

    Code:
    bowtie2 -p 4 --no-unal --ignore-quals --mp 60000 --rdg 1,1 --score-min L,-100000,-100000 -x my_ref_genom my_fastq -S my_aln.sam
    --ignore-quals (because I don't care about the Phred quality values here)
    --mp 60000 (I don't want mismatches: TopHat2 aligned my reads without any mismatch)
    --rdg 1,1 (I don't know why, but I can't set either of these two values below 1. Ideally, I would set <int2>=0 in --rdg <int1>,<int2> and find a good value for <int1> so that each read gets just one gap)
    --score-min L,-1000000,-1000000 (an extreme threshold, to give bowtie2 a better chance of opening a gap)

    Result: no gap opened, clean alignments (no mismatches).

    Code:
    bowtie2 -p 4 --no-unal --rdg 1,1 -x my_ref_genom my_fastq -S my_aln.sam
    Result: no gap opened, alignments with mismatches.

    Code:
    bowtie2 -p 4 --no-unal --ignore-quals --gbar 25 --rdg 1,1 -x my_ref_genom my_fastq -S my_aln.sam
    --gbar 25 (so that a substantial number of read bases overlap each side of the fake gap, and to get around an issue with gap penalties during the initial seeding step (seed length = 22))

    Result: no gap opened, alignments with mismatches.
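
To check a SAM file for reads that actually span the fake gap, one can look for a long D (deletion) or N (skipped region) operation in the CIGAR string (field 6 of each SAM record). This is a minimal sketch: the function names and the minimum-gap threshold are hypothetical, and it counts alignment records, not unique reads, so secondary records for the same read are counted separately.

```python
import re
import sys
from typing import Iterable

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def longest_ref_skip(cigar: str) -> int:
    """Length of the longest D or N operation in a CIGAR string; 0 if there is none.

    A gapped bowtie2 alignment across the fake gap would show a long D;
    a spliced alignment (as TopHat2 reports junctions) shows a long N.
    """
    if cigar == "*":  # unmapped record or CIGAR unavailable
        return 0
    return max((int(n) for n, op in CIGAR_OP.findall(cigar) if op in "DN"), default=0)


def count_gap_spanning(sam_lines: Iterable[str], min_gap: int) -> int:
    """Count SAM alignment records whose longest D/N run is at least min_gap bases."""
    count = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        if longest_ref_skip(fields[5]) >= min_gap:
            count += 1
    return count


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_gaps.py my_aln.sam
    with open(sys.argv[1]) as handle:
        print(count_gap_spanning(handle, min_gap=1000))
```

With the 1 kb fake gap plus its GT/AG motifs, a spanning read should carry a skip of 1004 bases, so a threshold of 1000 separates it from ordinary short indels.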

    So, I have 3 questions:

    1. According to the manual page, the gap penalty is calculated as <int1> + N * <int2>. With my 1 kb gap (N = 1000) and --rdg 1,1, the gap penalty is about 1000 (1 + 1000 * 1 = 1001).
    With the --score-min parameters set to -1000000 (my read length is 100, i.e. the x in f(x) = 0 + -0.6 * x), why is no gap opened in reads that span the fake gap?
    Which parameter can I set to open a gap in a few reads from my data set?

    2. If TopHat can handle my fake gap, can I configure TopHat to handle gaps other than exon/exon junctions?

    3. Do you know of another tool that can handle my problem?
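
The arithmetic behind question 1 can be written out. This sketch follows the formulas as the bowtie2 manual states them (a gap of length N costs <int1> + N * <int2>; --score-min L,a,b gives f(x) = a + b * x, with x the read length). The function names are hypothetical; it assumes end-to-end mode, where a perfect alignment scores 0 and penalties are subtracted, and it models only the score threshold, not how bowtie2 searches for candidate alignments. It uses the -100000 values from the first command line.

```python
from typing import Tuple


def gap_penalty(open_penalty: int, extend_penalty: int, length: int) -> int:
    """Penalty for a gap of `length` bases: <int1> + N * <int2>."""
    return open_penalty + length * extend_penalty


def linear_min_score(constant: float, coefficient: float, read_length: int) -> float:
    """Minimum valid score for --score-min L,<constant>,<coefficient>."""
    return constant + coefficient * read_length


def passes_score_min(read_length: int, gap_length: int,
                     rdg: Tuple[int, int], score_min: Tuple[float, float]) -> bool:
    """True if an otherwise perfect end-to-end alignment with one read gap meets the threshold.

    Assumes end-to-end scoring: the perfect alignment scores 0 and the gap penalty is subtracted.
    """
    score = -gap_penalty(rdg[0], rdg[1], gap_length)
    return score >= linear_min_score(score_min[0], score_min[1], read_length)


if __name__ == "__main__":
    # Settings from the first command line: --rdg 1,1 --score-min L,-100000,-100000
    print(gap_penalty(1, 1, 1000))                                  # 1001
    print(linear_min_score(-100000, -100000, 100))                  # -10100000
    # Gap of 1004 bases = 1 kb fake gap + GT + AG
    print(passes_score_min(100, 1004, (1, 1), (-100000, -100000)))  # True
    # bowtie2 end-to-end defaults: --rdg 5,3 --score-min L,-0.6,-0.6
    print(passes_score_min(100, 1004, (5, 3), (-0.6, -0.6)))        # False
```

On paper, a single ~1 kb read gap clears the relaxed threshold by a wide margin, while it would fail under the defaults, so the score threshold alone does not seem to explain why no gap is opened.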

    Thanks a lot for reading (sorry for the long post), and thanks in advance for any reply.

    Rémi

  • #2
    Rémi,

    I suggest you try BBMap, which is very good at dealing with gapped alignments whether or not they are related to exons.

    (index)
    bbmap.sh ref=reference.fasta -Xmx29g

    (map)
    bbmap.sh in=reads.fq out=mapped.sam -Xmx29g

    This will look for gaps up to 16 kb. To look for longer gaps, add a flag such as "maxindel=100000", which looks for gaps of up to 100000 bp. The flag "-Xmx29g" specifies how much memory BBMap is allowed to use; it should be set to around 85% of the system's physical RAM.
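
The memory rule of thumb and the optional maxindel flag can be combined into a small helper that builds the mapping command. This is a hypothetical Python sketch: the function names and the example RAM size are assumptions; only the in=, out=, -Xmx and maxindel= flags come from the reply above, and the 85% figure is its rule of thumb.

```python
import shlex
from typing import List, Optional


def xmx_flag(physical_ram_gb: float, fraction: float = 0.85) -> str:
    """Java heap flag sized to a fraction of physical RAM, rounded down to whole GB."""
    heap_gb = int(physical_ram_gb * fraction)
    if heap_gb < 1:
        raise ValueError("not enough RAM for a 1 GB heap")
    return f"-Xmx{heap_gb}g"


def bbmap_map_command(reads: str, out: str, physical_ram_gb: float,
                      maxindel: Optional[int] = None) -> List[str]:
    """Build the mapping command from the reply, optionally raising maxindel."""
    cmd = ["bbmap.sh", f"in={reads}", f"out={out}", xmx_flag(physical_ram_gb)]
    if maxindel is not None:
        cmd.append(f"maxindel={maxindel}")
    return cmd


if __name__ == "__main__":
    # Hypothetical machine with 34 GB of RAM: 0.85 * 34 = 28.9, rounded down to -Xmx28g
    print(shlex.join(bbmap_map_command("reads.fq", "mapped.sam", 34, maxindel=100000)))
```

Rounding down keeps the heap at or below the 85% target; returning a list rather than a single string also makes it easy to pass to subprocess.run without shell quoting issues.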



    • #3
      I think what you may need are tools for finding structural variants (SVs).

      Have a look at the SeqWiki for a list of software.
