  • CAP3 for forward and reverse reads

    Hi Everyone,

    I'm trying to use CAP3. I have two small files of paired reads: one with forward reads and the other with reverse reads. I've been reading the manual to work out how to specify forward and reverse reads, but I'm a little confused by it. Has anyone else used CAP3 who could help me out? I just don't understand what they mean by "dots" or how to set that up.

    The information from the manual that pertains to what I'm doing:


    Input to CAP3

    CAP3 takes as input a file of sequence reads in FASTA format.
    If the names of reads contain a dot ('.'), CAP3 requires that
    the names of reads sequenced from the same subclone contain
    the same substring up to the first dot.
    CAP3 takes two optional files: a file of quality values
    in FASTA format and a file of forward-reverse constraints.

    The file of quality values must be named "xyz.qual", and
    the file of forward-reverse constraints must be named "xyz.con",
    where "xyz" is the name of the sequence file.
    CAP3 uses the same format of a quality file as Phrap.

    Each line of the constraint file specifies one forward-reverse constraint
    of the form:

    ReadA ReadB MinDistance MaxDistance

    where ReadA and ReadB are names of two reads, and
    MinDistance and MaxDistance are distances (integers) in base pairs.
    The constraint is satisfied if ReadA in forward orientation occurs
    in a contig before ReadB in reverse orientation, or
    ReadB in forward orientation occurs in a contig before ReadA
    in reverse orientation, and their distance is between MinDistance
    and MaxDistance.
    CAP3 works better if a lot more constraints are used.

    We have a separate program named "formcon" to generate
    a constraint file from the sequence file.
    The program takes an input file of fragments in FASTA format
    and two integers (minimum distance and maximum distance in bp).
    The minimum and maximum distances specify a lower and
    an upper limit on the subclone length, respectively.
    It produces a file of forward-reverse constraints for CAP3.
    It is assumed that a pair of forward and reverse reads must
    contain a dot in their names and a pair of forward and reverse reads
    have a common name up to the first dot.
    Because CAP3 uses reads whose ends are clipped, instead of raw reads,
    to measure their distance, the distance seen by CAP3 could be different
    from the insert size by 1000 to 1500 bp. For example,
    if the insert size is 2000 to 3000 bp, we recommend that you use
    500 for the minimum distance and 4000 for the maximum distance.
    The results are in the file with name ending in ".con".

    Any help would be appreciated, thanks!

  • #2
    Hmm, something like this:

    cat mySeq.fasta
    >MyReadA.f
    ACGT
    >MyReadA.r
    TCGA
    >MyReadB.f
    ACGT
    >MyReadB.r
    TCGA
    plus the quality file.

    cat mySeq.fasta.con
    MyReadA.f MyReadA.r 1000 2000
    MyReadB.f MyReadB.r 1000 2000
    You can use whatever is appropriate for forward/reverse; I just used 'f' and 'r' as an example.
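
To make the "dot" rule from the manual excerpt concrete, here is a minimal Python sketch. It only illustrates the naming convention (reads from the same subclone share the text before the first dot); it is not part of CAP3, and the function names `subclone_prefix` and `group_by_subclone` are made up for this example.

```python
from collections import defaultdict


def subclone_prefix(read_name):
    """Return the part of a read name that comes before the first dot."""
    return read_name.split(".", 1)[0]


def group_by_subclone(read_names):
    """Group read names that share the same prefix up to the first dot."""
    groups = defaultdict(list)
    for name in read_names:
        groups[subclone_prefix(name)].append(name)
    return dict(groups)


# Read names taken from the example FASTA in this post.
names = ["MyReadA.f", "MyReadA.r", "MyReadB.f", "MyReadB.r"]
for prefix, members in group_by_subclone(names).items():
    print(prefix, members)
```

Each group with one forward and one reverse read corresponds to one line of the constraint file.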


    • #3
      Thanks! Now I'm having issues creating that .con file.

      If I have those reads in FASTA format, is there a script that would create the .con file for me? And would I need to change the headers? My real headers look like this; I've simplified them in the example below.

      >D3NH4HQ1:107:C0LN7ACXX:1:1101:10356:54822 1:N:0:ATCACG

      My FASTA file is:
      >xyz /1
      ATGC
      >xyz /2
      GCCC
      >abc /1
      TAAT
      >abc /2
      GGGC

      So, with a file of hundreds of reads, how can I extract that information?

      I would want:

      xyz /1 xyz /2 200 500
      abc /1 abc /2 200 500


      • #4
        You have Illumina reads. Are you sure you want to use CAP3?

        Maybe you should have a look at MIRA (http://sourceforge.net/apps/mediawiki/mira-assembler/).

        If you still want to use CAP3, you will have to write a tiny Perl script (or sh or awk) to do this job for you.
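
As a rough idea of what such a script could look like, here is a Python sketch (Python instead of Perl, purely for illustration). It assumes the read name is the header text up to the first whitespace, which would make "xyz /1" and "xyz /2" collide, so it renames the mates to "xyz.f" and "xyz.r" to follow the dot convention from the manual. The helper names (`base_id`, `rename_fasta`, `build_inputs`) and the 'f'/'r' suffixes are arbitrary choices, not anything CAP3 defines, and whether CAP3 accepts the colons in full Illumina read names has not been checked here.

```python
import re


def base_id(header):
    """Reduce a FASTA header to a pair ID: drop '>', anything after the
    first whitespace, and a trailing '/1' or '/2' mate marker."""
    name = header.lstrip(">").split()[0]
    return re.sub(r"/[12]$", "", name)


def rename_fasta(lines, suffix):
    """Rewrite every header as '>ID.suffix'; return (new_lines, ids)."""
    out, ids = [], []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            rid = base_id(line)
            ids.append(rid)
            out.append(f">{rid}.{suffix}")
        elif line:
            out.append(line)
    return out, ids


def build_inputs(fwd_lines, rev_lines, min_dist, max_dist):
    """Return (combined FASTA lines, constraint lines) for paired reads."""
    fwd_out, fwd_ids = rename_fasta(fwd_lines, "f")
    rev_out, rev_ids = rename_fasta(rev_lines, "r")
    rev_set = set(rev_ids)
    # Only emit a constraint when both mates of a pair are present.
    con = [f"{rid}.f {rid}.r {min_dist} {max_dist}"
           for rid in fwd_ids if rid in rev_set]
    return fwd_out + rev_out, con


# Simplified reads from post #3, split into forward and reverse files.
fwd = [">xyz /1", "ATGC", ">abc /1", "TAAT"]
rev = [">xyz /2", "GCCC", ">abc /2", "GGGC"]
fasta, con = build_inputs(fwd, rev, 200, 500)
print("\n".join(fasta))
print("\n".join(con))
```

For real data, read the two FASTA files with `open(...)` and write the first list to the sequence file and the second to a file with the same name plus ".con", as the manual excerpt describes. The 200 and 500 are just the distances from post #3.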


        • #5
          Actually, the reads represent genes. The file of /1 reads and the file of /2 reads together represent one gene, not a whole transcriptome.
