  • #31
    Originally posted by ssully
    But 454 paired end reads are two 'end' reads connected by a linker sequence. Does the IonHammer corrector actually recognize those and split the reads before correcting? Or do the 454 PE reads have to first be split into left/right by linker removal, then run through --only-error-correction?

    ...and oriented rf (reverse-forward) if they are to be interpreted as Illumina mate pairs? (yes, they are low coverage)
    Yes, you need to split them beforehand. Make sure you specify the correct library type (mate pairs) and the correct orientation (whatever yours is; even ff is supported). See http://spades.bioinf.spbau.ru/releas...al.html#sec3.2 for more information.
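For readers wondering what "splitting" amounts to, here is a minimal sketch of cutting one read (bases plus qualities) at an exact linker match. The function name and the toy linker `GTAC` are made up for illustration; dedicated tools such as sff_extract or sffToCA also tolerate mismatches in the linker, which this sketch does not.

```python
def split_at_linker(seq, qual, linker):
    """Split one read at the first exact, case-insensitive linker match.

    Returns ((pre_seq, pre_qual), (post_seq, post_qual)), or None when the
    linker is not found. The linker itself is dropped from both halves.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    start = seq.upper().find(linker.upper())
    if start < 0:
        return None
    end = start + len(linker)
    return (seq[:start], qual[:start]), (seq[end:], qual[end:])


# Toy read: 6 bases, a hypothetical 4-base linker "GTAC", then 5 bases.
pre, post = split_at_linker("AAACCCGTACTTTGG", "IIIIIIHHHH55555", "GTAC")
print(pre)   # pre-linker half
print(post)  # post-linker half
```

The two halves would then go into separate files (one per mate), which is the form the SPAdes mate-pair input expects.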



    • #32
      I have removed the linkers and split the 454 mate pair reads with sff_extract; I have them now as (after deinterlacing) a pair of fastq files (454_1.fastq and 454_2.fastq) containing reads _1 and _2 only, respectively. In each case Read_1 represents the pre-linker and Read_2 represents the post-linker part of the original read, both in forward orientation:

      schematic of original read

      Code:
      ================================^^^^^^^^^^^^^^^=======================
      454_1--->                             linker    454_2--->
      But I'm a bit confused as to what mp parameters to feed to SPAdes for 454 mate pair reads,
      because when assembled, they should be ordered _2 --> _1 (again both in forward i.e., 5'--3' orientation), with the library insert size distance between them

      schematic of assembled reads

      Code:
      454_2                                                   454_1
      -------->                  (~3kb)                        -------->
      ==================================================================
      How to make sure SPAdes assembles these pairs in correct order and orientation?

      would a YAML readset section like this work?

      {
      orientation: "ff",
      type: "mate-pairs",
      right reads: [
      "/FULL_PATH_TO_DATASET/454_1.fastq"
      ],
      left reads: [
      "/FULL_PATH_TO_DATASET/454_2.fastq"
      ]
      },


      or should it be


      {
      orientation: "ff",
      type: "mate-pairs",
      right reads: [
      "/FULL_PATH_TO_DATASET/454_2.fastq"
      ],
      left reads: [
      "/FULL_PATH_TO_DATASET/454_1.fastq"
      ]
      },

      ?



      (I adapted these views from http://seqanswers.com/forums/showpos...85&postcount=2 )
      Last edited by ssully; 12-02-2014, 07:10 PM.
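The ordering drawn in the two schematics above can be reproduced with a toy model. This is only a sketch under the assumption that the fragment is circularised through the linker, so that a read crossing the junction sees the fragment's end, then the linker, then the fragment's start, all on one strand. The genome, the placeholder linker and the function name are invented for the example.

```python
import random


def make_454_mate_pair(genome, start, insert_len, linker, read_len):
    """Build one toy 454 mate-pair read: fragment END + linker + fragment START."""
    fragment = genome[start:start + insert_len]
    pre_linker = fragment[-read_len:]   # this half ends up in 454_1
    post_linker = fragment[:read_len]   # this half ends up in 454_2
    return pre_linker + linker + post_linker


random.seed(0)
genome = "".join(random.choice("ACGT") for _ in range(5000))
linker = "N" * 10  # placeholder, not a real linker sequence

read = make_454_mate_pair(genome, start=1000, insert_len=3000,
                          linker=linker, read_len=40)
read_1, _, read_2 = read.partition(linker)

# Where does each half map on the forward strand of the toy genome?
pos_1 = genome.find(read_1)
pos_2 = genome.find(read_2)
print("454_2 maps at", pos_2, "and 454_1 maps at", pos_1)
```

In this model 454_2 lands upstream of 454_1 with both halves in forward orientation, which is the "_2 --> _1" layout described in the post.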



      • #33
        Originally posted by ssully
        I have removed the linkers and split the 454 mate pair reads with sff_extract; I have them now as (after deinterlacing) a pair of fastq files (454_1.fastq and 454_2.fastq) containing reads _1 and _2 only, respectively. In each case Read_1 represents the pre-linker and Read_2 represents the post-linker part of the original read, both in forward orientation:

        schematic of original read

        Code:
        ================================^^^^^^^^^^^^^^^=======================
        454_1--->                             linker    454_2--->
        But I'm a bit confused as to what mp parameters to feed to SPAdes for 454 mate pair reads,
        because when assembled, they should be ordered _2 --> _1 (again both in forward i.e., 5'--3' orientation), with the library insert size distance between them

        schematic of assembled reads

        Code:
        454_2                                                   454_1
        -------->                  (~3kb)                        -------->
        ==================================================================
        How to make sure SPAdes assembles these pairs in correct order and orientation?

        would a YAML readset section like this work?

        {
        orientation: "ff",
        type: "mate-pairs",
        right reads: [
        "/FULL_PATH_TO_DATASET/454_1.fastq"
        ],
        left reads: [
        "/FULL_PATH_TO_DATASET/454_2.fastq"
        ]
        },


        or should it be


        {
        orientation: "ff",
        type: "mate-pairs",
        right reads: [
        "/FULL_PATH_TO_DATASET/454_2.fastq"
        ],
        left reads: [
        "/FULL_PATH_TO_DATASET/454_1.fastq"
        ]
        },

        ?



        (I adapted these views from http://seqanswers.com/forums/showpos...85&postcount=2 )
        The second variant looks correct to me. Basically, you need to specify the first and the second read of a fragment and the direction in which each was read.

        Anyway, you can simply feed the data to SPAdes and check whether it inferred the insert size distribution properly.
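If you end up trying both assignments, it may be easier to generate the readset section than to edit it by hand. A minimal sketch follows, using only the keys quoted in the post (`orientation`, `type`, `left reads`, `right reads`). It emits JSON, which is also valid YAML flow syntax; whether SPAdes's YAML reader accepts JSON-style quoted keys is an assumption I have not checked, and the helper names are hypothetical.

```python
import json


def mate_pair_entry(left_reads, right_reads, orientation="ff"):
    """Describe one mate-pair library using the keys from the post above."""
    return {
        "orientation": orientation,
        "type": "mate-pairs",
        "right reads": list(right_reads),
        "left reads": list(left_reads),
    }


def dataset_text(entries):
    """Serialise a list of library entries as JSON (a subset of YAML)."""
    return json.dumps(entries, indent=2) + "\n"


# The "second variant" from the quoted post: 454_1 as left, 454_2 as right.
entry = mate_pair_entry(
    left_reads=["/FULL_PATH_TO_DATASET/454_1.fastq"],
    right_reads=["/FULL_PATH_TO_DATASET/454_2.fastq"],
)
print(dataset_text([entry]))
```

Swapping the two arguments gives the first variant, so both runs can be set up from the same script.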



        • #34
          I don't know; the second variant seems to be saying to me, 'the reads from the right side of the library read (post-linker, 454_2.fastq) belong at the right end of the genome fragment' -- which would be incorrect.

          For me it really comes down to what 'right reads' and 'left reads' means in the YAML specification:

          e.g. does 'right reads' refer to a read's position in the 454 mate pair library read (i.e., right side/post-linker in the 454 read, but maps to the left end of the genomic fragment) or with respect to the genome (i.e., maps to the right end of the genomic fragment...but comes from the left side/pre-linker half of the 454 read)


          (it's also unusual to me that 'right read' is specified before 'left read' in the YAML, for both paired end and mate pair types, given that sequences are typically read by humans from left to right, 5' to 3'... is there a particular reason for that?)

          But anyway I can try inputting it both ways, in two runs, and see which one assembles the 454 mate pairs correctly.
          Last edited by ssully; 12-03-2014, 01:12 PM.



          • #35
            I worked out the correct orientation and order of 454 paired reads input for SPAdes, and have corrected the reads with the --iontorrent option (ionhammer). But now I have questions regarding ionhammer error correction -- does it pay any attention to fastq quality scores?


            here is an original paired-end sff read (converted to fastq -- note 'sanger style' quality scores, and lower case for low-quality bases). I have underlined (between the [U] and [/U] markers) the part that constitutes the 'post linker' read.

            sff to fastq
            Code:
            @GIDY76W02G4JWL
            tcagTTATTGATCAGTATTAGAATGAGGCCTATTAATAGCCAATTATCACATTTTGGATCTATTTTGTATCGATGATATCATTTATCGATAATCATCATAGTTATTTCGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA[U]TTATTGCTATAAATAAACGTACTTCTGGAGTAGAATTGAAGTGAGATAGAATTTCTGGTTTTAAGctgagactgccaaggcacacaggggatagg[/U]n
            +
            III;;;;BIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII;:8599>>@:9////92EBEDDDGIIIIIIFEC?:??IIHHHEIIIIIIIIIICCECC:??C==?EEEIEGHHIIIIIIIIGHFHGIIIIC?==CIIIIEEAAAEE>8333C444IIICIIIIGGGGGIIIGGGGIIIIIIIIIIIIA>999=499----./25:===;=@A>>::::EEIIII@@BAGGGII!
            here is the postlinker read, after sffToCA (a tool from Celera Assembler) has removed the linker from the original sff read and split it into two reads. Parameters were set to perform NO quality trimming, since I expected ionhammer to do that, so all the 'low quality' bases remain at the end of the read but are converted to upper case. Fastq scores remain the same:

            sffToCA
            Code:
            @GIDY76W02G4JWLb clr=0,95 clv=1,0 max=1,0 tnt=1,0 rnd=t
            TTATTGCTATAAATAAACGTACTTCTGGAGTAGAATTGAAGTGAGATAGAATTTCTGGTTTTAAGCTGAGACTGCCAAGGCACACAGGGGATAGG
            +
            IEEAAAEE>8333C444IIICIIIIGGGGGIIIGGGGIIIIIIIIIIIIA>999=499----./25:===;=@A>>::::EEIIII@@BAGGGII

            here was my spades command
            Code:
            spades.py --only-error-correction --iontorrent --dataset 454_4.yaml -t 8 --sc -k 21,33,55  --disable-gzip-output -o sff2ca_spades_corrected

            and here is the output of ionhammer for the above read
            Code:
            >GIDY76W02G4JWLb
            TTATTGCTATAAATAAACGTACTTCTGGAGTAGAATTGAAGTGAGATAGAATTTCT[U]G[/U]TTTTAAGCTGAGACTGCCAAGGCACACAGGGGATAGG
            The only difference is the removal of a single G base (at the underlined position) in the middle of the read (not even as part of a homopolymer)...all of the low-quality (originally lower case) bases remain.
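A quick way to list every change between a raw read and its corrected version, rather than comparing by eye, is Python's standard `difflib`. This is only a sketch; the two strings are a short excerpt around the edit in the read above, and `list_edits` is a name made up for the example.

```python
from difflib import SequenceMatcher


def list_edits(raw, corrected):
    """Return (operation, raw_segment, corrected_segment, raw_offset) per difference."""
    matcher = SequenceMatcher(None, raw, corrected, autojunk=False)
    return [
        (tag, raw[i1:i2], corrected[j1:j2], i1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Excerpt of the sffToCA read and of the ionhammer output around the edit.
raw = "GATAGAATTTCTGGTTTTAAGCTGAG"
corrected = "GATAGAATTTCTGTTTTAAGCTGAG"

for tag, before, after, offset in list_edits(raw, corrected):
    print(tag, repr(before), "->", repr(after), "near raw offset", offset)
```

Note that when a base is removed from a run of identical bases, the reported offset can point at either copy; the alignment cannot tell them apart.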

            So, I'm not clear on what ionhammer should be doing; it appears I need to quality-trim my 454 reads *before* running them through ionhammer...*OR* I need to preserve the lower-case base formatting in the input file?
            Last edited by ssully; 12-06-2014, 07:50 AM.



            • #36
              Originally posted by ssully
              So, I'm not clear on what ionhammer should be doing; it appears I need to quality-trim my 454 reads *before* running them through ionhammer...*OR* I need to preserve the lower-case base formatting in the input file?
              This is more or less expected. IonHammer is conservative: when it fails to correct something, it preserves the original read and defers the final decision to the assembler. In general, we suggest not trimming reads when the coverage is low.
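For anyone who wants to look at the actual quality values before deciding whether to trim, here is a minimal sketch that decodes a Sanger-style (Phred+33) quality string. The excerpt is taken from the quality line of the sffToCA read quoted earlier in the thread; the threshold of 15 is an arbitrary illustration, not a value used by IonHammer or SPAdes.

```python
def phred33_scores(quality):
    """Convert a Sanger-style FASTQ quality string to integer Phred scores."""
    return [ord(ch) - 33 for ch in quality]


def low_quality_positions(quality, threshold=15):
    """Return 0-based positions whose Phred score is below the threshold."""
    return [i for i, q in enumerate(phred33_scores(quality)) if q < threshold]


# Excerpt from the quality line of the post-linker read.
excerpt = "A>999=499----./25:==="
print(phred33_scores(excerpt))
print(low_quality_positions(excerpt))
```

This only reports where the low scores sit; it says nothing about whether the corrector itself consults them.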
