  • Illumina FASTQ format question...

    I am confused by an Illumina FASTQ file I have been asked to assemble. I had expected two files, each containing one 36 bp read per end of the pair. Instead I got a single file whose sequence and quality lines are 77 characters long.

    I asked the originator of the file what was going on, and they said that the file simply hadn't been split and that each line was in fact the two paired-end reads concatenated. They suggested that I simply split each sequence and write the halves out into two files.

    My problem is with the math: 36*2 is 72, not 77. That leaves 5 bases unaccounted for. I would like to see if someone can clear up my confusion by answering a couple of questions.

    Is this file a "standard" Illumina/Solexa sequence file?
    What is the deal with the concatenated reads?
    Why wouldn't I want the last 5 bases? Are they adaptors? Low-quality?

    For now I am going to do as suggested: split the 77 bases into two 36-base sequences and toss the last 5.

    Thanks for any help you can provide in clearing up my confusion.

    -steve
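
For reference, the split described in the post above can be sketched in Python. This is not the Perl script the sequencing group provided; `split_record` and `format_fastq` are hypothetical helpers, the `/1` and `/2` header suffixes are an assumed naming convention, and the read length of 36 comes from the post. It only makes sense if each record really is two concatenated 36 bp reads, so check that before discarding any bases.

```python
def split_record(header, seq, qual, read_len=36):
    """Split one concatenated FASTQ record into two reads of read_len bases.

    Anything past 2 * read_len is dropped. The /1 and /2 suffixes are an
    assumed convention for naming the two reads of a pair.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    if len(seq) < 2 * read_len:
        raise ValueError("record is shorter than two reads")
    read1 = (header + "/1", seq[:read_len], qual[:read_len])
    read2 = (header + "/2",
             seq[read_len:2 * read_len],
             qual[read_len:2 * read_len])
    return read1, read2


def format_fastq(record):
    """Render a (header, sequence, quality) tuple as a 4-line FASTQ record."""
    header, seq, qual = record
    return f"{header}\n{seq}\n+\n{qual}\n"


# Toy 77-character record: 36 A's, 36 C's, and 5 trailing G's.
seq = "A" * 36 + "C" * 36 + "G" * 5
qual = "I" * 77
r1, r2 = split_record("@read_1", seq, qual)
leftover = len(seq) - 2 * 36  # bases that the split throws away
print(format_fastq(r1), end="")
print(format_fastq(r2), end="")
```

With the toy record, `leftover` is 5, which is the discrepancy the post asks about.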

  • #2
    Originally posted by scozza
    Is this file a "standard" Illumina/Solexa sequence file?
    What is the deal with the concatenated reads?
    Why wouldn't I want the last 5 bases? Are they adaptors? Low-quality?
    This is the first time I have seen something like that. I would expect,
    as you say, two separate files, one per read. Are you sure this is not a
    fragment-76bp run?

    I suggest you map the data treating the reads as FR-77 before doing
    anything else.
    -drd

    • #3
      Originally posted by drio
      This is the first time I have seen something like that. I would expect,
      as you say, two separate files, one per read. Are you sure this is not a
      fragment-76bp run?

      I suggest you map the data treating the reads as FR-77 before doing
      anything else.
      No, I am not sure. The info I have at this point comes from an email exchange with the group that sequenced it, in which they said that the reads had not been separated for me and that I should run a Perl script they provided to do the splitting.

      Still waiting to hear back from them.

      -steve

      • #4
        Do you have the summary.(htm|xml) file? What's the % of alignment telling you? Are you seeing stats for READ1 and READ2?
        -drd

        • #5
          Originally posted by drio
          Do you have the summary.(htm|xml) file? What's the % of alignment telling you? Are you seeing stats for READ1 and READ2?
          I didn't, but fortunately I didn't need it. My contact at the group that sequenced this got back to me. It turns out these are 77bp single-end reads. Somewhere some miscommunication happened. This is a load off my mind because I thought either I was crazy or the assembler I was using was buggy.

          Thanks, drio, I appreciate your help!

          -steve

          • #6
            Build a histogram of the quality values per cycle (we use a local tool called fastqcheck to do this). Quality decays gradually over the cycles of a read and then resets to a high value at the first cycle of the next read, so the plot clearly shows where the first read ends and the second begins. This tells you the actual number of cycles rather than the claimed number, and it is nicely independent of any HTML or Illumina QC output, so you can run it on data passed to you from arbitrary sources. It's also a good QC check and can show sudden dips or loss of signal.

            Is it possible that this run was a tagged/indexed run too? The index tag normally resides between the 1st and 2nd read, and again it'll be clearly visible as a sudden jump in the quality values.

            James
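
A minimal sketch of that per-cycle check, in Python. This is not fastqcheck (the local tool mentioned in the post above); `mean_quality_per_cycle` and `find_quality_resets` are hypothetical helpers, and the code assumes plain 4-line FASTQ records. The default `offset` of 33 is an assumption: older Illumina pipelines wrote qualities with an ASCII offset of 64, so set it to match the data. The `min_jump` threshold is arbitrary and only serves to flag a sharp rise between adjacent cycles.

```python
import io


def mean_quality_per_cycle(handle, offset=33):
    """Return the mean quality score at each cycle (read position).

    Assumes 4-line FASTQ records, so every 4th line is a quality string.
    offset is the ASCII offset of the quality encoding (assumed 33 here).
    """
    totals = []
    counts = []
    for line_number, line in enumerate(handle):
        if line_number % 4 != 3:  # skip header, sequence, and '+' lines
            continue
        qual = line.rstrip()
        for cycle, char in enumerate(qual):
            if cycle == len(totals):  # first time this cycle is seen
                totals.append(0)
                counts.append(0)
            totals[cycle] += ord(char) - offset
            counts[cycle] += 1
    return [total / count for total, count in zip(totals, counts)]


def find_quality_resets(means, min_jump=10):
    """Return 0-based cycles where mean quality rises by >= min_jump."""
    return [i for i in range(1, len(means))
            if means[i] - means[i - 1] >= min_jump]


# Toy data: quality decays over 4 cycles, then resets, as two joined reads would.
toy = (
    "@r1\nACGTACGT\n+\nII?5II?5\n"
    "@r2\nACGTACGT\n+\nII?5II?5\n"
)
means = mean_quality_per_cycle(io.StringIO(toy))
resets = find_quality_resets(means)
print(means)
print(resets)
```

On real data, a flagged cycle in the middle of the read suggests a boundary between reads (or an index tag). A smooth decay across all 77 cycles with no flagged cycle is what a genuine single-end run would be expected to show.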
