  • Illumina Solexa 75bp format problem

    I don't know why every read ends with 22 Ns. Can anyone tell me why?

    @HWI-EAS241:5:1:10:83#0/1
    GCCCCGTCCATCACTTCTGCGATGCCGCGAATGCCCAATGGCAAGCCGNCGGGNNNNNNNNNNNNNNNNNNNNNN
    +HWI-EAS241:5:1:10:83#0/1
    [a``_`X_O\Q\YQ[Z\O[a\WXNXZZBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
    @HWI-EAS241:5:1:10:1808#0/1
    TGCTGCGGCCCAATGGAGCCACGTTGCCCTGGTGCTTGCCCTTGGGATNGTGGNNNNNNNNNNNNNNNNNNNNNN
    +HWI-EAS241:5:1:10:1808#0/1
    [aaaaaaa\UX_aaa\U__`a`a`a_^Ua``P\a_aa_\TWa`BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
    @HWI-EAS241:5:1:10:1866#0/1
    TGGCCGCCTGCGTCACGCCGATTGTCAGCGCCGTGGGCCATGAAACCGNCGTGNNNNNNNNNNNNNNNNNNNNNN
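One quick way to check whether literally every read ends in the same number of Ns is to tally the length of the trailing N run across the whole file. This is a minimal sketch in plain Python. It assumes unwrapped four-line FASTQ records (header, sequence, `+` line, quality, as in the reads above), and `trailing_n_count` and `tally_trailing_ns` are hypothetical helper names.

```python
from collections import Counter


def trailing_n_count(seq):
    """Return the length of the run of 'N' characters at the end of seq."""
    return len(seq) - len(seq.rstrip("N"))


def tally_trailing_ns(lines):
    """Tally trailing-N run lengths over the reads in a FASTQ file.

    Assumes every record is exactly four lines:
    @header, sequence, +header, quality (no line wrapping).
    """
    counts = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the sequence line of each four-line record
            counts[trailing_n_count(line.strip())] += 1
    return counts
```

For example, `tally_trailing_ns(open("original.fastq"))` should return a Counter whose only key is 22 if every read really does end in 22 Ns (the filename is a placeholder).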

  • #2
    There is something else funny in those records: the spaces in both the sequences and the quality strings. Are those spaces real, or some cut & paste corruption, or a quirk of the forum editor?
    Last edited by maubp; 08-28-2009, 05:11 AM. Reason: fixed typo

    • #3
      When you say "every read" do you literally mean EVERY read? Is it the entire flow cell, one lane, part of lane? Did anything happen to the instrument between cycles 53-54, such as reagents being refilled or software restarted?
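To see at which cycle the Ns start, and whether that is the same for every read in the lane, one option is to compute the fraction of reads carrying an N at each cycle. This is a minimal sketch; it assumes all reads have the same length (75 here) and that the sequence lines have already been pulled out of the FASTQ file. `n_fraction_per_cycle` and `first_all_n_cycle` are hypothetical names.

```python
def n_fraction_per_cycle(seqs):
    """Return, for each cycle, the fraction of reads with an N at that cycle.

    Index 0 of the result is cycle 1. Assumes every sequence has the same length.
    """
    seqs = list(seqs)
    if not seqs:
        return []
    counts = [0] * len(seqs[0])
    for seq in seqs:
        for i, base in enumerate(seq):
            if base == "N":
                counts[i] += 1
    return [c / len(seqs) for c in counts]


def first_all_n_cycle(fractions):
    """Return the first 1-based cycle from which every read is N to the end, or None."""
    start = None
    # Walk backwards from the last cycle while every read is N.
    for i in range(len(fractions) - 1, -1, -1):
        if fractions[i] == 1.0:
            start = i + 1
        else:
            break
    return start
```

The reads posted above have 53 called bases followed by 22 Ns, so if the problem really affects every read, the fractions should sit at 1.0 from cycle 54 to 75 and `first_all_n_cycle` should report 54.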

      • #4
        Yes, every read. I used Velvet to assemble the genome, and I don't know whether the Ns will affect the assembly. Should I remove the Ns first?

        • #5
          The only time an N gets put into the sequence is when the base caller cannot match a cluster in the current tile. Typically this happens at the tile edge, when clusters "wander" on and off the image. Given that your read quality went kaput in the last 20-odd bases, I would guess that one of the reagents ran out or was bad (most likely the incorporation mix) and you got no cluster illumination.

          You do need to trim the Ns before you put the sequences into Velvet. Probably the easiest way is to rerun GERALD with the USE_BASES parameter set to Y52n*.
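As I understand the USE_BASES syntax, `Y` marks cycles to use, `n` marks cycles to ignore, a number after a letter repeats it that many times, and `*` fills however many cycles are left over, so `Y52n*` keeps cycles 1 to 52 of a 75-cycle read and drops the rest. The sketch below applies a mask of that form to one read. `expand_use_bases` and `apply_mask` are hypothetical helpers that mimic this reading of the syntax; they are not GERALD's own code and do not cover every mask GERALD accepts.

```python
import re


def expand_use_bases(mask, read_length):
    """Expand a USE_BASES-style mask such as 'Y52n*' into one flag per cycle.

    Hypothetical helper: 'Y' = use the cycle, 'n' = ignore it. A following
    number repeats the letter; a following '*' fills the leftover cycles
    (at most one '*'); a bare letter counts once.
    """
    tokens = re.findall(r"([YyNn])(\d+|\*)?", mask)
    if "".join(letter + count for letter, count in tokens) != mask:
        raise ValueError("unrecognised mask: %r" % mask)
    fixed = sum(int(c) if c.isdigit() else 1 for _, c in tokens if c != "*")
    stars = sum(1 for _, c in tokens if c == "*")
    if stars > 1 or fixed > read_length or (stars == 0 and fixed != read_length):
        raise ValueError("mask %r does not fit a %d-cycle read" % (mask, read_length))
    flags = []
    for letter, count in tokens:
        if count == "*":
            n = read_length - fixed
        elif count:
            n = int(count)
        else:
            n = 1
        flags.extend([letter.upper() == "Y"] * n)
    return flags


def apply_mask(seq, qual, mask):
    """Keep only the bases (and quality characters) at cycles flagged 'Y'."""
    flags = expand_use_bases(mask, len(seq))
    kept_seq = "".join(b for b, keep in zip(seq, flags) if keep)
    kept_qual = "".join(q for q, keep in zip(qual, flags) if keep)
    return kept_seq, kept_qual
```

Rerunning GERALD with the mask trims the reads at the source; the helper only illustrates what the mask selects.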

          Edit: it does occur to me that the N followed by four called bases **might** indicate a laser issue. That is highly unlikely, in my opinion, but you might want to discuss it with your FAS (field application scientist).
          Last edited by dcjamison; 08-31-2009, 06:53 AM.
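If the isolated N lands on the same cycle in every read, that points to something cycle-specific rather than random miscalls. In the three reads posted above it sits at cycle 49, followed by four called bases (cycles 50 to 53) and then the trailing Ns. This minimal sketch tallies, across reads, the cycles of Ns that lie before the trailing N run; `internal_n_cycles` and `tally_internal_ns` are hypothetical names.

```python
from collections import Counter


def internal_n_cycles(seq):
    """Return the 1-based cycles of Ns that lie before the trailing run of Ns."""
    core = seq.rstrip("N")  # drop the trailing N run
    return [i + 1 for i, base in enumerate(core) if base == "N"]


def tally_internal_ns(seqs):
    """Count, over all reads, how often each cycle holds an internal N."""
    counts = Counter()
    for seq in seqs:
        counts.update(internal_n_cycles(seq))
    return counts
```

A single dominant key (such as 49) in the tally across the whole lane would support a cycle-specific problem worth raising with the FAS.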

          • #6
            If you just want to edit the FASTQ file, here is a tiny Biopython script that does it for you (it keeps only the first 52 bases of each read):

            Code:
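            from Bio import SeqIO

            # Keep only the first 52 bases (and their quality characters) of each read.
            # The quality strings here look like Illumina 1.3+ (offset 64); reading and
            # writing with the same "fastq" variant leaves the characters unchanged.
            # Use "fastq-illumina" for both if you need the scores interpreted correctly.
            with open("original.fastq") as in_handle, open("trimmed.fastq", "w") as out_handle:
                trimmed = (rec[:52] for rec in SeqIO.parse(in_handle, "fastq"))
                SeqIO.write(trimmed, out_handle, "fastq")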
            That should work on Biopython 1.51 or later (and probably 1.50 from memory).
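If Biopython is not installed, the same fixed-length trim can be done in plain Python by slicing the sequence and quality lines of each record. This is a minimal sketch, assuming unwrapped four-line FASTQ records like the ones in this thread; `trim_fastq` is a hypothetical name.

```python
def trim_fastq(in_lines, length=52):
    """Yield FASTQ lines with the sequence and quality truncated to `length`.

    Assumes every record is exactly four lines: @header, sequence, +header, quality.
    """
    for i, line in enumerate(in_lines):
        line = line.rstrip("\r\n")
        if i % 4 in (1, 3):  # sequence and quality lines of each record
            line = line[:length]
        yield line + "\n"
```

For example, with `src` opened on "original.fastq" and `dst` opened for writing on "trimmed.fastq", `dst.writelines(trim_fastq(src, 52))` writes the trimmed file. Unlike Biopython, this does no format validation, so a wrapped or malformed file would be trimmed incorrectly without any error.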
