  • Illumina paired-end reads...

    Hi,

    Does anyone know whether the two sequences produced by an Illumina paired-end run are reported as read from opposite ends of the fragment, or whether both reads correspond to forward reading?

    Also, the two output files from the paired-end run, *.1_sequence.txt and *.2_sequence.txt (both FASTQ formatted), are not the same size. In my case one file is a GB bigger than the other, and both are 30GB+. What I'd really like to know is this: if I split the large file into parts, do I interpret 1_sequence as the forward read and 2_sequence as the reverse read? Or do I consider both as reading in the same direction (say, 5' to 3')?
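If the two files do have to be split, the safe approach is to cut both at the same record boundaries so that read k of one part still pairs with read k of its mate part. Below is a minimal Python sketch, assuming plain 4-line FASTQ records (no wrapped sequence lines). The function names (`read_records`, `split_pair`) and output names (`partN_1.fastq`, `partN_2.fastq`) are hypothetical.

```python
import itertools
from pathlib import Path


def read_records(handle):
    """Yield FASTQ records as 4-tuples of lines (header, bases, '+', qualities)."""
    while True:
        record = tuple(itertools.islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("incomplete final record; file may be truncated")
        yield record


def split_pair(path1, path2, records_per_chunk, out_dir):
    """Split two mate files into numbered chunks that stay in lockstep.

    Chunk k of read 1 and chunk k of read 2 hold the same fragments in the
    same order. Raises ValueError as soon as one file runs out before the
    other. Returns the number of chunk pairs written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = 0
    with open(path1) as f1, open(path2) as f2:
        # zip_longest pads the shorter file with None so a mismatch is visible
        pairs = itertools.zip_longest(read_records(f1), read_records(f2))
        while True:
            batch = list(itertools.islice(pairs, records_per_chunk))
            if not batch:
                return chunks
            if any(r1 is None or r2 is None for r1, r2 in batch):
                raise ValueError("mate files have different numbers of records")
            chunks += 1
            with open(out_dir / f"part{chunks}_1.fastq", "w") as o1, \
                 open(out_dir / f"part{chunks}_2.fastq", "w") as o2:
                for r1, r2 in batch:
                    o1.writelines(r1)
                    o2.writelines(r2)
```

Because the sketch stops with an error once one file runs out before the other, it also works as a rough check for the size mismatch described above.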

    Any answers or pointers would be highly appreciated...

    TiA,

    Nash

  • #2
    I would VERY carefully check the integrity of the files if they differ in size at all when uncompressed, let alone by a GB. One of your files is probably truncated. The only time you would expect them not to be exactly the same size is if one side was read farther than the other, which is not done very often (but does have interesting applications in the literature).

    The two reads are shown in opposite orientations in every dataset I have received -- they are shown as they are read.

    I.e., each read corresponds to one of the arrows in the diagram of the DNA fragment below:
    Code:
    --->=====   
    =====<---
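To view both reads on the same strand (5' to 3' along the top strand in the diagram), read 2 can be reverse-complemented and its quality string reversed. Below is a minimal Python sketch, assuming the standard forward/reverse layout shown above. The function names are hypothetical.

```python
# Translation table mapping each base to its Watson-Crick complement (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mate2_to_top_strand(seq, qual):
    """Re-orient one read-2 record onto the strand that read 1 comes from.

    Read 2 is the <--- arrow in the diagram: it is read from the other end,
    on the opposite strand. Reverse-complementing the bases and reversing the
    qualities keeps each quality score paired with its base.
    """
    return reverse_complement(seq), qual[::-1]


print(reverse_complement("AACGT"))  # prints ACGTT
```

Paired-end aligners generally take both files as delivered and account for the orientation themselves. This conversion is therefore mostly useful for inspecting overlapping mates by eye.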



    • #3
      It's possible one of your reads has an index attached, which would make it longer. It's also possible that the sequencer ran out of reagents near the end of one of the reads, making it shorter. Can you check whether the read lengths differ between the two files, or whether they are the same and the files simply have different numbers of lines?
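The two checks suggested here (read lengths versus number of records) can be scripted. Below is a minimal Python sketch, assuming uncompressed 4-line FASTQ. The helper name `fastq_summary` and the file names in the usage comment are hypothetical. If the record counts match but the length tallies differ, one side probably has longer reads (for example, an attached index). If the counts differ, or a line total is not a multiple of 4, the file is probably truncated.

```python
from collections import Counter


def fastq_summary(path):
    """Return (record_count, length_tally) for an uncompressed 4-line FASTQ file.

    length_tally is a Counter mapping read length -> number of reads.
    Raises ValueError if the line count is not a multiple of 4, which is
    what a truncated file usually looks like.
    """
    lengths = Counter()
    n_lines = 0
    with open(path) as handle:
        for n_lines, line in enumerate(handle, start=1):
            if n_lines % 4 == 2:  # line 2 of every record holds the bases
                lengths[len(line.rstrip("\r\n"))] += 1
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines, not a multiple of 4 (truncated?)")
    return n_lines // 4, lengths


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python fastq_summary.py sample.1_sequence.txt sample.2_sequence.txt
    for fastq_path in sys.argv[1:]:
        count, tally = fastq_summary(fastq_path)
        print(fastq_path, count, "records; lengths:", dict(sorted(tally.items())))
```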



      • #4
        Originally posted by krobison (#2)
        Thank you very much for enlightening me about these reads! On closer examination, the reads in 2_sequence are 8 nt longer on every line than those in 1_sequence, so I am not at all sure how I should treat these outputs. I have also asked our core to explain why the reads are different lengths and whether we should repeat the sequencing to get better results...

        If you can give me any more pointers on these paired-end reads, I'd appreciate it. Thanks again,

        Nash
