  • Paired end seq read lengths are different

    Hi all,

    I recently acquired a dataset from GEO (HiSeq 2500, accession: GSE107029). It is paired-end, but read_1 is 94 bp while read_2 is 100 bp. I have never seen paired-end data from a HiSeq 2500 with different lengths for read_1 and read_2, so I am wondering if anyone can help me understand why the read lengths differ.

    Here are a few reads from read_1 and read_2. I downloaded the data using:
    fastq-dump --split-files SRR6300667

    Read_1 (SRR6300667_1.fastq)
    @SRR6300667.1 DHCDZDN1:3:1101:1145:1177 length=94
    CGGAATGCAGCAATCAATGTCGTCGGAAGATCCTGAATAAATCCTACTGTATCTGAAAGAAGAACACTGTAGCCGCTTGGCAGGACCATTTTTC
    +SRR6300667.1 DHCDZDN1:3:1101:1145:1177 length=94
    DFDHHH<EEFGGIHIIHCGFDGDF@GHFADFHGICFAEHGHECCAG@GG;EH>CCEA73?;B>@CCCCCCCCCCBBBB???C?BB@@?CCEEC#
    @SRR6300667.2 DHCDZDN1:3:1101:1178:1247 length=94
    GGCTCCCCCCTGCAAATGAGCCCCAGCCTTCTCCATGGTGGTGAAGACGCCAGTGGACTCCACGACGTACTCAGCGCCAGCATCGCCCCACTTG
    +SRR6300667.2 DHCDZDN1:3:1101:1178:1247 length=94
    FHHHHHJJJJJJJIIJJJJJIJJJJJIJJJJJJIJJJJGHHAEHIHIIHHFFCEEEEDDDDDDDDDDDABDDDDDDDDBDDDDDBDDDDDDDD@
    @SRR6300667.3 DHCDZDN1:3:1101:1313:1046 length=94
    TCCTTTAGCTGACCACTTCTTCAAGTAGGCCGGGGATACAAAATCCTTTTGCATGAGGAAAGCTGAAATTCCACACAGGTACCACAAGATATTA
    +SRR6300667.3 DHCDZDN1:3:1101:1313:1046 length=94
    EHHHHHEGBGGCHIJGHIFHIHIIIIIIJJJHIIJAHGFGIIJJCFGGGIIBCHHEHGFDEFFEEECCCEDCCCCBDDD:@CCACBBDCDDEED


    Read_2 (SRR6300667_2.fastq)
    @SRR6300667.1 DHCDZDN1:3:1101:1145:1177 length=100
    CGATGACCAGAAAAATGGTCCTGCCAAGCGGCTACAGTGTTCTTCTTTCAGATACAGTAGGATTTATTCAGGATCTTCCGACGACATTGATTGCTGCATT
    +SRR6300667.1 DHCDZDN1:3:1101:1145:1177 length=100
    @<ADDDDHBHFFEGGGE<CFGHIIIIGCEGDHIGI@GGGCFGHIIIIIHCHAGGHIG@@D>DGHGCACAEEHDFFFFFEDA>B@;,5@3>ADC:A@CCC:
    @SRR6300667.2 DHCDZDN1:3:1101:1178:1247 length=100
    ATGTTCCAATATGATTCCACCCATGGCAAATTCCATGGCACCGTCAAGGCTGAGAACGGGAAGCTTGTCATCAATGGAAATCCCATCACCATCTTCCAGG
    +SRR6300667.2 DHCDZDN1:3:1101:1178:1247 length=100
    CCFFFFDHHHDADEHGGGJJJEECHGDFHGIIJCDGHIGIJJFGAHEHGGGHGBHGEHIIIGHFHEHDDDD@EACECEECDDCC>CACD<>CDCCDCCD9
    @SRR6300667.3 DHCDZDN1:3:1101:1313:1046 length=100
    AGCCATACAGGAGATGGGAAACCACGCTATGATACTTTCTGGAAACATTTTATATTTGTTATGATGGACATTTTGCTCGATTGGAGCATGCATAATATCT
    +SRR6300667.3 DHCDZDN1:3:1101:1313:1046 length=100
    BCFFFFFHHHHHIHIIIJGJFGHIJJIJJJJIIGGHIJIJJJFJIAHHIHHIJIIJJJJJGIJJIJJJIGIHHHHEHFFFEECECDA?CCDDDDCDDEEF

    I am quite confused.

    Thank you,
    Statsteam

  • #2
    It is possible that the sequencer had trouble with the last 6 cycles of read 1, so that part of the data was trimmed. That is one possible explanation. You should be able to use this data without any issues.
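
    One way to probe the trimming explanation is to tally the read lengths in each file. This is only a minimal sketch: the function name `read_length_histogram` is made up for illustration, and it assumes plain (uncompressed) FASTQ with exactly four lines per record, as in the excerpts above. If every read_1 is 94 bp, a fixed cut (fewer cycles or a hard trim before submission) seems more likely than per-read quality trimming, which would usually leave a mix of lengths.

```python
import sys
from collections import Counter


def read_length_histogram(handle):
    """Count sequence lengths in a FASTQ stream.

    Assumes four lines per record (header, sequence, '+', qualities),
    with no line-wrapped sequences.
    """
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record is the sequence
            counts[len(line.rstrip("\r\n"))] += 1
    return counts


if __name__ == "__main__":
    # Usage: python read_lengths.py SRR6300667_1.fastq SRR6300667_2.fastq
    for path in sys.argv[1:]:
        with open(path) as fh:
            hist = read_length_histogram(fh)
        print(path)
        for length in sorted(hist):
            print(f"  {length} bp: {hist[length]} reads")
```

    A single length per file would point to a uniform cut; it would not by itself say whether the run used fewer cycles for read 1 or the bases were removed afterwards.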

    • #3
      There is no technical constraint on the length of Illumina paired-end reads, apart from R1, which should be at least 25 cycles. In some applications, asymmetric read lengths are more cost-effective. For instance, 10x Genomics single-cell RNA-seq libraries can be sequenced in a 2x100 configuration using a 200-cycle sequencing kit, but they can also be sequenced with 28 cycles for R1 and 90 cycles for R2 using a 100-cycle kit, with an identical outcome.
