  • How to split FASTQ files containing shared forward barcodes but different reverse ones?

    Hi, I need your help because I'm completely lost with this. I received paired-end sequencing data in which many samples are pooled into a single pair of forward and reverse FASTQ files, as shown below.
    The sequencing center delivered the data in this format because they used a different set of primers, which allows them to improve the sequencing quality.

    Code:
    Librerires_S4_L001_R1_001.fastq 
    Librerires_S4_L001_R2_001.fastq
    I was expecting to find software that could extract the samples into per-sample files like the ones shown below, not just out of personal preference but because this layout is commonly used by other programs (QIIME is one example).

    Code:
    ##  [1] "PN1R1_L001_R1_001.fastq"   "PN1R1_L001_R2_001.fastq"  
    ##  [3] "PN3R2_L001_R1_001.fastq"   "PN3R2_L001_R2_001.fastq"
    My problem started when I figured out that some of these samples share the forward barcode and differ only in the reverse one, which I have never seen before. I assume this is a feature of modern high-capacity sequencing platforms, and that with the proper software the reads could easily be split and assigned to the proper per-sample FASTQ files.

    Code:
    Sample         Spacer Forward    Spacer Reverse
    1   PN1R1             A                B
    2   PN3R1             A                C
    3   PN1R2             B                C
    4   PN3R2             B                D
    As you can see, forward barcode A is used by two samples, but those samples do not share the same reverse barcode. As an example of what the files contain, the table below shows the barcode (which can appear on either the forward or the reverse side), an index that differs between constructs, and the forward and reverse primers.

    Code:
      SAMPLE              BARCODE        INDEX   SPECIFIC PRIMER
      For_A  FORWARD    CCTAAACTACGG            CCTACGGGNGGCWGCAG
      For_B  FORWARD    TGCAGATCCAAC      T     CCTACGGGNGGCWGCAG
      Rev_B  REVERSE    TGCAGATCCAAC      A     GACTACHVGGGTATCTAATCC 
      Rev_C  REVERSE    CCATCACATAGG      TC    GACTACHVGGGTATCTAATCC 
      Rev_D  REVERSE    GTGGTATGGGAG      CTA   GACTACHVGGGTATCTAATCC
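
A minimal sketch, in Python, of how the layout in the table above could be matched against the start of a read. It assumes the read begins with barcode, then index, then primer, in that order (the post does not state the order). The names `prefix_pattern`, `classify`, and `FORWARD` are illustrative and not part of any tool mentioned in this thread. Degenerate primer bases such as N and W are expanded using the standard IUPAC nucleotide codes.

```python
import re

# IUPAC degenerate base codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def prefix_pattern(barcode, index, primer):
    """Compile a regex for barcode + index + primer at the read start.

    The barcode/index/primer order is an assumption, not confirmed by the post.
    """
    primer_regex = "".join(IUPAC[base] for base in primer)
    return re.compile("^" + barcode + index + primer_regex)


# Values copied from the table above; the dictionary keys are just labels.
FORWARD = {
    "A": prefix_pattern("CCTAAACTACGG", "", "CCTACGGGNGGCWGCAG"),
    "B": prefix_pattern("TGCAGATCCAAC", "T", "CCTACGGGNGGCWGCAG"),
}


def classify(read, patterns):
    """Return the label of the first pattern matching the read start, or None."""
    for label, pattern in patterns.items():
        if pattern.match(read):
            return label
    return None
```

The reverse constructs could be handled the same way with a second dictionary built from the Rev_B, Rev_C, and Rev_D rows. Matching here is exact, so a single sequencing error in the barcode would leave the read unclassified.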
    What I need is software that can assign reads to samples according to their respective barcodes, even when a barcode is shared on one side and only the other side distinguishes the samples. I have tried several programs (QIIME 1, FASTX, mothur), but none worked as expected. I also wanted to check QIIME 2 and SeekDeep, but at this point I don't want to spend time testing each program without a real idea of what it can do.
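
The assignment described above, where a sample is identified by the combination of forward and reverse barcode rather than by either one alone, could be sketched in Python as follows. This is an illustration, not any of the tools named in the thread. It assumes the barcode sits at the very start of each R1 and R2 read, uses exact matching with no mismatch tolerance, and expects plain four-line FASTQ records. All function names are hypothetical.

```python
import os

# Sample sheet from the post: (forward label, reverse label) -> sample name.
SAMPLES = {
    ("A", "B"): "PN1R1",
    ("A", "C"): "PN3R1",
    ("B", "C"): "PN1R2",
    ("B", "D"): "PN3R2",
}
# Barcode sequences from the post, keyed by label.
FORWARD = {"A": "CCTAAACTACGG", "B": "TGCAGATCCAAC"}
REVERSE = {"B": "TGCAGATCCAAC", "C": "CCATCACATAGG", "D": "GTGGTATGGGAG"}


def find_barcode(seq, barcodes):
    """Return the label of the barcode the sequence starts with, or None."""
    for label, barcode in barcodes.items():
        if seq.startswith(barcode):
            return label
    return None


def assign_sample(r1_seq, r2_seq):
    """Look up the sample for a read pair by its (forward, reverse) barcodes."""
    pair = (find_barcode(r1_seq, FORWARD), find_barcode(r2_seq, REVERSE))
    return SAMPLES.get(pair, "unassigned")


def read_fastq(handle):
    """Yield FASTQ records as lists of four raw lines (assumes 4-line records)."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:
            return
        yield record


def demultiplex(r1_path, r2_path, out_dir="."):
    """Split a pooled R1/R2 pair into per-sample files; return read-pair counts."""
    outputs = {}
    counts = {}
    try:
        with open(r1_path) as r1, open(r2_path) as r2:
            for rec1, rec2 in zip(read_fastq(r1), read_fastq(r2)):
                sample = assign_sample(rec1[1], rec2[1])
                if sample not in outputs:
                    outputs[sample] = (
                        open(os.path.join(out_dir, f"{sample}_L001_R1_001.fastq"), "w"),
                        open(os.path.join(out_dir, f"{sample}_L001_R2_001.fastq"), "w"),
                    )
                outputs[sample][0].writelines(rec1)
                outputs[sample][1].writelines(rec2)
                counts[sample] = counts.get(sample, 0) + 1
    finally:
        # Close every per-sample file, even if reading failed part-way.
        for out1, out2 in outputs.values():
            out1.close()
            out2.close()
    return counts
```

Calling `demultiplex("Librerires_S4_L001_R1_001.fastq", "Librerires_S4_L001_R2_001.fastq")` would write one R1/R2 pair of files per sample, plus an `unassigned` pair for reads whose barcode combination is not in the sample sheet. The barcodes are left at the start of the written reads; trimming is not part of this sketch.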

    Does anybody know this kind of post-processing and can suggest a program that does this job? I would be very grateful for any hint.

    Sorry for the long post, but I wanted to give as much detail as I could. Thanks for your time.

  • #2
    Did you ask the sequencing centre how they typically demultiplex?


    • #3
      It kind of sounds like your sequencing provider hasn't bothered to demux on the dual index.

      You might be able to use demuxbyname.sh from BBTools.


      There are quite a few other threads on the site about it.


      I'd be going back to the sequencing provider to get them to do the job properly with bcl2fastq.


      • #4
        Hi! Thanks for your reply. I actually asked why they didn't demultiplex properly, and they explained that they used a protocol to improve quality and that bcl2fastq supposedly would not work as it should with it.

        On the other hand, I was given a program called fqgrep, which should solve the problem. So far I have only compiled it, and I will run the first tests next. If it works, I will post the solution. If it doesn't, I will try the suggestion posted by Bukowski.

        Thanks for your time!
