  • Regarding Qiime Metadata Mapping File

    Hello,

    Library specs: Paired End, Read length 150 bp, V3 region 16S rRNA gene
    Platform : MiSeq, Illumina
    Experiment : Wheat Field, rhizosphere samples, Elevated CO2 and temperature
    Computational platform : AWS EC2, Qiime 1.8.0

    I am a Qiime newbie. I have a total of 39 (13x3) samples: 12 treatments and 1 control, each with 3 replicates.

    According to the Qiime documentation, creating the metadata mapping file requires a sample ID, barcode, primer sequence and description.

    The sequencing was done by a commercial provider, and they refuse to provide the barcode sequences.

    Ques1: What should I use as the Sample ID? Does it have to be part of the read name?

    Ques2: For beta diversity analysis, I would like the 3 replicates pooled for every treatment. How should the mapping file be constructed for this,
    given that I do not have the barcode sequences?

    Any help / pointers / comments are appreciated.

    --
    pg

  • #2
    bump.

    For a MiSeq V3 data set with multiple samples (3 replicates per sample), where barcodes were used but their sequences are not available, how do I create a metadata file so that Qiime can associate the samples with the corresponding SampleID in the file?


    • #3
      Originally posted by gprakhar View Post
      Hello,

      As this sequencing was done by a commercial provider, they refuse to provide barcode sequences.

      Any help / pointers / comments are appreciated.

      --
      pg
      That is odd indeed. Since the barcodes have already done their work of separating the samples, could you use a subset of the Illumina barcode list (or any other codes, for that matter) to go forward?

      I assume these sequences were demultiplexed on the MiSeq and you do not have the barcodes available in the FASTQ ID header.


      • #4
        Originally posted by GenoMax View Post
        I assume these sequences were demultiplexed on the MiSeq and you do not have the barcodes available in Fastq ID header.
        Unfortunately, no, you would not have the index sequences written in the FASTQ definition line. The MiSeq output only includes the index number (an integer from 1 to N, where N is the number of libraries listed in the sample sheet) in the read definition line. This differs from the behavior of CASAVA/Bcl2fastq, which includes the actual index read in the definition line. Why does Illumina do this? No clue.
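
To check which of the two cases a given file falls into, one can look at the last field of the read definition line. The following is only a minimal sketch, assuming CASAVA 1.8-style headers in which the part after the first space has the form `read:filtered:control:index`. The function names and both example header lines are made up for illustration and do not come from real data.

```python
import re


def index_field(header: str) -> str:
    """Return the last colon-separated field of the comment part of a
    FASTQ definition line (the part after the first space)."""
    if not header.startswith("@"):
        raise ValueError("not a FASTQ definition line")
    parts = header.rstrip().split(" ", 1)
    if len(parts) != 2:
        raise ValueError("definition line has no comment part")
    return parts[1].split(":")[-1]


def classify_index(field: str) -> str:
    """Say whether the field looks like a sample number or an index sequence."""
    if field.isdigit():
        return "sample number"
    # Single index (ACGTN) or dual index written as SEQ+SEQ.
    if re.fullmatch(r"[ACGTN]+(\+[ACGTN]+)?", field):
        return "index sequence"
    return "unknown"


# Hypothetical header lines, for illustration only.
miseq_style = "@M00001:12:000000000-A1B2C:1:1101:15589:1332 1:N:0:7"
bcl2fastq_style = "@HWI-ST123:45:C0ABCACXX:1:1101:1234:2000 1:N:0:ATCACG"

for line in (miseq_style, bcl2fastq_style):
    field = index_field(line)
    print(field, "->", classify_index(field))
```

If the field is just a number, the barcode sequence cannot be recovered from the FASTQ file itself.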


        • #5
          That is what I figured had happened. I just wanted to confirm.

          MiSeq is meant to be a sequencing "appliance" with minimal "user serviceable" parts so I assume things are kept simple.

          I do not understand why the provider would not make the barcodes available (it's not like they are a state secret).


          • #6
            Originally posted by GenoMax View Post
            I do not understand why the provider would not make the barcodes available (it's not like they are a state secret).
            This may just be a communication breakdown. The service provider may mean that the MiSeq software does not report the index sequence for each read (as the HiSeq does), so they simply do not have that data to provide.

            I will also add my (completely unsolicited, so feel free to ignore it) 2¢ about Qiime and MiSeq data. I often encounter researchers who want to faithfully reproduce the pipeline in the Qiime tutorial, which assumes the input data still requires demultiplexing and trimming of primers and inline barcodes. That pipeline was designed in the era of 454 data; none of this applies to MiSeq data. MiSeq data is already demultiplexed, and the Illumina sequencing methodology places the index in a separate read, not in your sequence read, so there is no need to trim barcodes. Depending on the method used to generate your 16S amplicons, there may also be no need to trim PCR primer sequences: when the sequencing primers are the same as the PCR primers, no part of the PCR primer ends up in your final read (e.g. the Caporaso & Knight method and the Schloss method).

            Qiime is a great tool for studying bacterial community diversity, but be aware that all of these pre-processing steps were designed around a different type of input data (e.g. 454). Instead of trying to shoehorn MiSeq data into this pipeline, you need to adjust your pre-processing steps to the standard output of the MiSeq.


            • #7
              Originally posted by GenoMax View Post
              That is odd indeed. Since the barcodes have done their work of separating the samples can you use a subset from illumina barcodes list (any other codes for that matter) to go forward.


              Hello,

              I assume these sequences were demultiplexed on the MiSeq and you do not have the barcodes available in Fastq ID header.
              So that means that in the Qiime mapping file I can use any barcode sequence, as long as it is unique for each sample and the same for all replicates of a sample?

              According to my understanding of Qiime pre-processing, the barcode sequences are used by split_libraries.py to separate out the samples. Is that correct?

              Regards,
              Last edited by gprakhar; 08-05-2014, 12:14 AM.


              • #8
                Originally posted by kmcarr View Post
                Unfortunately, no you would not have the index sequences written in FastQ definition line. The MiSeq output only includes the index number (an integer from 1-N where N is the number of libraries listed in the sample sheet) in the read definition line. This differs from the behavior of CASAVA/Bcl2fastq which includes the actual index read in the definition line. Why does Illumina do this? No clue.
                I am still not clear about what exactly I should use as the SampleID in the Qiime mapping file.
                From this post I assume that part of the FASTQ header can be used to identify the samples uniquely. Is that so?
                And in the case of replicates, does the SampleID remain the same?


                • #9
                  Originally posted by GenoMax View Post
                  That is what I figure has happened. I just wanted to confirm.

                  MiSeq is meant to be a sequencing "appliance" with minimal "user serviceable" parts so I assume things are kept simple.

                  I do not understand why the provider would not make the barcodes available (it's not like they are a state secret).
                  The sequencing was done by a third-party sequencing provider.
                  I asked them for (1) the adapter sequences, (2) the barcodes and (3) the primer sequences, for assembling the paired-end reads.

                  The provider gave me an FAQ document containing:
                  (1) the V3 primer sequences, both F and R
                  (2) a link to the Illumina chemistry documentation for the adapter sequences
                  (but no TruSeq version, so I am still not clear which adapters to use, as that document lists about 5 different TruSeq versions)
                  (3) as for the barcodes, only this entry:
                  2. What is Barcode sequence used ?
                  The bar code sequences are proprietary sequences and are unable to provide it.


                  • #10
                    Originally posted by kmcarr View Post
                    This may just be a communication breakdown. The service provider may mean that the MiSeq software does not report the index sequence for each read (as the HiSeq does), so they simply do not have that data to provide.

                    I will also add my (completely unsolicited, so feel free to ignore it) 2¢ about Qiime and MiSeq data. I often encounter researchers who want to faithfully reproduce the pipeline in the Qiime tutorial, which assumes the input data still requires demultiplexing and trimming of primers and inline barcodes. That pipeline was designed in the era of 454 data; none of this applies to MiSeq data. MiSeq data is already demultiplexed, and the Illumina sequencing methodology places the index in a separate read, not in your sequence read, so there is no need to trim barcodes. Depending on the method used to generate your 16S amplicons, there may also be no need to trim PCR primer sequences: when the sequencing primers are the same as the PCR primers, no part of the PCR primer ends up in your final read (e.g. the Caporaso & Knight method and the Schloss method).

                    Qiime is a great tool for studying bacterial community diversity but just be aware that all of these pre-processing steps were designed around a different type of input data (e.g. 454). Instead of trying to shoehorn MiSeq data into this pipeline, you need to adjust your pre-processing steps to the standard output of the MiSeq.
                    Hello,

                    I do understand that the pre-processing for MiSeq paired-end data is different.
                    For my data I first assemble the paired-end reads using PANDAseq, so there is no need for split_libraries.py.
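
For reference, the per-sample assembly step can be sketched like this. It is only a sketch under assumptions: the file names and the rule "sample name = text before the first underscore" are hypothetical, and `-f`, `-r` and `-w` are the PANDAseq options for forward reads, reverse reads and output file as far as I know (please check `pandaseq -h` for your version). The function only builds the command and does not run anything.

```python
from pathlib import Path
from typing import List


def pandaseq_command(forward: str, reverse: str, out_dir: str) -> List[str]:
    """Build a PANDAseq command line for one demultiplexed sample.

    Assumption (hypothetical naming scheme): the sample name is the part
    of the forward file name before the first underscore.
    """
    sample = Path(forward).name.split("_")[0]
    out_fasta = (Path(out_dir) / f"{sample}.fasta").as_posix()
    return ["pandaseq", "-f", forward, "-r", reverse, "-w", out_fasta]


# Hypothetical file names, for illustration only.
cmd = pandaseq_command(
    "raw/T1R1_S1_L001_R1_001.fastq",
    "raw/T1R1_S1_L001_R2_001.fastq",
    "assembled",
)
print(" ".join(cmd))
# To actually run it, one could pass cmd to subprocess.run(cmd, check=True).
```

Keeping one assembled file per sample means the sample identity lives in the file name, which is what matters for the mapping file question below.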

                    As mentioned in the first post, I have multiple samples, with 3 replicates per sample.
                    From my understanding of Qiime, I should be able to process all the samples together in a single run. I assume the mapping file holds the key to this.
                    Since I do not have the barcodes, I am confused about how to create the mapping file.

                    As per GenoMax's reply,
                    would this be achievable with any barcode sequence, with the unique SampleID coming from the FASTQ header?


                    • #11
                      Originally posted by gprakhar View Post
                      I am still not clear about what exactly should I use as SampleId in the Qiime mapping file.
                      But from this post I assume a part of fastq header can be used for this to identify the samples uniquely, is it so ?
                      and in case of replicates, does the sampleID remain the same ?
                      I am not a Qiime expert, but the following seems logical. kmcarr (or someone else more knowledgeable) can correct the info.

                      You should use the sample IDs you already have for your samples, as shown in the example here: http://qiime.org/1.6.0/documentation...-file-overview. Be aware that the sample ID you use in the mapping file would also have to be added to the demultiplexed data files, as shown in that example (see the "Handling already demultiplexed samples" section). I am not certain whether you can create and use a mapping file without the "barcodes/primers" (ref doc link). If you can, you would not need to worry about barcodes.


                      • #12
                        Originally posted by kmcarr View Post

                        I will also add my (completely unsolicited, so feel free to ignore it) 2¢ about Qiime and MiSeq data. I often encounter researchers who want to faithfully reproduce the pipeline in the Qiime tutorial, which assumes the input data still requires demultiplexing and trimming of primers and inline barcodes. That pipeline was designed in the era of 454 data; none of this applies to MiSeq data. MiSeq data is already demultiplexed, and the Illumina sequencing methodology places the index in a separate read, not in your sequence read, so there is no need to trim barcodes. Depending on the method used to generate your 16S amplicons, there may also be no need to trim PCR primer sequences: when the sequencing primers are the same as the PCR primers, no part of the PCR primer ends up in your final read (e.g. the Caporaso & Knight method and the Schloss method).

                        Qiime is a great tool for studying bacterial community diversity but just be aware that all of these pre-processing steps were designed around a different type of input data (e.g. 454). Instead of trying to shoehorn MiSeq data into this pipeline, you need to adjust your pre-processing steps to the standard output of the MiSeq.
                        I agree completely. The Qiime folks should redo this part of the pipeline to account for the switch in predominant sequencing technology from 454 to Illumina. Everyone seems to have to do these transformations just to get their data into Qiime.


                        • #13
                          Originally posted by GenoMax View Post
                          I agree completely. Qiime folks should redo this part of the pipeline to account for the switch in predominant sequence technology from 454 to Illumina. Everyone seems to have to do these transformations just to get their data into Qiime.
                          To give the Qiime developers some credit, they are making progress in that regard. The latest version, 1.8, added the join_paired_ends.py and extract_barcodes.py scripts, which is a significant step forward.


                          • #14
                            Originally posted by gprakhar View Post
                            Hello,

                            Library specs: Paired End, Read length 150 bp, V3 region 16S rRNA gene
                            Platform : MiSeq, Illumina
                            Experiment : Wheat Field, rhizosphere samples, Elevated CO2 and temperature
                            Computational platform : AWS EC2, Qiime 1.8.0

                            I am a Qiime newbie, have total 39 (13x3) samples, which represent 12 Treatments and 1 control with 3 replicates per Treatment and also control.

                            According to Qiime Documentation , for creating the metadata file I require Sample ID, Barcode, Primer sequence and description.

                            As this sequencing was done by a commercial provider, they refuse to provide barcode sequences.

                            Ques1: What should I use as Sample ID ? Does it have to be a part of read name?

                            Ques2: For Beta diversity analysis, I would like the 3 replicates pooled for every treatment, how should the mapping file be constructed for this?
                            Given that I do not have barcode sequence.

                            Any help / pointers / comments are appreciated.

                            --
                            pg
                            Not sure if gprakhar has solved this issue. I am both a QIIME and a sequencing newbie, so I kind of understand the situation.

                            The problem is that with the MiSeq platform, the machine has already demultiplexed the samples, so a genomics facility will usually provide users only with separate, demultiplexed sample files. The Illumina adapter sequences and barcode sequences (I mean the barcodes you provided to Illumina in the sample sheet) have already been removed from the sequences in those per-sample files. Therefore, even if the sequencing provider gave you the barcode sequences, they would not help with your actual problem, which is using QIIME to analyze the data.

                            In QIIME, for your data to be analyzed, all your sequences have to be in one FASTA file, and each sequence has to carry a unique label that begins with its sample ID. E.g. sequence No. 1 in sample.1 should be labeled something like sample.1_1. So it is easy to combine sequences from different samples into one file, but a little tricky to rename all the sequences according to this rule.
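
To make the renaming rule concrete, here is a minimal sketch in plain Python. It is not the QIIME implementation, only an illustration of the label format described above (`SampleID_number`, with the original read name kept after a space). The sample IDs and reads are made up, and the numbering here restarts for each sample, which is my own simplification.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def combine(samples: Dict[str, List[str]]) -> str:
    """Relabel every read as SampleID_n and merge all samples into one FASTA."""
    out = []
    for sample_id, lines in samples.items():
        # Number the reads of each sample starting from 1.
        for n, (header, seq) in enumerate(read_fasta(lines), start=1):
            out.append(f">{sample_id}_{n} {header}\n{seq}")
    return "\n".join(out) + "\n"


# Hypothetical per-sample FASTA content, for illustration only.
samples = {
    "T1.R1": [">readA", "ACGT", ">readB", "GG", "CC"],
    "T1.R2": [">readC", "TTTT"],
}
print(combine(samples), end="")
```

Note that the sample IDs in this sketch use periods rather than underscores, since the underscore is what separates the sample ID from the read number in the label.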

                            Fortunately, I just found that QIIME does have a script for this, at least in the latest version (1.8.0): add_qiime_labels.py. Check the QIIME documentation for how to use it.

                            A little more on the mapping file. In this case, we do not have to provide a barcode or LinkerPrimer sequence in the mapping file. When you check the mapping file with validate_mapping_file.py, add -p -b at the end and the script will not check the barcode and LinkerPrimer fields.
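
As an illustration, a mapping file of this kind could be generated with a few lines of Python. This is only a sketch under assumptions: the `Treatment` and `InputFileName` column names, the `Treatment.Rn` sample ID scheme and the file names are my own choices for the example, not something QIIME prescribes. The barcode and LinkerPrimer columns are kept in the header but left empty.

```python
from typing import List

HEADER = [
    "#SampleID",
    "BarcodeSequence",       # left empty: data is already demultiplexed
    "LinkerPrimerSequence",  # left empty: no primer trimming needed
    "Treatment",             # hypothetical metadata column shared by replicates
    "InputFileName",         # hypothetical column naming each sample's FASTA file
    "Description",
]


def build_mapping(treatments: List[str], replicates: int = 3) -> str:
    """Return tab-separated mapping file text, one row per replicate."""
    rows = ["\t".join(HEADER)]
    for treatment in treatments:
        for rep in range(1, replicates + 1):
            # Sample IDs use letters, digits and periods only (no underscores).
            sample_id = f"{treatment}.R{rep}"
            rows.append("\t".join([
                sample_id,
                "",
                "",
                treatment,
                f"{sample_id}.fasta",
                f"{treatment} replicate {rep}",
            ]))
    return "\n".join(rows) + "\n"


# 1 control + 12 treatments, 3 replicates each = 39 samples.
treatments = ["Control"] + [f"T{i}" for i in range(1, 13)]
mapping_text = build_mapping(treatments)
print(mapping_text.splitlines()[0])
print(mapping_text.splitlines()[1])
```

Each replicate gets its own SampleID, while the three replicates of a treatment share the same value in the `Treatment` column, so they can be grouped by that column later. The text can be written to a file and checked with something like `validate_mapping_file.py -m mapping.txt -p -b`. As far as I can tell, add_qiime_labels.py can take the name of the file name column through its `-c` option, but please confirm that against the script's help.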

                            Regarding your question 2, it is actually pretty easy, but since I have already typed a lot, I'll stop here.
                            Last edited by ETWang; 10-16-2014, 07:26 PM.
