
  • Starting my first research project, but don't know what steps to take.

    I am looking at 20 specimens of Serratia marcescens (40 FASTQ files in total) and trying to compare them to the ancestral genome, i.e. the reference genome.

    My first task was to clean up the raw data, but I am clueless. I was told to first find out whether each file contains good or bad reads (using FastQC), then remove the adapter sequences with cutadapt or a Python script. I've been tinkering with FastQC and I understand the results to a certain extent, but I still don't know how to clean data that I judge to be 'bad'.

    I cannot get cutadapt to work and I am not sure how I would go about using a python script because my knowledge relevant to biology is very rusty. I was told if I can't get cutadapt to work, then a python script would be faster.

    After cleaning the data, my next task is gene mapping. I think I was told to use BWA, and the software is installed, but I am still clueless as to how I should use it.

    And the final stage is analysis I think...



    Can anyone point me in the right direction? I am having a difficult time understanding everything that is going on, especially trying to understand both the biology and computer science side.

    Thanks.


    Edit:

    I am expected to know how to do all of this in Linux terminal. I have the basics down pretty much, but I have to apply it.
    Last edited by prs321; 06-10-2013, 08:45 AM. Reason: Left out minor details.

  • #2

    You have a steep learning curve ahead, but we've all been there.

    Some reading material which should help:

    Basic Linux tutorial:


    BWA web pages:


    Samtools (for manipulating the alignment files) web pages:


    Nature Methods supplement on NGS data analysis


    Hope this helps,
    Maria



    • #3
      BWA, as with many Linux programs, will give you some tips if you execute it with no arguments. BWA follows the model in which the first argument is a command to bwa; executing bwa with only that command and no further arguments will explain the command in more detail.

      Is your data paired end or single end? Platform?
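As a rough sketch of what a basic paired-end run could look like, the snippet below only builds the command lines and does not execute anything. It assumes a BWA version that includes the mem subcommand; the file names and the helper name bwa_commands are hypothetical placeholders, not anything from this thread.

```python
def bwa_commands(reference, reads_1, reads_2, threads=4):
    """Build (but do not run) the command lines for a basic paired-end BWA run.

    reference, reads_1 and reads_2 are paths chosen by the caller;
    the names used in the example below are placeholders.
    """
    # Step 1: index the reference genome (needed once per reference).
    index_cmd = ["bwa", "index", str(reference)]
    # Step 2: align both mate files; bwa mem writes SAM to standard output.
    align_cmd = ["bwa", "mem", "-t", str(threads),
                 str(reference), str(reads_1), str(reads_2)]
    return index_cmd, align_cmd


index_cmd, align_cmd = bwa_commands(
    "reference.fasta", "sample01_R1.fastq", "sample01_R2.fastq"
)
print(" ".join(index_cmd))
print(" ".join(align_cmd))
```

To actually run these, each list can be passed to subprocess.run(), with the standard output of the alignment step redirected to a .sam file.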



      • #4
        I haven't used BWA; does someone who has know if it will perform adapter trimming when aligning? I'd be surprised if it didn't.



        • #5
          Originally posted by Heisman
          I haven't used BWA; does someone who has know if it will perform adapter trimming when aligning? I'd be surprised if it didn't.
          I've always used BWA (and Bowtie2) with adapter-trimmed sequences. Re-reading the manual does not indicate that BWA does adapter trimming. Quality trimming, sure, that is easy to implement. Bowtie2 also does not do adapter trimming. Given your "I'd be surprised" statement, it seems like you expect aligners to do adapter trimming. Which aligner do you commonly use?



          • #6
            Originally posted by krobison
            BWA, as with many Linux programs, will give you some tips if you execute it with no arguments. BWA follows the model in which the first argument is a command to bwa; executing this with no further arguments will further explain that command.

            Is your data paired end or single end? Platform?
            They are paired end.



            • #7
              Originally posted by Heisman
              I haven't used BWA; does someone who has know if it will perform adapter trimming when aligning? I'd be surprised if it didn't.
              BWA doesn't explicitly adapter trim (search against an adapter database), but the CIGAR strings for the alignments will include soft-clipping (S) operations -- so these can be used to identify & remove adapters.
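To illustrate the CIGAR idea, here is a minimal sketch that reports how many bases are soft-clipped (S) at each end of an alignment. The function name is hypothetical. Soft clipping is only a hint: it also occurs for reasons other than adapters, and for reverse-strand alignments the read's 3' end sits at the left of the CIGAR string.

```python
import re

# One CIGAR operation: a length followed by an operation letter (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_ends(cigar):
    """Return (left, right) soft-clipped base counts for a SAM CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can sit outside soft clips, so ignore them first.
    ops = [(n, op) for n, op in ops if op != "H"]
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return left, right


print(soft_clipped_ends("5S90M5S"))
print(soft_clipped_ends("80M20S"))
```

Reads with a long clipped tail can then be inspected to see whether the clipped bases match the adapter used in library preparation.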



              • #8
                Here is my typical pipeline:

                (1) To clean data
                a. Trim Galore to trim by quality AND remove adapter sequence


                b. Remove duplicate reads with the FASTX-Toolkit (FASTQ Collapser script)


                (2) Map reads
                I've had very good luck with Bowtie2, but use BWA if you are comfortable with it. Bowtie2 has 4 built-in parameter presets that trade off between speed and sensitivity (very fast, fast, sensitive, and very sensitive).
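For anyone who wants to see what the cleaning step does in principle, here is a simplified Python sketch. It is not how Trim Galore or the FASTX-Toolkit work internally: it only finds exact adapter matches and trims low-quality bases from the 3' end, whereas real tools tolerate mismatches and partial adapters. The adapter sequence and the Phred+33 quality encoding are assumptions; substitute the values for your own library and platform.

```python
from collections import Counter

# Assumption: start of a common Illumina adapter; replace with your kit's sequence.
ADAPTER = "AGATCGGAAGAGC"


def trim_adapter(seq, qual, adapter=ADAPTER):
    """Cut the read at the first exact occurrence of the adapter."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]


def trim_quality(seq, qual, min_q=20, offset=33):
    """Drop bases from the 3' end while their Phred score is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def collapse(seqs):
    """Count identical sequences, most frequent first."""
    return Counter(seqs).most_common()


read = "ACGTACGTAC" + ADAPTER + "TTTT"
print(trim_adapter(read, "I" * len(read)))
print(trim_quality("ACGTAC", "IIII##"))
print(collapse(["ACGT", "ACGT", "TTTT"]))
```

With paired-end data, both mates of a pair have to be kept or dropped together so the two files stay in sync; this sketch ignores pairing.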



                My bit of advice (from my own experiences) would be to test your pipeline with a small data set (say 1,000 reads). Once you are comfortable with that, have played around with parameters and understand how the software works and the types of output you get, you'll be ready to ramp up.
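One simple way to get such a test set is to take the first 1,000 records from each FASTQ file. The sketch below assumes the usual four lines per record (no wrapped sequence lines); the function name is hypothetical and the demo uses an in-memory file rather than real data.

```python
import io
from itertools import islice


def head_fastq(handle, n_reads=1000):
    """Yield the first n_reads records (4 lines each) from an open FASTQ file."""
    lines = islice(handle, 4 * n_reads)
    while True:
        record = list(islice(lines, 4))
        if len(record) < 4:
            # End of input (or an incomplete trailing record): stop.
            break
        yield record


# Tiny in-memory example with three reads; keep only the first two.
demo = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nTTTT\n+\nIIII\n"
    "@r3\nGGGG\n+\nIIII\n"
)
subset = list(head_fastq(demo, n_reads=2))
print(len(subset))
```

For paired-end data, apply the same cut to both files of a pair so that the mates still line up.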

                Good luck!



                • #9
                  Originally posted by westerman
                  I've always used BWA (and Bowtie2) with adapter-trimmed sequences. Re-reading the manual does not indicate that BWA does adapter trimming. Quality trimming, sure, that is easy to implement. Bowtie2 also does not do adapter trimming. Given your "I'd be surprised" statement, it seems like you expect aligners to do adapter trimming. Which aligner do you commonly use?
                  Ah, darn it. I was spoiled with Novoalign and assumed it was a more standard thing, but clearly I was wrong.
