  • Paired end- search for complement of Read2 in Read1

    Hi all,
    I am having a bit of trouble with my Illumina paired-end reads. Read 2 is of much poorer quality than read 1, so after filtering I ended up with 8 million reads for R1 and 3 million for R2. I can't run velveth with these, so what I want to do is extract from the full read 2 set the mates of the reads I chose to keep from read 1. How can I do that? It's driving me insane...

    Thanks

  • #2
    Paired end- search for complement of Read2 in Read1

    You could probably write a script in something like Perl to match up your read pairs. In fact, there are almost certainly existing scripts that do this.

    Otherwise, you could run velveth with your filtered R1 and R2 files as single end reads.

    Or, you could try cleaning the reads with trimmomatic, which will give you files for R1 and R2 of remaining matched pairs, and separate R1 and R2 files for reads where the mate has been filtered out.

    Hope this helps,
    Maria


    • #3
      Thank you, but I have only just started, so I am no good with Perl... Do you know of any links to scripts, or a better search query? I couldn't find anything, so I guess I am searching with the wrong terms.

      So do you think I could run velveth first with read 1 and then with read 2, and then what? Merge the contig files? Wouldn't the output be different? Wouldn't using both paired ends be more accurate?

      The problem is that I am supposed to find a way to extract the read names from filtered read 1, pull the matching reads out of raw read 2, and then use these two files to run velveth...

      I could have a look at Trimmomatic, but I am not sure I could get it installed on the platform any time soon.
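The step described above (take the read names from filtered read 1, then pull the matching reads out of raw read 2) can be sketched in plain Python with no extra installs. This is only a minimal sketch under some assumptions: the files are uncompressed four-line FASTQ, mates share a name apart from a trailing `/1` or `/2` or a comment after the first space, and the filter only dropped reads without reordering them. The names `read_fastq`, `pair_key` and `extract_mates` are made up for illustration.

```python
import re
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) from a four-line-per-record FASTQ."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not any(record):  # end of file (or only trailing blank lines)
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(record)


def pair_key(header):
    """Name shared by both mates: drop '@', any comment after whitespace, and a trailing /1 or /2."""
    name = header[1:].split()[0]
    return re.sub(r"/[12]$", "", name)


def extract_mates(filtered_r1, raw_r2, out_r2):
    """Write to out_r2 every read of raw_r2 whose mate is present in filtered_r1."""
    with open(filtered_r1) as fh:
        wanted = {pair_key(rec[0]) for rec in read_fastq(fh)}
    kept = 0
    with open(raw_r2) as fh, open(out_r2, "w") as out:
        for rec in read_fastq(fh):
            if pair_key(rec[0]) in wanted:
                out.write("\n".join(rec) + "\n")
                kept += 1
    return kept
```

Calling `extract_mates("Read1QualityFiltered.fastq", "Read2Raw.fastq", "Read2Matched.fastq")` (file names are placeholders) should leave read 2 with the same reads, in the same order, as filtered read 1, provided every read 1 name really has a mate in the raw read 2 file. Note that the mates recovered this way have not been quality filtered themselves.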


      • #4
        See this thread: http://seqanswers.com/forums/showthread.php?t=14708


        • #5
          Thank you I'll have a look


          • #6
            Have a look at this thread, for a python script:

            Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc



            Originally posted by flacchy

            So do you think I could run velveth first with read 1 and then with read 2, and then what? Merge the contig files? Wouldn't the output be different? Wouldn't using both paired ends be more accurate?
            Using the two files as paired reads would probably give a better assembly.

            But you can run velveth with both files as single reads:

            $ velveth dir k -fastq -short read1.fastq read2.fastq
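For the paired option, Velvet's `-shortPaired` category by default expects both mates interleaved in a single file, each read immediately followed by its mate. A minimal sketch of that interleaving step, assuming the two input files already contain exactly the same reads in the same order (the function name `interleave` and the file names are made up for illustration):

```python
from itertools import islice


def next_record(handle):
    """Return the next four FASTQ lines, each ending in a newline, or [] at end of file."""
    return [line.rstrip("\n") + "\n" for line in islice(handle, 4)]


def interleave(r1_path, r2_path, out_path):
    """Write read 1, then its mate from read 2, record by record; return the number of pairs."""
    pairs = 0
    with open(r1_path) as r1, open(r2_path) as r2, open(out_path, "w") as out:
        while True:
            rec1 = next_record(r1)
            rec2 = next_record(r2)
            if not rec1 and not rec2:
                break
            if len(rec1) != 4 or len(rec2) != 4:
                # One file ran out first, or a record is truncated.
                raise ValueError("files do not hold the same number of complete records")
            out.writelines(rec1)
            out.writelines(rec2)
            pairs += 1
    return pairs
```

The interleaved file could then be passed along the lines of `velveth dir k -fastq -shortPaired interleaved.fastq`. The sketch does not check that the names of the two mates agree, so it is only safe on files that have already been matched up.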


            • #7
              I've tried that, but it gave me this error:
              >>velveth: Right sequence file 'Read2QualityFiltered.fastq' has too few sequences
              That's why I am trying to find a way to extract the IDs of the reads from filtered read 1 (I have done that with grep), and now I need a script to pull the corresponding reads out of read 2. This is the command proposed in another forum:

              Originally posted by maasha
              You can do this with Biopieces (www.biopieces.org) like this:

              First you need a file with the FASTQ sequence names you are interested in - or IDs if you like - one per line. And then:

              Code:
              read_fastq -i in.fastq | grab -E ids.txt | write_fastq -xo out.fastq
              That needs Biopieces installed, so I am trying to get it installed... but if I could find other ways as well, that would be better, since I don't know whether that one will work on my data...

              Is it too confusing? This is the first month of my PhD and I have tons of things to learn...


              • #8
                You're doing this the complicated way... Simply filter your data with a tool that already handles paired-end reads, as mastal said.

                I find PRINSEQ pretty easy to use, with good documentation:


                It will give you a file with your reads_1, a file with your reads_2 (both of them paired), and those single good reads as a file called singletons. You can even recover the discarded reads if you wanted.

                But anyway, if you're getting 3 M reads for one of the pairs, it means your provider probably did something wrong (as long as you're not too strict in the filtering...)

                Champi
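To make the three outputs described above concrete (reads_1, reads_2 and singletons), here is a rough sketch of the same bookkeeping applied to two files that were filtered independently. It is not how PRINSEQ or Trimmomatic work internally, just an illustration. It assumes uncompressed four-line FASTQ, unique read names, mates that differ only by a trailing `/1` or `/2` or a comment after the first space, and filtering that preserved read order. The function names and the output suffixes are made up.

```python
import re
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) from a four-line-per-record FASTQ."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not any(record):
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(record)


def pair_key(header):
    """Name shared by both mates, without '@', comment, or /1 and /2 suffix."""
    return re.sub(r"/[12]$", "", header[1:].split()[0])


def split_pairs(r1_path, r2_path, prefix):
    """Write prefix_1.fastq, prefix_2.fastq (still paired) and prefix_singletons.fastq."""
    keys = []
    for path in (r1_path, r2_path):
        with open(path) as fh:
            keys.append({pair_key(rec[0]) for rec in read_fastq(fh)})
    shared = keys[0] & keys[1]  # names that survived filtering in both files
    singles = 0
    with open(prefix + "_singletons.fastq", "w") as single_out:
        for path, suffix in ((r1_path, "_1.fastq"), (r2_path, "_2.fastq")):
            with open(path) as fh, open(prefix + suffix, "w") as paired_out:
                for rec in read_fastq(fh):
                    if pair_key(rec[0]) in shared:
                        paired_out.write("\n".join(rec) + "\n")
                    else:
                        single_out.write("\n".join(rec) + "\n")
                        singles += 1
    return len(shared), singles
```

The function returns the number of intact pairs and the number of singletons, which gives a quick check on how many pairs the filtering actually left.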


                • #9
                  Thank you...
                  I will try Trimmomatic first, since it should be specific for Illumina, while PRINSEQ is specific for 454...

                  Before, I used the FASTX-Toolkit, and this is what I chose for filtering. We were trying not to be too strict, but it is still viral metagenomics...

                  fastq_quality_filter -Q33 -q 18 -p 60 -v
                  Last edited by flacchy; 05-23-2013, 06:43 AM.
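For reference, the criterion in that command is applied per read: with `-Q33 -q 18 -p 60`, a read is kept when at least 60% of its bases have a Phred+33 quality of 18 or higher. A small sketch of the same rule, assuming the thresholds are inclusive (the function name `passes_filter` is made up for illustration):

```python
def passes_filter(quality, min_qual=18, min_percent=60, offset=33):
    """Return True if at least min_percent of the bases reach min_qual.

    quality is the FASTQ quality string; offset=33 corresponds to -Q33.
    """
    if not quality:
        return False
    good = sum(1 for ch in quality if ord(ch) - offset >= min_qual)
    # Integer comparison avoids floating-point rounding at the threshold.
    return good * 100 >= min_percent * len(quality)
```

Because each file is judged on its own, a read 1 can pass while its read 2 fails, which is how the two filtered files end up with different read counts.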


                  • #10
                    PRINSEQ is not specific for 454. That is what it was originally designed for, but I have used it on my Illumina data with good results. As I said, you should read the documentation; everything is there.

                    I haven't used Trimmomatic, but I'm sure it does pretty much the same thing, so it's up to you which one to use.

                    Good luck!
