  • Number of reads in RNA-seq?

    Another biologist here.

    I got RNA-seq data (in .txt format) back from the sequencing facility. Each file, from 2 lanes of high-throughput sequencing, is nearly 25 GB.

    I want to know the number of reads in each file. Which program or code should I use?

    Thank you.

  • #2
    Try the Linux command "wc -l <filename>". It counts the number of lines in the file. If the file is in FASTQ format, each read takes up four lines, so divide the line count by 4 to get the number of reads.
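A minimal bash sketch of the same calculation wrapped in a function (the name count_fastq_reads is hypothetical). It assumes standard FASTQ records of exactly four lines each (header, sequence, '+', quality, with no line wrapping). Redirecting the file into wc makes it print only the number, and the function warns on stderr when the line count is not a multiple of 4, which suggests the file is not in that layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads = lines / 4, assuming unwrapped 4-line FASTQ records.
count_fastq_reads() {
    local file="$1" lines
    # With input redirection, wc prints the count without the filename.
    lines=$(wc -l < "$file")
    if (( lines % 4 != 0 )); then
        echo "warning: $file has $lines lines, not a multiple of 4" >&2
    fi
    echo $(( lines / 4 ))
}
```

Usage: count_fastq_reads sample.fastq (the filename is a placeholder). wc -l counts newline characters, so a file whose last line has no trailing newline will come out one line short.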



    • #3
      Why do we use -l in the command?
      Last edited by ashiq.hussain; 07-12-2011, 12:17 PM.



      • #4
        It's not -1, it's -l (lowercase L), and wc -l counts the number of lines in the file. The easiest way to find the number of reads in a FASTQ file with a single command is grep "@" -c <filename>. Hope this helps.



        • #5
          Originally posted by upendra_35:
          its not -1 it is -l and the command wc -l counts the number of lines in the file. The easiest way to find out the number of reads in a fastq file with one command is grep "@" -c <filename>. Hope this helps
          This will not work! The '@' character can appear in FASTQ quality lines as well as in the sequence ID line. See this thread for a discussion of the problems with using grep to count reads in FASTQ files.
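A small self-contained demonstration of the problem, using a made-up two-read file in which the second quality string happens to start with '@' (a legal Phred+33 character, quality 31). Both grep -c '@' and the anchored grep -c '^@' report 3, while dividing the line count by 4 gives the correct 2.

```shell
#!/usr/bin/env bash
# Toy FASTQ with two reads. The second quality string starts with '@',
# which is a legal Phred+33 quality character (Q31).
tmp=$(mktemp)
printf '@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n@@II\n' > "$tmp"

unanchored=$(grep -c '@' "$tmp")     # lines containing '@' anywhere
anchored=$(grep -c '^@' "$tmp")      # lines starting with '@'
reads=$(( $(wc -l < "$tmp") / 4 ))   # line count / 4

echo "grep -c '@'  : $unanchored"
echo "grep -c '^@' : $anchored"
echo "wc -l / 4    : $reads"
rm -f "$tmp"
```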



          • #6
            Thank you all.



            • #7
              Sorry, I missed the "^" before the @. It should be grep "^@" -c <filename>. However, I agree with kmcarr that '@' may also appear in FASTQ quality lines (sometimes at the start of the line), so even that may not work. The best way is two commands: wc -l <filename>, followed by expr <no.of.lines> / 4.
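The two steps can also be folded into a single awk pass that checks the record layout at the same time. This is a sketch, not a full FASTQ validator: the function name count_and_check_fastq is hypothetical, it assumes unwrapped four-line records, and it only checks that the first line of each record starts with '@' and the third with '+'. A quality line beginning with '@' (the fourth line of a record) is never tested, so it does not affect the count.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count 4-line FASTQ records in one awk pass and
# check that line 1 of each record starts with '@' and line 3 with '+'.
count_and_check_fastq() {
    awk '
        NR % 4 == 1 && !/^@/   { bad++ }   # header line of a record
        NR % 4 == 3 && !/^[+]/ { bad++ }   # separator line of a record
        END {
            # Warn on stderr if lines are misplaced or the total is not 4k.
            if (NR % 4 != 0 || bad > 0)
                printf("warning: %d misplaced line(s), %d lines total\n", bad, NR) > "/dev/stderr"
            print int(NR / 4)
        }
    ' "$1"
}
```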



              • #8
                compressed files

                Does it work with .gz files, or do I have to unpack them?



                • #9
                  No, use the power of piping:

                  gzip -d -c input.fastq.gz | wc -l
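Note that this pipe prints lines, not reads. Below is a sketch of a hypothetical helper (count_reads_maybe_gz) that accepts either a plain or a .gz FASTQ file, decompresses on the fly with gzip -dc so no uncompressed copy is written to disk, and divides by 4, assuming four-line records.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads in a plain or gzip-compressed FASTQ file.
# gzip -dc writes the decompressed data to stdout only.
# Assumes unwrapped 4-line records.
count_reads_maybe_gz() {
    local file="$1" lines
    case "$file" in
        *.gz) lines=$(gzip -dc "$file" | wc -l) ;;
        *)    lines=$(wc -l < "$file") ;;
    esac
    echo $(( lines / 4 ))
}
```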



                  • #10
                    I don't mean to revive a closed thread, but just in case someone needs the number of reads in multiple compressed (.bz2) FASTQ files...

                    for file in *.bz2; do echo "$file"; b=$(bzcat "$file" | wc -l); echo $((b / 4)); done
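A variant of that loop as a sketch, assuming bash and four-line records: the hypothetical summarize_reads function handles both .fastq.gz and .fastq.bz2 files in a given directory (default: the current one) and prints one tab-separated "file, reads" line per file. shopt -s nullglob stops the loop from running on a literal pattern such as *.fastq.bz2 when no file matches, and the subshell body keeps the cd and the shell option from leaking into the caller.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<file><TAB><reads>" for each *.fastq.gz and
# *.fastq.bz2 file in a directory. The ( ... ) body runs in a subshell.
summarize_reads() (
    shopt -s nullglob            # unmatched patterns expand to nothing
    cd "${1:-.}" || exit 1
    for file in *.fastq.gz *.fastq.bz2; do
        case "$file" in
            *.gz)  lines=$(gzip -dc "$file" | wc -l) ;;
            *.bz2) lines=$(bzcat "$file" | wc -l) ;;
        esac
        printf '%s\t%d\n' "$file" $(( lines / 4 ))
    done
)
```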



                    • #11
                      Great. Thanks. Very useful for me.



                      • #12
                        I didn't intend to revive this thread, but as I was searching for a way to automate FASTQ read counting, this solution may come in handy for some:

                        for i in $(find . -name "*.fastq"); do echo "$i" >> project_nbread.txt; grep -Ec "$(head -n 1 "$i" | awk -F '[@:]' '{ print $2 }')" "$i" >> project_nbread.txt; done
                        For a single file, you can use:
                        grep -Ec "$(head -n 1 file.fastq | awk -F '[@:]' '{ print $2 }')" file.fastq
                        This counts the lines that contain the ID taken from the first read's header (the text after '@' up to the first ':'), i.e. the number of FASTQ reads, provided every read in the file carries that same ID.
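The header-matching count assumes that every read in a file carries the same ID as the first read (for example, one instrument per file). Below is a sketch of an alternative, assuming bash and four-line records, that keeps the project_nbread.txt output file but counts lines / 4 and writes "path<TAB>reads" on one line per file; the function name count_project_reads is hypothetical. Using find -print0 with read -d '' keeps paths that contain spaces intact, which a for loop over find's output does not.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "<path><TAB><reads>" for every *.fastq file
# below the current directory to an output file (default project_nbread.txt).
# Counts lines / 4 rather than matching the header ID.
count_project_reads() {
    local out="${1:-project_nbread.txt}"
    # NUL-separated paths survive spaces and other unusual characters.
    find . -name '*.fastq' -print0 |
        while IFS= read -r -d '' f; do
            printf '%s\t%d\n' "$f" $(( $(wc -l < "$f") / 4 ))
        done >> "$out"
}
```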
