Looping over lines with Python

So I have a file that contains this:

SequenceName 4.6e-38 810..924
SequenceName_FGS_810..924 VAWNCRQNVFWAPLFQGPYTPARYYYAPEEPKHYQEMKQCFSQTYHGMSFCDGCQIGMCH
SequenceName 1.6e-38 887..992
SequenceName_GYQ_887..992 PLFQGPYTPARYYYAPEEPKHYQEMKQCFSQTYHGMSFCDGCQIGMCH

I want my program to read only the lines that contain the protein sequences. So far I have this, which skips the first line and reads the second one:

handle = open(filename, "r")
handle.readline()
linearr = handle.readline().split()
handle.close()

fnamealpha = fname + ".txt"
handle = open(fnamealpha, "w")
handle.write(">%s\n%s\n" % (linearr[0], linearr[1]))
handle.close()

But it only processes the first sequence, and I need it to process every line that contains a sequence. I assume I need a loop, but how do I write it? Saving the result to a txt file is just as important, so I need a way to combine the two. My output with the above code is:

>SequenceName_FGS_810..924
VAWNCRQNVFWAPLFQGPYTPARYYYAPEEPKHYQEMKQCFSQTYHGMSFCDGCQIGMCH

Answers


If I understand the question, you want to iterate over the lines in the file, but only the second line of each pair, the one with the protein sequence, matters. Here's my suggestion:

# The `with` context manager closes the file automatically, even on error
with open(filename, 'r') as handle:
    for line in handle:
        if line.startswith('SequenceName_'):
            print(line.split())
            # Write to file, etc.

My reasoning is that you're only interested in lines that start with SequenceName_###.
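To combine the filtering with the file-writing step from the question, something like the following might work. It is only a sketch: the function name `extract_sequences` and its parameters are made up for illustration, and it assumes every sequence line has the name and the sequence as its first two whitespace-separated fields and that all records go into a single output file.

```python
def extract_sequences(in_path, out_path, prefix="SequenceName_"):
    """Copy each line whose first field starts with `prefix` to out_path.

    Hypothetical helper: writes ">name" then the sequence on the next line,
    the same layout the question's code produces. Returns the record count.
    """
    count = 0
    # Both files are closed automatically when the block ends.
    with open(in_path, "r") as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.split()
            # Skip blank lines and the "SequenceName <score> <range>" lines.
            if len(fields) >= 2 and fields[0].startswith(prefix):
                dst.write(">%s\n%s\n" % (fields[0], fields[1]))
                count += 1
    return count
```

With the variables from the question, the call would be `extract_sequences(filename, fname + ".txt")`.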


Use readlines and throw it all into a for loop.

with open(filename, 'r') as fh:
    for line in fh.readlines():
        pass  # do processing here

In the #do processing here section, you can build up another list of lines to write to the other file. (Using with ensures the file is closed properly.)
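Here is a rough sketch of that "build a list, then write it" idea. The names `format_records` and `convert` are invented for this example, and the filter rests on an assumption taken from the sample data: sequence lines have exactly two whitespace-separated fields, while the score lines have three.

```python
def format_records(lines):
    """Return a list of ">name\\nsequence\\n" strings, one per sequence line."""
    records = []
    for line in lines:
        fields = line.split()
        # Assumption: only sequence lines split into exactly two fields.
        if len(fields) == 2:
            records.append(">%s\n%s\n" % (fields[0], fields[1]))
    return records


def convert(in_path, out_path):
    """Read in_path fully, then write the formatted records to out_path."""
    with open(in_path, "r") as fh:
        records = format_records(fh.readlines())
    with open(out_path, "w") as out:
        out.writelines(records)
```

Because `readlines()` loads the whole file into memory, this suits small inputs; for large files, iterating over the file object directly is probably the better choice.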

