Looping over lines with Python
So I have a file that contains this:
SequenceName 4.6e-38 810..924
SequenceName_FGS_810..924 VAWNCRQNVFWAPLFQGPYTPARYYYAPEEPKHYQEMKQCFSQTYHGMSFCDGCQIGMCH
SequenceName 1.6e-38 887..992
SequenceName_GYQ_887..992 PLFQGPYTPARYYYAPEEPKHYQEMKQCFSQTYHGMSFCDGCQIGMCH
I want my program to read only the lines that contain these protein sequences. So far I have this, which skips the first line and reads the second one:
handle = open(filename, "r")
handle.readline()                    # skip the first (header) line
linearr = handle.readline().split()  # [name, sequence] from the second line
handle.close()
fnamealpha = fname + ".txt"
handle = open(fnamealpha, "w")
handle.write(">%s\n%s\n" % (linearr[0], linearr[1]))
handle.close()
But it only processes the first sequence, and I need it to process every line that contains a sequence, so I need a loop. How can I do that? Saving the result to a .txt file is important too, so I need a way to combine these two goals. My output with the above code is:
Okay, I think I understand your question: you want to iterate over the lines in the file, right? But only the second line of each pair (the one with the protein sequence) matters, correct? Here's my suggestion:
# the `with` context manager closes the file automatically, even if an error occurs
with open(filename, 'r') as handle:
    for line in handle:
        if line.startswith('SequenceName_'):
            print(line.split())
            # Write to file, etc.
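My reasoning is that you're only interested in lines that start with SequenceName_ (for example SequenceName_FGS_810..924), while the header lines start with "SequenceName" followed by a space.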
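Combining that filter with the file-writing part of the question could look like the minimal sketch below. It assumes every sequence line has the form "name sequence" with the name starting with SequenceName_; the function names fasta_records and write_fasta and the prefix parameter are hypothetical, chosen for illustration.

```python
def fasta_records(lines, prefix="SequenceName_"):
    """Yield one FASTA record (">name\\nsequence\\n") per matching line.

    Assumption: a sequence line looks like "SequenceName_XXX_a..b SEQUENCE".
    Header lines such as "SequenceName 4.6e-38 810..924" do not start with
    the prefix (there is a space, not an underscore, after "SequenceName").
    """
    for line in lines:
        if line.startswith(prefix):
            fields = line.split()
            if len(fields) >= 2:  # need both a name and a sequence
                yield ">%s\n%s\n" % (fields[0], fields[1])


def write_fasta(in_path, out_path, prefix="SequenceName_"):
    """Write every matching record from in_path to out_path; return the count."""
    count = 0
    # Both files are closed automatically when the with block ends.
    with open(in_path, "r") as src, open(out_path, "w") as dst:
        for record in fasta_records(src, prefix):
            dst.write(record)
            count += 1
    return count
```

With the question's variable names, the call would be write_fasta(filename, fname + ".txt"). Keeping the filtering in a generator separate from the file handling makes it easy to test the filter on a plain list of strings.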
Use readlines() and put the processing inside a for loop.
with open(filename, 'r') as fh:
    for line in fh.readlines():  # readlines is a method, so it must be called
        # do processing here
        pass
In the "# do processing here" section, you can build another list of lines to write to the other file. (Using with takes care of closing the files properly.)
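A minimal sketch of that list-based approach follows. Instead of matching on the SequenceName_ prefix, it assumes (as an illustrative choice) that a sequence line is one whose second whitespace-separated field consists only of letters, which rules out header lines like "SequenceName 4.6e-38 810..924". The names collect_fasta_lines and save_lines are hypothetical.

```python
def collect_fasta_lines(in_path):
    """Read in_path and return the FASTA lines to write, in order."""
    out_lines = []
    with open(in_path, "r") as fh:
        for line in fh.readlines():
            fields = line.split()
            # Assumption: the sequence field is letters only; header lines
            # have a number such as "4.6e-38" in that position instead.
            if len(fields) >= 2 and fields[1].isalpha():
                out_lines.append(">%s\n" % fields[0])  # FASTA header
                out_lines.append("%s\n" % fields[1])   # sequence
    return out_lines


def save_lines(out_lines, out_path):
    """Write the prepared lines to out_path in one go."""
    with open(out_path, "w") as out:
        out.writelines(out_lines)
```

Collecting everything first and writing once keeps reading and writing separate, at the cost of holding all output lines in memory; for very large files, writing inside the loop avoids that.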