How do I efficiently crossmatch two ASCII catalogs?

I have two ASCII text files with columnated data. The first column of both files is a 'name' that is consistent across both files. One file has some 6000 rows, the other only has 800. Without doing a for line in file.readlines(): approach - e.g.,

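with open('big_file.txt') as catalogue:
    with open('small_file.txt') as targets:
        for tline in targets.readlines()[2:]:   # skip 2 header lines
            name = tline.split()[0]

            # rewind before every scan, so a name with no match
            # does not leave the catalogue stuck at end-of-file
            catalogue.seek(0)
            for cline in catalogue.readlines()[8:]:   # skip 8 header lines
                if name == cline.split()[0]:
                    print(cline)
                    break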

is there an efficient way to return only the rows (or lines) from the larger file that also appear in the smaller file (using the 'name' as the check)?

It's okay if it is one row at a time, for, say, a file.write(matching_line). The idea is to create a third file with all the info from the large file for only the objects that are in the small file.

Answers


for line in file.readlines() is not inherently bad. What's bad is the nested loops you have there. You can use a set to keep track of and check all the names in the smaller file:

s = set()
for line in targets:
    s.add(line.split()[0])

Then, just loop through the bigger file and check if the name is in s:

for line in catalogue:
    if line.split()[0] in s:
        print(line)
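
Putting the two loops together, here is a minimal sketch that writes the matching rows to a third file, as the question asks. The function name crossmatch, its parameter names, and the output filename are made up for illustration; the default header counts (8 and 2) are taken from the [8:] and [2:] slices in the question and assume those lines are headers to be skipped. Blank lines are ignored so that split()[0] cannot raise an IndexError.

```python
from itertools import islice


def crossmatch(catalogue_path, targets_path, out_path,
               catalogue_header=8, targets_header=2):
    """Copy catalogue rows whose first column also appears in the targets file.

    Returns the number of rows written. All names here are hypothetical.
    """
    # Collect the names from the small file, skipping header and blank lines.
    with open(targets_path) as targets:
        names = {line.split()[0]
                 for line in islice(targets, targets_header, None)
                 if line.strip()}

    matched = 0
    # Read the big file once; a set membership test is O(1) on average,
    # so the total work is roughly one pass over each file.
    with open(catalogue_path) as catalogue, open(out_path, 'w') as out:
        for line in islice(catalogue, catalogue_header, None):
            fields = line.split()
            if fields and fields[0] in names:
                out.write(line)  # line already ends with its newline
                matched += 1
    return matched
```

Calling crossmatch('big_file.txt', 'small_file.txt', 'matches.txt') would then leave the matching rows in matches.txt, in the order they appear in the big file. With 6000 and 800 rows either approach finishes quickly, but this one reads each file only once instead of rescanning the catalogue for every name.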
