How to find and replace X lines in one text file with Y lines in another text file?

I asked a previous question, "How to find and replace multiple lines in text file?", but it was unclear, so I'm asking a more specific one.

I have Python 2.7.

I have three text files: data.txt, find.txt and replace.txt.

data.txt is a file of about 1MB with several thousand lines. find.txt contains X lines that I want to find in data.txt and replace with the Y lines in replace.txt. X and Y may or may not be equal.

For example:

data.txt

pumpkin
apple
banana
cherry
himalaya
skeleton
apple
banana
cherry
watermelon
fruit

find.txt

apple
banana
cherry

replace.txt

1
2
3
4
5

So, in the above example, I want to find every occurrence of the consecutive lines apple, banana, cherry in the data and insert 1, 2, 3, 4, 5 in their place.

So, the resulting data.txt would look like:

pumpkin
1
2
3
4
5
himalaya
skeleton
1
2
3
4
5
watermelon
fruit

Or, if replace.txt had fewer lines than find.txt (for example, only 1 and 2):

pumpkin
1
2
himalaya
skeleton
1
2
watermelon
fruit

I'm not sure of the right approach; since my data.txt is about 1MB, I want to be as efficient as possible. One dumb way is to concatenate each file into one long string, call replace, and write the result to a new text file, which keeps all the line breaks.

data = open("data.txt", 'r')
find = open("find.txt", 'r')
replace = open("replace.txt", 'r')

data_str = ""
find_str = ""
replace_str = "" 

for line in data: # concatenate it into one long string
    data_str += line

for line in find: # concatenate it into one long string
    find_str += line

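for line in replace: # concatenate it into one long string
    replace_str += line

data.close()
find.close()
replace.close()

# replace using the strings that were read, not the file objects
new_data = data_str.replace(find_str, replace_str)
new_file = open("new_data.txt", "w")
new_file.write(new_data)
new_file.close()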

But this seems so convoluted and inefficient for a large data file like mine.
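For comparison, here is a minimal sketch of a streaming approach (an illustration, not the asker's code). It reads data.txt one line at a time and keeps a buffer no longer than find.txt, so memory use does not grow with the data file. Because it compares whole lines, it also avoids a pitfall of a plain str.replace on the joined text: "apple\nbanana\ncherry\n" is also a substring of "pineapple\nbanana\ncherry\n", so the string version would rewrite part of the "pineapple" line. The names replace_stream and replace_file are hypothetical.

```python
def replace_stream(lines, find, replace):
    """Yield lines, swapping every run equal to `find` for the lines in `replace`.

    `lines` is any iterable of lines without line endings; `find` and
    `replace` are lists of lines (hypothetical helper, not a library API).
    """
    if not find:
        raise ValueError("find must contain at least one line")
    buf = []  # invariant: buf is always a prefix of find
    for line in lines:
        buf.append(line)
        # Drop lines from the front until buf is again a prefix of find;
        # a dropped line cannot start a match, so it is emitted unchanged.
        while buf and buf != find[:len(buf)]:
            yield buf.pop(0)
        if len(buf) == len(find):
            # Full match: emit the replacement block instead.
            for new_line in replace:
                yield new_line
            buf = []
    # Flush a partial match left over at the end of the input.
    for line in buf:
        yield line


def replace_file(data_path, find_path, replace_path, out_path):
    """Stream data_path to out_path, replacing the find block (hypothetical helper)."""
    with open(find_path) as f:
        find = f.read().splitlines()
    with open(replace_path) as f:
        replace = f.read().splitlines()
    with open(data_path) as src, open(out_path, "w") as dst:
        # Strip line endings so a missing final newline in find.txt does not matter.
        stripped = (line.rstrip("\r\n") for line in src)
        for line in replace_stream(stripped, find, replace):
            dst.write(line + "\n")
```

With the files from the question, replace_file("data.txt", "find.txt", "replace.txt", "new_data.txt") would write the expected output. The buffer never holds more than X lines, so only one line of data.txt beyond that is in memory at a time.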

Here is pseudo-code for what I have in mind:

(x,y) = find_lines(data.txt, find.txt) # returns the line numbers in data.txt that contain the lines of find.txt
replace_data_between(x, y, data.txt, replace.txt) # replaces the data between lines x and y with replace.txt

def find_lines(...):
    location = 0

    LOOP1: 
    for find_line in find:
        for i, data_line in enumerate(data).startingAtLine(location):
            if find_line == data_line:
                location = i # found possibility

    for idx in range(NUMBER_LINES_IN_FIND):
        if find_line[idx] != data_line[idx+location]:  # compare line by line
            # if the subsequent lines don't match, then go back and search again
            goto LOOP1

As you can see, I am having trouble with the logic of this all. Can someone point me in the right direction?
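The two steps in the pseudo-code can be made runnable without goto: scan for the starting index of each match, skip past a match once it is found, then rebuild the list. This is a minimal sketch that assumes all lines are loaded into memory (fine for a 1MB file). It reuses the names find_lines and replace_data_between from the pseudo-code, but the signatures are assumptions: find_lines returns a list of (start, end) pairs, and both functions work on lists of lines rather than file names.

```python
def find_lines(data, find):
    """Return (start, end) index pairs where data[start:end] == find.

    Matches do not overlap; the scan resumes right after each match.
    """
    n = len(find)
    matches = []
    i = 0
    while n and i + n <= len(data):
        if data[i:i + n] == find:
            matches.append((i, i + n))
            i += n  # skip past this match
        else:
            i += 1  # no match starting here: try the next line
    return matches


def replace_data_between(data, matches, replace):
    """Return a new list with each data[start:end] span replaced by `replace`."""
    result = []
    prev = 0
    for start, end in matches:
        result.extend(data[prev:start])  # unchanged lines before the match
        result.extend(replace)           # the replacement block
        prev = end
    result.extend(data[prev:])  # unchanged lines after the last match
    return result
```

For example, with lines = open("data.txt").read().splitlines() and find/replace read the same way, replace_data_between(lines, find_lines(lines, find), replace) gives the resulting lines, which can then be written back with "\n".join(...).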

Answers


If the files are small enough to do this in RAM...

I would first map the find:replace relationship:

find_replace_dict = {find_string:replace_string}

Then I would walk through the data file:

with open('data.txt') as data_file, open('output_file', 'wt') as of:
    for line in data_file:
        if line in find_replace_dict:  # direct dict lookup, no need for .keys()
            of.write(find_replace_dict[line])
        else:
            of.write(line)
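The dictionary above compares one line at a time, so on its own it cannot match a multi-line block like the three lines in find.txt. One way to extend the same idea (an assumption, not part of the original answer) is to key the dictionary on tuples of lines and, at each position, try each key against the next len(key) lines. The name replace_blocks is hypothetical.

```python
def replace_blocks(data, find_replace_dict):
    """Replace multi-line blocks using {tuple_of_find_lines: list_of_replace_lines}.

    Longer keys are tried first, so a block wins over a shorter block it contains.
    """
    keys = sorted((k for k in find_replace_dict if k), key=len, reverse=True)
    result = []
    i = 0
    while i < len(data):
        for key in keys:
            if tuple(data[i:i + len(key)]) == key:
                result.extend(find_replace_dict[key])
                i += len(key)  # skip past the matched block
                break
        else:
            # No key matched at this position: keep the line unchanged.
            result.append(data[i])
            i += 1
    return result
```

This also allows several find/replace pairs in one pass, at the cost of checking every key at every line.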
