Python File System Bottleneck, how can I fix this?

So basically I have a folder that looks like this:

MyFolder\
          data_1.txt
          data_2.txt
          data_3.txt
          ...
          data_very_large_number.txt

I want to process each of the files. My plan was to run 10 instances of a script that each process 1/10th of the files.

So basically, I did the following:

 python script.py 1
 python script.py 2
 ...
 python script.py 10

But I'm noticing that only the first instance of script.py is actually processing anything. The second instance only starts processing after the first one is done. I am guessing that this is a file system bottleneck.

Does anyone have an idea how to tackle this issue with Python?

Answers


There are many ways to run these scripts in parallel, but if you want to keep starting them manually from the command line you should do it like this:

python script.py 1 &
python script.py 2 &

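and so on. In a Unix-like shell, the trailing & puts each command in the background, so the shell starts the next one immediately instead of waiting for the previous one to finish. Without it the instances run one after another, which matches what you are seeing.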
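If you would rather not type ten commands, the same thing can be done from a small Python launcher. This is only a sketch: launch_instances is a made-up helper name, and it assumes script.py takes the instance number as its only argument, as in the question.

```python
import subprocess
import sys


def launch_instances(script="script.py", count=10):
    """Start `count` copies of `script` at the same time, then wait for all.

    Each copy receives its 1-based instance number as the only argument,
    mirroring `python script.py 1 &`, `python script.py 2 &`, and so on.
    Returns the list of exit codes in instance order.
    """
    # Popen returns immediately, so all processes are running before we wait.
    procs = [
        subprocess.Popen([sys.executable, script, str(i)])
        for i in range(1, count + 1)
    ]
    return [proc.wait() for proc in procs]
```

Calling launch_instances("script.py", 10) should behave roughly like the ten backgrounded commands above, except that it also waits until every instance has exited.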


If you are working with a large number of files that together fit into system memory, you can get a significant performance improvement by using a ramdisk. Check this out:

http://www.cyberciti.biz/faq/howto-create-linux-ram-disk-filesystem/

To create a ramdisk, simply run the following as root (the leading # is the root prompt, not a comment):

# mkfs -q /dev/ram1 8192
# mkdir -p /ramcache
# mount /dev/ram1 /ramcache
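The ramdisk only helps once the files are actually on it, so they have to be copied over before processing. A minimal sketch of that step, assuming the /ramcache mount point from the commands above and the MyFolder layout from the question; stage_files is a hypothetical helper name, and the ramdisk must be large enough to hold the files.

```python
import glob
import os
import shutil


def stage_files(src_dir, dst_dir, pattern="data_*.txt"):
    """Copy every file in src_dir matching `pattern` into dst_dir.

    Returns the list of destination paths, sorted by source file name.
    """
    os.makedirs(dst_dir, exist_ok=True)
    staged = []
    for path in sorted(glob.glob(os.path.join(src_dir, pattern))):
        # copy2 returns the path of the newly created file.
        staged.append(shutil.copy2(path, dst_dir))
    return staged
```

For example, stage_files("MyFolder", "/ramcache") would copy the data files over, after which the scripts can read from /ramcache instead of MyFolder.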

Use a Queue and threading.

import glob
import queue
import threading

# Fill the queue up front so the workers can drain it with get_nowait().
q = queue.Queue()
for file in glob.glob(r"MyFolder\data_*.txt"):
    q.put(file)  # Queue has put(), not add()

class doStuff(threading.Thread):
    def __init__(self, q):
        super().__init__()
        self.q = q

    def run(self):
        while True:
            try:
                file = self.q.get_nowait()
            except queue.Empty:
                return  # queue is drained, end thread
            # DO STUFF WITH YOUR FILE

threads = [doStuff(q) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for the workers; daemon threads without join() die when the script exits
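A shorter variant of the same idea, not part of the answer above: the standard library's concurrent.futures.ThreadPoolExecutor does the queueing and joining for you. In this sketch process_file is a hypothetical stand-in for the real per-file work, and process_all is likewise a made-up name.

```python
from concurrent.futures import ThreadPoolExecutor


def process_file(path):
    """Hypothetical placeholder for the real work: return the file size."""
    with open(path, "rb") as handle:
        return len(handle.read())


def process_all(paths, worker=process_file, max_workers=10):
    """Run `worker` on every path using a pool of threads.

    Results come back in the same order as `paths`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, paths))
```

With the folder from the question, process_all(glob.glob(r"MyFolder\data_*.txt")) would hand the files to ten worker threads (after an import glob).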
