Python Word Count of Text File

I'm trying to count how often words appear in a text file using a Python function. I can get the frequency of every word, but I want counts only for specific words, which I keep in a list. Here's what I have so far, but I'm stuck:

    def repeatedWords(fname):
        with open(fname) as f:
            wordcount = {}
            for word in word_list:
                for word in f.read().split():
                    if word not in wordcount:
                        wordcount[word] = 1
                    else:
                        wordcount[word] += 1
                for k, v in wordcount.items():
                    print(k, v)

    word_list = ['Emma', 'Woodhouse', 'father', 'Taylor', 'Miss', 'been', 'she', 'her']
    repeatedWords('file.txt')

Updated, still showing all words:

    def repeatedWords(fname, word_list):
        with open(fname) as f:
            wordcount = {}
            for word in word_list:
                for word in f.read().split():
                    wordcount[word] = wordcount.get(word, 0) + 1

        for k, v in wordcount.items():
            print(k, v)

    word_list = ['Emma', 'Woodhouse', 'father', 'Taylor', 'Miss', 'been', 'she', 'her']
    repeatedWords('Emma.txt', word_list)

Answers
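One thing worth checking in the question's code: nothing tests whether a word is in word_list, so every word gets counted, and f.read() only returns the file's text once, so the nested loop does just one real pass. Below is a minimal sketch of a fix, not the asker's code. The function name repeated_words and the io.StringIO sample text are illustrative assumptions; the function takes an already-open file object so it can be tried without a file on disk.

```python
import io


def repeated_words(f, word_list):
    """Count only the words from word_list found in the open text file f."""
    wanted = set(word_list)          # set membership tests are fast
    wordcount = {}
    for word in f.read().split():    # read the file contents exactly once
        if word in wanted:           # ignore every word not in word_list
            wordcount[word] = wordcount.get(word, 0) + 1
    return wordcount


# A file object remembers its position: after one read() it sits at the end,
# so a second read() returns an empty string. This is why a nested loop that
# calls f.read() repeatedly only sees the text on its first pass.
sample = io.StringIO("Emma saw her father")
first = sample.read()
second = sample.read()
print(repr(first), repr(second))  # 'Emma saw her father' ''

word_list = ['Emma', 'Woodhouse', 'father', 'Taylor', 'Miss', 'been', 'she', 'her']
text = io.StringIO("Emma saw her father and Emma smiled at her")
print(repeated_words(text, word_list))  # {'Emma': 2, 'her': 2, 'father': 1}
```

With a real file you would call it as `with open('Emma.txt') as f: print(repeated_words(f, word_list))`. Because split() only splits on whitespace, a token like "Emma," (with the comma attached) is not counted as "Emma", so punctuation still needs handling.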


So you want the frequency of only the specific words in that list (Emma, Woodhouse, Father...)? If so, this code might help (try running it):

    word_list = ['Emma','Woodhouse','father','Taylor','Miss','been','she','her']
    # I'm using this example text in place of your file
    text = 'This is an example text. It will contain words you are looking for, like Emma, Emma, Emma, Woodhouse, Woodhouse, Father, Father, Taylor,Miss,been,she,her,her,her. I made them repeat to show that the code works.'
    text = text.replace(',',' ') # these statements remove irrelevant punctuation
    text = text.replace('.','')
    text = text.lower() # make all words lowercase so that capitalization won't affect the frequency count

    for repeatedword in word_list:
        counter = 0 #counter starts at 0
        for word in text.split():
            if repeatedword.lower() == word:
                counter = counter + 1 #add 1 every time there is a match in the list
        print(repeatedword,':', counter) #prints the word from 'word_list' and its frequency

The output shows the frequency of only the words in the list you provided, which is what you wanted, right?

The output produced when run in Python 3 is:

    Emma : 3
    Woodhouse : 2
    father : 2
    Taylor : 1
    Miss : 1
    been : 1
    she : 1
    her : 3
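The replace() calls above only strip commas and periods. Here is a sketch of the same idea using the standard library instead, assuming that re.findall(r"\w+", ...) is an acceptable definition of a word (it splits on any non-word character, so an apostrophe splits "won't" into two tokens) and using collections.Counter for the tally. The name count_listed_words is hypothetical.

```python
import re
from collections import Counter


def count_listed_words(text, word_list):
    """Return {word: count} for each entry of word_list, ignoring case.

    Assumption: a "word" is any run of letters, digits or underscores,
    so punctuation such as "Emma," or "her." is dropped automatically.
    """
    words = re.findall(r"\w+", text.lower())
    counts = Counter(words)
    # Counter returns 0 for missing keys, so every listed word gets an entry.
    return {w: counts[w.lower()] for w in word_list}


word_list = ['Emma', 'Woodhouse', 'father', 'Taylor', 'Miss', 'been', 'she', 'her']
text = ('This is an example text. It will contain words you are looking for, '
        'like Emma, Emma, Emma, Woodhouse, Woodhouse, Father, Father, '
        'Taylor,Miss,been,she,her,her,her. I made them repeat to show that '
        'the code works.')

for word, n in count_listed_words(text, word_list).items():
    print(word, ':', n)
```

For a file, read it once and pass the contents in, e.g. `with open('Emma.txt') as f: counts = count_listed_words(f.read(), word_list)`. On the example text this prints the same counts as the loop above.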

The best way to deal with this is to use the get method of Python dictionaries, like this:

    def repeatedWords(fname):
        wordcount = {}
        # Example list of words not needed
        nonwordlist = ['father', 'Miss', 'been']
        with open(fname) as f:
            for word in f.read().split():
                if word not in nonwordlist:
                    wordcount[word] = wordcount.get(word, 0) + 1
        return wordcount


    # Put these outside the function repeatedWords
    for k, v in repeatedWords('Emma.txt').items():
        print(k, v)

Here is the same get pattern on its own, applied to word_list:

    word_list = ['Emma', 'Woodhouse', 'father', 'Taylor', 'Miss', 'been', 'she', 'her']
    newDict = {}
    for newWord in word_list:
        newDict[newWord] = newDict.get(newWord, 0) + 1

    print(newDict)

The line wordcount[word] = wordcount.get(word, 0) + 1 first looks up word in the wordcount dictionary. If the word is already there, get returns its current count and 1 is added to it. If the word is not there yet, get returns the default 0, so the stored value becomes 1, recording the first occurrence of that word.
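To see that get behaviour in isolation, here is a small sketch on a made-up three-word list. It also shows the KeyError that plain indexing raises for a key that is not there yet, and that collections.Counter produces the same tally.

```python
from collections import Counter

words = ['her', 'Emma', 'her']

wordcount = {}
for word in words:
    # First sighting: get() returns the default 0, so the count becomes 1.
    # Later sightings: get() returns the stored count, which is bumped by 1.
    wordcount[word] = wordcount.get(word, 0) + 1
print(wordcount)  # {'her': 2, 'Emma': 1}

# Without get(), incrementing a key that is not there yet fails.
empty = {}
try:
    empty['her'] += 1
except KeyError as exc:
    print('KeyError:', exc)  # KeyError: 'her'

# collections.Counter performs the same counting in one call.
print(dict(Counter(words)) == wordcount)  # True
```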