Python BeautifulSoup replace img src

I'm trying to parse HTML content from a site and change the a href and img src values. The a href values are changed successfully, but the img src values are not.

The value changes in the variable, but not in the HTML (post_content), which still contains:

<p><img alt="alt text" src="https://lifehacker.ru/wp-content/uploads/2016/08/15120903sa_d2__1471520915-630x523.jpg" title="Title"/></p>

and not the expected http://site.ru... URL:

<p><img alt="alt text" src="http://site.ru/wp-content/uploads/2016/08/15120903sa_d2__1471520915-630x523.jpg" title="Title"/></p>

My code:

if "app-store" not in url:
        r = requests.get("https://lifehacker.ru/2016/08/23/kak-vybrat-trimmer/")
        soup = BeautifulSoup(r.content)

        post_content = soup.find("div", {"class": "post-content"})
        for tag in post_content():
            for attribute in ["class", "id", "style", "height", "width", "sizes"]:
                del tag[attribute]

        for a in post_content.find_all('a'):
            a['href'] = a['href'].replace("https://lifehacker.ru", "http://site.ru")

        for img in post_content.find_all('img'):
            img_urls = img['src']
            if "https:" not in img_urls:
                img_urls="http:{}".format(img_urls)
            thumb_url = img_urls.split('/')
            urllib.urlretrieve(img_urls, "/Users/kr/PycharmProjects/education_py/{}/{}".format(folder_name, thumb_url[-1]))

            file_url = "/Users/kr/PycharmProjects/education_py/{}/{}".format(folder_name, thumb_url[-1])
            data = {
                'name': '{}'.format(thumb_url[-1]),
                'type': 'image/jpeg',
            }

            with open(file_url, 'rb') as img:
                data['bits'] = xmlrpc_client.Binary(img.read())


            response = client.call(media.UploadFile(data))

            attachment_url = response['url']


            img_urls = img_urls.replace(img_urls, attachment_url)



        [s.extract() for s in post_content('script')]
        post_content_insert = bleach.clean(post_content)
        post_content_insert = post_content_insert.replace('&lt;', '<')
        post_content_insert = post_content_insert.replace('&gt;', '>')

        print post_content_insert

Answers


Looks like you're never assigning img_urls back to img['src']. Try doing that at the end of the block.

img_urls = img_urls.replace(img_urls, attachment_url)
img['src'] = img_urls

... But first, you need to change your with statement so it uses some name other than img for your file object. Right now the file object shadows the DOM element (the img tag from the for loop), so you can no longer access the tag after the with block.

        with open(file_url, 'rb') as some_file:
            data['bits'] = xmlrpc_client.Binary(some_file.read())
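
Putting both fixes together, here is a minimal sketch of the loop, written for Python 3 with bs4. It is not the asker's full script: fake_upload is a hypothetical stand-in for client.call(media.UploadFile(data)), the image bytes are faked with io.BytesIO instead of a downloaded file, and the returned URL is assumed to follow the http://site.ru pattern from the question.

```python
import io

from bs4 import BeautifulSoup

html = (
    '<div class="post-content"><p><img alt="alt text" '
    'src="https://lifehacker.ru/wp-content/uploads/2016/08/'
    '15120903sa_d2__1471520915-630x523.jpg" title="Title"/></p></div>'
)


def fake_upload(data):
    """Hypothetical stand-in for client.call(media.UploadFile(data))."""
    return {"url": "http://site.ru/wp-content/uploads/2016/08/" + data["name"]}


soup = BeautifulSoup(html, "html.parser")
post_content = soup.find("div", {"class": "post-content"})

for img in post_content.find_all("img"):
    img_url = img["src"]
    data = {"name": img_url.split("/")[-1], "type": "image/jpeg"}

    # The file object gets its own name (image_file), so the loop
    # variable img still refers to the <img> tag afterwards.
    with io.BytesIO(b"fake image bytes") as image_file:
        data["bits"] = image_file.read()

    response = fake_upload(data)

    # Rebinding a local string never touches the tree; the new URL
    # has to be written back to the tag's attribute.
    img["src"] = response["url"]

result = str(post_content)
print(result)
```

The key line is img["src"] = response["url"]: strings are immutable, so calling replace on img_urls only produces a new string, and the tag keeps its old src until the attribute itself is reassigned.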
