What is the correct way to slice a unicode string in Python?

I'm new to Python and playing around with the Scrapy web crawler. I want to grab the first 10 characters of a description string and use them as a title.

The following fragment of Python code produces the item shown below:

item['image'] = img.xpath('@src').extract()
item_desc = img.xpath('@title').extract()
print(item_desc)
item['description'] = item_desc
item['title'] = item_desc[:10]
item['parentUrl'] = response.url

{'description': [u'CHAR-BROIL Tru-Infrared 350 IR Gas Grill - SportsAuthority.com '],
 'image': [u'http://www.sportsauthority.com/graphics/product_images/pTSA-10854895t130.jpg'],
 'parentUrl': 'http://www.sportsauthority.com/category/index.jsp?categoryId=3077576&clickid=topnav_Jerseys+%26+Fan+Shop',
 'title': [u'CHAR-BROIL Tru-Infrared 350 IR Gas Grill - SportsAuthority.com ']}

What I would like is the output below. The slice isn't behaving as I'd expect:

{'description': [u'CHAR-BROIL Tru-Infrared 350 IR Gas Grill - SportsAuthority.com '],
 'image': [u'http://www.sportsauthority.com/graphics/product_images/pTSA-10854895t130.jpg'],
 'parentUrl': 'http://www.sportsauthority.com/category/index.jsp?categoryId=3077576&clickid=topnav_Jerseys+%26+Fan+Shop',
 'title': [u'CHAR-BROIL']}

Answers


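item_desc is a list with one element in it, and that element is a unicode string; it is not a unicode string itself. The square brackets [...] around the value in the output are a big hint. Slicing a list with [:10] takes up to 10 elements, so a one-element list comes back unchanged.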
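A quick way to see the difference, assuming Python 3 and a sample list shaped like the .extract() output above (the value is copied from the question, not fetched from the site):

```python
# Sample value shaped like the result of .extract(): a list holding one string.
item_desc = [u'CHAR-BROIL Tru-Infrared 350 IR Gas Grill - SportsAuthority.com ']

# Slicing the list takes up to 10 *elements*; there is only one element,
# so the whole list comes back unchanged.
list_slice = item_desc[:10]

# Indexing first gets the string; slicing that takes its first 10 characters.
string_slice = item_desc[0][:10]

print(list_slice)    # ['CHAR-BROIL Tru-Infrared 350 IR Gas Grill - SportsAuthority.com ']
print(string_slice)  # CHAR-BROIL
```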

Get the element out, slice, and put it back in a list:

item['title'] = [item_desc[0][:10]]
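The same idea extends to several matches. A minimal sketch that slices every string while keeping the list shape; the second title here is made up purely for illustration:

```python
# Hypothetical .extract() result with two matches (the second one is invented).
item_desc = [u'CHAR-BROIL Tru-Infrared 350 IR Gas Grill - SportsAuthority.com ',
             u'Another product title from a second match']

item = {}
# Single-match version from the answer: first element, sliced, re-wrapped.
item['title'] = [item_desc[0][:10]]
print(item['title'])  # ['CHAR-BROIL']

# Slice each string to its first 10 characters, keeping one entry per match.
item['all_titles'] = [desc[:10] for desc in item_desc]
print(item['all_titles'])  # ['CHAR-BROIL', 'Another pr']
```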

Because .extract() returns a list, it can return more than one match. If you are expecting just one match, you could also pick the first one:

item['image'] = img.xpath('@src').extract()[0]
item_desc = img.xpath('@title').extract()[0]
item['description'] = item_desc
item['title'] = item_desc[:10]
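Indexing with [0] assumes there is at least one match. This small sketch (first_match_title is a hypothetical name, and the lists stand in for .extract() results) shows what happens on an empty list, which is what the empty-list test guards against:

```python
def first_match_title(matches, length=10):
    # Mirrors .extract()[0][:10]: raises IndexError when there are no matches.
    desc = matches[0]
    return desc[:length]

print(first_match_title([u'CHAR-BROIL Tru-Infrared 350 IR Gas Grill']))  # CHAR-BROIL

try:
    first_match_title([])
except IndexError as exc:
    print('no match:', exc)  # no match: list index out of range
```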

If your XPath queries do not always return a result, test for an empty list first:

img_match = img.xpath('@src').extract()
item['image'] = img_match[0] if img_match else ''
item_desc = img.xpath('@title').extract()
item['description'] = item_desc[0] if item_desc else ''
item['title'] = item_desc[0][:10] if item_desc else ''
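If you do this for many fields, the repeated conditional can be pulled into a helper. A sketch, where first_or_default and build_item are hypothetical names (not Scrapy APIs) and the arguments stand in for .extract() results:

```python
def first_or_default(values, default=''):
    """Return the first element of a list, or `default` if the list is empty."""
    return values[0] if values else default


def build_item(src_matches, title_matches, title_length=10):
    """Build the item dict from two lists of XPath matches."""
    item = {}
    item['image'] = first_or_default(src_matches)
    description = first_or_default(title_matches)
    item['description'] = description
    # Slicing '' gives '', so the title needs no separate empty check.
    item['title'] = description[:title_length]
    return item


found = build_item([u'http://example.com/grill.jpg'],
                   [u'CHAR-BROIL Tru-Infrared 350 IR Gas Grill'])
missing = build_item([], [])
print(found['title'])  # CHAR-BROIL
print(missing)         # {'image': '', 'description': '', 'title': ''}
```

Newer Scrapy versions also offer extract_first() on selector lists, which returns the first match or None (a different default can be passed); check the documentation for the version you are running.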
