Validating URLs in Python
I've been trying to figure out the best way to validate a URL (specifically in Python) but haven't really been able to find an answer. It seems like there isn't one known way to validate a URL, and that it depends on which URLs you think you may need to validate. I also found it difficult to find an easy-to-read standard for URL structure. I did find RFCs 3986 and 3987, but they contain much more than just how a URL is structured.
Am I missing something, or is there no one standard way to validate a URL?
This looks like it might be a duplicate of How do you validate a URL with a regular expression in Python?
You should be able to use the urlparse function (from the urllib.parse module) described there.
>>> from urllib.parse import urlparse  # Python 2: from urlparse import urlparse
>>> urlparse('actually not a url')
ParseResult(scheme='', netloc='', path='actually not a url', params='', query='', fragment='')
>>> urlparse('http://google.com')
ParseResult(scheme='http', netloc='google.com', path='', params='', query='', fragment='')
Call urlparse on the string you want to check, then make sure the resulting ParseResult has non-empty scheme and netloc attributes.
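Putting that together, a minimal sketch (the helper name here is my own, not something from the urllib docs):

```python
from urllib.parse import urlparse

def is_probably_url(value):
    # A string only counts as a URL here if it parses with both a
    # scheme (e.g. 'http') and a netloc (e.g. 'google.com').
    parsed = urlparse(value)
    return bool(parsed.scheme) and bool(parsed.netloc)

print(is_probably_url('http://google.com'))   # True
print(is_probably_url('actually not a url'))  # False
```

Note that this check is purely syntactic: it says nothing about whether the host actually exists.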
The original question is a bit old, but you might also want to look at the Validator-Collection library I released a few months back. It includes high-performing regex-based validation of URLs for compliance against the RFC standard. Some details:
- Tested against Python 2.7, 3.4, 3.5, 3.6
- No dependencies on Python 3.x, one conditional dependency in Python 2.x (drop-in replacement for Python 2.x's buggy re module)
- Unit tests that cover ~80 different succeeding/failing URL patterns, including non-standard characters and the like. As close to covering the whole spectrum of the RFC standard as I've been able to find.
It's also very easy to use:
from validator_collection import validators, checkers

checkers.is_url('http://www.stackoverflow.com')
# Returns True

checkers.is_url('not a valid url')
# Returns False

value = validators.url('http://www.stackoverflow.com')
# value set to 'http://www.stackoverflow.com'

value = validators.url('not a valid url')
# raises a validator_collection.errors.InvalidURLError (which is a ValueError)
In addition, Validator-Collection includes 60+ other validators, covering domains and email addresses as well, which folks might find useful.
I would use the validators package. Here is the link to the documentation and installation instructions.
It is just as simple as
import validators

url = 'YOUR URL'
validators.url(url)
It returns True if the URL is valid, and a falsy failure object if it is not.
You can also validate with urllib.request by passing the URL to the urlopen function and catching the URLError exception.
from urllib.request import urlopen
from urllib.error import URLError  # URLError's canonical home is urllib.error

def validate_web_url(url="http://google"):
    try:
        urlopen(url)
        return True
    except URLError:
        return False
This would return False in this case. Note that urlopen actually makes a network request, so this approach tests that the URL is reachable, not just that it is well-formed.
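To see the contrast with the purely syntactic approach: the default "http://google" parses fine with urlparse, it just doesn't resolve. A quick sketch:

```python
from urllib.parse import urlparse

# 'http://google' is syntactically a URL (it has a scheme and a netloc),
# even though urlopen would fail to reach it over the network.
parsed = urlparse('http://google')
print(parsed.scheme, parsed.netloc)  # http google
```

So whether a string "validates" depends on whether you care about well-formedness or reachability.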
Assuming you are using Python 3, you could use urllib. The code would go something like this:
import urllib.request as req

def foo():
    url = 'http://bar.com'
    request = req.Request(url)
    try:
        response = req.urlopen(request)
        # response is an http.client.HTTPResponse; response.read() returns
        # the page's HTML, which you can search through
    except req.URLError:
        # The URL wasn't valid (or couldn't be reached)
        pass
If the line "response = ..." raises no exception, then the URL is valid (and reachable).