YouTube ID parsing for new URL formats

This question has been asked before and I found this:

Reg exp for youtube link

but I'm looking for something slightly different.

I need to match the YouTube ID itself, in a way that is compatible with all the possible YouTube link formats, not only those beginning with youtube.com.

For example:

http://www.youtube.com/watch?v=-wtIMTCHWuI

http://www.youtube.com/v/-wtIMTCHWuI?version=3&autohide=1

http://youtu.be/-wtIMTCHWuI

http://www.youtube.com/oembed?url=http%3A//www.youtube.com/watch?v%3D-wtIMTCHWuI&format=json

http://s.ytimg.com/yt/favicon-wtIMTCHWuI.ico

http://i2.ytimg.com/vi/-wtIMTCHWuI/hqdefault.jpg

Is there a clever strategy I can use to match the video ID -wtIMTCHWuI that is compatible with all these formats? I'm thinking of character counting and matching the = ? / . & characters.

Answers


I had to deal with this for a PHP class I wrote a few weeks ago and ended up with a regex that matches many kinds of strings: with or without a URL scheme, with or without a subdomain, youtube.com URLs, youtu.be URLs, and any ordering of the query parameters. You can check it out on GitHub or simply copy and paste the code block below:

/**
 *  Check if input string is a valid YouTube URL
 *  and try to extract the YouTube Video ID from it.
 *  @author  Stephan Schmitz <eyecatchup@gmail.com>
 *  @param   $url   string   The string that shall be checked.
 *  @return  mixed           Returns YouTube Video ID, or (boolean) false.
 */
function parse_yturl($url)
{
    $pattern = '#^(?:https?://|//)?(?:www\.|m\.)?(?:youtu\.be/|youtube\.com/(?:embed/|v/|watch\?v=|watch\?.+&v=))([\w-]{11})(?![\w-])#';
    preg_match($pattern, $url, $matches);
    return (isset($matches[1])) ? $matches[1] : false;
}

Test cases: https://3v4l.org/GEDT0

JavaScript version: https://stackoverflow.com/a/10315969/624466

To explain the regex, here's a split-up version:

/**
 *  Check if input string is a valid YouTube URL
 *  and try to extract the YouTube Video ID from it.
 *  @author  Stephan Schmitz <eyecatchup@gmail.com>
 *  @param   $url   string   The string that shall be checked.
 *  @return  mixed           Returns YouTube Video ID, or (boolean) false.
 */
function parse_yturl($url)
{
    $pattern = '#^(?:https?://|//)?' # Optional URL scheme. Either http, or https, or protocol-relative.
             . '(?:www\.|m\.)?'      #  Optional www or m subdomain.
             . '(?:'                 #  Group host alternatives:
             .   'youtu\.be/'        #    Either youtu.be,
             .   '|youtube\.com/'    #    or youtube.com
             .     '(?:'             #    Group path alternatives:
             .       'embed/'        #      Either /embed/,
             .       '|v/'           #      or /v/,
             .       '|watch\?v='    #      or /watch?v=,
             .       '|watch\?.+&v=' #      or /watch?other_param&v=
             .     ')'               #    End path alternatives.
             . ')'                   #  End host alternatives.
             . '([\w-]{11})'         # 11 characters (Length of Youtube video ids).
             . '(?![\w-])#';         # Rejects if overlong id.
    preg_match($pattern, $url, $matches);
    return (isset($matches[1])) ? $matches[1] : false;
}
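
Note that the pattern above is anchored at the start of the string and only covers youtu.be and youtube.com paths, so it returns false for the oEmbed and ytimg.com examples from the question. Below is a minimal sketch of one way to loosen it, assuming it is acceptable to URL-decode the input once and to search anywhere in the string. The name `extract_youtube_id` and the added `ytimg\.com/vi/` alternative are my own assumptions, not part of the class on GitHub:

```php
<?php
/**
 * Hypothetical helper (not part of the original answer): looks for an
 * 11-character ID after one of a few known delimiters, anywhere in the string.
 */
function extract_youtube_id($url)
{
    // Decode once so "watch?v%3D<id>" inside an oEmbed URL becomes "watch?v=<id>".
    $decoded = urldecode($url);

    $pattern = '#(?:youtu\.be/'                // youtu.be/<id>
             . '|youtube\.com/(?:embed/|v/'    // youtube.com/embed/<id>, /v/<id>
             .   '|watch\?(?:.*&)?v=)'         // watch?v=<id> or watch?...&v=<id>
             . '|ytimg\.com/vi/)'              // assumed thumbnail layout: ytimg.com/vi/<id>/...
             . '([\w-]{11})(?![\w-])#';        // exactly 11 ID characters

    return preg_match($pattern, $decoded, $m) === 1 ? $m[1] : false;
}
```

With the sample URLs from the question, this sketch returns -wtIMTCHWuI for the watch, /v/, youtu.be, oEmbed and /vi/ thumbnail forms, and false for the favicon URL, which has no recognisable delimiter in front of the ID.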

I found this code at this link:

<?php 
/** 
 *  parse_youtube_url() PHP function 
 *  Author: takien 
 *  URL: http://takien.com 
 *  
 *  @param  string  $url    URL to be parsed, eg:  
 *                            http://youtu.be/zc0s358b3Ys,  
 *                            http://www.youtube.com/embed/zc0s358b3Ys
 *                            http://www.youtube.com/watch?v=zc0s358b3Ys 
 *  @param  string  $return what to return 
 *                            - embed, return embed code 
 *                            - thumb, return URL to thumbnail image
 *                            - hqthumb, return URL to high quality thumbnail image.
 *  @param  string  $width  width of embedded video, default 560
 *  @param  string  $height height of embedded video, default 349
 *  @param  string  $rel    whether the embedded video shows related videos after playback or not.
 */  

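 function parse_youtube_url($url,$return='embed',$width='',$height='',$rel=0){ 
    $urls = parse_url($url); 
    // parse_url() omits keys that are missing from the input, so default them
    $host = isset($urls['host']) ? $urls['host'] : ''; 
    $path = isset($urls['path']) ? $urls['path'] : ''; 

    //expect url is http://youtu.be/abcd, where abcd is video iD
    if($host == 'youtu.be'){  
        $id = ltrim($path,'/'); 
    } 
    //expect  url is http://www.youtube.com/embed/abcd 
    else if(strpos($path,'embed') === 1){  
        // end() takes its argument by reference, so it needs a variable
        $segments = explode('/',$path); 
        $id = end($segments); 
    } 
     //expect url is abcd only 
    else if(strpos($url,'/')===false){ 
        $id = $url; 
    } 
    //expect url is http://www.youtube.com/watch?v=abcd 
    else{ 
        // parse_str() without a result array was removed in PHP 8.0
        parse_str(isset($urls['query']) ? $urls['query'] : '', $params); 
        $id = isset($params['v']) ? $params['v'] : ''; 
    } 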
    //return embed iframe 
    if($return == 'embed'){ 
        return '<iframe width="'.($width?$width:560).'" height="'.($height?$height:349).'" src="http://www.youtube.com/embed/'.$id.'?rel='.$rel.'" frameborder="0" allowfullscreen></iframe>'; 
    } 
    //return normal thumb 
    else if($return == 'thumb'){ 
        return 'http://i1.ytimg.com/vi/'.$id.'/default.jpg'; 
    } 
    //return hqthumb 
    else if($return == 'hqthumb'){ 
        return 'http://i1.ytimg.com/vi/'.$id.'/hqdefault.jpg'; 
    } 
    // else return id 
    else{ 
        return $id; 
    } 
} 
?>

I'm dealing with this too, so if you find a better solution, please let me know. It doesn't quite do what you need for images out of the box, but it could easily be adapted.
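
As a rough illustration of such an adaptation for image URLs, here is a minimal sketch. It assumes thumbnails live on a ytimg.com host with a path of the form /vi/<id>/<file>, as in the question's example. `youtube_id_from_thumbnail` is a hypothetical helper, not part of takien's function:

```php
<?php
/**
 * Hypothetical helper: pull the video ID out of a thumbnail URL such as
 * http://i2.ytimg.com/vi/<id>/hqdefault.jpg. Returns false otherwise.
 */
function youtube_id_from_thumbnail($url)
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'], $parts['path'])) {
        return false;
    }

    // Assumption: any host that is ytimg.com or a subdomain of it (i., i1., i2., ...).
    if (!preg_match('/(^|\.)ytimg\.com$/i', $parts['host'])) {
        return false;
    }

    // Expected path layout: /vi/<id>/<file>
    $segments = explode('/', trim($parts['path'], '/'));
    if (count($segments) >= 2
        && $segments[0] === 'vi'
        && preg_match('/^[\w-]{11}$/', $segments[1])
    ) {
        return $segments[1];
    }

    return false;
}
```

The result could then be passed to parse_youtube_url() as a bare ID, since that function already accepts input without any slash.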


Currently I'm using this:

function _getYoutubeVideoId($url)
{
  $parts = parse_url($url);

  //For seriously malformed urls
  if ($parts === false) {
     return false;
  }

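  // parse_url() omits keys that are absent from the input
  $host = isset($parts['host']) ? $parts['host'] : '';

  switch ($host) {
     case 'youtu.be':
        return isset($parts['path']) ? substr($parts['path'], 1) : false;
     case 'youtube.com':
     case 'www.youtube.com':
        parse_str(isset($parts['query']) ? $parts['query'] : '', $params);
        // No v parameter means there is no ID to return
        return isset($params['v']) ? $params['v'] : false;
     default:
        return false;
  } 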
}

It could be extended, but right now it works for most cases.
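
One possible extension, sketched under the assumption that the /embed/ and /v/ path forms and the m.youtube.com host should also be accepted. The function name `get_youtube_video_id_extended` and the final 11-character check are illustrative additions, not part of the code I'm actually using:

```php
<?php
// Hypothetical extension of the host-based approach; names are illustrative.
function get_youtube_video_id_extended($url)
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return false;
    }

    $host = strtolower($parts['host']);
    $path = isset($parts['path']) ? $parts['path'] : '';
    $id   = '';

    if ($host === 'youtu.be') {
        $id = ltrim($path, '/');
    } elseif (in_array($host, array('youtube.com', 'www.youtube.com', 'm.youtube.com'), true)) {
        if (preg_match('#^/(?:embed|v)/([^/?]+)#', $path, $m)) {
            $id = $m[1];                          // /embed/<id> or /v/<id>
        } else {
            parse_str(isset($parts['query']) ? $parts['query'] : '', $params);
            if (isset($params['v']) && is_string($params['v'])) {
                $id = $params['v'];               // /watch?v=<id>, in any parameter order
            }
        }
    } else {
        return false;
    }

    // Reject anything that does not look like an 11-character ID.
    return preg_match('/^[\w-]{11}$/', $id) === 1 ? $id : false;
}
```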


It's a bit late, but I wrote this regex today. It not only identifies the links but also returns the video ID via match group 6:

^(https?\:\/\/)?(www\.)?(youtube\.com|youtu\.?be)(\/)?(watch\?v=|\?v=)?(.*)$

https://gist.github.com/Shibizle/3c6707911ea716860786728d31f8e3e5

Test it: https://regex101.com/r/l0m7yh/1
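
One caveat: group 6 is `(.*)`, so it captures everything after the prefix, including any trailing query parameters. The short sketch below compares the pattern as posted with a tightened variant. The variant, which restricts group 6 to 11 ID characters, is an assumption of mine and is not what the gist contains:

```php
<?php
// The pattern from this answer, wrapped in "~" delimiters and otherwise unchanged.
$loose = '~^(https?\:\/\/)?(www\.)?(youtube\.com|youtu\.?be)(\/)?(watch\?v=|\?v=)?(.*)$~';

// Assumed tightening: group 6 accepts exactly 11 ID characters and nothing more.
$tight = '~^(https?\:\/\/)?(www\.)?(youtube\.com|youtu\.?be)(\/)?(watch\?v=|\?v=)?([\w-]{11})(?![\w-])~';

$url = 'http://www.youtube.com/watch?v=-wtIMTCHWuI&feature=related';

preg_match($loose, $url, $a);
preg_match($tight, $url, $b);

echo $a[6], "\n"; // -wtIMTCHWuI&feature=related
echo $b[6], "\n"; // -wtIMTCHWuI
```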


