Regex to clean up YouTube titles

The titles of YouTube videos often note which part of a sequence one is watching (e.g. 'part 1/9'). This is fine on YouTube, but unbearable if one aggregates the XML feed for custom purposes (as I do in my favorites player). Since regular expressions are available in server-side and client-side languages as well as in many applications, I decided to write a regex search expression that removes the common variations while leaving strings that look similar but are unrelated untouched:

(\,|)( Part| part| Teil| teil| Parte| parte|)( |)(\(|\[|\{| ?)[0-9]{1,3}\/[0-9]{1,3}(\)|\]|\}| ?)
  • Red, [0-9]{1,3}\/[0-9]{1,3}: The fraction itself always consists of one to three digits, a slash, and one to three digits. The expression does not affect numbers on their own.
  • Blue, the groups (\(|\[|\{| ?) and (\)|\]|\}| ?): Various kinds of brackets might enclose the fraction, or there might be no brackets at all.
  • Green, the group ( Part| part| Teil| teil| Parte| parte|): Often the fraction is preceded by an indicator like 'part' or a translation of it. The indicator is removed only if there is a space before it (so the 'teil' in 'im Gegenteil 1/2 so hoher Verbrauch' will not be erased).
  • Orange, the groups (\,|) and ( |): If the title starts with the fraction, there is no space in front of it, so the space between a potential indicator and the fraction is optional. The indicator may also be separated from the title by a comma, which is then removed as well.


In JavaScript, cleaning up YouTube titles might look like this:

<script type="text/javascript">

 // Adjust title
 // ------------

 // Get title from somewhere
    var title = "Regex to clean up YouTube titles, part (1/3)";

 // Search regex and replace with nothing
    var titleClean = title.replace(/(\,|)( Part| part| Teil| teil| Parte| parte|)( |)(\(|\[|\{| ?)[0-9]{1,3}\/[0-9]{1,3}(\)|\]|\}| ?)/g, '');

 // Debugging
    alert("Input: '" + title + "'\nOutput: '" + titleClean + "'");

</script>
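
To check the cases from the colour-coded list above against more than one title, the same expression can be wrapped in a small function and run over a few samples. This is only a sketch: the names PART_FRACTION and cleanTitle and all sample titles are made up for illustration, and the output goes to the console instead of an alert so it also runs outside the browser.

```javascript
// Same pattern as in the post; the name PART_FRACTION is only illustrative.
const PART_FRACTION = /(\,|)( Part| part| Teil| teil| Parte| parte|)( |)(\(|\[|\{| ?)[0-9]{1,3}\/[0-9]{1,3}(\)|\]|\}| ?)/g;

// Hypothetical helper: strip every "part x/y" notation from a title.
function cleanTitle(title) {
  return title.replace(PART_FRACTION, '');
}

// Made-up sample titles, one per case described in the list above.
const samples = [
  'Regex to clean up YouTube titles, part (1/3)', // comma, indicator, brackets
  'Some Documentary Teil [2/9]',                  // translated indicator
  '1/2 Cooking show',                             // title starts with the fraction
  'Top 10 videos of 2009',                        // numbers without a slash stay
  'Das Gegenteil 1/2'                             // "teil" inside a word stays
];

samples.forEach(function (t) {
  console.log("'" + t + "' -> '" + cleanTitle(t) + "'");
});
```

One thing to be aware of: when the fraction sits in the middle of a title, the spaces on both sides are removed with it, so 'im Gegenteil 1/2 so hoher Verbrauch' comes out as 'im Gegenteilso hoher Verbrauch'. If that matters, replacing with a single space and trimming the result afterwards is one possible workaround.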
