+ code character property support is available. These sequences retain
+ their original meanings from before UTF-8 support was available, mainly
+ for efficiency reasons.
+
+ The sequences \h, \H, \v, and \V are Perl 5.10 features. In contrast to
+ the other sequences, these do match certain high-valued codepoints in
+ UTF-8 mode. The horizontal space characters are:
+
+ U+0009 Horizontal tab
+ U+0020 Space
+ U+00A0 Non-break space
+ U+1680 Ogham space mark
+ U+180E Mongolian vowel separator
+ U+2000 En quad
+ U+2001 Em quad
+ U+2002 En space
+ U+2003 Em space
+ U+2004 Three-per-em space
+ U+2005 Four-per-em space
+ U+2006 Six-per-em space
+ U+2007 Figure space
+ U+2008 Punctuation space
+ U+2009 Thin space
+ U+200A Hair space
+ U+202F Narrow no-break space
+ U+205F Medium mathematical space
+ U+3000 Ideographic space
+
+ The vertical space characters are:
+
+ U+000A Linefeed
+ U+000B Vertical tab
+ U+000C Formfeed
+ U+000D Carriage return
+ U+0085 Next line
+ U+2028 Line separator
+ U+2029 Paragraph separator
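
The two lists above can be written out as explicit character classes. The following is a minimal sketch in Python, whose re module has no \h or \v escapes of this kind; the names H_SPACE, V_SPACE, and split_on_horizontal_space are hypothetical and exist only for this illustration. It transcribes the tables and is not PCRE's implementation.

```python
import re

# Horizontal space characters, copied from the table above.
# U+2000 through U+200A are contiguous, so they are written as one range.
H_SPACE = r"[\u0009\u0020\u00A0\u1680\u180E\u2000-\u200A\u202F\u205F\u3000]"

# Vertical space characters, copied from the table above.
# U+000A through U+000D are contiguous, so they are written as one range.
V_SPACE = r"[\u000A-\u000D\u0085\u2028\u2029]"


def split_on_horizontal_space(text):
    """Split text on runs of the horizontal space characters listed above."""
    return re.split(H_SPACE + "+", text)


if __name__ == "__main__":
    # A non-break space and a tab both act as separators here.
    print(split_on_horizontal_space("a\u00a0b\tc"))
```

The negated classes (the counterparts of \H and \V) can be obtained by inserting ^ after the opening bracket of each class.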
+
+ A "word" character is an underscore or any character less than 256 that
+ is a letter or digit. The definition of letters and digits is con-
+ trolled by PCRE's low-valued character tables, and may vary if locale-
+ specific matching is taking place (see "Locale support" in the pcreapi
+ page). For example, in a French locale such as "fr_FR" in Unix-like
+ systems, or "french" in Windows, some character codes greater than 128
+ are used for accented letters, and these are matched by \w. The use of
+ locales with Unicode is discouraged.
+
+ Newline sequences
+
+ Outside a character class, the escape sequence \R matches any Unicode
+ newline sequence. This is a Perl 5.10 feature. In non-UTF-8 mode \R is
+ equivalent to the following:
+
+ (?>\r\n|\n|\x0b|\f|\r|\x85)
+
+ This is an example of an "atomic group", details of which are given
+ below. This particular group matches either the two-character sequence
+ CR followed by LF, or one of the single characters LF (linefeed,
+ U+000A), VT (vertical tab, U+000B), FF (formfeed, U+000C), CR (carriage
+ return, U+000D), or NEL (next line, U+0085). The two-character sequence
+ is treated as a single unit that cannot be split.
+
+ In UTF-8 mode, two additional characters whose codepoints are greater
+ than 255 are added: LS (line separator, U+2028) and PS (paragraph sepa-
+ rator, U+2029). Unicode character property support is not needed for
+ these characters to be recognized.
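
The effect of treating CR followed by LF as a unit that cannot be split can be sketched in Python. The re module has no \R escape, and atomic groups are not available in every Python version, so this sketch emulates the atomic group with a lookahead capture followed by a backreference. That emulation is a substitute technique, not how PCRE implements \R. The names R_8BIT, R_UTF8, R_NON_ATOMIC, and count_newlines are hypothetical.

```python
import re

# Emulation of (?>\r\n|\n|\x0b|\f|\r|\x85): the lookahead picks one
# alternative without leaving a backtracking point, then the backreference
# consumes exactly what was picked.
R_8BIT = r"(?=(?P<nl>\r\n|\n|\x0b|\f|\r|\x85))(?P=nl)"

# The UTF-8 mode variant, with LS (U+2028) and PS (U+2029) added.
R_UTF8 = r"(?=(?P<nl>\r\n|\n|\x0b|\f|\r|\x85|\u2028|\u2029))(?P=nl)"

# An ordinary non-atomic group, shown only for contrast.
R_NON_ATOMIC = r"(?:\r\n|\n|\x0b|\f|\r|\x85)"


def count_newlines(text, pattern=R_UTF8):
    """Count newline sequences in text; CR LF counts as one sequence."""
    return sum(1 for _ in re.finditer(pattern, text))


if __name__ == "__main__":
    sample = "a\r\nb\nc\u2028d"
    print(count_newlines(sample))          # CR LF, LF, and LS
    print(count_newlines(sample, R_8BIT))  # LS is not recognized here

    # With the atomic emulation, CR LF is never split to satisfy a later \n.
    print(re.fullmatch(R_8BIT + r"\n", "\r\n"))
    # The non-atomic group backtracks: CR matches alone, then \n matches LF.
    print(re.fullmatch(R_NON_ATOMIC + r"\n", "\r\n"))
```

Because the named group nl is defined inside the pattern, each of these strings can be used only once within a larger Python pattern.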
+
+ Inside a character class, \R matches the letter "R".