- \p{xx} a character with the xx property
- \P{xx} a character without the xx property
- \X an extended Unicode sequence
-
- The property names represented by xx above are limited to the Unicode
- general category properties. Each character has exactly one such prop-
- erty, specified by a two-letter abbreviation. For compatibility with
- Perl, negation can be specified by including a circumflex between the
- opening brace and the property name. For example, \p{^Lu} is the same
- as \P{Lu}.
-
- If only one letter is specified with \p or \P, it includes all the
- properties that start with that letter. In this case, in the absence of
- negation, the curly brackets in the escape sequence are optional; these
- two examples have the same effect:
+ \p{xx} a character with the xx property
+ \P{xx} a character without the xx property
+ \X an extended Unicode sequence
+
+ The property names represented by xx above are limited to the Unicode
+ script names, the general category properties, and "Any", which matches
+ any character (including newline). Other properties such as "InMusical-
+ Symbols" are not currently supported by PCRE. Note that \P{Any} does
+ not match any characters, so it always causes a match failure.
+
+ Sets of Unicode characters are defined as belonging to certain scripts.
+ A character from one of these sets can be matched using a script name.
+ For example:
+
+ \p{Greek}
+ \P{Han}
+
+ Those that are not part of an identified script are lumped together as
+ "Common". The current list of scripts is:
+
+ Arabic, Armenian, Bengali, Bopomofo, Braille, Buginese, Buhid, Cana-
+ dian_Aboriginal, Cherokee, Common, Coptic, Cypriot, Cyrillic, Deseret,
+ Devanagari, Ethiopic, Georgian, Glagolitic, Gothic, Greek, Gujarati,
+ Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hiragana, Inherited, Kannada,
+ Katakana, Kharoshthi, Khmer, Lao, Latin, Limbu, Linear_B, Malayalam,
+ Mongolian, Myanmar, New_Tai_Lue, Ogham, Old_Italic, Old_Persian, Oriya,
+ Osmanya, Runic, Shavian, Sinhala, Syloti_Nagri, Syriac, Tagalog, Tag-
+ banwa, Tai_Le, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh,
+ Ugaritic, Yi.
+
+ Each character has exactly one general category property, specified by
+ a two-letter abbreviation. For compatibility with Perl, negation can be
+ specified by including a circumflex between the opening brace and the
+ property name. For example, \p{^Lu} is the same as \P{Lu}.
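
The general category rules described here can be modelled outside PCRE.
The following is a minimal sketch in Python, not PCRE's implementation:
it assumes the standard unicodedata module as the source of category
codes (its Unicode version may differ from the one PCRE was built with),
and the helper names has_property and lacks_property are hypothetical.
It handles general category names only, not script names or "Any".

```python
import unicodedata


def has_property(ch: str, prop: str) -> bool:
    # Models \p{prop} for general category names only (hypothetical helper).
    # A leading circumflex negates the test, as in \p{^Lu}.
    negated = prop.startswith("^")
    name = prop[1:] if negated else prop
    # Each character has exactly one two-letter category code, e.g. "Lu".
    category = unicodedata.category(ch)
    # A one-letter name such as "L" covers every category starting with it.
    matched = category.startswith(name)
    return matched != negated


def lacks_property(ch: str, prop: str) -> bool:
    # Models \P{prop}: the complement of \p{prop}.
    return not has_property(ch, prop)


if __name__ == "__main__":
    for ch in ("A", "a", "1"):
        print(ch, unicodedata.category(ch),
              has_property(ch, "Lu"), has_property(ch, "^Lu"),
              has_property(ch, "L"))
```

Under this model, has_property(ch, "^Lu") and lacks_property(ch, "Lu")
agree for every character, which mirrors the stated equivalence of
\p{^Lu} and \P{Lu}.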
+
+ If only one letter is specified with \p or \P, it includes all the gen-
+ eral category properties that start with that letter. In this case, in
+ the absence of negation, the curly brackets in the escape sequence are
+ optional; these two examples have the same effect: