PCRE Patterns
PHP Manual

Pattern Syntax

User comments:

mbrodin (24-Nov-2008 09:18)

Hi!

For even better performance than the code below, use:

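<?php
    // Build one pattern per tag instead of calling preg_replace() once per tag
    $f = array();

    foreach ($allTags[1] as $tag) {
        $f[] = "%(<$tag.*?>)(.*?)(<\/$tag.*?>)%is";
    }

    // preg_replace() accepts an array of patterns; skip the call if there are none
    if (sizeof($f)) $str = preg_replace($f, ($stripContent ? '' : '${2}'), $str);
?>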

This does not call preg_replace() once per tag; instead it collects the patterns in an array and runs them all in a single call, which should be faster.

It also checks that there is at least one pattern to apply. If there is none, preg_replace() is not called at all! :)

I added the "<?php" tag so the code gets syntax-highlighted.
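For reference, here is a complete, self-contained sketch of the whole function with this batching applied. It is a hypothetical rewrite, not any poster's exact code: it assumes the tag list uses the strip_tags() style ("<p><b>") and that a tag is never nested inside another tag of the same name. It also passes the tag name through preg_quote() and requires a word boundary after it, so that stripping "<b>" does not also eat "<br>".

```php
<?php
/**
 * Hypothetical rewrite of strip_selected_tags() using the batched approach.
 * $tags lists the tags to remove in strip_tags() style, e.g. "<p><b>".
 * Assumes well-formed tags that are not nested inside a tag of the same name.
 */
function strip_selected_tags(string $str, string $tags = '', bool $stripContent = false): string
{
    // Pull the tag names ("p", "b", ...) out of the $tags string
    preg_match_all('/<([^>]+)>/', $tags, $allTags);

    $patterns = [];
    foreach ($allTags[1] as $tag) {
        $t = preg_quote($tag, '%');
        // \b after the name keeps "<b" from also matching "<br>" or "<blockquote>"
        $patterns[] = "%<$t\b[^>]*>(.*?)</$t\s*>%is";
    }

    // Nothing to strip: skip preg_replace() entirely
    if (count($patterns) === 0) {
        return $str;
    }

    // A single preg_replace() call applies every pattern, in order
    $result = preg_replace($patterns, $stripContent ? '' : '$1', $str);

    // preg_replace() returns null on a PCRE error; fall back to the input
    return $result ?? $str;
}
```

With the thread's example input, `strip_selected_tags('this is <p align="center">a test</p> and <b>this is bold</b>', '<p><b>')` gives "this is a test and this is bold", while `'a<br>b <b>c</b>'` with `'<b>'` keeps the `<br>` intact.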

Ved Prakash (23-Nov-2007 10:00)

Hi,

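About strip_selected_tags: it removes all tags regardless of which ones you select, because $replace is built before the loop, while $tag is still undefined, so the pattern matches any tag.

To remove only the selected tags, change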

    $replace = "%(<$tag.*?>)(.*?)(<\/$tag.*?>)%is";
    foreach ($allTags[1] as $tag) {
        if ($stripContent) {
            $str = preg_replace($replace,'',$str);
        }
        $str = preg_replace($replace,'${2}',$str);
    }

to

    foreach ($allTags[1] as $tag) {
        $replace = "%(<$tag.*?>)(.*?)(<\/$tag.*?>)%is";
        if ($stripContent) {
            $str = preg_replace($replace,'',$str);
        }
        $str = preg_replace($replace,'${2}',$str);
    }

Thanks

datacompboy at call2ru dot com (29-Oct-2007 06:31)

For example, suppose you want to cut out a particular <div> element, exactly from its <div> to the corresponding </div>.
Here is proof-of-concept code that does this:

<?php
$str = "<dqiv1>1+<div2>2+<div3><b><c>3</c></b></div3>2-</div2>1-</div1>";

preg_match("#<div.> ( ".
              " ( (?>[^<]*) ( < ( ([^/d]|d([^i]|i[^v])) | /([^d]|d([^i]|i[^v])) ) )? )* ".
           " | (?R) )* </div.>#xi", $str, $m);
var_dump($m[0]);

?>

It matches exactly from <div2> to </div2>. If you change <dqiv1> to <div1>, it matches from <div1> to </div1> instead.
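The same recursion idea can be written more compactly for ordinary <div> tags. The sketch below is an illustration under stated assumptions, not the poster's code: the helper name first_balanced_div() is hypothetical, closing tags are assumed to be written exactly as </div>, and no "<div" text may appear inside comments, scripts or attribute values. Each step of the loop consumes either plain text, a "<" that does not begin a div tag, or a whole nested div via (?R).

```php
<?php
/**
 * Hypothetical helper: return the first complete <div>...</div> element in $html,
 * including any nested <div> elements, or null if none is found.
 */
function first_balanced_div(string $html): ?string
{
    $pattern = '#<div\b[^>]*>'      // opening tag, with optional attributes
             . '(?:'
             . '[^<]++'             // run of text with no "<" (possessive, no backtracking)
             . '|<(?!/?div\b)'      // a "<" that does not start <div or </div
             . '|(?R)'              // a nested div: recurse into the whole pattern
             . ')*'
             . '</div>#i';          // the closing tag that balances the opening one

    return preg_match($pattern, $html, $m) === 1 ? $m[0] : null;
}

echo first_balanced_div('<p>x<div class="a">1<div>2</div>3</div>y</p>'), "\n";
// <div class="a">1<div>2</div>3</div>
```

Because the pattern requires a word boundary after "div", it deliberately does not match names such as <div2> from the example above.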

chris at madblanks dot org (04-Jul-2007 08:22)

When enclosing your regular expression in double quotes, backreferences require two backslashes.

For example, inside a double-quoted string \1 is an octal escape for the ASCII character with code 1, not a backreference. You need to write \\1 to get the backreference.
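A small demonstration of the difference, writing the pattern /(a)\1/ three ways. It relies only on PHP's string escaping rules: in double quotes "\1" is an octal escape and "\\" is one backslash, while single quotes leave "\1" untouched.

```php
<?php
$double  = "/(a)\1/";   // "\1" is an octal escape: the pattern contains chr(1), not a backreference
$escaped = "/(a)\\1/";  // "\\" becomes one backslash, so PCRE receives \1
$single  = '/(a)\1/';   // single quotes keep \1 as-is

var_dump($escaped === $single);        // bool(true)
var_dump(preg_match($escaped, 'aa'));  // int(1): \1 repeats the captured "a"
var_dump(preg_match($double, 'aa'));   // int(0): looks for "a" followed by chr(1)
```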

sam marshall (24-May-2007 06:23)

For anyone who sees this error:

Warning: preg_match() [function.preg-match]: Compilation failed: PCRE does not support \L, \l, \N, \P, \p, \U, \u, or \X at ...

As this manual page says, you need PHP 5.1.0 and the /u modifier in order to enable these features, but that isn't the only requirement! It is possible to install later versions of PHP (we have 5.1.4) while linking to an older PCRE install. A quick look at the PCRE changelog suggests that you probably need at least PCRE 5; we're running 4.5, while the latest is 7.1. You can find out your PCRE version by checking phpinfo().

I suspect this old PCRE version ships with some officially supported Red Hat Enterprise Linux package, which is probably why we are running it, so it may affect other people too.
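Rather than inferring support from the PHP version, you can ask PHP which PCRE it is linked against and test the feature directly. This is a sketch: the PCRE_VERSION constant exists in PHP releases newer than the 5.1.x discussed here, and the helper name pcre_has_unicode_properties() is hypothetical. It relies on preg_match() returning false when a pattern fails to compile.

```php
<?php
// Report the PCRE library PHP is actually linked against
echo 'PCRE version: ', PCRE_VERSION, "\n";

/**
 * Hypothetical helper: feature-detect \p{..} support under /u.
 * preg_match() returns false if the pattern does not compile; @ hides the warning.
 */
function pcre_has_unicode_properties(): bool
{
    return @preg_match('/^\p{Lu}$/u', 'A') === 1;
}

var_dump(pcre_has_unicode_properties());
```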

pstradomski at gmail dot com (29-Mar-2007 03:55)

About strip_selected_tags function from two posts below:

It does not work if someone writes tags without the closing ">" character, like this:

<p <b> bold text </b</p

This is even valid HTML (but not valid XHTML).

theppg_001 at hotmail dot com (20-Nov-2006 09:22)

Hi there,
This was originally written by someone else, but it didn't work correctly, so I rewrote it; as far as I know it works right.

<?php
/**
* strip_selected_tags ( string str [, string strip_tags[, strip_content flag]] )
* ---------------------------------------------------------------------
* Like strip_tags() but inverse; the strip_tags tags will be stripped, not kept.
* strip_tags: string with tags to strip, ex: "<a><p><quote>" etc.
* strip_content flag: TRUE will also strip everything between open and closed tag
*/
function strip_selected_tags($str, $tags = "", $stripContent = false)
{
    // Collect the tag names ("p", "b", ...) from a string like "<p><b>"
    preg_match_all("/<([^>]+)>/i", $tags, $allTags, PREG_PATTERN_ORDER);

    foreach ($allTags[1] as $tag) {
        // Build the pattern inside the loop so that $tag is defined
        $replace = "%(<$tag.*?>)(.*?)(<\/$tag.*?>)%is";
        if ($stripContent) {
            // Remove the tags together with everything between them
            $str = preg_replace($replace, '', $str);
        } else {
            // Remove only the tags and keep the enclosed content
            $str = preg_replace($replace, '${2}', $str);
        }
    }
    return $str;
}
?>

Before I 'fixed' it, when running
strip_selected_tags("this is <p align=\"center\">a test</p> and <b>this is bold</b>","<p><b>")
You would get back
"this is <p align=\"center\">a test</p> and this is bold"
Why? Because it did not take into account that the HTML tag could contain attributes.
My version works both when stripping just the tags and when stripping the tags together with their contents!

So now when you run
strip_selected_tags("this is <p align=\"center\">a test</p> and <b>this is bold</b>","<p><b>")
You get back
"this is a test and this is bold"
Or when running
strip_selected_tags("this is <p align=\"center\">a test</p> and <b>this is bold</b>","<p><b>",true)
You get back
"this is  and "

Hope it helps someone :)

spook at op.pl (04-Apr-2006 07:37)

A useful note for beginners: note the difference between mathematical and PHP regular expressions. The _mathematical_ regex:

(a+b+c)*

(where + means "or"), which in PHP syntax would be written as:

[abc]*

will match any string built from the letters a, b and c, but will not match a string such as:

abcd

However, the _PHP_ regular expression does match the string above, because it means "accept all strings that contain 0 or more occurrences of the letters a, b or c"; since zero occurrences are allowed, an empty match at the start of any string is enough.

To convert the regexp from the mathematical to the PHP convention, use the ^ and $ characters, which anchor the match at the start and end of the tested string. So the regexp:

^[abc]*$

means "match all strings which, between their beginning and end, contain 0 or more occurrences of the letters a, b or c", which is what we were looking for.

Nasty habit, especially after two tests on "theoretical basics of computer science" :)
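In PHP the patterns also need delimiters. A quick check of both forms with preg_match(); one PCRE detail not mentioned above is that $ also matches just before a final newline, so \z is the stricter "end of string" anchor.

```php
<?php
// Unanchored: [abc]* may match zero characters, so any subject "matches"
var_dump(preg_match('/[abc]*/', 'abcd'));       // int(1)
var_dump(preg_match('/[abc]*/', 'xyz'));        // int(1)

// Anchored: the whole subject must consist of a, b and c only
var_dump(preg_match('/^[abc]*$/', 'abcd'));     // int(0)
var_dump(preg_match('/^[abc]*$/', 'cab'));      // int(1)
var_dump(preg_match('/^[abc]*$/', ''));         // int(1): zero occurrences allowed

// $ tolerates a trailing "\n"; \z does not
var_dump(preg_match('/^[abc]*$/', "abc\n"));    // int(1)
var_dump(preg_match('/^[abc]*\z/', "abc\n"));   // int(0)
```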

Daniel Vandersluis (23-Nov-2005 06:50)

Concerning note #6 in "Differences From Perl", the \G token *is* supported as the last match position anchor. This has been confirmed to work at least in preg_replace(), though I'd assume it'd work in preg_match_all(), and other functions that can make more than one match, as well.
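A minimal illustration of \G in both functions mentioned. \G asserts that the match starts exactly where the previous one ended (for the first match, at the start offset), so matching stops at the first character that breaks the run.

```php
<?php
// Replace only the digits in the leading run; "45" comes after a gap and is left alone
echo preg_replace('/\G\d/', 'X', '123a45'), "\n";   // XXXa45

// Collect the comma-separated digits at the start of the string
preg_match_all('/\G(\d),?/', '1,2,3x4', $m);
print_r($m[1]);   // captures "1", "2" and "3"; the "4" after "x" is not reached
```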

roland dot illig at gmx dot de (08-Nov-2005 09:02)

<quote>
9. Another as yet unresolved discrepancy is that in Perl 5.005_02 the pattern /^(a)?(?(1)a|b)+$/ matches the string "a", whereas in PCRE it does not. However, in both Perl and PCRE /^(a)?a/ matched against "a" leaves $1 unset.
</quote>

The last sentence does not indicate a bug. For the string "a" to match the regular expression /^(a)?a/, the final "a" in the regex must consume the only "a" in the string. Nothing is left for the optional group (a)? to match, so it matches the empty string and $1 stays unset.
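The behaviour is easy to observe from PHP. Note that preg_match() leaves trailing unmatched groups out of $matches entirely (unless PREG_UNMATCHED_AS_NULL is used), so "unset" here literally means the index is missing.

```php
<?php
// One "a": the mandatory literal takes it, so the optional group matches nothing
preg_match('/^(a)?a/', 'a', $m);
var_dump(isset($m[1]));   // bool(false)

// Two "a"s: now the group can take the first one
preg_match('/^(a)?a/', 'aa', $m);
var_dump($m[1]);          // string(1) "a"
```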

Ned Baldessin (16-Jul-2005 12:14)

Although \w and \W do include locale-specific characters as "word characters" (like "é" if you are using the "fr" locale), \b and \B do not work the same way.

For example:
"foo était bar"   =>   /\W(était)\W/   =>   This correctly captures "était".
"foo était bar"   =>   /\b(était)\b/   =>   This fails to capture it.

This is confusing, because the manual talks in both cases about "word characters", but fails to mention the difference in behaviour.

onerob at gmail dot com (02-Apr-2005 12:51)

If, like me, you tend to use the /U pattern modifier, remember that ? or * used to match optional characters will then match zero characters whenever that lets the rest of the pattern match, even if the optional characters are present.

For instance, if we have this string:

a___bcde

and apply this pattern:

'/a(_*).*e/U'

The whole pattern matches, but none of the _ characters end up in the subpattern. The way around this (if you still want to use /U) is to add the ? greediness inverter, which under /U makes the quantifier greedy again, e.g.:

'/a(_*?).*e/U'
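Both patterns run against the example string show the effect on the captured group:

```php
<?php
$subject = 'a___bcde';

// With /U, _* is lazy: it prefers zero underscores, and the (also lazy) .*
// still reaches the "e", so the group captures nothing
preg_match('/a(_*).*e/U', $subject, $m);
var_dump($m[1]);   // string(0) ""

// Under /U, *? flips back to greedy, so the group takes all three underscores
preg_match('/a(_*?).*e/U', $subject, $m);
var_dump($m[1]);   // string(3) "___"
```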

W W W (07-Mar-2005 03:22)

Back references are a great way to achieve exact matching when it would otherwise be impossible. Take these three strings:

1) "www.www.com"
2) 'www.www.com'
3) "www.www.com'

The regex /^("|').+?("|')$/ would match all three strings, but what if you needed the 3rd string above to be illegal because the quotes are not the same? You could write four different regexes to check for every possible case, OR you could use back references.

/^("|').+?\1$/ will match strings 1 and 2 but not string 3. Try this code for further proof:

$str_test="'www.www.com\"";
$int_count=preg_match("/^(\"|').+?\\1$/", $str_test, $matches, PREG_OFFSET_CAPTURE);

The preg_match function will not match against $str_test because the quotes are mismatched. If you change $str_test to

$str_test = "'www.www.com'";

the preg_match will work.

info at atjeff dot co dot nz (08-Feb-2005 12:46)

I had never used regular expressions until now and had loads of difficulty converting a [url]link here[/url] into an href for messages posted on a forum. Here's what I managed to come up with:

$patterns = array(
    "/\[link\](.*?)\[\/link\]/",
    "/\[url\](.*?)\[\/url\]/",
    "/\[img\](.*?)\[\/img\]/",
    "/\[b\](.*?)\[\/b\]/",
    "/\[u\](.*?)\[\/u\]/",
    "/\[i\](.*?)\[\/i\]/"
);
$replacements = array(
    "<a href=\"\\1\">\\1</a>",
    "<a href=\"\\1\">\\1</a>",
    "<img src=\"\\1\">",
    "<b>\\1</b>",
    "<u>\\1</u>",
    "<i>\\1</i>"
);
$newText = preg_replace($patterns, $replacements, $text);

At first it would merge ALL the tags into one link/bold/whatever, until I added the "?". I still don't fully understand it... but it works :)
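What the "?" changes is greediness. A plain .* grabs as much as it can and only backs off as far as the last closing tag, whereas .*? grabs as little as possible and stops at the first one. A minimal comparison with one of the patterns above:

```php
<?php
$text = '[b]one[/b] and [b]two[/b]';

// Greedy: .* runs to the end, then backs off only to the LAST [/b]
echo preg_replace('/\[b\](.*)\[\/b\]/', '<b>$1</b>', $text), "\n";
// <b>one[/b] and [b]two</b>

// Lazy: .*? stops at the FIRST [/b], so each pair is converted separately
echo preg_replace('/\[b\](.*?)\[\/b\]/', '<b>$1</b>', $text), "\n";
// <b>one</b> and <b>two</b>
```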

J Daugherty (09-Dec-2004 05:06)

In the character class meta-character documentation above, the circumflex (^) is described:

"^   negate the class, but only if the first character"

It should be a little more verbose to fully express the meaning of ^:

^    Negate the character class.  If used, this must be the first character of the class (e.g. "[^012]").
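Both cases side by side: ^ at the start of the class negates it, while ^ anywhere else in the class is just a literal character.

```php
<?php
// Leading ^: match any character except 0, 1 and 2
preg_match_all('/[^012]/', '0a1b2', $m);
print_r($m[0]);                          // the matches are "a" and "b"

// ^ elsewhere in the class is an ordinary character
var_dump(preg_match('/[0^]/', '^'));     // int(1)
var_dump(preg_match('/[0^]/', 'x'));     // int(0)
```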

(29-May-2004 10:15)

In addition to the meta-characters mentioned above, there can be another special character in a regular expression: the delimiter you use to start and end your expression.  Often people use the / character for this.

For example, if you wanted to search for text surrounded by opening and closing tags like '<TD>SELL</TD>' and replace it with nothing (erase it), you might be tempted to use a regex like this:

<?php
$myNewText = preg_replace('/<TD>SELL</TD>/', "", $myText);
?>

This does not work: the / in </TD> is taken as the closing delimiter, so the pattern ends early and PHP complains about an unknown modifier. As mentioned in the Introduction at the top of http://www.php.net/manual/en/ref.pcre.php, if the delimiter appears in the middle of your regular expression, you must put a \ character before it. So this DOES work:

<?php
$myNewText = preg_replace('/<TD>SELL<\/TD>/', "", $myText);
?>

That same Introduction also mentions that you can start and end your expression with characters other than the usual /.  Because there are no % characters in the middle of my expression above, I might prefer to use the following:

<?php
$myNewText = preg_replace('%<TD>SELL</TD>%', "", $myText);
?>

That also works correctly, and I did not need a \ before the /.
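When the literal text comes from a variable rather than being typed by hand, preg_quote() can do the escaping. Its second argument names the delimiter so that it gets escaped too. A short sketch (the sample values of $myText and $needle are made up for illustration):

```php
<?php
$myText = 'keep <TD>SELL</TD> this';
$needle = '<TD>SELL</TD>';

// preg_quote() escapes regex metacharacters plus the given delimiter "/"
$pattern   = '/' . preg_quote($needle, '/') . '/';
$myNewText = preg_replace($pattern, '', $myText);

var_dump($myNewText);   // string(10) "keep  this"
```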

napalm at spiderfish dot net (17-Mar-2004 04:14)

Note that some PCRE features, such as once-only subpatterns and recursive patterns, are not implemented in PHP versions prior to 5.0.

Napalm