Miscellaneous Functions
PHP Manual

php_strip_whitespace

(PHP 5)

php_strip_whitespace: Return PHP source with comments and whitespace stripped

Description

string php_strip_whitespace ( string $filename )

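Returns the PHP source code in filename with PHP comments and whitespace removed. This may be useful for determining the amount of actual code in your scripts compared with the amount of comments. This is similar to using php -w from the command line.

Parameters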

filename

Path to the PHP file.

Return Values

The stripped source code will be returned on success, or an empty string on failure.

Note:

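This function works as described as of PHP 5.0.1. Before this it would only return an empty string. For more information on this bug and its prior behavior, see bug report #29606.

Examples

Example #1 php_strip_whitespace() example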

<?php
// PHP comment here

/*
 * Another PHP comment
 */

echo        php_strip_whitespace(__FILE__);
// Newlines are considered whitespace, and are removed too:
do_nothing();
?>

The above example will output:

<?php
 echo php_strip_whitespace(__FILE__); do_nothing(); ?>

Notice that the PHP comments are gone, as are the whitespace and newline after the first echo statement.
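The description mentions comparing the amount of actual code with the amount of comments. The following is a minimal sketch of that idea, not part of the manual: it assumes a throwaway file created with tempnam(), and the sample source and variable names are purely illustrative.

```php
<?php
// Hypothetical sample source; any readable PHP file path would work instead.
$source = "<?php\n// a comment\n/* another\n   comment */\necho   'hi';\n";

// php_strip_whitespace() takes a filename, so write the sample to a temp file.
$path = tempnam(sys_get_temp_dir(), 'psw');
file_put_contents($path, $source);

$stripped = php_strip_whitespace($path);
unlink($path);

$originalBytes = strlen($source);
$strippedBytes = strlen($stripped);

// Bytes that were comments or redundant whitespace.
$removedBytes = $originalBytes - $strippedBytes;

printf(
    "original: %d bytes, stripped: %d bytes, removed: %d bytes\n",
    $originalBytes,
    $strippedBytes,
    $removedBytes
);
```

The byte difference counts removed whitespace as well as comments, so it is only a rough indicator of how much of a file is commentary.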



User Contributed Notes:

TK (23-May-2009 02:07)

I was looking earlier for a way to strip PHP comments from my source files but didn't come up with much. I wrote the following function to do the trick using the tokenizer. I've tested it on an entire phpMyAdmin install and it worked fine afterward, so it should be good to go. You may also specify any number of tokens to strip, such as T_WHITESPACE, rather than the default of T_COMMENT and T_DOC_COMMENT.

Hopefully someone finds it useful.

<?php

function strip_tokens($code) {

    $args = func_get_args();
    $arg_count = count($args);

    // if no tokens to strip have been specified then strip comments by default
    if( $arg_count === 1 ) {
        $args[1] = T_COMMENT;
        $args[2] = T_DOC_COMMENT;
        $arg_count = 3; // count the two defaults so the loop below picks them up
    }

    // build a keyed array of tokens to strip
    $strip = array();
    for( $i = 1; $i < $arg_count; ++$i )
        $strip[ $args[$i] ] = true;

    // set a keyed array of newline characters used to preserve line numbering
    $newlines = array("\n" => true, "\r" => true);

    $tokens = token_get_all($code);

    reset($tokens);

    $return = '';

    $token = current($tokens);

    while( $token ) {

        if( !is_array($token) )

            $return.= $token;

        elseif( !isset($strip[ $token[0] ]) )

            $return.= $token[1];

        else {

            // return only the token's newline characters to preserve line numbering
            for( $i = 0, $token_length = strlen($token[1]); $i < $token_length; ++$i )
                if( isset($newlines[ $token[1][$i] ]) )
                    $return.= $token[1][$i];

        }

        $token = next($tokens);

    } // while more tokens

    return $return;

} // function

?>
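For comparison, here is a compact sketch of the same tokenizer idea limited to comment tokens. The helper name strip_comments_sketch is hypothetical, and unlike the function above it makes no attempt to preserve line numbering.

```php
<?php
// Hypothetical helper: drops comment tokens from a PHP source string.
function strip_comments_sketch(string $code): string
{
    $out = '';
    foreach (token_get_all($code) as $token) {
        if (is_string($token)) {
            // Single-character token such as ";" or "(".
            $out .= $token;
        } elseif ($token[0] !== T_COMMENT && $token[0] !== T_DOC_COMMENT) {
            // Any other token: keep its text unchanged.
            $out .= $token[1];
        }
    }
    return $out;
}

$result = strip_comments_sketch("<?php\n/** doc */\necho 1; // trailing\n");
echo $result;
```

Whether the newline after a single-line comment survives depends on the PHP version, because the tokenizer changed where that newline is attached.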

gelamu at gmail dot com (10-Apr-2008 10:12)

With this function you can compress your PHP source code.

<?php

function compress_php_src($src) {
    // Whitespace to the left and right of these tokens can be ignored
    static $IW = array(
        T_CONCAT_EQUAL,             // .=
        T_DOUBLE_ARROW,             // =>
        T_BOOLEAN_AND,              // &&
        T_BOOLEAN_OR,               // ||
        T_IS_EQUAL,                 // ==
        T_IS_NOT_EQUAL,             // != or <>
        T_IS_SMALLER_OR_EQUAL,      // <=
        T_IS_GREATER_OR_EQUAL,      // >=
        T_INC,                      // ++
        T_DEC,                      // --
        T_PLUS_EQUAL,               // +=
        T_MINUS_EQUAL,              // -=
        T_MUL_EQUAL,                // *=
        T_DIV_EQUAL,                // /=
        T_IS_IDENTICAL,             // ===
        T_IS_NOT_IDENTICAL,         // !==
        T_DOUBLE_COLON,             // ::
        T_PAAMAYIM_NEKUDOTAYIM,     // ::
        T_OBJECT_OPERATOR,          // ->
        T_DOLLAR_OPEN_CURLY_BRACES, // ${
        T_AND_EQUAL,                // &=
        T_MOD_EQUAL,                // %=
        T_XOR_EQUAL,                // ^=
        T_OR_EQUAL,                 // |=
        T_SL,                       // <<
        T_SR,                       // >>
        T_SL_EQUAL,                 // <<=
        T_SR_EQUAL,                 // >>=
    );
    if(is_file($src)) {
        if(!$src = file_get_contents($src)) {
            return false;
        }
    }
    $tokens = token_get_all($src);

    $new = "";
    $c = sizeof($tokens);
    $iw = false; // ignore whitespace
    $ih = false; // in HEREDOC
    $ls = "";    // last sign
    $ot = null;  // open tag
    for($i = 0; $i < $c; $i++) {
        $token = $tokens[$i];
        if(is_array($token)) {
            list($tn, $ts) = $token; // tokens: number, string, line
            $tname = token_name($tn);
            if($tn == T_INLINE_HTML) {
                $new .= $ts;
                $iw = false;
            } else {
                if($tn == T_OPEN_TAG) {
                    if(strpos($ts, " ") || strpos($ts, "\n") || strpos($ts, "\t") || strpos($ts, "\r")) {
                        $ts = rtrim($ts);
                    }
                    $ts .= " ";
                    $new .= $ts;
                    $ot = T_OPEN_TAG;
                    $iw = true;
                } elseif($tn == T_OPEN_TAG_WITH_ECHO) {
                    $new .= $ts;
                    $ot = T_OPEN_TAG_WITH_ECHO;
                    $iw = true;
                } elseif($tn == T_CLOSE_TAG) {
                    if($ot == T_OPEN_TAG_WITH_ECHO) {
                        $new = rtrim($new, "; ");
                    } else {
                        $ts = " ".$ts;
                    }
                    $new .= $ts;
                    $ot = null;
                    $iw = false;
                } elseif(in_array($tn, $IW)) {
                    $new .= $ts;
                    $iw = true;
                } elseif($tn == T_CONSTANT_ENCAPSED_STRING
                      || $tn == T_ENCAPSED_AND_WHITESPACE)
                {
                    if($ts[0] == '"') {
                        $ts = addcslashes($ts, "\n\t\r");
                    }
                    $new .= $ts;
                    $iw = true;
                } elseif($tn == T_WHITESPACE) {
                    $nt = @$tokens[$i+1];
                    if(!$iw && (!is_string($nt) || $nt == '$') && !in_array($nt[0], $IW)) {
                        $new .= " ";
                    }
                    $iw = false;
                } elseif($tn == T_START_HEREDOC) {
                    $new .= "<<<S\n";
                    $iw = false;
                    $ih = true; // in HEREDOC
                } elseif($tn == T_END_HEREDOC) {
                    $new .= "S;";
                    $iw = true;
                    $ih = false; // in HEREDOC
                    for($j = $i+1; $j < $c; $j++) {
                        if(is_string($tokens[$j]) && $tokens[$j] == ";") {
                            $i = $j;
                            break;
                        } else if($tokens[$j][0] == T_CLOSE_TAG) {
                            break;
                        }
                    }
                } elseif($tn == T_COMMENT || $tn == T_DOC_COMMENT) {
                    $iw = true;
                } else {
                    if(!$ih) {
                        $ts = strtolower($ts);
                    }
                    $new .= $ts;
                    $iw = false;
                }
            }
            $ls = "";
        } else {
            if(($token != ";" && $token != ":") || $ls != $token) {
                $new .= $token;
                $ls = $token;
            }
            $iw = true;
        }
    }
    return $new;
}

?>

For example:
<?php

$src = <<<EOT
<?php
// some comment
for ( $i = 0; $i < 99; $i ++ ) {
   echo "i=${ i }\n";
   /* ... */
}
/** ... */
function abc() {
   return   "abc";
};

abc();
?>
<h1><?= "Some text " . str_repeat("_-x-_ ", 32);;; ?></h1>
EOT;
var_dump(compress_php_src($src));
?>

And the result is:
string(125) "<?php for(=0;<99;++){echo "i=\n";}function abc(){return "abc";};abc(); ?>
<h1><?="Some text ".str_repeat("_-x-_ ",32)?></h1>"

Jouni (03-Oct-2007 01:14)

If you wish to just remove excess whitespace from a string, see the example "Strip whitespace" in the preg_replace documentation (http://www.php.net/manual/en/function.preg-replace.php).
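As a rough sketch of that approach (the input string and the variable names here are illustrative only), a single preg_replace() call can collapse runs of whitespace inside a string:

```php
<?php
// Illustrative input containing runs of spaces, tabs and newlines.
$text = "foo   bar\t\tbaz\n\nqux";

// Replace every run of two or more whitespace characters with one space.
$collapsed = preg_replace('/\s\s+/', ' ', $text);

echo $collapsed, "\n"; // prints "foo bar baz qux"
```

Note that a lone tab or newline is left untouched by this pattern, since it only matches runs of at least two whitespace characters.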

Zvjezdan Patz (14-Sep-2007 04:14)

I was given a report that was separated by spaces and asked to make graphs from it.  I needed to turn the report data into a csv in memory so I could manipulate it further. 

First I needed to read the report, then strip out the extra whitespace while leaving one space between items, so that I could convert each remaining space into a column separator.

There were lots of complicated ways to do this.  I stumbled on something simple.

Say the report looks like this:

Monday    Tuesday    Wednesday    Thursday   Friday   Saturday   Sunday
1               5               7                    8               10         7              8        
7               15             4                    0               21         4              12
9               5               7                    9               0           9              43

The report uses spaces, not tabs, to separate everything. Assuming it's a file called data.txt, you can use the following to strip out the spaces and make it comma delimited:

<?php

$handle = @fopen("data.txt", "r");

if ($handle)
{
  // fgets() returns false at end of file, which ends the loop
  while (($buffer = fgets($handle, 4096)) !== false)
  {
    // ereg_replace() was removed in PHP 7, so preg_replace() is used here instead:
    // collapse every run of spaces into a single space, replace that one space
    // with a comma, then output with nl2br so you can see the line breaks
    print nl2br(str_replace(" ", ",", preg_replace('/ +/', ' ', $buffer)));
  }

  // only close the handle if it was opened successfully
  fclose($handle);
}
?>

Hope that helps someone else.

natio at phpfox dot com (06-Sep-2007 07:21)

Notice: In my last comment for this function I failed to add some important parts of the function. So I have re-added it here. Feel free to delete my earlier comment. Thanks!
---

To get the behaviour of php_strip_whitespace in PHP 4 (>= 4.2.0), you could try the function below. It also helps with php_strip_whitespace not fully removing newlines and extra whitespace in HTML that is mixed with PHP.

<?php

if (!defined ('T_ML_COMMENT'))
{
    define ('T_ML_COMMENT', T_COMMENT);
}
if (!defined ('T_DOC_COMMENT'))
{
    define ('T_DOC_COMMENT', T_ML_COMMENT);
}

function StripWhitespace($sFileName)
{
    if ( !is_file($sFileName) )
    {
        return false;
    }

    $sContent = implode('', file($sFileName));

    $aTokens = token_get_all($sContent);

    $bLast = false;
    $sStr = '';
    for ( $i = 0, $j = count($aTokens); $i < $j; $i++ )
    {
        if ( is_string($aTokens[$i]) )
        {
            $bLast = false;
            $sStr .= $aTokens[$i];
        }
        else
        {
            switch ( $aTokens[$i][0] )
            {
                case T_COMMENT:
                case T_ML_COMMENT:
                case T_DOC_COMMENT:
                break;
                case T_WHITESPACE:
                if (!$bLast)
                {
                    $sStr .= ' ';
                    $bLast = true;
                }
                break;
                default:
                    $bLast = false;
                    $sStr .= $aTokens[$i][1];
                break;
            }
        }
    }

    $sStr = trim($sStr);
    $sStr = str_replace("\n", "", $sStr);
    $sStr = str_replace("\r", "", $sStr);

    return $sStr;
}
?>

nyctimus at yahoo dot com (03-May-2007 10:01)

Here's one for CSS:

<?php

function css_strip_whitespace($css)
{
  $replace = array(
    "#/\*.*?\*/#s" => "",  // Strip C style comments.
    "#\s\s+#"      => " ", // Strip excess whitespace.
  );
  $search = array_keys($replace);
  $css = preg_replace($search, $replace, $css);

  $replace = array(
    ": "  => ":",
    "; "  => ";",
    " {"  => "{",
    " }"  => "}",
    ", "  => ",",
    "{ "  => "{",
    ";}"  => "}", // Strip optional semicolons.
    ",\n" => ",", // Don't wrap multiple selectors.
    "\n}" => "}", // Don't wrap closing braces.
    "} "  => "}\n", // Put each rule on its own line.
  );
  $search = array_keys($replace);
  $css = str_replace($search, $replace, $css);

  return trim($css);
}

?>

A word on the first regular expression, since it took me a while.
It strips C style comments. /* Like this. */

#/\*.*?\*/#s
^         ^
The pound signs at either end quote the regex. They don't match anything.

#/\*.*?\*/#s
           ^
The s at the very end sets the PCRE_DOTALL modifier. More info here:
http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php

#  /\*  .*?  \*/  #s
    1    2    3
The expression itself consists of 3 parts:
1. the opening comment sequence, represented by     /\*
2. everything in the middle, represented by         .*?
3. and the closing comment sequence, represented by \*/

#/\*.*?\*/#s
   ^    ^
The comment asterisks are escaped. If I had used the more common / for PCRE quoting I would've had to escape those too.

#/\*.*?\*/#s
      ^
The ? prevents the regex from being greedy. See halfway down this page:
http://www.php.net/manual/en/reference.pcre.pattern.syntax.php
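
To make the effect of the ? concrete, here is a small sketch comparing the lazy pattern from above with a greedy variant. The sample stylesheet and variable names are made up for illustration.

```php
<?php
// Illustrative stylesheet containing two separate comments.
$css = "a{color:red} /* one */ b{color:blue} /* two */ i{color:green}";

// Lazy quantifier: each comment is matched and removed on its own.
$lazy = preg_replace('#/\*.*?\*/#s', '', $css);

// Greedy quantifier: one match runs from the first /* to the last */,
// so the rule between the two comments is removed as well.
$greedy = preg_replace('#/\*.*\*/#s', '', $css);

echo $lazy, "\n";   // a{color:red}  b{color:blue}  i{color:green}
echo $greedy, "\n"; // a{color:red}  i{color:green}
```

The greedy version silently deletes the b{color:blue} rule, which is why the non-greedy form matters for stripping comments.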

flconseil at yahoo dot fr (08-Jul-2006 04:57)

Beware that this function uses the output buffering mechanism.

If you give a 'stream wrapped' path as argument, anything echoed by the stream wrapper during this call (e.g. trace messages) won't be displayed to the screen but will be inserted in php_strip_whitespace's result.

If you execute this stripped code later, it will display the messages which should have been output during php_strip_whitespace's execution!

mwwaygoo AT hotmail DOT com (27-Apr-2006 05:17)

I thought this was a nice function until I realised it wouldn't strip down HTML, as I'd been reading an article on compressing output to speed up delivery.
So I wrote a little one to do that for me. Here it is, in case people were looking for an HTML version. It may need tweaking, for example to cope with existing &nbsp; entities.

<?php
function strip_html($data)
{
    // strip unnecessary comments and characters from a webpage's text
    // all line comments, multi-line comments \r \n \t multi-spaces that make a script readable.
    // it also safeguards enquoted values and values within textareas, as these are required

    $data=preg_replace_callback("/>[^<]*<\/textarea/i", "harden_characters", $data);
    $data=preg_replace_callback("/\"[^\"<>]+\"/", "harden_characters", $data);

    $data=preg_replace("/(\/\/.*\n)/","",$data); // remove single line comments, like this, from // to \n
    $data=preg_replace("/(\t|\r|\n)/","",$data);  // remove new lines \n, tabs and \r
    $data=preg_replace("/(\/\*.*\*\/)/","",$data);  // remove multi-line comments /* */
    $data=preg_replace("/(<![^>]*>)/","",$data);  // remove multi-line comments <!-- -->
    $data=preg_replace('/(\s+)/', ' ',$data); // replace multi spaces with singles
    $data=preg_replace('/>\s</', '><',$data);

    $data=preg_replace_callback("/\"[^\"<>]+\"/", "unharden_characters", $data);
    $data=preg_replace_callback("/>[^<]*<\/textarea/", "unharden_characters", $data);

    return $data;
}

function harden_characters($array)
{
    $safe=$array[0];
    $safe=preg_replace('/\n/', "%0A", $safe);
    $safe=preg_replace('/\t/', "%09", $safe);
    $safe=preg_replace('/\s/', "&nbsp;", $safe);
    return $safe;
}
function unharden_characters($array)
{
    $safe=$array[0];
    $safe=preg_replace('/%0A/', "\n", $safe);
    $safe=preg_replace('/%09/', "\t", $safe);
    $safe=preg_replace('/&nbsp;/', " ", $safe);
    return $safe;
}
?>

The article code was similar to this, which shouldn't work because php_strip_whitespace takes a filename as input, not a string of output:

<?php
// ob_start(); and output here
$data=ob_get_contents();
ob_end_clean();
if(strstr($_SERVER['HTTP_ACCEPT_ENCODING'],'gzip'))
{
    $data=gzencode(php_strip_whitespace($data),9);
    header('Content-Encoding: gzip');
}
echo $data;
?>

dnc at seznam dot cz (21-Oct-2005 11:01)

This function cannot be used to strip comments outside <?php ... ?>

// this comment will not be removed
<?php
// this comment will be removed
?>
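
The behaviour described above can be checked with a short sketch. It assumes a temporary file created with tempnam(), and the file contents are invented for illustration.

```php
<?php
// Hypothetical file contents mixing plain text and a PHP block.
$source = "// outside the PHP block\n<?php\n// inside the PHP block\necho 1;\n?>\n";

// php_strip_whitespace() takes a filename, so write the sample to a temp file.
$path = tempnam(sys_get_temp_dir(), 'psw');
file_put_contents($path, $source);
$stripped = php_strip_whitespace($path);
unlink($path);

// Text outside the PHP tags is treated as inline HTML and passed through.
$outsideKept = strpos($stripped, 'outside the PHP block') !== false;

// Comments inside the PHP tags are removed.
$insideKept = strpos($stripped, 'inside the PHP block') !== false;

var_dump($outsideKept); // bool(true)
var_dump($insideKept);  // bool(false)
```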

amedee at amedee dot be (09-Jul-2005 01:29)

Not only can this be used for JavaScript files, but also for:

* Java source code
* CSS (Style Sheets)
* Any file with C-style comments.