PHP functions for truncating mixed Chinese and English text
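Below are two PHP functions for truncating strings that mix Chinese and English text. They come from two different CMSs: one is the DedeCMS (dede) truncation function and the other is from PHPCMS. Both are provided here for you to use.
The PHP code is as follows: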
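    // Truncate $string to about $length display columns and append $dot.
    // An ASCII character counts as 1 column, a multibyte character as 2.
    function str_cut($string, $length, $dot = '...')
    {
        $strlen = strlen($string);
        if($strlen <= $length) return $string;
        // Decode common HTML entities so that each one is counted as a single character.
        $string = str_replace(array('&nbsp;', '&amp;', '&quot;', '&#039;', '&ldquo;', '&rdquo;', '&mdash;', '&lt;', '&gt;', '&middot;', '&hellip;'), array(' ', '&', '"', "'", '“', '”', '—', '<', '>', '·', '…'), $string);
        // The string may be shorter after decoding, so measure it again.
        $strlen = strlen($string);
        $strcut = '';
        // CHARSET is assumed to be the site-encoding constant defined by the CMS ('utf-8' or 'gbk').
        if(strtolower(CHARSET) == 'utf-8')
        {
            // $n: byte offset, $tn: byte length of the last character, $noc: columns used so far.
            $n = $tn = $noc = 0;
            while($n < $strlen)
            {
                $t = ord($string[$n]);
                if($t == 9 || $t == 10 || (32 <= $t && $t <= 126)) {
                    $tn = 1; $n++; $noc++;
                } elseif(194 <= $t && $t <= 223) {
                    $tn = 2; $n += 2; $noc += 2;
                } elseif(224 <= $t && $t <= 239) {
                    // 239 (0xEF) must be included: it starts full-width punctuation such as ","
                    $tn = 3; $n += 3; $noc += 2;
                } elseif(240 <= $t && $t <= 247) {
                    $tn = 4; $n += 4; $noc += 2;
                } elseif(248 <= $t && $t <= 251) {
                    $tn = 5; $n += 5; $noc += 2;
                } elseif($t == 252 || $t == 253) {
                    $tn = 6; $n += 6; $noc += 2;
                } else {
                    $n++;
                }
                if($noc >= $length) break;
            }
            // If the last character overshot the limit, step back to the previous boundary.
            if($noc > $length) $n -= $tn;
            $strcut = substr($string, 0, $n);
        }
        else
        {
            // Double-byte encodings such as GBK: a byte above 127 starts a two-byte character.
            $dotlen = strlen($dot);
            $maxi = $length - $dotlen - 1;
            for($i = 0; $i < $maxi && $i < $strlen; $i++)
            {
                $strcut .= ord($string[$i]) > 127 ? $string[$i].$string[++$i] : $string[$i];
            }
        }
        // Encode the special characters again before returning.
        $strcut = str_replace(array('&', '"', "'", '<', '>'), array('&amp;', '&quot;', '&#039;', '&lt;', '&gt;'), $strcut);
        return $strcut.$dot;
    }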
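The UTF-8 branch above counts an ASCII character as one column and any multibyte character as two, then cuts on a character boundary. A minimal standalone sketch of that counting idea follows. The name cut_by_width is hypothetical and not part of either CMS, and the sketch assumes valid UTF-8 input and a PCRE build with Unicode support.

```php
<?php
// Hypothetical helper, not taken from PHPCMS: truncate a UTF-8 string to at
// most $width display columns, counting ASCII as 1 column and any multibyte
// character as 2, and append $dot only if something was cut.
function cut_by_width(string $text, int $width, string $dot = '...'): string
{
    // Split into whole characters; the "u" modifier makes PCRE treat the
    // input as UTF-8, so a character is never split in the middle.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return $text; // invalid UTF-8: leave the input untouched
    }
    $out = '';
    $used = 0;
    foreach ($chars as $ch) {
        $cols = strlen($ch) === 1 ? 1 : 2;
        if ($used + $cols > $width) {
            return $out . $dot;
        }
        $out .= $ch;
        $used += $cols;
    }
    return $out; // nothing was cut, so no suffix
}

echo cut_by_width('PHP中文截取', 7), "\n"; // PHP中文...
```

Unlike str_cut, this sketch does not decode or re-encode HTML entities and leaves short strings without a suffix; it only illustrates the width counting.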
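Method 2:
Chinese truncation, variant 2, in single-byte truncation mode. If the content comes from the request, this function (cn_substrr) must be used. The code is as follows: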
    // Truncate request data: remove the slashes added by escaping, cut, then escape again.
    function cn_substrr($str,$slen,$startdd=0)
    {
        $str = cn_substr(stripslashes($str),$slen,$startdd);
        return addslashes($str);
    }

    // Cut at most $slen bytes starting at byte offset $startdd without splitting a character.
    function cn_substr($str,$slen,$startdd=0)
    {
        // $cfg_soft_lang is the global encoding setting of the CMS.
        global $cfg_soft_lang;
        if($cfg_soft_lang=='utf-8')
        {
            return cn_substr_utf8($str,$slen,$startdd);
        }
        $restr = '';
        $c = '';
        $str_len = strlen($str);
        if($str_len < $startdd+1)
        {
            return '';
        }
        // A length of 0, or one that runs past the end, means "up to the end of the string".
        if($str_len < $startdd + $slen || $slen==0)
        {
            $slen = $str_len - $startdd;
        }
        $enddd = $startdd + $slen - 1;
        for($i=0;$i<$str_len;$i++)
        {
            // Append the character read in the previous iteration once the start offset is passed.
            if($startdd==0)
            {
                $restr .= $c;
            }
            else if($i > $startdd)
            {
                $restr .= $c;
            }
            // A byte above 0x80 starts a two-byte (GBK) character.
            if(ord($str[$i])>0x80)
            {
                if($str_len>$i+1)
                {
                    $c = $str[$i].$str[$i+1];
                }
                $i++;
            }
            else
            {
                $c = $str[$i];
            }
            // At the end offset, keep the last character only if it still fits in $slen bytes.
            if($i >= $enddd)
            {
                if(strlen($restr)+strlen($c)>$slen)
                {
                    break;
                }
                else
                {
                    $restr .= $c;
                    break;
                }
            }
        }
        return $restr;
    }

    // UTF-8 version: skip about $start bytes, then collect whole characters up to $length bytes.
    function cn_substr_utf8($str, $length, $start=0)
    {
        if(strlen($str) < $start+1)
        {
            return '';
        }
        // Split the string into UTF-8 characters.
        preg_match_all("/./su", $str, $ar);
        $str = '';
        $tstr = '';

        for($i=0; isset($ar[0][$i]); $i++)
        {
            if(strlen($tstr) < $start)
            {
                $tstr .= $ar[0][$i];
            }
            else
            {
                // Append the character only if the result stays within $length bytes.
                if(strlen($str) + strlen($ar[0][$i]) <= $length)
                {
                    $str .= $ar[0][$i];
                }
                else
                {
                    break;
                }
            }
        }
        return $str;
    }
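The cn_substrr wrapper removes slashes before cutting and restores them afterwards. One plausible reason, sketched below using only PHP's built-in addslashes, stripslashes and substr, is that cutting an already escaped string by bytes can stop right after a backslash and leave a broken escape sequence. The sample value is made up for illustration.

```php
<?php
// Illustration only: why request data that was escaped with addslashes()
// is better unescaped before it is cut by bytes.
$raw     = "it's";
$escaped = addslashes($raw);        // it\'s  (5 bytes)

// Cutting the escaped form after 3 bytes ends on the backslash.
$broken = substr($escaped, 0, 3);   // it\

// Unescape, cut, then escape again: the result is a valid escaped string.
$safe = addslashes(substr(stripslashes($escaped), 0, 3)); // it\'

echo $broken, "\n", $safe, "\n";
```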
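The two functions above have one thing in common: both support mixed Chinese and English text. Both also inspect byte (ASCII code) values to tell which characters are Chinese and which are English, and both check whether the encoding is UTF-8 or GBK.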
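To make the byte-value test concrete, here is a small sketch that maps the first byte of a character to the number of bytes the character occupies, for UTF-8 and for a double-byte encoding such as GBK. The function names are hypothetical. The ranges mirror the checks in the functions above, limited to the 1 to 4 byte forms that current UTF-8 allows.

```php
<?php
// Hypothetical helpers for illustration; not part of DedeCMS or PHPCMS.

// Number of bytes in the UTF-8 sequence that starts with $byte, or 0 if
// $byte cannot start a character (for example a continuation byte 0x80-0xBF).
function utf8_seq_len(int $byte): int
{
    if ($byte < 0x80) return 1;                    // ASCII
    if ($byte >= 0xC2 && $byte <= 0xDF) return 2;  // 194-223
    if ($byte >= 0xE0 && $byte <= 0xEF) return 3;  // 224-239, most CJK text
    if ($byte >= 0xF0 && $byte <= 0xF4) return 4;  // 240-244
    return 0;
}

// In GBK, a byte above 0x80 is treated as the first half of a two-byte character.
function gbk_seq_len(int $byte): int
{
    return $byte > 0x80 ? 2 : 1;
}

$utf8 = "a中";                           // this file is assumed to be saved as UTF-8
echo utf8_seq_len(ord($utf8[0])), "\n"; // 1
echo utf8_seq_len(ord($utf8[1])), "\n"; // 3
echo gbk_seq_len(0xD6), "\n";           // 2 (0xD6 0xD0 is "中" in GBK)
```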