Snoopy初试,
snoopy是一个php类,用来模仿web浏览器的功能,它能完成获取网页内容和发送表单的任务。
下面是它的一些特征:
1、方便抓取网页的内容
2、方便抓取网页的文字(去掉HTML代码)
3、方便抓取网页的链接
4、支持代理主机
5、支持基本的用户/密码认证模式
6、支持自定义用户agent,referer,cookies和header内容
7、支持浏览器转向,并能控制转向深度
8、能把网页中的链接扩展成高质量的url(默认)
9、方便提交数据并且获取返回值
10、支持跟踪HTML框架(v0.92增加)
11、支持再转向的时候传递cookies
下面是简单的例子,比如说我们抓取我的blog的文字
<?php
include "Snoopy.class.php";
$snoopy = new Snoopy;
$snoopy->fetchtext("http://www.4wei.cn");
echo $snoopy->results;
?>
<?php
include "Snoopy.class.php";
$snoopy = new Snoopy;
$snoopy->fetchlinks("http://www.4wei.cn");
print_r($snoopy->results);
?>
<?php
/**
* @name Snoopy手册中文版
* @author 毛毛虫 wangchong1985@gmail.com
* @version Snoopy - the PHP net client v1.2.2
* @link http://www.wangchong.org
* @since 2008-04-27
*/
include "Snoopy.class.php";
$snoopy = new Snoopy;
$snoopy->fetchtext("http://www.php.net/");
print $snoopy->results;
$snoopy->fetchlinks("http://www.phpbuilder.com/");
print $snoopy->results;
$submit_url = "http://lnk.ispi.net/texis/scripts/msearch/netsearch.html";
$submit_vars["q"] = "amiga";
$submit_vars["submit"] = "Search!";
$submit_vars["searchhost"] = "Altavista";
$snoopy->submit($submit_url,$submit_vars);
print $snoopy->results;
$snoopy->maxframes=5;
$snoopy->fetch("http://www.ispi.net/");
echo "
";
echo htmlentities($snoopy->results[0]);
echo htmlentities($snoopy->results[1]);
echo htmlentities($snoopy->results[2]);
include "Snoopy.class.php";
$snoopy = new Snoopy;
$snoopy->user = "joe";
$snoopy->pass = "bloe";
if($snoopy->fetch("http://www.slashdot.org/"))
{
echo "response code: ".$snoopy->response_code."
";
while(list($key,$val) = each($snoopy->headers))
echo $key.": ".$val."
";
echo "
";
例子: 展示所有属性的功能:
include "Snoopy.class.php";
$snoopy = new Snoopy;
$snoopy->proxy_host = "my.proxy.host";
$snoopy->proxy_port = "8080";
$snoopy->agent = "(compatible; MSIE 4.01; MSN 2.5; AOL 4.0; Windows 98)";
$snoopy->referer = "http://www.microsnot.com/";
$snoopy->cookies["SessionID"] = 238472834723489l;
$snoopy->cookies["favoriteColor"] = "RED";
$snoopy->rawheaders["Pragma"] = "no-cache";
$snoopy->maxredirs = 2;
$snoopy->offsiteok = false;
$snoopy->expandlinks = false;
$snoopy->user = "joe";
$snoopy->pass = "bloe";
if($snoopy->fetchtext("http://www.phpbuilder.com"))
{
while(list($key,$val) = each($snoopy->headers))
echo $key.": ".$val."
";
echo "
";
echo "
";
例子: 抓取框架内容并展示结果:
include "Snoopy.class.php";
$snoopy = new Snoopy;
$snoopy->maxframes = 5;
if($snoopy->fetch("http://www.ispi.net/"))
{
echo "
COPYRIGHT:
Copyright(c) 1999,2000 ispi. All rights reserved.
This software is released under the GNU General Public License.
Please read the disclaimer at the top of the Snoopy.class.php file.
THANKS:
Special Thanks to:
Peter Sorger help fixing a redirect bug
Andrei Zmievski implementing time out functionality
Patric Sandelin
help with fetchform debugging
Carmelo misc bug fixes with frames
文章出处:http://www.diybl.com/course/1_web/webjs/200855/114322.html