Emulating a Website Login in C# (with two complete, runnable versions of the code)


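Earlier posts covered some of the networking fundamentals:

[Summary] Logic, flow, and caveats of fetching web pages, parsing page content, and emulating website login

and how to fetch simple web page content in C#:

[Tutorial] Fetching a web page and extracting the required information, C# version

This post continues from there and uses logging in to the Baidu home page:
http://www.baidu.com/
as the example for emulating a website login in C#.
First, the prerequisites. This post assumes you have already read:

[Summary] Logic, flow, and caveats of fetching web pages, parsing page content, and emulating website login

and understand the basic networking concepts, and that you have read:

[Summary] Browser developer tools (F12 in IE9 and Ctrl+Shift+I in Chrome): essential tools for web page analysis

and know how to use tools such as IE9's F12 to analyze what happens while a page executes.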
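1. Before emulating a login, work out the site's internal login logic
Before you can emulate logging in to the Baidu home page from a program, that is, from C# code, you first need to understand what happens internally when you log in to the site yourself.
For how to use tools to work out the internal logic of the Baidu home page login, see:

[Tutorial] Step by step: using a tool (IE9's F12) to analyze the internal logic of emulating a website login (Baidu home page)

2. Then implement the login logic in the chosen language (C#)
Once you understand the internal login logic of the Baidu home page as analyzed with F12, implementing it in C# is comparatively straightforward.
Notes:
(1) If you are not familiar with using cookies in C#, first read:

[Lessons learned] HTTP, web page access, HttpRequest, and HttpResponse

(2) If you are not familiar with regular expressions, see:

Notes on learning regular expressions

(3) If you are not familiar with the C# regular expression class Regex, see:

Notes on learning regular expressions in C#

The flow obtained from the analysis is reproduced here so that it can be compared against the code: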

| Step | URL | Request type | Data sent | Value to obtain/extract from the response |
| --- | --- | --- | --- | --- |
| 1 | http://www.baidu.com/ | GET | None | BAIDUID in the returned cookies |
| 2 | https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true | GET | Includes the BAIDUID cookie | The token value, extracted from the returned HTML |
| 3 | https://passport.baidu.com/v2/api/?login | POST | A set of POST data, in which token is the value extracted earlier | Verify that the returned cookies include BDUSS, PTOKEN, STOKEN, SAVEUSERID |
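The success check in step 3 amounts to asking whether every expected cookie name is present in the response cookies. A minimal offline sketch of that check follows; the class and method names (`LoginCheckSketch`, `HasAllCookies`) and the cookie values are hypothetical and only serve the illustration.

```csharp
using System;
using System.Linq;
using System.Net;

public static class LoginCheckSketch
{
    // Returns true only if every required cookie name exists in the collection.
    // The CookieCollection string indexer returns null for a missing name.
    public static bool HasAllCookies(CookieCollection cookies, params string[] required)
    {
        return required.All(name => cookies[name] != null);
    }

    public static void Main()
    {
        // Placeholder cookies standing in for a real login response.
        var cookies = new CookieCollection();
        cookies.Add(new Cookie("BDUSS", "x"));
        cookies.Add(new Cookie("PTOKEN", "y"));

        Console.WriteLine(HasAllCookies(cookies, "BDUSS", "PTOKEN")); // True
        Console.WriteLine(HasAllCookies(cookies, "BDUSS", "STOKEN")); // False
    }
}
```

In a real run, `resp.Cookies` from the login response would be passed in place of the hand-built collection.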

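With that, the C# code demonstrating the emulated login to the Baidu home page can be written.
[Version 1: complete C# code for emulating login to the Baidu home page, minimal version]
In the UI, clicking "Get cookie BAIDUID":

[Screenshot: clicking "Get cookie BAIDUID" shows its value]

calls the following code: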
private void btnGetBaiduid_Click(object sender, EventArgs e)
{
    //http://www.baidu.com/
    string baiduMainUrl = txbBaiduMainUrl.Text;
    //generate http request
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baiduMainUrl);

    //attach a cookie container so that resp.Cookies gets populated
    req.CookieContainer = new CookieContainer();
    req.CookieContainer.Add(curCookies);

    req.Method = "GET";
    //use request to get response
    HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
    txbGotBaiduid.Text = "";
    foreach (Cookie ck in resp.Cookies)
    {
        txbGotBaiduid.Text += "[" + ck.Name + "]=" + ck.Value;
        if (ck.Name == "BAIDUID")
        {
            gotCookieBaiduid = true;
        }
    }

    if (gotCookieBaiduid)
    {
        //store cookies
        curCookies = resp.Cookies;
    }
    else
    {
        MessageBox.Show("Error: cookie BAIDUID not found!");
    }
}
This yields the value of the BAIDUID cookie shown above.
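The handlers in this post create a fresh CookieContainer for each request and copy the saved cookies into it. As an alternative sketch, not the approach used in this post, a single CookieContainer can be kept and reused, since it stores cookies per URL and hands back the ones that apply. The offline example below only shows storing and looking up a cookie; the helper name `FindCookie` and the cookie value are hypothetical.

```csharp
using System;
using System.Net;

public static class CookieJarSketch
{
    // Returns the value of the named cookie that applies to the given URL,
    // or null when the container holds no such cookie.
    public static string FindCookie(CookieContainer jar, string url, string name)
    {
        Cookie ck = jar.GetCookies(new Uri(url))[name];
        return ck == null ? null : ck.Value;
    }

    public static void Main()
    {
        var jar = new CookieContainer();
        var home = new Uri("http://www.baidu.com/");

        // Placeholder value; a real BAIDUID comes from the server's response.
        jar.Add(home, new Cookie("BAIDUID", "HYPOTHETICAL_VALUE"));

        Console.WriteLine(FindCookie(jar, "http://www.baidu.com/", "BAIDUID"));
    }
}
```

When the same container is assigned to `req.CookieContainer` on every HttpWebRequest, cookies set by one response are stored in it and sent with later requests to matching URLs.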
Next, clicking "Get token value" calls the following code:
private void btnGetToken_Click(object sender, EventArgs e)
{
    if (gotCookieBaiduid)
    {
        string getapiUrl = "https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true";
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(getapiUrl);

        //add previously got cookies
        req.CookieContainer = new CookieContainer();
        req.CookieContainer.Add(curCookies);

        req.Method = "GET";
        HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
        StreamReader sr = new StreamReader(resp.GetResponseStream());
        string respHtml = sr.ReadToEnd();

        //the returned html contains a line like:
        //bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';
        //the named group tokenVal captures the value between the quotes
        string tokenValP = @"bdPass.api.params.login_token='(?<tokenVal>\w+)';";
        Match foundTokenVal = (new Regex(tokenValP)).Match(respHtml);
        if (foundTokenVal.Success)
        {
            //extracted the token value
            txbExtractedTokenVal.Text = foundTokenVal.Groups["tokenVal"].Value;
            extractTokenValueOK = true;
        }
        else
        {
            txbExtractedTokenVal.Text = "Error: token value not found!";
        }
    }
    else
    {
        MessageBox.Show("Error: the BAIDUID cookie was not obtained earlier!");
    }
}
which extracts the corresponding token value:

[Screenshot: clicking "Get token value" shows the extracted token]
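The token extraction relies on a regular expression with a named capture group, `tokenVal`, that is read back through `Groups["tokenVal"]`. The following standalone sketch runs the same idea against the sample line quoted in the code comment, so it can be tried without any network access. The class and method names are hypothetical, and the dots are escaped here so that they match literal dots only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TokenExtractSketch
{
    // Returns the captured token, or null when the pattern does not match.
    public static string ExtractToken(string html)
    {
        Match m = Regex.Match(html, @"bdPass\.api\.params\.login_token='(?<tokenVal>\w+)';");
        return m.Success ? m.Groups["tokenVal"].Value : null;
    }

    public static void Main()
    {
        string sample = "bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';";
        Console.WriteLine(ExtractToken(sample)); // 5ab690978812b0e7fbbe1bfc267b90b3
    }
}
```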

Then enter your Baidu username and password and click "Emulate login to Baidu home page", which calls the following code:
private void btnEmulateLoginBaidu_Click(object sender, EventArgs e)
{
    if (gotCookieBaiduid && extractTokenValueOK)
    {
        string staticpage = "http://www.baidu.com/cache/user/html/jump.html";

        //init post dict info
        Dictionary<string, string> postDict = new Dictionary<string, string>();
        //postDict.Add("ppui_logintime", "");
        postDict.Add("charset", "utf-8");
        //postDict.Add("codestring", "");
        postDict.Add("token", txbExtractedTokenVal.Text);
        postDict.Add("isPhone", "false");
        postDict.Add("index", "0");
        //postDict.Add("u", "");
        //postDict.Add("safeflg", "0");
        postDict.Add("staticpage", staticpage);
        postDict.Add("loginType", "1");
        postDict.Add("tpl", "mn");
        postDict.Add("callback", "parent.bdPass.api.login._postCallback");
        postDict.Add("username", txbBaiduUsername.Text);
        postDict.Add("password", txbBaiduPassword.Text);
        //postDict.Add("verifycode", "");
        postDict.Add("mem_pass", "on");

        string baiduMainLoginUrl = "https://passport.baidu.com/v2/api/?login";
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baiduMainLoginUrl);

        //add cookie
        req.CookieContainer = new CookieContainer();
        req.CookieContainer.Add(curCookies);
        //set to POST
        req.Method = "POST";
        req.ContentType = "application/x-www-form-urlencoded";
        //prepare post data
        string postDataStr = quoteParas(postDict);
        byte[] postBytes = Encoding.UTF8.GetBytes(postDataStr);
        req.ContentLength = postBytes.Length;
        //send post data
        Stream postDataStream = req.GetRequestStream();
        postDataStream.Write(postBytes, 0, postBytes.Length);
        postDataStream.Close();
        //got response
        HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
        //got returned html
        StreamReader sr = new StreamReader(resp.GetResponseStream());
        string loginBaiduRespHtml = sr.ReadToEnd();

        //check whether got all expected cookies
        Dictionary<string, bool> cookieCheckDict = new Dictionary<string, bool>();
        string[] cookiesNameList = {"BDUSS", "PTOKEN", "STOKEN", "SAVEUSERID"};
        foreach (String cookieToCheck in cookiesNameList)
        {
            cookieCheckDict.Add(cookieToCheck, false);
        }

        foreach (Cookie singleCookie in resp.Cookies)
        {
            if (cookieCheckDict.ContainsKey(singleCookie.Name))
            {
                cookieCheckDict[singleCookie.Name] = true;
            }
        }

        bool allCookiesFound = true;
        foreach (bool foundCurCookie in cookieCheckDict.Values)
        {
            allCookiesFound = allCookiesFound && foundCurCookie;
        }

        loginBaiduOk = allCookiesFound;
        if (loginBaiduOk)
        {
            txbEmulateLoginResult.Text = "Successfully emulated login to the Baidu home page!";
        }
        else
        {
            txbEmulateLoginResult.Text = "Failed to emulate login to the Baidu home page!";
            txbEmulateLoginResult.Text += Environment.NewLine + "Returned headers:";
            txbEmulateLoginResult.Text += Environment.NewLine + resp.Headers.ToString();
            txbEmulateLoginResult.Text += Environment.NewLine + Environment.NewLine;
            txbEmulateLoginResult.Text += Environment.NewLine + "Returned HTML source:";
            txbEmulateLoginResult.Text += Environment.NewLine + loginBaiduRespHtml;
        }
    }
    else
    {
        MessageBox.Show("Error: cookie BAIDUID was not obtained and/or the token value was not extracted!");
    }
}
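If the username and password are both correct, the login succeeds:

[Screenshot: entering the username and password and clicking login succeeds]

If you deliberately enter a wrong username or password, a login failure is reported and the returned headers and HTML are printed:

[Screenshot: a fake username and password make the login fail]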
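The login request sends its fields as an `application/x-www-form-urlencoded` body, that is, `key=value` pairs joined with `&`. As a hedged sketch of how such a body can be assembled, the example below uses `Uri.EscapeDataString`, which escapes reserved characters such as `&` and `=` inside values and encodes a space as `%20`. The helper name `BuildFormBody` is hypothetical, and this is an alternative illustration rather than the `quoteParas` function used in this post.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class FormBodySketch
{
    // Joins the pairs as key=value&key=value, escaping both keys and values.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var sb = new StringBuilder();
        foreach (KeyValuePair<string, string> field in fields)
        {
            if (sb.Length > 0)
            {
                sb.Append('&');
            }
            sb.Append(Uri.EscapeDataString(field.Key));
            sb.Append('=');
            sb.Append(Uri.EscapeDataString(field.Value));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // A list keeps the pairs in the order they were added.
        var fields = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("tpl", "mn"),
            new KeyValuePair<string, string>("password", "a&b=c d"),
        };
        Console.WriteLine(BuildFormBody(fields)); // tpl=mn&password=a%26b%3Dc%20d
    }
}
```

Escaping the values matters mainly for the username and password fields, which may contain characters that would otherwise be read as field separators.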

The complete C# code for emulating login to the Baidu home page is as follows:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;

using System.Net;
using System.IO;
using System.Text.RegularExpressions;
using System.Web;

namespace emulateLoginBaidu
{
    public partial class frmEmulateLoginBaidu : Form
    {
        CookieCollection curCookies = null;

bool gotCookieBaiduid, extractTokenValueOK, loginBaiduOk;

public frmEmulateLoginBaidu()
        {
            InitializeComponent();
        }

private void frmEmulateLoginBaidu_Load(object sender, EventArgs e)
        {
            //init
            curCookies = new CookieCollection();
            gotCookieBaiduid = false;
            extractTokenValueOK = false;
            loginBaiduOk = false;
        }

        /************
        functions in crifanLib.cs
        *************/

        //quote the input dict values
        //note: the first parameter in the returned string has no leading '&'
        public string quoteParas(Dictionary<string, string> paras)
        {
            string quotedParas = "";
            bool isFirst = true;
            string val = "";
            foreach (string para in paras.Keys)
            {
                if (paras.TryGetValue(para, out val))
                {
                    if (isFirst)
                    {
                        isFirst = false;

quotedParas += para + "=" + HttpUtility.UrlPathEncode(val);

}
                    else
                    {

quotedParas += "&" + para + "=" + HttpUtility.UrlPathEncode(val);

}
                }
                else
                {
                    break;
                }
            }

return quotedParas;
        }

/************

Demo emulate login baidu related functions

*************/

private void btnGetBaiduid_Click(object sender, EventArgs e)
        {
            //http://www.baidu.com/
            string baiduMainUrl = txbBaiduMainUrl.Text;
            //generate http request

HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baiduMainUrl);

//add follow code to handle cookies
            req.CookieContainer = new CookieContainer();
            req.CookieContainer.Add(curCookies);

req.Method = "GET";
            //use request to get response
            HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
            txbGotBaiduid.Text = "";
            foreach (Cookie ck in resp.Cookies)
            {
                txbGotBaiduid.Text += "[" + ck.Name + "]=" + ck.Value;
                if (ck.Name == "BAIDUID")
                {
                    gotCookieBaiduid = true;
                }
            }

if (gotCookieBaiduid)
            {
                //store cookies
                curCookies = resp.Cookies;
            }
            else
            {
                MessageBox.Show("Error: cookie BAIDUID not found!");
            }
        }

private void btnGetToken_Click(object sender, EventArgs e)
        {
            if (gotCookieBaiduid)
            {

string getapiUrl = "https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true";

HttpWebRequest req = (HttpWebRequest)WebRequest.Create(getapiUrl);

//add previously got cookies
                req.CookieContainer = new CookieContainer();
                req.CookieContainer.Add(curCookies);

req.Method = "GET";
                HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
                StreamReader sr = new StreamReader(resp.GetResponseStream());
                string respHtml = sr.ReadToEnd();

//bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';

                string tokenValP = @"bdPass.api.params.login_token='(?<tokenVal>\w+)';";

Match foundTokenVal = (new Regex(tokenValP)).Match(respHtml);
                if (foundTokenVal.Success)
                {
                    //extracted the token value

txbExtractedTokenVal.Text = foundTokenVal.Groups["tokenVal"].Value;

extractTokenValueOK = true;
                }
                else
                {
                    txbExtractedTokenVal.Text = "Error: token value not found!";
                }

}
            else
            {
                MessageBox.Show("Error: the BAIDUID cookie was not obtained earlier!");
            }
        }

private void btnEmulateLoginBaidu_Click(object sender, EventArgs e)
        {
            if (gotCookieBaiduid && extractTokenValueOK)
            {

string staticpage = "http://www.baidu.com/cache/user/html/jump.html";

//init post dict info

                Dictionary<string, string> postDict = new Dictionary<string, string>();

//postDict.Add("ppui_logintime", "");
                postDict.Add("charset", "utf-8");
                //postDict.Add("codestring", "");
                postDict.Add("token", txbExtractedTokenVal.Text);
                postDict.Add("isPhone", "false");
                postDict.Add("index", "0");
                //postDict.Add("u", "");
                //postDict.Add("safeflg", "0");
                postDict.Add("staticpage", staticpage);
                postDict.Add("loginType", "1");
                postDict.Add("tpl", "mn");

postDict.Add("callback", "parent.bdPass.api.login._postCallback");

postDict.Add("username", txbBaiduUsername.Text);
                postDict.Add("password", txbBaiduPassword.Text);
                //postDict.Add("verifycode", "");
                postDict.Add("mem_pass", "on");

string baiduMainLoginUrl = "https://passport.baidu.com/v2/api/?login";

HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baiduMainLoginUrl);

//add cookie
                req.CookieContainer = new CookieContainer();
                req.CookieContainer.Add(curCookies);
                //set to POST
                req.Method = "POST";
                req.ContentType = "application/x-www-form-urlencoded";
                //prepare post data
                string postDataStr = quoteParas(postDict);
                byte[] postBytes = Encoding.UTF8.GetBytes(postDataStr);
                req.ContentLength = postBytes.Length;
                //send post data
                Stream postDataStream = req.GetRequestStream();
                postDataStream.Write(postBytes, 0, postBytes.Length);
                postDataStream.Close();
                //got response
                HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
                //got returned html
                StreamReader sr = new StreamReader(resp.GetResponseStream());
                string loginBaiduRespHtml = sr.ReadToEnd();

//check whether got all expected cookies

                Dictionary<string, bool> cookieCheckDict = new Dictionary<string, bool>();

string[] cookiesNameList = {"BDUSS", "PTOKEN", "STOKEN", "SAVEUSERID"};

foreach (String cookieToCheck in cookiesNameList)
                {
                    cookieCheckDict.Add(cookieToCheck, false);
                }

foreach (Cookie singleCookie in resp.Cookies)
                {
                    if (cookieCheckDict.ContainsKey(singleCookie.Name))
                    {
                        cookieCheckDict[singleCookie.Name] = true;
                    }
                }

bool allCookiesFound = true;
                foreach (bool foundCurCookie in cookieCheckDict.Values)
                {
                    allCookiesFound = allCookiesFound && foundCurCookie;
                }

loginBaiduOk = allCookiesFound;
                if (loginBaiduOk)
                {
                    txbEmulateLoginResult.Text = "Successfully emulated login to the Baidu home page!";
                }
                else
                {
                    txbEmulateLoginResult.Text = "Failed to emulate login to the Baidu home page!";
                    txbEmulateLoginResult.Text += Environment.NewLine + "Returned headers:";
                    txbEmulateLoginResult.Text += Environment.NewLine + resp.Headers.ToString();
                    txbEmulateLoginResult.Text += Environment.NewLine + Environment.NewLine;
                    txbEmulateLoginResult.Text += Environment.NewLine + "Returned HTML source:";
                    txbEmulateLoginResult.Text += Environment.NewLine + loginBaiduRespHtml;

}
            }
            else
            {
                MessageBox.Show("Error: cookie BAIDUID was not obtained and/or the token value was not extracted!");
            }
        }

private void lklEmulateLoginTutorialUrl_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)

{

string emulateLoginTutorialUrl = "http://www.crifan.com/emulate_login_website_using_csharp";

System.Diagnostics.Process.Start(emulateLoginTutorialUrl);
        }

private void btnClearAll_Click(object sender, EventArgs e)
        {
            curCookies = new CookieCollection();
            gotCookieBaiduid = false;
            extractTokenValueOK = false;
            loginBaiduOk = false;

txbGotBaiduid.Text = "";
            txbExtractedTokenVal.Text = "";

txbBaiduUsername.Text = "";
            txbBaiduPassword.Text = "";
            txbEmulateLoginResult.Text = "";
        }
    }
}
The corresponding complete VS2010 C# project can be downloaded here:

emulateLoginBaidu_csharp_2012-11-07.7z

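[Version 2: complete C# code for emulating login to the Baidu home page, crifanLib.cs version]
Later, the code above was changed to use my C# library crifanLib.cs, so that the network-related library functions can be reused more easily in the future.
Below is the complete C# code for the version that uses crifanLib.cs: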
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;

using System.Net;
using System.IO;
using System.Text.RegularExpressions;
using System.Web;

namespace emulateLoginBaidu
{
    public partial class frmEmulateLoginBaidu : Form
    {
        CookieCollection curCookies = null;

bool gotCookieBaiduid, extractTokenValueOK, loginBaiduOk;

public frmEmulateLoginBaidu()
        {
            InitializeComponent();
        }

private void frmEmulateLoginBaidu_Load(object sender, EventArgs e)
        {
            this.AcceptButton = this.btnEmulateLoginBaidu;

//init for crifanLib.cs
            curCookies = new CookieCollection();

//init for demo login
            gotCookieBaiduid = false;
            extractTokenValueOK = false;
            loginBaiduOk = false;
        }

/************

functions in crifanLib.cs

Online browser: http://code.google.com/p/crifanlib/source/browse/trunk/csharp/crifanLib.cs

Download:       http://code.google.com/p/crifanlib/

*************/

//quote the input dict values
        //note: the return result for first para no '&'
        public string quoteParas(Dictionary<string, string> paras)
        {
            string quotedParas = "";
            bool isFirst = true;
            string val = "";
            foreach (string para in paras.Keys)
            {
                if (paras.TryGetValue(para, out val))
                {
                    if (isFirst)
                    {
                        isFirst = false;

quotedParas += para + "=" + HttpUtility.UrlPathEncode(val);

}
                    else
                    {

quotedParas += "&" + para + "=" + HttpUtility.UrlPathEncode(val);

}
                }
                else
                {
                    break;
                }
            }

return quotedParas;
        }

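        /*********************************************************************/
        /* cookie */
        /*********************************************************************/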

//add a single cookie to cookies, if already exist, update its value

public void addCookieToCookies(Cookie toAdd, ref CookieCollection cookies, bool overwriteDomain)

{
            bool found = false;

if (cookies.Count > 0)
            {
                foreach (Cookie originalCookie in cookies)
                {
                    if (originalCookie.Name == toAdd.Name)
                    {
                        // !!! for different domain, cookie is not same,

// so should not set the cookie value here while their domains is not same

// only if it explictly need overwrite domain
                        if ((originalCookie.Domain == toAdd.Domain) ||

((originalCookie.Domain != toAdd.Domain) && overwriteDomain))

{

//here can not force convert CookieCollection to HttpCookieCollection,

//then use .remove to remove this cookie then add
                            // so no good way to copy all field value
                            originalCookie.Value = toAdd.Value;

originalCookie.Domain = toAdd.Domain;

originalCookie.Expires = toAdd.Expires;
                            originalCookie.Version = toAdd.Version;
                            originalCookie.Path = toAdd.Path;

//following fields seems should not change
                            //originalCookie.HttpOnly = toAdd.HttpOnly;
                            //originalCookie.Secure = toAdd.Secure;

found = true;
                            break;
                        }
                    }
                }
            }

if (!found)
            {
                if (toAdd.Domain != "")
                {

// if add the null domain, will lead to follow req.CookieContainer.Add(cookies) failed !!!

cookies.Add(toAdd);
                }
            }

}//addCookieToCookies

        //add single cookie to cookies, default no overwrite domain

public void addCookieToCookies(Cookie toAdd, ref CookieCollection cookies)

{
            addCookieToCookies(toAdd, ref cookies, false);
        }

//check whether the cookies contains the ckToCheck cookie
        //support:
        //ckTocheck is Cookie/string
        //cookies is Cookie/string/CookieCollection/string[]
        public bool isContainCookie(object ckToCheck, object cookies)
        {
            bool isContain = false;

if ((ckToCheck != null) && (cookies != null))
            {
                string ckName = "";
                Type type = ckToCheck.GetType();

//string typeStr = ckType.ToString();

//if (ckType.FullName == "System.string")
                if (type.Name.ToLower() == "string")
                {
                    ckName = (string)ckToCheck;
                }
                else if (type.Name == "Cookie")
                {
                    ckName = ((Cookie)ckToCheck).Name;
                }

if (ckName != "")
                {
                    type = cookies.GetType();

// is single Cookie
                    if (type.Name == "Cookie")
                    {
                        if (ckName == ((Cookie)cookies).Name)
                        {
                            isContain = true;
                        }
                    }
                    // is CookieCollection
                    else if (type.Name == "CookieCollection")
                    {
                        foreach (Cookie ck in (CookieCollection)cookies)
                        {
                            if (ckName == ck.Name)
                            {
                                isContain = true;
                                break;
                            }
                        }
                    }
                    // is single cookie name string
                    else if (type.Name.ToLower() == "string")
                    {
                        if (ckName == (string)cookies)
                        {
                            isContain = true;
                        }
                    }
                    // is cookie name string[]
                    else if (type.Name.ToLower() == "string[]")
                    {
                        foreach (string name in ((string[])cookies))
                        {
                            if (ckName == name)
                            {
                                isContain = true;
                                break;
                            }
                        }
                    }
                }
            }

return isContain;
        }//isContainCookie

// update cookiesToUpdate to localCookies

// if omitUpdateCookies designated, then omit cookies of omitUpdateCookies in cookiesToUpdate

public void updateLocalCookies(CookieCollection cookiesToUpdate, ref CookieCollection localCookies, object omitUpdateCookies)

{
            if (cookiesToUpdate.Count > 0)
            {
                if (localCookies == null)
                {
                    localCookies = cookiesToUpdate;
                }
                else
                {
                    foreach (Cookie newCookie in cookiesToUpdate)
                    {
                        if (isContainCookie(newCookie, omitUpdateCookies))
                        {
                            // need omit process this
                        }
                        else
                        {
                            addCookieToCookies(newCookie, ref localCookies);
                        }
                    }
                }
            }
        }//updateLocalCookies

//update cookiesToUpdate to localCookies

public void updateLocalCookies(CookieCollection cookiesToUpdate, ref CookieCollection localCookies)

{
            updateLocalCookies(cookiesToUpdate, ref localCookies, null);
        }

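        /*********************************************************************/
        /* HTTP */
        /*********************************************************************/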

        /* get url's response */
        public HttpWebResponse getUrlResponse(string url,
                                        Dictionary<string, string> headerDict,
                                        Dictionary<string, string> postDict,
                                        int timeout,
                                        string postDataStr)
        {
            //CookieCollection parsedCookies;

HttpWebResponse resp = null;

HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);

req.AllowAutoRedirect = true;
            req.Accept = "*/*";

//const string gAcceptLanguage = "en-US"; // zh-CN/en-US
            //req.Headers["Accept-Language"] = gAcceptLanguage;

req.KeepAlive = true;

//IE8

//const string gUserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; .NET4.0E";

//IE9

//const string gUserAgent = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"; // x64

const string gUserAgent = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"; // x86

//Chrome

//const string gUserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4";

//Mozilla Firefox

//const string gUserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6";

req.UserAgent = gUserAgent;

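            req.Headers["Accept-Encoding"] = "gzip, deflate";
            //decompress both encodings advertised in the Accept-Encoding header above
            req.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;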

req.Proxy = null;

if (timeout > 0)
            {
                req.Timeout = timeout;
            }

if (curCookies != null)
            {
                req.CookieContainer = new CookieContainer();

req.CookieContainer.PerDomainCapacity = 40; // following will exceed max default 20 cookie per domain

req.CookieContainer.Add(curCookies);
            }

if (headerDict != null)
            {
                foreach (string header in headerDict.Keys)
                {
                    string headerValue = "";
                    if (headerDict.TryGetValue(header, out headerValue))
                    {

// following are allow the caller overwrite the default header setting

if (header.ToLower() == "referer")
                        {
                            req.Referer = headerValue;
                        }
                        else if (header.ToLower() == "allowautoredirect")
                        {
                            bool isAllow = false;
                            if (bool.TryParse(headerValue, out isAllow))
                            {
                                req.AllowAutoRedirect = isAllow;
                            }
                        }
                        else if (header.ToLower() == "accept")
                        {
                            req.Accept = headerValue;
                        }
                        else if (header.ToLower() == "keepalive")
                        {
                            bool isKeepAlive = false;
                            if (bool.TryParse(headerValue, out isKeepAlive))
                            {
                                req.KeepAlive = isKeepAlive;
                            }
                        }
                        else if (header.ToLower() == "accept-language")
                        {
                            req.Headers["Accept-Language"] = headerValue;
                        }
                        else if (header.ToLower() == "useragent")
                        {
                            req.UserAgent = headerValue;
                        }
                        else
                        {
                            req.Headers[header] = headerValue;
                        }
                    }
                    else
                    {
                        break;
                    }
                }
            }

            if (postDict != null || !string.IsNullOrEmpty(postDataStr))
            {
                req.Method = "POST";
                req.ContentType = "application/x-www-form-urlencoded";

if (postDict != null)
                {
                    postDataStr = quoteParas(postDict);
                }

//byte[] postBytes = Encoding.GetEncoding("utf-8").GetBytes(postData);

byte[] postBytes = Encoding.UTF8.GetBytes(postDataStr);
                req.ContentLength = postBytes.Length;

Stream postDataStream = req.GetRequestStream();
                postDataStream.Write(postBytes, 0, postBytes.Length);
                postDataStream.Close();
            }
            else
            {
                req.Method = "GET";
            }

//may timeout, has fixed in:

//http://www.crifan.com/fixed_problem_sometime_httpwebrequest_getresponse_timeout/

resp = (HttpWebResponse)req.GetResponse();

updateLocalCookies(resp.Cookies, ref curCookies);

return resp;
        }

        public HttpWebResponse getUrlResponse(string url,
                                    Dictionary<string, string> headerDict,
                                    Dictionary<string, string> postDict)
        {
            return getUrlResponse(url, headerDict, postDict, 0, "");
        }

public HttpWebResponse getUrlResponse(string url)
        {
            return getUrlResponse(url, null, null, 0, "");
        }

        // valid charset: "GB18030"/"UTF-8", invalid: "UTF8"
        public string getUrlRespHtml(string url,
                                        Dictionary<string, string> headerDict,
                                        string charset,
                                        Dictionary<string, string> postDict,
                                        int timeout,
                                        string postDataStr)
        {
            string respHtml = "";

//HttpWebResponse resp = getUrlResponse(url, headerDict, postDict, timeout);

HttpWebResponse resp = getUrlResponse(url, headerDict, postDict, timeout, postDataStr);

//long realRespLen = resp.ContentLength;

StreamReader sr;
            if ((charset != null) && (charset != ""))
            {
                Encoding htmlEncoding = Encoding.GetEncoding(charset);
                sr = new StreamReader(resp.GetResponseStream(), htmlEncoding);
            }
            else
            {
                sr = new StreamReader(resp.GetResponseStream());
            }
            respHtml = sr.ReadToEnd();

return respHtml;
        }

        public string getUrlRespHtml(string url, Dictionary<string, string> headerDict, string charset, Dictionary<string, string> postDict, string postDataStr)

{

return getUrlRespHtml(url, headerDict, charset, postDict, 0, postDataStr);

}

        public string getUrlRespHtml(string url, Dictionary<string, string> headerDict, Dictionary<string, string> postDict)

{
            return getUrlRespHtml(url, headerDict, "", postDict, "");
        }

        public string getUrlRespHtml(string url, Dictionary<string, string> headerDict)

{
            return getUrlRespHtml(url, headerDict, null);
        }

public string getUrlRespHtml(string url, string charset, int timeout)
        {
            return getUrlRespHtml(url, null, charset, null, timeout, "");
        }

public string getUrlRespHtml(string url, string charset)
        {
            return getUrlRespHtml(url, charset, 0);
        }

public string getUrlRespHtml(string url)
        {
            return getUrlRespHtml(url, "");
        }

/************

Demo emulate login baidu related functions

*************/

private void btnGetBaiduid_Click(object sender, EventArgs e)
        {
            //http://www.baidu.com/
            string baiduMainUrl = txbBaiduMainUrl.Text;
            HttpWebResponse resp = getUrlResponse(baiduMainUrl);
            txbGotBaiduid.Text = "";
            foreach (Cookie ck in resp.Cookies)
            {
                txbGotBaiduid.Text += "[" + ck.Name + "]=" + ck.Value;
                if (ck.Name == "BAIDUID")
                {
                    gotCookieBaiduid = true;
                }
            }

if (gotCookieBaiduid)
            {
                //store cookies
                curCookies = resp.Cookies;
            }
            else
            {
                MessageBox.Show("Error: cookie BAIDUID not found!");
            }
        }

        private void btnGetToken_Click(object sender, EventArgs e)
        {
            if (gotCookieBaiduid)
            {
                string getapiUrl = "https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true";
                string respHtml = getUrlRespHtml(getapiUrl);

                //bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';
                //the named group tokenVal captures the token; literal dots are escaped
                string tokenValP = @"bdPass\.api\.params\.login_token='(?<tokenVal>\w+)';";
                Match foundTokenVal = (new Regex(tokenValP)).Match(respHtml);
                if (foundTokenVal.Success)
                {
                    //extracted the token value
                    txbExtractedTokenVal.Text = foundTokenVal.Groups["tokenVal"].Value;
                    extractTokenValueOK = true;
                }
                else
                {
                    txbExtractedTokenVal.Text = "Error: token value not found!";
                }
            }
            else
            {
                MessageBox.Show("Error: cookie BAIDUID was not obtained correctly earlier!");
            }
        }

        private void btnEmulateLoginBaidu_Click(object sender, EventArgs e)
        {
            if (gotCookieBaiduid && extractTokenValueOK)
            {
                string staticpage = "http://www.baidu.com/cache/user/html/jump.html";

                //init post dict info
                Dictionary<string, string> postDict = new Dictionary<string, string>();
                //postDict.Add("ppui_logintime", "");
                postDict.Add("charset", "utf-8");
                //postDict.Add("codestring", "");
                postDict.Add("token", txbExtractedTokenVal.Text);
                postDict.Add("isPhone", "false");
                postDict.Add("index", "0");
                //postDict.Add("u", "");
                //postDict.Add("safeflg", "0");
                postDict.Add("staticpage", staticpage);
                postDict.Add("loginType", "1");
                postDict.Add("tpl", "mn");
                postDict.Add("callback", "parent.bdPass.api.login._postCallback");
                postDict.Add("username", txbBaiduUsername.Text);
                postDict.Add("password", txbBaiduPassword.Text);
                //postDict.Add("verifycode", "");
                postDict.Add("mem_pass", "on");

                string baiduMainLoginUrl = "https://passport.baidu.com/v2/api/?login";
                string loginBaiduRespHtml = getUrlRespHtml(baiduMainLoginUrl, null, postDict);

                //check whether got all expected cookies
                Dictionary<string, bool> cookieCheckDict = new Dictionary<string, bool>();
                string[] cookiesNameList = {"BDUSS", "PTOKEN", "STOKEN", "SAVEUSERID"};
                foreach (String cookieToCheck in cookiesNameList)
                {
                    cookieCheckDict.Add(cookieToCheck, false);
                }

                foreach (Cookie singleCookie in curCookies)
                {
                    if (cookieCheckDict.ContainsKey(singleCookie.Name))
                    {
                        cookieCheckDict[singleCookie.Name] = true;
                    }
                }

                bool allCookiesFound = true;
                foreach (bool foundCurCookie in cookieCheckDict.Values)
                {
                    allCookiesFound = allCookiesFound && foundCurCookie;
                }

                loginBaiduOk = allCookiesFound;
                if (loginBaiduOk)
                {
                    txbEmulateLoginResult.Text = "Successfully emulated login to the Baidu home page!";
                }
                else
                {
                    txbEmulateLoginResult.Text = "Failed to emulate login to the Baidu home page!";
                    txbEmulateLoginResult.Text += Environment.NewLine + "The returned HTML source is:";
                    txbEmulateLoginResult.Text += Environment.NewLine + loginBaiduRespHtml;
                }
            }
            else
            {
                MessageBox.Show("Error: cookie BAIDUID was not obtained correctly and/or the token value was not extracted correctly!");
            }
        }

        private void lklEmulateLoginTutorialUrl_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
        {
            string emulateLoginTutorialUrl = "http://www.crifan.com/emulate_login_website_using_csharp";
            System.Diagnostics.Process.Start(emulateLoginTutorialUrl);
        }

        private void btnClearAll_Click(object sender, EventArgs e)
        {
            curCookies = new CookieCollection();
            gotCookieBaiduid = false;
            extractTokenValueOK = false;
            loginBaiduOk = false;

            txbGotBaiduid.Text = "";
            txbExtractedTokenVal.Text = "";

            txbBaiduUsername.Text = "";
            txbBaiduPassword.Text = "";
            txbEmulateLoginResult.Text = "";
        }
    }
}

The complete VS2010 project can be downloaded here:

emulateLoginBaidu_csharp_crifanLibVersion_2012-11-07.7z

About crifanLib.cs:

Browse online: crifanLib.cs

Download: crifanLib_2012-11-07.7z

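【Summary】

As you can see, although the Baidu home page login flow analyzed earlier is not that complicated, implementing it in C# turns out to be far more involved than the Python implementation.

The main reason is that Python wraps a lot of common functionality in convenient library functions, whereas in C# you have to handle many details yourself, including every parameter of a GET or POST request. Cookie handling in particular is tedious.

So for fetching web pages, analyzing their content, and emulating a website login, Python is still the more convenient choice.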
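【Postscript 2013-09-11】
1. After investigating:

【Record】Investigating why the C# code for emulating Baidu login does not work in .NET 4.0

it is confirmed that the earlier code works on .NET 3.5 and earlier, but does not work on .NET 4.0.
2. The cause has been found and fixed.
The cause:

For a cookie that does not specify the expires field, .NET 4.0 sets the cookie's expires value to the default of year 0001, 00:00:00. The cookie is then treated as expired, so Baidu's cookie

H_PS_PSSID
becomes invalid, and every later step fails.
On .NET 3.5 and earlier, the cookie's expires value is also the default of year 0001, 00:00:00, but the cookie remains usable in practice, so the later steps work and the problem does not occur.
3. The fixed code
is available for download:
(1) Emulate Baidu login, standalone complete code version, .NET 4.0
emulateLoginBaidu_csharp_independentCodeVersion_2013-09-11.7z
(2) Emulate Baidu login, version using (my own) crifanLib, .NET 4.0
emulateLoginBaidu_csharp_crifanLibVersion_2013-09-11.7z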
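(I will upload the two files above when I have time, because the upload currently fails with:

xxx.7z:
unknown Bytes complete FAILED!
:Upload canceled
: VIRUS DETECTED!
(Heuristics.Broken.Executable FOUND)

I will try uploading again another time, and look into it if the same error persists.)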
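
The expiry problem described in point 2 can be illustrated with a small sketch. This is not the fix shipped in the downloads above. It only assumes that a cookie without an explicit expires field shows up with `Expires` equal to `DateTime.MinValue` (year 0001, 00:00:00), and it gives such cookies a future expiry before they are stored. `CookieExpiryWorkaround` and `ExtendDefaultExpiry` are hypothetical names introduced for illustration.

```csharp
using System;
using System.Net;

public static class CookieExpiryWorkaround
{
    // Hypothetical helper, not part of the original project.
    // Assumption: cookies that came without an expires field carry
    // Expires == DateTime.MinValue, as described in the postscript.
    public static void ExtendDefaultExpiry(CookieCollection cookies, TimeSpan lifetime)
    {
        foreach (Cookie ck in cookies)
        {
            if (ck.Expires == DateTime.MinValue)
            {
                // Give the cookie an explicit future expiry so it is
                // no longer treated as expired.
                ck.Expires = DateTime.Now.Add(lifetime);
            }
            // Cookies with an explicit expiry are left untouched.
        }
    }
}
```

In the code above it could be called as `CookieExpiryWorkaround.ExtendDefaultExpiry(resp.Cookies, TimeSpan.FromDays(1));` right before `curCookies = resp.Cookies;`. The one-day lifetime is an arbitrary choice.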
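【Summary】
Whether in 3.5 and earlier or in the latest 4.0, .NET's parsing of the Set-Cookie header in an HTTP response into a CookieCollection
has always been garbage and riddled with bugs.
For details, see:

SetCookie parsing has bugs

From now on, use resp.Cookies as little as possible.
Otherwise C# will bite you and you will not even know why.
It is better to parse Set-Cookie with the parsing function I wrote myself and get a correct CookieCollection.
For details, see:

Parsing the Set-Cookie string (returned by an HTTP request) into a Cookie array: parseSetCookie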
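
To give a rough idea of what parsing Set-Cookie by hand involves, here is a minimal sketch. It is not the parseSetCookie function from crifanLib. `SetCookieSketch` and `ParseSingleSetCookie` are hypothetical names, and the sketch handles only one cookie string at a time. Splitting a combined header that carries several cookies is not handled here.

```csharp
using System;
using System.Net;

public static class SetCookieSketch
{
    // Hypothetical helper, not the author's parseSetCookie.
    // Parses ONE Set-Cookie value such as
    //   "BAIDUID=ABC123:FG=1; path=/; domain=.baidu.com"
    // defaultDomain is used when the string has no domain attribute.
    public static Cookie ParseSingleSetCookie(string setCookie, string defaultDomain)
    {
        string[] parts = setCookie.Split(';');

        // First part is name=value; split on the first '=' only,
        // because the value itself may contain '='.
        string[] nameValue = parts[0].Split(new[] { '=' }, 2);
        string name = nameValue[0].Trim();
        string value = nameValue.Length > 1 ? nameValue[1].Trim() : "";

        Cookie ck = new Cookie(name, value);
        ck.Path = "/";
        ck.Domain = defaultDomain;

        // Remaining parts are attributes.
        for (int i = 1; i < parts.Length; i++)
        {
            string[] attr = parts[i].Split(new[] { '=' }, 2);
            string key = attr[0].Trim().ToLowerInvariant();
            string val = attr.Length > 1 ? attr[1].Trim() : "";

            switch (key)
            {
                case "domain": ck.Domain = val; break;
                case "path": ck.Path = val; break;
                case "secure": ck.Secure = true; break;
                case "httponly": ck.HttpOnly = true; break;
                // expires is deliberately ignored in this sketch, so the
                // result behaves like a session cookie.
            }
        }
        return ck;
    }
}
```

A call such as `SetCookieSketch.ParseSingleSetCookie("BAIDUID=ABC123:FG=1; path=/; domain=.baidu.com", "www.baidu.com")` yields a cookie that can be added to a CookieCollection instead of relying on resp.Cookies.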
