Harbin Institute of Technology (Shenzhen), Advanced Computer Network course project: Webpage Capture and Reverting.docx

Uploader: bao****ty | Document ID: 132309048 | Upload date: 2020-05-14 | Format: DOCX | Pages: 28 | Size: 1.01MB

Harbin Institute of Technology Shenzhen Graduate School
Advanced Computer Network Project Report
HTTP protocol analysis and webpage reverting
Report date: December 12, 2017

Contents
HTTP protocol analysis and webpage reverting
1. Introduction
1.1. Objectives
1.2. Environment and tools
1.3. Task Distribution
2. Protocol Analysis
2.1. HTTP Analysis
2.2. TCP Analysis
2.3. IP Analysis
3. Capture Packets
3.1. Introduction
3.2. Process
4. Analyse Packets
4.1. Introduction
4.2. The process of analysis
4.3. The method of analysis
5. Revert Webpage
5.1. Principle
5.2. Process
Main Code
5.3. Result

1. Introduction

1.1. Objectives
HTTP is one of the most widely used protocols on the Internet. The objective of this project is to write a program that captures incoming packets and analyzes the HTTP protocol. The program then reverts (reconstructs) a selected test webpage from the captured data.

1.2. Environment and tools
Operating system: Windows 10
IDE and language: Code::Blocks with C
Packet capture library: WinPcap

2. Protocol Analysis

2.1. HTTP Analysis
HTTP (Hypertext Transfer Protocol) is an application-layer protocol. It defines the format of the data exchanged between a browser (the client) and a web server. It also specifies which messages the client may send to the server and which responses the server may return. It is the foundation of data communication for the World Wide Web.

Work procedure
An HTTP operation is called a transaction, and it proceeds in four steps:
1) The client and the server establish a connection.
2) Once the connection is established, the client sends a request to the server. The request contains the request method, the URL and the protocol version number. These are followed by MIME-style header information (request modifiers, client information) and possibly a message body.
3) After receiving the request, the server returns a response. It begins with a status line containing the protocol version of the message and a success or error code. This is followed by MIME-style header information (server information, entity information) and possibly a message body.
4) The client receives the information returned by the server, the browser displays it to the user, and the client disconnects from the server.
If an error occurs at any point in this process, an error message is returned to the client and displayed. The request may also pass through a proxy server before it reaches the web server.

HTTP message structure
HTTP messages are either requests from the client to the server or responses from the server to the client.

    Request line
    General header
    Request header
    Entity header
    Message body
Table 2.1.1 Request message format

The request line starts with the method field, followed by the URL field and the HTTP protocol version field, and ends with CRLF. The fields are separated by SP (a single space). CR and LF are not allowed anywhere in the line except in the final CRLF sequence:
    Request-Line = Method SP Request-URI SP HTTP-Version CRLF

    Status line
    General header
    Response header
    Entity header
    Message body
Table 2.1.2 Response message format

The status line consists of the protocol version, a status code and a reason phrase:
    Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
The status code consists of three digits and indicates whether the request was understood and fulfilled. The reason phrase is a short textual description of the status code. The status code is intended for automatic processing, while the reason phrase is intended for the human user. The client is not required to examine or display the reason phrase.

Request methods
HTTP defines eight methods that indicate the action to be performed on the identified resource. What the resource represents depends on the server implementation: it may be pre-existing data or data generated dynamically. Often the resource corresponds to a file or to the output of an executable residing on the server.

    Method    Description
    GET       Requests to read a web page.
    HEAD      Requests to read the header of a web page.
    PUT       Requests to store a web page.
    POST      Appends data to a web page (the resource named by the URL).
    DELETE    Deletes the web page.
    TRACE     Echoes the received request.
    OPTIONS   Queries the properties of the server or of a particular file.
    CONNECT   Converts the request connection to a transparent TCP/IP tunnel.
Table 2.3 HTTP request methods

An HTTP server must implement at least the GET and HEAD methods; the other methods are optional. Beyond these, a particular HTTP server may also define its own extension methods.

Client request message
An example client request:
    GET /somedir/page.html HTTP/1.1
    Host: www.someschool.edu
    Connection: close
    User-agent: Mozilla/5.0
    Accept-language: fr
The header lines are followed by a blank line, so the request ends with two consecutive newlines. Each newline consists of a carriage return followed by a line feed (CRLF). The Host field distinguishes between different DNS names that share a single IP address, which allows name-based virtual hosting. It is optional in HTTP/1.0 but mandatory in HTTP/1.1.

Response message
After the client sends a request, the server responds with a status line. The response includes the version of the message protocol
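Section 2.1 describes the request line as the method, URL and HTTP version fields, separated by SP and terminated by CRLF. This rule can be checked mechanically when a captured packet is analysed. The C sketch below is illustrative only and is not taken from the project's code. The names `request_line`, `parse_request_line` and `copy_field` and the fixed field sizes are hypothetical choices.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type and helpers (not from the original project code). */
typedef struct {
    char method[16];   /* e.g. "GET" */
    char uri[256];     /* e.g. "/somedir/page.html" */
    char version[16];  /* e.g. "HTTP/1.1" */
} request_line;

/* Copies src[0..len) into dst as a C string; fails on empty or oversized fields. */
static int copy_field(char *dst, size_t dst_size, const char *src, size_t len)
{
    if (len == 0 || len >= dst_size)
        return 0;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 1;
}

/* Parses "Method SP Request-URI SP HTTP-Version CRLF" from the first `len`
 * bytes of `line`. Returns 1 on success, 0 if the line is malformed. */
int parse_request_line(const char *line, size_t len, request_line *out)
{
    const char *end = NULL;

    /* Locate the terminating CRLF; a CR or LF anywhere before it is an error. */
    for (size_t i = 0; i + 1 < len; i++) {
        if (line[i] == '\r' && line[i + 1] == '\n') {
            end = line + i;
            break;
        }
        if (line[i] == '\r' || line[i] == '\n')
            return 0;
    }
    if (end == NULL)
        return 0;

    /* Split on the two SP delimiters; a third space makes the line invalid. */
    const char *sp1 = memchr(line, ' ', (size_t)(end - line));
    if (sp1 == NULL)
        return 0;
    const char *sp2 = memchr(sp1 + 1, ' ', (size_t)(end - sp1 - 1));
    if (sp2 == NULL || memchr(sp2 + 1, ' ', (size_t)(end - sp2 - 1)) != NULL)
        return 0;

    if (!copy_field(out->method, sizeof out->method, line, (size_t)(sp1 - line)) ||
        !copy_field(out->uri, sizeof out->uri, sp1 + 1, (size_t)(sp2 - sp1 - 1)) ||
        !copy_field(out->version, sizeof out->version, sp2 + 1, (size_t)(end - sp2 - 1)))
        return 0;

    /* The version field must start with "HTTP/". */
    return strncmp(out->version, "HTTP/", 5) == 0;
}
```

Passing the example request from Section 2.1 yields method `GET`, URI `/somedir/page.html` and version `HTTP/1.1`. A line containing a stray LF before the final CRLF is rejected.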
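The three-digit status code described in Section 2.1 can be pulled out of a captured response in a few lines. This C sketch assumes the RFC 2616 status-line layout (HTTP-Version SP Status-Code SP Reason-Phrase CRLF). `parse_status_line` and `status_class` are hypothetical helpers, and the class names follow the first-digit classes defined in RFC 2616.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the original project): parses the status
 * line at the start of `line`. Returns the 3-digit status code, or -1 if
 * the line is malformed. The reason phrase is copied into `reason`
 * (reason_size must be at least 1; long phrases are truncated). */
int parse_status_line(const char *line, size_t len, char *reason, size_t reason_size)
{
    const char *end = NULL;

    /* Locate the CRLF that terminates the status line. */
    for (size_t i = 0; i + 1 < len; i++) {
        if (line[i] == '\r' && line[i + 1] == '\n') {
            end = line + i;
            break;
        }
    }
    if (end == NULL || end - line < 5 || strncmp(line, "HTTP/", 5) != 0)
        return -1;

    /* After the version: SP, exactly three digits, SP. */
    const char *sp = memchr(line, ' ', (size_t)(end - line));
    if (sp == NULL || end - sp < 5 || sp[4] != ' ')
        return -1;
    for (int k = 1; k <= 3; k++)
        if (!isdigit((unsigned char)sp[k]))
            return -1;

    int code = (sp[1] - '0') * 100 + (sp[2] - '0') * 10 + (sp[3] - '0');

    /* Copy the reason phrase (everything up to the CRLF). */
    size_t rlen = (size_t)(end - (sp + 5));
    if (rlen >= reason_size)
        rlen = reason_size - 1;
    memcpy(reason, sp + 5, rlen);
    reason[rlen] = '\0';
    return code;
}

/* The first digit of the status code defines its class. */
const char *status_class(int code)
{
    switch (code / 100) {
    case 1: return "Informational";
    case 2: return "Success";
    case 3: return "Redirection";
    case 4: return "Client Error";
    case 5: return "Server Error";
    default: return "Unknown";
    }
}
```

This mirrors the division of labour described above. An analyser would branch on the numeric code, and the reason phrase is kept only for display.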
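Table 2.3 lists eight methods, and the text states that GET and HEAD are mandatory. A packet analyser might use this table to recognise the first segment of an HTTP request. It can check whether the TCP payload begins with a known method followed by SP. That heuristic is an assumption for illustration, not necessarily what the project does. `struct http_method` and `match_request_method` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* The eight methods from Table 2.3; `required` marks GET and HEAD,
 * which every general-purpose HTTP server must support. */
struct http_method {
    const char *name;
    int required;
};

static const struct http_method METHODS[] = {
    {"GET", 1},    {"HEAD", 1},  {"PUT", 0},     {"POST", 0},
    {"DELETE", 0}, {"TRACE", 0}, {"OPTIONS", 0}, {"CONNECT", 0},
};

/* Hypothetical heuristic (not from the original project): returns the table
 * entry if `payload` starts with a listed method followed by SP, else NULL.
 * HTTP method names are case-sensitive, so no case folding is done. */
const struct http_method *match_request_method(const unsigned char *payload, size_t len)
{
    for (size_t i = 0; i < sizeof METHODS / sizeof METHODS[0]; i++) {
        size_t n = strlen(METHODS[i].name);
        if (len > n && memcmp(payload, METHODS[i].name, n) == 0 && payload[n] == ' ')
            return &METHODS[i];
    }
    return NULL;
}
```

Servers may define extension methods, so a payload that fails this check is not necessarily non-HTTP. A fuller analyser would probably also consider the TCP port or validate the whole request line.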
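The example client request in Section 2.1 can be reproduced byte for byte. Doing so makes the CRLF line endings and the terminating blank line explicit. `build_get_request` is a hypothetical helper, and its fixed header values are copied from the example. It also rejects SP, CR and LF inside the arguments. That is an added safety check based on the request-line rules, not something the report specifies.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the original project): writes the example
 * request into buf. Each line ends with CRLF and a lone CRLF (the blank
 * line) ends the header section. Returns the byte count, or -1 on error. */
int build_get_request(char *buf, size_t size, const char *host, const char *path)
{
    /* SP separates request-line fields and CR/LF end lines, so none of
     * them may appear inside the substituted values. */
    if (strpbrk(path, " \r\n") != NULL || strpbrk(host, " \r\n") != NULL)
        return -1;

    int n = snprintf(buf, size,
                     "GET %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Connection: close\r\n"
                     "User-agent: Mozilla/5.0\r\n"
                     "Accept-language: fr\r\n"
                     "\r\n",
                     path, host);
    if (n < 0 || (size_t)n >= size)
        return -1; /* encoding error or buffer too small */
    return n;
}
```

Calling it with host `www.someschool.edu` and path `/somedir/page.html` produces the example request. The last four bytes of the output are always CR LF CR LF.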
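The header section ends at the first empty line, so a captured message can be split into headers and body by searching for CRLF CRLF. The Host value can then be read from the header lines. Separating headers from body is presumably also needed when a page is reverted from captured data. This sketch assumes the whole header section is in one contiguous buffer, meaning any TCP segments have already been reassembled. `find_header_end` and `get_header_value` are hypothetical names. Field names are compared case-insensitively, as HTTP requires, so `User-agent` in the example still matches `User-Agent`.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the offset just past the blank line
 * (CRLF CRLF) that ends the header section, or -1 if not yet captured. */
long find_header_end(const char *msg, size_t len)
{
    for (size_t i = 0; i + 3 < len; i++)
        if (memcmp(msg + i, "\r\n\r\n", 4) == 0)
            return (long)(i + 4);
    return -1;
}

/* Hypothetical helper: copies the value of header field `name` (matched
 * case-insensitively) into out, skipping leading spaces/tabs and truncating
 * to out_size - 1 bytes (out_size must be at least 1).
 * Returns 1 if found, 0 otherwise. Assumes reassembled data. */
int get_header_value(const char *msg, size_t len, const char *name,
                     char *out, size_t out_size)
{
    long hdr_end = find_header_end(msg, len);
    if (hdr_end < 0)
        return 0;
    size_t name_len = strlen(name);
    const char *limit = msg + hdr_end - 2;                 /* stop before the empty line */
    const char *p = memchr(msg, '\n', (size_t)hdr_end);    /* end of the start line */

    while (p != NULL && p + 1 < limit) {
        const char *line = p + 1;
        const char *eol = memchr(line, '\r', (size_t)(limit - line));
        if (eol == NULL)
            break;
        size_t line_len = (size_t)(eol - line);
        if (line_len > name_len && line[name_len] == ':') {
            size_t k = 0;
            while (k < name_len &&
                   tolower((unsigned char)line[k]) == tolower((unsigned char)name[k]))
                k++;
            if (k == name_len) {
                const char *v = line + name_len + 1;
                while (v < eol && (*v == ' ' || *v == '\t'))
                    v++;
                size_t vlen = (size_t)(eol - v);
                if (vlen >= out_size)
                    vlen = out_size - 1;
                memcpy(out, v, vlen);
                out[vlen] = '\0';
                return 1;
            }
        }
        p = eol + 1; /* the '\n' of this line's CRLF */
    }
    return 0;
}
```

The offset returned by `find_header_end` is also where the message body starts in the buffer.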
