Harbin Institute of Technology, Shenzhen Graduate School
Advanced Computer Network Project Report
HTTP protocol analysis and webpage reverting
Report date: December 12, 2017

Content
HTTP protocol analysis and webpage reverting
1. Introduction
1.1. Objectives
1.2. Environment and tools
1.3. Task Distribution
2. Protocol Analysis
2.1. HTTP Analysis
2.2. TCP Analysis
2.3. IP Analysis
3. Capture Packets
3.1. Introduction
3.2. Process
4. Analyse Packets
4.1. Introduction
4.2. The process of analysis
4.3. The method of analysis
5. Revert Webpage
5.1. Principle
5.2. Process
Main Code
5.3. Result

1. Introduction

1.1. Objectives

HTTP is one of the most widely used application protocols on the Internet. The task is to write a program that captures incoming packets, analyzes the HTTP protocol, and then reverts (reconstructs) a web page selected for testing from the captured data.

1.2. Environment and tools

Windows 10
Code::Blocks + C
WinPcap

2. Protocol Analysis

2.1. HTTP Analysis

HTTP (Hypertext Transfer Protocol) is an application-layer protocol. It specifies the format of the data exchanged between the browser (client) and the server: which messages the client can send to the server, and which responses the server returns. It is the foundation of data communication for the World Wide Web.

Work procedure

An HTTP operation is called a transaction, and its working process can be divided into four steps.

1) The client and the server first establish a connection.
2) After the connection is established, the client sends a request to the server consisting of the request method, the URL, and the protocol version number, followed by MIME-like information that includes request modifiers, client information, and possible body content.
3) After receiving the request, the server replies with a status line that contains the protocol version number of the message and a success or error code, followed by MIME-like information that includes server information, entity information, and possible body content.
4) The client receives the information returned by the server, the browser displays it on the user's screen, and the client then disconnects from the server.

If an error occurs at some point in this process, an error message is returned to the client and displayed. A request may also pass through a proxy server before it reaches the web server.

HTTP message structure

HTTP messages consist of requests from the client to the server and responses from the server to the client.

Table 2.1.1 request message format
  Request line
  General header
  Request header
  Entity header
  Message body

The request line starts with the method field, followed by the URL field and the HTTP protocol version field, and ends with CRLF. The fields are separated by SP (a single space). No CR or LF is allowed except in the final CRLF sequence.

Table 2.1.2 response message format
  Status line
  General header
  Response header
  Entity header
  Message body

The status code consists of 3 digits and indicates whether the request was understood and fulfilled. The reason phrase is a brief textual description of the status code. The status code is intended for automated processing, and the reason phrase is intended for the human user. The client is not required to examine or display the reason phrase.

Request method

HTTP/1.1 defines 8 methods that indicate the desired action to be performed on the identified resource. What this resource represents, whether pre-existing data or data that is generated dynamically, depends on the implementation of the server. Often, the resource corresponds to a file or the output of an executable residing on the server.

Table 2.3 HTTP request method
  Method    Introduction
  GET       Requests to read a web page.
  HEAD      Requests to read only the header of a web page.
  PUT       Requests to store a web page.
  POST      Submits data to the named resource (for example, appends to a web page).
  DELETE    Deletes the web page.
  TRACE     Echoes the received request.
  OPTIONS   Queries the properties of the server or of a particular file.
  CONNECT   Converts the request connection to a transparent TCP/IP tunnel.

An HTTP server should implement at least the GET and HEAD methods; the other methods are optional. Beyond the methods above, a specific HTTP server can also define custom extension methods.

Client request message

  GET /somedir/page.html HTTP/1.1
  Host: www.someschool.edu
  Connection: close
  User-agent: Mozilla/5.0
  Accept-language: fr

The header lines of a client request are followed by a blank line, so the request header section ends with two consecutive line terminators, each consisting of a carriage return followed by a line feed (CRLF CRLF). The Host field distinguishes between the various DNS names that share a single IP address, which allows name-based virtual hosting. While optional in HTTP/1.0, it is mandatory in HTTP/1.1.

Response message

The client sends a request to the server. The server responds with a status line. The response includes the version of the message protocol
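
To make the request format described in this section more concrete, the following is a minimal sketch in C of how a captured request could be taken apart. It is not the report's own code: the names `struct request_line`, `parse_request_line`, `header_length`, and `get_header` are hypothetical helpers introduced only for illustration. The sketch assumes the request text is already available as one NUL-terminated string (TCP reassembly is out of scope here) and, for brevity, matches header names case-sensitively, although HTTP header names are case-insensitive.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the three fields of the request line. */
struct request_line {
    char method[16];
    char url[256];
    char version[16];
};

/* Parse "METHOD SP URL SP VERSION CRLF" at the start of msg.
 * Returns 0 on success, -1 if the line is missing or malformed. */
int parse_request_line(const char *msg, struct request_line *out)
{
    assert(msg != NULL && out != NULL);

    const char *eol = strstr(msg, "\r\n");      /* request line ends with CRLF */
    if (eol == NULL)
        return -1;

    char line[320];
    size_t len = (size_t)(eol - msg);
    if (len >= sizeof line)
        return -1;
    memcpy(line, msg, len);
    line[len] = '\0';

    /* Exactly three whitespace-separated fields are expected;
     * the trailing %c detects unexpected extra fields. */
    char extra;
    if (sscanf(line, "%15s %255s %15s %c",
               out->method, out->url, out->version, &extra) != 3)
        return -1;
    return 0;
}

/* Length of the request line plus headers, including the blank line
 * (CRLF CRLF) that terminates them; -1 if the blank line is absent. */
long header_length(const char *msg)
{
    const char *end = strstr(msg, "\r\n\r\n");
    return end ? (long)(end - msg) + 4 : -1;
}

/* Copy the value of header `name` (e.g. "Host") into buf.
 * Returns 0 if found, -1 otherwise. Case-sensitive for brevity. */
int get_header(const char *msg, const char *name, char *buf, size_t buflen)
{
    size_t nlen = strlen(name);
    const char *p = strstr(msg, "\r\n");        /* skip the request line */

    while (p != NULL) {
        p += 2;                                 /* start of the next line */
        if (strncmp(p, "\r\n", 2) == 0)         /* blank line: headers end */
            break;
        const char *eol = strstr(p, "\r\n");
        if (eol == NULL)
            break;
        if (strncmp(p, name, nlen) == 0 && p[nlen] == ':') {
            const char *v = p + nlen + 1;
            while (v < eol && (*v == ' ' || *v == '\t'))
                v++;                            /* skip leading whitespace */
            size_t vlen = (size_t)(eol - v);
            if (vlen >= buflen)
                return -1;
            memcpy(buf, v, vlen);
            buf[vlen] = '\0';
            return 0;
        }
        p = eol;
    }
    return -1;
}
```

Applied to the client request message from this section (with each line terminated by CRLF and a final blank line), `parse_request_line` would yield the method `GET`, the URL `/somedir/page.html`, and the version `HTTP/1.1`, and `get_header` with the name `Host` would yield `www.someschool.edu`. Returning -1 from `header_length` when the blank line is missing is a deliberate choice: with captured packets, a request may span several TCP segments, so the caller can wait for more data before parsing.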