
Analysis of User Web Traffic with a Focus on Search Activities

Feng Qiu, Zhenyu Liu, Junghoo Cho
University of California, Los Angeles, CA 90095
{fqiu,vicliu,cho}@cs.ucla.edu

ABSTRACT
Although search engines play an increasingly important role in users' Web access, our understanding of the magnitude of search-engine influence is still limited. For example, how often do people start browsing the Web from a search engine? What percentage of Web traffic is incurred as a result of search? To what extent does a search engine like Google extend the scope of Web sites that users can reach?

To study these issues, in this paper we analyze a real Web access trace collected over a period of two and a half months from the UCLA Computer Science Department. Our study indicates that search engines directly and indirectly influence about 13.6% of users' Web traffic. In addition, our study provides realistic estimates for certain key parameters used for Web modeling.

1. INTRODUCTION
Since its arrival in the early 1990s, the World-Wide Web has become an integral part of our daily life. According to recent studies, people access the Web for a variety of reasons and spend increasingly more time surfing it. For example, [1] shows that a typical Internet user spends more than 3 hours per week online and tends to spend progressively less time in front of the TV, partly due to increased "surfing" time.

This research is motivated by our desire to understand how people access information on the Web. Even though the Web has become one of the primary sources of information, our understanding is still limited regarding how the Web is currently used and how much it influences people. In particular, we are interested in the impact of search engines on people's browsing patterns on the Web. According to recent studies [2], search engines play an increasingly important role in users' Web access, and if users heavily rely on search engines to discover and access Web pages, search engines may introduce significant bias into users' perception of the Web [3].

The main goal of this paper is to quantitatively measure the potential influence of search engines and the general access patterns of users by analyzing a real Web access trace generated by users' daily usage. For this purpose, we have collected all HTTP packets originating from the UCLA Computer Science Department
from May 15th, 2004 until July 31st, 2004, and we analyze this trace to answer the following questions:

Search-engine impact: How much of a user's access to the Web is "influenced" by search engines? For example, how many times do people start browsing the Web by going to a search engine and issuing a query? How many times do people start from a "random" Web site? How much do search engines expand the "scope" of Web sites that users visit?

General user behavior: How many different sites do people visit when they surf the Web? How much time do people spend on a single page on average? How many links do people follow before they jump to a "random" page?

The answers to the above questions will provide valuable insights into how users access the Web. Our study will also provide realistic estimates for some of the key parameters used for Web modeling. For example, the number of clicks before a random jump is one of the core parameters of the random-surfer model and PageRank computation [4].

The rest of the paper is organized as follows. In Section 2 we describe the dataset used for our analysis. In Section 3 we report our findings on the influence of search engines on users' Web access. In Section 4 we report our other findings on general user behavior on the Web. Related work is reviewed in Section 5, and Section 6 concludes the paper.

2. DESCRIPTION OF DATASET
In this section we first describe how we collect our HTTP access trace and discuss the necessary cleaning procedures
we apply to it to eliminate "noise."

2.1 HTTP access trace

Figure 1: Network topology of the UCLA CS Department

We have captured all HTTP requests and responses coming to and leaving from the UCLA Computer Science Department over a period of two and a half months. As we show in Figure 1, the CS department has roughly 750 machines connected through a 100 Mbps LAN, which is in turn connected to the Internet through the department router. Since all packets that go to or come from outside machines pass through this router, we can easily capture all HTTP packets by installing a packet recorder at the router. Given the large volume of traffic, we recorded only the relevant HTTP headers in the packets (e.g., Request-URL, Referer, and User-Agent), discarding the actual content.

Statistic                  Value
Collection period          May 15th to July 31st, 2004
# of local IPs             749
# of remote IPs            66,372
# of requests              2,157,887
Size of our trace          50 GB

Table 1: Statistics on our dataset

To help the reader assess the scale of our HTTP trace, we report a few statistics of our dataset in Table 1. In brief, our dataset con
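Section 2.1 states that only the relevant HTTP headers were recorded and the packet content discarded. As an illustration only, the Python sketch below shows what such per-packet filtering could look like; it is not the authors' packet recorder. The function name `extract_relevant_headers`, the `KEPT_HEADERS` selection (which adds `Host`, an assumption, so that an absolute URL can be rebuilt), and the assumption that each captured payload begins with one complete HTTP/1.x request header are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical selection of header fields to keep. The paper names Request-URL,
# Referer and User-Agent as examples; Host is added here (an assumption) so
# that an absolute request URL can be rebuilt from origin-form targets.
KEPT_HEADERS = ("host", "referer", "user-agent")


def extract_relevant_headers(raw_request: bytes) -> Optional[Dict[str, str]]:
    """Keep the request URL and a few header fields; drop the message body.

    Assumes `raw_request` holds the start of one HTTP/1.x request. Returns
    None for anything whose first line is not a request line (e.g. responses).
    """
    # The header section ends at the first empty line (CRLF CRLF); the body
    # after it is discarded without being decoded or stored.
    head, _sep, _body = raw_request.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Request line: METHOD SP request-target SP HTTP-version
    parts = lines[0].split(" ")
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        return None
    method, target, _version = parts

    # Header field names are case-insensitive, so compare in lower case.
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        key = name.strip().lower()
        if sep and key in KEPT_HEADERS:
            fields[key] = value.strip()

    # Origin-form targets ("/path?query") need the Host header to form a URL;
    # absolute-form targets (as sent to proxies) are already complete.
    if target.startswith(("http://", "https://")):
        url = target
    else:
        url = "http://" + fields.get("host", "") + target

    return {
        "method": method,
        "request_url": url,
        "referer": fields.get("referer", ""),
        "user_agent": fields.get("user-agent", ""),
    }


if __name__ == "__main__":
    # Example packet with made-up values, for demonstration only.
    packet = (
        b"GET /search?q=ucla HTTP/1.1\r\n"
        b"Host: www.google.com\r\n"
        b"Referer: http://www.cs.ucla.edu/\r\n"
        b"User-Agent: Mozilla/5.0\r\n"
        b"\r\n"
    )
    print(extract_relevant_headers(packet))
```

Running the module prints a dictionary whose `request_url` is `http://www.google.com/search?q=ucla`. A real recorder would also need TCP stream reassembly, since one request header can span several packets; the sketch deliberately leaves that out to keep the header-only idea visible.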
