Machine Learning-Based Malicious Web Page Detection.pdf

Obfuscated Malicious Javascript Detection using Classification Techniques

Peter Likarish, Eunjin (EJ) Jung
Dept. of Computer Science
The University of Iowa
Iowa City, IA 52242
{plikaris, ejjung}@cs.uiowa.edu

Insoon Jo
Distributed Computing Systems Lab
School of Computer Science and Engineering
Seoul National University
ischo@dcslab.snu.ac.kr

Abstract

As the World Wide Web expands and more users join, it becomes an increasingly attractive means of distributing malware. Malicious JavaScript frequently serves as the initial infection vector for malware. We train several classifiers to detect malicious JavaScript and evaluate their performance. We propose features focused on detecting obfuscation, a common technique to bypass traditional malware detectors. As the classifiers show a high detection rate and a low false alarm rate, we propose several uses for the classifiers, including selectively suppressing potentially malicious JavaScript based on the classifiers' recommendations, achieving a compromise between usability and security.

1. Introduction

Malware distributors on the web have a large number of attack vectors available, including drive-by download sites, fake codec installation requests, malicious advertisements and spam messages on blogs or social network sites. The most common attack methods use malicious JavaScript during part of the attack, including cross-site scripting [20] and web-based malware distribution. JavaScript may be used to redirect a user to a website hosting malicious software, to create a window recommending that users download a fake codec, to detect what software versions the user has installed and select a compatible exploit, or to directly execute an exploit.

Malicious JavaScript often utilizes obfuscation to hide known exploits and prevent rule-based or regular expression (regex)-based anti-malware software from detecting the attack. The complexity of obfuscation techniques has increased, raising the resources necessary to deobfuscate the attacks. For instance, attacks often include references to legitimate companies to disguise their purpose and include context-sensitive information in their obfuscation algorithm. Our detector takes advantage of the ubiquity of this obfuscation. Fig. 1 shows the clear difference between obfuscated JavaScript and a benign script. Even though the difference is easily discernible to the human eye, obfuscation detection is not trivial. We investigate automating the detection of malicious JavaScript using classifiers trained on features present in obfuscated scripts collected from the internet. Of course, some benign JavaScript is also obfuscated, and some malicious JavaScript is not. Our results show that we detect the vast majority of malicious scripts while flagging very few benign scripts as malicious. We further address this in Section 5.1.

In the next section, we discuss prior research on malicious JavaScript detection. Then, we describe the system we used to collect both malicious and benign scripts for training and testing machine learning classifiers. We follow this with a performance evaluation of four classifiers and conclude with recommendations based on our findings as well as details of future work.

2. Related work

JavaScript has become so widespread that nearly all users allow it to execute without question. To protect users, current browsers use sandboxing: limiting the resources JavaScript can access. At a high level, JavaScript exploits occur when malicious code circumvents this sandboxing or utilizes legitimate instructions in an unexpected manner in order to fool users into taking insecure actions. For an overview of JavaScript attacks and defenses, readers are referred to [11].

Figure 1. Example scripts: (a) obfuscated JavaScript; (b) benign JavaScript.

2.1. Disabling JavaScript

NoScript, an extension for Mozilla's Firefox web browser, selectively allows JavaScript [13]. NoScript disables JavaScript, Java, Flash and other plugin content types by default and only allows script execution from a website in a user-managed whitelist. However, many attacks, especially from user-generated content, are hosted at reputable websites and may bypass this whitelist check. For example, Symantec reported that many of the 808,000 unique domains hosting malicious JavaScript were mainstream websites [19].

2.2. Automated deobfuscation of JavaScript

As mentioned in Section 1, obfuscation is a common technique to bypass malware detectors. Several projects aid anti-malware researchers by automating the deobfuscation process. Caffeine Monkey [6] is a customized version of Mozilla's SpiderMonkey [14] designed to automate the analysis of obfuscated malicious JavaScript. Wepawet is an online service to which users can submit JavaScript, Flash or PDF files. Wepawet automatically generates a useful report, checking for known exploits, providing deobfuscation and capturing network acti
