Data Science Interview Question Bank 1-100 (2)

Uploaded by: ja****ee  Document ID: 149211535  Upload date: 2020-10-25  Format: DOC  Pages: 12  Size: 165KB

1. What ML techniques do you work with? Are these research-level or production-level techniques? (Source 1)
2. Tell me about an in-depth example of a project you have worked on from inception to completion. What was the project, how did you approach the problem, and what was the end result? (Source 1)
3. What's your favorite algorithm? (Source 1)
4. What level of experience do you have with programming languages? What do you do with them day to day, and what was your hardest challenge? (Source 1)
5. What is the largest data set that you have processed? How did you approach it, and what was the end result? (Source 1)
6. How would you explain a linear regression to a business executive? (Source 2)
7. What are some alternative models to a linear regression? Why are they better or worse? (Source 2)
8. Write a SQL query to create a table that shows, for each class, the value of the highest grade in the class. (Source 2)
9. Suppose you have the same table as in the previous question, but for each class you want to find the name of the student who got the highest grade. Write a query to do that. (Source 2)
10. In pseudo-code or whatever language you like, write a program that prints the numbers from 1 to 100. For multiples of three, print "Fizz" instead of the number; for multiples of five, print "Buzz"; for numbers that are multiples of both three and five, print "FizzBuzz". (Source 2)
11. A company selling a competitor to Microsoft Office is testing its marketing by sending out two different sets of emails: one set contains business-related content, the other consumer-related content. We are interested in how each campaign performed: did one do better at getting people to click through? Below is a selection of graphs on the two email campaigns. The bottom two graphs have the same data as the top two, only bucketed by the amount the customer spent with the company in the year before the emails were sent. Which campaign did better? (Source 2)
12. Explain what regularization is and why it is useful. (Source 3)
13. Which data scientists do you admire most? Which startups? (Source 3)
14. How would you validate a model you created to generate a predictive model of a quantitative outcome variable using multiple regression? (Source 3)
15. Explain what precision and recall are. How do they relate to the ROC curve? (Source 3)
16. How can you prove that an improvement you've brought to an algorithm is really an improvement over not doing anything? (Source 3)
17. What is root cause analysis? (Source 3)
18. Are you familiar with price optimization, price elasticity, inventory management, and competitive intelligence? Give examples. (Source 3)
19. What is statistical power? (Source 3)
20. Explain what resampling methods are and why they are useful. Also explain their limitations. (Source 3)
21. Is it better to have too many false positives or too many false negatives? Explain. (Source 3)
22. What is selection bias, why is it important, and how can you avoid it? (Source 3)
23. Give an example of how you would use experimental design to answer a question about user behavior. (Source 3)
24. What is the difference between long ("tall") and wide format data? (Source 3)
25. What method do you use to determine whether the statistics published in an article (or in a newspaper or other media) are wrong or presented to support the author's point of view, rather than correct, comprehensive factual information on a specific subject? (Source 3)
26. Explain Edward Tufte's concept of "chart junk." (Source 3)
27. How would you screen for outliers, and what should you do if you find one? (Source 3)
28. How would you use extreme value theory, Monte Carlo simulations, or mathematical statistics (or anything else) to correctly estimate the chance of a very rare event? (Source 3)
29. What is a recommendation engine? How does it work?
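For the FizzBuzz question (question 10), one possible Python answer is sketched below; the function name `fizzbuzz` is illustrative. Testing divisibility by 15 first makes the "multiple of both" case take precedence over the single-divisor cases.

```python
def fizzbuzz(n: int) -> str:
    """Return the FizzBuzz token for one positive integer (illustrative helper)."""
    if n % 15 == 0:  # multiple of both 3 and 5
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


if __name__ == "__main__":
    for i in range(1, 101):
        print(fizzbuzz(i))
```

Separating the per-number rule from the printing loop keeps the rule easy to unit-test.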
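For the two SQL questions (questions 8 and 9), the table they refer to is not reproduced here, so the sketch below assumes a hypothetical table `grades(student, class, grade)` filled with made-up rows and runs the queries through Python's built-in `sqlite3` module. Question 8 is a `GROUP BY` with `MAX`; question 9 joins that result back to the original rows to recover the student names.

```python
import sqlite3

# Hypothetical schema and made-up rows: the question's own table is not shown.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE grades (student TEXT, class TEXT, grade INTEGER);
INSERT INTO grades VALUES
    ('Alice', 'Math', 90), ('Bob', 'Math', 85),
    ('Cara', 'Physics', 78), ('Dan', 'Physics', 92);
""")

# Question 8: one row per class holding its highest grade, saved as a new table.
conn.execute("""
CREATE TABLE class_max AS
SELECT class, MAX(grade) AS max_grade
FROM grades
GROUP BY class
""")

# Question 9: join back to find the student(s) who reached that maximum.
# A tie for the top grade yields more than one row for that class.
top_students = conn.execute("""
SELECT g.class, g.student, g.grade
FROM grades AS g
JOIN class_max AS m
  ON g.class = m.class AND g.grade = m.max_grade
ORDER BY g.class
""").fetchall()

print(conn.execute("SELECT * FROM class_max ORDER BY class").fetchall())
print(top_students)
```

On databases with window functions, question 9 can also be answered with `RANK() OVER (PARTITION BY class ORDER BY grade DESC)` and a filter on rank 1; the join form above works on any SQL dialect that supports subqueries or derived tables.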
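For the email-campaign question (question 11), the graphs are not reproduced here, so the counts below are made up purely to illustrate the analysis. The sketch compares click-through rates (CTR) overall with a pooled two-proportion z-test, and then within prior-spend buckets, because an aggregate comparison can reverse when the two campaigns reached different customer mixes (Simpson's paradox).

```python
import math

# Made-up (clicks, emails_sent) per spend bucket and campaign, for illustration only.
COUNTS = {
    "high prior spend": {"business": (20, 100), "consumer": (180, 1000)},
    "low prior spend": {"business": (50, 1000), "consumer": (4, 100)},
}


def ctr(clicks, sent):
    """Click-through rate."""
    return clicks / sent


def two_proportion_z_test(clicks_a, sent_a, clicks_b, sent_b):
    """Two-sided z-test for a difference in CTR, using the pooled standard error."""
    pooled = (clicks_a + clicks_b) / (sent_a + sent_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    z = (ctr(clicks_a, sent_a) - ctr(clicks_b, sent_b)) / se
    # Standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


def totals(counts, campaign):
    """Sum clicks and sends for one campaign across all buckets."""
    clicks = sum(bucket[campaign][0] for bucket in counts.values())
    sent = sum(bucket[campaign][1] for bucket in counts.values())
    return clicks, sent


for name, bucket in COUNTS.items():
    print(name, {c: round(ctr(*v), 3) for c, v in bucket.items()})

biz, con = totals(COUNTS, "business"), totals(COUNTS, "consumer")
print("overall", round(ctr(*biz), 3), round(ctr(*con), 3))
print("z, p =", two_proportion_z_test(*biz, *con))
```

With these invented numbers the business email has the higher CTR inside every bucket, yet the consumer email wins overall, because most business emails went to low-spend customers who click less. This is why the bucketed graphs matter when deciding which campaign did better.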
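For the regularization question (question 12): regularization adds a penalty on coefficient size to the training loss, trading a little bias for lower variance and less overfitting. A minimal sketch of L2 (ridge) regularization, assuming a single feature and no intercept so that the estimate has a one-line closed form, shows the coefficient shrinking toward zero as the penalty weight `lam` grows. An L1 penalty (lasso) would instead push some coefficients exactly to zero.

```python
def ridge_slope(xs, ys, lam):
    """Ridge estimate for the no-intercept model y ~ beta * x.

    Minimizes sum((y - beta*x)**2) + lam * beta**2; setting the derivative to
    zero gives beta = sum(x*y) / (sum(x*x) + lam). lam = 0 is ordinary least squares.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]  # made-up data, roughly y = 2x
for lam in (0.0, 1.0, 10.0, 100.0):
    print(lam, round(ridge_slope(xs, ys, lam), 3))
```

In practice `lam` is chosen by cross-validation, and features are standardized first so that the penalty treats them comparably.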
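For the model-validation question (question 14), one standard answer is k-fold cross-validation: every observation is scored by a model that was fit without it. The sketch below is model-agnostic in spirit but, to stay dependency-free, uses a single-predictor least-squares fit as a stand-in for the multiple regression in the question (an assumption). The data are synthetic with a known noise level, so the cross-validated RMSE should land near that level.

```python
import random


def fit_simple_ols(xs, ys):
    """Least-squares (intercept, slope) for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def k_fold_rmse(xs, ys, k=5, seed=0):
    """Mean out-of-fold RMSE over k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_simple_ols([xs[i] for i in train], [ys[i] for i in train])
        sq_err = [(ys[i] - (a + b * xs[i])) ** 2 for i in fold]
        scores.append((sum(sq_err) / len(sq_err)) ** 0.5)
    return sum(scores) / k


rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [3.0 + 2.0 * x + rng.gauss(0, 1.0) for x in xs]  # synthetic: noise sd = 1
print(round(k_fold_rmse(xs, ys), 3))
```

For a real multiple regression, the same loop applies with a multivariate fit, and it is usually paired with residual diagnostics (plots against fitted values and predictors) to check the linearity and constant-variance assumptions.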
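For the precision/recall question (question 15): precision is TP / (TP + FP), recall is TP / (TP + FN), and recall is the same quantity as the true positive rate plotted on the ROC curve's y-axis, against the false positive rate FP / (FP + TN). The sketch below uses made-up labels and scores and sweeps the decision threshold, which traces out both the ROC curve and the precision-recall curve.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, FN, TN when predicting positive for score >= threshold."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_fpr(y_true, scores, threshold):
    tp, fp, fn, tn = confusion_counts(y_true, scores, threshold)
    # Precision is undefined when nothing is predicted positive.
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn)  # = true positive rate, the ROC y-axis
    fpr = fp / (fp + tn)     # false positive rate, the ROC x-axis
    return precision, recall, fpr


# Made-up labels (1 = positive) and model scores.
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]

# ROC uses (fpr, recall) pairs; the precision-recall curve uses (recall, precision).
for t in (0.95, 0.75, 0.5, 0.0):
    print(t, precision_recall_fpr(y_true, scores, t))
```

Because the ROC curve ignores precision, it can look good on heavily imbalanced data where precision is poor; the precision-recall curve is often more informative in that case.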
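For the question on proving an improvement (question 16), one approach is to run the old and new algorithm on randomly split traffic and test whether the observed difference could be chance. The sketch below is a one-sided permutation test on made-up per-user outcomes; the function name and data are illustrative, not a prescribed method.

```python
import random


def permutation_test(baseline, treatment, n_perm=5000, seed=0):
    """One-sided permutation test for mean(treatment) > mean(baseline).

    Under the null hypothesis of no improvement the group labels are
    exchangeable, so shuffling them shows how big a difference chance produces.
    """
    rng = random.Random(seed)
    n_t, n_b = len(treatment), len(baseline)
    observed = sum(treatment) / n_t - sum(baseline) / n_b
    pooled = list(baseline) + list(treatment)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_t]) / n_t - sum(pooled[n_t:]) / n_b
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Made-up per-user outcomes (1 = converted) under the old and improved algorithm.
baseline = [1] * 20 + [0] * 80
treatment = [1] * 35 + [0] * 65
print(permutation_test(baseline, treatment))
```

The test only says the difference is unlikely under "no effect"; the effect size, its confidence interval, and whether the split was truly random still need to be checked.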
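For the statistical-power question (question 19): power is the probability that a test rejects the null hypothesis when a specific alternative is true, that is, 1 minus the type II error rate. A minimal sketch for a two-sided, two-sample z-test, assuming a known common standard deviation, computes it from the standard normal distribution in Python's `statistics` module.

```python
from statistics import NormalDist


def two_sample_z_power(delta, sigma, n_per_group, alpha=0.05):
    """Power of a two-sided, two-sample z-test when the true mean difference is delta.

    Assumes both groups share a known standard deviation sigma.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Mean of the z statistic under the alternative hypothesis.
    shift = abs(delta) / (sigma * (2 / n_per_group) ** 0.5)
    # Probability of landing in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


for n in (16, 32, 64, 128):
    print(n, round(two_sample_z_power(delta=0.5, sigma=1.0, n_per_group=n), 3))
```

With an effect of half a standard deviation, about 64 observations per group give roughly 80% power; when delta is 0 the formula returns alpha, the false-rejection rate.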
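For the resampling question (question 20), the bootstrap is a common example: resample the data with replacement many times and recompute the statistic to approximate its sampling distribution, without a closed-form formula. The sketch below builds a percentile confidence interval for the median of a made-up sample.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.median, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for a statistic of a sample.

    Each replicate draws len(data) points with replacement and recomputes
    the statistic; the interval is read off the sorted replicates.
    """
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lo = reps[int((1 - level) / 2 * n_boot)]
    hi = reps[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi


sample = [12, 15, 9, 20, 14, 11, 18, 13, 16, 10, 22, 17]  # made-up data
print(statistics.median(sample), bootstrap_ci(sample))
```

Limitations show up directly here: resamples can only contain values already in the sample, so the method relies on the sample being representative and roughly independent, and it performs poorly for small samples and for extreme statistics such as the maximum.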
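For the long-versus-wide question (question 24): wide data has one row per entity and one column per measurement, while long ("tall") data has one row per entity-measurement pair. The dependency-free sketch below converts between the two on made-up data; the column names `q1` and `q2` are hypothetical. In pandas the same reshapes are usually done with `DataFrame.melt` and `DataFrame.pivot`.

```python
# Wide: one row per entity, one column per measurement (made-up values).
wide = [
    {"id": "A", "q1": 10, "q2": 12},
    {"id": "B", "q1": 7, "q2": 9},
]


def wide_to_long(rows, id_col="id"):
    """Melt: emit one (id, variable, value) triple per measurement."""
    return [
        (row[id_col], var, val)
        for row in rows
        for var, val in row.items()
        if var != id_col
    ]


def long_to_wide(triples, id_col="id"):
    """Pivot: regroup triples into one dict per id, in first-seen order."""
    out = {}
    for ident, var, val in triples:
        out.setdefault(ident, {id_col: ident})[var] = val
    return list(out.values())


long_rows = wide_to_long(wide)
print(long_rows)
print(long_to_wide(long_rows))
```

Long format suits grouping, plotting libraries, and adding new variables without schema changes; wide format is easier to read and is what many modeling functions expect as a feature matrix.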
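For the outlier question (question 27), a simple screen is Tukey's rule: flag points more than 1.5 interquartile ranges below the first quartile or above the third. The sketch below uses `statistics.quantiles` on a made-up sample with one obvious outlier.

```python
import statistics


def iqr_outliers(data, k=1.5):
    """Return points outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lo or x > hi]


data = [10, 12, 11, 13, 12, 11, 10, 14, 12, 95]  # made-up values
print(iqr_outliers(data))
```

Flagging is only the first step: a flagged point should be investigated (data-entry error, unit mismatch, or a genuine extreme) before it is corrected, removed, or handled with robust methods such as medians or robust regression.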
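For the rare-event question (question 28), plain Monte Carlo wastes almost every draw when the event is rare. The sketch below estimates P(Z > 4) for a standard normal Z, about 3.17e-5, by plain Monte Carlo and by importance sampling from a proposal shifted into the tail; the choice of a shifted normal proposal is an illustrative assumption.

```python
import math
import random
from statistics import NormalDist


def naive_mc(t, n, rng):
    """Plain Monte Carlo estimate of P(Z > t) for Z ~ N(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) > t for _ in range(n)) / n


def importance_mc(t, n, rng):
    """Importance-sampling estimate of P(Z > t).

    Draws come from the shifted proposal N(t, 1), so about half land in the
    tail; each hit is reweighted by phi(x) / phi(x - t) = exp(-t*x + t*t/2).
    """
    total = 0.0
    for _ in range(n):
        x = rng.gauss(t, 1.0)
        if x > t:
            total += math.exp(-t * x + t * t / 2)
    return total / n


t, n = 4.0, 50_000
exact = NormalDist().cdf(-t)
rng = random.Random(1)
print("exact     ", exact)
print("naive     ", naive_mc(t, n, rng))
print("importance", importance_mc(t, n, rng))
```

With 50,000 draws the naive estimator expects fewer than two hits, so its answer is dominated by noise, while the reweighted estimator lands close to the exact value. Extreme value theory takes a different route, fitting a tail model (for example a generalized Pareto distribution to exceedances over a high threshold) when no closed-form distribution is available.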
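For the recommendation-engine question (question 29), one common way such engines work is item-based collaborative filtering: items are similar if the same users rate them similarly, and an unseen item is scored by the user's ratings of similar items. The sketch below uses made-up ratings and cosine similarity over full item vectors (one of several variants); content-based filtering and matrix factorization are other widely used approaches.

```python
import math

# Made-up user -> {item: rating} data.
RATINGS = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "D": 1},
    "u3": {"B": 1, "C": 5, "D": 4},
    "u4": {"A": 5, "C": 1},
}


def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating}."""
    items = {}
    for user, prefs in ratings.items():
        for item, r in prefs.items():
            items.setdefault(item, {})[user] = r
    return items


def cosine(u, v):
    """Cosine similarity of two sparse rating vectors (0.0 if no overlap)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[k] * v[k] for k in common)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)


def recommend(user, ratings, top_n=2):
    """Rank unseen items by similarity-weighted averages of the user's ratings."""
    items = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for cand in items:
        if cand in seen:
            continue
        sims = [(cosine(items[cand], items[j]), r) for j, r in seen.items()]
        denom = sum(abs(s) for s, _ in sims)
        if denom:
            scores[cand] = sum(s * r for s, r in sims) / denom
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("u4", RATINGS))
```

Here `u4` loves A and dislikes C; B is rated like A by other users and D like C, so B ranks first.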
