Data Scientist
Course Introduction
Learn what data scientists do, the problems they solve, and the methods they apply to real-world problems in order to extract business value from data across different industries. Implement an automated recommender system.
How to identify potential business use cases where data science can provide impactful results
How to obtain, clean and combine disparate data sources to create a coherent picture for analysis
What statistical methods to leverage for data exploration that will provide critical insight into your data
Where and when to leverage Hadoop streaming and Apache Spark for data science pipelines
What machine learning technique to use for a particular data science project
How to implement and manage recommenders using Spark’s MLlib, and how to set up and evaluate data experiments
What pitfalls to expect when deploying new analytics projects to production at scale
Course Objectives
• Learn what data scientists do, the problems they solve, and the methods they apply to real-world problems in order to extract business value from data across different industries. Implement an automated recommender system.
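Target Audience
• Intended for engineers, data analysts, and statisticians with basic Hadoop knowledge (HDFS, MapReduce, Hadoop Streaming, Hive). Participants should be proficient in a scripting language: Python is preferred; familiarity with Perl or Ruby is also acceptable.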