Tutorial Series Table of Contents
Spark Introduction and Core Concepts
Understand Spark's basic concepts, history, core components, and architecture design, and master Spark's basic working principles.
Spark Environment Setup and Deployment
A detailed introduction to Spark installation methods, including deployment in single-machine (local) mode, on a Standalone cluster, on YARN, and on Kubernetes.
Spark RDD Programming Model
Deep dive into RDD (Resilient Distributed Dataset) concepts, creation methods, transformations, and actions, and master RDD programming.
Spark SQL and DataFrame
Learn Spark SQL and the DataFrame/Dataset API, and master structured data processing, SQL queries, and data source integration.
Spark Streaming Real-Time Processing
Understand how Spark Streaming works, master the DStream API and Structured Streaming, and implement real-time data processing.
Spark MLlib Machine Learning Basics
Learn MLlib's basic concepts and common algorithms, including machine learning tasks such as classification, regression, clustering, and collaborative filtering.
Spark MLlib Advanced Features
Deep dive into MLlib's advanced features, including custom Transformers, cross-validation, grid search, and model deployment.
Spark GraphX Graph Computing
Understand GraphX's basic concepts and graph computation model, and master graph creation, graph operations, and the implementation of common graph algorithms.
Spark Performance Optimization and Tuning
Learn Spark performance optimization strategies, including resource management, data skew handling, serialization optimization, and caching strategies.
Spark Practical Cases and Best Practices
Learn Spark best practices through practical cases, including data processing pipelines, machine learning projects, and real-time stream processing applications.