Course: Developing Spark Applications with Scala and Cloudera

At the core of working with large datasets is a thorough knowledge of big data platforms such as Apache Spark and Hadoop. In this course, you will learn the technical details of how Spark works, explore the RDD API, use Spark SQL and DataFrames, work with Spark's typed API (Datasets), and more.

Syllabus:

  • Course introduction
  • Why Spark with Scala and Cloudera?
  • Why Apache Spark?
  • History of Spark
  • Installing Java 8 (JDK 1.8)
  • Preparing big data
  • Scala fundamentals
  • Expressions, functions, methods, and classes
  • Functional programming
  • Understanding Spark
  • Spark libraries
  • Spark architecture
  • Storage in Spark and supported data formats
  • Performance optimization: Tungsten and Catalyst
  • SparkContext and SparkSession
  • RDD and PairRDD
  • Saving data as ObjectFile, NewAPIHadoopFile, SequenceFile, ...
  • Combining, aggregating, reducing, and grouping on PairRDDs
  • ReduceByKey vs. GroupByKey
  • Loading DataFrames: text and CSV
  • Working with columns
  • Spark SQL
  • Catalog API
  • What is a Dataset?
  • Creating Datasets
  • And more
Developing Spark Applications Using Scala & Cloudera
Publisher: Pluralsight
Author: Xavier Morera
Duration: 5h 42m
Level: Beginner

Apache Spark is one of the fastest and most efficient general engines for large-scale data processing. In this course, you'll learn how to develop Spark applications for your Big Data using Scala and a stable Hadoop distribution, Cloudera CDH.
At the core of working with large-scale datasets is a thorough knowledge of Big Data platforms like Apache Spark and Hadoop. In this course, Developing Spark Applications Using Scala & Cloudera, you’ll learn how to process data at scales you previously thought were out of your reach. First, you’ll learn all the technical details of how Spark works. Next, you’ll explore the RDD API, the original core abstraction of Spark. Then, you’ll discover how to become more proficient using Spark SQL and DataFrames. Finally, you'll learn to work with Spark's typed API: Datasets. When you’re finished with this course, you’ll have a foundational knowledge of Apache Spark with Scala and Cloudera that will help you as you move forward to develop large-scale data applications that enable you to work with Big Data in an efficient and performant way.

Course Overview (2m 0s)
  • Course Overview (2m 0s)

Why Spark with Scala and Cloudera? (12m 50s)
  • Why Spark with Scala and Cloudera? (1m 8s)
  • But Why Apache Spark? (2m 20s)
  • Brief History of Spark (3m 23s)
  • What We Will Cover in This Training (2m 1s)
  • Picking a Spark Supported Language: Scala, Python, Java, or R (1m 11s)
  • What Do You Need for This Course? (1m 38s)
  • Takeaway (1m 6s)

Getting an Environment and Data: CDH + StackOverflow (34m 50s)
  • Getting an Environment & Data: CDH + StackOverflow (2m 12s)
  • Prerequisites & Known Issues (2m 20s)
  • Upgrading Cloudera Manager and CDH (5m 31s)
  • Installing or Upgrading to Java 8 (JDK 1.8) (4m 21s)
  • Getting Spark - There Are Several Options: 1.6 (2m 35s)
  • Getting Spark 2 Standalone (3m 14s)
  • Installing Spark 2 on Cloudera (5m 56s)
  • Data: StackOverflow & StackExchange Dumps + Demo Files (2m 58s)
  • Preparing Your Big Data (3m 56s)
  • Takeaway (1m 43s)

Refreshing Your Knowledge: Scala Fundamentals for This Course (24m 54s)
  • Refreshing Your Knowledge: Scala Fundamentals for This Course (1m 23s)
  • Scala's History and Overview (2m 21s)
  • Building and Running Scala Applications (1m 26s)
  • Creating Self-contained Applications, Including scalac & sbt (4m 46s)
  • The Scala Shell: REPL (Read Evaluate Print Loop) (1m 11s)
  • Scala, the Language (4m 29s)
  • More on Types, Functions, and Operations (1m 46s)
  • Expressions, Functions, and Methods (1m 27s)
  • Classes, Case Classes, and Traits (1m 0s)
  • Flow Control (1m 13s)
  • Functional Programming (1m 12s)
  • Enter spark2-shell: Spark in the Scala Shell (0m 32s)
  • Takeaway (2m 0s)
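
The Scala fundamentals module covers expressions, methods, function values, case classes, traits, flow control, and functional programming. The small, self-contained sketch below touches each of these using only the Scala standard library. The names `Shape`, `Circle`, `Rect`, and `ScalaBasics` are hypothetical and are not taken from the course.

```scala
// A sealed trait declares a common interface; case classes get equals,
// hashCode, toString, and pattern-matching support automatically.
sealed trait Shape { def area: Double }
final case class Circle(r: Double) extends Shape { def area: Double = math.Pi * r * r }
final case class Rect(w: Double, h: Double) extends Shape { def area: Double = w * h }

object ScalaBasics {
  // A method whose body is a match expression; `if` is also an expression.
  def describe(s: Shape): String = s match {
    case Circle(r) if r <= 0 => "degenerate circle"
    case Circle(_)           => "circle"
    case Rect(w, h)          => if (w == h) "square" else "rectangle"
  }

  // A function value (anonymous function / lambda) stored in a val.
  val doubleArea: Shape => Double = s => 2 * s.area

  // Higher-order functions: filter and map take functions as arguments.
  def totalArea(shapes: Seq[Shape]): Double =
    shapes.filter(_.area > 0).map(_.area).sum

  def main(args: Array[String]): Unit = {
    val shapes: List[Shape] = List(Rect(2, 2), Rect(2, 3), Circle(1))
    shapes.foreach(s => println(s"$s -> ${describe(s)}"))
    println(totalArea(shapes))
  }
}
```

The same lambda style, such as `_.area > 0`, is what Spark's `map` and `filter` transformations accept.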

Understanding Spark: An Overview (27m 35s)
  • Understanding Spark: An Overview (2m 35s)
  • Spark, Word Count, Operations, and Transformations (2m 14s)
  • A Few Words on Fine Grained Transformations and Scalability (1m 36s)
  • Word Count in "Not Big Data" (2m 13s)
  • How Word Count Works, Featuring Coarse Grained Transformations (3m 56s)
  • Parallelism by Partitioning Data (2m 45s)
  • Pipelining: One of the Secrets of Spark's Performance (1m 38s)
  • Narrow and Wide Transformations (3m 42s)
  • Lazy Execution, Lineage, Directed Acyclic Graph (DAG), and Fault Tolerance (4m 5s)
  • Time for the Big Picture: Spark Libraries (1m 31s)
  • Takeaway (1m 15s)
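
The overview module contrasts a "not big data" word count with Spark's coarse-grained transformations, pipelining, and lazy execution. The sketch below is plain Scala on a local collection, not Spark. It follows the usual shape of a Spark word count (`flatMap`, then `map` to `(word, 1)` pairs, then a per-key sum). It is built on an `Iterator`, so each line is pulled through all stages one at a time, and nothing runs until the terminal `foldLeft`. `WordCountLocal` is a hypothetical name.

```scala
object WordCountLocal {
  // The flatMap/filter/map steps are lazy on an Iterator: no line is split
  // until foldLeft (playing the role of an action) consumes the pipeline.
  def wordCount(lines: Iterator[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\W+").iterator) // one line -> many words
      .filter(_.nonEmpty)                             // drop empty tokens left by split
      .map(word => (word, 1))                         // word -> (word, 1) pair
      .foldLeft(Map.empty[String, Int]) {             // sum the counts per key
        case (acc, (word, n)) => acc.updated(word, acc.getOrElse(word, 0) + n)
      }

  def main(args: Array[String]): Unit = {
    val text = List("To be, or not to be", "that is the question")
    wordCount(text.iterator).toSeq.sortBy(-_._2).foreach(println)
  }
}
```

In Spark, the per-key sum would typically be `reduceByKey(_ + _)` on a pair RDD. Because that step needs all counts for a word together, it is a wide transformation and introduces a shuffle.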

Getting Technical with Spark (45m 37s)
  • Getting Technical: Spark Architecture (3m 3s)
  • Storage in Spark and Supported Data Formats (3m 25s)
  • Let's Talk APIs: Low Level and High Level Spark APIs (4m 31s)
  • Performance Optimizations: Tungsten and Catalyst (3m 3s)
  • SparkContext and SparkSession: Entry Points to Spark Apps (3m 44s)
  • Spark Configuration + Client and Cluster Deployment Modes (5m 33s)
  • Spark on YARN: The Cluster Manager (2m 30s)
  • Spark with Cloudera Manager and YARN UI (4m 13s)
  • Visualizing Your Spark App: Web UI and History Server (7m 39s)
  • Logging in with Spark and Cloudera (2m 23s)
  • Navigating the Spark and Cloudera Documentation (4m 14s)
  • Takeaway (1m 15s)

Learning the Core of Spark: RDDs (42m 48s)
  • Learning the Core of Spark: RDDs (2m 5s)
  • SparkContext: The Entry Point to a Spark Application (3m 30s)
  • RDD and PairRDD - Resilient Distributed Datasets (3m 42s)
  • Creating RDDs with Parallelize (4m 4s)
  • Returning Data to the Driver, i.e. collect(), take(), first()... (4m 10s)
  • Partitions, Repartition, Coalesce, Saving as Text, and HUE (3m 24s)
  • Creating RDDs from External Datasets (10m 9s)
  • Saving Data as ObjectFile, NewAPIHadoopFile, SequenceFile, ... (5m 30s)
  • Creating RDDs with Transformations (2m 49s)
  • A Little Bit More on Lineage and Dependencies (1m 10s)
  • Takeaway (2m 10s)
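
In Spark, `sc.parallelize(data, numSlices)` spreads a local collection across partitions, `collect()` returns every element to the driver, and `take(k)` returns only the first `k`. The plain-Scala sketch below models those ideas on one machine, with no Spark dependency. The helper names (`slice`, `collect`, `take`, `coalesce`) are hypothetical stand-ins. `slice` cuts the data into contiguous, roughly equal ranges; treat it as an approximation, not Spark's exact implementation.

```scala
object PartitionSketch {
  // Split `data` into `n` contiguous slices of (almost) equal size,
  // modelling how a local collection is spread across partitions.
  def slice[T](data: Vector[T], n: Int): Vector[Vector[T]] = {
    require(n > 0, "number of partitions must be positive")
    val len = data.length
    Vector.tabulate(n) { i =>
      val start = (i.toLong * len / n).toInt
      val end   = ((i + 1).toLong * len / n).toInt
      data.slice(start, end)
    }
  }

  // Like collect(): concatenate all partitions, in order, into one collection.
  def collect[T](parts: Vector[Vector[T]]): Vector[T] = parts.flatten

  // Like take(k): read partitions in order, stopping once k elements are found.
  def take[T](parts: Vector[Vector[T]], k: Int): Vector[T] =
    parts.iterator.flatMap(_.iterator).take(k).toVector

  // Like coalesce(n) without a shuffle: this sketch simply merges
  // neighbouring partitions, so element order is preserved.
  def coalesce[T](parts: Vector[Vector[T]], n: Int): Vector[Vector[T]] =
    slice(parts, n).map(_.flatten)

  def main(args: Array[String]): Unit = {
    val parts = slice((1 to 10).toVector, 3)
    println(parts)
    println(collect(parts))
    println(take(parts, 4))
    println(coalesce(parts, 2))
  }
}
```

On a real cluster, `collect()` on a large RDD can exhaust driver memory. This is why `take` and `first` are often preferred for inspecting data.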

Going Deeper into Spark Core (47m 59s)
  • Going Deeper into Spark Core (0m 35s)
  • Functional Programming: Anonymous Functions (Lambda) in Spark (1m 39s)
  • A Quick Look at Map, FlatMap, Filter, and Sort (5m 16s)
  • How Can I Tell It Is a Transformation? (1m 13s)
  • Why Do We Need Actions? (1m 17s)
  • Partition Operations: MapPartitions and PartitionBy (6m 7s)
  • Sampling Your Data (2m 17s)
  • Set Operations: Join, Union, Full Right, Left Outer, and Cartesian (4m 50s)
  • Combining, Aggregating, Reducing, and Grouping on PairRDDs (8m 31s)
  • ReduceByKey vs. GroupByKey: Which One Is Better? (1m 10s)
  • Grouping Data into Buckets with Histogram (2m 33s)
  • Caching and Data Persistence (2m 8s)
  • Shared Variables: Accumulators and Broadcast (4m 57s)
  • What's Needed for Developing Self-contained Spark Applications (1m 57s)
  • Disadvantages of RDDs - So What's Better? (1m 10s)
  • Takeaway (2m 10s)
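
The ReduceByKey vs. GroupByKey comparison comes down to how much data crosses the shuffle. `groupByKey` sends every `(key, value)` pair to the node that owns the key. `reduceByKey` first combines values within each partition (a map-side combine) and shuffles only one partial result per key per partition. The plain-Scala sketch below models that difference on local "partitions" and counts the records that would be moved. It does not use Spark, and the object and method names are hypothetical.

```scala
object ShuffleSketch {
  type Partition = Seq[(String, Int)]

  // groupByKey-style: every pair crosses the "shuffle"; values are reduced
  // only after all of them have been grouped by key.
  def groupThenReduce(parts: Seq[Partition])(f: (Int, Int) => Int): (Map[String, Int], Int) = {
    val shuffled = parts.flatten
    val result = shuffled.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }
    (result, shuffled.size)
  }

  // reduceByKey-style: reduce per key inside each partition first,
  // then shuffle and merge only those partial results.
  def combineThenReduce(parts: Seq[Partition])(f: (Int, Int) => Int): (Map[String, Int], Int) = {
    val partials: Seq[(String, Int)] = parts.flatMap { p =>
      p.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }
    }
    val result = partials.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }
    (result, partials.size)
  }

  def main(args: Array[String]): Unit = {
    val parts = Seq(Seq("a" -> 1, "b" -> 1, "a" -> 1), Seq("a" -> 1, "a" -> 1, "b" -> 1))
    val (g, gMoved) = groupThenReduce(parts)(_ + _)
    val (r, rMoved) = combineThenReduce(parts)(_ + _)
    println(s"group-then-reduce: $g, records shuffled: $gMoved")
    println(s"combine-then-reduce: $r, records shuffled: $rMoved")
  }
}
```

Both approaches give the same totals, but the combining version moves fewer records. The combine step is only valid when `f` is associative and commutative, which is the same requirement Spark places on the function passed to `reduceByKey`.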

Increasing Proficiency with Spark: DataFrames and Spark SQL (37m 28s)
  • Increasing Proficiency with Spark: DataFrames & Spark SQL (0m 50s)
  • "Everyone" Uses SQL and How It All Began (2m 34s)
  • Hello DataFrames and Spark SQL (2m 57s)
  • SparkSession: The Entry Point to the Spark SQL / DataFrame API (1m 51s)
  • Creating DataFrames (2m 5s)
  • DataFrames to RDDs and Vice Versa (3m 8s)
  • Loading DataFrames: Text and CSV (2m 21s)
  • Schemas: Inferred and Programmatically Specified + Option (4m 44s)
  • More Data Loading: Parquet and JSON (4m 4s)
  • Rows, Columns, Expressions, and Operators (1m 33s)
  • Working with Columns (2m 12s)
  • More Columns, Expressions, Cloning, Renaming, Casting, & Dropping (4m 9s)
  • User Defined Functions (UDFs) on Spark SQL (2m 53s)
  • Takeaway (2m 1s)

Continuing the Journey on DataFrames and Spark SQL (35m 50s)
  • Querying, Sorting, and Filtering DataFrames: The DSL (5m 29s)
  • What to Do with Missing or Corrupt Data (4m 19s)
  • Saving DataFrames (5m 22s)
  • Spark SQL: Querying Using Temporary Views (3m 34s)
  • Loading Files and Views into DataFrames Using Spark SQL (1m 57s)
  • Saving to Persistent Tables + Spark 2 Known Issue (1m 57s)
  • Hive Support and External Databases (4m 56s)
  • Aggregating, Grouping, and Joining (5m 18s)
  • The Catalog API (1m 21s)
  • Takeaway (1m 33s)
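
For missing or corrupt data, Spark offers a reader `mode` option (for example `DROPMALFORMED` for CSV and JSON sources) and the `df.na.drop()` / `df.na.fill(...)` functions on DataFrames. The plain-Scala sketch below only imitates those three policies on comma-separated lines, without Spark. The record type `Post`, its fields, and the `id,score,tags` line format are hypothetical illustrations, loosely inspired by the StackOverflow data used in the course.

```scala
import scala.util.Try

// Hypothetical record type; missing values are modelled with Option.
final case class Post(id: Int, score: Option[Int], tags: Option[String])

object MissingDataSketch {
  // Parse "id,score,tags". A line with the wrong number of fields or a
  // non-numeric id is malformed (None); an empty or non-integer score and
  // an empty tags field become missing values (None) inside the Post.
  def parse(line: String): Option[Post] = {
    val fields = line.split(",", -1).map(_.trim)
    if (fields.length != 3) None
    else Try(fields(0).toInt).toOption.map { id =>
      Post(id, Try(fields(1).toInt).toOption, Some(fields(2)).filter(_.nonEmpty))
    }
  }

  // Similar in spirit to reading with mode DROPMALFORMED.
  def dropMalformed(lines: Seq[String]): Seq[Post] = lines.flatMap(line => parse(line))

  // Similar in spirit to na.drop(): keep only rows with no missing fields.
  def dropMissing(posts: Seq[Post]): Seq[Post] =
    posts.filter(p => p.score.isDefined && p.tags.isDefined)

  // Similar in spirit to na.fill(...): replace missing fields with defaults.
  def fillMissing(posts: Seq[Post], score: Int, tags: String): Seq[Post] =
    posts.map(p => p.copy(score = p.score.orElse(Some(score)), tags = p.tags.orElse(Some(tags))))

  def main(args: Array[String]): Unit = {
    val lines = Seq("1,10,scala", "2,,spark", "3,5,", "x,1,java", "4,1")
    val posts = dropMalformed(lines)
    println(posts)
    println(dropMissing(posts))
    println(fillMissing(posts, 0, "untagged"))
  }
}
```

Which policy fits depends on the data. Dropping rows is simplest but discards information, while filling with defaults keeps every row but can skew aggregates such as averages.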

Working with a Typed API: Datasets (19m 29s)
  • Understanding a Typed API: Datasets (0m 47s)
  • The Motivation Behind Datasets (5m 0s)
  • What's a Dataset? (3m 17s)
  • What Do You Need for Datasets? (1m 14s)
  • Creating Datasets (3m 6s)
  • Dataset Operations (2m 57s)
  • RDDs vs. DataFrames vs. Datasets: A Few Final Thoughts (1m 27s)
  • Takeaway (1m 38s)
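
Much of the motivation behind Datasets is type safety. In Spark 2, a DataFrame is a `Dataset[Row]`, so a column name or type mistake surfaces only when the query is analyzed or run. With a `Dataset` of a case class, the Scala compiler checks field names and types. The plain-Scala sketch below mimics that contrast without Spark: a `Map[String, Any]` stands in for an untyped row, and the hypothetical case class `User` stands in for a typed record.

```scala
import scala.util.Try

// Typed record: field names and types are known to the compiler.
final case class User(name: String, age: Int)

object TypedVsUntyped {
  // Stand-in for an untyped row: values are looked up by name as Any.
  type UntypedRow = Map[String, Any]

  // Untyped access: a misspelled key or a value of the wrong type is only
  // discovered at runtime (captured here as a Failure).
  def ageUntyped(row: UntypedRow): Try[Int] =
    Try(row("age").asInstanceOf[Int])

  // Typed access: a misspelled field such as `_.agee` would not compile.
  def adults(users: Seq[User]): Seq[String] =
    users.filter(_.age >= 18).map(_.name)

  def main(args: Array[String]): Unit = {
    println(ageUntyped(Map("name" -> "Ann", "age" -> 42)))
    println(ageUntyped(Map("name" -> "Ann", "agee" -> 42)))
    println(adults(Seq(User("Ann", 42), User("Bo", 12))))
  }
}
```

The trade-off mirrors the one discussed in the RDDs vs. DataFrames vs. Datasets comparison. The untyped form is flexible for ad hoc queries, and the typed form catches a whole class of mistakes before the job ever runs.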

Final Takeaway and Continuing the Journey with Spark (11m 22s)
  • Final Takeaway (6m 5s)
  • Continuing the Journey with Spark, Scala, and Cloudera (5m 16s)
