Troubleshooting Apache Spark
This video course introduces quick techniques and solutions for troubleshooting Apache Spark.
The course takes a question-and-answer approach, offering simple solutions to the common problems that Apache Spark developers face.
All the files and exercises for this course are available in the course's GitHub repository.
Quick, simple solutions to common development issues, and debugging techniques for Apache Spark.
Apache Spark has been around for quite some time, but do you really know how to solve the development issues and problems you face with it? This course covers many aspects of Apache Spark, some you may know and some you probably never knew existed. If learning Spark and performing tasks on it takes you a lot of time, you cannot leverage its full capabilities and features, and you hit a roadblock in your development journey. Common problems and bugs keep you from optimizing your development process, and you end up looking for techniques that help you avoid pitfalls and common errors during development. In this course you'll learn to implement practical, proven, and well-researched techniques to improve particular aspects of Apache Spark.
You need to understand the common problems and issues Spark developers face, collate them, and build simple solutions for them. One way to identify common issues is to look at Stack Overflow questions. This troubleshooting course highlights issues developers face at different stages of application development and provides simple, practical solutions to them. Beyond solving specific problems and challenges, it also focuses on discovering new possibilities with Apache Spark. By the end of this course, you will be equipped to solve your Spark problems without any hassle.
All the code and supporting files for this course are available on GitHub at https://github.com/PacktPublishing/Troubleshooting-Apache-Spark
Style and Approach
This course takes a question-and-answer approach, identifying key problems faced by Apache Spark developers and providing straightforward solutions.
Released: Wednesday, November 28, 2018
Common Problems and Troubleshooting the Spark Distributed Engine
The Course Overview
Eager Computations: Lazy Evaluation
Caching Values: In-Memory Persistence
Unexpected API Behavior: Picking the Proper RDD API
Wide Dependencies: Using Narrow Dependencies
Distributed DataFrames Optimization Pitfalls
Making Computations Parallel: Using Partitions
Defining Robust Custom Functions: Understanding User-Defined Functions
Logical Plans Hiding the Truth: Examining the Physical Plans
Slow Interpreted Lambdas: Code Generation Spark Optimization
Distributed Joins in Cluster
Avoid Wrong Join Strategies: Using a Join Type Based on Data Volume
Slow Joins: Choosing an Execution Plan for Join
Distributed Joins Problem: DataFrame API
TypeSafe Joins Problem: The Newest DataSet API
Solving Problems with Non-Efficient Transformations
Minimizing Object Creation: Reusing Existing Objects
Iterating Transformations – The mapPartitions() Method
Slow Spark Application Start: Reducing Setup Overhead
Performing Unnecessary Recomputation: Reusing RDDs
Troubleshooting Real-Time Processing Jobs in Spark Streaming
Repeating the Same Code in Stream Pipeline: Using Sources and Sinks
Long Latency of Jobs: Understanding Batch Internals
Fault Tolerance: Using Data Checkpointing
Maintaining Batch and Streaming: Using Structured Streaming Pros