This course teaches how to work with Storm, HBase, and Hive using Microsoft HDInsight. You will learn about the HDInsight architecture, how data is stored in HBase, HBase scripting, and how to work with HFiles and regions. It then covers working with HBase clusters, the HBase data structure in Azure, working with Hive and YARN, and more.

This course is produced by Pluralsight.

Course topics:

  • Working with the HDInsight architecture
  • Working with application modules
  • Storing data in HBase
  • HBase table design
  • Modeling relationships
  • Working with HBase on Azure
  • HBase shell scripting
  • Working with Storm
  • Storing data
  • Working with HBase clusters
  • Working with HFiles and regions
  • HBase data structure in Azure
  • Working with .NET
  • Integration
  • Load balancing Stargate
  • Processing timing events with Storm
  • Storm application architecture
  • Processing race timing
  • Designing topologies
  • Building a topology
  • Performance testing applications
  • Implementing tuple trees
  • Integrating the application
  • Introduction to HiveQL
  • Mapping HBase tables in Hive
  • Joining HBase tables and CSV files in HiveQL
  • Working with views and functions
  • Working with Hive and YARN
  • Working with HiveQL execution plans
  • Working with PowerShell
  • Writing Hive UDFs in C#
  • Debugging applications
  • and more

Course title: Pluralsight HDInsight Deep Dive: Storm, HBase, and Hive
Level: Intermediate
Duration: 4 hours 13 minutes
Author: Elton Stoneman


HDInsight is Microsoft's managed Big Data stack in the cloud. With Azure you can provision clusters running Storm, HBase, and Hive, which can process thousands of events per second, store petabytes of data, and give you a SQL-like interface to query it all. In this course, we'll build out a full solution using the stack and take a deep dive into each of the technologies.
Storm is a distributed compute platform which you can plug into Azure Event Hubs and use to power event stream processing. You can scale Storm to read tens of thousands of events per second and build a reliable workflow so that every event is guaranteed to be processed. HBase is a NoSQL database which is easy to get started with and can store tables with billions of rows and millions of columns. It's built for real-time data access, and it has a REST interface so you can read and write HBase data from a .NET Storm app. Hive is a data warehouse that provides a SQL-like interface over Big Data, including HBase tables and other sources. With Hive you can join across multiple sources and run queries from PowerShell and .NET. In this course, we use all three technologies running on Microsoft Azure to build a race timing solution and dive into performance tuning, reliability, and administration.
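
As a rough illustration of what writing to HBase over its REST interface (Stargate) involves, the sketch below builds the JSON body for a single-cell write. It assumes the REST server's JSON representation, in which row keys, column names, and values are base64-encoded. The table name, row key, column family, and value are hypothetical and are not taken from the course, and Python is used here only for brevity (the course itself uses .NET clients).

```python
import base64
import json
from urllib.parse import quote


def b64(text: str) -> str:
    """Base64-encode a UTF-8 string for use in a Stargate JSON body."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def build_cell_payload(row_key: str, column: str, value: str) -> str:
    """Build the JSON body that writes one cell into one row."""
    body = {
        "Row": [
            {
                "key": b64(row_key),
                # "column" is "family:qualifier"; "$" holds the cell value.
                "Cell": [{"column": b64(column), "$": b64(value)}],
            }
        ]
    }
    return json.dumps(body)


# Hypothetical example: one sector time for one rider in one race.
row_key = "race42-rider7"
payload = build_cell_payload(row_key, "t:sector1", "00:01:23.456")

# The body would be sent with PUT and Content-Type: application/json
# to a path of the form /<table>/<row key> on the REST endpoint.
url_path = "/timing-events/" + quote(row_key, safe="")
```

Nothing is sent over the network here; the sketch only shows the shape of the request, which any HTTP client could then submit.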

Architecting a Solution with HDInsight (12m 43s)
  • Module Introduction (2m 10s)
  • Big Data and the Three (or Four) Vs (2m 36s)
  • Where Big Data Technologies Fit (2m 36s)
  • About the Demo Solution (2m 27s)
  • Course Overview (2m 52s)
Storing Race Data in HBase (40m 42s)
  • Module Introduction (2m 2s)
  • HBase Table Design (4m 5s)
  • Modeling Relationships (3m 2s)
  • Row Key Design (2m 55s)
  • Column Families (3m 48s)
  • HBase on Azure (2m 42s)
  • HBase Shell Scripts (3m 7s)
  • The Stargate REST API (4m 14s)
  • .NET Clients for Stargate (3m 10s)
  • Storing Timing Events (4m 6s)
  • Storing Sector Times (3m 47s)
  • Module Summary (3m 40s)
HBase Deep Dive (43m 21s)
  • Module Introduction (1m 16s)
  • HBase Cluster Nodes (3m 51s)
  • HFiles and Regions (3m 46s)
  • HBase Data Structure in Azure (3m 56s)
  • Meta Tables and Region Splits (3m 23s)
  • Splitting and Pre-splitting Regions (3m 49s)
  • .NET and HBase Best Practices (3m 21s)
  • Integration Testing with Docker (3m 19s)
  • Load Balancing Stargate (3m 36s)
  • Performance Analysis (3m 48s)
  • Scaling and Compaction (5m 21s)
  • Module Summary (3m 50s)
Processing Timing Events with Storm (38m 43s)
  • Module Introduction (2m 3s)
  • Storm Application Architecture (2m 34s)
  • Processing Race Timing with Storm (3m 10s)
  • Saving Events and Defining Bolt Schemas (4m 28s)
  • Local Memory Caches in Bolts (4m 6s)
  • Buffering Writes in Bolts (3m 8s)
  • Flushing Buffers with the Tick Stream (3m 7s)
  • Designing the Topology (3m 39s)
  • Building the Topology (5m 5s)
  • Deploying to HDInsight (2m 48s)
  • Running Race Simulations (2m 16s)
  • Module Summary (2m 13s)
Storm Deep Dive (42m 14s)
  • Module Introduction (2m 7s)
  • Storm Cluster Architecture (3m 16s)
  • Runtime Compute Components (3m 45s)
  • Approaches to Performance Testing (4m 14s)
  • Performance Tuning Storm (5m 59s)
  • Scaling the Storm Cluster (2m 58s)
  • Guaranteed Messaging (2m 46s)
  • Implementing Tuple Trees (4m 31s)
  • Logging and Monitoring (2m 31s)
  • Custom Component Logging (2m 38s)
  • Unit & Integration Testing (4m 11s)
  • Module Summary (3m 12s)
Querying Race Data with Hive (38m 11s)
  • Module Introduction (1m 34s)
  • HiveQL, the Hive Query Language (2m 39s)
  • Mapping HBase Tables in Hive (2m 26s)
  • Hive Data Types and HBase Column Families (3m 41s)
  • Querying Race Results (3m 53s)
  • Mapping Flat Files in Hive (3m 32s)
  • Joining HBase Tables and CSV Files in HiveQL (2m 50s)
  • Writing Data to Azure from Hive (2m 59s)
  • Recalculating Race Results (2m 54s)
  • Hive Views and Functions (2m 30s)
  • Collections, Joins, and Ranking (3m 16s)
  • Hive and Big Data (3m 27s)
  • Module Summary (2m 24s)
Hive Deep Dive (37m 46s)
  • Module Introduction (1m 48s)
  • Hive and YARN (2m 53s)
  • Parallelism for Hive Queries (3m 0s)
  • HiveQL Execution Plans (2m 43s)
  • Filtering HBase Tables (2m 42s)
  • Parameterising Hive Queries with PowerShell (3m 3s)
  • Running Parallel Hive Queries with PowerShell (4m 5s)
  • The Hive ODBC Connector (3m 40s)
  • Connecting to Hive from .NET Apps (3m 36s)
  • Writing Hive UDFs in C# (3m 50s)
  • Customizing the HBase Cluster for Hive (3m 56s)
  • Course Summary (2m 26s)