Study Guide for AWS Big Data Specialty Certification
This is a knowledge base of everything I used to study for the AWS Big Data Specialty certification.
- Blueprint Overview
- Courses
- Books
- White Papers
- Blog Posts
- AWS Services and Tools
Domain 1: Collection 17%
Domain 2: Storage 17%
Domain 3: Processing 17%
Domain 4: Analysis 17%
Domain 5: Visualization 12%
Domain 6: Data Security 20%
- 1.1 Determine the operational characteristics of the collection system
- 1.2 Select a collection system that handles the frequency of data change and type of data being ingested
- 1.3 Identify the properties that need to be enforced by the collection system: order, data structure, metadata, etc.
- 1.4 Explain the durability and availability characteristics for the collection approach
- 2.1 Determine and optimize the operational characteristics of the storage solution
- 2.2 Determine data access and retrieval patterns
- 2.3 Evaluate mechanisms for capture, update, and retrieval of catalog entries
- 2.4 Determine appropriate data structure and storage format
- 3.1 Identify the appropriate data processing technology for a given scenario
- 3.2 Determine how to design and architect the data processing solution
- 3.3 Determine the operational characteristics of the solution implemented
- 4.1 Determine the tools and techniques required for analysis
- 4.2 Determine how to design and architect the analytical solution
- 4.3 Determine and optimize the operational characteristics of the analysis
- 5.1 Determine the appropriate techniques for delivering the results/output
- 5.2 Determine how to design and create the visualization platform
- 5.3 Determine and optimize the operational characteristics of the visualization system
- 6.1 Determine encryption requirements and/or implementation technologies
- 6.2 Choose the appropriate technology to enforce data governance
- 6.3 Identify how to ensure data integrity
Big Data Analytics with Hadoop 3 by Sridhar Alla
Implementing AWS: Design, Build, and Manage your Infrastructure by Yohan Wadia; Lucas Chan; Udita Gupta; Rowan Udell
Learning Big Data with Amazon Elastic MapReduce by Vijay Rayapati; Amarkant Singh
- Big Data Analytics Options on AWS
- Building Big Data Storage Solutions (Data Lakes) for Maximum Flexibility
- Comparing the Use of Amazon DynamoDB and Apache HBase for NoSQL (November 2018)
- Migrating to Apache HBase on Amazon S3 on Amazon EMR
- Best Practices for Amazon EMR
- Querying Amazon Kinesis Streams directly with SQL and Spark Streaming (somewhat obsolete, since queries can now be run with Kinesis Analytics instead of Hive)
- Optimize Spark Streaming to efficiently process Amazon Kinesis Streams
- Analyze Real Time data from Amazon Kinesis Streams using Zeppelin and Spark Streaming
- Powering Amazon Redshift Analytics with Apache Spark and Amazon Machine Learning (Fantastic)
- Using Spark SQL for ETL
- Implementing Efficient and Reliable Producers with the Amazon Kinesis Producer Library
- Amazon Kinesis Firehose Data Transformation with AWS Lambda
- Secure Amazon EMR with Encryption
- Building a Near Real-Time Discovery Platform with AWS
- Top 8 Best Practices for High-Performance ETL Processing Using Amazon Redshift
- Scaling Writes on Amazon DynamoDB Tables with Global Secondary Indexes
- Combine NoSQL and Massively Parallel Analytics Using Apache HBase and Apache Hive on Amazon EMR
- Choosing the Right DynamoDB Partition Key
- Strategies for Reducing Your Amazon EMR Costs
- Best practices for resizing and automatic scaling in Amazon EMR
- Best Practices for securing Amazon EMR
AWS Labs Big Data Blog code samples
Collection
- Kinesis Streams
- Kinesis Firehose
- IoT
- SQS
- Data Pipeline
- Lambda
Storage
- Glacier
- DynamoDB
- DynamoDB Streams
Processing
- EMR
- Hadoop on EMR
- Hive on EMR
- HBase on EMR
- Spark on EMR
Analysis
- Redshift
- Machine Learning
- Elasticsearch
- Athena
Visualization
- QuickSight
Security
- EMR Security
- Redshift Security