Introduction to Designing Data Lakes on AWS Week 4 Complete

Course Name: Introduction to Designing Data Lakes on AWS (AWS Cloud Solutions Architect)

Course Link: Introduction to Designing Data Lakes on AWS

These are the complete Week 4 quiz answers for Introduction to Designing Data Lakes on AWS.


Practice Quiz

Question 1
True or False: The Registry of Open Data on AWS exists to help people share and discover datasets that were made publicly available through AWS services.

True
False

Answer: True


Question 2
Which task is performed by an AWS Glue crawler?

Map data from one schema to another schema.
Store metadata in a catalog for indexing.
Populate the AWS Glue Data Catalog with tables.
Analyze all data in the data lake to create an Apache Hive metastore.

Answer: Populate the AWS Glue Data Catalog with tables.


Question 3
True or False: According to the AWS shared responsibility model, if a company uses a managed service—such as Amazon Simple Storage Service (Amazon S3)—AWS manages all the security mechanisms that are needed to encrypt and protect data.

True
False

Answer: False


Question 4
What is a typical workflow for Amazon QuickSight?

Add a dataset. Create a new analysis. Add charts, tables, or insights. Resize and rearrange the charts, tables, or insights on one or more sheets. Publish the analysis as a dashboard. Share the dashboard with other people.
Create a new analysis. Add a dataset. Choose fields to create the first chart. Enhance the visualized data. Publish the analysis as a dashboard. Share the dashboard with other people.
Create a new analysis. Add a dataset. Use extended features to add variables, custom controls, or colors. Share a dashboard with other people. Publish the analysis.
Choose fields to create the first chart. Add a dataset. Add more charts, tables, or insights. Publish the analysis as a dashboard. Share the dashboard with other people.

Answer: Create a new analysis. Add a dataset. Choose fields to create the first chart. Enhance the visualized data. Publish the analysis as a dashboard. Share the dashboard with other people.


Quiz

Question 1
True or False: It is a best practice that companies treat the original, ingested version of data in their data lake as immutable. Any data processing that is done to the original data should be stored as a secondary copy or extra copy of the data, which will then be analyzed.

True
False

Answer: True
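
As a minimal sketch of the immutability practice in Question 1, the Python below models a bucket as an in-memory dictionary and writes processed output under a separate prefix instead of overwriting the raw object. The `raw/` and `processed/` prefixes, the `ingest` and `process` helpers, and the sample record are hypothetical and chosen only for illustration; no AWS API is called.

```python
import json

# Hypothetical in-memory stand-in for an S3 bucket: key -> bytes.
bucket = {}


def ingest(key, payload):
    """Store the original bytes under raw/ and refuse to overwrite an existing raw object."""
    raw_key = f"raw/{key}"
    if raw_key in bucket:
        raise ValueError(f"{raw_key} already exists; raw data is treated as immutable")
    bucket[raw_key] = payload
    return raw_key


def process(raw_key):
    """Read a raw object and write a cleaned copy under processed/, leaving the raw object untouched."""
    record = json.loads(bucket[raw_key])
    # Illustrative transformation: normalize field names to lowercase.
    cleaned = {name.lower(): value for name, value in record.items()}
    processed_key = raw_key.replace("raw/", "processed/", 1)
    bucket[processed_key] = json.dumps(cleaned, sort_keys=True).encode()
    return processed_key


original = b'{"UserId": 7, "Event": "login"}'
raw_key = ingest("events/0001.json", original)
processed_key = process(raw_key)
```

Analysis would then read from the `processed/` copy, while the `raw/` object stays available for reprocessing with different logic later.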


Question 2
Which scenario represents AWS Glue jobs as the BEST tool for the job?

Analyze data in batches on schedule or on demand.
Transform data in real time as data comes into the data lake.
Analyze data in real time as data comes into the data lake.
Transform data on a schedule or on demand.

Answer: Transform data on a schedule or on demand.


Question 3
A company collects and analyzes large amounts of data daily. Why should the company use a compression strategy for their processes?

Compressed data uses a row-based data format that works well for data optimization.
By using compressed data, data-processing systems can optimize for memory and cost.
Compressed data slows the time to process and analyze information.
Compressed data increases the risk of losing valuable information.

Answer: By using compressed data, data-processing systems can optimize for memory and cost.
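
To illustrate the storage side of the answer to Question 3, the sketch below compresses a repetitive, made-up log payload with Python's standard `gzip` module and compares sizes. The sample log line is hypothetical, gzip is only one of several possible codecs, and the actual savings depend heavily on the data.

```python
import gzip

# Hypothetical sample: a repetitive log, similar to what an application might emit daily.
raw = ("2024-01-01T00:00:00Z INFO request served status=200\n" * 1000).encode()

# Compress, then decompress to confirm that no information is lost.
compressed = gzip.compress(raw)
restored = gzip.decompress(compressed)

ratio = len(compressed) / len(raw)
print(f"raw={len(raw)} bytes, compressed={len(compressed)} bytes, ratio={ratio:.3f}")
```

Because the round trip restores the original bytes exactly, this also shows why the option about losing valuable information does not apply to lossless compression.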


Question 4
A software developer recently uploaded data logs from their application to Amazon Simple Storage Service (Amazon S3). Who is responsible for encrypting both the data at rest in the S3 bucket and the data in transit to the S3 bucket, according to the AWS shared responsibility model?

AWS
Customer
Both AWS and the customer
Third-party security company

Answer: Customer


Question 5
Which statement about data visualization is TRUE?

Raw data is generally formatted to be read and used by a human eye.
Visualization data is always captured in a text editor.
If there is more data, making sense of the data will be more difficult without using visualization tools.
A click map is the main reason to invest in data visualization.

Answer: If there is more data, making sense of the data will be more difficult without using visualization tools.


Question 6
What makes Amazon QuickSight different, compared to other traditional business intelligence (BI) tools?

The ability to create sharable dashboards
The ability to visualize data
Data encryption at every layer
Super-fast, Parallel, In-memory Calculation Engine (SPICE)

Answer: Super-fast, Parallel, In-memory Calculation Engine (SPICE)


Question 7
What is the purpose of the Registry of Open Data on AWS?

Provide a service that people can use to transform public datasets that are published by data providers through an API.
Provide a service that people can use to ingest software as a service (SaaS) application data into a data lake.
Help people discover and share datasets that are available through AWS resources.
Help people discover and share datasets that are available outside of AWS resources.

Answer: Help people discover and share datasets that are available through AWS resources.


Question 8
True or False: Amazon QuickSight is a cloud-scale business intelligence (BI) service that developers can use to deliver interactive visualizations and dashboards for data analysis and forecasting.

True
False

Answer: True


Final Assessment

Question 1
What does the AWS Glue Metadata Catalog service do?

The AWS Glue Metadata Catalog is a query service that uses standard Structured Query Language (SQL) to retrieve data.
The AWS Glue Metadata Catalog provides a data transformation service where a company can author and run scripts to transform data between data sources and targets.
The AWS Glue Metadata Catalog provides a repository where a company can store, find, and access metadata, and use that metadata to query and transform the data.
The AWS Glue Metadata Catalog provides a repository where a company can store and find metadata to keep track of user permissions to data in a data lake.

Answer: The AWS Glue Metadata Catalog provides a repository where a company can store, find, and access metadata, and use that metadata to query and transform the data.


Question 2
A solutions architect is working for a customer who wants to build a data lake on AWS to store different types of raw data. Which AWS service should the solutions architect recommend to the customer to meet their requirements?

AWS Glue Metadata Catalog
Amazon OpenSearch Service
Amazon EMR
Amazon Simple Storage Service (Amazon S3)

Answer: Amazon Simple Storage Service (Amazon S3)


Question 3
Which statement BEST describes batch data ingestion?

Batch data ingestion is a serverless data integration service that makes it easier to discover, prepare, and combine data for analytics, machine learning, and application development.
Batch data ingestion is the process of capturing gigabytes (GB) of data per second from multiple sources, such as website clickstreams, database event streams, financial transactions, social media feeds, IT logs, and location-tracking events.
Batch data ingestion is the process of collecting and transferring large amounts of data that have already been produced and stored on premises or in the cloud.
By using batch data ingestion, a user can create a unified metadata repository across various services on AWS.

Answer: Batch data ingestion is the process of collecting and transferring large amounts of data that have already been produced and stored on premises or in the cloud.


Question 4
Which service is commonly used for real-time data processing when Amazon Kinesis Data Streams is used for data ingestion?

AWS Glue job
Amazon EMR
Amazon Kinesis Data Analytics
Amazon Athena

Answer: Amazon Kinesis Data Analytics


Question 5
Apache Hadoop is an open-source framework that is used to efficiently store and process large datasets. A solutions architect is working for a company that currently uses Apache Hadoop on-premises for data processing jobs. The company wants to use AWS for these jobs, but they also want to continue using the same technology. Which service should the solutions architect choose for this use case?

AWS Lambda
Amazon Kinesis Data Analytics
Amazon EMR
Amazon OpenSearch Service

Answer: Amazon EMR


Question 6
A team of machine learning (ML) experts are working for a company. The company wants to use the data in their data lake to train an ML model that they create. The company wants the most control that they can have over this model and the environment that it is trained in. Which AWS ML approach should the team take?

Use a pretrained model from an AWS service, such as Amazon Rekognition.
Create an AWS Lambda function with the training logic in the handler, and run the training based on an event.
Launch an Amazon Elastic Compute Cloud (Amazon EC2) instance and run Amazon SageMaker on it to train the model.
Launch an Amazon Elastic Compute Cloud (Amazon EC2) instance by using an AWS Deep Learning Amazon Machine Image (AMI) to host the application that will train the model.

Answer: Launch an Amazon Elastic Compute Cloud (Amazon EC2) instance by using an AWS Deep Learning Amazon Machine Image (AMI) to host the application that will train the model.


Question 7
What is the main value proposition of data lakes?

The ability to define the data schema before ingesting and storing data.
The ability to store user-generated data, such as data from antennas and sensors.
The ability to ingest and store data that could be the answer for future questions when they are processed with the correct data processing mechanisms.
The ability to combine multiple databases together to expand their capacity and availability.

Answer: The ability to ingest and store data that could be the answer for future questions when they are processed with the correct data processing mechanisms.


Question 8
Which statements about data lakes and data warehouses are true? (Choose TWO.)

Data lakes use schema-on-write architectures and data warehouses use schema-on-read architectures.
Data lakes offer more choices in terms of the technology that is used for processing data. In contrast, data warehouses are more restricted to using Structured Query Language (SQL) as the query technology.
The solutions architect can combine both data lakes and data warehouses to better extract insights and turn data into information.
The solutions architect cannot attach data visualization tools to data warehouses.
Data lakes are not future-proof, which means that they must be reconfigured each time new data is ingested.

Answer:
Data lakes offer more choices in terms of the technology that is used for processing data. In contrast, data warehouses are more restricted to using Structured Query Language (SQL) as the query technology.
The solutions architect can combine both data lakes and data warehouses to better extract insights and turn data into information.
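
The first option in Question 8 reverses the usual arrangement: data lakes are generally described as schema-on-read, and data warehouses as schema-on-write. A minimal sketch of schema-on-read follows, assuming raw records stored as JSON lines. The `read_with_schema` helper, the field names, and the sample records are hypothetical.

```python
import json

# Raw records are stored as-is; no schema is enforced at write time.
stored_lines = [
    '{"id": "1", "amount": "19.90", "note": "first order"}',
    '{"id": "2", "amount": "5"}',
]

# Hypothetical schema chosen by the reader: field name -> type converter.
schema = {"id": int, "amount": float}


def read_with_schema(lines, schema):
    """Parse each raw line, then select and cast only the fields named in the schema."""
    rows = []
    for line in lines:
        record = json.loads(line)
        rows.append({field: cast(record[field]) for field, cast in schema.items()})
    return rows


rows = read_with_schema(stored_lines, schema)
```

A different consumer could read the same stored lines with another schema, for example one that keeps `note`, without the stored data changing.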


Question 9
A company plans to explore data lakes and their components. What are reasons to invest in a data lake? (Choose TWO.)

Increase operational overhead
Offload capacity from databases and data warehouses
Make data available from integrated departments
Limit data movement
Lower transactional costs

Answer:
Offload capacity from databases and data warehouses
Lower transactional costs


Question 10
Which term indicates that a data lake lacks curation, management, cataloging, lifecycle or retention policies, and metadata?

Data swamp
Data warehouse
Data catalog
Database

Answer: Data swamp


Question 11
Which statement about whether data lakes make it easier to follow the “right tool for the job” approach is TRUE?

No, data lakes do not make it easier to follow “the right tool for the job” approach because data lakes can only handle structured data.
Yes, data lakes make it easier to follow “the right tool for the job” approach because storage can be decoupled from processing and ingestion.
Yes, data lakes make it easier to follow “the right tool for the job” approach because data lakes can only handle structured data.
No, data lakes do not make it easier to follow “the right tool for the job” approach because you are tied to a specific AWS service.

Answer: Yes, data lakes make it easier to follow “the right tool for the job” approach because storage can be decoupled from processing and ingestion.


Question 12
Which scenario represents AWS Glue jobs as the BEST tool for the job?

Transform data in real time as data comes into the data lake.
Analyze data in batches on schedule or on demand.
Transform data on a schedule or on demand.
Analyze data in real time as data comes into the data lake.

Answer: Transform data on a schedule or on demand.


Question 13
Which task is performed by an AWS Glue crawler?

Map data from one schema to another schema.
Populate the AWS Glue Data Catalog with tables.
Analyze all data in the data lake to create an Apache Hive metastore.
Store metadata in a catalog for indexing.

Answer: Populate the AWS Glue Data Catalog with tables.


Question 14
A software developer recently uploaded data logs from their application to Amazon Simple Storage Service (Amazon S3). Who is responsible for encrypting both the data at rest in the S3 bucket and the data in transit to the S3 bucket, according to the AWS shared responsibility model?

AWS
Customer
Both AWS and the customer
Third-party security company

Answer: Customer


Question 15
What makes Amazon QuickSight different, compared to other traditional business intelligence (BI) tools?

The ability to visualize data
The ability to create sharable dashboards
Super-fast, Parallel, In-memory Calculation Engine (SPICE)
Data encryption at every layer

Answer: Super-fast, Parallel, In-memory Calculation Engine (SPICE)


Question 16
What is the purpose of the Registry of Open Data on AWS?

Help people discover and share datasets that are available through AWS resources.
Provide a service that people can use to transform public datasets that are published by data providers through an API.
Help people discover and share datasets that are available outside of AWS resources.
Provide a service that people can use to ingest software as a service (SaaS) application data into a data lake.

Answer: Help people discover and share datasets that are available through AWS resources.


Question 17
Which statements about data organization and categorization in data lakes are TRUE? (Choose TWO.)

Data lakes are not future-proof, which means that they must be reconfigured each time new data is ingested.
Data lakes need to be schema-on-write. In this case, users need to transform all the data before they load it into the data lake.
Amazon Simple Storage Service (Amazon S3) is mostly used for storage, and AWS Glue is mostly used for categorizing data.
When cataloging data, it is a best practice to organize the data according to the access pattern of the user who will access it.
Users must delete the original raw data to keep their data lake organized and cataloged.

Answer:
Amazon Simple Storage Service (Amazon S3) is mostly used for storage, and AWS Glue is mostly used for categorizing data.
When cataloging data, it is a best practice to organize the data according to the access pattern of the user who will access it.
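
One way to picture organizing data by access pattern, assuming users mostly filter by date: build object keys with date partitions so that a query for one day only needs to touch one prefix. The sketch below uses the common Hive-style `key=value` naming. The `partitioned_key` helper, the `clickstream` dataset name, and the file name are hypothetical.

```python
from datetime import date


def partitioned_key(dataset, event_date, filename):
    """Build an object key with Hive-style year/month/day partitions."""
    return (
        f"{dataset}/"
        f"year={event_date.year:04d}/month={event_date.month:02d}/day={event_date.day:02d}/"
        f"{filename}"
    )


key = partitioned_key("clickstream", date(2024, 3, 9), "part-0000.parquet")
```

If users instead queried mostly by customer or region, that attribute would be the more natural leading partition under the same reasoning.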


Question 18
Which type of data has the HIGHEST probability of containing structured data?

Video files from mobile phone photo libraries
Raw data from marketing research surveys
Data that is sitting in a relational MySQL table
Customer reviews on products in retailer websites

Answer: Data that is sitting in a relational MySQL table


Question 19
What is the most common way of categorizing data in terms of structure?

Structured data, unstructured data, and semi-structured data
Ready data, not-ready data, and semi-ready data
Development data, quality assurance (QA) data, and production data
The good data, the bad data, and the ugly data

Answer: Structured data, unstructured data, and semi-structured data


Question 20
Which statement about data consumption in Amazon Kinesis Data Streams is TRUE?

If data is not consumed within 15 minutes, Kinesis will delete the data that was added to the stream. This case is true even though the data-retention window is greater than 15 minutes.
Data consumers must use an AWS SDK to correctly fetch data from Kinesis in the same order that it was ingested. However, AWS Lambda functions do not need to fetch data from Kinesis in a specific order because Lambda integrates natively with AWS services, including Kinesis.
Data is automatically pushed to each consumer that is connected to Kinesis. Thus, consumers are notified that new data is available, even when they are not running the Kinesis SDK for data consumption.
If data is consumed by a consumer, that consumer can never get that same data again. This case is true even if the data is still in the stream, according to the data-retention window.

Answer: Data consumers must use an AWS SDK to correctly fetch data from Kinesis in the same order that it was ingested. However, AWS Lambda functions do not need to fetch data from Kinesis in a specific order because Lambda integrates natively with AWS services, including Kinesis.


More Coursera courses: https://progiez.com/coursera

The content uploaded on this website is for reference purposes only. Please attempt the quizzes yourself first.