AWS Glue

AWS Glue is a fully managed ETL (extract, transform, and load) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores.

Glue Jobs Resource Information:

Type of Resource: Generic monitorable resource
Discovery type: AWS SDK discovery type
Discovery profile selection: Resource can be discovered by “Glue Jobs” in the profiler
ResourceTag: AWS_GLUE_JOBS
Resource Unique Identification: JobName + "/" + region (job.name() + "/" + region)
Dependencies: Glue Jobs may depend on Glue databases, tables, and connections for ETL operations

Glue ML Transform Resource Information:

Type of Resource: Generic non-monitorable resource
Discovery type: AWS SDK discovery type
Discovery profile selection: Resource can be discovered by “Glue ML Transform” in the profiler
ResourceTag: AWS_GLUE_ML_TRANSFORM
Resource Unique Identification: TransformId (ml.transformId())
Dependencies: Glue ML Transform may depend on Glue databases and tables for machine learning operations

Glue Crawlers Resource Information:

Type of Resource: Generic non-monitorable resource
Discovery type: AWS SDK discovery type
Discovery profile selection: Resource can be discovered by “Glue Crawlers” in the profiler
ResourceTag: AWS_GLUE_CRAWLERS
Resource Unique Identification: crawlerName + "/" + region (crawler.name() + "/" + region)
Dependencies: Glue Crawlers may depend on data stores like S3, databases, and Glue databases

Glue Tables Resource Information:

Type of Resource: Generic non-monitorable resource
Discovery type: AWS SDK discovery type
Discovery profile selection: Resource can be discovered by “Glue Tables” in the profiler
ResourceTag: AWS_GLUE_TABLES
Resource Unique Identification: tableDBName + "/" + tableName + "/" + region (table.databaseName() + "/" + table.name() + "/" + region)
Dependencies: Glue Tables belong to Glue databases and may depend on the underlying data sources

Glue Databases Resource Information:

Type of Resource: Generic non-monitorable resource
Discovery type: AWS SDK discovery type
Discovery profile selection: Resource can be discovered by “Glue Databases” in the profiler
ResourceTag: AWS_GLUE_DB
Resource Unique Identification: DBName + "/" + region (db.name() + "/" + region)
Dependencies: Glue Databases may contain tables and be accessed by Glue crawlers and jobs

Glue Dev Endpoints Resource Information:

Type of Resource: Generic non-monitorable resource
Discovery type: AWS SDK discovery type
Discovery profile selection: Resource can be discovered by “Glue Dev Endpoints” in the profiler
ResourceTag: AWS_GLUE_DEV_ENDPOINTS
Resource Unique Identification: endpointName + "/" + region (endpoint.endpointName() + "/" + region)
Dependencies: Glue Dev Endpoints may depend on VPC configurations and security groups
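
The Resource Unique Identification entries above all follow the same pattern: one or more names joined with "/" and suffixed with the region, except for ML transforms, which use the TransformId alone. A minimal Python sketch of that pattern follows. The helper name `scoped_id` and all sample values are hypothetical and are used only for illustration; this is not the discovery code itself.

```python
def scoped_id(*parts: str, region: str) -> str:
    """Join the name parts and the region with "/", e.g. JobName + "/" + region."""
    return "/".join([*parts, region])


# Hypothetical sample values, for illustration only.
region = "us-east-1"

job_id = scoped_id("nightly-etl", region=region)           # Glue Jobs
crawler_id = scoped_id("s3-crawler", region=region)        # Glue Crawlers
table_id = scoped_id("sales_db", "orders", region=region)  # Glue Tables
database_id = scoped_id("sales_db", region=region)         # Glue Databases
endpoint_id = scoped_id("dev-endpoint-1", region=region)   # Glue Dev Endpoints

# Glue ML Transforms are identified by TransformId alone, with no region suffix.
ml_transform_id = "tfm-0123456789abcdef"
```

Because the region is part of the identifier, two jobs with the same name in different regions remain distinct resources.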

AWS Glue consists of a central metadata repository known as the AWS Glue Data Catalog, an ETL engine that automatically generates Python or Scala code, and a flexible scheduler that handles dependency resolution, job monitoring, and retries. AWS Glue is serverless, so there is no infrastructure to set up or manage.

AWS Glue is designed to work with semi-structured data. It introduces a component called a dynamic frame, which you can use in your ETL scripts. A dynamic frame is similar to an Apache Spark dataframe, except that each record is self-describing, so no schema is required up front. Dynamic frames therefore offer schema flexibility, along with a set of advanced transformations designed specifically for them.

You can convert between dynamic frames and Spark dataframes, so a single script can combine AWS Glue and Spark transformations.
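
To make "self-describing" concrete, the following is a plain-Python sketch of the idea, not the Glue API. Each record carries its own fields and types, a schema is derived only when needed, and a field seen with conflicting types is resolved explicitly. The function names `infer_schema` and `resolve_to_string` and the sample records are assumptions made for this illustration.

```python
from collections import defaultdict


def infer_schema(records):
    """Collect the observed type names for every field across all records."""
    observed = defaultdict(set)
    for record in records:
        for field, value in record.items():
            observed[field].add(type(value).__name__)
    return {field: sorted(types) for field, types in observed.items()}


def resolve_to_string(records, field):
    """Cast one ambiguous field to str; records without the field are copied as-is."""
    return [
        {**r, field: str(r[field])} if field in r else dict(r)
        for r in records
    ]


# Hypothetical records: "zip" arrives as an int in one record and a str in another,
# and "note" is present in only one record.
records = [
    {"id": 1, "zip": 98101},
    {"id": 2, "zip": "98101-1234", "note": "extended zip"},
]

schema = infer_schema(records)               # "zip" is seen as both int and str
cleaned = resolve_to_string(records, "zip")  # resolve the ambiguity explicitly
fixed_schema = infer_schema(cleaned)         # "zip" is now str only
```

In an actual Glue script, the conversion described above is done with `DynamicFrame.toDF()` and `DynamicFrame.fromDF(...)` from the `awsglue` library, and ambiguous types are handled with `resolveChoice`; those calls need a Glue runtime and are not shown here.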

External reference

What Is AWS Glue?

Setup

To set up the AWS integration and discover the AWS service, go to AWS Integration Discovery Profile and select GLUE. The following resources are then discovered: AWS Glue databases, tables, crawlers, jobs, dev endpoints, and ML transforms.

Event support

CloudTrail event support

Supported
Configurable in OpsRamp AWS Integration Discovery Profile.

CloudWatch alarm support

Supported
Configurable in OpsRamp AWS Integration Discovery Profile.

Supported metrics

OpsRamp Metric	Description	Metric Display Name	Unit	Aggregation Type
aws_glue_glue_jvm_heap_usage	Fraction of memory used by the JVM heap (scale: 0-1) for the driver, an executor identified by executorId, or ALL executors.	glue jvm heap usage	None	Average
aws_glue_glue_jvm_heap_used	Number of memory bytes used by the JVM heap for the driver, an executor identified by executorId, or ALL executors.	glue jvm heap used	None	Average
aws_glue_glue_s3_filesystem_read_bytes	Number of bytes read from Amazon S3 by the driver, an executor identified by executorId, or ALL executors since the previous report (aggregated by the AWS Glue Metrics Dashboard as the number of bytes read during the previous minute).	glue s3 file system read bytes	Count	Average
aws_glue_glue_s3_filesystem_write_bytes	Number of bytes written to Amazon S3 by the driver, an executor identified by executorId, or ALL executors since the previous report (aggregated by the AWS Glue Metrics Dashboard as the number of bytes written during the previous minute).	glue s3 filesystem write bytes	Count	Average
aws_glue_glue_system_cpuSystemLoad	Fraction of CPU system load used (scale: 0-1) by the driver, an executor identified by executorId, or ALL executors.	glue system cpu System Load	None	Average
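
When reading these metrics, the 0-1 values and the per-minute byte counts are often easier to interpret as percentages and throughput. The sketch below shows one way to convert them, assuming the byte counts cover a one-minute window as described in the table. The function names and sample values are hypothetical.

```python
def fraction_to_percent(fraction: float) -> float:
    """Convert a 0-1 metric value (e.g. CPU system load) to a percentage."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError(f"expected a value between 0 and 1, got {fraction}")
    return fraction * 100.0


def bytes_per_minute_to_mib_per_second(byte_count: float) -> float:
    """Convert a byte count accumulated over one minute to MiB per second."""
    return byte_count / 60.0 / (1024 * 1024)


# Hypothetical sample values, for illustration only.
cpu_percent = fraction_to_percent(0.25)                               # 25.0
s3_read_rate = bytes_per_minute_to_mib_per_second(600 * 1024 * 1024)  # 10.0
```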