Using AWS Services to Perform Data Analysis With SAP SuccessFactors
An organisation has successfully used AWS services to perform its own data analysis with SAP SuccessFactors.
SuccessFactors is a suite of applications from SAP, a renowned player in enterprise and business software. It covers core Human Resources services, Payroll, Recruiting Management, Performance Management, Workforce Planning, Learning and Development Management, and other related Human Capital Management functions.
While SAP offers its own data analytics, the organisation had unique requirements for what it wanted to achieve, which led it to look into other possibilities. The following setup eventually delivered the desired outcomes.
Extracting Data from SAP SuccessFactors Using AWS Glue
AWS Glue is a fully managed, serverless ETL (Extract, Transform, Load) service. Its performance makes it well suited to extracting data from SuccessFactors, and connectors for SAP SuccessFactors are available for AWS Glue (for example, through AWS Marketplace).
A designated IAM Role is needed for the AWS Glue job. The role enables access to all resources used by the Glue job, including Amazon S3 for any sources, targets, scripts, temporary directories, and AWS Glue Data Catalog objects. The role also grants access to the Glue Connector for SAP SuccessFactors.
As a quick summary, the following policies are attached to the IAM role (a boto3 sketch of creating such a role follows this list):
- AWSGlueServiceRole (accessing Glue Studio and Glue Jobs)
- AmazonEC2ContainerRegistryReadOnly (accessing the Glue Connector for SAP SuccessFactors)
- AmazonS3FullAccess (reading and writing to Amazon S3)
- SecretsManagerReadWrite (accessing AWS Secrets Manager)
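As an illustrative sketch only (the role name and trust policy below are assumptions, not taken from the actual setup), a role along these lines could be created and the managed policies attached with boto3:

import json
import boto3

iam = boto3.client("iam")

# Trust policy that lets the AWS Glue service assume the role.
assume_role_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "glue.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# "GlueSuccessFactorsRole" is a hypothetical name used for this sketch.
iam.create_role(
    RoleName="GlueSuccessFactorsRole",
    AssumeRolePolicyDocument=json.dumps(assume_role_policy),
)

# Attach the managed policies listed above.
for policy_arn in [
    "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
    "arn:aws:iam::aws:policy/SecretsManagerReadWrite",
]:
    iam.attach_role_policy(RoleName="GlueSuccessFactorsRole", PolicyArn=policy_arn)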
Authentication with SAP SuccessFactors is done through OAuth. The following connection properties are specified:
- Url: the URL of the server hosting SuccessFactors.
- User: the username of the account.
- CompanyId: the unique identifier of the company.
- OAuthClientId: the API Key that was generated in API Center.
- OAuthClientSecret: the X.509 private key used to sign the SAML assertion. This is obtained from the certificate that was downloaded in Registering your OAuth Client Application.
- InitiateOAuth: set this to GETANDREFRESH.
Leveraging AWS Secrets Manager for Access and Parameter Security
- In the AWS Secrets Manager console, choose Store a new secret.
- Choose Other type of secret. This option means you must supply the structure and details of your secret.
- Add all required properties to connect to SAP SuccessFactors, as well as any additional private credential key-value pairs required by the Glue Connector (see the sketch after this list).
- The AWS Glue ETL job and the secret need to be in the same region.
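As a minimal sketch, assuming boto3 and placeholder values (the secret name, URL, and credentials below are illustrative, not the real ones), the connection properties listed above could be stored as a single secret like this:

import json
import boto3

# Placeholder values only; the real values come from the SuccessFactors
# API Center and the OAuth client registration.
connection_properties = {
    "Url": "https://YOUR_SUCCESSFACTORS_HOST",
    "User": "YOUR_API_USER",
    "CompanyId": "YOUR_COMPANY_ID",
    "OAuthClientId": "YOUR_API_KEY",
    "OAuthClientSecret": "YOUR_X509_PRIVATE_KEY",
    "InitiateOAuth": "GETANDREFRESH",
}

# Create the secret in the same region as the AWS Glue ETL job.
secretsmanager = boto3.client("secretsmanager")
secretsmanager.create_secret(
    Name="sap-successfactors-connection",  # hypothetical secret name
    SecretString=json.dumps(connection_properties),
)

The secret can then be referenced from the Glue connection configuration for the connector, so credentials are kept out of the job script.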
After activating the Glue Connector and completing the configuration, a Glue job can be built in AWS Glue Studio.
The Script tab can be used to review the script generated by Glue Studio. The following is a sample script (sample only):
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
## @type: DataSource
## @args: [connection_type = "marketplace.jdbc", connection_options = {"dbTable":"ExtAddressInfo","connectionName":"cdata-sapsuccessfactors"}, transformation_ctx = "DataSource0"]
## @return: DataSource0
## @inputs: []
# Read the ExtAddressInfo entity from SAP SuccessFactors via the Marketplace JDBC connector.
DataSource0 = glueContext.create_dynamic_frame.from_options(
    connection_type = "marketplace.jdbc",
    connection_options = {"dbTable": "ExtAddressInfo", "connectionName": "cdata-sapsuccessfactors"},
    transformation_ctx = "DataSource0")
## @type: DataSink
## @args: [connection_type = "s3", format = "json", connection_options = {"path": "s3://PATH/TO/BUCKET/", "partitionKeys": []}, transformation_ctx = "DataSink0"]
## @return: DataSink0
## @inputs: [frame = DataSource0]
# Write the extracted records to Amazon S3 as JSON.
DataSink0 = glueContext.write_dynamic_frame.from_options(
    frame = DataSource0,
    connection_type = "s3",
    format = "json",
    connection_options = {"path": "s3://PATH/TO/BUCKET/", "partitionKeys": []},
    transformation_ctx = "DataSink0")
job.commit()
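Once saved, the job can be run on demand or on a schedule. As a small usage sketch, assuming boto3 and a hypothetical job name for the job built in Glue Studio, a run could be started like this:

import boto3

glue = boto3.client("glue")

# "sap-successfactors-extract" is a hypothetical job name for this sketch.
response = glue.start_job_run(JobName="sap-successfactors-extract")
print("Started Glue job run:", response["JobRunId"])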
The ETL-ed data is stored in an S3 bucket (as a data lake) for data analytics, which further enables:
- Reporting
- Backup and Disaster Recovery
- Observability Enhancement
This setup also provides future potential for Machine Learning and Artificial Intelligence, as large volumes of data are readily available in S3 for use with the comprehensive AI services in which AWS is a leader.
Simon Wang