Generating AppFlow's flows using Cloud Formation's templates
Salesforce is a great tool for managing, keeping in touch with, and monitoring our members. Our researchers use its data to build models such as logistic regression or neural networks, and to verify existing models.
To query the data in Salesforce, you can either use SOQL (which is less suitable for research uses) or fetch the data into your own storage and query it with more suitable tools such as Athena.
We at Assured Allies use AWS AppFlow, a fully managed integration service that lets you securely exchange data between SaaS applications such as Salesforce and AWS services such as S3 and Redshift.
At Assured Allies we use AppFlow to fetch SF objects and store them as Parquet files on S3; later, our ETL/ELT processes clean, transform, and enrich the files so that we can serve them to the researchers for their use.
I'm not going to dive into ETLs or Parquet in this post for lack of time, but I would love to touch on the first link in this chain: fetching multiple SF objects using AppFlow.
Creating one scheduled flow manually in AppFlow's console isn't very hard, and it will fetch one object from SF. But if you fully utilize SF as we do, you'll need to create many flows (in our case, ~60 objects × 2 environments), so we looked for a better, automatic way to create them.
There are many examples of how to programmatically create flows, but most of them either:
- create "on-demand" flows rather than scheduled ones,
- rely on boto3, or
- force you to know all of the object's fields in order to fetch them.
In this post, I will share snippets showing how to generate a generic CloudFormation template that anyone can use to create scheduled flows.
You'll be able to find the template and related code in AA's public GitLab repo.
```yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Dumping SF object to S3
Parameters:
  ObjectName:
    Type: String
  ScheduleStartTime:
    Type: String
  S3Bucket:
    Type: String
  S3Prefix:
    Type: String
  Connector:
    Type: String
```
The template receives five parameters:
- `S3Bucket` and `S3Prefix` are the bucket name and prefix under which the results are stored.
- `ObjectName` is the Salesforce object we want to fetch.
- `ScheduleStartTime` is the scientific notation of the Unix time for the first occurrence of the flow, i.e. for 2022-04-11 00:00:00+00:00, ScheduleStartTime will be 1.64962440E9. The repo has a small Python script for calculating ScheduleStartTime.
- `Connector` is the name of the connector we will use to connect to SF; the easiest way to get a connector is to create one manually in AppFlow.
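The repo's script isn't reproduced here, but a minimal sketch of how such a calculation might look (assuming AppFlow accepts the epoch seconds rendered as a mantissa followed by `E9`; the function name is illustrative, not the repo's) is:

```python
from datetime import datetime, timezone

def schedule_start_time(dt: datetime) -> str:
    """Render a datetime as Unix seconds in mantissa-E9 scientific notation.

    Assumption: AppFlow's ScheduleStartTime parameter accepts the epoch
    seconds formatted as '<mantissa>E9', as in the example above.
    """
    seconds = dt.timestamp()          # epoch seconds, e.g. 1649635200.0
    return f"{seconds / 1e9:.8f}E9"   # e.g. '1.64963520E9'

print(schedule_start_time(datetime(2022, 4, 11, tzinfo=timezone.utc)))
```

Feed the resulting string to the stack as the `ScheduleStartTime` parameter.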
```yaml
Resources:
  GenericFlow:
    Type: AWS::AppFlow::Flow
    Properties:
      Description:
        Fn::Join:
          - ''
          - - 'App Flow for '
            - Ref: ObjectName
            - ' object'
      DestinationFlowConfigList:
        - ConnectorType: S3
          DestinationConnectorProperties:
            S3:
              BucketName:
                Ref: S3Bucket
              BucketPrefix:
                Ref: S3Prefix
              S3OutputFormatConfig:
                AggregationConfig:
                  AggregationType: None
                FileType: PARQUET
                PrefixConfig:
                  PrefixType: PATH_AND_FILENAME
                  PrefixFormat: DAY
```
Please note that the Parquet files will be saved under the S3Prefix/year/month/day prefix.
```yaml
      FlowName:
        Ref: ObjectName
      SourceFlowConfig:
        ConnectorProfileName:
          Ref: Connector
        ConnectorType: Salesforce
        SourceConnectorProperties:
          Salesforce:
            EnableDynamicFieldUpdate: true
```
`EnableDynamicFieldUpdate` checks on every run whether the SF object's fields have changed and updates the flow accordingly.
```yaml
            IncludeDeletedRecords: false
            Object:
              Ref: ObjectName
      Tasks:
```
`Map_all` and the empty `EXCLUDE_SOURCE_FIELDS_LIST` array are where the magic really happens. Without these two you'd need to map all of the object's fields one by one, and whenever a new field was added to the object you'd have to change the template and redeploy it; `Map_all` saves us the trouble.
```yaml
        - TaskType: Map_all
          SourceFields: []
          TaskProperties:
            - Key: EXCLUDE_SOURCE_FIELDS_LIST
              Value: '[]'
          ConnectorOperator:
            Salesforce: NO_OP
      TriggerConfig:
        TriggerType: Scheduled
        TriggerProperties:
          DataPullMode: Complete
          ScheduleExpression: rate(1days)
          ScheduleOffset: 0
          ScheduleStartTime:
            Ref: ScheduleStartTime
```
`ScheduleExpression` states the recurrence rate of the flow (in my example, a daily recurrence).
The template can be deployed using `sam deploy`, and the flow needs to be activated once (using boto3, the AWS CLI, or a REST request).
You should create at least one flow manually to understand the different configurations. After your manual flow is set up, you can copy its configuration using tools such as boto3, the AWS CLI, or even the AWS Explorer in your favorite IDE.
Sign off, and links to AA
Jobs: https://www.assuredallies.com/careers/
Original Link: https://dev.to/manicqin/generating-appflows-flows-using-cloud-formations-templates-2bcm