Write "_SUCCESS" File when finish data exporting
We use MongoDB to store time-series data and export it incrementally, on a daily basis, via Data Federation to S3 as Parquet. The data is relatively large, and the export duration varies from day to day, so it is hard for downstream services to know when an export has completed. Sometimes a downstream service starts reading the Parquet files while MongoDB is still writing them, which causes partial reads. Conventionally, a big data job creates a flag file, such as _SUCCESS, to indicate that it has finished writing the dataset. This file serves as a marker that all tasks associated with the job completed successfully and that the data files in the directory are complete and consistent. Could you consider adding such a feature?
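For context, this is how a downstream consumer typically gates on such a marker (a minimal sketch using the @aws-sdk/client-s3 package; waitForSuccess, the bucket/prefix arguments, and the polling interval are illustrative assumptions, not part of our setup):

import { S3Client, HeadObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({}); // region/credentials come from the environment

// Poll S3 until the _SUCCESS marker exists; only then read the Parquet files.
async function waitForSuccess(bucket, prefix, intervalMs = 30000) {
  for (;;) {
    try {
      await s3.send(new HeadObjectCommand({
        Bucket: bucket,
        Key: `${prefix}/_SUCCESS`,
      }));
      return; // marker found: the export is complete and consistent
    } catch (err) {
      if (err.name !== "NotFound") throw err; // a real error, not "not there yet"
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
}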
For reference, here is how we currently submit the export job:

const outStage = {
  "$out": {
    "s3": {
      // awsS3Bucket and fileName are variables holding the target
      // bucket name and the output file path
      "bucket": awsS3Bucket,
      "filename": fileName,
      "format": {
        "name": "parquet",
        "maxFileSize": "1GB",
        "maxRowGroupSize": "128MB"
      }
    }
  }
};

const coll = db.collection(collName);

// Submit the export as a background aggregation; the call returns once the
// job is accepted, not when the Parquet files are fully written.
await coll.aggregate([
  matchStage,
  outStage
], { background: true }).toArray();

console.log("Job Submitted");
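In the meantime, a possible stop-gap on the client side is to run the export in the foreground and write the marker ourselves (a sketch, assuming that without { background: true } the awaited aggregation resolves only after $out finishes; exportPrefix is an illustrative name for the S3 prefix the export targets):

import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({}); // region/credentials come from the environment

// Foreground export: this await resolves only after $out has finished
// writing the Parquet files.
await coll.aggregate([matchStage, outStage]).toArray();

// Then drop an empty _SUCCESS marker next to the exported files.
await s3.send(new PutObjectCommand({
  Bucket: awsS3Bucket,              // same bucket the $out stage targets
  Key: `${exportPrefix}/_SUCCESS`,  // exportPrefix is hypothetical
  Body: "",
}));

The drawback is that the client must stay connected for the whole export, which is exactly what background: true is meant to avoid, hence the request for a server-side marker.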