Price Monitoring Tool

The Business Need

A leading Australian marketplace organization wished to monitor…
Mobius developed a complete automated data aggregation system using Perl, Python, Azure Data Factory and Azure SQL. As a first step, bots developed in Perl and Python crawl the list-page URLs of the source websites on a daily schedule. The extracted output is converted into flat files and placed in Azure Blob Storage. We built Azure Data Factory pipelines to load these flat files from Blob Storage into an Azure SQL database. Stored procedures in Azure SQL then process this input, identify the delta (new inputs) and stage it for product information collection.
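The load-and-delta step can be illustrated with a minimal sketch. It is not Mobius's implementation: Python's built-in `sqlite3` stands in for Azure SQL, a CSV reader stands in for the Azure Data Factory copy from Blob Storage, and "delta" is assumed to mean product URLs not seen in earlier runs. The table names `list_page_staging` and `product_inputs` and the `product_url` column are hypothetical.

```python
import csv
import io
import sqlite3
from typing import List

# Hypothetical schema; sqlite3 stands in for Azure SQL in this sketch.
SCHEMA = """
CREATE TABLE IF NOT EXISTS list_page_staging (product_url TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS product_inputs (product_url TEXT PRIMARY KEY);
"""


def load_flat_file(conn: sqlite3.Connection, csv_text: str) -> None:
    """Load a crawler flat file (CSV with a product_url header) into staging.

    Stand-in for the Azure Data Factory copy from Blob Storage to Azure SQL.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO list_page_staging (product_url) VALUES (?)",
            [(row["product_url"],) for row in rows],
        )


def stage_delta(conn: sqlite3.Connection) -> List[str]:
    """Move only previously unseen URLs into product_inputs and return them.

    Plays the role of the stored procedure: an anti-join against earlier inputs.
    """
    with conn:
        new_urls = [
            url
            for (url,) in conn.execute(
                """
                SELECT DISTINCT s.product_url
                FROM list_page_staging AS s
                WHERE NOT EXISTS (
                    SELECT 1 FROM product_inputs AS p
                    WHERE p.product_url = s.product_url
                )
                ORDER BY s.product_url
                """
            )
        ]
        conn.executemany(
            "INSERT INTO product_inputs (product_url) VALUES (?)",
            [(url,) for url in new_urls],
        )
        conn.execute("DELETE FROM list_page_staging")  # staging is cleared per run
    return new_urls


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
load_flat_file(conn, "product_url\nhttps://example.com/p/1\nhttps://example.com/p/2\nhttps://example.com/p/1\n")
print(stage_delta(conn))  # day 1: ['https://example.com/p/1', 'https://example.com/p/2']
load_flat_file(conn, "product_url\nhttps://example.com/p/2\nhttps://example.com/p/3\n")
print(stage_delta(conn))  # day 2: ['https://example.com/p/3']
```

Inside Azure SQL the same comparison would typically be written as a stored procedure using `NOT EXISTS` or `MERGE`; either way the check stays set-based rather than looping over rows.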
The crawl bots then consume this product information collection input from the blob, extract the product information, convert the output into flat files and write them back to the blob under a different path. Scheduled pipelines load these product information files into the database, where they are standardized and post-processed according to the given set of rules. This approach is used for all categories and subcategories, and we maintain separate pipelines and Azure SQL databases based on the defined schema attributes.
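A rule-driven standardization step might look like the following sketch. The rules themselves (whitespace cleanup, title-casing brands, stripping currency symbols from prices, mapping availability text to a boolean) and the attribute names are hypothetical examples; the actual client-defined rule set is not described here.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Callable, Dict, Optional

# Hypothetical mapping from scraped availability text to a boolean flag.
AVAILABILITY = {"in stock": True, "available": True, "out of stock": False, "sold out": False}


def clean_text(value: str) -> str:
    """Collapse runs of whitespace and trim both ends."""
    return re.sub(r"\s+", " ", value).strip()


def standardize_price(value: str) -> Optional[Decimal]:
    """Keep digits and the decimal point, e.g. '£1,299.00' -> Decimal('1299.00').

    Assumes '.' is the decimal separator; returns None if nothing parseable remains.
    """
    try:
        return Decimal(re.sub(r"[^\d.]", "", value))
    except InvalidOperation:
        return None


def standardize_availability(value: str) -> Optional[bool]:
    """Map known availability phrases to True/False; unknown phrases become None."""
    return AVAILABILITY.get(clean_text(value).lower())


# One rule per attribute; attributes without a rule pass through unchanged.
RULES: Dict[str, Callable[[str], object]] = {
    "title": clean_text,
    "brand": lambda v: clean_text(v).title(),
    "price": standardize_price,
    "availability": standardize_availability,
}


def standardize(record: Dict[str, str]) -> Dict[str, object]:
    """Apply the matching rule to each attribute of a scraped record."""
    return {k: RULES[k](v) if k in RULES else v for k, v in record.items()}


print(standardize({"title": "  Acme   Kettle 1.7L ", "brand": "ACME",
                   "price": "£1,299.00", "availability": "In Stock", "sku": "K-17"}))
```

Keeping the rules in a per-attribute table means each category pipeline could supply its own table for its own schema attributes without changing the `standardize` function.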
At the end of each month, the daily collected output is validated and sent for a quality audit against client-defined percentage thresholds at the attribute level. Based on the internal and quality audit review comments, fixes are applied to the data, which is finally delivered to the customer as flat files.
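The attribute-level check against client-defined percentages can be sketched as follows, assuming each audited record is marked correct or incorrect per attribute. The attribute names, audit sample and threshold values are hypothetical.

```python
from typing import Dict, List


def attribute_accuracy(audit: List[Dict[str, bool]]) -> Dict[str, float]:
    """Percentage of audited records in which each attribute was judged correct."""
    checked: Dict[str, int] = {}
    correct: Dict[str, int] = {}
    for record in audit:
        for attr, ok in record.items():
            checked[attr] = checked.get(attr, 0) + 1
            correct[attr] = correct.get(attr, 0) + (1 if ok else 0)
    return {attr: 100.0 * correct[attr] / checked[attr] for attr in checked}


def failing_attributes(audit: List[Dict[str, bool]], thresholds: Dict[str, float]) -> Dict[str, float]:
    """Attributes whose accuracy falls below the client-defined percentage."""
    accuracy = attribute_accuracy(audit)
    return {attr: accuracy[attr] for attr, minimum in thresholds.items()
            if attr in accuracy and accuracy[attr] < minimum}


# Hypothetical audit sample and thresholds.
sample = [
    {"price": True, "title": True},
    {"price": True, "title": False},
    {"price": False, "title": True},
    {"price": True, "title": True},
]
print(failing_attributes(sample, {"price": 98.0, "title": 70.0}))  # {'price': 75.0}
```

Attributes flagged this way would be the candidates for correction before the monthly flat-file delivery.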
The solution delivered category-wise datasets across various e-commerce sources in the UK region, improving operational efficiency at the client's end. The additional data quality activities enriched the data so that the client can readily consume it for analytics.
Scalable Azure services on a pay-as-you-go basis
Azure SQL can be scaled up or down as required, which helps with cost management
Data in Blob Storage can be archived and retained for as many years as the client requires, so no separate archive mechanism is needed
Highly scalable model
Improved operational efficiency and analytics-ready data delivery