TechMobius

Data for the Travel Industry

Web Data Automation Case Study

Client

A large independent producer of official statistics in the UK.

The Business Need

The primary objective of the project was to aggregate data from various travel agencies across two categories. For the holiday package category, covering subcategories such as cruises, city breaks, and foreign holidays, we delivered a dataset of approximately 3.6 million entries. For the airfares category, which includes subcategories such as domestic flights, European flights, and long-haul flights, we delivered a dataset of 9.9 million entries.

Challenges

Obtaining reliable, up-to-date data from retailers is difficult because website structures and data availability vary from site to site. Keeping the scraping process working as retailer websites, data structures, or access restrictions change is an ongoing challenge.

Contact us for a solutions demo:

    Our Solution

    We developed several key components that together formed a comprehensive, integrated solution to meet our client’s needs.

    • Rationalized Data Scraping: We implemented a centralized data aggregation framework that integrates data from multiple sources, namely the input websites supplied by the client. We then run a site analysis on each of these websites to check feasibility.
    • Automated Crawler Development: We developed Perl and Python scripts, along with a Selenium-based approach, to pull data from a multitude of websites, handling 60+ attributes in each of the different categories. This enabled a fully automated end-to-end workflow that processes millions of records daily, backed by a normalized database structure.
    • Efficient Data Automation: We enabled automatic notifications and reporting to confirm that the crawl and subsequent steps proceed as intended. To address coverage and fill-rate issues, we implemented automatic validations. We also automated the creation of JSON manifest files that provide output file metadata such as categories, file name, size, MD5 value, and S3 location for each monthly delivery.
    • Diligent Quality Checks and Quality Assessments: To ensure data correctness and dependability, we implemented comprehensive quality checks and validation processes. Full output validation was completed before the output was delivered to the client’s S3 location. Accuracy and coverage both exceeded 95%.
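
    The fill-rate validation and JSON manifest steps described above can be illustrated with a minimal Python sketch. It is not the production implementation: the function names, the manifest field names, the 0.95 threshold, and the S3 prefix are all assumptions made for illustration. It only shows how a per-attribute fill rate might be checked and how a manifest entry with category, file name, size, MD5 value, and S3 location might be built for one output file.

```python
import hashlib
import json
import os
import tempfile


def fill_rate(records, attribute):
    """Share of records (0.0 to 1.0) with a non-empty value for `attribute`."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(attribute) not in (None, ""))
    return filled / len(records)


def failing_attributes(records, attributes, threshold=0.95):
    """Attributes whose fill rate is below `threshold` (assumed value)."""
    return [a for a in attributes if fill_rate(records, a) < threshold]


def md5_of_file(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_entry(path, category, s3_prefix):
    """Metadata for one output file; the field names are hypothetical."""
    name = os.path.basename(path)
    return {
        "category": category,
        "file_name": name,
        "size_bytes": os.path.getsize(path),
        "md5": md5_of_file(path),
        "s3_location": f"{s3_prefix.rstrip('/')}/{name}",
    }


if __name__ == "__main__":
    # Toy records standing in for scraped rows; real rows carry 60+ attributes.
    rows = [{"price": 100, "airline": "X"}, {"price": "", "airline": "Y"}]
    print("Below threshold:", failing_attributes(rows, ["price", "airline"]))

    # Write a small sample file and describe it in a manifest.
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "airfares_sample.csv")
        with open(sample, "w", encoding="utf-8") as fh:
            fh.write("price,airline\n100,X\n")
        entry = manifest_entry(sample, "airfares", "s3://example-bucket/monthly/")
        print(json.dumps({"files": [entry]}, indent=2))
```

    In this sketch, any attribute flagged by failing_attributes would trigger a notification before delivery, and the manifest lets the recipient check each file's size and MD5 value after download.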

    Highlights

    • Enhanced Decision-making: The solution we implemented empowered our client to make data-driven decisions based on up-to-date insights, enabling them to identify market trends and growth opportunities.
    • Increased Efficiency: Our client experienced a 15% growth in output volume, improving their operational efficiency and allowing for more streamlined data management.
