TechMobius

Data for the Hospitality Industry

Web Data Automation Case Study

Client

A leading hospitality data intelligence provider in Italy.

The Business Need

Our client aims to create a business directory of hotels and restaurants in Italy, complete with images. We were tasked with aggregating data on the hotels and restaurants listed on one of the prominent online travel information and booking websites. We were also asked to collect the first eight images of each restaurant, resize them, and apply basic filters to them.

Challenges

Maintaining ID mapping for previously delivered records is challenging because minor changes in hotel or restaurant names or addresses make it difficult to link new data accurately with existing entries. Identifying the correct menu images is also complex because of varying formats and frequent updates. Both problems call for advanced algorithms: fuzzy matching to ensure accurate data mapping, and image recognition to ensure reliable menu image identification in the business directory.
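
To illustrate the ID-mapping challenge, here is a minimal sketch of fuzzy matching using Python's standard `difflib` module. It is not the matching logic used in the project; the record fields (`id`, `name`, `address`), the 60/40 weighting, and the 0.85 threshold are all assumptions chosen for illustration.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_existing_id(new_record: dict, delivered: list, threshold: float = 0.85):
    """Return the ID of the best-matching delivered record, or None.

    The weights and threshold are illustrative assumptions and would
    need tuning against real data.
    """
    best_id, best_score = None, 0.0
    for rec in delivered:
        score = (0.6 * similarity(new_record["name"], rec["name"])
                 + 0.4 * similarity(new_record["address"], rec["address"]))
        if score > best_score:
            best_id, best_score = rec["id"], score
    return best_id if best_score >= threshold else None
```

A new record whose name changed from "Hotel Bella Vista" to "Hotel Bellavista" would still resolve to the previously delivered ID, while a listing with no close counterpart returns `None` and can be assigned a fresh ID.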

Contact us for a solutions demo:

    Our Solution

    • Scrape Hotel/Restaurant data from websites: We used web scraping tools and scripts to automatically extract hotel and restaurant data from the online travel information and booking website. This data could include the name, address, contact details, ratings, reviews, and other relevant details for each hotel and restaurant listed on the website.
    • Scrape & download the image URLs: We employed automation to download the images associated with each hotel and restaurant listing. This can be done using web scraping or APIs to fetch images automatically and save them in the appropriate directories.
    • Automated Image processing: We removed images containing text for clarity and consistency. Some images were flipped horizontally for visual diversity, random effects were added, and simple scaling was applied to ensure consistent dimensions. Images with identifiable persons were deleted for privacy and legal compliance. In addition, images were anonymized by renaming them with random strings.
    • Map the IOL category code to the primary category keywords: Our client has a predefined category structure for organizing and classifying businesses in their directory. In this step, the scraped hotel and restaurant data is mapped automatically to the appropriate categories based on the client's internal category code system.
    • Data Normalization & Validation: We developed automated scripts to normalize and validate the scraped data. Data formatting and deduplication were done to provide consistent and accurate information.
    • Automated Delivery: Once the data is processed and validated, the final product is delivered automatically in CSV or JSON format. This might involve automatically updating the database or content management system with the new listings and images.
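
The normalization, category mapping, and delivery steps above can be sketched roughly as follows. This is an illustrative outline using only the Python standard library, not the scripts built for the client. The field names, the `CATEGORY_CODES` table, and the `UNMAPPED` fallback are hypothetical, since the case study does not disclose the client's actual category codes or schema.

```python
import csv
import io
import json

# Hypothetical keyword-to-code table; the real category codes are not public.
CATEGORY_CODES = {"hotel": "CAT-HOTEL", "restaurant": "CAT-REST", "pizzeria": "CAT-REST"}
FIELDS = ["name", "address", "category", "category_code"]


def normalize_record(raw: dict) -> dict:
    """Collapse stray whitespace and lowercase the category keyword."""
    return {
        "name": " ".join(raw.get("name", "").split()),
        "address": " ".join(raw.get("address", "").split()),
        "category": raw.get("category", "").strip().lower(),
    }


def is_valid(rec: dict) -> bool:
    """A record needs at least a name and an address to be delivered."""
    return bool(rec["name"] and rec["address"])


def deduplicate(records: list) -> list:
    """Keep the first record for each case-insensitive (name, address) pair."""
    seen, unique = set(), []
    for rec in records:
        key = (rec["name"].casefold(), rec["address"].casefold())
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def map_category(rec: dict) -> dict:
    """Attach a category code based on the primary category keyword."""
    return {**rec, "category_code": CATEGORY_CODES.get(rec["category"], "UNMAPPED")}


def process(raw_records: list) -> list:
    """Normalize, validate, deduplicate, then map categories."""
    cleaned = [normalize_record(r) for r in raw_records]
    valid = [r for r in cleaned if is_valid(r)]
    return [map_category(r) for r in deduplicate(valid)]


def to_json(records: list) -> str:
    """Serialize records as JSON, keeping accented characters readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_csv(records: list) -> str:
    """Serialize records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Calling `to_csv(process(raw_records))` or `to_json(process(raw_records))` yields a delivery-ready payload; writing it to a file or pushing it to a database would be a separate step.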

    Highlights

    • Improved Data Accuracy: The data accuracy rate increased to 95% after implementing data cleansing and standardization algorithms. This means that the large majority of listings now contain accurate and reliable information, enhancing the user experience and trust in the platform.
    • Streamlined Operations and Efficiency: With web scraping and automation techniques in place, data collection time is reduced by 80%. Each listing now takes only 6 minutes, for a total data collection time of 6,000 minutes (100 hours).
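
As a quick consistency check on the efficiency figures, the short sketch below works through the arithmetic. The listing count and the pre-automation time per listing are not stated in the case study; they are derived here, and the baseline assumes the 80% reduction applies to the time spent per listing.

```python
def collection_summary(minutes_per_listing: int, total_minutes: int, reduction_pct: int) -> dict:
    """Derive the implied figures from the numbers quoted in the highlights."""
    listings = total_minutes // minutes_per_listing  # implied number of listings
    total_hours = total_minutes / 60                 # total time in hours
    # Assumption: the reduction applies to per-listing time.
    baseline = minutes_per_listing * 100 / (100 - reduction_pct)
    return {
        "listings": listings,
        "total_hours": total_hours,
        "baseline_minutes_per_listing": baseline,
    }


summary = collection_summary(minutes_per_listing=6, total_minutes=6000, reduction_pct=80)
print(summary)
```

Under that assumption, the quoted numbers imply about 1,000 listings and a baseline of roughly 30 minutes per listing before automation.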
