The total cost of a crawl task depends on several factors, such as the number of items to crawl, the execution time per item, the amount of data stored, and whether proxies are used.
In the first example, the Google Search Engine is crawled with automated browsers. A total of 100k keywords are scraped, and the average execution time per keyword is 3.5 seconds. For each keyword, 100 KB of data is generated and stored in compressed form in the cloud. The total cost of this crawling task is computed as follows:
100000 * 3.5 (seconds per item) * 0.000186704 (CPU & memory price per second) = 65.35$
100000 * (100 / 1024²) (storage space per item in GB) * 0.09 (storage price per GB) = 0.86$
100000 * 0.00006 (fixed cost per storage operation) = 6.0$
---------------------------------------------------------------------------
Total Price = 65.35$ + 0.86$ + 6.0$ = 72.21$
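Read as a general formula, the calculation above appears to follow a simple linear model with three terms. The symbols are our own shorthand, not Scrapeulous notation: N is the number of items, t the average execution time per item in seconds, and s the stored data per item in GB. We assume 1 GB = 1024² KB, since that is what reproduces the 0.86$ storage figure.

```latex
C_{\text{total}} = \underbrace{N \, t \, p_{\text{cpu}}}_{\text{compute}}
  + \underbrace{N \, s \, p_{\text{storage}}}_{\text{storage}}
  + \underbrace{N \, p_{\text{op}}}_{\text{storage operations}},
\qquad p_{\text{cpu}} = 0.000186704\ \$/\text{s},\;
p_{\text{storage}} = 0.09\ \$/\text{GB},\;
p_{\text{op}} = 0.00006\ \$

C_{\text{total}} = 100\,000 \cdot 3.5 \cdot 0.000186704
  + 100\,000 \cdot \frac{100}{1024^2} \cdot 0.09
  + 100\,000 \cdot 0.00006
  \approx 65.35 + 0.86 + 6.00 = 72.21\ \$
```

Note that the stated total is the sum of the individually rounded terms; summing the unrounded terms (65.3464 + 0.8583 + 6) would give 72.20$.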
In the second example, 1M URLs on the internet are crawled with plain HTTP requests instead of browsers. The average crawling time is 750 ms per URL, and the average storage space is 500 KB of data per URL. The total cost is computed as follows:
1000000 * 0.75 (seconds per item) * 0.000186704 (CPU & memory price per second) = 140.03$
1000000 * (500 / 1024²) (storage space per item in GB) * 0.09 (storage price per GB) = 42.92$
1000000 * 0.00006 (fixed cost per storage operation) = 60.0$
---------------------------------------------------------------------------
Total Price = 140.03$ + 42.92$ + 60.0$ = 242.95$
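The same arithmetic can be scripted to estimate crawls of other sizes. The following is a minimal Python sketch, not Scrapeulous' actual billing code: the function name estimate_crawl_cost is hypothetical, the prices are copied from the two examples, and it assumes 1 GB = 1024² KB and that each cost component is rounded to cents before summing, which is what reproduces both totals shown.

```python
# Hypothetical cost estimator mirroring the two pricing examples.
# Prices are taken from the examples; units are assumptions noted inline.

CPU_MEMORY_PRICE_PER_SECOND = 0.000186704  # $ per second of execution time
STORAGE_PRICE_PER_GB = 0.09                # $ per GB of stored data
STORAGE_OPERATION_PRICE = 0.00006          # $ fixed cost per stored item

KB_PER_GB = 1024 ** 2  # assumption: binary units reproduce 0.86$ and 42.92$


def estimate_crawl_cost(num_items, seconds_per_item, kb_per_item):
    """Return (compute, storage, operations, total) in dollars.

    Each component is rounded to cents first and the total is the sum of
    the rounded components, matching how the examples present their totals.
    """
    compute = num_items * seconds_per_item * CPU_MEMORY_PRICE_PER_SECOND
    storage_gb = num_items * kb_per_item / KB_PER_GB
    storage = storage_gb * STORAGE_PRICE_PER_GB
    operations = num_items * STORAGE_OPERATION_PRICE
    parts = [round(compute, 2), round(storage, 2), round(operations, 2)]
    return (*parts, round(sum(parts), 2))


if __name__ == "__main__":
    # First example: 100k keywords, 3.5 s and 100 KB per keyword.
    print(estimate_crawl_cost(100_000, 3.5, 100))
    # Second example: 1M URLs, 0.75 s and 500 KB per URL.
    print(estimate_crawl_cost(1_000_000, 0.75, 500))
```

For instance, estimate_crawl_cost(1_000_000, 0.75, 500) returns (140.03, 42.92, 60.0, 242.95), matching the second example. Returning the components separately makes it easy to see that compute time dominates the browser-based example, while storage and storage operations carry more weight in the plain-HTTP example.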
By default, our backend uses datacenter IP addresses for all crawling tasks. However, some websites and protection mechanisms can only be bypassed with dedicated proxies. Using proxies increases crawling costs twofold:
© All rights Reversed 2021- Scrapeulous.com - Built with love and German Engineering