Optimize eCommerce Bots With Proxy Manipulations

Guide to optimizing eCommerce bots with simple proxy manipulations
12 Min
Advanced
30-Aug-2018

Agenda

  • Proxy optimization for eCommerce product page crawling
  • Reducing bandwidth costs
  • Success rate and the logs viewer
  • Optimizing success rate with proxy rules
  • Working with the API: adding a new proxy port and viewing link redirects
  • Setting up a multi-session browser with the Proxy Manager
  • Finding troubleshooting information and help

Don’t want to watch the webinar? Read it instead.

The Proxy Manager is a way to control the communication between your bot, the super proxies, and peer IPs.

The Proxy Manager is open source and includes crawling features designed to optimize your success rate. I will discuss a few of these features in this webinar.

How does it work?
When your bot or browser sends a request to the proxy network, the Proxy Manager sets the optimal super proxy and peer IP configuration for processing that request on its way to the destination website and back to you.
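
From the bot’s side, each Proxy Manager port behaves like an ordinary HTTP proxy on your own machine. A minimal sketch using only Python’s standard library, assuming the Proxy Manager runs locally and has a port 24000 (a hypothetical example; substitute a port you created):

```python
import urllib.request

# Assumption: the Proxy Manager runs on this machine and 24000 is one of its
# proxy ports (hypothetical; use a port you created).
PROXY_HOST = "127.0.0.1"
PROXY_PORT = 24000


def build_proxy_opener(host: str = PROXY_HOST, port: int = PROXY_PORT) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS requests through one proxy port."""
    proxy_url = f"http://{host}:{port}"
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download a page via the proxy port (needs a running Proxy Manager)."""
    opener = build_proxy_opener()
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

Every request sent through this opener reaches the chosen port, so whatever that port is configured to do (targeting, rules, logging) applies to it.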

The Proxy Manager can be downloaded from the Bright Data dashboard by clicking the Proxy Manager tab on the left.
It is available for Windows, Mac, and Linux, and as a Docker image.
After installation, sign in with your Bright Data account username and password.
Once you are logged in, go to ‘General settings’ on the left menu bar to enable your logs.
Click ‘Enable logs for’ and choose how many logs to keep; here I choose 1,000, which keeps 1,000 logs for each proxy port.
To see logs of HTTPS requests, go to the Overview tab, download the SSL certificate from the ‘here’ link, and follow the installation instructions. Your Proxy Manager is now good to go.
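
Logging HTTPS requests means the Proxy Manager has to decrypt them, so a bot that verifies certificates must also trust the certificate you downloaded. A minimal sketch, assuming the bot uses Python’s standard library and the certificate was saved to a file (the path `ca.crt` and port 24000 below are placeholders):

```python
import ssl
import urllib.request
from typing import Optional


def build_ssl_proxy_opener(proxy_url: str, cafile: Optional[str] = None) -> urllib.request.OpenerDirector:
    """Route traffic through proxy_url and trust the extra CA in cafile, if given."""
    # create_default_context() loads the system CAs and keeps hostname checks on.
    context = ssl.create_default_context()
    if cafile is not None:
        # Add the downloaded Proxy Manager certificate to the trusted set.
        context.load_verify_locations(cafile=cafile)
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url}),
        urllib.request.HTTPSHandler(context=context),
    )


# Hypothetical usage with placeholder port and path:
# opener = build_ssl_proxy_opener("http://127.0.0.1:24000", cafile="ca.crt")
```

Without this, HTTPS requests through a port whose traffic the Proxy Manager inspects would fail certificate verification in the bot. Adding the extra CA keeps verification on, which is safer than disabling it.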

In the Proxy Manager dashboard, click ‘New proxy’ and select your residential zone.
When scraping data from product pages on shopping websites, select ‘Online shopping’ under Preset configuration.
This sets up an optimized proxy configuration for collecting content from product pages.

This setup includes remote DNS resolution by the peer, a new user-agent for each request, example post-processing rules, and SSL analysis so you can see request details in the logs.
This setup usually achieves a 98%-100% success rate for product pages. Click Save, and you’ll find the new proxy port at the end of the list.

Let’s review the ‘Online shopping’ port I just created by selecting its port number.
First, I’ll rename it: in the General tab, I type ‘AMZ kids’ in the Internal name field.

Under the Targeting tab, I choose the specific city I want to use; in this example, Orlando, Florida.
Accessing product pages from the same city helps ensure you get correct, location-specific information, and accessing your eCommerce accounts from the same city also leads to a higher success rate.

Under the Rules tab, the ‘Online shopping’ preset creates a rule that extracts the product page’s title, price, and description list.
The rule is triggered by URL: as the trigger I choose the target domain, and under Action type I choose ‘Process data’.

Under ‘Processing rule’ there is an example that extracts the three fields we need from the product page. If you are using the API, these fields are sent to you in JSON format.
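
The preset’s processing rule is configured inside the Proxy Manager, and its syntax is not reproduced here. To illustrate the kind of work it does, here is a hypothetical client-side stand-in in Python: it extracts a title, price, and description list from product-page HTML and returns them as JSON. The element ids `productTitle`, `price`, and `description` are assumptions for illustration; real pages use their own markup, and the parser expects reasonably well-formed HTML.

```python
import json
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

# Hypothetical element ids mapped to output field names; adjust to the
# markup of the site you are scraping.
FIELD_IDS = {"productTitle": "title", "price": "price", "description": "description"}
# Tags that never get a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ProductParser(HTMLParser):
    """Collect text inside elements whose id appears in FIELD_IDS."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: Dict[str, List[str]] = {name: [] for name in FIELD_IDS.values()}
        # One (tag, field name or None) entry per currently open element.
        self._stack: List[Tuple[str, Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        field = FIELD_IDS.get(dict(attrs).get("id") or "")
        if field is None and self._stack:
            field = self._stack[-1][1]  # nested tags inherit the enclosing field
        self._stack.append((tag, field))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack and self._stack[-1][1] is not None:
            self.fields[self._stack[-1][1]].append(text)


def extract_product(html: str) -> str:
    """Return the title, price, and description list as a JSON string."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    fields = parser.fields
    return json.dumps({
        "title": " ".join(fields["title"]),
        "price": " ".join(fields["price"]),
        "description": fields["description"],
    })
```

With the processing rule enabled in the Proxy Manager, this extraction is done for you; a parser like this is only needed if you post-process raw pages in your own code.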

You can update these rules and settings at any time.

By selecting ‘Proxy tester’ on the left menu bar, I can test my new setup.
The proxy tester sends a request through the selected port, in our case the port named ‘AMZ kids’. I’ll test it with Bright Data’s demo product page by clicking Go.
The Header tab shows the response headers, and the Response tab shows the response with the predefined title, price, and description fields.

Now I’ll do the same with another page: I’ll copy its product page URL and try it out.
As you can see in the headers, we got a successful response, and in the Response tab I can find the predefined fields.

Let’s go back to the ‘AMZ kids’ proxy port. In the ‘Logs’ tab, the transaction log shows that my requests have succeeded; selecting one of them shows the request details.
If I start getting 403 error codes, I can set up the Proxy Manager to refresh the IP.
To do this, go to the Rules tab and click ‘New rule’.

For this example, I select status code 403 (the same works for any other status code), and for ‘Action type’ I select ‘Refresh IP’.
This refreshes my IP every time I get blocked.
If I want to be notified every time I get blocked and the IP is refreshed, I can select Yes under ‘Send email’ and enter the recipient in the ‘Email address’ field.
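
With this rule in place, the Proxy Manager refreshes the IP for you. The Python sketch below only illustrates the same retry logic from the bot’s side; `fetch` and `refresh_ip` are hypothetical callables you would supply (for example, `refresh_ip` could trigger a session refresh through the API), and the three-attempt limit is an arbitrary choice.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical client-side analogue of a "status code 403 -> Refresh IP" rule.
# fetch(url) returns (status_code, body); refresh_ip() obtains a new IP.


def fetch_with_refresh(
    url: str,
    fetch: Callable[[str], Tuple[int, bytes]],
    refresh_ip: Callable[[], None],
    retry_statuses: Iterable[int] = (403,),
    max_attempts: int = 3,
) -> Tuple[int, bytes]:
    """Fetch url, refreshing the IP and retrying after each blocked response."""
    blocked = set(retry_statuses)
    status, body = fetch(url)
    for _ in range(max_attempts - 1):
        if status not in blocked:
            break
        refresh_ip()  # get a new IP before trying again
        status, body = fetch(url)
    return status, body
```

Capping the attempts matters: if every IP is blocked, an unbounded retry loop would never finish.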

Next, I’ll show you how to reduce the amount of data used for each request. The most efficient way to do this is to block pictures and videos.
We do this by adding a new rule and setting the Rule type to URL.
In the Regex field, I specify the file formats I want to remove from my responses.

Under ‘Action type’, I select ‘Null response’, so requests for matching files get an empty response instead of downloading the file.
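
The exact regex depends on which formats you want to drop. As a hypothetical example, the pattern below matches common image and video extensions at the end of a URL, optionally followed by a query string; the Python snippet lets you check which URLs it would catch before pasting it into the Regex field. The extension list is an assumption, and the Proxy Manager’s regex flavor and case sensitivity may differ from Python’s `re`.

```python
import re

# Hypothetical pattern: common image/video extensions at the end of the URL
# path, optionally followed by a query string. Matched case-insensitively.
MEDIA_URL = re.compile(
    r"\.(?:jpe?g|png|gif|webp|svg|ico|mp4|webm|avi|mov)(?:\?.*)?$",
    re.IGNORECASE,
)


def is_media(url: str) -> bool:
    """True if a URL rule using MEDIA_URL would match this URL."""
    return MEDIA_URL.search(url) is not None


for url in [
    "https://example.com/images/bike.JPG",            # True
    "https://example.com/video/demo.mp4?autoplay=1",  # True
    "https://example.com/product/kids-bike",          # False
]:
    print(url, is_media(url))
```
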

If you want to work with the Proxy Manager via its API, select API on the left menu bar.

The Swagger page shows how to create, update, or delete a proxy port.
In the request, you just need to specify the proxy port number, such as 24028, and the desired configuration.
You can also refresh sessions via the API, and use the link tester to get all the redirects a URL goes through.
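
A hypothetical Python sketch of driving the API from a script. The admin address `http://127.0.0.1:22999`, the endpoint paths (`/api/proxies`, `/api/refresh_sessions/<port>`), and the `{"proxy": {...}}` payload shape are assumptions to verify against the Swagger page of your own Proxy Manager; the zone name is a placeholder. The helpers only build requests, and `send` performs them.

```python
import json
import urllib.request
from typing import Any, Dict, Optional, Tuple

# Assumptions to verify in your Proxy Manager's Swagger page: the admin API
# address and the endpoint paths and payload shape used below.
API_BASE = "http://127.0.0.1:22999/api"


def _api_request(method: str, path: str, payload: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Build a JSON request against the (assumed) Proxy Manager API."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(API_BASE + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def create_port(port: int, zone: str) -> urllib.request.Request:
    return _api_request("POST", "/proxies", {"proxy": {"port": port, "zone": zone}})


def update_port(port: int, changes: Dict[str, Any]) -> urllib.request.Request:
    return _api_request("PUT", f"/proxies/{port}", {"proxy": changes})


def delete_port(port: int) -> urllib.request.Request:
    return _api_request("DELETE", f"/proxies/{port}")


def refresh_sessions(port: int) -> urllib.request.Request:
    return _api_request("POST", f"/refresh_sessions/{port}")


def send(req: urllib.request.Request) -> Tuple[int, bytes]:
    """Perform a request (requires a running Proxy Manager)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status, resp.read()
```

For example, `send(create_port(24028, "my_residential_zone"))` would ask for a new port 24028 in a zone named `my_residential_zone` (a placeholder). Building requests separately from sending them makes it easy to inspect or log a call before it reaches the Proxy Manager.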

There are a few ways to use the Insomniac browser with the Proxy Manager.
You can have all browser tabs use the same proxy port, or set up each tab to use a different port and configuration.
The second option requires adding the Proxy Per Tab plugin to Insomniac.
Open the plugin, choose ‘Rotate through proxies in order’, and click ‘Manage Proxy list’. In the proxy list, you can add a single proxy by typing 127.0.0.1 (the local address of the Proxy Manager when it runs on the same machine as the browser) in the ‘Host’ field and the port number in the ‘Port’ field.
The username and password can be found in the Proxy Manager under the General tab, or on the Zones page of your Bright Data dashboard.
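
‘Rotate through proxies in order’ presumably assigns proxies round-robin. If you drive several browser sessions or workers from your own code instead of through the plugin, the same idea can be sketched in Python; the port numbers and tab labels below are hypothetical.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def proxy_rotation(host: str, ports: List[int]) -> Iterator[str]:
    """Yield proxy URLs for the given ports in a fixed, repeating order."""
    if not ports:
        raise ValueError("at least one port is required")
    return cycle([f"http://{host}:{port}" for port in ports])


def assign_tabs(tabs: List[str], host: str, ports: List[int]) -> Dict[str, str]:
    """Give each tab (or worker) the next proxy in the rotation."""
    rotation = proxy_rotation(host, ports)
    return {tab: next(rotation) for tab in tabs}


# Hypothetical ports and tab labels; substitute your own.
print(assign_tabs(["tab1", "tab2", "tab3"], "127.0.0.1", [24000, 24001]))
```

With more tabs than ports, the rotation wraps around, so the third tab here reuses the first port.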

If I want to create proxy ports in bulk in the Proxy Manager for use in my browser, I can open my last port or create a new one.
Once I have the port I want to duplicate, I go to the General tab and set ‘Multiply proxy port’ to 10, which creates a total of 10 identical ports.

Now I can create an Excel sheet with its columns in the same order Insomniac uses, so I can copy and paste the port numbers, host, username, and password.
Now I am ready to upload this to Insomniac.
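
Instead of typing the sheet by hand, you could generate it as a CSV file that Excel opens directly. A minimal sketch, assuming the multiplied ports are numbered consecutively and that Insomniac expects the columns in the order listed above (port, host, username, password); the starting port and credentials are placeholders.

```python
import csv
import io
from typing import List

# Assumed column order; reorder COLUMNS if your import sheet differs.
COLUMNS = ["port", "host", "username", "password"]


def proxy_sheet(ports: List[int], username: str, password: str, host: str = "127.0.0.1") -> str:
    """Return CSV text with a header row and one row per proxy port."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for port in ports:
        writer.writerow({"port": port, "host": host, "username": username, "password": password})
    return buf.getvalue()


# Hypothetical values: 10 ports starting at 24028 and placeholder credentials.
print(proxy_sheet(list(range(24028, 24038)), "USERNAME", "PASSWORD"))
```
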

In the Proxy Manager’s left menu, you will find an FAQ button.
It contains explanations of the many ways to optimize your data collection, along with answers to common questions.

I hope you enjoyed this webinar. Please feel free to contact us with any questions you may have.
