See the results

Don't want to know how I got the data and would rather just look at its visualization?

Have a look here!

Introduction

Imagine you work in the adult industry and own several adult websites. Website owners often want to specialize and offer a product in a niche to increase revenue. To succeed in that niche, you need to know which product is most in demand.

Now think about the following hypothetical scenario: you want to create a new affiliate partnership with big porn sites, but you do not know which partner best fits your criteria. You have a list of all your adult models and you want to know which affiliate site ranks at the top for each model on the biggest search engines.

Acquire the data

Because your list is very long and you don't want to type every entry into the search engines by hand, you need a program that automates the task. Luckily, such a program already exists: it's called GoogleScraper. Even better, it's completely open source!

  • First you need to decide how many search engine result pages you want to scrape for every keyword. We will choose 2 pages per keyword!

  • Then to diversify our results, we will scrape our keywords on three different search engines: Google, Bing and Yahoo.

  • Next you need to create a text file with all the model names that you want to search for. In my case, I just took the pornstar names from an IMDb list.

  • Finally, because we want our results to come from the same country (using proxies from different countries would mix the results and hurt their validity), we will use 6 US proxies.

Because we want to find porn tube sites, we append the word tube to every keyword.

Our list:

Jenna Jameson tube
Asia Carrera tube
Tori Black tube
Sunny Leone tube
Audrey Bitoni tube
Jayden Jaymes tube
Belladonna tube
Gianna Michaels tube
Sasha Grey tube
Jenna Haze tube
Katie Morgan tube
Bree Olson tube
Jesse Jane tube
Tera Patrick tube
Kayden Kross tube
Aletta Ocean tube
Alexis Texas tube
Kagney Linn Karter tube
Lexi Belle tube
Riley Steele tube
Eva Angelina tube
Nikki Benz tube
Jessica Drake tube
Cassandra Cruz tube
Lisa Ann tube
Lily Carter tube
India Summer tube
Skin Diamond tube
Asa Akira tube
Céline Tran tube
Kristina Rose tube
Stoya tube
Shyla Stylez tube
Veronica Rayne tube
Audrey Hollander tube
Jenaveve Jolie tube
Faye Reagan tube
Kobe Tai tube
Heather Hunter tube
Lynn LeMay tube
Daniella Rush tube
Nina Hartley tube
Chasey Lain tube
Janine Lindemulder tube
Amber Lynn tube
Stephanie Swift tube
Julia Taylor tube
Nikki Tyler tube
Olivia Del Rio tube
Katja Kean tube
Lelu Love tube
Annette Schwarz tube
Tanner Mayes tube
Allyssa Hall tube
Brooklyn Lee tube
Samantha Saint tube
Silvia Saint tube
Jessie Andrews tube
Tasha Reign tube
Nikki Dial tube
Mercedez tube
Annette Haven tube
Havana Ginger tube
Deidre Holland tube
Catalina Cruz tube
Adriana Sage tube
Tia Bella tube
Angel Kelly tube
Paola Rey tube
Lupe Fuentes tube
Anetta Keys tube
Luci Thai tube
Rebeca Linares tube
Evelyn Lin tube
Taylor Hayes tube
Carmen Luvana tube
Teagan Presley tube
Sophia Santi tube
Dark Angel tube
Aria Giovanni tube
Lela Star tube
Bella Moretti tube
Lacey DuValle tube
Isis Taylor tube
Ashlynn Brooke tube
Maria Ozawa tube
Lolo Ferrari tube
Linda Lovelace tube
Bambi Woods tube
Traci Lords tube
Kylie Ireland tube
Aurora Snow tube
Hillary Scott tube
Amber Rayne tube
Misty Stone tube
Kimberly Kane tube
Naomi Banxx tube
Madison Ivy tube
Haley Cummings tube
Gia Paloma tube
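
A keyword file like the one above does not have to be typed by hand. The following is a minimal sketch, assuming the plain names are already available as a list of strings; the function names build_keywords and write_keyword_file are made up for this illustration and are not part of GoogleScraper.

```python
from pathlib import Path

def build_keywords(names, suffix="tube"):
    """Return one keyword per non-empty name, with the suffix appended."""
    return [f"{name.strip()} {suffix}" for name in names if name.strip()]

def write_keyword_file(names, path, suffix="tube"):
    """Write the keywords to `path`, one per line, encoded as UTF-8."""
    keywords = build_keywords(names, suffix)
    Path(path).write_text("\n".join(keywords) + "\n", encoding="utf-8")
    return keywords

# Blank entries are skipped and non-ASCII names are kept as they are.
sample = ["Jenna Jameson", "  ", "Céline Tran"]
print(build_keywords(sample))
```

Writing the file as UTF-8 matters here because at least one name in the list contains a non-ASCII character.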

Now that we have our configuration, we can install GoogleScraper. You can do so by reading How to install GoogleScraper.

Then we open a command window and enter the following into it:

GoogleScraper -m selenium --keyword-file path/to/our/keywordfile.txt --proxy-file path/to/our/proxyfile.txt -s 'bing,google,yahoo' -p 2 -z 5

Of course, path/to/our/keywordfile.txt needs to be a valid keyword file (just paste the contents above) and path/to/our/proxyfile.txt needs to be a valid proxy file.

Then just wait. We make 600 requests altogether. This is much more than we could do manually; it would take hours.

Why 600 requests?

100 keywords, two pages for each keyword, on 3 search engines: 100 * 2 * 3 = 600
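
The same arithmetic as a tiny sketch, which is handy when the keyword list or the number of search engines changes. The function name total_requests is hypothetical and only serves this example.

```python
def total_requests(num_keywords, pages_per_keyword, num_engines):
    """Every keyword is fetched once per result page on every search engine."""
    return num_keywords * pages_per_keyword * num_engines

# The numbers used in this post: 100 keywords, 2 pages, 3 search engines.
print(total_requests(100, 2, 3))
```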

The results

Do you want to see all the scraped links? Here are all the scraped URLs.

I got them like this:

First start GoogleScraper in a shell:

GoogleScraper --shell

Then in the shell:

In [13]: o = session.query(ScraperSearch).first()

In [14]: with open('/tmp/pr0n_links.txt', 'wt') as f:
    for serp in o.serps:
        for link in serp.links:
            f.write(link.link + '\n')

That makes 12100 unique links on 600 SERP pages. It took 4 minutes to scrape them with only 5 proxies. If I used 500 proxies, there wouldn't be any limit at all.
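
The loop shown earlier writes every link it finds, so it is worth checking how many of the lines in the file are actually distinct. Here is a small sketch, assuming one URL per line as in /tmp/pr0n_links.txt; the helper name unique_links is made up for this example.

```python
def unique_links(lines):
    """Return the links in first-seen order, without blanks or duplicates."""
    seen = set()
    result = []
    for line in lines:
        url = line.strip()
        if url and url not in seen:
            seen.add(url)
            result.append(url)
    return result

# Tiny in-memory demo with placeholder URLs.
demo = ["http://a.example/1\n", "http://b.example/2\n", "http://a.example/1\n", "\n"]
print(len(unique_links(demo)))
```

With the real file, the call would look like `with open('/tmp/pr0n_links.txt') as f: links = unique_links(f)`.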

Now let's do some analysis:

  • What is the most common porn tube domain for these 100 porn models?
  • Which model got the most hits on which search engine? Do the search engines have different rankings for the models with the most hits?

Most common domain

In [29]: from sqlalchemy import func
In [30]: sorted(session.query(Link.domain, func.count(Link.domain)).group_by(Link.domain).all(), key=lambda x: x[1], reverse=True)
Out[30]: 
[('www.tubegalore.com', 361),
 ('www.4tube.com', 357),
 ('www.redtube.com', 329),
 ('www.nudevista.com', 327),
 ('www.tubepleasure.com', 291),
 ('www.largeporntube.com', 257),
 ('www.tubesplash.com', 232),
 ('www.pornhub.com', 215),
 ('www.tubepornstars.com', 201),
 ('www.tube8.com', 195),
 ('www.xvideos.com', 191),
 ('www.youjizz.com', 191),
 ('www.porntube.com', 172),
 ('www.goldporntube.com', 144),
 ('www.apetube.com', 133),
 ('www.youtube.com', 129),
 ('www.tubevector.com', 111),
 ('www.tubestack.com', 105),
 ('tube.asexstories.com', 84),
 ('www.tubegals.com', 83),
 ('www.nudevista.at', 80),
 ('www.3movs.com', 64),
 ('www.bravotube.net', 64),
 ('www.hornytube.xxx', 64),
 ('www.cliphunter.com', 54),
 ('www.tubezaur.com', 52),
 ('www.youporn.com', 46),
 ('www.tubepornkiss.com', 42),
 ('www.freshporntube.com', 35),
 ('tubegals.com', 34),
 ('www.mammothtube.com', 32),
 ('www.bestandfree.com', 30),
 ('www.royalporntube.com', 29),
 ('www.keezmovies.com', 26),
 ('www.nurglesnymphs.com', 25),
 ('xxxbunker.com', 25),
 ('see-tube.com', 24),
 ('pornsharing.com', 21),
 ('www.porndig.com', 21),
 ('www.nurglestube.com', 20),
 ('hqfucks.com', 18),
 ('www.maturesexbox.com', 18),
 ('www.pornstargalore.com', 18),
 ('www.xnxx.com', 18),
 ('sextubebox.com', 16),
 ('www.bangsextube.com', 16),
 ('www.eskimotube.com', 16),
 ('www.freeones.com', 16),
 ('www.roccotube.com', 16),
 ('www.tubeporncity.com', 16),
 ('xxxmomtube.com', 16),
 ('xhamster.com', 15),
 ('www.69porntube.com', 14),
 ('www.elephanttube.com', 14),
 ('www.carameltube.com', 13),
 ('www.empflix.com', 13),
 ('www.rulertube.com', 13),
 ('clubxxxvideos.com', 12),
 ('glamourtubes.com', 12),
 ('myxxxporn.com', 12),
 ('oxotube.com', 12),
 ('sextubegonzo.com', 12),
 ('tryporntube.com', 12),
 ('www.iliketubes.com', 12),
 ('www.jizzonline.com', 12),
 ('www.cubeporntube.com', 11),
 ('www.vporn.com', 11),
 ('www.youngpornvideos.com', 11),
 ('fucked-tube.com', 10),
 ('hqsextubes.com', 10),
 ('www.itubexxx.com', 10),
 ('www.videosbang.com', 10),
 ('porntubemovs.net', 9),
 ('scandaltubes.com', 9),
 ('www.alohatube.com', 9),
 ('www.tjoob.com', 9),
 ('hornyanalsex.com', 8),
 ('matures-naked.com', 8),
 ('porn69.org', 8),
 ('www.asspoint.com', 8),
 ('www.nexttube.xxx', 8),
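
Note that the list counts www.tubegals.com and tubegals.com as two different domains. Below is a minimal sketch of how such rows could be merged, assuming the query result is a list of (domain, count) pairs as in the output above; the function name merge_www is hypothetical.

```python
from collections import Counter

def merge_www(domain_counts):
    """Sum the counts of domains that differ only by a leading 'www.'."""
    merged = Counter()
    for domain, count in domain_counts:
        key = domain.lower()
        if key.startswith("www."):
            key = key[len("www."):]
        merged[key] += count
    # most_common() returns the pairs sorted by count, highest first.
    return merged.most_common()

# Four rows copied from the output above.
rows = [
    ("www.tubegals.com", 83),
    ("tubegals.com", 34),
    ("www.nudevista.com", 327),
    ("www.nudevista.at", 80),
]
print(merge_www(rows))
```

Different top-level domains such as nudevista.com and nudevista.at stay separate in this sketch; whether they should be merged as well depends on what counts as one affiliate partner.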

The porn star with the most hits!

The number of results reported on a SERP page is stored in serp.num_results_for_keyword.

Its format depends on the search engine and on the language settings of the search.

First create a little helper function:

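import re

def number_of_results_for_serp(o):
    """Map each query to {search engine name: number of results}."""
    v = {}
    for serp in o.serps:
        # The count is only read from the first result page of each query.
        if serp.page_number == 1:
            # Strip the trailing ' tube' (5 characters) to recover the model name.
            query = serp.query[:-5]

            # Matches e.g. '1,300,000 results' or '1.300.000 Ergebnisse'.
            num_results = re.search(r'(?P<n>(\d\.?,?)*)\s*?(Ergebnisse|results)', serp.num_results_for_keyword)
            try:
                num_results = int(re.sub(r'[.,]', '', num_results.group('n')))
            except (AttributeError, ValueError):
                # No match (None has no .group) or an empty number: skip this page.
                continue

            if query not in v:
                v[query] = {}

            v[query][serp.search_engine_name] = num_results

    return v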
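
To see how this regular expression copes with different formats, here is a standalone sketch that applies the same pattern to a few strings. The sample strings are invented for illustration and are not taken from the scraped data; parse_result_count is a hypothetical helper name.

```python
import re

# Same pattern as in the helper function: digits with optional '.' or ','
# separators, followed by the German or English word for "results".
RESULT_COUNT = re.compile(r'(?P<n>(\d\.?,?)*)\s*?(Ergebnisse|results)')

def parse_result_count(text):
    """Return the result count as an int, or None if no number is found."""
    match = RESULT_COUNT.search(text)
    if match is None:
        return None
    digits = re.sub(r'[.,]', '', match.group('n'))
    return int(digits) if digits else None

for sample in ("About 1,300,000 results",
               "Ungefähr 684.000 Ergebnisse",
               "results for something",
               "nothing to see here"):
    print(repr(sample), '->', parse_result_count(sample))
```

Returning None instead of raising makes the two failure cases explicit: the pattern did not match at all, or it matched the word without a number in front of it.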

Then we use it like this:

In [37]: from module import number_of_results_for_serp
In [38]: number_of_results_for_serp(o)
Out[38]: 
{'Adriana Sage': {'bing': 1300000, 'google': 684000, 'yahoo': 395000},
 'Aletta Ocean': {'bing': 1710000, 'google': 1110000, 'yahoo': 1880000},
 'Alexis Texas': {'bing': 4700000, 'google': 1570000, 'yahoo': 4950000},
 'Allyssa Hall': {'bing': 21, 'google': 482000, 'yahoo': 619000},
 'Amber Lynn': {'bing': 2200000, 'google': 2250000, 'yahoo': 1970000},
 'Amber Rayne': {'bing': 940000, 'google': 1090000, 'yahoo': 889000},
 'Anetta Keys': {'bing': 362000, 'google': 549000, 'yahoo': 382000},
 'Angel Kelly': {'bing': 13, 'google': 18100000, 'yahoo': 1630000},
 'Annette Haven': {'bing': 285000, 'google': 676000, 'yahoo': 319000},
 'Annette Schwarz': {'bing': 547000, 'google': 1060000, 'yahoo': 556000},
 'Aria Giovanni': {'bing': 781000, 'google': 1080000, 'yahoo': 1850000},
 'Asa Akira': {'bing': 3400000, 'google': 1540000, 'yahoo': 3810000},
 'Ashlynn Brooke': {'bing': 28, 'google': 1050000, 'yahoo': 1630000},
 'Asia Carrera': {'bing': 1050000, 'google': 1780000, 'yahoo': 1170000},
 'Audrey Bitoni': {'bing': 2040000, 'google': 994000, 'yahoo': 2000000},
 'Audrey Hollander': {'bing': 605000, 'google': 941000, 'yahoo': 632000},
 'Aurora Snow': {'bing': 1790000, 'google': 1810000, 'yahoo': 1880000},
 'Bambi Woods': {'bing': 301000, 'google': 446000, 'yahoo': 72400},
 'Bella Moretti': {'bing': 405000, 'google': 939000, 'yahoo': 307000},
 'Belladonna': {'bing': 1920000, 'google': 2830000, 'yahoo': 2690000},
 'Bree Olson': {'bing': 2810000, 'google': 1450000, 'yahoo': 4140000},
 'Brooklyn Lee': {'bing': 2800000, 'google': 4210000, 'yahoo': 1980000},
 'Carmen Luvana': {'bing': 426000, 'google': 727000, 'yahoo': 244000},
 'Cassandra Cruz': {'bing': 683000, 'google': 1080000, 'yahoo': 478000},
 'Catalina Cruz': {'bing': 661000, 'google': 843000, 'yahoo': 527000},
 'Chasey Lain': {'bing': 168000, 'google': 481000, 'yahoo': 160000},
 'Céline Tran': {'bing': 162000, 'google': 212000, 'yahoo': 99100},
 'Daniella Rush': {'bing': 531000, 'google': 593000, 'yahoo': 408000},
 'Dark Angel': {'bing': 31100000, 'google': 11700000, 'yahoo': 21400000},
 'Deidre Holland': {'bing': 23, 'google': 230000, 'yahoo': 54900},
 'Eva Angelina': {'bing': 30, 'google': 1250000, 'yahoo': 6110000},
 'Evelyn Lin': {'bing': 868000, 'google': 879000, 'yahoo': 1040000},
 'Faye Reagan': {'bing': 1410000, 'google': 1120000, 'yahoo': 1660000},
 'Gia Paloma': {'bing': 572000, 'google': 899000, 'yahoo': 449000},
 'Gianna Michaels': {'bing': 3800000, 'google': 1370000, 'yahoo': 4290000},
 'Haley Cummings': {'bing': 506000, 'google': 851000, 'yahoo': 666000},
 'Havana Ginger': {'bing': 21, 'google': 815000, 'yahoo': 402000},
 'Heather Hunter': {'bing': 1130000, 'google': 1200000, 'yahoo': 795000},
 'Hillary Scott': {'bing': 22, 'google': 954000, 'yahoo': 2310000},
 'India Summer': {'bing': 32200000, 'google': 30500000, 'yahoo': 24600000},
 'Isis Taylor': {'bing': 1080000, 'google': 1200000, 'yahoo': 1100000},
 'Janine Lindemulder': {'bing': 345000, 'google': 1010000, 'yahoo': 191000},
 'Jayden Jaymes': {'bing': 25, 'google': 1150000, 'yahoo': 3130000},
 'Jenaveve Jolie': {'bing': 24, 'google': 871000, 'yahoo': 1400000},
 'Jenna Haze': {'bing': 25, 'google': 1300000, 'yahoo': 3490000},
 'Jenna Jameson': {'bing': 1650000, 'google': 1130000, 'yahoo': 2220000},
 'Jesse Jane': {'bing': 1820000, 'google': 1410000, 'yahoo': 3150000},
 'Jessica Drake': {'bing': 1040000, 'google': 1350000, 'yahoo': 872000},
 'Jessie Andrews': {'bing': 1120000, 'google': 1040000, 'yahoo': 2140000},
 'Julia Taylor': {'bing': 3830000, 'google': 11300000, 'yahoo': 4320000},
 'Kagney Linn Karter': {'bing': 1630000, 'google': 1090000, 'yahoo': 1690000},
 'Katie Morgan': {'bing': 2280000, 'google': 1640000, 'yahoo': 4030000},
 'Katja Kean': {'bing': 122000, 'google': 363000, 'yahoo': 96600},
 'Kayden Kross': {'bing': 1740000, 'google': 905000, 'yahoo': 1710000},
 'Kimberly Kane': {'bing': 1160000, 'google': 857000, 'yahoo': 30200000},
 'Kobe Tai': {'bing': 319000, 'google': 769000, 'yahoo': 188000},
 'Kristina Rose': {'bing': 24, 'google': 1360000, 'yahoo': 7760000},
 'Kylie Ireland': {'bing': 530000, 'google': 626000, 'yahoo': 381000},
 'Lacey DuValle': {'bing': 1160000, 'google': 787000, 'yahoo': 1290000},
 'Lela Star': {'google': 882000, 'yahoo': 4080000},
 'Lelu Love': {'bing': 1260000, 'google': 987000, 'yahoo': 1660000},
 'Lexi Belle': {'bing': 4010000, 'google': 1380000, 'yahoo': 4860000},
 'Lily Carter': {'bing': 1350000, 'google': 1260000, 'yahoo': 1880000},
 'Linda Lovelace': {'bing': 355000, 'google': 390000, 'yahoo': 115000},
 'Lisa Ann': {'bing': 7870000, 'google': 10300000, 'yahoo': 9660000},
 'Lolo Ferrari': {'bing': 294000, 'google': 526000, 'yahoo': 191000},
 'Luci Thai': {'bing': 6, 'google': 515000, 'yahoo': 1460000},
 'Lupe Fuentes': {'bing': 413000, 'google': 850000, 'yahoo': 448000},
 'Lynn LeMay': {'bing': 183000, 'google': 405000, 'yahoo': 69000},
 'Madison Ivy': {'bing': 25, 'google': 1470000, 'yahoo': 4340000},
 'Maria Ozawa': {'bing': 1380000, 'google': 1240000, 'yahoo': 1370000},
 'Mercedez': {'bing': 765000, 'google': 1420000, 'yahoo': 8550000},
 'Misty Stone': {'bing': 1210000, 'google': 956000, 'yahoo': 1320000},
 'Naomi Banxx': {'bing': 844000, 'google': 128000, 'yahoo': 240000},
 'Nikki Benz': {'bing': 1420000, 'google': 1090000, 'yahoo': 1910000},
 'Nikki Dial': {'bing': 285000, 'google': 392000, 'yahoo': 101000},
 'Nikki Tyler': {'bing': 1220000, 'google': 1910000, 'yahoo': 1640000},
 'Nina Hartley': {'bing': 30, 'google': 1490000, 'yahoo': 1880000},
 'Olivia Del Rio': {'bing': 22, 'google': 790000, 'yahoo': 40000000},
 'Paola Rey': {'bing': 952000, 'google': 853000, 'yahoo': 455000},
 'Rebeca Linares': {'bing': 22, 'google': 2490000, 'yahoo': 3710000},
 'Riley Steele': {'bing': 21, 'google': 924000, 'yahoo': 1290000},
 'Samantha Saint': {'bing': 1510000, 'google': 1270000, 'yahoo': 1370000},
 'Sasha Grey': {'bing': 27, 'google': 1540000, 'yahoo': 5220000},
 'Shyla Stylez': {'bing': 30, 'google': 1200000, 'yahoo': 3520000},
 'Silvia Saint': {'bing': 1620000, 'google': 1500000, 'yahoo': 2360000},
 'Skin Diamond': {'bing': 5430000, 'google': 14500000, 'yahoo': 5180000},
 'Sophia Santi': {'bing': 255000, 'google': 648000, 'yahoo': 316000},
 'Stephanie Swift': {'bing': 660000, 'google': 1110000, 'yahoo': 559000},
 'Stoya': {'bing': 455000, 'google': 747000, 'yahoo': 314000},
 'Sunny Leone': {'bing': 3450000, 'google': 8790000, 'yahoo': 2140000},
 'Tanner Mayes': {'bing': 900000, 'google': 888000, 'yahoo': 846000},
 'Tasha Reign': {'bing': 699000, 'google': 860000, 'yahoo': 557000},
 'Taylor Hayes': {'bing': 1690000, 'google': 2060000, 'yahoo': 1800000},
 'Teagan Presley': {'bing': 559000, 'google': 753000, 'yahoo': 409000},
 'Tera Patrick': {'bing': 1680000, 'google': 1010000, 'yahoo': 2600000},
 'Tia Bella': {'google': 1170000, 'yahoo': 849000},
 'Tori Black': {'bing': 4980000, 'google': 9960000, 'yahoo': 6810000},
 'Traci Lords': {'bing': 351000, 'google': 677000, 'yahoo': 934000},
 'Veronica Rayne': {'bing': 632000, 'google': 867000, 'yahoo': 465000}}

The data above shows the number of hits for each model on the three search engines. Because it is hard to see any pattern in this mass of data, and because we want to know how strongly the hit counts differ between the three search engines, we need to visualize the data.
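
Before plotting anything, a quick textual summary already hints at whether the search engines agree on the models with the most hits. The following is a rough sketch, assuming a dictionary shaped like the output of number_of_results_for_serp; the function name top_models is made up, and the four entries are copied from the output above.

```python
def top_models(hits, engine, n=3):
    """Return the n (model, count) pairs with the highest count for one engine."""
    counts = [(model, per_engine[engine])
              for model, per_engine in hits.items()
              if engine in per_engine]  # some models lack a value for an engine
    return sorted(counts, key=lambda pair: pair[1], reverse=True)[:n]

hits = {
    'India Summer': {'bing': 32200000, 'google': 30500000, 'yahoo': 24600000},
    'Dark Angel': {'bing': 31100000, 'google': 11700000, 'yahoo': 21400000},
    'Lisa Ann': {'bing': 7870000, 'google': 10300000, 'yahoo': 9660000},
    'Lela Star': {'google': 882000, 'yahoo': 4080000},
}

for engine in ('bing', 'google', 'yahoo'):
    print(engine, top_models(hits, engine, n=2))
```

Models without a value for an engine are simply left out of that engine's ranking instead of being counted as zero.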