May
Have you ever wondered how spiders really read data? Like, really read it? I’ll share something you may find interesting, and it could drastically change your day-to-day life if you write your own content or even just leave comments. As a side note, I wouldn’t be posting this if it weren’t for a friend; one of his posts reminded me about it. I think you’ll find it interesting nonetheless.
So let’s think about how spiders work in general. Spiders go to your website, and for the most part they see flat HTML. Do they execute JavaScript? Do they execute Java? What about jQuery? If you said no to all of the above, you are a dumbass. I’ve actually seen a Google crawler join an IRC channel through a Java app. The app was on a website it was crawling, which shows that it does actually run Java. And no, I wasn’t imagining things or lying: the hostmask was the googlebot crawl hostmask, on a Google-owned and operated IP.
Is it safe to say that they collect whatever data they can about the scripts they run? Maybe they can’t interpret everything coming from them (like when I said “Hey Googlebot!”), but they could certainly read the description and the data inside, including the IRC URL. Along the same lines, is it far-fetched to think that Google’s crawlers might use OCR to read the text in banners? Personally, given what I’ve seen OCR do to CAPTCHAs, I don’t doubt it for one second. For you doubters: 150-200 links/minute, anyone? All CAPTCHA “protected”?
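To make the “flat HTML” point concrete, here’s a minimal Python sketch of the kind of text-only pass a crawler could make over a page. It keeps visible text, skips script and style contents, and still pulls URL-like attribute values (such as an applet’s param value) out of the markup. This is only an illustration, not how Googlebot actually works; the FlatView class, the attribute list and the irc.example.net address are all made up for the example.

```python
from html.parser import HTMLParser


class FlatView(HTMLParser):
    """Hypothetical text-only view of a page: visible text plus URL-like attributes."""

    SKIP = {"script", "style"}  # contents a non-executing crawler would not render
    URL_ATTRS = {"href", "src", "value", "data", "code", "archive"}

    def __init__(self) -> None:
        super().__init__()
        self.text: list[str] = []
        self.urls: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        # Attributes are readable even when the script or applet never runs.
        for name, value in attrs:
            if name in self.URL_ATTRS and value and "://" in value:
                self.urls.append(value)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text.append(data.strip())


page = """
<html><body>
  <h1>Chat with us</h1>
  <script>document.write("Hey Googlebot!");</script>
  <applet code="Chat.class" archive="chat.jar">
    <param name="server" value="irc://irc.example.net/channel">
    Java is required to join the channel.
  </applet>
</body></html>
"""

parser = FlatView()
parser.feed(page)
parser.close()
print(parser.text)  # ['Chat with us', 'Java is required to join the channel.']
print(parser.urls)  # ['irc://irc.example.net/channel']
```

Even without executing any JavaScript or Java, the IRC URL falls straight out of the markup, which is the kind of script “description and data” a crawler could pick up.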
Anyway, back to the subject at hand: what can Google’s crawlers read, and what can’t they? By this point I’m sure at least a handful of people are asking, “Why hasn’t he mentioned Flash yet?” I’ll address that right now. One of my “research friends” wrote an article about indexing and Flash over at SERPable. If you give it a full read, you’ll see he goes into a lot of detail about what he found out about how Flash gets indexed.
When you start thinking about how spiders are configured and made to ‘think’ (algorithms, etc.), you can start to work out exactly what they’re looking at. Now think outside the box about how the data they collect is manipulated and categorized. Here’s an example: if you have a unique 500-word article that you can “spin” into another unique version, would you stop at just 5 spins? What about reordering all the sentences and doing it again? What about running it through a synonym function to swap in higher-level synonyms, or even taking a word, finding its antonym and writing “not <word>”?
For example: “The quick brown fox jumps over the lazy dog” => “The not so slow brown fox leaps over the lethargic dog”. Do you really think the spider would be able to tell the difference? It knows a few things: unique or not unique, keyword density, and related words. If you have a list of synonyms and antonyms, so do they. Keep that in mind.
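Here’s a rough sketch of why that synonym list matters, using one common near-duplicate technique (word shingles compared with Jaccard similarity) plus a plain keyword-density count. Nobody outside Google knows exactly what the spider uses, so treat this as an assumption-laden illustration; the PHRASES and SYNONYMS tables are a tiny hypothetical stand-in for a real thesaurus.

```python
import re

# Hypothetical synonym data for illustration only. Phrases are rewritten
# before single words so "not so slow" collapses to one canonical token.
PHRASES = {"not so slow": "quick"}
SYNONYMS = {"leaps": "jumps", "lethargic": "lazy"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def normalize(text: str) -> list[str]:
    """Map known phrases and synonyms onto one canonical form."""
    lowered = text.lower()
    for phrase, canonical in PHRASES.items():
        lowered = lowered.replace(phrase, canonical)
    return [SYNONYMS.get(tok, tok) for tok in tokenize(lowered)]


def shingles(tokens: list[str], k: int = 2) -> set[tuple[str, ...]]:
    """Return the set of k-word windows (shingles) in the token list."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def keyword_density(text: str, keyword: str) -> float:
    """Fraction of tokens in the text that equal the keyword."""
    tokens = tokenize(text)
    return tokens.count(keyword.lower()) / len(tokens) if tokens else 0.0


original = "The quick brown fox jumps over the lazy dog"
spun = "The not so slow brown fox leaps over the lethargic dog"

raw = jaccard(shingles(tokenize(original)), shingles(tokenize(spun)))
canon = jaccard(shingles(normalize(original)), shingles(normalize(spun)))
print(f"raw similarity:        {raw:.3f}")    # 0.125
print(f"normalized similarity: {canon:.3f}")  # 1.000
```

On raw text the spun sentence shares only 2 of 16 distinct word pairs with the original, but once both are mapped onto canonical synonyms they come out identical.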
As for other data and cloaking, I won’t really touch on that right now. I used to experiment with cloaking, and it’s very hard to pull off, especially without an IP list of Google’s spiders. If you think the User-Agent is the best way to detect them, kick yourself in the ass and then try something else. Remember that Google’s entire job is to make sure the best-quality sites reach the top. My job is to figure out how to get a site up there so that it stays.
Anyway, I hope this helped you all think a little outside the box. If you’re not a member of WickedFire, I hope you sign up and take a peek at my post here, which has some PHP code for making unique content. Have a good one.
PS: Bored? Skype me at Contempt.me (that’s the username). If you don’t catch me in the middle of coding, we can have a little chat.
Sep
EPN & Blackhat
A lot of people assume that anyone combining the eBay Partner Network with blackhat must be a cookie stuffer. That’s true in some cases, but not in mine.
I’ve been running some smaller studies to see what kind of damage I can do with EPN, and now that I’ve hit $1000, I figure I should report how I did it and, more importantly, give my opinion on the “random terminations” EPN has been handing out.
Here are my all-time stats as of right now, rounded to the nearest logical number to make them easier to read:
Clicks: ~13100
EPC: ~$7.95
Earnings: ~$1000
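A quick sanity check on how these three numbers fit together. If EPC meant earnings per single click, ~$1000 over ~13,100 clicks would be under 8 cents, not $7.95; the reported figure only makes sense as earnings per 100 clicks. That per-100 reading is an assumption drawn from the arithmetic, not something stated in the stats, and the function names below are just for illustration.

```python
def epc_per_click(earnings: float, clicks: int) -> float:
    """Earnings divided by clicks."""
    return earnings / clicks


def epc_per_100_clicks(earnings: float, clicks: int) -> float:
    """Earnings per 100 clicks (the reading that fits the figures above)."""
    return earnings / clicks * 100


def implied_earnings(epc_100: float, clicks: int) -> float:
    """Back out total earnings from a per-100-clicks EPC."""
    return epc_100 * clicks / 100


clicks, earnings, reported_epc = 13_100, 1_000.0, 7.95
print(f"per click:      ${epc_per_click(earnings, clicks):.4f}")       # $0.0763
print(f"per 100 clicks: ${epc_per_100_clicks(earnings, clicks):.2f}")  # $7.63
print(f"implied earnings at $7.95: ${implied_earnings(reported_epc, clicks):.2f}")  # $1041.45
```

The ~$7.95 figure implies earnings of roughly $1040, which fits the earnings having been rounded down to ~$1000.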
These obviously aren’t exact results, but here’s the funny thing: I hold a high EPC, I don’t cookie stuff whatsoever, and all of my visitors land on the products they’re looking for. That’s my whole explanation for why I haven’t been terminated from EPN so far. On some days I had 300-400 uniques per site, and I was running maybe 3-4 sites. I’m about to scale this out to about 700 sites and see what kind of numbers I can pull. We’ll call that the “follow-up EPN & Blackhat case study”. This uses no link spam whatsoever and no cloaking. The content methods themselves are actually whitehat, so it’s hard to call this blackhat at all. The only reason I do is that the sites DO get penalized, I’m assuming for “lack of good quality content”.
There’s an upside to that, though. They may be penalized, but instead of pulling 300-400 uniques a day, they still pull 30-50 uniques a day per domain. So my assumption is that with 700 of them running, even when penalized, I’ll be able to sustain around 21,000 uniques/day lowball and 35,000 highball. Obviously this is a goal to shoot for; realistically, I expect heavier penalties and other problems as I try to scale. I guess we’ll find out just how much that hurts, assuming I don’t kick the crap out of my server, send it into a million pieces and/or start a datacenter fire.
My costs for this case study were low: $50 in total. Yes, you heard me right: $50 spent for over $1000 in return. The timeline got stretched out because I would work on it a little, then let it sit. Total time invested was about 4-5 hours. 5 hours and $1000 later, $200 an hour isn’t bad. A total of 4 domains were used, and I have 7 sitting that I’m going to use for this next test.
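The projection is straight multiplication: per-domain daily uniques times the number of domains. A small sketch, assuming (as the estimate above implicitly does) that every domain performs the same; real sites won’t, and heavier penalties at scale would pull the numbers down. The helper name is made up for the example.

```python
def daily_uniques(domains: int, per_domain: int) -> int:
    """Total daily uniques if every domain pulls the same traffic."""
    return domains * per_domain


# Before penalties: 3-4 sites at 300-400 uniques per site per day.
before_low, before_high = daily_uniques(3, 300), daily_uniques(4, 400)
# Penalized but scaled out: 700 domains at 30-50 uniques each.
low, high = daily_uniques(700, 30), daily_uniques(700, 50)

print(before_low, before_high)  # 900 1600
print(low, high)                # 21000 35000
```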
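The $200/hour figure is gross revenue divided by the 5-hour upper estimate. Here’s the same arithmetic as a short sketch that also nets out the $50 cost and tries the 4-hour lower estimate; the per_hour helper is just for illustration.

```python
def per_hour(amount: float, hours: float) -> float:
    """Dollars earned per hour of work."""
    return amount / hours


revenue, cost = 1000.0, 50.0
profit = revenue - cost  # 950.0
roi = profit / cost      # 19.0, i.e. 1900%

print(f"gross at 5h: ${per_hour(revenue, 5):.0f}/h")  # $200/h
print(f"gross at 4h: ${per_hour(revenue, 4):.0f}/h")  # $250/h
print(f"net at 5h:   ${per_hour(profit, 5):.0f}/h")   # $190/h
print(f"ROI: {roi:.0%}")                              # 1900%
```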
The method behind this is obviously a win/win situation.
For EPN
- I provide high-converting traffic
- I don’t cookie stuff
- I provide volume
- I provide sales
- There’s no reason for them to terminate me.
For Me
- Low Time Invested
- Low Money Invested
- Big Profit Potential When Scaled
Now, to address the “random EPN terminations”: I believe they have an algorithm on the backend that runs every now and again (like the Google slaps) and calculates your EPC, your conversions and, more importantly, your in/out keyword-to-purchase ratio. Every time I read about someone who got terminated, they were doing whitehat, legit things (with a few exceptions, obviously). What I think happens is that EPN tracks the keyword the user came in on and compares it with what they buy. If the user later buys something unrelated to that product (to be safe, let’s say outside the same product category), you get flagged. This won’t happen because of one person, of course, but let’s say that if 40%+ of your ‘sales’ come from unrelated searches, you get a ‘termination’.
This is just my take on it, but it does explain why I can keep my account, consistently send traffic and make money, while some white hats have been terminated. Of course, this is only my opinion and I can’t confirm it whatsoever.
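Here’s a minimal sketch of the check theorized above, purely to illustrate the idea. Nothing here comes from EPN: the Sale record and function names are hypothetical, the 40% threshold is taken from the speculation, and the minimum sample size is an added assumption standing in for “this won’t happen because of one person”.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sale:
    search_category: str    # category of the keyword the visitor came in on
    purchase_category: str  # category of what the visitor eventually bought


def mismatch_ratio(sales: list[Sale]) -> float:
    """Share of sales whose purchase category differs from the search category."""
    if not sales:
        return 0.0
    n_unrelated = sum(1 for s in sales if s.search_category != s.purchase_category)
    return n_unrelated / len(sales)


def should_flag(sales: list[Sale], threshold: float = 0.40, min_sales: int = 10) -> bool:
    """Flag an account once enough sales exist and 40%+ of them are unrelated.

    min_sales is a hypothetical guard so that a single odd purchase never
    triggers a flag on its own.
    """
    return len(sales) >= min_sales and mismatch_ratio(sales) >= threshold


related_sale = Sale("cameras", "cameras")
unrelated_sale = Sale("cameras", "shoes")

mostly_on_topic = [related_sale] * 7 + [unrelated_sale] * 3  # 30% unrelated
drifting = [related_sale] * 6 + [unrelated_sale] * 4         # 40% unrelated

print(should_flag(mostly_on_topic))   # False
print(should_flag(drifting))          # True
print(should_flag([unrelated_sale]))  # False: too few sales to judge
```

Under this model, an affiliate whose visitors buy what they searched for stays well under the threshold, while one whose traffic wanders off-category gets flagged even with otherwise legit methods.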
Hope you enjoyed the read.



