.htaccess -- Blocking leechers!


wsjb78
05-30-2003, 07:05 AM
Hiya,

I'm using this .htaccess to keep site leechers out of my site. One thing I noticed is that the .htaccess gives a 403 error if there is no [NC] at the end of the last condition.

I think some of you might like this one too (wget is deactivated because I use it to mirror some things from another server).

Let me know what you think of this!

# Send known site rippers and clients with an empty User-Agent away from the site
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} .*download.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} .*cyg.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} .*setup.* [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} .*get.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} .*almaden.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^Anarchie [OR]
RewriteCond %{HTTP_USER_AGENT} ^ASPSeek [OR]
RewriteCond %{HTTP_USER_AGENT} ^attach [OR]
RewriteCond %{HTTP_USER_AGENT} ^autoemailspider [OR]
RewriteCond %{HTTP_USER_AGENT} ^BackWeb [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bandit [OR]
RewriteCond %{HTTP_USER_AGENT} ^BatchFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^Buddy [OR]
RewriteCond %{HTTP_USER_AGENT} ^bumblebee [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^CICC [OR]
RewriteCond %{HTTP_USER_AGENT} ^Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Copier [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^DA [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo\ Pump [OR]
RewriteCond %{HTTP_USER_AGENT} ^download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^download\ Wonder [OR]
RewriteCond %{HTTP_USER_AGENT} ^downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Drip [OR]
RewriteCond %{HTTP_USER_AGENT} ^DSurf15a [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EasyDL/2.99 [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FileHound [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetSmart [OR]
RewriteCond %{HTTP_USER_AGENT} ^gigabaz [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go\!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^gotit [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^grub-client [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} ^httpdown [OR]
RewriteCond %{HTTP_USER_AGENT} .*httrack.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Indy.*Library [OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetLinkagent [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^Iria [OR]
RewriteCond %{HTTP_USER_AGENT} ^JBH.*agent [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^JustView [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^LexiBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^lftp [OR]
RewriteCond %{HTTP_USER_AGENT} ^Link.*Sleuth [OR]
RewriteCond %{HTTP_USER_AGENT} ^likse [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mag-Net [OR]
RewriteCond %{HTTP_USER_AGENT} ^Magnet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Memo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mirror [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*MSIECrawler [OR]
RewriteCond %{HTTP_USER_AGENT} ^MS\ FrontPage [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSIECrawler [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSProxy [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetMechanic [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^Openfind [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ping [OR]
RewriteCond %{HTTP_USER_AGENT} ^PingALink [OR]
RewriteCond %{HTTP_USER_AGENT} ^Pockey [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Pump [OR]
RewriteCond %{HTTP_USER_AGENT} ^Realdownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^Reaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Recorder [OR]
RewriteCond %{HTTP_USER_AGENT} ^QRVA [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Seeker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SlySearch [OR]
RewriteCond %{HTTP_USER_AGENT} ^Smartdownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^Snake [OR]
RewriteCond %{HTTP_USER_AGENT} ^SpaceBison [OR]
RewriteCond %{HTTP_USER_AGENT} ^Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Szukacz [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^URLSpiderPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Vacuum [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebHook [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebMiner [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebMirror [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
#RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Whacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^x-Tractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xenu [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus [OR]
# Empty or "-" User-Agent; the final condition carries no [OR]
RewriteCond %{HTTP_USER_AGENT} ^-$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^$ [NC]
RewriteRule ^.*$ http://www.google.com [L]
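
A more compact variant, as a rough sketch rather than a drop-in replacement: several names can be folded into one condition with regex alternation, and the [F] flag answers with 403 Forbidden instead of sending the client to another site. The short name list below is only a sample taken from the list above, and which agents to include is an assumption to adjust per site.

```apache
RewriteEngine On
# Substring matches, case-insensitive ([NC]); [OR] chains to the next condition
RewriteCond %{HTTP_USER_AGENT} (download|httrack|setup) [NC,OR]
# Prefix matches for a few of the names listed above (sample only)
RewriteCond %{HTTP_USER_AGENT} ^(BlackWidow|FlashGet|GetRight|Teleport\ Pro|WebZIP) [OR]
# Empty or "-" User-Agent; the last condition must not carry [OR]
RewriteCond %{HTTP_USER_AGENT} ^-?$
# "-" leaves the URL untouched, [F] returns 403 Forbidden
RewriteRule ^ - [F]
```

Fewer, grouped conditions are easier to maintain than one line per agent, and the unanchored [NC] patterns already cover every name that contains "download" or "httrack".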

Feynman
05-30-2003, 10:16 AM
I'm a newbie here and there are things I don't understand.

I use BlackWidow from time to time on certain sites that are paysites because I don't want to have to access the site all the time to review the same documents.

It seems that Black Widow will not access the site if I don't have the password.

Why is it that you'd want to block the PAID user from accessing the site?

Is it part of the TOS that site rippers are not to be used?

Or is it just a tactic to deny the user convenient access to what he paid for, imposing a nuisance on him in order to entice a renewal of the subscription?

Or what is the problem if it is of another nature?

wsjb78
05-30-2003, 02:20 PM
Well, we're not operating paysites, but we are traffic brokers. We just don't want a site-leeching program to be able to download a whole domain. Protecting pics wouldn't be a problem, but we also want to protect our HTML files and stuff. It's OK to view one, two, or a hundred pages, but we do not want all files downloaded at once!

monaro
05-30-2003, 02:28 PM
Originally posted by wsjb78
Well, we're not operating paysites, but we are traffic brokers. We just don't want a site-leeching program to be able to download a whole domain. Protecting pics wouldn't be a problem, but we also want to protect our HTML files and stuff. It's OK to view one, two, or a hundred pages, but we do not want all files downloaded at once!



question to wsjb78 =)

Does this block a desktop program called super--bot that can download a site in seconds? I won't post the URL because I don't know who might use it against us.


Sorry, just read it:

RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
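
Note that ^SuperBot is anchored and case-sensitive, so it only catches agents whose string starts with exactly that spelling. The exact User-Agent this program sends isn't known here, so the following is only a hypothetical looser condition, assuming it contains some variant of "super bot":

```apache
# Hypothetical: matches "superbot", "Super-Bot", "super--bot", "super bot" anywhere in the string
# Must sit before the final condition of the chain because it carries [OR]
RewriteCond %{HTTP_USER_AGENT} "super[-_ ]*bot" [NC,OR]
```

Checking the access log for what the program actually sends is the safer way to pick the pattern.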

AcidMaX
05-31-2003, 08:24 AM
Originally posted by Feynman


Is it part of the TOS that site rippers are not to be used?



From my experience, people don't really care what is in the TOS. I could completely understand someone having this on their paysite members area. Paysite owners make money on recurring billing, and if someone comes in, downloads the entire site in 15 minutes, and then cancels, that hurts their recurring cash flow. Think of all the galleries that are submitted to MGPs on a daily basis, and how much bandwidth movies take up, just for people to make a sale. Now imagine someone using a download manager to crawl The Hun or someone else and download all the movies linked on that site.

Now if you are the guy trying to refer people and folks are using download managers, you are getting reamed on your bandwidth without them even seeing any advertising. There are many uses for the htaccess script above, and I say GOOD WORK! :)

AJ

Mister X
05-31-2003, 03:38 PM
One VERY good reason to block rippers in your member area is to help your trials convert. Why the hell would anybody want to renew their subscription when they have the whole site d/led in a couple of hours? If you don't have trials it's less of a problem because the member has a month to d/l shit anyways so the rebill is going to happen because of your update quality. But on a trial the whole idea is to have enough good content that the guy CAN'T d/l everything before the trial is up.