Downloading BlenderNation? Better don't...

We had a friendly visitor from Hong Kong this morning who was downloading thousands of pages from BlenderNation, causing our server to crash. I'm not sure why people would try to do this, but please rest assured that this will earn you an instant and permanent place in the BlenderNation Naughty Corner (also known as an IP-ban ;-).

We're looking into technical measures to automatically prevent this and add these people to our blacklist. Any tips on doing this with WordPress or Apache would be much appreciated!

About the Author

Bart Veldhuizen

I have a LONG history with Blender - I wrote some of the earliest Blender tutorials, worked for Not a Number and helped run the crowdfunding campaign that open sourced Blender (the first one on the internet!). I founded BlenderNation in 2006 and have been editing it every single day since then ;-) I also run the Blender Artists forum and I'm Head of Community at Sketchfab.

47 Comments

  1. Looks like a DoS attack. As long as it's not a DDoS attack, block the source IP via iptables and send an abuse report to the hosting provider of that machine.

  2. @Craig: hmm, that looks rather sophisticated aka a lot of work :) I'm looking for a quick & simple way to block people.

    @Manko10: yes, I blocked this IP via a .htaccess rule now. The trick though is automatically detecting these attacks/downloads and then adding them to the list of blocked IPs automatically. It's kind of defeating the purpose to use a site-crash as a notification system ;-)

    @Tristan: they were downloading the entire site.

  3. Not a good idea. When mod_rewrite comes into play, a server instance has already been forked, and that consumes memory, which causes the server to crash under a DoS attack. The attacker must be blocked before any server process is spawned to answer the request.

  4. Bart. Appearances can be deceiving :)

    The following steps will have you up and running in under 5 minutes (on a Linux box)

    wget http://www.ossec.net/files/ossec-hids-2.4.tar.gz
    tar xvzf ossec-hids-2.4.tar.gz
    cd ossec-hids-2.4
    # run the installer as root
    ./install.sh
    # then answer the handful of questions asked by the install script

    OSSEC have also released a WordPress plugin (which I've only just discovered) : http://www.ossec.net/main/wpsyslog2

    I'm investigating to see how easy it will be to add a rule for aggressive/mass http requests from a single IP.

  5. My last answer was to Tristan, but the same goes for .htaccess. It doesn't consume as much memory as processing your blog's PHP code would, but under an aggressive DoS attack your server will just go down a little later. The best way is to use iptables, but of course that's only possible if you have root access to your server or if your hosting provider lets you set firewall rules via your admin interface.

    A small addition: the HIDS mentioned above, OSSEC, also uses iptables.

  6. IP bans work for our site, but we're using Modx, which is great for scripting up a little 'if doing something weird then kill session then log and ban' script. You'd be surprised at quite how few times it does kick in, but it seems to be working well enough for preventing bots and XSS attacks etc.

  7. Hello,
    Once I had a similar problem with a site that I help to admin.
    In short, what we did was put an invisible link in the pages, pointing to a PHP script that adds the visitor's IP to the .htaccess file. This way you can stop stupid crawling of the site.
    By invisible I mean that it has no text, so when the browser renders the page nothing is "printed" and a user can't click the link accidentally; only parsers "see" the link.

    You can try this as a first attempt. Of course it depends on how "intelligent" the crawler is, but there is much room to improve.
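
A minimal sketch of the trap idea from comment 7, written in Python rather than the PHP the commenter used. The function names are hypothetical, and the Apache 2.2-style `Deny from` directive is an assumption about the server setup. The IP is validated before it is written so that a forged value cannot inject extra directives.

```python
import ipaddress


def deny_line(ip: str) -> str:
    """Return an Apache 2.2-style deny directive for a validated IP address."""
    # ipaddress.ip_address raises ValueError for anything that is not a
    # literal IPv4/IPv6 address, so untrusted input cannot add directives.
    addr = ipaddress.ip_address(ip)
    return f"Deny from {addr}"


def add_to_blocklist(htaccess_text: str, ip: str) -> str:
    """Return htaccess_text with a deny line for ip appended (no duplicates)."""
    line = deny_line(ip)
    existing = htaccess_text.splitlines()
    if line in existing:
        return htaccess_text
    return "\n".join(existing + [line]) + "\n"
```

The handler behind the hidden link would call `add_to_blocklist` with the client address and write the result back. Listing the trap URL as disallowed in robots.txt is a common precaution so that well-behaved crawlers never follow it.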

  8. Hi,
    I think this is the problem:
    (Ref. http://summary.net/manual/tutorial/lesson12-3.html)

    Limiting Mirroring Tools

    You may be able to limit the use of mirroring tools by configuring your server to block traffic from particular user agents, such as wget. Unfortunately, it is simple for users of these tools to change the user agent to look like, for example, Microsoft Internet Explorer 5, which you really would not want to block. Unfortunately, beyond that it is quite complicated to stop mirroring. As far as the web server is concerned, it looks very much like regular web site traffic. Some web servers will limit the number of simultaneous connections from a given host, which may help reduce excessive load caused by greedy mirroring software, but a “friendly” mirror is indistinguishable from visitor traffic and virtually impossible to detect and block.

    But a possible solution is in the logs: a log scanner like Fail2ban (http://www.fail2ban.org) can read the relevant part of the Apache (or iptables) log and automatically write the domain from which the scan comes into the .htaccess file (sorry for my English).

    I need more time for an exhaustive search.
    If I find something concrete, I'll let you know. (Google Translate... sic...)

    Ciao
    Maurizio
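
To illustrate the log-scanning idea from comment 8, here is a rough Python sketch that counts requests per client per minute in Apache access-log lines. It assumes the common/combined log format; the function names and the threshold of 60 requests per minute are hypothetical. This is not how Fail2ban itself is implemented.

```python
import re
from collections import Counter

# Start of an Apache common/combined log line: client IP, two fields,
# then the [timestamp]. Minute resolution is enough for a rough count.
LOG_LINE = re.compile(r"^(\S+) \S+ \S+ \[([^\]]+)\]")


def requests_per_minute(lines):
    """Count requests per (ip, minute) from access-log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        ip, stamp = match.groups()
        minute = stamp[:17]  # "10/Oct/2010:13:55" from "10/Oct/2010:13:55:36 +0200"
        counts[(ip, minute)] += 1
    return counts


def heavy_hitters(lines, threshold=60):
    """Return the set of IPs that exceeded threshold requests in any minute."""
    return {ip for (ip, _), n in requests_per_minute(lines).items() if n > threshold}
```

Because it only looks at the client address and request rate, this kind of check does not depend on the user agent, which the quoted passage notes is easy to fake.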

  9. @Rafael Rios: ah yes! I've seen that before, that's a terrific idea, I'll check that out. What would be even nicer is if there's a WordPress plugin that already does that for you! Ideas?

    @txrx: oh, ModX looks *nice* I'll have to look at that, too :)

  10. @Bart: I only have a basic knowledge of WordPress, and I don't know if there is a plugin for that. But I think you can put the "invisible" link in the footer of the page and write a handmade PHP script.
    This was quite some time ago, but I think I have the PHP scripts at home. I can share them with you if you need them.

  11. If you have the url point to a blank page, and then use a server side script to retrieve the actual page.. might work. That way whoever tries to mirror the site only gets a repetitive blank page not even 2 kb in size.

    If you can't block spiders, simply don't redirect.

    Don't ask me how to implement this. It's tricky, but it can be done in PHP.

    Er.. and keep an eye on your .htaccess file.

  12. Brian Treacy on

    Pardon my ignorance, but downloading the entire site -that's not the same as working offline, is it? Have a nice day,
    -Brian

  13. Sites like archive.org and also Google regularly download whole websites. It's how they work. However, they usually do that kinda slowly so that it doesn't affect the functionality of the target website.

  14. I have heard recently that China has its own version of the web which has a sanitised copy of most of the popular websites from the rest of the world. Maybe it was a lazy attempt to copy the BlenderNation website. I have no basis other than hearsay and I am totally open to criticism of my sources! Can you say the same CHINA??? :)

  15. Antonio Pereira on

    I do have a question: is there a place on BlenderNation where you can download tutorials in a more organized way instead of hunting through the whole site? Thanks!!

  16. It's strange that someone wanted to download the BlenderNation website, which is mostly a news site. It's annoying if it causes server overload, but this is not necessarily an attack. It might just be a lame user trying to download one section of the site with the wrong download depth settings, thus ripping the whole site, and using several streams to download faster. I myself mirror pages from time to time, usually API references or something, to be able to browse offline when I stay at places with limited Internet connectivity.

  17. Ahhh, so that is why I couldn't check my favorite website yesterday. Well, that was strange. An IP ban would do well, but PHP would probably help too. Just saying.

  18. @txrx How do you want to block XSS attacks with IP bans? XSS = Cross Site Scripting. That doesn't have anything to do with bad guys flooding the server.

    @krizas A normal user shouldn't be able to crash a server without crashing his own internet connection first. Please don't forget that a server's connection is not comparable to a normal DSL connection.

    @all Most guys here forget that CGI programs, PHP scripts etc. are executed by a webserver and therefore consume a lot of system resources. Blocking mass requests has to be done before server instances are spawned.

  19. @Manko10: you are right, but as Bart said, it was one person downloading (crawling?) BlenderNation and not a DoS attack (or similar) from several places around the world. So blocking it in .htaccess would be enough.
    From time to time someone tries to crawl several pages, like a tutorial series, with a badly configured crawler, causing problems on the site's server.
    One user can bring down a server, even with a slow DSL connection. It all depends on how the site has been built, i.e. time-consuming templates, tons of SQL statements to build one page, etc. If at some moment you get more requests than usual, you can be in trouble.

  20. I haven't read the responses because I'm more than sure that you can muscle WordPress into doing whatever you want. I'm also more than sure this is handled or about to be. :] Good game Hong Kong Phooey..

  21. Hey, my webmaster suggested a simple solution: go to your Apache server interface and set your firewall to block that IP. If it is one individual, you can make a single rule with a string of known IPs. That should fix the problem with a minimum of effort.

  22. jim ww: unless they can alter their ip - a trip to youtube shows how unbelievably easy this is. -with a program of course.

  23. I use fail2ban to prevent such things very effectively. E.g. if someone tries to fetch a page which does not exist 8 times within a defined time frame, that IP is blocked for some hours (it's up to you to define how long the block lasts). That's just one example of what is possible.

    The block uses standard iptables rules (that's the standard firewall in Linux). To do that it monitors your Apache log files (I, however, use lighttpd with my own rules; works like a charm). It only makes sense if you have root access to the server, of course.
    It is very easy to set up your own rules. It uses regex (uh!), which allows very complex rules, but a regex test utility is included which makes testing your own rules a piece of cake. It doesn't eat much of your server's CPU (1% maybe), and whitelist rules are also possible (and I'd recommend using them :) ).

    It's in the official Ubuntu and Fedora repositories, and surely in Debian as well, with some standard rulesets included.

    More details can be found on their page.

    If you can Python-script your Blender, setting this up will be no problem :) But be sure to test it before using it on this production site; if you use it the wrong way, you will soon prevent your honest visitors from looking at BlenderNation!

    Regards,
    Herr Irrtum!
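
The rule described in comment 23 (a number of failures within a time frame leads to a ban of limited duration) can be sketched in a few lines of Python. This is only an illustration of the logic, not fail2ban's code; the class name is hypothetical and the parameter names are loosely modelled on fail2ban's `maxretry`, `findtime` and `bantime` settings. The default of 8 failures comes from the comment, the other defaults are assumptions.

```python
from collections import defaultdict, deque


class FailureBanner:
    """Ban an IP after max_retry failures within find_time seconds."""

    def __init__(self, max_retry=8, find_time=600, ban_time=3600):
        self.max_retry = max_retry
        self.find_time = find_time
        self.ban_time = ban_time
        self._failures = defaultdict(deque)  # ip -> timestamps of recent failures
        self._banned_until = {}              # ip -> time when the ban expires

    def record_failure(self, ip, now):
        """Register one failure (e.g. a 404); return True if ip is now banned."""
        recent = self._failures[ip]
        recent.append(now)
        # Drop failures that fell out of the observation window.
        while recent and now - recent[0] > self.find_time:
            recent.popleft()
        if len(recent) >= self.max_retry:
            self._banned_until[ip] = now + self.ban_time
            recent.clear()
        return self.is_banned(ip, now)

    def is_banned(self, ip, now):
        """True while the ban for ip has not yet expired."""
        return self._banned_until.get(ip, 0) > now
```

Timestamps are passed in explicitly (seconds as numbers), which keeps the sketch easy to test; a real deployment would take them from the log entries and translate bans into firewall rules.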

  24. More than likely an automated bot going through your site. One thing 99% of bots have in common is that they connect directly using your IP address rather than your DNS name.

    I set up an Apache rewrite rule similar to the following:

    RewriteEngine On

    # Blacklist bots via server name
    # Server's DNS name
    RewriteCond %{SERVER_NAME} !www\.foobar\.com
    # LAN IP address
    RewriteCond %{SERVER_NAME} !192\.168\.0\.5
    # Loopback address
    RewriteCond %{SERVER_NAME} !127\.0\.0\.1
    # The [P] flag proxies the request and needs mod_proxy
    RewriteRule ^.*$ http://127.0.0.1/blacklist?ip=%{REMOTE_ADDR} [P]

    # Blacklist bots via HTTP host name
    RewriteCond %{HTTP_HOST} !www\.foobar\.com
    RewriteCond %{HTTP_HOST} !192\.168\.0\.5
    RewriteCond %{HTTP_HOST} !127\.0\.0\.1
    RewriteRule ^.*$ http://127.0.0.1/blacklist?ip=%{REMOTE_ADDR} [P]

    The blacklist function that it is redirected to returns a nasty little message ;) and then adds the ip address to the iptables blacklist using ipset (you have to have the ipset kernel module installed, as well as an iptables rule to use it)

    Programmed mine in ruby, but I'm sure you could call the ipset command just as easily in PHP. You may have to set up a sudo rule to allow the apache user to call the ipset command.

    Browsers always fill SERVER_NAME and HTTP_HOST with the DNS name, but when bots connect, those values will either not be set or be set to the server's IP address.
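
The commenter's blacklist handler was written in Ruby and is not shown. As a hedged Python sketch of the two pieces involved, the code below checks the Host header against the expected DNS name and builds the `ipset add` command for an offending address. The function names and the set name `blacklist` are hypothetical; `www.foobar.com` is the placeholder from the rules above.

```python
import ipaddress

ALLOWED_HOSTS = {"www.foobar.com"}  # placeholder name taken from the rules above


def is_suspicious_host(host_header):
    """True if the Host header is missing or not one of our DNS names."""
    if not host_header:
        return True
    host = host_header.split(":", 1)[0].strip().lower()  # drop an optional port
    return host not in ALLOWED_HOSTS


def ipset_add_command(ip, set_name="blacklist"):
    """Build (but do not run) the command that adds ip to an ipset set."""
    addr = ipaddress.ip_address(ip)  # raises ValueError on malformed input
    return ["sudo", "ipset", "add", set_name, str(addr)]
```

The returned list could be passed to `subprocess.run(cmd, check=True)`. Keeping the command as a list avoids shell quoting problems, and validating the address first means nothing other than an IP ever reaches the command line. Addresses the rules above also whitelist, such as the LAN IP, would need to be added to `ALLOWED_HOSTS`.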

  25. Though this is not a technical measure, and trusting that they are not doing this with evil intentions: how about offering a paid service that lets anyone get a copy of the site contents (in static HTML)?

    edit: * I meant the entire site contents in a zip archive. It might be a valuable service for those living in countries where internet connections are still unstable or foreign sites are blocked by a national firewall.

  26. Some somewhat simple things to try that should stop the script-kiddies.

    From http://forum.httrack.com/readmsg/11552/143/index.html?q=prevent+website+mirroring -
    "there is a way to protect sites in general against grabbers like httrack.
    if you detect a very high rate of requests from one IP over a specific period
    of time (maybe more than one request per second for more than 10 minutes),
    then
    a) you can block this IP permanently
    b) you can block this IP a specific amount of time (e.g. 24 hours)
    c) ask the user to insert a code embedded in a graphic (captcha). "
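
One possible reading of the quoted rule (more than one request per second, sustained over ten minutes) is sketched below in Python. The class and function names are hypothetical, one detector instance per client IP is assumed, and the escalation order from captcha to temporary to permanent block is my own assumption; the quote lists the three responses only as alternatives.

```python
from collections import deque


class SustainedRateDetector:
    """Flag a client whose average rate stays above max_rate for a full window."""

    def __init__(self, max_rate=1.0, window=600.0):
        self.limit = max_rate * window   # requests allowed per window
        self.window = window
        self._hits = deque()             # timestamps inside the current window

    def hit(self, now):
        """Record one request at time now; return True if the limit is exceeded."""
        self._hits.append(now)
        while self._hits and now - self._hits[0] >= self.window:
            self._hits.popleft()
        return len(self._hits) > self.limit


def choose_action(offences):
    """Escalate: captcha first, then a 24-hour block, then a permanent block."""
    if offences <= 1:
        return "captcha"
    if offences == 2:
        return "block_24h"
    return "block_permanent"
```

With the defaults, a client is flagged only once it has made more than 600 requests within the last 600 seconds, so short bursts from a normal browser loading a page with many images do not trigger it.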

    Additional protection can be had with a beefed-up firewall like http://www.modsecurity.org

    and protect WP with a right-click disabler like PreventCopyBlogs plugin from http://www.techtipsmaster.com/wp-preventcopyblogs.html

    and use .htaccess to protect spiders.txt and your critical WP files.

  27. Hi,
    I'm no expert in these things, so I may be wrong. But as far as I know, most Internet users get a temporary IP from their provider. So if you ban or block a specific IP, you don't "get" the specific user who caused the trouble, because next time he will be surfing with a different IP. And the blocked IP goes to someone else, who isn't responsible for the trouble but won't be able to see BlenderNation anyway. Isn't that so?

  28. @Barts: the problem is that hammering a website with, say, 60 requests per minute can result in severe load problems.

    @Everyone else: thanks for all the suggestions! I'm currently thinking about the best strategy..

  29. I saw somewhere an image with a link on it that kills website crawlers and downloaders, but can't seem to find a link to it.

    I remember that it's supposed to "kill" the crawler and stop it but don't remember how.

    Hope this helps a little.

  30. Maybe there could be a 'Download Blendernation' option?

    Every 48 hours, compile all the files on the website into a single zip: a static version, so there is no PHP processing on the server side, stored in one location. And let people download that? Then they are only making one connection instead of, like, 60+.

    Just don't include things like videos etc, make those links to the actual source, but include the pictures, text etc.

    Might be an option?
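
A rough Python sketch of the snapshot idea in comment 30: zip a directory of already-generated static files while leaving out videos. The function name and the list of video extensions are assumptions, and producing the static HTML from WordPress in the first place is outside the scope of this sketch.

```python
import zipfile
from pathlib import Path

SKIPPED_SUFFIXES = {".mp4", ".avi", ".mov", ".flv"}  # assumed video extensions


def build_snapshot(static_root, archive_path):
    """Zip every non-video file under static_root; return the number added."""
    root = Path(static_root)
    archive = Path(archive_path).resolve()
    added = 0
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            if not path.is_file() or path.suffix.lower() in SKIPPED_SUFFIXES:
                continue
            if path.resolve() == archive:
                continue  # never add the archive to itself
            zf.write(path, path.relative_to(root).as_posix())
            added += 1
    return added
```

Run from a scheduled job (every 48 hours, as the comment suggests), this would let visitors fetch one pre-built file instead of crawling every page.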

  31. Maybe some Chinese simply see this as a possibility to make money, by copying the site and then burning it to DVD...
    Even though everything is freely available to all, they may think others haven't 'found' your website?
    I'm here 'on site' and know how the Chinese try every way to make money..

  32. Perhaps it's just some dude who's paranoid that BN will close down the moment he turns his back? Well, he damn nearly caused it, didn't he? If only temporarily ...

    Personally, I'm inclined to go with Burana's opinion. Video game and movie piracy is very prevalent where I live, and it isn't a long stretch to assume that there are people who are willing to rip off and recompile tutorials in order to make a quick buck. Still, since most of the content featured here is actually hosted on other websites (like videos from Vimeo, or the image & text tutorials from their respective websites), I don't see why they bother downloading the webpages themselves.

    I mean, it's sorta like downloading an entire forum thread just to obtain an avatar image that was hosted on Photobucket or something.

  33. Dear Bart

    Simply adding the IP address to the list will not work; as someone pointed out, next time they'll have a different IP.

    Solutions

    1) With the IP, use geolocation (e.g. MaxMind); this will give you a range of IPs for that location or ISP. This may well help; it'd be like blocking a city, effectively.

    2) Google "block a country". Drastic, I know.

    3) Education: take the site down for a day next time this happens, and leave a page up explaining why it happened (e.g. this person).

    4) Most of these except 2 will require lots of ongoing work. Talk to your hosting company about which Apache mods are available and allowed, and see if there is an IP time/volume limiter, as chances are the user will be on the same IP the whole time this happens.

    5) Accept this, and move along.

    Sorry if they don't all meet with your approval, but it's about as complete as it can be. I know there are some solutions you don't want to hear (2?), but it is an option, and that's all I present it as.

    Cheers
    Joe.
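
Blocking a whole range rather than a single address, as option 1 in comment 33 suggests, comes down to a network membership test. The Python sketch below uses the standard `ipaddress` module. The ranges shown are documentation-only example networks, not real ISP ranges, and the function name is hypothetical; in practice the list would come from a geolocation or ISP database.

```python
import ipaddress

# Documentation-only example ranges (RFC 5737), standing in for real ISP ranges.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/25"),
]


def is_blocked(ip, networks=BLOCKED_NETWORKS):
    """True if ip falls inside any blocked network range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)
```

The wider the range, the more uninvolved visitors are caught by it, which is the trade-off raised in comment 27.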
