flat assembler
Message board for the users of flat assembler.

Index > Heap > Fossil's antibot protection

JohnFound
Joined: 16 Jun 2003
Posts: 3500
Location: Bulgaria
revolution wrote:
JohnFound: Every link on that page points to the same page that says:
Access by spiders and robots is forbidden
You might want to fix that.


It is because of security reasons and your disabled JS. The fix is easy - you simply need to go to the Login page and log in with the "anonymous" user name and the provided one-time password. Then you can browse all pages in the repository, create new tickets and append content to the wiki pages.

_________________
Tox ID: 48C0321ADDB2FE5F644BB5E3D58B0D58C35E5BCBC81D7CD333633FEDF1047914A534256478D9
Post 16 Mar 2014, 12:01
revolution
When all else fails, read the source
Joined: 24 Aug 2004
Posts: 17279
Location: In your JS exploiting you and your system
JohnFound wrote:
It is because of security reasons ...
Whose security is it for? And how does that enhance security? Security of what? It appears to be lowering my security if I am expected to have JS enabled.

I don't understand the rationale behind this. If there is a real purpose behind it then I can accept the need for it. But using JS wouldn't stop me from spidering the site if I so desired. A quick script could do it easily.
Post 16 Mar 2014, 13:12
JohnFound
Joined: 16 Jun 2003
Posts: 3500
Location: Bulgaria
revolution: The software repository, as a web site, contains thousands of links pointing to all versions of all files and to other pages, such as diffs between any two versions of a file. These pages should definitely not be indexed by the search engines. On the other hand, requiring users to log in just to browse files is not good either.

So, the question is how to identify the spiders. Fossil uses JS to set the links on the page some time after the page is loaded (0.5 s). This way, for a spider, all links lead to the honeypot page.
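Roughly, the trick can be sketched like this (only an illustration of the idea, not the actual Fossil code; the data attribute name is invented):

Code:
// Links initially point at the honeypot; a timer rewrites them to their
// real targets about half a second after the page has loaded, so a
// simple spider that does not run JS never sees the real URLs.
window.addEventListener("load", () => {
  setTimeout(() => {
    document.querySelectorAll("a[data-real-href]").forEach(a => {
      (a as HTMLAnchorElement).href = a.getAttribute("data-real-href")!;
    });
  }, 500); // the ~0.5 s delay mentioned above
});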

If you have a better idea of how to protect the repository without using JS, share it. I can try to suggest it on the Fossil mailing list. Also, if you have an idea of how to break this protection, a detailed explanation would be welcome.

This protection applies only to users that are not logged in, so it respects your preference not to use JS. You have a way to browse everything, anonymously and without affecting your security: simply log in.

If you don't want to log in, simply don't browse the repository. It probably does not contain any valuable source code anyway.
Post 16 Mar 2014, 13:46
revolution
When all else fails, read the source
Joined: 24 Aug 2004
Posts: 17279
Location: In your JS exploiting you and your system
JohnFound wrote:
These pages should definitely not be indexed by the search engines.
That is what robots.txt is used for. Search engines are not a problem here.
JohnFound wrote:
So, the question is how to identify the spiders.
You can't. I can make my user agent be anything I want in a script. BTW: I won't be spidering your site. I do have respect for other people's things.
JohnFound wrote:
If you have a better idea of how to protect the repository without using JS, share it
Many spiders available now DO execute JS, so it does not even protect against that. But it seems it is not really for security; instead it is for limiting the bandwidth used by simple spiders run by naive web users?

I would suggest creating a suitable robots.txt file and forgetting about the spiders. Your site will get spidered anyway if someone really wants to do it.
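For example, something along these lines keeps well-behaved crawlers out of the noisy pages (the paths below are placeholders, not the repository's real URL scheme):

Code:
# Placeholder paths - adjust to the repository's actual URL layout.
User-agent: *
Disallow: /diff
Disallow: /annotate
Disallow: /artifact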
Post 16 Mar 2014, 14:06
revolution
When all else fails, read the source
Joined: 24 Aug 2004
Posts: 17279
Location: In your JS exploiting you and your system
Okay, I just realised an even easier method to spider the site, and it doesn't even require coding knowledge.

1. Log in and get a cookie.
2. Pass the cookie to the spider and let it run.

I think this fossil security mechanism is very poor and is only giving the users a false sense of security at best.
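Scripted, those two steps amount to roughly this (purely illustrative - the URL and form field names are placeholders, and an off-the-shelf spider that accepts a cookie needs no code at all):

Code:
// Step 1: log in once and capture the session cookie.
const login = await fetch("https://fossil.example/login", {
  method: "POST",
  body: new URLSearchParams({ u: "anonymous", p: "one-time-password" }),
  redirect: "manual",
});
const cookie = login.headers.get("set-cookie") ?? "";

// Step 2: every request the spider makes simply replays that cookie.
const page = await fetch("https://fossil.example/some/page", {
  headers: { cookie },
});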
Post 16 Mar 2014, 14:16
JohnFound
Joined: 16 Jun 2003
Posts: 3500
Location: Bulgaria
This method works only for a targeted attack. For regular spiders it will not work so well. The protection is not against a hacker attack; it is against the wild spiders.

As I said, you are free to suggest a better solution, either here or on the Fossil mailing list.
Post 16 Mar 2014, 14:48
LocoDelAssembly
Your code has a bug
Joined: 06 May 2005
Posts: 4633
Location: Argentina
Is there any official documentation of what this protection was put in place for?

If it is meant to limit the rate at which pages are visited, then I think exponential backoff by IP address could be a good option, where the time to serve the response would rise whenever the same IP visits again less than 0.5 seconds after the last byte of the previous request was sent. (All this implemented server-side, of course.)
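Something like this, for instance (only a sketch of the idea as a few lines of server code; the numbers and names are arbitrary):

Code:
// Exponential backoff per IP: a revisit within 0.5 s of the previous
// response doubles the added delay (capped), otherwise the delay resets.
import http from "node:http";

const lastSeen = new Map<string, { time: number; delayMs: number }>();

const server = http.createServer((req, res) => {
  const ip = req.socket.remoteAddress ?? "unknown";
  const entry = lastSeen.get(ip) ?? { time: 0, delayMs: 0 };

  entry.delayMs = Date.now() - entry.time < 500
    ? Math.min(entry.delayMs * 2 || 250, 30_000) // double, starting at 250 ms
    : 0;                                         // polite visitor: no delay
  lastSeen.set(ip, entry);

  setTimeout(() => {
    res.end("page content\n");
    entry.time = Date.now(); // measured from the end of the response
  }, entry.delayMs);
});

server.listen(8080);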
Post 16 Mar 2014, 16:02
JohnFound
Joined: 16 Jun 2003
Posts: 3500
Location: Bulgaria
Post 16 Mar 2014, 16:40
LocoDelAssembly
Your code has a bug
Joined: 06 May 2005
Posts: 4633
Location: Argentina
Looks like the ultimate goal is to limit the request rate rather than to make sure a human interacts with the site.

I think something can be built around the exponential backoff idea on the server side. The metrics used to add more delay would be the lengths of request bursts and how many requests were served in the last minute or so (i.e. keep some history, so that a bot cannot bypass the protection by simply requesting at a fixed rate just below the maximum requests per second). Also, if the bandwidth and CPU usage are easy to compute, those should be added to the metrics.

This is probably not effective against large botnets, but I believe it is better than the JS idea.

PS: Also, the number of open connections per IP should be limited by refusing further connections rather than delaying the response.
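A history-based variant could look roughly like this (again only a sketch; the window length and thresholds are arbitrary):

Code:
// Keep per-IP timestamps of requests served in the last minute and scale
// the added delay with the per-minute count and the current burst length.
const WINDOW_MS = 60_000;
const history = new Map<string, number[]>();

function delayFor(ip: string): number {
  const now = Date.now();
  const stamps = (history.get(ip) ?? []).filter(t => now - t < WINDOW_MS);
  stamps.push(now);
  history.set(ip, stamps);

  // Burst length: consecutive requests spaced less than 0.5 s apart.
  let burst = 1;
  for (let i = stamps.length - 1; i > 0 && stamps[i] - stamps[i - 1] < 500; i--) {
    burst++;
  }

  if (stamps.length <= 30 && burst <= 5) return 0;           // looks human
  return Math.min(250 * burst + 50 * stamps.length, 30_000); // delay in ms
}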
Post 16 Mar 2014, 19:20
typedef
Joined: 25 Jul 2010
Posts: 2913
Location: 0x77760000
hahahaha

Protecting against indexing robots is understandable, but if you were an actual target of an attack bot you wouldn't last long with that kind of protection.

Anyway, hint hint
Post 16 Mar 2014, 23:08
revolution
When all else fails, read the source
Joined: 24 Aug 2004
Posts: 17279
Location: In your JS exploiting you and your system
In the general case, any protection technique that relies upon the client side respecting the constraints is never the right long-term solution. Only server-side mechanisms can work for this.

Even something simple like each IP getting a quota of, say, 20 requests initially, with one more request added per minute, would work. It would be easy to implement and it rate-limits a spider/bot without harming normal human surfing. The site random.org uses this quota method.
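Server-side that is just a small token bucket, something like this (a sketch only, using the example numbers above):

Code:
// Each IP starts with 20 request tokens and earns one more per minute,
// up to the cap; a request that finds no tokens left is refused.
const MAX_TOKENS = 20;
const REFILL_MS = 60_000;
const buckets = new Map<string, { tokens: number; last: number }>();

function allowRequest(ip: string): boolean {
  const now = Date.now();
  const b = buckets.get(ip) ?? { tokens: MAX_TOKENS, last: now };
  const earned = Math.floor((now - b.last) / REFILL_MS);
  if (earned > 0) {
    b.tokens = Math.min(MAX_TOKENS, b.tokens + earned);
    b.last += earned * REFILL_MS;
  }
  buckets.set(ip, b);
  if (b.tokens === 0) return false; // over quota: refuse (or heavily delay)
  b.tokens -= 1;
  return true;
}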
Post 17 Mar 2014, 12:40
JohnFound
Joined: 16 Jun 2003
Posts: 3500
Location: Bulgaria
revolution, you are making one mistake here (at least from my point of view).

The main goal is not to lower the server load. The goal is to stop the search engines from indexing millions of useless pages in the repository and to limit them to only the main pages of the repository - notice that the menu links are always active, so the bots can freely crawl them.

Even if the server limits the allowed requests, sooner or later the pages we don't want will be indexed, will flood the search engine index, and will ruin the SEO of the repository.

The lower server load is only a side effect.
Post 17 Mar 2014, 15:36
revolution
When all else fails, read the source
Joined: 24 Aug 2004
Posts: 17279
Location: In your JS exploiting you and your system
robots.txt will stop the search engines from indexing the "millions of useless pages". I mentioned that above.
Post 17 Mar 2014, 15:45
JohnFound
Joined: 16 Jun 2003
Posts: 3500
Location: Bulgaria
AFAIK, not all search engines respect robots.txt.
Post 17 Mar 2014, 15:48
revolution
When all else fails, read the source
Joined: 24 Aug 2004
Posts: 17279
Location: In your JS exploiting you and your system
All important engines fully respect robots.txt AND run JS. All rogue engines do whatever they feel like AND run JS. IMO JS is not a proper solution here.
Post 17 Mar 2014, 15:57
badc0de02
Joined: 25 Nov 2013
Posts: 216
Location: %x
Happy st patricks day!
Post 17 Mar 2014, 16:02
typedef
Joined: 25 Jul 2010
Posts: 2913
Location: 0x77760000
badc0de02 wrote:
Happy st patricks day!

What? Feeling lucky already?
Post 17 Mar 2014, 16:17
badc0de02
Joined: 25 Nov 2013
Posts: 216
Location: %x
Quote:
What? Feeling lucky already?

Today was St. Patrick's Day, and
Quote:
Happy st patricks day!

does not mean I'm lucky.
Now I don't understand what you mean - do you feel lucky, or am I supposed to feel lucky? Please say clearly what you want, typedef.
Post 17 Mar 2014, 22:11