[penguicon-general] anyone out there know....??
Rick Scott
rick at shadowspar.dyndns.org
Mon Sep 8 12:26:42 EDT 2008
(Lady Sarah:)
> What he's saying is that he should be exempt from our "no more
> than 250 results per set of criteria entered" rule because each
> new page is a new query to the database, and therefore a new
> search, and the data can NOT be scraped this way.
>
> Is he telling me the truth? Or has there just not been a hacker
> clever enough to pull the data from their site yet?

Saying that the data can't be scraped out because it's paginated and
skipping to the next page requires JavaScript? I can't say for sure
without looking at it, but if it's like most such sites, I could work
around it in a day. Someone who knows what they are doing could
probably do it in an hour.

Most web-bots and other such automatic page-fetching tools don't
execute JavaScript, so a site that requires it to get results out
is more difficult to scrape. Usually the javascripty bits can be
worked around with a bit of cleverness: in the end the script just
makes the browser send another request to the server, and a bot can
send that same request itself. Alternatively, you can just use a
tool like Selenium RC, which lets you write an automated script that
drives a real web browser.
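
To make the pagination point concrete, here is a rough sketch in
Python of why "each new page is a new query" doesn't stop a scraper.
Everything in it is made up for illustration: fake_fetch_page stands
in for whatever request the site's "next page" button really sends,
and FAKE_DB and PAGE_SIZE stand in for their database. I haven't
looked at the actual site, so treat it as the general shape of the
loop, not a working exploit.

```python
from typing import Callable, Iterator, List


def scrape_all_pages(
    fetch_page: Callable[[int], List[str]],
    max_pages: int = 10_000,
) -> Iterator[str]:
    """Yield every row by asking for page 1, 2, 3, ... until one is empty."""
    for page in range(1, max_pages + 1):
        rows = fetch_page(page)
        if not rows:
            # An empty page means we've walked off the end of the results.
            return
        yield from rows


# Hypothetical stand-in for the real site: 1000 records, 25 per page.
FAKE_DB = [f"record-{i}" for i in range(1000)]
PAGE_SIZE = 25


def fake_fetch_page(page: int) -> List[str]:
    # A real scraper would send the HTTP request for this page here.
    start = (page - 1) * PAGE_SIZE
    return FAKE_DB[start:start + PAGE_SIZE]


scraped = list(scrape_all_pages(fake_fetch_page))
print(len(scraped))  # prints 1000
```

Each page is indeed a separate query, but the loop just keeps issuing
them until the results run out, so it ends up with every record.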

I'm not saying that the 250-hit limit per search is a great solution
either, but it probably makes it more difficult to scrape out your
entire database than whatever JavaScript this guy has implemented.

Cheers,
Rick
--
key CF8F8A75 / print C5C1 F87D 5056 D2C0 D5CE D58F 970F 04D1 CF8F 8A75
Try not! Do, or do not. There is no "try".
:Yoda