[penguicon-general] anyone out there know....??
Lady Sarah
ladysarahmarie at gmail.com
Mon Sep 8 13:43:04 EDT 2008
Ooo.... good info! Thanks! :)
*~*~*~*~*~*~*~*
Lady Sarah, que_sara_sara
"It only takes 20 years for a liberal to become a conservative without
changing a single idea." ~~Robert Anton Wilson
Music Programming, Penguicon 7.0
The Chocolate Goddess, coming soon to a con near you
Warrior Princess of the Clan of the Lonely Goatherd
IWG Wench #539 MCL, Local 69
Scarlet B. Harlot, Figure Head for the Scarlet Harlot -- Privateer #36
W3NCH, the HAM Radio Wench
"...because you can't spell Wench with an 8!"
On Mon, Sep 8, 2008 at 12:26 PM, Rick Scott <rick at shadowspar.dyndns.org> wrote:
> (Lady Sarah:)
> > What he's saying is that he should be exempt from our rule of "no
> > more than 250 results per set of criteria entered" because each
> > new page is a new query to the database, and therefore a new
> > search, and the data can NOT be scraped this way.
> >
> > Is he telling me the truth? Or has there just not been a hacker
> > clever enough to pull the data from their site yet?
>
> Saying that the data can't be scraped out because it's paginated and
> skipping to the next page requires javascript? I can't say for sure
> without looking at it, but if it's like most such sites, I could work
> around it in a day. Someone who knows what they are doing could
> probably do it in an hour.
>
> Most web-bots and other such automatic page-fetching tools don't
> implement javascript, so a site that requires it to get results out
> is more difficult to scrape. Usually the javascripty bits can be
> worked around with a bit of cleverness. Alternatively, you can just
> use a tool like Selenium RC which lets you write an automated script
> that drives a real web browser.
>
> I'm not saying that the 250-hit limit per search is a great solution
> either, but it probably makes it more difficult to scrape out your
> entire database than whatever javascript this guy has implemented.
>
> Cheers,
> Rick
> --
> key CF8F8A75 / print C5C1 F87D 5056 D2C0 D5CE D58F 970F 04D1 CF8F 8A75
> Try not! Do, or do not. There is no "try".
> :Yoda
> _______________________________________________
> penguicon-general mailing list
> penguicon-general at penguicon.org
> http://penguicon.org/mailman/listinfo/penguicon-general
>
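
To make Rick's point concrete, here is a minimal sketch in Python. Nothing in
it touches a real site: RECORDS, PAGE_SIZE, fetch_page and scrape_all are all
made-up names, and the "site" is just a list in memory. It assumes the
simplest case, where the next page can be requested by number. The idea is
only that "each new page is a new query" does not stop a script from asking
for page after page, while a hard cap on results per set of criteria does
limit what one search gives up.

```python
# Hypothetical stand-in for the site's database; no real site is contacted.
RECORDS = [f"record-{i:04d}" for i in range(1000)]
PAGE_SIZE = 25      # assumed page size, purely illustrative
RESULT_CAP = 250    # the "no more than 250 results per set of criteria" rule


def fetch_page(page, cap=None):
    """Simulate one page request. Each call is a fresh query."""
    visible = RECORDS if cap is None else RECORDS[:cap]
    start = page * PAGE_SIZE
    return visible[start:start + PAGE_SIZE]


def scrape_all(cap=None):
    """Request page after page until an empty page comes back."""
    collected = []
    page = 0
    while True:
        rows = fetch_page(page, cap)
        if not rows:
            break
        collected.extend(rows)
        page += 1
    return collected


if __name__ == "__main__":
    print(len(scrape_all()))            # pagination only
    print(len(scrape_all(RESULT_CAP)))  # with the 250-result cap
```

With pagination alone the loop collects every record; with the cap it stops
at 250 for that one search. The cap is per set of criteria, so a determined
scraper could still vary the criteria, which fits Rick's remark that the
limit is not a great solution either.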