The online computer book shop for UK & Europe                                   

Tel: 0121 706 6000 

Static Book Details Page - Computer Manuals Website

 Spidering Hacks
  

  • Published by: O'REILLY & ASSOCIATES
  • Authors: Kevin Hemenway; Tara Calishain
  • Page Count: 400
  • Group: ADVANCED
  • ISBN: 0596005776 / 9780596005771
  • Published: Nov 2003

Our Price: 12.43
Discount: 29%
RRP: 17.50 

Book Information and Description:

Spidering Hacks
The Internet, with its profusion of information, has made us hungry for ever more, ever better data. Out of necessity, many of us have become pretty adept with search engine queries, but there are times when even the most powerful search engines aren't enough. If you've ever wanted your data in a different form than it's presented, or wanted to collect data from several sites and see it side-by-side without the constraints of a browser, then Spidering Hacks is for you.

Spidering Hacks takes you to the next level in Internet data retrieval--beyond search engines--by showing you how to create spiders and bots to retrieve information from your favorite sites and data sources. You'll no longer feel constrained by the way host sites think you want to see their data presented--you'll learn how to scrape and repurpose raw data so you can view it in a way that's meaningful to you.

Written for developers, researchers, technical assistants, librarians, and power users, Spidering Hacks provides expert tips on spidering and scraping methodologies. You'll begin with a crash course in spidering concepts, tools (Perl, LWP, out-of-the-box utilities), and ethics (how to know when you've gone too far: what's acceptable and unacceptable). Next, you'll collect media files and data from databases. Then you'll learn how to interpret and understand the data, repurpose it for use in other applications, and even build authorized interfaces to integrate the data into your own content. By the time you finish Spidering Hacks, you'll be able to:

  • Aggregate and associate data from disparate locations, then store and manipulate the data as you like
  • Gain a competitive edge in business by knowing when competitors' products are on sale, and by comparing sales ranks and product placement on e-commerce sites
  • Integrate third-party data into your own applications or web sites
  • Make your own site easier to scrape and more usable to others
  • Keep up-to-date with your favorite comic strips, news stories, stock tips, and more without visiting the site every day

Contents

Chapter 1. Walking Softly
       1. A Crash Course in Spidering and Scraping
       2. Best Practices for You and Your Spider
       3. Anatomy of an HTML Page
       4. Registering Your Spider
       5. Preempting Discovery
       6. Keeping Your Spider Out of Sticky Situations
       7. Finding the Patterns of Identifiers

Chapter 2. Assembling a Toolbox
    Perl Modules
    Resources You May Find Helpful
       8. Installing Perl Modules
       9. Simply Fetching with LWP::Simple
       10. More Involved Requests with LWP::UserAgent
       11. Adding HTTP Headers to Your Request
       12. Posting Form Data with LWP
       13. Authentication, Cookies, and Proxies
       14. Handling Relative and Absolute URLs
       15. Secured Access and Browser Attributes
       16. Respecting Your Scrapee's Bandwidth
       17. Respecting robots.txt
       18. Adding Progress Bars to Your Scripts
       19. Scraping with HTML::TreeBuilder
       20. Parsing with HTML::TokeParser
       21. WWW::Mechanize 101
       22. Scraping with WWW::Mechanize
       23. In Praise of Regular Expressions
       24. Painless RSS with Template::Extract
       25. A Quick Introduction to XPath
       26. Downloading with curl and wget
       27. More Advanced wget Techniques
       28. Using Pipes to Chain Commands
       29. Running Multiple Utilities at Once
       30. Utilizing the Web Scraping Proxy
       31. Being Warned When Things Go Wrong
       32. Being Adaptive to Site Redesigns

Chapter 3. Collecting Media Files
       33. Detective Case Study: Newgrounds
       34. Detective Case Study: iFilm
       35. Downloading Movies from the Library of Congress
       36. Downloading Images from Webshots
       37. Downloading Comics with dailystrips
       38. Archiving Your Favorite Webcams
       39. News Wallpaper for Your Site
       40. Saving Only POP3 Email Attachments
       41. Downloading MP3s from a Playlist
       42. Downloading from Usenet with nget

Chapter 4. Gleaning Data from Databases
       43. Archiving Yahoo! Groups Messages with yahoo2mbox
       44. Archiving Yahoo! Groups Messages with WWW::Yahoo::Groups
       45. Gleaning Buzz from Yahoo!
       46. Spidering the Yahoo! Catalog
       47. Tracking Additions to Yahoo!
       48. Scattersearch with Yahoo! and Google
       49. Yahoo! Directory Mindshare in Google
       50. Weblog-Free Google Results
       51. Spidering, Google, and Multiple Domains
       52. Scraping Amazon.com Product Reviews
       53. Receive an Email Alert for Newly Added Amazon.com Reviews
       54. Scraping Amazon.com Customer Advice
       55. Publishing Amazon.com Associates Statistics
       56. Sorting Amazon.com Recommendations by Rating
       57. Related Amazon.com Products with Alexa
       58. Scraping Alexa's Competitive Data with Java
       59. Finding Album Information with FreeDB and Amazon.com
       60. Expanding Your Musical Tastes
       61. Saving Daily Horoscopes to Your iPod
       62. Graphing Data with RRDTOOL
       63. Stocking Up on Financial Quotes
       64. Super Author Searching
       65. Mapping O'Reilly Best Sellers to Library Popularity
       66. Using All Consuming to Get Book Lists
       67. Tracking Packages with FedEx
       68. Checking Blogs for New Comments
       69. Aggregating RSS and Posting Changes
       70. Using the Link Cosmos of Technorati
       71. Finding Related RSS Feeds
       72. Automatically Finding Blogs of Interest
       73. Scraping TV Listings
       74. What's Your Visitor's Weather Like?
       75. Trendspotting with Geotargeting
       76. Getting the Best Travel Route by Train
       77. Geographic Distance and Back Again
       78. Super Word Lookup
       79. Word Associations with Lexical Freenet
       80. Reformatting Bugtraq Reports
       81. Keeping Tabs on the Web via Email
       82. Publish IE's Favorites to Your Web Site
       83. Spidering GameStop.com Game Prices
       84. Bargain Hunting with PHP
       85. Aggregating Multiple Search Engine Results
       86. Robot Karaoke
       87. Searching the Better Business Bureau
       88. Searching for Health Inspections
       89. Filtering for the Naughties

Chapter 5. Maintaining Your Collections
       90. Using cron to Automate Tasks
       91. Scheduling Tasks Without cron
       92. Mirroring Web Sites with wget and rsync
       93. Accumulating Search Results Over Time

Chapter 6. Giving Back to the World
       94. Using XML::RSS to Repurpose Data
       95. Placing RSS Headlines on Your Site
       96. Making Your Resources Scrapable with Regular Expressions
       97. Making Your Resources Scrapable with a REST Interface
       98. Making Your Resources Scrapable with XML-RPC
       99. Creating an IM Interface
       100. Going Beyond the Book

Index

 
