Scrape A Website: June 2015

Thursday, 25 June 2015

Data Scraping - About Hand Scraped Flooring

Hand scraped hardwood flooring is one of the best floors that you can install in your house.

Advantages of Hand Scraped Hardwood Flooring

The product comes with a number of advantages which include:

Antique and modern technology: The floor professionally brings out the best elements of both antique and modern technology. The modern elements are in the quality of the product.

Unique patterns: Who doesn't want to be unique? These floors allow you to create your unique design. If you are going to use a machine, all you need to do is to set the machine such that it creates the pattern that you want. If the floor will be scraped by a craftsman, you should ask the craftsman to craft your desired pattern.

Character: The different depths in the floor provide you with character and color that you can't find in other types of floors. As the sun changes its angle during the day, the nooks and valleys on the board lit differently thus providing your board with an endless rich appearance.

Durability: Experts have been able to show that hand-scraped hardwood retains its look for a long time. If your kid or pet hits the floor, the dent just blends with the rest of the character making it hard for people to tell that there is a dent.

Making the floors shine again

Although, the scraped floors are designed to look worn and aged, they are made from modern wood which needs to be taken care of in order to retain its original look.

To make the floors shine again you need to remove all the dust and dirt that might be causing the wood to look dull.

After doing this you should mix 1 gallon of warm water with ½ teaspoon of dishwashing detergent and use it to clean the surface of the floor. The aim of doing this is to remove any stains that might be on the floor. When you complete doing this you should dampen the piece of cloth with club soda and then use another piece of cloth to buff the wood until it shines.

Conclusion

This is what you need to know about hand scraped hardwood flooring. When cleaning the floors you should avoid using oil based soaps as they dull the surface making your efforts worthless.

If the above method of shining the floor doesn't work, you should mix one part white vinegar and one part of cooking oil and use it to clean the floor.

Source: http://ezinearticles.com/?About-Hand-Scraped-Flooring&id=8990255

Saturday, 20 June 2015

Making data on the web useful: scraping

Introduction

Many times data is not easily accessible – although it does exist. As much as we wish everything was available in CSV or the format of our choice – most data is published in different forms on the web. What if you want to use the data to combine it with other datasets and explore it independently?

Scraping to the rescue!

Scraping describes the method to extract data hidden in documents – such as Web Pages and PDFs and make it useable for further processing. It is among the most useful skills if you set out to investigate data – and most of the time it’s not especially challenging. For the most simple ways of scraping you don’t even need to know how to write code.

This example relies heavily on Google Chrome for the first part. Some things work well with other browsers, however we will be using one specific browser extension only available on Chrome. If you can’t install Chrome, don’t worry the principles remain similar.

Code-free Scraping in 5 minutes using Google Spreadsheets & Google Chrome

Knowing the structure of a website is the first step towards extracting and using the data. Let’s get our data into a spreadsheet – so we can use it further. An easy way to do this is provided by a special formula in Google Spreadsheets.

Save yourselves hours of time in copy-paste agony with the ImportHTML command in Google Spreadsheets. It really is magic!

Recipes

In order to complete the next challenge, take a look in the Handbook at one of the following recipes:

    Extracting data from HTML tables.

    Scraping using the Scraper Extension for Chrome

Both methods are useful for:

    Extracting individual lists or tables from single webpages

The latter can do slightly more complex tasks, such as extracting nested information. Take a look at the recipe for more details.

Neither will work for:

    Extracting data spread across multiple webpages

Challenge

Task: Find a website with a table and scrape the information from it. Share your result on datahub.io (make sure to tag your dataset with schoolofdata.org)

Tip

Once you’ve got your table into the spreadsheet, you may want to move it around, or put it in another sheet. Right click the top left cell and select “paste special” – “paste values only”.

Scraping more than one webpage: Scraperwiki

Note: Before proceeding into full scraping mode, it’s helpful to understand the flesh and bones of what makes up a webpage. Read the Introduction to HTML recipe in the handbook.

Until now we’ve only scraped data from a single webpage. What if there are more? Or you want to scrape complex databases? You’ll need to learn how to program – at least a bit.

It’s beyond the scope of this course to teach how to scrape, our aim here is to help you understand whether it is worth investing your time to learn, and to point you at some useful resources to help you on your way!

Structure of a scraper

Scrapers are comprised of three core parts:

1.    A queue of pages to scrape
2.    An area for structured data to be stored, such as a database
3.    A downloader and parser that adds URLs to the queue and/or structured information to the database.

Fortunately for you there is a good website for programming scrapers: ScraperWiki.com

ScraperWiki has two main functions: You can write scrapers – which are optionally run regularly and the data is available to everyone visiting – or you can request them to write scrapers for you. The latter costs some money – however it helps to contact the Scraperwiki community (Google Group) someone might get excited about your project and help you!.

If you are interested in writing scrapers with Scraperwiki, check out this sample scraper – scraping some data about Parliament. Click View source to see the details. Also check out the Scraperwiki documentation: https://scraperwiki.com/docs/python/

When should I make the investment to learn how to scrape?

A few reasons (non-exhaustive list!):

1.    If you regularly have to extract data where there are numerous tables in one page.

2.    If your information is spread across numerous pages.

3.    If you want to run the scraper regularly (e.g. if information is released every week or month).

4.    If you want things like email alerts if information on a particular webpage changes.

…And you don’t want to pay someone else to do it for you!

Summary:

In this course we’ve covered Web scraping and how to extract data from websites. The main function of scraping is to convert data that is semi-structured into structured data and make it easily useable for further processing. While this is a relatively simple task with a bit of programming – for single webpages it is also feasible without any programming at all. We’ve introduced =importHTML and the Scraper extension for your scraping needs.

Further Reading

1.    Scraping for Journalism: A Guide for Collecting Data: ProPublica Guides

2.    Scraping for Journalists (ebook): Paul Bradshaw

3.    Scrape the Web: Strategies for programming websites that don’t expect it : Talk from PyCon

4.    An Introduction to Compassionate Screen Scraping: Will Larson

Any questions? Got stuck? Ask School of Data!

ScraperWiki has two main functions: You can write scrapers – which are optionally run regularly and the data is available to everyone visiting – or you can request them to write scrapers for you. The latter costs some money – however it helps to contact the Scraperwiki community (Google Group) someone might get excited about your project and help you!.

If you are interested in writing scrapers with Scraperwiki, check out this sample scraper – scraping some data about Parliament. Click View source to see the details. Also check out the Scraperwiki documentation: https://scraperwiki.com/docs/python/

When should I make the investment to learn how to scrape?

A few reasons (non-exhaustive list!):

1.    If you regularly have to extract data where there are numerous tables in one page.

2.    If your information is spread across numerous pages.

3.    If you want to run the scraper regularly (e.g. if information is released every week or month).

4.    If you want things like email alerts if information on a particular webpage changes.

…And you don’t want to pay someone else to do it for you!

Summary:

In this course we’ve covered Web scraping and how to extract data from websites. The main function of scraping is to convert data that is semi-structured into structured data and make it easily useable for further processing. While this is a relatively simple task with a bit of programming – for single webpages it is also feasible without any programming at all. We’ve introduced =importHTML and the Scraper extension for your scraping needs.

Source: http://schoolofdata.org/handbook/courses/scraping/

Monday, 8 June 2015

Scraping Services - Assuring Scraping Success with Proxy Data Scraping

Have you ever heard of "Data Scraping?" Data Scraping is the process of collecting useful data that has been placed in the public domain of the internet (private areas too if conditions are met) and storing it in databases or spreadsheets for later use in various applications. Data Scraping technology is not new and many a successful businessman has made his fortune by taking advantage of data scraping technology.

Sometimes website owners may not derive much pleasure from automated harvesting of their data. Webmasters have learned to disallow web scrapers access to their websites by using tools or methods that block certain ip addresses from retrieving website content. Data scrapers are left with the choice to either target a different website, or to move the harvesting script from computer to computer using a different IP address each time and extract as much data as possible until all of the scraper's computers are eventually blocked.

Thankfully there is a modern solution to this problem. Proxy Data Scraping technology solves the problem by using proxy IP addresses. Every time your data scraping program executes an extraction from a website, the website thinks it is coming from a different IP address. To the website owner, proxy data scraping simply looks like a short period of increased traffic from all around the world. They have very limited and tedious ways of blocking such a script but more importantly -- most of the time, they simply won't know they are being scraped.

You may now be asking yourself, "Where can I get Proxy Data Scraping Technology for my project?" The "do-it-yourself" solution is, rather unfortunately, not simple at all. Setting up a proxy data scraping network takes a lot of time and requires that you either own a bunch of IP addresses and suitable servers to be used as proxies, not to mention the IT guru you need to get everything configured properly. You could consider renting proxy servers from select hosting providers, but that option tends to be quite pricey but arguably better than the alternative: dangerous and unreliable (but free) public proxy servers.

There are literally thousands of free proxy servers located around the globe that are simple enough to use. The trick however is finding them. Many sites list hundreds of servers, but locating one that is working, open, and supports the type of protocols you need can be a lesson in persistence, trial, and error. However if you do succeed in discovering a pool of working public proxies, there are still inherent dangers of using them. First off, you don't know who the server belongs to or what activities are going on elsewhere on the server. Sending sensitive requests or data through a public proxy is a bad idea. It is fairly easy for a proxy server to capture any information you send through it or that it sends back to you. If you choose the public proxy method, make sure you never send any transaction through that might compromise you or anyone else in case disreputable people are made aware of the data.

A less risky scenario for proxy data scraping is to rent a rotating proxy connection that cycles through a large number of private IP addresses. There are several of these companies available that claim to delete all web traffic logs which allows you to anonymously harvest the web with minimal threat of reprisal. Companies such as offer large scale anonymous proxy solutions, but often carry a fairly hefty setup fee to get you going.

The other advantage is that companies who own such networks can often help you design and implementation of a custom proxy data scraping program instead of trying to work with a generic scraping bot. After performing a simple Google search, I quickly found one company (www.ScrapeGoat.com) that provides anonymous proxy server access for data scraping purposes. Or, according to their website, if you want to make your life even easier, ScrapeGoat can extract the data for you and deliver it in a variety of different formats often before you could even finish configuring your off the shelf data scraping program.

Whichever path you choose for your proxy data scraping needs, don't let a few simple tricks thwart you from accessing all the wonderful information stored on the world wide web!

Source: http://ezinearticles.com/?Assuring-Scraping-Success-with-Proxy-Data-Scraping&id=248993

Scrape A Website