Web Scraping with AppleScript and JavaScript

There are many tools one can use to scrape data from the web. One combination I came upon is AppleScript with JavaScript. AppleScript controls applications on macOS, in this case Safari and Numbers. JavaScript interacts with the web page after it has loaded in Safari. All the extracted data is then written to a table in a Numbers spreadsheet.

The benefit of this combination is that Safari first loads all of the page’s HTML and runs its JavaScript, and only then do you start interacting with the page to extract information. Compared with using BeautifulSoup, this method is slower, but it lets you scrape web pages whose usable data only exists after the page’s JavaScript has run. BeautifulSoup only parses the HTML it is given and does not run JavaScript, so content that a page generates with scripts never shows up in what it sees.

The process goes something like this:

  1. Use AppleScript to load the webpage in Safari.
  2. With AppleScript’s ‘do JavaScript’ command, extract the first row of data you need and put its fields into AppleScript variables.
  3. Put the above step into a loop to extract all the rows you need. This can span multiple pages in Safari.
  4. Once you have all your data in AppleScript lists (arrays), write them line by line to a Numbers spreadsheet using AppleScript.
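
To make step 2 more concrete, here is a rough sketch of the kind of JavaScript that could be handed to ‘do JavaScript’. It is only an illustration under assumptions: the CSS selectors and the function names `isPageReady` and `extractRow` are hypothetical and would need to match the page you are actually scraping. Each row comes back as a single tab-separated string, which AppleScript can then split into fields.

```javascript
// Sketch only: the selectors below are hypothetical placeholders,
// not taken from any real page. Adjust them to the page being scraped.
const ROW_SELECTOR = ".degree-card";
const FIELD_SELECTORS = [".degree-name", ".university-name"];

// True once the page has finished loading and at least one row is present.
function isPageReady(doc) {
  return doc.readyState === "complete" &&
    doc.querySelectorAll(ROW_SELECTOR).length > 0;
}

// Returns the fields of row `index` joined by tabs,
// or an empty string when there is no such row.
function extractRow(doc, index) {
  const rows = doc.querySelectorAll(ROW_SELECTOR);
  if (index < 0 || index >= rows.length) return "";
  return FIELD_SELECTORS.map((selector) => {
    const element = rows[index].querySelector(selector);
    // A missing field becomes an empty string so the columns stay aligned.
    return element ? element.textContent.trim() : "";
  }).join("\t");
}
```

Inside Safari these would be called with the page’s own `document`, for example `extractRow(document, 0)` as the last expression of the script text. The AppleScript loop from step 3 can then increase the index until it gets an empty string back. Safari has to permit this first, through the ‘Allow JavaScript from Apple Events’ option in its developer settings.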

Alternatively, if you don’t want to keep track of AppleScript lists, you can have your script write to the Numbers spreadsheet after each row of data it reads from the web page. This way, you only need one variable per field, and each one is overwritten for every new row.

For example, the following script loads a page from Coursera and extracts all the business degrees they offer:

https://github.com/chinwayland/applescripts/blob/master/Web%20Scraping%20-%20Coursera%20-%20Pull%20Degree%20Info.applescript

Here’s a video of the script in action:
