This post documents installing Scrapy on a MacBook Pro (13-inch, 2017, Four Thunderbolt 3 Ports) running macOS High Sierra. The TL;DR setup steps are:
- Install pip:
sudo easy_install pip
- Install virtualenv:
sudo pip install virtualenv
- Mkdir a directory to be used with Scrapy, create a virtual environment in it, and activate it:
virtualenv ENV
source ENV/bin/activate
- Install Scrapy into this virtualenv:
pip install scrapy
- Get started with Scrapy by following a tutorial, e.g. scraping Craigslist:
scrapy startproject craigslist
The long, real-time documented version including mildly funny commentary can be found below.
I am on High Sierra, 10.13.5, these days, and my laptop is so futuristic it has a touchy touch bar that no one has found a really good use for yet, and its buses are so fresh that no contemporary device can physically connect to them.
However, today I am trying to scrape some web with this amazing shiny piece of aluminium.
After having worked with lots of manual scraping techniques from the command line, like urllib2 and Beautiful Soup and whatnot, today I will give Scrapy a go.
The cute Scrapy spatula icon gives me hope that this “open source and collaborative framework for extracting the data you need from websites” really works “in a fast, simple, yet extensible way.” So here we go.
- This computer is pretty much naked when it comes to any useful command line tools, so to install Scrapy I need pip first.
sudo easy_install pip
Bummer, the fingerprint reader from the future doesn’t work on the command line and I actually need to type out my very safe and strong password. A couple of lines printed in my window. So far so good.
pip install scrapy
A couple of lines dumped in my terminal suggest this is working out until I get some red warnings around pillow, nose and tornado. Dependencies, here we come…
sudo pip install pillow
sudo pip install nose
sudo pip install tornado
Unclear if that just worked? – No, not really.
Cannot uninstall 'six'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
OK, fail, back to start. I take the time and actually read a bit more. It is recommended to install scrapy in a virtual environment. Well here we go.
sudo pip install virtualenv
So far so good, let’s just ignore the warnings for now.
- Looks good. I got my project folder
/Users/.../scrapy
so this is where I create my virtual environment ENV by
virtualenv ENV
- I activate this environment with
source ENV/bin/activate
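If in doubt whether the activation took, a quick Python check shows which interpreter prefix is in use. (This is my own sanity-check sketch, not part of any tutorial.)

```python
import sys

# Inside a virtualenv, sys.prefix points at the ENV directory instead of
# the system Python. Classic virtualenv stashes the original location in
# sys.real_prefix; the newer built-in venv module uses sys.base_prefix.
def in_virtualenv():
    return hasattr(sys, "real_prefix") or sys.prefix != getattr(sys, "base_prefix", sys.prefix)

print(sys.prefix)        # the ENV directory when activated
print(in_virtualenv())
```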
- Here we go. I am in my virtual environment. Hurrah. So what happens when I install Scrapy in here?
pip install scrapy
My terminal joyously floods with words and vintagely appealing ASCII progress bars until we gleefully end up having installed all kinds of dubious packages by the names of
Automat-0.7.0 PyDispatcher-2.0.5 PyHamcrest-1.9.0 Twisted-18.7.0 asn1crypto-0.24.0 attrs-18.1.0 cffi-1.11.5 constantly-15.1.0 cryptography-2.3 cssselect-1.0.3 enum34-1.1.6 functools32-3.2.3.post2 hyperlink-18.0.0 idna-2.7 incremental-17.5.0 ipaddress-1.0.22 lxml-4.2.3 parsel-1.5.0 pyOpenSSL-18.0.0 pyasn1-0.4.3 pyasn1-modules-0.2.2 pycparser-2.18 queuelib-1.5.0 scrapy-1.5.1 service-identity-17.0.0 six-1.11.0 w3lib-1.19.0 zope.interface-4.5.0
- I follow the Getting Started tutorial
scrapy startproject craigslist
and end up getting a message that confirms the successful creation of the Craigslist Scrapy project.