Python for the Web

by Rich Jones on Oct 18, 2011.
_________________

The theme music for this blog post is: Air - Playground Love.

Python is the best language in the world for interacting with the web, and I'm going to show you why.

This article will give an extremely high-level overview of how to use python for the web. There are many ways you can interact with the web using python, and this post will cover the major ones. This includes python web scraping, interacting with APIs (Application Programming Interfaces) and running your own python web site using python server software. There are many ways to do all these things in python, but I'm going to show you how to do it the right way using the most modern techniques.

Interacting with Websites and APIs Using Python

The single best package for interacting with the web using Python is 'Requests' by Kenneth Reitz. I really cannot stress enough what a good library this is. I use it every single day of my life and I absolutely love it. It is the reason that python is the best language for the web.

First, you'll need to install it. The best way to do this is using 'pip', the python package manager. If you don't have pip, read this article and follow the instructions, or, if you are on Windows, look at this post on Stack Overflow.

Once you have pip installed, run:

pip install requests

And now you have Requests installed! You may need to run this with 'sudo' if you're on Linux or OS X. Now let's look at a few examples.

The two methods you'll need the most are GET and POST. GET does exactly what it says: it gets a web page. POST is similar, only it sends information to a web page.

First let's take a look at GET. Let's say we want to grab all of Gun.io's front page.
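A minimal sketch of such a request, assuming Gun.io's front page lives at http://gun.io (the URL, the `fetch_page` name and the timeout value are assumptions of this sketch):

```python
import requests


def fetch_page(url="http://gun.io"):
    """Fetch a page with a GET request and return its body as text."""
    response = requests.get(url, timeout=10)  # timeout is an arbitrary choice
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text
```

Calling `print(fetch_page())` prints the page. Without the function wrapper and the error check, the whole thing comes down to an import, a `requests.get` call and a print.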

That's it! In only three lines of python, you can grab a whole webpage and print it to the screen. Awesome!

Now let's look at a slightly more complicated example. Let's try a case where we have to use a username and password.
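A minimal sketch, assuming the site uses HTTP Basic authentication (a login form would need a POST instead). The `fetch_protected_page` name is hypothetical, and the URL is whatever password-protected page you pass in:

```python
import requests

# Placeholders: swap in real credentials before running.
USERNAME = "YOURUSERNAME"
PASSWORD = "YOURPASSWORD"


def fetch_protected_page(url, username=USERNAME, password=PASSWORD):
    """GET a page that sits behind HTTP Basic authentication."""
    # A (username, password) tuple tells Requests to use HTTP Basic auth.
    response = requests.get(url, auth=(username, password), timeout=10)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text
```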

Here, YOURUSERNAME and YOURPASSWORD will be sent as login credentials to the server.

Now, let's try a POST request to send some data TO the server. This is for the case where there is a form, and you want to use python to fill in the values.
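A minimal sketch of such a POST, assuming the form accepts ordinary URL-encoded fields. The function names are hypothetical, and the URL is whatever form address you pass in:

```python
import requests

# The two form fields and their values.
FORM_FIELDS = {"title": "RoboCop", "description": "The best movie ever."}


def submit_form(url, fields):
    """POST the form fields to `url` and return the HTTP status code."""
    response = requests.post(url, data=fields, timeout=10)
    return response.status_code


def preview_form_body(url, fields):
    """Build the request without sending it and return its encoded body."""
    prepared = requests.Request("POST", url, data=fields).prepare()
    return prepared.body
```

`submit_form(some_url, FORM_FIELDS)` sends the request. `preview_form_body` is only there for curiosity: it returns the URL-encoded string that would travel in the request body, without touching the network.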

This code will send the values "RoboCop" and "The best movie ever." for the fields "title" and "description", respectively. You can use the 'auth' parameter from the previous example if you are posting to a password-protected form.

Processing JSON in Python

Many times when you interact with an API in python, you will be given a response in a form called "JSON", or JavaScript Object Notation. JSON is almost identical to the python dictionary format. The best way to interact with JSON in Python is by using the 'simplejson' python library, which you can find documentation for here. Again, use pip to install it like so:

pip install simplejson

Let's take a look at an example.
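A minimal sketch of the parsing step. To stay runnable offline it parses a small hand-written sample instead of calling GitHub; with Requests, the string would come from the response's text. The sample's field names are illustrative assumptions, not GitHub's documented schema, and the fallback import is there only in case simplejson is not installed:

```python
try:
    import simplejson as json  # the library recommended above
except ImportError:
    import json  # standard-library module with the same loads/dumps calls

# Hand-written sample, loosely shaped like a list of events.
SAMPLE_RESPONSE = """
[
  {"type": "PushEvent", "repository": {"name": "example-repo-one"}},
  {"type": "WatchEvent", "repository": {"name": "example-repo-two"}}
]
"""


def repository_names(raw_json):
    """Parse a JSON string and return the repository name of each item."""
    j = json.loads(raw_json)  # JSON arrays become lists, objects become dicts
    return [item["repository"]["name"] for item in j]


for name in repository_names(SAMPLE_RESPONSE):
    print(name)
```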

This code will get a list of recent events from GitHub, in JSON format, and parse that JSON using python. As the resulting object (in this example, 'j') is a python dictionary, we can loop over it and print the information it contains. So, this code will then print out the name of each repository for each item in the response.

Scraping the Web Using Python

Unfortunately, we can't always interact with the web in a nice format like JSON. Most of the time, websites only return HTML, the kind that your browser turns into the nice-looking webpages you see on your screen. In this case, we have to do what's called 'scraping': taking that ugly HTML and turning it into usable data for our python program.

The best way to do this is by using a python package called LXML. If I had to describe LXML, I would call it shitty and awesome. LXML is extremely fast and very capable, but it also has a confusing interface and some difficult to read docs. It is certainly the best tool for the job, but it is not without fault.

Let's say there is a webpage that has a value you want to get into your python program. You know from looking at the source of the webpage that the value you want is inside an element which has a specific "id" attribute. Let's use LXML to get that value.

First, install it using pip:

pip install lxml

Okay, now let's try it.
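A minimal sketch of the lookup. A small made-up HTML string stands in for the downloaded page so the example runs offline, and the id is looked up with an XPath query, which returns a list of matches to loop over. The `text_for_id` name and the sample markup are assumptions of this sketch:

```python
import lxml.html

# Stand-in for a downloaded page; with Requests this would be response.text.
SAMPLE_HTML = """
<html>
  <body>
    <h1>Example page</h1>
    <p id="frontsubtext">Hello from <b>inside</b> the element.</p>
  </body>
</html>
"""


def text_for_id(html_source, element_id):
    """Return the text content of every element carrying the given id."""
    tree = lxml.html.fromstring(html_source)
    # XPath returns a list of matching elements (empty if nothing matches).
    elements = tree.xpath("//*[@id=$wanted]", wanted=element_id)
    return [element.text_content() for element in elements]


for text in text_for_id(SAMPLE_HTML, "frontsubtext"):
    print(text)
```

Elements parsed by `lxml.html` also offer `get_element_by_id`, which returns the single matching element instead of a list.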

This code uses Requests (from before) to get our webpage. Then, it uses the HTML parser in LXML to get the 'tree' of parsed HTML elements. The next line calls the "Get Element By Id" function to return a list of all elements which have the id value of "frontsubtext". Then, we iterate over the items in that list, and print the text content of each element. Tada!

Python Web Sites

The other side of using python on the web is using python to make web sites. The best way to do that is to use a web 'framework' called Django.

Now, Django can be tricky. Django isn't the fastest or the easiest way to get your python code executing on the web, but Django has the largest community and the most documentation available, so it's the best thing to learn in the long run. This is going to be a very, very brief introduction to Django - I'm just going to teach you how to get your python code to return a result to an HTML web page.

So, let's get started!

First things first, install Django using pip. This should be easy by now!

sudo pip install django

Okay, now you've got Django installed on your system. Let's make a new Django project. Let's say we want a website which returns an uppercase version of a string we pass it, and we're going to call it UppercaseMaker. So, call this to make a new Django project:

django-admin.py startproject UppercaseMaker

Then, go into the directory it made:

cd UppercaseMaker

You'll see some files in there, like settings.py and urls.py. We'll get back to those in a second. Now that you're in the folder, you'll need to make a new 'application.' In Django, applications are where the actual work is done. Let's make one called 'upper'.

django-admin.py startapp upper

For this application to be activated in our Django project, we'll need to edit settings.py and add it to the list of INSTALLED_APPS. So, you need to change the settings.py (at around line 111) so that it looks like this:
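A sketch of what that fragment might look like, assuming the default settings.py that a Django 1.3-era 'startproject' generates. The last entry is the one being added; depending on how the project sits on the Python path, it might instead be written as 'UppercaseMaker.upper':

```python
# Fragment of settings.py. The django.contrib entries are the defaults from a
# Django 1.3-era project template; only the final line is new.
INSTALLED_APPS = (
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.sites",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    "upper",  # our new application (the exact entry is an assumption)
)
```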

While you're here, you should also change the TEMPLATE_DIRS variable so that it looks like this:
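A sketch of one way to set this, assuming the 'templates' folder sits next to settings.py. `project_path` is a hypothetical helper written for this example, not part of Django:

```python
import os


def project_path(settings_file, *parts):
    """Join `parts` onto the folder that contains the given settings file."""
    project_dir = os.path.dirname(os.path.abspath(settings_file))
    return os.path.join(project_dir, *parts)


# Inside settings.py, pass __file__ so the path follows the project around:
#     TEMPLATE_DIRS = (project_path(__file__, "templates"),)
```

Building the path from the location of settings.py avoids hard-coding an absolute directory that would break when the project is moved.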

This will make it so when Django needs to render templates, it will look in the 'templates' directory of your project's folder.

Now in your project directory, you'll see that there is a directory called 'upper'. Let's go in and take a look. You'll see that there's a file called 'views.py' - that's where the magic happens. Let's put some code in it.

So this is defining a function called 'home' which takes two parameters: 'request', which contains information about the request which was sent to the server (information about the user, their browser, etc.), and a string called 'input', which defaults to "No input supplied." The next line is pretty obvious: it takes the input string, puts it in uppercase and stores the result in a variable called output.

Then, we pass that in a dictionary to a Django function called "render_to_response", which takes a template file and a dictionary of variables and makes it into the nice HTML you see as the final website. We haven't looked at the template file yet, so let's do that now.

Go back to the project directory and make a new folder called 'templates' and inside it, make a file called 'home.html', and put this in it:

This is an extremely simple HTML page which takes whatever value we put in the 'output' variable from our views.py and puts it to the screen wherever we wrap it with double curly braces.

Only one thing left now! Let's take a look at urls.py in the project folder. Put this in it:

This says, for the empty path (the blank space in between the '^' and the '('), call the function "UppercaseMaker.upper.views.home" and pass it the trailing value and call it "input". So, when somebody visits "http://www.ourwebsite.com/test", the value "test" is sent to our 'home' function from before, made uppercase, and printed to the screen.
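The capturing step can be tried out with nothing but python's 're' module. This is a framework-free sketch, not Django's actual URL resolver: the pattern, the `dispatch` function and the stand-in `home` view are all hypothetical, written only to mimic the behaviour described above.

```python
import re

# Hypothetical pattern in the spirit of the urls.py described above:
# nothing between '^' and '(', then a named group that captures the rest.
URL_PATTERN = re.compile(r"^(?P<input>.+)$")


def home(input="No input supplied."):
    """Framework-free stand-in for the view: upper-case the input."""
    return "Your output is: %s." % input.upper()


def dispatch(path):
    """Mimic URL routing: strip the leading slash, match, call the view."""
    match = URL_PATTERN.match(path.lstrip("/"))
    if match is None:
        return home()  # no trailing value, so the view's default is used
    # The named group becomes the keyword argument 'input'.
    return home(**match.groupdict())


print(dispatch("/test"))
```

The name inside `(?P<...>)` is what decides which keyword argument the captured text arrives as, which is why the view's second parameter has to be called 'input'.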

Now, you can try this for yourself by running this command from your project's directory.

python manage.py runserver

Then, in your web browser, go to the URL 'http://localhost:8000/test', and you should see the output, "Your output is: TEST." on the screen.

Hooray! You're now executing your own python code as a website. Pretty cool!

I've made this example as a 'git' repository, so if you want to have your own copy of this example to play with, execute

git clone https://Gunio@github.com/Gunio/UppercaseMaker.git 

Conclusion

So there you have it, a very high level introduction to the major ways you'll be interacting with the web using python. This guide is by no means meant to be exhaustive, but hopefully you are now on the right path.

The key take-aways are: to make HTTP requests, use the 'Requests' library. To parse JSON, use 'simplejson'. To parse HTML, use 'LXML'. And to serve your own python websites, use 'Django'. Other guides will tell you to use things like 'urllib2' and 'BeautifulSoup' - don't waste your time! Those packages and the tutorials which recommend them are now outdated - Requests and LXML are the best tools for the job. Django is still the best python web framework, but make sure that any tutorials you are reading are compatible with the version of Django you are using, as the project can change quite quickly and there are lots of outdated Django tutorials on the web.

Did this guide help you? Would you like to know more? Leave a note in the comments below!
