Learn Python

Course: Using Python to Access Web Data

Charles Severance, University of Michigan

Week 4: Programs that Surf the Web (Chapter 12)

Quiz: Reading Web Data From Python

  1. Which of the following Python data structures is most similar to the value returned in this line of Python:
x = urllib.urlopen('http://www.py4inf.com/code/romeo.txt')

Answer: file handle
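
The file-handle analogy can be sketched in Python 3, where the call lives at `urllib.request.urlopen` (the quiz uses the older Python 2 spelling `urllib.urlopen`). The helper name `read_lines` and the `data:` URL are assumptions chosen so the example runs without network access; substituting the romeo.txt URL should behave the same way if the site is reachable.

```python
import urllib.request


def read_lines(url):
    """Open a URL and iterate over it line by line, as with a file handle."""
    with urllib.request.urlopen(url) as handle:
        # Like a file opened in binary mode, the handle yields lines as bytes.
        return [line.decode().rstrip() for line in handle]


# A data: URL stands in for a real web address (assumption, for offline use).
demo_url = "data:text/plain,But%20soft%0AWhat%20light"
for line in read_lines(demo_url):
    print(line)
```

The returned object supports `read()`, iteration, and `with`, which is why "file handle" is the closest match among the usual Python data structures.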

2. In this Python code, which line actually reads the data?

import socket

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('www.py4inf.com', 80))
mysock.send('GET http://www.py4inf.com/code/romeo.txt HTTP/1.0\n\n')
while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    print data
mysock.close()

Answer: mysock.recv()
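
The read loop can be tried without a web server. This is a minimal Python 3 sketch, assuming a local `socket.socketpair()` as a stand-in for the client and server ends; the helper name `recv_all` is made up for illustration.

```python
import socket


def recv_all(sock, bufsize=512):
    """Call recv() repeatedly until the other side closes the connection."""
    chunks = []
    while True:
        data = sock.recv(bufsize)   # the line that actually reads the data
        if len(data) < 1:           # empty bytes: the peer has closed
            break
        chunks.append(data)
    return b"".join(chunks)


# A connected pair of local sockets stands in for client and server.
sender, receiver = socket.socketpair()
sender.sendall(b"x" * 1200)   # more than fits in one 512-byte recv()
sender.close()
received = recv_all(receiver)
receiver.close()
print(len(received))
```

Each `recv(512)` returns at most 512 bytes, so the loop is what collects the whole response.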

  3. Which of the following regular expressions would extract the URL from this line of HTML:
<p>Please click <a href="http://www.dr-chuck.com">here</a></p>

Answer: href="(.+)"
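
The pattern can be checked with `re.findall`. The sketch below also tries a non-greedy variant, `(.+?)`, on a made-up line with two links; that second line and its URLs are assumptions for illustration, not part of the quiz.

```python
import re

line = '<p>Please click <a href="http://www.dr-chuck.com">here</a></p>'

# Greedy pattern from the quiz answer: fine when the line holds one link.
greedy = re.findall(r'href="(.+)"', line)

# Non-greedy variant: stops at the first closing quote after href=".
two_links = '<a href="http://a.example">A</a> <a href="http://b.example">B</a>'
non_greedy = re.findall(r'href="(.+?)"', two_links)

print(greedy)
print(non_greedy)
```

On a line with several links the greedy `.+` runs to the last double quote, so the non-greedy form is usually the safer choice there.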

4. In this Python code, which line is most like the open() call to read a file:

import socket

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('www.py4inf.com', 80))
mysock.send('GET http://www.py4inf.com/code/romeo.txt HTTP/1.0\n\n')
while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    print data
mysock.close()

Answer: mysock.connect()
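
To make the open() comparison concrete, here is a Python 3 sketch in which the connected socket is wrapped with `makefile()` and then read like a file. The one-shot local server (`serve_once`), the helper `fetch`, and the canned reply are all hypothetical stand-ins so the example runs offline.

```python
import socket
import threading


def serve_once(listener, payload):
    """Hypothetical stand-in server: answer one connection, then hang up."""
    conn, _ = listener.accept()
    with conn:
        conn.recv(1024)        # read (and ignore) the request
        conn.sendall(payload)


def fetch(host, port, request):
    mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    mysock.connect((host, port))            # the step most like open()
    mysock.sendall(request)
    with mysock.makefile("rb") as handle:   # file-like wrapper around the socket
        body = handle.read()                # read until the server closes
    mysock.close()
    return body


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))             # port 0 lets the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
server = threading.Thread(
    target=serve_once, args=(listener, b"HTTP/1.0 200 OK\r\n\r\nhello"))
server.start()
reply = fetch("127.0.0.1", port, b"GET /code/romeo.txt HTTP/1.0\r\n\r\n")
server.join()
listener.close()
print(reply)
```

Creating the socket only prepares an endpoint; nothing can be sent or read until `connect()` succeeds, much as a file cannot be read before it is opened.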

5. Which HTTP header tells the browser the kind of document that is being returned?
Answer: Content-Type:
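
The header sits between the status line and the blank line that precedes the body. A minimal sketch of pulling it out of a raw response follows; the response bytes are made up for illustration, and `parse_response` is a hypothetical helper rather than a library function.

```python
def parse_response(raw):
    """Split a raw HTTP response into status line, headers dict, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")   # a blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    status, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


# Made-up response for illustration; not captured from a real server.
raw = (b"HTTP/1.0 200 OK\r\n"
       b"Content-Type: text/plain\r\n"
       b"Content-Length: 8\r\n"
       b"\r\n"
       b"But soft")
status, headers, body = parse_response(raw)
print(headers["content-type"])
```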

  6. What should you check before scraping a web site?
    Answer: That the web site allows scraping
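
The quiz does not say how to check. One common signal is the site's robots.txt file (the terms of service are another), and Python's standard `urllib.robotparser` can evaluate its rules. The rules, user agent name, and example.com URLs below are made up for illustration.

```python
from urllib import robotparser


def allowed(robots_lines, user_agent, url):
    """Return True if the given robots.txt rules permit fetching url."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_lines)      # parse rules that are already in memory
    return parser.can_fetch(user_agent, url)


# Made-up rules; a real check would download the site's own robots.txt.
rules = ["User-agent: *", "Disallow: /private/"]
print(allowed(rules, "my-scraper", "http://example.com/code/romeo.txt"))
print(allowed(rules, "my-scraper", "http://example.com/private/data.html"))
```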

  7. What is the purpose of the BeautifulSoup Python library?
    Answer: It repairs and parses HTML to make it easier for a program to understand

  8. What ends up in the “x” variable in the following code:

html = urllib.urlopen(url).read()
soup = BeautifulSoup(html)
x = soup('a')

Answer: A list of all the anchor tags (<a..) in the HTML from the URL
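
To show what "a list of all the anchor tags" amounts to without requiring BeautifulSoup to be installed, the sketch below swaps in the standard library's `html.parser` as a stand-in. It is not BeautifulSoup and does not repair broken HTML the way that library aims to; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect the attributes of every <a ...> start tag that is fed in."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchors.append(dict(attrs))


def anchor_tags(html):
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    return collector.anchors


html = '<p>Please click <a href="http://www.dr-chuck.com">here</a></p>'
for tag in anchor_tags(html):
    print(tag.get("href"))
```

With BeautifulSoup itself, each element of `x` is a tag object, and the link target is typically read with `tag.get('href')`.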
