Learn Python

Course: Using Python to Access Web Data

Charles Severance, University of Michigan

Week 4: Programs that Surf the Web (Chapter 12)

Quiz: Reading Web Data From Python

1. Which of the following Python data structures is most similar to the value returned in this line of Python:

[python]import urllib
x = urllib.urlopen('http://www.py4inf.com/code/romeo.txt')[/python]

Answer: file handle
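Here is a minimal Python 3 sketch of why the answer is "file handle". In Python 3 the call moved to `urllib.request.urlopen`, and the object it returns can be looped over line by line, just like the result of `open()`. So that the example runs without internet access, it serves a hypothetical two-line text from a throwaway local server. `TextHandler`, `BODY`, `read_lines` and the local URL are illustrative assumptions, not part of the course.

```python
# Python 3 sketch: urlopen() hands back a file-like object you can loop over.
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"first line\nsecond line\n"  # hypothetical document text

class TextHandler(BaseHTTPRequestHandler):
    """Serve BODY as plain text for any GET request (local stand-in server)."""
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, *args):  # silence per-request logging
        pass

def read_lines(url):
    """Read a URL line by line, the same way you would loop over open(...)."""
    lines = []
    with urllib.request.urlopen(url) as fhand:
        for line in fhand:  # each line arrives as bytes
            lines.append(line.decode().rstrip())
    return lines

server = HTTPServer(("127.0.0.1", 0), TextHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/romeo.txt" % server.server_port
lines = read_lines(url)
server.shutdown()
server.server_close()
print(lines)  # ['first line', 'second line']
```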
2. In this Python code, which line actually reads the data?

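[python]
import socket
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('www.py4inf.com', 80))
# HTTP lines end in \r\n; the blank line ends the request headers
mysock.send('GET http://www.py4inf.com/code/romeo.txt HTTP/1.0\r\n\r\n')
while True:
    data = mysock.recv(512)  # read up to 512 bytes
    if len(data) < 1:        # empty string: the server closed the connection
        break
    print data
mysock.close()
[/python]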

Answer: mysock.recv()
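The same program can be ported to Python 3, shown here as a sketch. In Python 3, sockets send and receive `bytes`, so the request is encoded first and the chunks returned by `recv()` are joined at the end. Each `recv(512)` call returns at most 512 bytes, and an empty result means the server has closed the connection. To run offline, the sketch talks to a local `http.server` instance that serves one hypothetical line. `TextHandler` and `fetch_raw` are illustrative names.

```python
# Python 3 sketch of the quiz program, run against a local stand-in server.
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class TextHandler(BaseHTTPRequestHandler):
    """Tiny local web server that returns one hypothetical line of text."""
    def do_GET(self):
        body = b"hello from the server\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

def fetch_raw(host, port, path):
    """Send an HTTP/1.0 GET over a plain socket and return the whole reply."""
    mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    mysock.connect((host, port))
    cmd = "GET http://%s:%d%s HTTP/1.0\r\n\r\n" % (host, port, path)
    mysock.sendall(cmd.encode())  # Python 3 sockets send bytes, not str
    chunks = []
    while True:
        data = mysock.recv(512)   # the call that actually reads the data
        if len(data) < 1:         # b"" means the server closed the socket
            break
        chunks.append(data)
    mysock.close()
    return b"".join(chunks)

server = HTTPServer(("127.0.0.1", 0), TextHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
reply = fetch_raw("127.0.0.1", server.server_port, "/romeo.txt")
server.shutdown()
server.server_close()
print(reply.decode())  # status line, headers, blank line, then the body
```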

3. Which of the following regular expressions would extract the URL from this line of HTML:

[html]<p>Please click <a href="http://www.dr-chuck.com">here</a></p>[/html]

Answer: href="(.+)"
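You can check this quickly with the standard `re` module. `findall` returns whatever the parenthesized group captured, which here is the URL. Because `.+` is greedy, it runs to the last double quote on the line. On a hypothetical line holding two links, the non-greedy `.+?` is the safer choice.

```python
import re

html = '<p>Please click <a href="http://www.dr-chuck.com">here</a></p>'
print(re.findall('href="(.+)"', html))  # ['http://www.dr-chuck.com']

# Hypothetical line with two links: greedy .+ runs to the LAST quote,
# so both links come back glued together; .+? stops at the first quote.
two_links = '<a href="a.html">A</a> <a href="b.html">B</a>'
greedy = re.findall('href="(.+)"', two_links)
lazy = re.findall('href="(.+?)"', two_links)
print(greedy)  # ['a.html">A</a> <a href="b.html']
print(lazy)    # ['a.html', 'b.html']
```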
4. In this Python code, which line is most like the open() call to read a file:

[python]import socket
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('www.py4inf.com', 80))
# HTTP lines end in \r\n; the blank line ends the request headers
mysock.send('GET http://www.py4inf.com/code/romeo.txt HTTP/1.0\r\n\r\n')
while True:
    data = mysock.recv(512)
    if len(data) < 1:  # empty string: the server closed the connection
        break
    print data
mysock.close()[/python]

Answer: mysock.connect()
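The following Python 3 sketch illustrates the analogy under stated assumptions. The same hypothetical bytes (`TEXT`) are read once through `open()` and once through a socket. `connect()` sets up the socket the way `open()` sets up a file handle. The standard `socket.makefile()` method then lets you loop over the connection exactly like a file. A local listener stands in for a real web server so the example runs offline, and `serve_once` is an illustrative helper.

```python
import os
import socket
import tempfile
import threading

TEXT = b"line one\nline two\n"  # hypothetical payload

def serve_once(listener):
    """Accept one connection, send TEXT, then close (the reader sees EOF)."""
    conn, _ = listener.accept()
    conn.sendall(TEXT)
    conn.close()

# File version: open() gives a handle, looping reads it, close() ends it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with open(path, "wb") as out:
        out.write(TEXT)
    fhand = open(path, "rb")
    file_lines = [line for line in fhand]
    fhand.close()

# Socket version: connect() plays the role of open().
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
threading.Thread(target=serve_once, args=(listener,), daemon=True).start()

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(listener.getsockname())  # like open(): sets up the handle
sock_file = mysock.makefile("rb")       # wrap the socket so it reads like a file
sock_lines = [line for line in sock_file]
sock_file.close()
mysock.close()
listener.close()

print(file_lines == sock_lines)  # True
```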
5. Which HTTP header tells the browser the kind of document that is being returned?
Answer: Content-Type:
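In a raw HTTP response, like the one the socket code above prints, the `Content-Type:` header sits in the header block before the blank line (`\r\n\r\n`). Here is a minimal sketch for pulling it out of raw bytes. The response is made up for illustration, and `header_value` is a hypothetical helper, not a library function. Header names are case-insensitive, so the comparison is done in lowercase.

```python
def header_value(raw, name):
    """Return the value of header `name` from a raw HTTP response, or None."""
    head, _, _body = raw.partition(b"\r\n\r\n")  # blank line ends the headers
    for line in head.split(b"\r\n")[1:]:         # skip the status line
        key, sep, value = line.partition(b":")
        if sep and key.strip().lower() == name.lower().encode():
            return value.strip().decode()
    return None

# Hypothetical response, shaped like what the socket program receives.
raw = (b"HTTP/1.0 200 OK\r\n"
       b"Content-Type: text/plain\r\n"
       b"Content-Length: 6\r\n"
       b"\r\n"
       b"hello\n")
print(header_value(raw, "Content-Type"))  # text/plain
```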

1. What should you check before scraping a web site?
Answer: That the web site allows scraping
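One machine-readable place to check is the site's `robots.txt` file. The site's terms of use matter too, and a parser cannot read those for you. Python 3's standard `urllib.robotparser` module can answer "may I fetch this URL?". In this sketch the rules and the `www.example.com` URLs are hypothetical, and they are fed in with `parse()` so it runs offline. A real crawler would point `set_url()` at the site's `robots.txt` and call `read()` instead.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents.
robots_lines = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = RobotFileParser()
rp.parse(robots_lines)

def allowed(url, agent="*"):
    """True if the parsed rules let `agent` fetch `url`."""
    return rp.can_fetch(agent, url)

print(allowed("http://www.example.com/code/romeo.txt"))     # True
print(allowed("http://www.example.com/private/page.html"))  # False
```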

2. What is the purpose of the BeautifulSoup Python library?
Answer: It repairs and parses HTML to make it easier for a program to understand
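BeautifulSoup is a third-party package, so this sketch uses Python 3's built-in `html.parser` as a stand-in to show why parsing beats pattern matching. The messy snippet, with uppercase tags and single-quoted and unquoted attributes, is hypothetical. The `href="(.+?)"` regex finds nothing in it, while the parser lowercases tag and attribute names and strips the quotes. Note that `html.parser` only tolerates bad markup. BeautifulSoup goes further and builds a tree from it, and it can use `html.parser` as its underlying parser. `LinkCollector` is an illustrative name.

```python
import re
from html.parser import HTMLParser

# Hypothetical sloppy HTML: uppercase tags, single quotes, unquoted value.
messy = "<P>Links: <A HREF='one.html'>one</a> <a href=two.html>two</A>"

class LinkCollector(HTMLParser):
    """Collect href values; HTMLParser lowercases tag and attribute names."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)

print(re.findall('href="(.+?)"', messy))  # [] : no href="..." to match
parser = LinkCollector()
parser.feed(messy)
print(parser.hrefs)                        # ['one.html', 'two.html']
```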

3. What ends up in the “x” variable in the following code:

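[python]
import urllib
from bs4 import BeautifulSoup
url = raw_input('Enter - ')  # page to scrape, typed by the user
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')
x = soup('a')  # shorthand for soup.find_all('a')
[/python]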

Answer: A list of all the anchor tags (<a ...>) in the HTML from the URL
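In BeautifulSoup, calling the soup object like a function, as in `soup('a')`, is shorthand for `soup.find_all('a')`. Each item in the result is a tag you can query with `.get('href')`. The Python 3 sketch below rebuilds that list shape with the standard library only, so it runs without BeautifulSoup installed. Each anchor becomes a dict holding its attributes and link text. `AnchorCollector` and `find_anchors` are hypothetical names, and the second link in the sample HTML is made up.

```python
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Build a list of anchor tags, one dict per <a>, in document order."""
    def __init__(self):
        super().__init__()
        self.anchors = []
        self._current = None  # the anchor whose link text is being collected

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current = {"attrs": dict(attrs), "text": ""}
            self.anchors.append(self._current)

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._current = None

def find_anchors(html):
    """Rough stand-in for soup('a'): every anchor tag in the page."""
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    return collector.anchors

html = ('<p>Please click <a href="http://www.dr-chuck.com">here</a></p>'
        '<p>or <a href="http://www.py4inf.com/code/romeo.txt">here</a></p>')
x = find_anchors(html)
for tag in x:
    print(tag["attrs"].get("href"), tag["text"])
```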