Thursday, November 12, 2015

Files - Python

Open a file:
open(): returns a handle used to operate on the file
syntax:
handle = open(filename, mode)
fhandle = open('mbox.txt', 'r')
The handle is not the actual data from the file; it is a "connection" to it.


The newline character:
\n
it is one character.

A file handle can be treated as a sequence of lines (a sequence is an ordered collection):
xhandle = open('mbox.txt')
for cheese in xhandle:
    print cheese

Read the whole file (into a single string including the newlines):
inp = xhandle.read()
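A small sketch of both ways of reading, line by line and all at once. The temporary file and its two lines are made up so the example does not depend on mbox.txt. print() is called with a single argument so it runs on Python 2 or 3.

```python
import os
import tempfile

# Create a small throwaway file so the sketch is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("line one\nline two\n")

# Iterating over the handle yields one line at a time, newline included.
handle = open(path)
lines = [line for line in handle]
handle.close()

# read() returns the whole file as a single string.
handle = open(path)
whole = handle.read()
handle.close()

os.remove(path)

print(len(lines))   # number of lines
print(len(whole))   # number of characters; each "\n" counts as one
```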

Searching through the file:
combine for and if
line.startswith('xxx')
line.rstrip() #strip the white space(s) at the right of the line

Skipping with continue:
use continue inside a for loop with an if to skip the lines you do not want

'xxx' in line
'xxx' not in line
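A minimal sketch of the search pattern, assuming made-up sample lines and a 'From:' prefix in place of the real file. A list of strings stands in for the file handle, since looping over either gives one line at a time.

```python
# Hypothetical sample data standing in for the lines of a file.
lines = [
    "From: alice@example.com\n",
    "Subject: hello\n",
    "From: bob@example.org\n",
    "X-Note: nothing here\n",
]

matches = []
for line in lines:
    line = line.rstrip()              # drop the trailing newline
    if not line.startswith("From:"):  # skip lines we do not care about
        continue
    if "@example.com" not in line:    # substring test with "in"
        continue
    matches.append(line)

print(matches)
```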

Prompt for the file name:
fname = raw_input('Enter the file name: ')

Try to open a file:
fname = raw_input...
try:
    fhand = open(fname)
except IOError:
    print "Cannot open the file: ", fname
    exit()
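The same idea can be wrapped in a helper, sketched here. The function name open_or_none and the file path are hypothetical. It catches IOError, which is what open() raises in Python 2 for a missing file (in Python 3, IOError is an alias of OSError).

```python
def open_or_none(fname):
    # Return a file handle, or None if the file cannot be opened.
    try:
        return open(fname)
    except IOError:
        return None

# Hypothetical path that should not exist.
fhand = open_or_none("no/such/dir/for-this-sketch.txt")
if fhand is None:
    print("Cannot open the file")
```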

Python Data Structures

<String>
### String Processing

# String literals
s1 = "Rixner's funny"
s2 = 'Warren wears nice ties!'
s3 = " t-shirts!"
#print s1, s2
#print s3

# Combining strings
a = ' and '
s4 = "Warren" + a + "Rixner" + ' are nuts!'
print s4

# Characters and slices
print s1[3]
print len(s1)
print s1[0:6] + s2[6:]
print s2[:13] + s1[9:] + s3

# Converting strings
s5 = str(375)
print s5[1:]
i1 = int(s5[1:])
print i1 + 38

Another example:
# Handle single quantity
def convert_units(val, name):
    result = str(val) + " " + name
    if val > 1:
        result = result + "s"
    return result
        
# convert xx.yy to xx dollars and yy cents
def convert(val):
    # Split into dollars and cents
    dollars = int(val)
    cents = int(round(100 * (val - dollars)))

    # Convert to strings
    dollars_string = convert_units(dollars, "dollar")
    cents_string = convert_units(cents, "cent")

    # return composite string
    if dollars == 0 and cents == 0:
        return "Broke!"
    elif dollars == 0:
        return cents_string
    elif cents == 0:
        return dollars_string
    else:
        return dollars_string + " and " + cents_string
    
    
# Tests
print convert(11.23)
print convert(11.20) 
print convert(1.12)
print convert(12.01)
print convert(1.01)
print convert(0.01)
print convert(1.00)
print convert(0)

dir function: returns the names of the attributes and methods available on an object or type.
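A quick sketch of dir on the str type. The filtering of underscore names is my own addition to make the list easier to read.

```python
names = dir(str)

# Names starting with an underscore are special methods; filter them out.
public = [n for n in names if not n.startswith("_")]

print("upper" in public)
print("startswith" in public)
```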

<Sets>

Sets: Keep track of a collection of objects.

List - ordered sequence
Dictionary - Key-Value Mapping
Sets - unordered collection of data with no duplicates

list:
[1, 2, 2, 3, 1]

set:
set([1, 2, 3])

set1.add()
set2.remove()
set3.difference_update(set2)
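A short sketch of the three set methods, with made-up values:

```python
set1 = set([1, 2, 2, 3, 1])   # duplicates collapse, leaving 1, 2, 3
set1.add(4)                   # now 1, 2, 3, 4
set1.remove(1)                # now 2, 3, 4 (raises KeyError if 1 were missing)

set2 = set([3, 4, 5])
set1.difference_update(set2)  # removes from set1 everything that is in set2

print(set1)
```

difference_update changes set1 in place and leaves set2 untouched.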

<List>

List
a = [1, 2, 3]
b = a  #b points to the same list that a points to
c = list(a) #c points to a new list copied from a
list is mutable
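A sketch of the difference between the alias and the copy:

```python
a = [1, 2, 3]
b = a          # alias: same list object
c = list(a)    # copy: a new list with the same elements

a[0] = 99      # mutate through a

print(b)       # the change is visible through b
print(c)       # the copy is unaffected
```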

list can be empty
list can contain elements of different types

list is an ordered sequence
lis1.sort()

Tuple
a = (1, 2, 3)

tuple is immutable (a[1] = 4 raises a TypeError)

Methods:
lst = [1, 82, -6, 4,  3, 8]

len(lst) gives the number of elements in the list
sum(lst)
max(lst)
min(lst)
avg = float(sum(lst))/len(lst)  # float() avoids integer division in Python 2

range(n) returns a list [0, ..., n-1] (usually used to drive a loop)

82 in lst => True/False
if 4 in lst:
    print "4 is there"

lst.index(8) => 5

lst.append(632) => [1, 82, -6, 4, 3, 8, 632]

lst.pop() => [1, 82, -6, 4, 3, 8]

lst.pop(4) => [1, 82, -6, 4, 8]

lst.remove(82) => [1, -6, 4, 8]
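The arrows above show the list after each call. A sketch of the same sequence that also captures what pop() returns:

```python
lst = [1, 82, -6, 4, 3, 8]

lst.append(632)        # adds to the end
last = lst.pop()       # removes and returns the last element
fifth = lst.pop(4)     # removes and returns the element at index 4
lst.remove(82)         # removes the first 82; returns nothing

avg = float(sum(lst)) / len(lst)

print(lst)
print(avg)
```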

list1 + list2 to concatenate two lists.

use ":" to slice lists
list1[:]
list1[1:3]

split() splits a string into a list of words (on whitespace).
or split(";") to split on a given delimiter
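A sketch of slicing and splitting, with made-up values:

```python
list1 = [10, 20, 30, 40]

copy_all = list1[:]     # full slice: a new list with the same items
middle = list1[1:3]     # items at index 1 and 2 (the end index is excluded)

words = "the quick brown fox".split()   # split on whitespace
fields = "a;b;;c".split(";")            # split on ";", empty fields are kept

print(middle)
print(words)
print(fields)
```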

<Dictionary>

Dictionary
compare to List:
List - a linear collection of values that stays in order / use index (position) to lookup element
Dictionary - A bag (unordered) of values, each with its own label / use key (label) to lookup values

Properties or Map or HashMap in Java
Associative Arrays Perl/PHP
Property Bag C#/.net

Mapping
    Key -> Values

d = {1:2, 3:4}
d[1] -> 2

d = dict()
d = {} # empty dictionary
d = {"abc":1, "cd":2}
d["abc"] -> 1
d["abc"] = d["abc"]+1
d -> {"abc":2, "cd":2}

Referencing a key that is not in the dict raises a KeyError (traceback).
To check first, use:
"key" in dict1 (True or False)

Methods:
get
dict1.get(key, default): returns the value for key; if key does not exist, returns the default value.

list(dict1) list all the keys
dict1.keys() list all the keys
dict1.values() list all the values
dict1.items() list of (key, value) tuples.
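A sketch of these dictionary methods on a small made-up dict. The results are wrapped in sorted() because the order of keys is not something to rely on here.

```python
d = {"abc": 1, "cd": 2}

has_abc = "abc" in d            # membership test on keys
missing = d.get("zzz", 0)       # key absent, so the default comes back

d["abc"] = d.get("abc", 0) + 1  # counter idiom: works whether or not key exists

keys = sorted(d.keys())
values = sorted(d.values())
items = sorted(d.items())       # list of (key, value) tuples

print(items)
```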

Two iteration variables:
for aaa, bbb in dict1.items():

Typical application:
1) Most common names (many counters)
use name as key, go through all the names, and for each name, do dict1[name] = dict1.get(name, 0) + 1

2) most common word and the number of appearances
same as above
bigcount = None
bigword = None
for word, count in dict1.items():
    if bigcount is None or count > bigcount:
        bigword = word
        bigcount = count
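Putting the counting and the search for the largest count together, as a sketch. The sentence is made up for the example.

```python
text = "the cat and the hat and the bat"

# Many counters: one per distinct word.
counts = {}
for word in text.split():
    counts[word] = counts.get(word, 0) + 1

# Find the word with the largest count.
bigword = None
bigcount = None
for word, count in counts.items():
    if bigcount is None or count > bigcount:
        bigword = word
        bigcount = count

print(bigword)
print(bigcount)
```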

<Tuples>
Tuples are like lists: ordered, and looked up by index.
x = ('Glenn', 'Jen', 'Steve')
x[2] -> Steve
max(x)
for i in x:

A tuple is immutable, so sort(), append(), and reverse() are not available.
use dir(tuple1) to check what methods are available.
count() and index()

Tuples are more efficient. (faster since they do not have to save space for modification)

Can put tuple on the LHS of assignment:
(a, b) = ('jack', 'Annie')
(x, y) = (1, 2)
a, b = ('jack', 'Annie')
a -> 'jack'
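A sketch of immutability and tuple assignment. The swap at the end is my own addition, a common use of a tuple on the left-hand side.

```python
x = ('Glenn', 'Jen', 'Steve')

# Assigning to an element of a tuple raises TypeError.
try:
    x[1] = 'Sam'
    changed = True
except TypeError:
    changed = False

# The two methods a tuple does have.
n_jen = x.count('Jen')
where_steve = x.index('Steve')

# Tuple on the left-hand side of an assignment.
a, b = ('jack', 'Annie')
a, b = b, a   # swap without a temporary variable

print(changed)
```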

Dictionary items() returns a list of (key, value) tuples

for (k, v) in dict1.items():

tups = dict1.items()

tuples are comparable (compared element by element, left to right)
(0, 1, 20000) < (0, 2, 3) -> True
so if we sort dict1.items(), we sort by key (the first element of each tuple is compared first)
or use t = sorted(dict1.items())
to sort by value instead:
tmp = [] (or list())
for k, v in dict1.items():
    tmp.append((v,k))
tmp.sort(reverse=True)

applications:
top 10 common used words
print sorted([(v,k) for k, v in dict1.items()], reverse=True)[:10]
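A sketch of the sort-by-value trick on made-up counts, taking the top 2 instead of the top 10 to keep it short:

```python
counts = {"the": 3, "and": 2, "cat": 1, "hat": 1}

# Flip each (key, value) into (value, key) so sorting compares counts first.
by_value = sorted([(v, k) for k, v in counts.items()], reverse=True)
top2 = by_value[:2]

print(top2)
```

Ties on the count fall back to comparing the words, since tuples compare element by element.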

Tuesday, November 10, 2015

Web - Python

Python built-in library for TCP sockets
import socket

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('www.py4inf.com', 80))

Application:


Write a web browser:
An HTTP request in Python:
import socket

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('www.py4inf.com', 80))

mysock.send('GET http://www.py4inf.com/code/romeo.txt HTTP/1.0\n\n')

while True:
    data = mysock.recv(512)
    if ( len(data) < 1 ) :
        break
    print data
mysock.close()
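The loop above prints headers and body together as they arrive. A sketch of collecting the chunks and splitting the headers from the body, with no network involved: the response bytes are invented for the example, and slicing stands in for mysock.recv().

```python
# Invented HTTP response, standing in for what the server would send.
raw = (b"HTTP/1.0 200 OK\r\n"
       b"Content-Type: text/plain\r\n"
       b"\r\n"
       b"But soft what light through yonder window breaks\n")

chunks = []
pos = 0
while True:
    data = raw[pos:pos + 16]   # stand-in for mysock.recv(16)
    if len(data) < 1:          # empty chunk means the sender is done
        break
    chunks.append(data)
    pos += 16
response = b"".join(chunks)

# Headers and body are separated by one blank line.
head, body = response.split(b"\r\n\r\n", 1)
status_line = head.split(b"\r\n")[0]

print(status_line)
```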

urllib:
import urllib

fhand = urllib.urlopen('http://www.py4inf.com/code/romeo.txt')

for line in fhand:
    print line.strip()

<HTML>
Use BeautifulSoup to parse HTML.
Download BeautifulSoup.py and put it in the same folder as your Python code.
http://www.crummy.com/software/BeautifulSoup/

import urllib
from BeautifulSoup import *

url = raw_input("Enter - ")

html = urllib.urlopen(url).read()
soup = BeautifulSoup(html)

# Retrieve a list of anchor tags
# Each tag is like a dictionary of HTML Attributes

tags = soup('a')

for tag in tags:
    print tag.get('href', None)
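If BeautifulSoup is not at hand, the same href extraction can be sketched with the standard library's HTMLParser. This is a swapped-in technique, not what the code above uses, and it is far less forgiving of broken HTML. The class name LinkCollector and the HTML snippet are made up.

```python
try:
    from html.parser import HTMLParser   # Python 3
except ImportError:
    from HTMLParser import HTMLParser    # Python 2

class LinkCollector(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            self.links.append(dict(attrs).get("href", None))

html = '<p><a href="http://example.com/one">one</a> <a name="x">two</a></p>'
collector = LinkCollector()
collector.feed(html)

print(collector.links)
```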



<XML>
Use xml.etree.ElementTree:
import urllib
import xml.etree.ElementTree as ET

serviceurl = 'http://maps.googleapis.com/maps/api/geocode/xml?'

while True:
    address = raw_input('Enter location: ')
    if len(address) < 1 : break

    url = serviceurl + urllib.urlencode({'sensor':'false', 'address': address})
    print 'Retrieving', url
    uh = urllib.urlopen(url)
    data = uh.read()
    print 'Retrieved',len(data),'characters'
    print data
    tree = ET.fromstring(data)


    results = tree.findall('result')
    lat = results[0].find('geometry').find('location').find('lat').text
    lng = results[0].find('geometry').find('location').find('lng').text
    location = results[0].find('formatted_address').text

    print 'lat',lat,'lng',lng
    print location
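The parsing part can be tried without calling the web service. In this sketch the XML is hand-written to match the shape the find() calls above expect; the address and coordinates are invented, and a real response would contain many more elements.

```python
import xml.etree.ElementTree as ET

# Hand-made XML with the same nesting the code above walks through.
data = """<GeocodeResponse>
  <status>OK</status>
  <result>
    <formatted_address>Ann Arbor, MI, USA</formatted_address>
    <geometry>
      <location>
        <lat>42.28</lat>
        <lng>-83.74</lng>
      </location>
    </geometry>
  </result>
</GeocodeResponse>"""

tree = ET.fromstring(data)
results = tree.findall('result')

# A path with slashes does the same as chaining find() calls.
lat = results[0].find('geometry/location/lat').text
lng = results[0].find('geometry/location/lng').text
location = results[0].find('formatted_address').text

print(location)
```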

<JSON>
import json

info = json.loads(data)
If the JSON text is an object, info is a dictionary in Python,
so the usual dictionary operations give access to the value associated with a key.

json list -> [{"key1":"v1"}, {"key2":"v2"}]
after loads, it is a list in Python

JSON vs. XML
JSON is easier to use, but XML is more expressive.
js = json.loads(data)
json.dumps(js, indent=4) # good formatting
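A sketch of loads and dumps on made-up JSON text, one object and one list. sort_keys=True is my addition so the printed output has a stable key order.

```python
import json

# A JSON object becomes a Python dictionary.
data = '{"name": "Ada", "langs": ["python", "c"]}'
info = json.loads(data)
first_lang = info["langs"][0]

# A JSON list becomes a Python list (here, a list of dictionaries).
items = json.loads('[{"key1": "v1"}, {"key2": "v2"}]')

# dumps goes the other way; indent=4 pretty-prints it.
pretty = json.dumps(info, indent=4, sort_keys=True)

print(pretty)
```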

<WEB Service Technology>
REST - Representational State Transfer - resources live on a remote server, and we create, read, update and delete them over HTTP.



import json
import urllib

url = raw_input("Enter json location: ")

print "Retrieving ", url

urljson = urllib.urlopen(url)
data = urljson.read()
info = json.loads(data)
print 'Retrieved ', len(data), " characters"

print "Count ", len(info["comments"])

#print json.dumps(info, indent=4)
sumofcount = sum([item["count"] for item in info["comments"]])
print "Sum ", sumofcount
#for item in info["comments"]:
#    sumofcount += item["count"]

#print "Sum ", sumofcount
#print type(info["comments"]["count"])
#print "Sum ", sum(info["comments"]["count"])


import urllib
import twurl
import json

TWITTER_URL = 'https://api.twitter.com/1.1/friends/list.json'

while True:
    print ''
    acct = raw_input('Enter Twitter Account:')
    if ( len(acct) < 1 ) : break
    url = twurl.augment(TWITTER_URL,
        {'screen_name': acct, 'count': '5'} )
    print 'Retrieving', url
    connection = urllib.urlopen(url)
    data = connection.read()
    headers = connection.info().dict
    print 'Remaining', headers['x-rate-limit-remaining']
    js = json.loads(data)
    print json.dumps(js, indent=4)

    for u in js['users'] :
        print u['screen_name']
        s = u['status']['text']
        print '  ',s[:50]