Python

Collections

Common types

Grouping

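itertools.groupby lets you iterate over runs of consecutive items that share a key. The data must already be sorted by the same key function; otherwise a new group starts on every key change and items with the same key are not grouped together.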

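from itertools import groupby

# Sort by the same key function first so that equal keys are adjacent
for key, group in groupby(sorted(data, key=keyfunc), keyfunc):
    for item in group:
        print(key, item)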
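
A small sketch of why the sort matters. The sample records and the category helper are made up for illustration; they are not part of the original notes.

```python
from itertools import groupby

# Hypothetical sample data: (category, name) pairs in arbitrary order
records = [("fruit", "apple"), ("veg", "kale"), ("fruit", "pear"), ("veg", "leek")]

def category(record):
    """Key function: group by the first element of each pair."""
    return record[0]

# Without sorting, each key shows up once per run of adjacent items
unsorted_keys = [key for key, _ in groupby(records, category)]

# Sorting by the same key function first yields one group per key
grouped = {
    key: [name for _, name in group]
    for key, group in groupby(sorted(records, key=category), category)
}

print(unsorted_keys)
print(grouped)
```

Note that each group is a one-shot iterator, so copy it into a list if you need it after moving on to the next key.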

Slicing

a[start:end] # items start through end-1
a[start:]    # items start through the rest of the array
a[:end]      # items from the beginning through end-1
a[start:end:step] # start through not past end, by step
a[-1]    # last item in the array
a[-2:]   # last two items in the array
a[:-2]   # everything except the last two items

List comprehensions

# Creates a list from an iterable in one line:
[str(x) for x in my_list if x.a==x.b]

# Nested list comp; z is returned in each iteration in this case
[z for b in a for c in b for d in c ... for z in y]

# Is the same as
result = []
for b in a:
    for c in b:
        for d in c:
            ...
                for z in y:
                    result.append(z)
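
As a concrete sketch of the equivalence, here is a three-level flatten written both ways. The nested sample list is an assumption chosen only to keep the example short.

```python
# Hypothetical three-level nested list
nested = [[[1, 2], [3]], [[4], [5, 6]]]

# Comprehension: the for clauses read left to right, outermost first
flat = [z for b in nested for c in b for z in c]

# Equivalent explicit loops
looped = []
for b in nested:
    for c in b:
        for z in c:
            looped.append(z)

print(flat)
print(looped)
```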

Zip

To combine two iterables, e.g. two lists A and B that you want to step through together without keeping an index around, use zip. Note that the result is truncated to the shorter of the two.

for i, j in zip(range(5), range(1, 20, 2)):
    print(i, j)
0 1
1 3
2 5
3 7
4 9
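
If the truncation is not what you want, itertools.zip_longest (Python 3 name; it was izip_longest in Python 2) pads the shorter iterable instead. A minimal sketch, with variable names chosen for illustration:

```python
from itertools import zip_longest

short = range(3)            # 3 items
longer = range(1, 20, 2)    # 10 items

# zip stops as soon as the shorter iterable is exhausted
truncated = list(zip(short, longer))

# zip_longest keeps going and fills the gaps with fillvalue
padded = list(zip_longest(short, longer, fillvalue=None))

print(truncated)
print(padded)
```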

Chain

To chain two iterables (including generators) together:

import itertools
for i in itertools.chain(range(5), range(1, 20, 2)):
    print(i)
0
1
2
3
4
1
3
5
7
9
11
13
15
17
19

Concatenate

If both are lists, you can also just concatenate them with + (this doesn’t work on generators):

for i in [0, 1, 2, 3, 4] + [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]:
    print(i)
0
1
2
3
4
1
3
5
7
9
11
13
15
17
19

Overhead

To analyze the amount of overhead that Python data structures have, I wrote a 2D set of random numbers to various data structures and analyzed their memory usage:
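
The original measurements are not reproduced here. The following is only a rough sketch of how such a comparison could be done with sys.getsizeof; the grid size, the deep_size_of_lists helper, and the choice of array.array as the compact alternative are all assumptions, and the byte counts are specific to the interpreter you run it on.

```python
import random
import sys
from array import array

ROWS, COLS = 100, 100  # hypothetical grid size

# 2D grid of random floats stored as a list of lists
grid = [[random.random() for _ in range(COLS)] for _ in range(ROWS)]

def deep_size_of_lists(list_of_lists):
    """Outer list + each inner list + each float object it points to."""
    total = sys.getsizeof(list_of_lists)
    for row in list_of_lists:
        total += sys.getsizeof(row)
        total += sum(sys.getsizeof(value) for value in row)
    return total

list_bytes = deep_size_of_lists(grid)

# Same numbers packed into a flat array of C doubles
flat_array = array("d", (value for row in grid for value in row))
array_bytes = sys.getsizeof(flat_array)

print("list of lists:", list_bytes, "bytes")
print("array('d'):   ", array_bytes, "bytes")
```

sys.getsizeof only reports the size of the object itself, which is why the helper has to walk the nested lists and add up the contained floats by hand.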

Unicode

Python 2v3 unicode types

Dealing with unicode

Python 2’s unicode support is tricky (as is unicode in general).

What this basically amounts to is that Python 2 uses ASCII as the default encoding for byte strings. Hence, if you ever encode or decode something in Python 2 without specifying an encoding such as UTF-8, it will use ASCII. This is perfectly fine until it encounters a character outside the ASCII range (e.g., accented or non-Latin characters, or emoji). Unfortunately, this is not always easy to detect: your code can run fine as long as it only receives ASCII characters, but it will crash as soon as a non-ASCII character is encountered. Note that if you are 100% sure you will never encounter non-ASCII text (e.g., you’re working on closed-loop internal code), you can still use str and format to your heart’s content. If you are dealing with external data, however, this should definitely be considered.

More detail is below, but the gist of this is:

unicode_char = unichr(1024)

# Can't write unicode w/ ASCII encoding
handle = open("out.txt", 'w')
handle.write(unicode_char)
handle.close()

# Can't convert unicode to ASCII-encoded bytestring
str(unicode_char)

# Can't format ASCII w/ unicode variable
"{}".format(unicode_char)

#
# Here's how you can solve these problems:
#

# Use the codecs module to write utf-8
import codecs
handle = codecs.open("out.txt", 'w', 'utf-8')
handle.write(unicode_char)
handle.close()

# Alternatively, encode your strings explicitly
handle = open("out.txt", 'w')
handle.write(unicode_char.encode('utf-8'))
handle.close()

# Don't use str(), encode explicitly
unicode_char.encode('utf-8')

# Use unicode object to format, not str literal
print(u"{}".format(unicode_char))

# Note, the above examples use unichr(1024), which is of type `unicode`.
# If you have an un-decoded str in the unicode range that you are using as an input to the format function (as below), it will throw an error.
# This is because the system attempts to decode it with the default codec, ASCII, which is impossible.
print(u"{}".format("\xe2\x80\xa6"))

# To fix this, you must first decode the bytestring to a unicode object.
print(u"{}".format("\xe2\x80\xa6".decode('utf-8')))

# As a general rule, decode all strs to a unicode object ASAP.
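
For comparison, a minimal Python 3 sketch of the same ideas. In Python 3, str is already Unicode and bytes is a separate type, so the fixes reduce to naming the encoding explicitly. The temporary file path and variable names are assumptions made for the example.

```python
import os
import tempfile

unicode_char = chr(1024)                    # Python 3 str is Unicode (unichr is gone)
encoded = unicode_char.encode("utf-8")      # str -> bytes, explicit encoding
decoded = b"\xe2\x80\xa6".decode("utf-8")   # bytes -> str (horizontal ellipsis)

# Name the encoding when opening text files instead of relying on the default
path = os.path.join(tempfile.mkdtemp(), "out.txt")
with open(path, "w", encoding="utf-8") as handle:
    handle.write(unicode_char)
with open(path, encoding="utf-8") as handle:
    round_tripped = handle.read()

# Encoding to ASCII still fails for non-ASCII characters
try:
    unicode_char.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

print(encoded, decoded == "\u2026", round_tripped == unicode_char, ascii_failed)
```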

Context manager

A class w/ __enter__() and __exit__() can be used in a “with” statement to run code when the block starts and when it ends, even if the block raises an exception. One example of this is opening files; cleanup (closing the file) happens automatically when the block completes:

with open('workfile', 'r') as f:
    read_data = f.read()
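
A minimal sketch of a hand-written context manager. ManagedResource and the events list are hypothetical names used only to make the call order visible.

```python
class ManagedResource:
    """Records when the with block is entered and exited."""

    def __init__(self, log):
        self.log = log

    def __enter__(self):
        self.log.append("enter")
        return self  # bound to the name after "as"

    def __exit__(self, exc_type, exc_value, tb):
        self.log.append("exit")
        return False  # False means: do not swallow the exception

events = []
try:
    with ManagedResource(events) as resource:
        events.append("body")
        raise ValueError("boom")
except ValueError:
    events.append("handled")

print(events)
```

__exit__ runs even though the body raised, and because it returns False the exception still propagates to the surrounding try block.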

Equality and comparisons

is vs ==

‘is’ is identity testing, ‘==’ is equality testing.

a = 'pub'
b = "".join(['p', 'u', 'b'])
a == b
>>True
a is b
>>False

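“is” comparison only works for strings that have been interned (stored once in a table, so that equal strings share a single object). CPython interns many string literals automatically, but strings built at runtime, like b above, usually are not, which is why a is b is False. For interned strings, ‘is’ is faster than ‘==’, since only the pointers of the two objects are compared rather than the string contents. Use ‘==’ whenever you care about the value.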
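
If you do want identity comparison on strings, sys.intern makes the interning explicit. A short sketch using the same two strings as above:

```python
import sys

a = "pub"
b = "".join(["p", "u", "b"])   # built at runtime

equal_by_value = (a == b)

# sys.intern returns the canonical object for a given string value
interned_a = sys.intern(a)
interned_b = sys.intern(b)
same_object = interned_a is interned_b

print(equal_by_value, same_object)
```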

Check “memory location”

id(variable)

This gives you an identifier for the object the variable refers to, unique among objects alive at the same time (in CPython, it is the memory address). Useful for checking if you are using a copy of or the original object.
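
A sketch of using id() to tell an alias from a shallow copy from a deep copy. The variable names are illustrative.

```python
import copy

original = [1, [2, 3]]
alias = original                 # same object, second name
shallow = copy.copy(original)    # new outer list, shared inner list
deep = copy.deepcopy(original)   # new outer list and new inner list

alias_is_same = id(alias) == id(original)
shallow_is_new = id(shallow) != id(original)
shallow_shares_inner = id(shallow[1]) == id(original[1])
deep_copies_inner = id(deep[1]) != id(original[1])

print(alias_is_same, shallow_is_new, shallow_shares_inner, deep_copies_inner)
```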

Boolean equivalency

# Although [] doesn't equal False, it converts to False if you use it in a boolean expression
# This is useful for using them in if statements
[] == False
>> False
bool([]) == False
>> True

The boolean equivalents of the other empty or zero built-in values are also False: None, 0, 0.0, '', (), {} and set().
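
A quick way to check this yourself. The list below covers the common built-in cases and is not meant to be exhaustive.

```python
# Built-in values that convert to False in a boolean context
falsy_values = [None, False, 0, 0.0, "", [], (), {}, set()]
all_falsy = not any(bool(value) for value in falsy_values)

# Non-empty containers are True even when their contents are falsy
truthy_values = [[0], " ", (None,), {"": None}]
all_truthy = all(bool(value) for value in truthy_values)

print(all_falsy, all_truthy)
```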

Exceptions

As a rule, always use exceptions instead of returning error codes.

KeyboardInterrupt isn’t a subclass of Exception; it derives from BaseException, so a plain except Exception: will not catch it.
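
This can be checked directly. The sketch below raises KeyboardInterrupt by hand rather than waiting for Ctrl+C.

```python
caught_by = None
try:
    raise KeyboardInterrupt()
except Exception:
    caught_by = "Exception"        # skipped: not a subclass of Exception
except BaseException:
    caught_by = "BaseException"    # this clause handles it

is_exception_subclass = issubclass(KeyboardInterrupt, Exception)

print(caught_by, is_exception_subclass)
```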

Define new exception

class MyException(Exception):
    pass

Raise exception w/ description

raise Exception("My hovercraft is full of eels")

Get exception description

try:
    raise MyException({"message":"My hovercraft is full of animals", "animal":"eels"})
except MyException as e:
    details = e.args[0]

# Inside an exception handler: print the traceback of the exception being handled
import traceback
try:
    raise TypeError("Oops")
except Exception:
    traceback.print_exc()

# Outside an exception
import traceback
print(''.join(traceback.format_stack()))

Types

Check if a string is a number

def is_number(s):
    try:
        float(s)
        return True
    except ValueError:
        return False

Immutable/mutable types

Classes

Class vs instance vars

class MyClass(object):
    class_var1 = "class var"
    def __init__(self, x, y):
        self.inst_var = x
        MyClass.class_var2 = y

inst = MyClass(1, 2)

# This creates an instance variable w/ the same name that shadows the class variable; the class variable is unchanged
inst.class_var1 = ""

# This changes the class variable
MyClass.class_var1 = ""
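
A small sketch showing the shadowing with two instances. The Counter class and its attribute names are hypothetical.

```python
class Counter:
    shared = "class value"      # class variable, visible through every instance

    def __init__(self, own):
        self.own = own          # instance variable

first = Counter(1)
second = Counter(2)

# Assigning through an instance creates an entry in that instance's __dict__
first.shared = "instance value"

# Assigning through the class changes what non-shadowing instances see
Counter.shared = "new class value"

print(first.shared, second.shared)
print("shared" in first.__dict__, "shared" in second.__dict__)
```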

Properties

Public/private/protected

Python has no actual public/private/protected access control; it is done purely by naming convention. A leading double underscore mangles the attribute name so that the class name precedes it (__x becomes _ClassName__x). A single leading underscore does nothing except suggest to other programmers that it’s a protected variable. By convention, no underscore = public, single = protected, double = private.
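
A sketch of what the mangling looks like from outside the class. Account and its attributes are made-up names.

```python
class Account:
    def __init__(self):
        self.balance = 1      # public by convention
        self._pending = 2     # "protected" by convention only
        self.__secret = 3     # stored as _Account__secret

account = Account()

single_underscore = account._pending           # nothing stops this access
mangled = account._Account__secret             # reachable under the mangled name
has_plain_name = hasattr(account, "__secret")  # the unmangled name does not exist

print(single_underscore, mangled, has_plain_name)
```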

Mocking/patching

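from mock import Mock, patch  # unittest.mock in Python 3

# Patch as a decorator; the mock is passed in as an extra argument
@patch('some_module.sys.stdout')
def test_something_with_a_patch(self, mock_sys_stdout):
    mock_sys_stdout.return_value = 'My return value from stdout'

    output = my_function_under_test()

    self.assertTrue(mock_sys_stdout.called)
    self.assertEqual(output, mock_sys_stdout.return_value)

# Patch with an explicit replacement object; no extra argument is passed to the test
@patch('some_module.sys.stdout', Mock())
def test_something_with_a_patch(self):
    ...

# Patch a single attribute on an object
@patch.object(some.package.Class, 'someattr')
def test_something_with_patch_object(self, mock_someattr):
    ...

# Print traceback when attribute is accessed
import mock
import traceback
def hi():
    traceback.print_stack()
p = mock.PropertyMock(wraps=db.session, side_effect=hi)
type(db).session = p
# Also useful:
traceback.print_stack()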
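
A self-contained sketch of patching, using the standard library's unittest.mock (Python 3) rather than the separate mock package used above. describe_cwd and the fake path are hypothetical; the patch target os.getcwd is only chosen because it is easy to demonstrate.

```python
import os
from unittest import mock

def describe_cwd():
    """Function under test: depends on os.getcwd()."""
    return "cwd is " + os.getcwd()

# As a context manager, the patch is undone when the block exits
with mock.patch("os.getcwd", return_value="/fake/dir") as mock_getcwd:
    patched_result = describe_cwd()

call_count = mock_getcwd.call_count
restored_result = describe_cwd()   # real os.getcwd again

print(patched_result, call_count)
```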

Decorators

# Decorator replaces original function signature
# Original function is passed in as an argument; replaced function is returned
def decorator(decorated_function):
    def replaced_function(input_to_replaced_func):
        print("pre")

        # This will call decorated function like normal
        print(input_to_replaced_func)
        decorated_function()

        print("post")
    return replaced_function

def orig_func():
    print("Hi")

@decorator
def orig_func_2():
    print("Hi")

# Identical:
decorator(orig_func)("Input to replaced function")
orig_func_2("Input to replaced function")

# Decorator with an argument:
def decorator2(decorator_arg):
    def real_decorator(decorated_function):
        def replaced_function(input_to_replaced_func):
            return decorator_arg + decorated_function(input_to_replaced_func) + decorator_arg
        return replaced_function
    return real_decorator

@decorator2("***")
def double_it(input_str):
    return input_str + input_str

print(double_it("goo"))
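
Because the decorator returns a different function, the original's name and docstring are lost unless you copy them over. A sketch using functools.wraps, with *args/**kwargs so the wrapper accepts whatever the decorated function accepts; logged, calls and add are illustrative names.

```python
import functools

calls = []

def logged(func):
    @functools.wraps(func)            # copies __name__, __doc__, etc. onto wrapper
    def wrapper(*args, **kwargs):
        calls.append(func.__name__)   # "pre" step
        return func(*args, **kwargs)  # call the decorated function as normal
    return wrapper

@logged
def add(a, b):
    """Add two numbers."""
    return a + b

result = add(1, 2)
print(result, add.__name__, add.__doc__, calls)
```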

Logging

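Don’t build log messages yourself with .format or “%s” % var; pass the arguments to the logger and let it do the interpolation: logger.info(“%s”, var). The message is then only formatted if it is actually emitted, which gives better performance, and it has better unicode support.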

See python’s docs for its flowchart diagram for how it deals with loggers, formatters, handlers, etc.: https://docs.python.org/2/_images/logging_flow.png
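
A sketch of the performance point: with logger-side interpolation, an argument is never converted to a string if the message is filtered out. The Expensive class and the logger name are hypothetical.

```python
import logging

class Expensive:
    """Counts how many times it is converted to a string."""

    def __init__(self):
        self.rendered = 0

    def __str__(self):
        self.rendered += 1
        return "expensive value"

logger = logging.getLogger("lazy_logging_demo")  # hypothetical logger name
logger.setLevel(logging.WARNING)                 # DEBUG messages are dropped
logger.addHandler(logging.NullHandler())

value = Expensive()

logger.debug("%s", value)           # dropped before formatting: __str__ not called
logger.debug("{}".format(value))    # formatted eagerly, then dropped anyway

print(value.rendered)
```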

Argument passing

Passing style

Python passes object references by value (sometimes called “call by sharing”). As long as you only mutate the object the parameter refers to, the caller sees the changes, because both names refer to the same object. The second you rebind the parameter name to a new object, the local name points somewhere else; no copy is made, but all later changes affect only the new object and are local to that function.

def test(values):
    values.append(1)      # mutates the caller's list
    print(values)
    values = ['hi']       # rebinds the local name; the caller's list is untouched
    print(values)
    values.append('test')
    print(values)

inny = [1, 2]
print(inny)
test(inny)
print(inny)

Output:

 [1, 2]
 [1, 2, 1]
 ['hi']
 ['hi', 'test']
 [1, 2, 1]

*args and **kwargs

def func(required_arg, *args, **kwargs):
    print("req", required_arg)
    print("args", args)
    print("kwargs", kwargs)

func(1, "a", "b", x="x", y="y")
>>>req 1
>>>args ('a', 'b')
>>>kwargs {'x': 'x', 'y': 'y'}

Profiling

# Profile externally
python -m cProfile -s time ./manage.py worker

# Profile function directly within ipython
%prun some_function()

Warnings

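Run Python with python -Wdefault to show warnings that are hidden by default (such as DeprecationWarning); each warning is printed once per location. Use python -Walways to print every occurrence.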

Snippets

TCP and UDP client/server

TCP Server

import socket

TCP_IP = ''
TCP_PORT = 5005
BUFFER_SIZE = 1024  # Max bytes to read per recv() call

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((TCP_IP, TCP_PORT))
s.listen(1)

conn, addr = s.accept()
print('Connection address:', addr)
while True:
    data = conn.recv(BUFFER_SIZE)
    if not data:
        break
    print("received data:", data)
    conn.sendall(data)  # echo
conn.close()
s.close()

TCP Client

#!/usr/bin/env python

import socket


TCP_IP = '127.0.0.1'
TCP_PORT = 5005
BUFFER_SIZE = 1024
MESSAGE = b"Hello, World!"  # sockets send bytes, not str

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((TCP_IP, TCP_PORT))
s.sendall(MESSAGE)
data = s.recv(BUFFER_SIZE)
s.close()

print("received data:", data)

UDP Server

import socket

UDP_IP = ""
UDP_PORT = 5005

sock = socket.socket(socket.AF_INET, # Internet
                     socket.SOCK_DGRAM) # UDP
sock.bind((UDP_IP, UDP_PORT))

while True:
    data, addr = sock.recvfrom(1024) # buffer size is 1024 bytes
    print("received message:", data)

UDP Client

import socket

UDP_IP = "127.0.0.1"
UDP_PORT = 5005
MESSAGE = b"Hello, World!"  # sockets send bytes, not str

print("UDP target IP:", UDP_IP)
print("UDP target port:", UDP_PORT)
print("message:", MESSAGE)

sock = socket.socket(socket.AF_INET, # Internet
                     socket.SOCK_DGRAM) # UDP
sock.sendto(MESSAGE, (UDP_IP, UDP_PORT))

Sqlite access

import sqlite3
db = sqlite3.connect('database.db')
cursor = db.cursor()
cursor.execute('SELECT * FROM Power')
for row in cursor.fetchall():
    print(row[1], row[5])
db.close()

virtualenv/pip

Create virtualenv

virtualenv env

Install local library

To copy project, in editable mode (to src dir)

pip install -e git+file:///home/shook/some-lib@HEAD#egg=some-lib

Can add [] for optional dependencies

pip install -e git+file:///home/shook/some-lib@HEAD#egg=some-lib[all]

To link to the directory (lib development mode); make sure src and site-packages are cleared out, and you may need to remove/re-add the interpreter in IntelliJ. This creates an egg-link file and updates easy-install.pth; you may need to synchronize IntelliJ to pick up the changes (Ctrl+Alt+Y).

pip install -e /home/shook/some-lib

Library notes

Coverage

coverage run manage.py test -n module
coverage report --omit="*/test*" --include=path/to/analyze/*,other/path/*

Mypy

Static type checker for Python type annotations

pip install mypy  # older releases were published as mypy-lang

Run with: python.exe C:\Python34\Scripts\mypy ..\performance_debugging_analysis\src\main.py

Add as an external tool to IntelliJ:

Program: C:/tools/python3/python.exe
Parameters: C:/tools/python3/scripts/mypy $FilePath$
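
A minimal sketch of what annotated code looks like. total_length is a made-up function; the annotations are ignored at runtime and only checked when you run mypy over the file.

```python
from typing import List

def total_length(words: List[str]) -> int:
    """Sum of the lengths of all words."""
    return sum(len(word) for word in words)

result = total_length(["spam", "eggs"])
print(result)

# mypy reports an error for a call like the following, because an int
# is not a List[str]; Python itself would only fail at runtime:
# total_length(42)
```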

Pandas

Pandas is useful for vectorized (SIMD-style) or linear algebra-like manipulations. Basically, it can be very useful any time you need to work on spreadsheet-like data.

# Use pandas to parse xlsx/Excel workbooks
import pandas as pd
rows = pd.read_excel("path_to_xlsx")

Enforce

Use to enable runtime enforcement of python types.

Flask

Using blueprints avoids the circular imports you get when view modules import the app object directly.

Sqlalchemy

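By default, SQLAlchemy relationships use lazy loading: related rows are only brought in (with an extra SELECT) when that particular attribute is first accessed.

Eager (joined) loading, e.g. lazy='joined' on the relationship or the joinedload() query option, instead grabs the related rows together with the parent in the same query.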

Life cycle

Can use automatic session scoping by tying into the app’s lifecycle, but this means a session will last the full duration of the web request. I’m using a context manager to allow the flexibility of sharing code w/ non web apps.

# In flask:
def set_up_session_cleanup():
    def after_request(response):
        from website_workout.dao.utils import scoped_session_registry
        for scoped_session in scoped_session_registry.values():
            scoped_session.remove()
        return response
    app.after_request(after_request)

Sessions

Pools

pool_recycle refreshes connections older than n seconds upon access (not only idle ones)

Logging

Get info about sqlalchemy pools and connections (can use the root 'sqlalchemy' logger to get more info)

import sqlalchemy
import logging
logging.basicConfig()
logging.getLogger('sqlalchemy.pool').setLevel(logging.DEBUG)

Docs

Numpy/scipy

Linear regression

import numpy

x = [[1, .1], [20, .2], [3, .3], [4, .4]]
y = [1, 2, 3, 4]
coeffs = numpy.linalg.lstsq(x, y, rcond=None)[0]
print(coeffs)

Or

import scipy.optimize

def func(xval, a, b, c):
    return a * xval ** c + b

# Here x and y are 1-D numpy arrays of equal length (not the 2-D x above)
popt, pcov = scipy.optimize.curve_fit(func, x, y, maxfev=100000)
curvefit_y = func(x, *popt)

Fabric and paramiko

For running ssh commands, paramiko is much better than fabric for anything requiring threading or dynamically determined hosts

Cross-correlation

# Returns the matrix of Pearson correlation coefficients between the two series
numpy.corrcoef([numpy.array(data1), numpy.array(data2)])

pipdeptree

Shows dependencies between libraries (including any breakages)

pip install pipdeptree
pipdeptree

piprot

Shows out of date libraries

pip install piprot
piprot -o requirements.txt

jsonschema

Useful for validating JSON

pip install jsonschema

Profile imports

PYTHONPATH=server python -X importtime -c 'import somecode; somecode.run()'