brogramming - python, bash for data processing, and git
¡ Python
  § The Zen of Python
  § Conventions and PEP 8
  § Tips & Tricks, Dos and Don'ts
¡ Bash for Data Processing
¡ Git
  § Merge vs rebase
  § Git flow
>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
There should be one-- and preferably only one --obvious way to do it.
(Although that way may not be obvious at first unless you're Dutch.)
¡ PEP 8 defines the standard coding convention for the global Python community.
¡ It leaves little room for competing code conventions.
¡ It makes code more readable.
¡ It lets people easily move in and out of code they did not write.
Yes:
    # Aligned with opening delimiter.
    foo = long_function_name(var_one, var_two,
                             var_three, var_four)

Yes (please use hanging indents at Crosswise):
    # More indentation included to distinguish this from the rest.
    def long_function_name(
            var_one, var_two, var_three,
            var_four):
        print(var_one)

    # Hanging indents should add a level.
    foo = long_function_name(
        var_one, var_two,
        var_three, var_four)

No:
    # Arguments on first line forbidden when not using vertical alignment.
    foo = long_function_name(var_one, var_two,
        var_three, var_four)

    # Further indentation required as indentation is not distinguishable.
    def long_function_name(
        var_one, var_two, var_three,
        var_four):
        print(var_one)
This is OK (closing bracket lined up with the items):
    my_list = [
        1, 2, 3,
        4, 5, 6,
        ]
    result = some_function_that_takes_arguments(
        'a', 'b', 'c',
        'd', 'e', 'f',
        )

This is OK too (use this at Crosswise; closing bracket lined up with the start of the statement):
    my_list = [
        1, 2, 3,
        4, 5, 6,
    ]
    result = some_function_that_takes_arguments(
        'a', 'b', 'c',
        'd', 'e', 'f',
    )
¡ Method definitions inside a class are separated by 1 blank line.
    class Foo(object):
        def foo1(self):
            pass

        def foo2(self):
            pass
¡ Use blank lines in functions, sparingly, to indicate logical sections.
    # do something
    something = [x for x in list]

    # now do something different
    something_different = do_it(something)
¡ PEP 8 recommends absolute imports, so use them.
¡ They avoid confusion and duplicate imports.
from foo.bar.yourclass import YourClass
Avoid extraneous whitespace in the following places:
¡ Immediately inside parentheses, brackets or braces.
    Yes: spam(ham[1], {eggs: 2})
    No:  spam( ham[ 1 ], { eggs: 2 } )
¡ Immediately before a comma, semicolon, or colon:
    Yes: if x == 4: print x, y; x, y = y, x
    No:  if x == 4 : print x , y ; x , y = y , x
¡ Immediately before the open parenthesis that starts the argument list of a function call:
    Yes: spam(1)
    No:  spam (1)
¡ More than one space around an assignment (or other) operator to align it with another.
    Yes:
        x = 1
        y = 2
        long_variable = 3
    No:
        x             = 1
        y             = 2
        long_variable = 3
¡ Immediately before the open parenthesis that starts an indexing or slicing:
    Yes: dict['key'] = list[index]
    No:  dict ['key'] = list [index]
¡ Don't use spaces around the = sign when used to indicate a keyword argument or a default parameter value.
    Yes:
        def complex(real, imag=0.0):
            return magic(r=real, i=imag)
    No:
        def complex(real, imag = 0.0):
            return magic(r = real, i = imag)
¡ Use inline comments sparingly.
¡ Don’t document obvious things – if your Python is expressive enough, you might not even need documentation at all.
¡ Use list comprehensions instead of map with lambdas.
¡ If you just want to apply an existing function, use map.
¡ Generally, map and filter with lambdas are less readable.
    Yes: result = [x + 1 for x in xrange(10) if x % 2 == 0]
    No:  result = map(lambda x: x + 1,
                      filter(lambda x: x % 2 == 0, xrange(10)))
    Yes: result = map(get_data, items)
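To make the comparison concrete, here is a minimal sketch in Python 3 (so `range` stands in for `xrange`, and `map`/`filter` are wrapped in `list`). It spells the same computation both ways: keep the even numbers, then add one to each. `get_data` here is a hypothetical stand-in, not the function from the slide.

```python
def get_data(item):
    # Hypothetical stand-in for a real lookup function.
    return item * 10


numbers = range(10)

# Comprehension: keep the even numbers, then add one to each.
with_comprehension = [x + 1 for x in numbers if x % 2 == 0]

# The same computation spelled with map and filter plus lambdas.
with_map_filter = list(
    map(lambda x: x + 1, filter(lambda x: x % 2 == 0, numbers))
)

# Applying an existing named function: map reads fine here.
mapped = list(map(get_data, [1, 2, 3]))

print(with_comprehension)
print(with_map_filter)
print(mapped)
```

Both spellings produce the same list; the comprehension states the filter and the transformation in one readable line.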
¡ Python Facts – a website I wrote:
  § http://thingsyoumust.appspot.com/
¡ Quora
  § http://www.quora.com/Python-programming-language-1/What-are-some-cool-Python-tricks
¡ Redirect the contents of a file into another file
    cat file.txt > newfile.txt
    cat file1.txt file2.txt > newfile.txt
¡ Redirect the contents of a compressed file (gzcat on macOS; zcat on most Linux systems)
    gzcat file.txt.gz > newfile.txt
¡ Pipe the contents of a compressed file through gzip and write it to a compressed file
    gzcat file.txt.gz | gzip > newfile.txt.gz
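For comparison, the compressed-to-compressed case can be sketched in Python with the standard `gzip` module. This is only an illustration under assumptions: the function name `recompress` and the throwaway file contents are made up for the example, and the file names merely mirror the ones used above.

```python
import gzip
import os
import shutil
import tempfile


def recompress(src_path, dst_path):
    """Stream a gzip file into a new gzip file without loading it all in memory."""
    with gzip.open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


# Usage with throwaway files in a temporary directory.
workdir = tempfile.mkdtemp()
src_file = os.path.join(workdir, "file.txt.gz")
dst_file = os.path.join(workdir, "newfile.txt.gz")

with gzip.open(src_file, "wt") as handle:
    handle.write("1\t2\t3\n4\t5\t6\n")

recompress(src_file, dst_file)

with gzip.open(dst_file, "rt") as handle:
    copied = handle.read()

print(repr(copied))
```

For one-off jobs the shell pipeline is shorter; the Python form is mostly useful when the step sits inside a larger script.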
¡ awk is a mini programming language, most convenient for printing a column or line when a condition holds
    printf "1\t2\t3\n4\t5\t6\n" | awk '{ if ($1 % 2 == 0) print $2 }'
    5
¡ awk can also do cool stuff like sampling a file (here, keeping each line with probability 0.1).
    printf "1\n2\n3\n4\n5\n6\n" | awk 'BEGIN {srand()} {if (rand() < 0.1) print $0}'
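To show what the two awk one-liners compute, here is a rough Python equivalent. It is a sketch, not a replacement for awk: the function names are made up for this example, and the `seed` parameter is an addition so that a sample can be reproduced.

```python
import random


def second_column_where_first_is_even(lines):
    """Mimic awk '{ if ($1 % 2 == 0) print $2 }' on whitespace-separated lines."""
    result = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and int(fields[0]) % 2 == 0:
            result.append(fields[1])
    return result


def sample_lines(lines, rate, seed=None):
    """Keep each line independently with probability `rate`."""
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < rate]


rows = ["1\t2\t3", "4\t5\t6"]
selected = second_column_where_first_is_even(rows)

numbers = [str(n) for n in range(1, 7)]
sampled = sample_lines(numbers, rate=0.5, seed=0)

print(selected)
print(sampled)
```

As with the awk version, the sample size is random: a rate of 0.1 keeps about 10% of the lines on average, not exactly 10%.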
¡ uniq collapses adjacent duplicate lines, so it returns unique values only if the input is sorted.
    printf "1\n2\n1\n" | uniq
    1
    2
    1
    printf "1\n2\n1\n" | sort | uniq
    1
    2
¡ uniq can also return the count of each value using -c.
    printf "a\nb\na\n" | sort | uniq -c
       2 a
       1 b
¡ Counting unique values in a column:
    cat foo.txt | cut -f 2 | sort | uniq | wc -l
¡ Showing the value counts:
    cat foo.txt | cut -f 2 | sort | uniq -c
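The same two questions (how many distinct values, and how often each occurs) can be sketched in Python with `collections.Counter`. The helper `column_values` and the sample rows are assumptions made for this example; unlike `cut`, the helper simply skips lines that have too few tab-separated fields.

```python
from collections import Counter


def column_values(lines, column):
    """Yield the 1-based tab-separated column of each line, similar to cut -f."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= column:
            yield fields[column - 1]


# Made-up rows standing in for the contents of foo.txt.
lines = ["x\ta\t1", "y\tb\t2", "z\ta\t3"]

counts = Counter(column_values(lines, 2))
unique_count = len(counts)

print(unique_count)
for value, count in counts.most_common():
    print(count, value)
```

No sorting step is needed here, because `Counter` groups equal values wherever they appear in the input.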
¡ Git is a distributed version control system (DVCS)
¡ Created by Linus Torvalds, the creator of Linux, to manage development of the Linux kernel
¡ Advantages of rebase over merge:
  § Simplifies history – merge commits are ugly because they have two parents
  § Each commit is an addition over the existing code
¡ Advantages of merge over rebase:
  § Preserves the commits as they were on the developer’s computer
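The difference in history shape can be illustrated with a toy model in Python. This is only a sketch of the idea, not how Git stores or rewrites commits: the `Commit` class and the `merge` and `rebase` helpers are invented for this example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Commit:
    name: str
    parents: Tuple["Commit", ...] = ()


def merge(ours, theirs, name="M"):
    """A merge commit records both branch tips as its parents."""
    return Commit(name, (ours, theirs))


def rebase(branch_commits, onto):
    """Replay each commit on top of `onto`, creating new single-parent commits."""
    tip = onto
    for commit in branch_commits:
        tip = Commit(commit.name + "'", (tip,))
    return tip


def first_parent_history(commit):
    """Walk back through first parents and collect the commit names."""
    names = []
    while commit is not None:
        names.append(commit.name)
        commit = commit.parents[0] if commit.parents else None
    return names


root = Commit("A")
main_tip = Commit("B", (root,))
feature_1 = Commit("C", (root,))
feature_2 = Commit("D", (feature_1,))

merged = merge(main_tip, feature_2)
rebased = rebase([feature_1, feature_2], onto=main_tip)

print(len(merged.parents))
print(first_parent_history(rebased))
```

In the toy model, the merge result has two parents and leaves C and D untouched, while the rebase result is a straight line of new commits (C' and D') sitting on top of B.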
¡ HOWTO
    http://danielkummer.github.io/git-flow-cheatsheet/
    http://nvie.com/posts/a-successful-git-branching-model/
¡ Install
    brew install git-flow-avh
¡ Git flow defines a universal standard for developing features and maintaining releases over git.
¡ There are 3 commands in git flow:
  § feature – for implementing new features
  § release – for adjusting the release according to feedback from QA and such
  § hotfix – for creating a new release on top of a previous release
¡ Features branch off develop and are merged back into develop
¡ Releases branch off develop and are merged into master and back into develop
¡ Hotfixes branch off master and are merged into master and back into develop
¡ There are 3 operations:
  § start – start a feature/release/hotfix by branching out
  § publish – push to the remote repository
  § finish – merge back and remove the branch
    ▪ When finishing, please use the -r flag to rebase the changes back instead of merging them, so we have a cleaner history to look at.
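The commands and operations combine into command lines of the form `git flow <type> <operation> <name>`. The sketch below only builds those command lines as lists (suitable for passing to `subprocess.run`); it does not run git. The helper name `git_flow_command` and the branch name `my-feature` are made up, and restricting `-r` to `feature finish` is an assumption of this sketch, so check `git flow feature finish -h` in your installed version.

```python
BRANCH_TYPES = ("feature", "release", "hotfix")
OPERATIONS = ("start", "publish", "finish")


def git_flow_command(branch_type, operation, name, rebase=False):
    """Build a git flow command line as a list of arguments."""
    if branch_type not in BRANCH_TYPES:
        raise ValueError("unknown branch type: %s" % branch_type)
    if operation not in OPERATIONS:
        raise ValueError("unknown operation: %s" % operation)

    command = ["git", "flow", branch_type, operation]
    if rebase:
        # Assumption: this sketch only supports -r when finishing a feature.
        if (branch_type, operation) != ("feature", "finish"):
            raise ValueError("-r is only sketched here for 'feature finish'")
        command.append("-r")
    command.append(name)
    return command


# Hypothetical feature branch name, for illustration only.
start = git_flow_command("feature", "start", "my-feature")
publish = git_flow_command("feature", "publish", "my-feature")
finish = git_flow_command("feature", "finish", "my-feature", rebase=True)

for command in (start, publish, finish):
    print(" ".join(command))
```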