class: center, middle, inverse

# Programming Best Practices

---

# Agenda

1. General Programming Principles: What is good code ?

---

# Agenda

1. General Programming Principles: What is good code ?
2. Automated Code Testing: Write code to test your code.

---

# Agenda

1. General Programming Principles: What is good code ?
2. Automated Code Testing: Write code to test your code.
3. Refactoring: Incrementally improve your code quality.

---

# Agenda

1. General Programming Principles: What is good code ?
2. Automated Code Testing: Write code to test your code.
3. Refactoring: Incrementally improve your code quality.
4. Refactoring demo session

---
class: center, middle, inverse

# General Programming Principles

---
class: center

# Everything we teach today will slow you down !

---
class: center

# Everything we teach today will slow you down at first.
---

# The three laws of informatics

(source: https://medium.com/@schemouil/rust-and-the-three-laws-of-informatics-4324062b322b)

---

# The three laws of informatics

(source: https://medium.com/@schemouil/rust-and-the-three-laws-of-informatics-4324062b322b)

### 1. Programs must be correct.

---

# The three laws of informatics

(source: https://medium.com/@schemouil/rust-and-the-three-laws-of-informatics-4324062b322b)

### 1. Programs must be correct.

### 2. Programs must be maintainable, except where it would conflict with the First Law.

---

# The three laws of informatics

(source: https://medium.com/@schemouil/rust-and-the-three-laws-of-informatics-4324062b322b)

### 1. Programs must be correct.

### 2. Programs must be maintainable, except where it would conflict with the First Law.

### 3. Programs must be efficient, except where it would conflict with the First or Second Law.

---
class: middle

# 1. Correctness

1. A program produces correct results for regular input.
2. A program produces consistent results for corner case input.
3. A program detects invalid input and reports it.
4. A program does not crash.

---
class: middle

# 3. Efficiency

- Not part of this presentation.
- Don't guess, measure first (use a profiler).
- Optimization usually needs good knowledge of the programming language.
- Optimization usually needs good knowledge of algorithms and data structures.

---
class: middle, center, inverse

# Maintainability

---
class: middle

# Maintainability

1. It is easy to extend / modify the program.
2. It is easy to find and fix bugs.
3. Others can understand your program.

---
class: middle

# General recommendations

- Control over development process (testing, git)
- Clear structure / well understood approach.
- Don't program by coincidence.
- Understand bugs before you fix them.
- Be brave enough to trash your program and start from scratch.
- Rethink what you do.
- DRY: Don't repeat yourself (reduce code duplication)

---
class: middle, center

# Readability

# Code is read more often than it is written

---
class: middle

# Readability: Layout

- Little scrolling needed.
- Keep related code together.
- Use empty lines to structure code.
- Avoid very long lines.
- Reduce nesting.
- Choose a code style and use it.

---
class: middle

# Readability: PEP 8 style guide for Python

Most important points from PEP 8:

- `snake_case` for functions and variable names.
- `CamelCase` for class names.
- space after `,`
- spaces around `+`, `*`, ...
- indent by multiples of four spaces
- maximal line length: 79 characters.

---
class: middle

# Readability: Other style guides

- R:
http://adv-r.had.co.nz/Style.html
- Matlab:
https://www.mathworks.com/matlabcentral/fileexchange/2529-matlab-programming-style-guidelines
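---

# Readability: PEP 8 sketch

A minimal sketch of the PEP 8 points listed for Python, not a complete style reference. The class `TemperatureSeries` and its method are hypothetical names chosen only for this illustration.

```python
# Hypothetical example: all names are made up to illustrate PEP 8 layout.


class TemperatureSeries:  # CamelCase for class names
    def __init__(self, values_in_celsius):
        self.values_in_celsius = values_in_celsius

    def mean_in_fahrenheit(self):  # snake_case for functions and variables
        total = sum(self.values_in_celsius)
        mean_in_celsius = total / len(self.values_in_celsius)
        # spaces around operators, four-space indentation, short lines
        return mean_in_celsius * 9 / 5 + 32


series = TemperatureSeries([0, 100])  # space after ","
print(series.mean_in_fahrenheit())
```

The same conventions apply to scripts and notebooks; what matters most is that one style is used throughout.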
---

## Use names to reveal intention

Is this program correct ?

```python
def compute(a, b):
    return a + b
```

---

## Use names to reveal intention

Is this program correct ?

```python
def compute(a, b):
    return a + b
```

With appropriate names:

```python
def area_rectangle(height, width):
    return height + width
```

Now the bug is visible: the area of a rectangle is `height * width`.

---

## Good names reduce comments

```python
time_tolerance = 60  # in seconds
```

---

## Good names reduce comments

```python
time_tolerance = 60  # in seconds
```

```python
time_tolerance_in_seconds = 60
```

---

## Good names reduce comments

```python
time_tolerance = 60  # in seconds
```

```python
time_tolerance_in_seconds = 60
```

https://en.wikipedia.org/wiki/Mars_Climate_Orbiter#/Cause_of_failure
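---

## Good names reduce comments

A minimal sketch of how names that carry their unit make a required conversion visible where it happens. The function `is_within_tolerance` and all values are hypothetical, chosen only for this illustration.

```python
SECONDS_PER_MINUTE = 60


def is_within_tolerance(elapsed_in_seconds, time_tolerance_in_seconds):
    # Both parameters name their unit, so the comparison is safe to read.
    return elapsed_in_seconds <= time_tolerance_in_seconds


time_tolerance_in_seconds = 60
elapsed_in_minutes = 2

# The names show that a conversion is needed before comparing.
elapsed_in_seconds = elapsed_in_minutes * SECONDS_PER_MINUTE
print(is_within_tolerance(elapsed_in_seconds, time_tolerance_in_seconds))
```

Passing `elapsed_in_minutes` to a parameter called `elapsed_in_seconds` looks wrong at first glance, which is the point of the naming.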
---

## Good names reduce comments

```python
mem_used = mem_used / 1073741824  # bytes to gb
```

---

## Good names reduce comments

```python
mem_used = mem_used / 1073741824  # bytes to gb
```

Introduce a constant for such "magic numbers":

```python
BYTES_PER_GB = 1073741824
...
mem_used = mem_used / BYTES_PER_GB
```

---

## Good names reduce comments

```python
mem_used = mem_used / 1073741824  # bytes to gb
```

Introduce a constant for such "magic numbers":

```python
BYTES_PER_GB = 1073741824
...
mem_used = mem_used / BYTES_PER_GB
```

- Use capital letters for constants.
- Constants also reduce *undetected* typing errors.

---

## Names should not encode types

```python
names_list = read_names_as_list()
for name in names_list:
    ...
```

---

## Names should not encode types

```python
names_list = read_names_as_list()
for name in names_list:
    ...
```

```python
names = read_names()
for name in names:
    ...
```

---

## Names should not encode types

```python
names_list = read_names_as_list()
for name in names_list:
    ...
```

```python
names = read_names()
for name in names:
    ...
```

If types change you will have to change your code, else your code will lie !

---

## Explanatory variables to express intent

```python
if taste_score > 30:
    print("I like this")
```

---

## Explanatory variables to express intent

```python
if taste_score > 30:
    print("I like this")
```

```python
is_sweet = (taste_score > 30)
if is_sweet:
    print("I like this")
```

---

## Explanatory variables to reduce duplication

```python
if width * height > 30:
    print("area", width * height, "is too large.")
```

---

## Explanatory variables to reduce duplication

```python
if width * height > 30:
    print("area", width * height, "is too large.")
```

```python
area = width * height
if area > 30:
    print("area", area, "is too large.")
```

Now imagine more complicated expressions !

---
class: middle

## About comments

- Don't comment the obvious.
- Only comment the unexpected.
- If you use a solution from the internet: add a comment with the link.
- Check if your comment is understandable for others.

---
class: middle, center

# Functions

---

## Write many functions

- to avoid code duplication

---

## Write many functions

- to avoid code duplication
- to reduce nesting level

---

## Write many functions

- to avoid code duplication
- to reduce nesting level
- to support re-usability

---

## Write many functions

- to avoid code duplication
- to reduce nesting level
- to support re-usability
- to simplify code structure

---

## Write many functions

- to avoid code duplication
- to reduce nesting level
- to support re-usability
- to simplify code structure
- to reduce comments

---

## Write many functions

- to avoid code duplication
- to reduce nesting level
- to support re-usability
- to simplify code structure
- to reduce comments
- to support testability

---

## A function should only do "one thing" !

```python
def ask_and_check_if_prime():
    number = int(input("give me a number: "))
    divisor = 2
    while divisor * divisor <= number:
        if number % divisor == 0:
            print(number, "is not prime")
            return
        divisor += 1
    print(number, "is prime")
```

---

## A function should only do "one thing" !

```python
def ask_and_check_if_prime():
    number = int(input("give me a number: "))
    divisor = 2
    while divisor * divisor <= number:
        if number % divisor == 0:
            print(number, "is not prime")
            return
        divisor += 1
    print(number, "is prime")
```

```python
def is_prime(number):
    divisor = 2
    while divisor * divisor <= number:
        if number % divisor == 0:
            return False
        divisor += 1
    return True

def ask_and_check_if_prime():
    number = int(input("give me a number: "))
    if is_prime(number):
        print(number, "is prime")
    else:
        print(number, "is not prime")
```

---

## A function should have few arguments

```python
def my_algorithm(input_data, n_iter, epsilon, tau, variant):
    ...
```

---

## A function should have few arguments

```python
def my_algorithm(input_data, n_iter, epsilon, tau, variant):
    ...
```

```python
def my_algorithm(input_data, configuration):
    n_iter = configuration["n_iter"]
    epsilon = configuration["epsilon"]
    tau = configuration["tau"]
    variant = configuration["variant"]
    ...
```

---

## A function should have few arguments

```python
def distance_3d(x0, y0, z0, x1, y1, z1):
    dx = x1 - x0
    dy = y1 - y0
    dz = z1 - z0
    return (dx ** 2 + dy ** 2 + dz ** 2) ** .5

distance = distance_3d(1, 2, 3, 3, 2, 1)
```

---

## A function should have few arguments

```python
def distance_3d(x0, y0, z0, x1, y1, z1):
    dx = x1 - x0
    dy = y1 - y0
    dz = z1 - z0
    return (dx ** 2 + dy ** 2 + dz ** 2) ** .5

distance = distance_3d(1, 2, 3, 3, 2, 1)
```

```python
from collections import namedtuple

Point3D = namedtuple("Point3D", "x,y,z")

def distance_3d(p1, p2):
    dx = p2.x - p1.x
    dy = p2.y - p1.y
    dz = p2.z - p1.z
    return (dx ** 2 + dy ** 2 + dz ** 2) ** .5

distance = distance_3d(Point3D(1, 2, 3), Point3D(3, 2, 1))
```

---

## A function should have few arguments

```python
def distance_3d(x0, y0, z0, x1, y1, z1):
    dx = x1 - x0
    dy = y1 - y0
    dz = z1 - z0
    return (dx ** 2 + dy ** 2 + dz ** 2) ** .5

distance = distance_3d(1, 2, 3, 3, 2, 1)
```

```python
from collections import namedtuple

Point3D = namedtuple("Point3D", "x,y,z")

def distance_3d(p1, p2):
    dx = p2.x - p1.x
    dy = p2.y - p1.y
    dz = p2.z - p1.z
    return (dx ** 2 + dy ** 2 + dz ** 2) ** .5

distance = distance_3d(Point3D(1, 2, 3), Point3D(3, 2, 1))
```

- R has named vectors and lists
- Python 3.7+ also has `dataclass`es
- Else: use objects

---

## The code of a function should operate on the same level of abstraction

Avoid long functions with sections !

Instead:

```python
def workflow():
    configuration = read_configuration()
    data = read_data(configuration)
    results = process(data, configuration)
    write_result(results, configuration)
```

---

## The code of a function should operate on the same level of abstraction

Avoid long functions with sections !
Instead:

```python
def workflow():
    configuration = read_configuration()
    data = read_data(configuration)
    results = process(data, configuration)
    write_result(results, configuration)
```

```python
def read_data(configuration):
    input_path = configuration['input_file']
    if input_path.endswith(".csv"):
        return read_csv(input_path)
    elif input_path.endswith(".xlsx"):
        return read_xlsx(input_path)
    else:
        raise NotImplementedError('do not know how to read {}'.format(input_path))
```

---

## Return early

```python
def is_prime(number):
    if number >= 2:
        ...
        ...
        ...
    else:
        return False
```

---

## Return early

```python
def is_prime(number):
    if number >= 2:
        ...
        ...
        ...
    else:
        return False
```

Better:

```python
def is_prime(number):
    if number < 2:
        return False
    ...
    ...
    ...
```

---

## Return early

```python
def is_prime(number):
    if number >= 2:
        ...
        ...
        ...
    else:
        return False
```

Better:

```python
def is_prime(number):
    if number < 2:
        return False
    ...
    ...
    ...
```

Benefits:

- reduces indentation
- reduces scrolling

---

## Use break / continue early

```python
for value in values:
    if value > 0:
        # a long code section follows
        ...
        ...
        ...
```

---

## Use break / continue early

```python
for value in values:
    if value > 0:
        # a long code section follows
        ...
        ...
        ...
```

Better:

```python
for value in values:
    if value <= 0:
        continue
    # a long code section follows
    ...
    ...
    ...
```

---

## Be consistent

- consistent naming (e.g. either `index` or `idx`, not both)

---

## Be consistent

- consistent naming (e.g. either `index` or `idx`, not both)
- consistent physical units

---

## Be consistent

- consistent naming (e.g. either `index` or `idx`, not both)
- consistent physical units
- consistent error handling (`return None` vs `raise ...`)

---

## Be consistent

- consistent naming (e.g. either `index` or `idx`, not both)
- consistent physical units
- consistent error handling (`return None` vs `raise ...`)
- consistent style (`CamelCase` vs `snake_case` names, ...)

---
class: center

# If I follow all your advice I will have no time to focus on my problem

---
class: center

## Learn best practices incrementally

---
class: center

## Learn best practices incrementally

## Automate techniques and habits step by step

---
class: center, middle

Robert C Martin: Clean Code
---
class: center

## My professor won't give me time for the stuff you teach !

---
class: center

## My professor won't give me time for the stuff you teach !

## Talk about risks and future costs, not about technicalities.
---
class: inverse

# Take home messages:

---
class: inverse

# Take home messages:

## - think about good names

---
class: inverse

# Take home messages:

## - think about good names

## - write many good functions

---
class: inverse

# Take home messages:

## - think about good names

## - write many good functions

## - be consistent

---
class: inverse

# Take home messages:

## - think about good names

## - write many good functions

## - be consistent

## - exercise and automate best practices

---
class: center, middle, inverse

# Questions ?

---
class: center, middle, inverse

# Thanks for your attention !