This two-day course teaches the basics of the Python programming language. Python is an open-source programming language that runs on all major operating systems and emphasizes readability and programmer productivity. No previous programming experience is required, but we assume that participants bring a laptop with a working Python installation. Python's language elements are taught by examining example tools from high-throughput DNA sequence analysis with next-generation sequencing (NGS) data.
Python is based on the concept of objects, which are defined by classes, and the operations associated with them. For example, a DNA sequence can be represented as a Python object for which we implement the reverse complement operation. We thus introduce the terminology of object-oriented programming. In parallel, we discuss Python's statements, which are similar to those in many other programming languages (assignments, loops, conditionals, context managers, error handling by exceptions). We next discuss Python's data types: basic types (booleans, numbers, strings), sequence and container types (lists, bytes, arrays, sets), and dictionaries. To interact with the outside world, we discuss how to write command-line programs and work with files. As an example, we compute and output the base quality distribution in a FASTQ file using only elementary programming techniques.
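To give a flavor of these two examples, here is a minimal sketch, not the course's actual material. The class name DNASequence, the function name base_quality_distribution, and the default quality offset of 33 (the common Phred+33 encoding) are assumptions made for illustration, and the FASTQ input is assumed to consist of simple four-line records.

```python
class DNASequence:
    """A DNA sequence with a reverse complement operation (illustrative class)."""

    # Translation table mapping each base to its complement.
    _COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

    def __init__(self, bases):
        self.bases = bases

    def reverse_complement(self):
        # Complement every base, then reverse the resulting string.
        return DNASequence(self.bases.translate(self._COMPLEMENT)[::-1])

    def __str__(self):
        return self.bases


def base_quality_distribution(lines, offset=33):
    """Count quality values in FASTQ data given as an iterable of lines.

    Assumes four lines per record and an ASCII offset of 33 (Phred+33).
    Returns a plain dictionary mapping quality value to number of bases.
    """
    counts = {}
    for i, line in enumerate(lines):
        if i % 4 == 3:  # the fourth line of each record holds the qualities
            for char in line.rstrip("\n"):
                quality = ord(char) - offset
                counts[quality] = counts.get(quality, 0) + 1
    return counts


print(DNASequence("ACGTT").reverse_complement())  # prints AACGT
```

Because an open file is itself an iterable of lines, the function could be applied to a file with a context manager, for example `with open("reads.fastq") as f: dist = base_quality_distribution(f)`, where the file name is hypothetical.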
We then develop a more complex application: the rapid estimation of the rate of (PCR) duplicates in a sequencing run and its visualization. For this, we explore several of Python's advanced features (iterators, generators, comprehensions), standard library modules (e.g., collections, itertools), and third-party extension packages (e.g., numpy, matplotlib). We also discuss how to write good (readable and "Pythonic") code and how to document it.
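One possible shape of such an estimate is sketched below; it is an illustration under stated assumptions, not the method developed in the course. It assumes that a duplicate is a read whose sequence is identical to that of an earlier read, that looking only at the first max_reads reads is an acceptable shortcut for speed, and that the input uses four-line FASTQ records. The names read_sequences and estimate_duplicate_rate are hypothetical. Visualization with matplotlib is left out to keep the sketch dependent on the standard library only.

```python
from collections import Counter
from itertools import islice


def read_sequences(lines):
    """Generator: yield the sequence line of each four-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of each record holds the bases
            yield line.strip()


def estimate_duplicate_rate(lines, max_reads=100_000):
    """Estimate the fraction of duplicate reads among the first max_reads reads.

    Assumption: two reads are duplicates if their sequences are identical.
    """
    # islice consumes the generator lazily, so only max_reads records are parsed.
    counts = Counter(islice(read_sequences(lines), max_reads))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Every distinct sequence counts once as an original; the rest are duplicates.
    return 1 - len(counts) / total
```

The Counter also holds the multiplicity of each distinct sequence, so a comprehension such as `[n for n in counts.values() if n > 1]` would give the sizes of the duplicate groups, which could then be plotted as a histogram.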