Intro

“Structure” means making clean code whose logic and dependencies are clear as well as how the files and folders are organized in the filesystem.

Questions for beginning:

  1. Which functions should go into which modules?
  2. How does data flow through the project?
  3. What features and functions can be grouped together and isolated?

Sample Repository

This repository is available here.

1
2
3
4
5
6
7
8
9
10
11
12
README.md
LICENSE
setup.py
requirements.txt
sample/__init__.py
sample/core.py
sample/helpers.py
docs/conf.py
docs/index.rst
tests/context.py
tests/test_basic.py
tests/test_advanced.py

The Actual Module

Location: ./sample/ or ./sample.py

Purpose: The code of interest

The module package is the core focus of the repository. It should not be tucked away.

If the module consists of only a single file, you can place it directly in the root of your repository.

License

Location: ./LICENSE

Purpose: The full license text and copyright claims should exist in this file

Of course, you are free to publish code without a license.

Setup.py

Location: ./setup.py

Purpose: Package and distribution management

For general packages, we can download them directly using the pip install.

But if we write our own packages locally and want to publish them to the server, we can use setup.py to build the environment. Many GitHub codes provide setup.py for one-click installation.

For more details: 花了两天,终于把 Python 的 setup.py 给整明白了 - 知乎

Requirements File

Location: ./requirements.txt

Purpose: Development dependencies, containing a list of items to be installed using pip install

If your project has no development dependencies, or if you prefer setting up a development environment via setup.py, this file may be unnecessary.

Documentation

Location: ./docs/

Purpose: Package reference documentation

There is little reason for this to exist elsewhere.

Test Suite

Location: ./test_sample.py or ./tests

Purpose: Package integration and unit tests

Test modules must import your packaged module to test it. You can do this a few ways:

  • Expect the package to be installed in site-packages.
  • Use a simple (but explicit) path modification to resolve the package properly.

The latter is recommended. Because the former method assumes that the package is already installed, which may not be the way the developer wants to do it. If the latter method is used, it requires the developer to run setup.py develop to set up the development environment and ensure that each instance of code has a separate environment.

In short, during testing, it is recommended to use path modification to import modules in packages, as this avoids package dependent installation and also prevents the test environment from being affected by other factors.

To give the individual tests import context, create a tests/context.py file:

1
2
3
4
5
import os
import sys
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))

import sample

Then, within the individual test modules, import the module like so:

1
**from** .context **import** sample

Some people will assert that you should distribute your tests within your module itself. It often increases complexity for your users; many test suites often require additional dependencies and runtime contexts.

Makefile

Location: ./Makefile

Purpose: Generic management tasks

Make is a highly beneficial tool for defining general tasks in your project, even if it’s not written in C.

Sample Makefile:

1
2
3
4
5
init:pip install -r requirements.txt

test:py.test tests

.PHONY: init test

Other generic management scripts (e.g. manage.py or fabfile.py) belong at the root of the repository as well.

Regarding Django Applications

When using Django, use the following commands to better initialize the repository:

1
$ django-admin.py startproject samplesite .

Note the “.”.

Structure of Code is Key

Some signs of a poorly structured project include:

  • Multiple and messy circular dependencies
  • Hidden coupling
  • Heavy usage of global state or context
  • Spaghetti code (multiple pages of nested if clauses and for loops with a lot of copy-pasted procedural code and no proper segmentation)
  • Ravioli code (hundreds of similar little pieces of logic, often classes or objects, without proper structure)

Modules

Keep module names short, lowercase, and be sure to avoid using special symbols like the dot (.) or question mark (?), even our trusty friend the underscore (_), should not be seen that often in module names. Don’t namespace with underscores; use sub-modules instead.

1
2
3
4
# OK
import library.plugin.foo
# not OK
import library.foo_plugin

The import modu statement will look for the proper file, which is modu.py in the same directory as the caller, if it exists. If it is not found, the Python interpreter will search for modu.py in the “path” recursively and raise an ImportError exception when it is not found.

When modu.py is found, the Python interpreter will execute the module in an isolated scope. Any top-level statement in modu.py will be executed, including other imports if any. Function and class definitions are stored in the module’s dictionary.

Then, the module’s variables, functions, and classes will be available to the caller through the module’s namespace.

In many languages, an include file directive is used by the preprocessor to take all code found in the file and ‘copy’ it into the caller’s code. It is different in Python: the included code is isolated in a module namespace, which means that you generally don’t have to worry that the included code could have unwanted effects, e.g. override an existing function with the same name.

Very bad

1
2
3
4
[...]
from modu import *
[...]
x = sqrt(4) # Is sqrt part of modu? A builtin? Defined above?

Better

1
2
3
from modu import sqrt
[...]
x = sqrt(4) # sqrt may be part of modu, if not redefined in between

Best

1
2
3
import modu
[...]
x = modu.sqrt(4) # sqrt is visibly part of modu's namespace

Packages

Any directory with an __init__.py file is considered a Python package. The different modules in the package are imported in a similar manner as plain modules, but with a special behavior for the __init__.py file, which is used to gather all package-wide definitions.

A file modu.py in the directory pack/ is imported with the statement import pack.modu. This statement will look for __init__.py file in pack and execute all of its top-level statements. Then it will look for a file named pack/modu.py and execute all of its top-level statements. After these operations, any variable, function, or class defined in modu.py is available in the pack.modu namespace.

When the project complexity grows, there may be sub-packages and sub-sub-packages in a deep directory structure. In this case, importing a single item from a sub-sub-package will require executing all __init__.py files met while traversing the tree.

Leaving an __init__.py file empty is considered normal and even good practice, if the package’s modules and sub-packages do not need to share any code.

Lastly, a convenient syntax is available for importing deeply nested packages: import very.deep.module as mod. This allows you to use mod in place of the verbose repetition of very.deep.module.

Decorators

A decorator is a function or a class that wraps (or decorates) a function or a method. The ‘decorated’ function or method will replace the original ‘undecorated’ function or method. Because functions are first-class objects in Python, this can be done ‘manually’, but using the @decorator syntax is clearer and thus preferred.

1
2
3
4
5
6
7
8
9
10
11
12
13
def foo():
# do something

def decorator(func):
# manipulate func
return func

foo = decorator(foo) # Manually decorate

@decorator
def bar():
# Do something
# bar() is decorated

References

  1. Structuring Your Project
  2. Python 项目工程化开发指南
  3. Building and Distributing Packages with Setuptools
  4. 花了两天,终于把 Python 的 setup.py 给整明白了 - 知乎