Code Style

Reference: https://refactoring.guru/refactoring/smells

Please follow the best coding practices and the following guidelines when contributing to the project.

Variable Naming

  1. Follow language specific naming conventions.

    1. In Python, use snake_case for variable and function names, and PascalCase (upper CamelCase) for class names.

    2. In C++ and JavaScript, use camelCase for variable and function names, and PascalCase (upper CamelCase) for class names.

  2. Use meaningful names.

    • Avoid short, meaningless names such as a, b, c, x, y, z, foo, bar, baz, aa, ab.

    1. Exception: i, j, k are OK as loop indices.

    2. Exception: x, y are OK for coordinates.

    3. Exception: Single letters are OK in math equations, e.g. y = a * x + b.

    4. Otherwise use your judgment: whatever the name is, make sure it’s meaningful in its context.
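
As a rough illustration of the naming rules above, the two functions below compute the same thing. The function and parameter names are made up for this example.

```python
# Bad: the names say nothing about what the values mean
def f(a: list[float], b: int) -> float:
    return sum(a[-b:]) / b


# Good: the names carry the meaning, so no comment is needed
def moving_average(close_prices: list[float], window_size: int) -> float:
    recent_prices = close_prices[-window_size:]
    return sum(recent_prices) / window_size
```

Both versions behave identically; only the second one can be understood from its signature alone.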

Watch this https://youtu.be/-J3wNP6u5YU

Type Hints

All functions and methods should have type hints (for both parameters and the return type).

For example

# Bad
def add(a, b):
    return a + b

# Good
def add(a: int, b: int) -> int:
    return a + b

# it's OK to omit return type if it returns None
def log(msg: str):
    print(msg)

Use flake8 to find all missing type hints

# first make sure flake8 and flake8-annotations are installed
flake8 --select=ANN001,ANN201 --suppress-none-returning --count --statistics .

Our CI will fail if there are any missing type hints.

Code Formatting

Format your code with an automatic formatter. In VSCode, you can use the Black formatter for Python; PyCharm comes with a built-in formatter. Use an equivalent formatter for other languages.

Docstrings

Please write docstrings for functions and methods unless the code is trivial.

In VSCode, you can use the extension autoDocstring - Python Docstring Generator to generate docstring templates.

Write Structured Code

Good code should explain itself, even without comments.

Avoid passing around bare strings, dicts, and pd.DataFrame objects, as well as int/float values unless they are genuinely numeric data. Categorical data should be represented by an enum or a constant variable.

Constant Variable and Enum

Never hard-code a string or a number in your code if it represents a category. Use a constant variable or an enum instead.

# Bad
def __call__(self, time_interval: int) -> BarHistory:
    if time_interval == 5:
        pass
    ...

# Good
from enum import Enum

class TimeInterval(Enum):
    FIVE_MINUTE = 5
    FIFTEEN_MINUTE = 15

def __call__(self, time_interval: TimeInterval) -> BarHistory:
    if time_interval == TimeInterval.FIVE_MINUTE:
        resample_period = "5T"  # OK
    elif time_interval == TimeInterval.FIFTEEN_MINUTE:
        resample_period = "15T"  # OK
    ...

There are only a finite number of time intervals, so they are categories. If we pass a bare 5 around, a reader who sees a time_interval variable of type int in another function cannot tell what it means (5 minutes? hours? seconds?). If we pass TimeInterval.FIVE_MINUTE around instead, it is clear that it is a time interval of 5 minutes. We get both the semantic meaning and auto-completion (intellisense).
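
An enum also rejects values that are not valid categories, which a bare int cannot do. The following is a minimal sketch; parse_time_interval is a hypothetical helper that is not part of the project.

```python
from enum import Enum


class TimeInterval(Enum):
    FIVE_MINUTE = 5
    FIFTEEN_MINUTE = 15


def parse_time_interval(raw_minutes: int) -> TimeInterval:
    """Convert a raw number (e.g. from user input) into a TimeInterval."""
    # Looking up an Enum by value raises ValueError for anything that is not a member
    return TimeInterval(raw_minutes)


interval = parse_time_interval(5)
print(interval.name, interval.value)  # FIVE_MINUTE 5

try:
    parse_time_interval(7)
except ValueError as error:
    # 7 is not a defined interval, so it is rejected at the boundary
    print(f"rejected: {error}")
```

Converting raw values once, at the edge of the program, means every other function can rely on receiving a valid TimeInterval.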

Careful with Pandas DataFrame

Do not pass pd.DataFrame objects between functions, because the receiver cannot tell which columns and types a data frame has. It’s OK to create and use a DataFrame within a single function: the author knows the context very well and no other function is affected. But once a data frame is passed to another function, it’s very hard for another programmer to understand the code without running it.

Instead, use dataclasses or pydantic to define data structures.

Read every Python file in the quantlib/definition folder to learn how to define data structures.

For example, quantlib/definition/bar.py contains a BarHistory class based on a pydantic model. It represents a list of bars together with the corresponding pandas data frame, which can be accessed via bar_history.df.

Columns can be accessed with

  • bar_history.time

  • bar_history.open

  • bar_history.close

  • bar_history.high

  • bar_history.low

They are all pd.Series objects.

The benefit is that, given a bar_history object, intellisense tells us exactly which columns and types the data frame has, without reading the surrounding code. This prevents developers from messing with data frames.
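
To make the idea concrete without reproducing quantlib/definition/bar.py, here is a minimal sketch. It is not the actual BarHistory: it uses a standard-library dataclass and plain lists instead of pydantic and pandas, the names Bar and BarHistorySketch are hypothetical, and the prices are made-up sample values.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Bar:
    """One price bar; the field names are the only 'column names' in the code."""
    time: datetime
    open: float
    high: float
    low: float
    close: float


@dataclass
class BarHistorySketch:
    """Hypothetical stand-in for BarHistory, backed by a list instead of a DataFrame."""
    bars: list[Bar]

    @property
    def high(self) -> list[float]:
        return [bar.high for bar in self.bars]

    @property
    def low(self) -> list[float]:
        return [bar.low for bar in self.bars]


history = BarHistorySketch(
    bars=[
        Bar(datetime(2024, 1, 2, 9, 30), open=10.0, high=11.0, low=9.5, close=10.5),
        Bar(datetime(2024, 1, 2, 9, 35), open=10.5, high=12.0, low=10.0, close=11.5),
    ]
)
print(max(history.high))  # 12.0
```

Because the columns are declared attributes, a misspelling such as history.High is flagged by the IDE or a type checker and raises AttributeError at runtime.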

If a bare data frame is used, developers don’t know what’s inside without reading or even running the code. I sometimes have to use a debugger to see what’s inside a data frame. A pandas data frame can also be modified easily, and the changes are hard to track. If one function renames a column and then passes the data frame to another function, that other function becomes very hard to maintain.

Writing history['High'] when the column is actually named high can result in a serious and hard-to-debug bug. Hard-coding column names as strings also makes typos more likely; e.g. misspelling business_day as busines_day can cost debugging time because the two look alike at first glance.

For a data structure that will be used throughout the entire workflow, it’s crucial to keep the data structure clean and consistent.

Careful with Dict

Also see quantlib/definition/config.py for how a dataclass can be used in a similar way. Instead of passing a dict around, we can pass a Config object around, which makes the code much easier to understand. A dict has no intellisense (auto-complete), but a Config object does.

If I give you the following code, the only way to know what’s inside config is to read the code, run the code, print out the config dict, or use a debugger.

def run_app(config):
    pass

Now, if I tell you config is a dict, how do you know what’s inside config? What are the keys? What are the values? What are the types?

def run_app(config: dict):
    pass

Even if I give you the key and value types of the config dict, you still don’t know what’s inside. You have to read the code to find out. Plus, the values can have different types, resulting in a union type.

def run_app(config: dict[str, int | str]):
    pass

But if I give you the following code, you know exactly what’s inside config without reading the rest of the code. You can use intellisense to see what’s inside config; if the Config class is defined in another file, the IDE lets you jump to its definition in one click.

from dataclasses import dataclass

@dataclass
class Config:
    time_interval: int
    start_date: str
    end_date: str

def run_app(config: Config):
    pass
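
The same idea can be combined with the enum guideline above. The sketch below is an assumption made for illustration, not the contents of quantlib/definition/config.py: it types time_interval as the TimeInterval enum and the dates as datetime.date, and the body of run_app is invented only to show the typed fields in use.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class TimeInterval(Enum):
    FIVE_MINUTE = 5
    FIFTEEN_MINUTE = 15


@dataclass(frozen=True)
class Config:
    time_interval: TimeInterval
    start_date: date
    end_date: date

    def __post_init__(self) -> None:
        # Fail early on an impossible date range instead of deep inside the app
        if self.end_date < self.start_date:
            raise ValueError("end_date must not be earlier than start_date")


def run_app(config: Config) -> int:
    """Return the number of calendar days covered by the config (illustrative only)."""
    return (config.end_date - config.start_date).days + 1


config = Config(TimeInterval.FIVE_MINUTE, date(2024, 1, 1), date(2024, 1, 31))
print(run_app(config))  # 31
```

With typed fields, a reader no longer has to guess whether start_date is "2024-01-01", "01/01/2024", or something else, and invalid combinations are rejected when the Config is created.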

Sample Code

The following code roughly demonstrates how to write structured code.

  • For line, instead of passing a slope or dict around, we encapsulate the 2 associated bars, mode, slope and intercept into a Line object. We can pass the Line object around. We can also add more attributes to the Line object if needed.

    • Line is a dataclass, so it can contain not only data but also methods, e.g. line.interpolate(xs)

  • When representing mode, instead of using a string or number, we use the LineMode enum. We can add more modes in the future. Intellisense will provide auto-completion.

  • For BarHistory: we know for sure it has index, high and low columns from IDE intellisense, without needing to worry about the column names or letter case.

  • A docstring helps others understand the code without reading its logic. Sometimes variable names are vague, and the docstring fills the gap.

import numpy as np

def line_wrap(history: BarHistory, line: Line, search_range: int, cur_bar_idx: int) -> bool:
    """Check if the line wraps other bars within the search range in the given bar history.

    :param history: Bar History object representing the price history
    :type history: BarHistory
    :param line: A line consists of two bars, with slope, intercept and mode
    :type line: Line
    :param search_range: distance to search
    :type search_range: int
    :param cur_bar_idx: Index of current bar
    :type cur_bar_idx: int
    :raises ValueError: Invalid Line Mode
    :return: Whether the line wraps other bars within the search range
    :rtype: bool
    """
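    # Clamp at 0 so a negative start index does not wrap around to the end of the history
    start_idx = max(cur_bar_idx - search_range, 0)
    xs = np.array(history.index)[start_idx:cur_bar_idx]
    ys = line.interpolate(xs)
    if line.mode == LineMode.RESISTANCE:
        # A resistance line wraps the bars when every high stays on or below it
        wrap = np.all(history.high[start_idx:cur_bar_idx] <= ys)
    elif line.mode == LineMode.SUPPORT:
        # A support line wraps the bars when every low stays on or above it
        wrap = np.all(history.low[start_idx:cur_bar_idx] >= ys)
    else:
        raise ValueError(f"Invalid Line Mode: {line.mode}")
    # np.all returns np.bool_; convert it to match the annotated return type
    return bool(wrap)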
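
The function above relies on Line and LineMode, which are not shown in this section. The following is a minimal sketch of what they could look like. The field names, the from_points helper, the use of plain floats instead of bar objects, and the list-based interpolate are all assumptions made for illustration, not the project's definitions.

```python
from dataclasses import dataclass
from enum import Enum


class LineMode(Enum):
    SUPPORT = "support"
    RESISTANCE = "resistance"


@dataclass(frozen=True)
class Line:
    """Hypothetical Line: the data and the behaviour that belongs to it, in one place."""
    slope: float
    intercept: float
    mode: LineMode

    @classmethod
    def from_points(cls, x1: float, y1: float, x2: float, y2: float, mode: LineMode) -> "Line":
        # Derive slope and intercept from two points (e.g. the index and price of two bars)
        slope = (y2 - y1) / (x2 - x1)
        return cls(slope=slope, intercept=y1 - slope * x1, mode=mode)

    def interpolate(self, xs: list[float]) -> list[float]:
        # Evaluate y = slope * x + intercept at each x
        return [self.slope * x + self.intercept for x in xs]


line = Line.from_points(0, 10.0, 4, 12.0, LineMode.SUPPORT)
print(line.interpolate([0, 2, 4]))  # [10.0, 11.0, 12.0]
```

Keeping interpolate on the Line itself means callers never have to pass slope and intercept around separately or remember which formula goes with which mode.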