Code Style
Reference: https://refactoring.guru/refactoring/smells
Please follow best coding practices and the following guidelines when contributing to the project.
Variable Naming
Follow language specific naming conventions.
In Python, use snake_case for variable names and function names. Use CamelCase for class names.
In C++ and JavaScript, use camelCase for variable names and function names. Use CamelCase for class names.
Use meaningful names.
Avoid short and meaningless names like a, b, c, x, y, z, foo, bar, baz, aa, ab.
Exception: i, j, k are OK for loop indices.
Exception: x, y are OK for coordinates.
Exception: math equations are OK, e.g. y = a * x + b.
It depends; make sure the name is meaningful.
Watch this video: https://youtu.be/-J3wNP6u5YU
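For example (the names are hypothetical, purely for illustration):
# Bad: the names say nothing about intent
def f(a, b):
    return a * b

# Good: the names explain themselves
def total_price(unit_price: float, quantity: int) -> float:
    return unit_price * quantity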
Type Hints
All functions and methods should have type hints (for both parameters and the return type).
For example:
# Bad
def add(a, b):
    return a + b

# Good
def add(a: int, b: int) -> int:
    return a + b

# it's OK to omit the return type if it returns None
def log(msg: str):
    print(msg)
Use flake8 to find all missing type hints:
# first make sure flake8 and flake8-annotations are installed
flake8 --select=ANN001,ANN201 --suppress-none-returning --count --statistics .
Our CI will fail if there are any missing type hints.
Code Formatting
Format your code. In VSCode, you can use the Black formatter to format Python; PyCharm comes with a built-in formatter. The same applies to other languages.
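For example, assuming Black is installed, it can also be run from the command line:
# format the project in place
black .
# or only check formatting without changing files
black --check .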
Docstrings
Please write docstrings for functions and methods unless the code is trivial.
In VSCode, you can use the autoDocstring - Python Docstring Generator extension to generate docstring templates.
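For example, a short docstring in the same reST style used by the sample code at the end of this guide (the function is hypothetical, purely for illustration):
def downsample(prices: list[float], factor: int) -> list[float]:
    """Keep every factor-th element of a price series.

    :param prices: Original price series
    :param factor: Keep one element out of every factor elements
    :return: The downsampled price series
    """
    return prices[::factor]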
Write Structured Code
Good code should explain itself, even without comments.
Avoid passing around raw strings, dicts, and pd.DataFrame objects; also avoid raw int/float (unless they represent numeric data). Categorical data should be represented by an enum or a constant.
Constant Variables and Enums
Never hard-code a string (or number) in your code if it represents a category. Use a constant variable or an enum instead.
# Bad
def __call__(self, time_interval: int) -> BarHistory:
    if time_interval == 5:
        pass
    ...

# Good
from enum import Enum

class TimeInterval(Enum):
    FIVE_MINUTE = 5
    FIFTEEN_MINUTE = 15

def __call__(self, time_interval: TimeInterval) -> BarHistory:
    if time_interval == TimeInterval.FIVE_MINUTE:
        resample_period = "5T"  # OK
    elif time_interval == TimeInterval.FIFTEEN_MINUTE:
        resample_period = "15T"  # OK
    ...
We only have a finite number of time intervals; they are categories. If we pass 5 around, we don't know what it means. In another function, when we see a time_interval variable of type int, we don't know what it means (5 minutes? Hours? Seconds?). But if we pass TimeInterval.FIVE_MINUTE around, we know it's a time interval of 5 minutes. We get both the semantic meaning and auto-completion (IntelliSense).
Careful with Pandas DataFrame
Do not pass pd.DataFrame around. We don't know the columns and types of a data frame. It's OK to create and use a DataFrame within a function, because the author knows the context well and it won't affect other functions; but when a data frame is passed to another function, it's very hard for another programmer to understand the code without running it.
Instead, use dataclasses or pydantic to define data structures.
Read every Python file in the quantlib/definition folder to learn how to define data structures.
For example, quantlib/definition/bar.py contains a BarHistory class. It is based on a pydantic model. It represents a list of bars and the corresponding pandas data frame. The data frame can be accessed via bar_history.df.
Columns can be accessed with bar_history.time, bar_history.open, bar_history.close, bar_history.high, and bar_history.low. They are all pd.Series objects.
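As a rough sketch of the pattern (the real quantlib/definition/bar.py may differ in field names and details), a BarHistory-like pydantic model could expose its columns as typed properties:
import pandas as pd
from pydantic import BaseModel

class Bar(BaseModel):
    time: str
    open: float
    high: float
    low: float
    close: float

class BarHistory(BaseModel):
    bars: list[Bar]

    @property
    def df(self) -> pd.DataFrame:
        # model_dump() assumes pydantic v2; on v1 use .dict()
        return pd.DataFrame([bar.model_dump() for bar in self.bars])

    @property
    def high(self) -> pd.Series:
        return self.df["high"]

    @property
    def low(self) -> pd.Series:
        return self.df["low"]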
The benefit is that, with a bar_history object, we know exactly the columns and types of the data frame from IntelliSense, without reading the surrounding context. This prevents developers from messing with data frames.
If a plain data frame is used, developers don't know what's inside without reading or even running the code. I sometimes have to use a debugger to see what's inside a data frame. A pandas data frame can be modified easily, and it's hard to track the changes. If someone renames a column in one function and passes the data frame to another function, the other function becomes very hard to maintain.
Writing history['High'] while the column is actually named high could result in a serious and hard-to-debug bug. Hard-coding column names as strings also makes typos easier: spelling business_day as busines_day could cost time to debug, because the two look similar at first glance.
For a data structure that will be used throughout the entire workflow, it’s crucial to keep the data structure clean and consistent.
Careful with Dict
Also see quantlib/definition/config.py for how a dataclass can be used in a similar way. Instead of passing a dict around, we can pass a Config object around, which makes the code much easier to understand. A dict has no IntelliSense (auto-complete), but a Config object does.
If I give you the following code, the only way to know what’s inside config is to read the code, run the code, print out the config dict, or use a debugger.
def run_app(config):
    pass
Now, if I tell you config is a dict, how do you know what's inside? What are the keys? What are the values? What are the types?
def run_app(config: dict):
    pass
Even if I give you the key and value types of the config dict, you still don't know what's inside. You have to read the code to understand it. Plus, the values can have different types, resulting in a union type.
def run_app(config: dict[str, int | str]):
    pass
But if I give you the following code, you know exactly what's inside config without reading the code. You can use IntelliSense to see what's inside config; if the Config class is defined in another file, the IDE lets you jump to its definition in one click.
from dataclasses import dataclass

@dataclass
class Config:
    time_interval: int
    start_date: str
    end_date: str

def run_app(config: Config):
    pass
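A hypothetical call site, just to show the difference: every field is explicit, and the IDE can autocomplete the field names.
config = Config(time_interval=5, start_date="2021-01-01", end_date="2021-12-31")
run_app(config)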
Sample Code
The following code roughly demonstrates the idea of writing structured code.
For a line, instead of passing a slope or a dict around, we encapsulate the two associated bars, mode, slope, and intercept into a Line object. We can pass the Line object around, and we can add more attributes to it if needed. Line is a dataclass; it can contain not only data but also methods, e.g. line.interpolate(xs).
When representing the mode, instead of using a string or a number, we use the LineMode enum. We can add more modes in the future, and IntelliSense will provide auto-completion.
For BarHistory: we know for sure it has index, high, and low columns from IDE IntelliSense, without worrying about column names or letter case.
Docstrings help others understand the code without reading its logic. Sometimes variable names are vague, but a docstring can fill the gap.
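Below is a minimal sketch of what Line and LineMode might look like (the actual definitions in the project may differ; the two associated bars are represented here by hypothetical indices), so that the sample function that follows is easier to read:
from dataclasses import dataclass
from enum import Enum

import numpy as np

class LineMode(Enum):
    SUPPORT = "support"
    RESISTANCE = "resistance"

@dataclass
class Line:
    start_bar_idx: int  # hypothetical representation of the two associated bars
    end_bar_idx: int
    mode: LineMode
    slope: float
    intercept: float

    def interpolate(self, xs: np.ndarray) -> np.ndarray:
        # evaluate y = slope * x + intercept at the given x values
        return self.slope * xs + self.intercept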
# assumes numpy is imported as np, and BarHistory, Line, LineMode come from the project's definition modules
def line_wrap(history: BarHistory, line: Line, search_range: int, cur_bar_idx: int) -> bool:
    """Check if the line wraps other bars within the search range in the given bar history.

    :param history: Bar history object representing the price history
    :type history: BarHistory
    :param line: A line consisting of two bars, with slope, intercept and mode
    :type line: Line
    :param search_range: Distance to search
    :type search_range: int
    :param cur_bar_idx: Index of the current bar
    :type cur_bar_idx: int
    :raises ValueError: Invalid line mode
    :return: Whether the line wraps other bars within the search range
    :rtype: bool
    """
    start_idx = cur_bar_idx - search_range
    xs = np.array(history.index)[start_idx:cur_bar_idx]
    ys = line.interpolate(xs)
    if line.mode == LineMode.RESISTANCE:
        wrap = np.all(history.high[start_idx:cur_bar_idx] <= ys)
    elif line.mode == LineMode.SUPPORT:
        wrap = np.all(history.low[start_idx:cur_bar_idx] >= ys)
    else:
        raise ValueError("Invalid Line Mode", line.mode)
    return bool(wrap)  # np.all returns np.bool_; cast to match the declared return type