# Code Style

Reference: https://refactoring.guru/refactoring/smells

Please follow the best coding practices and the following guidelines when contributing to the project.

## Variable Naming

1. Follow language-specific naming conventions.
   1. In Python, use `snake_case` for variable and function names. Use `CamelCase` for class names.
   2. In C++ and JavaScript, use `camelCase` for variable and function names. Use `CamelCase` for class names.
2. Use meaningful names.
   - Avoid short, meaningless names like `a`, `b`, `c`, `x`, `y`, `z`, `foo`, `bar`, `baz`, `aa`, `ab`.
     1. Exception: `i`, `j`, `k` are OK as loop indices.
     2. Exception: `x`, `y` are OK for coordinates.
     3. Exception: Math equations are OK, e.g. `y = a * x + b`.
     4. It depends; make sure the name is meaningful. Watch this: https://youtu.be/-J3wNP6u5YU

## Type Hints

All functions and methods should have type hints (for both parameters and the return type). For example:

```python
# Bad
def add(a, b):
    return a + b

# Good
def add(a: int, b: int) -> int:
    return a + b

# It's OK to omit the return type if the function returns None
def log(msg: str):
    print(msg)
```

Use `flake8` to find all missing type hints:

```bash
# First make sure flake8 and flake8-annotations are installed
flake8 --select=ANN001,ANN201 --suppress-none-returning --count --statistics .
```

Our CI will fail if there are any missing type hints.

## Code Formatting

Format your code. In VSCode, you can use the Black formatter to format Python. PyCharm comes with a built-in formatter. The same applies to other languages.

## Docstrings

Please write docstrings for functions and methods unless the code is trivial enough. In VSCode, you can use the extension [autoDocstring - Python Docstring Generator](https://marketplace.visualstudio.com/items?itemName=njpwerner.autodocstring) to generate docstring templates.

## Write Structured Code

Good code should explain itself, even without comments. Avoid passing around strings, dicts, and `pd.DataFrame`; also avoid bare ints/floats (unless they are numeric data).
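As a quick illustration of the idea (all names here are hypothetical, not from the project):

```python
from dataclasses import dataclass

# Unstructured: the caller has to remember what each value means,
# and nothing stops them from swapping symbol and currency.
def notional(symbol: str, currency: str, qty: int, price: float) -> float:
    return qty * price

# Structured: a small dataclass bundles the related values and gives
# them names, types, and intellisense.
@dataclass
class Position:
    symbol: str
    currency: str
    qty: int
    price: float

    def notional(self) -> float:
        return self.qty * self.price
```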
Categorical data should be represented by an enum or a constant variable.

### Constant Variable and Enum

Never hard-code a string (or a number) in your code if it represents a category. Use a constant variable or an enum instead.

```python
# Bad
def __call__(self, time_interval: int) -> BarHistory:
    if time_interval == 5:
        pass
    ...

# Good
class TimeInterval(Enum):
    FIVE_MINUTE = 5
    FIFTEEN_MINUTE = 15

def __call__(self, time_interval: TimeInterval) -> BarHistory:
    if time_interval == TimeInterval.FIVE_MINUTE:
        resample_period = "5T"  # OK
    elif time_interval == TimeInterval.FIFTEEN_MINUTE:
        resample_period = "15T"  # OK
    ...
```

We only have a finite number of time intervals; they are categories. If we pass `5` around, we don't know what it means. In another function, when we see a `time_interval` variable of type int, we don't know what it means (5 minutes? hours? seconds?). But if we pass `TimeInterval.FIVE_MINUTE` around, we know it's a time interval of 5 minutes. We get both the semantic meaning and auto-completion (intellisense).

### Careful with Pandas DataFrame

Do not pass `pd.DataFrame` around. We don't know the columns and types of a data frame. It's OK to create and pass a `DataFrame` around within a function, because the author knows the context very well and it won't affect another function; but when a data frame is passed to another function, the code is very hard for another programmer to understand without running it. Instead, use `dataclasses` or `pydantic` to define data structures.

Read every Python file in the [quantlib/definition] folder to learn how to define data structures. For example, [quantlib/definition/bar.py](../../quantlib/definition/bar.py) contains a `BarHistory` class. It is based on a `pydantic` model. It represents a list of bars and the corresponding pandas data frame. The data frame can be accessed by `bar_history.df`.
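A rough, dependency-free sketch of the pattern (the real `BarHistory` is a pydantic model whose columns are pandas `Series`; plain lists stand in here, and every name besides those mentioned above is illustrative only):

```python
from dataclasses import dataclass

@dataclass
class BarHistorySketch:
    """Typed access to bar columns, instead of a raw data frame."""
    time: list[str]
    open: list[float]
    high: list[float]
    low: list[float]
    close: list[float]

    def bar_range(self, i: int) -> float:
        """High-low range of bar i; typed fields make this self-documenting."""
        return self.high[i] - self.low[i]
```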
Columns can be accessed with:

- `bar_history.time`
- `bar_history.open`
- `bar_history.close`
- `bar_history.high`
- `bar_history.low`

They are all `pd.Series` objects.

The benefit of this is that, with a `bar_history` object, we know exactly the columns and types of the data frame with intellisense, without reading the context. This prevents developers from messing with data frames. If a raw data frame is used, developers don't know what's inside without reading the code or even running the code. I sometimes have to use a debugger to see what's inside a data frame.

A pandas data frame can be modified easily, and it's hard to track the changes. If someone renamed a column in one function and passed the data frame to another function, the other function becomes very hard to maintain. Writing `history['High']` while the column is actually `high` could result in a serious and hard-to-debug bug. When we hard-code column names as strings, it's also easier to make typos: e.g. spelling `business_day` as `busines_day` could cost time to debug, as the two look similar at first glance. For a data structure that will be used throughout the entire workflow, it's crucial to keep the data structure clean and consistent.

### Careful with Dict

Also see [quantlib/definition/config.py](../../quantlib/definition/config.py) for how a `dataclass` can be used in a similar way. Instead of passing a `dict` around, we can pass a `Config` object around. It's much easier to understand the code. A dict has no intellisense (auto-complete), but a `Config` object has.

If I give you the following code, the only way to know what's inside `config` is to read the code, run the code, print out the `config` dict, or use a debugger.

```python
def run_app(config):
    pass
```

Now, if I tell you `config` is a `dict`, how do you know what's inside `config`? What are the keys? What are the values? What are the types?
```python
def run_app(config: dict):
    pass
```

Even if I give you the key and value types of the `config` dict, you still don't know what's inside. You have to read the code to understand. Plus, the values can have different types, resulting in a union type.

```python
def run_app(config: dict[str, int | str]):
    pass
```

But if I give you the following code, you know exactly what's inside `config` without reading the code. You can use intellisense to see what's inside `config`; if `class Config` is defined in another file, the IDE lets you jump to the definition of the `Config` class in one click.

```python
from dataclasses import dataclass

@dataclass
class Config:
    time_interval: int
    start_date: str
    end_date: str

def run_app(config: Config):
    pass
```

## Sample Code

The following code roughly demonstrates the idea of writing structured code.

- For a line, instead of passing a slope or a dict around, we encapsulate the 2 associated bars, the mode, the slope, and the intercept into a `Line` object. We can pass the `Line` object around, and we can add more attributes to it if needed.
- `Line` is a `dataclass`; it can contain not only data but also methods, e.g. `line.interpolate(xs)`.
- When representing the mode, instead of using a string or a number, we use the `LineMode` enum. We can add more modes in the future. Intellisense will provide auto-completion.
- For `BarHistory`: we know for sure it has index, high, and low columns from IDE intellisense, without needing to worry about column names or letter case.
- The docstring helps others understand the code without reading its logic. Sometimes variable names are vague, but the docstring can help.

```python
def line_wrap(history: BarHistory, line: Line, search_range: int, cur_bar_idx: int) -> bool:
    """Check if the line wraps other bars within the search range in the given bar history.
    :param history: Bar history object representing the price history
    :type history: BarHistory
    :param line: A line consisting of two bars, with slope, intercept and mode
    :type line: Line
    :param search_range: Distance to search
    :type search_range: int
    :param cur_bar_idx: Index of the current bar
    :type cur_bar_idx: int
    :raises ValueError: Invalid line mode
    :return: Whether the line wraps other bars within the search range
    :rtype: bool
    """
    start_idx = cur_bar_idx - search_range
    xs = np.array(history.index)[start_idx:cur_bar_idx]
    ys = line.interpolate(xs)
    if line.mode == LineMode.RESISTANCE:
        wrap = np.all(history.high[start_idx:cur_bar_idx] <= ys)
    elif line.mode == LineMode.SUPPORT:
        wrap = np.all(history.low[start_idx:cur_bar_idx] >= ys)
    else:
        raise ValueError("Invalid Line Mode", line.mode)
    return wrap
```
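The sample above assumes `Line` and `LineMode` definitions roughly along these lines. This is only a sketch: the real `Line` also carries its two associated bars, and `interpolate` presumably operates on numpy arrays; a plain list keeps the sketch self-contained.

```python
from dataclasses import dataclass
from enum import Enum, auto

class LineMode(Enum):
    """Whether the line acts as resistance (above) or support (below)."""
    RESISTANCE = auto()
    SUPPORT = auto()

@dataclass
class Line:
    slope: float
    intercept: float
    mode: LineMode

    def interpolate(self, xs: list[float]) -> list[float]:
        """Evaluate y = slope * x + intercept at each x."""
        return [self.slope * x + self.intercept for x in xs]
```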