# Code Style

Reference: https://refactoring.guru/refactoring/smells

Please follow the best coding practices and the following guidelines when contributing to the project.

## Variable Naming

1. Follow language-specific naming conventions.
   1. In Python, use `snake_case` for variable and function names. Use `CamelCase` for class names.
   2. In C++ and JavaScript, use `camelCase` for variable and function names. Use `CamelCase` for class names.
2. Use meaningful names.
   - Avoid short, meaningless names like `a`, `b`, `c`, `x`, `y`, `z`, `foo`, `bar`, `baz`, `aa`, `ab`.
     1. Exception: `i`, `j`, `k` are OK as loop indices.
     2. Exception: `x`, `y` are OK for coordinates.
     3. Exception: Math equations are OK, e.g. `y = a * x + b`.
     4. It depends; make sure the name is meaningful. Watch this: https://youtu.be/-J3wNP6u5YU

## Type Hints

All functions and methods should have type hints (for both parameters and the return type). For example:

```python
# Bad
def add(a, b):
    return a + b

# Good
def add(a: int, b: int) -> int:
    return a + b

# It's OK to omit the return type if the function returns None
def log(msg: str):
    print(msg)
```

Use `flake8` to find all missing type hints:

```bash
# First make sure flake8 and flake8-annotations are installed
flake8 --select=ANN001,ANN201 --suppress-none-returning --count --statistics .
```

Our CI will fail if there are any missing type hints.

## Code Formatting

Format your code. In VSCode, you can use the Black formatter to format Python. PyCharm comes with a built-in formatter. The same applies to other languages.

## Docstrings

Please write docstrings for functions and methods unless the code is trivial enough. In VSCode, you can use the extension [autoDocstring - Python Docstring Generator](https://marketplace.visualstudio.com/items?itemName=njpwerner.autodocstring) to generate docstring templates.

## Write Structured Code

Good code should explain itself, even without comments. Avoid passing around strings, dicts, and `pd.DataFrame`; also avoid bare ints/floats (unless they are numeric data).
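As a quick illustration of the idea (all names here are hypothetical, not from the project):

```python
from dataclasses import dataclass

# Unstructured: the caller has to remember what each value means,
# and nothing stops them from swapping symbol and currency.
def notional(symbol: str, currency: str, qty: int, price: float) -> float:
    return qty * price

# Structured: a small dataclass bundles the related values and gives
# them names, types, and intellisense.
@dataclass
class Position:
    symbol: str
    currency: str
    qty: int
    price: float

    def notional(self) -> float:
        return self.qty * self.price
```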
Categorical data should be represented by an enum or a constant variable.

### Constant Variable and Enum

Never hard-code a string (or a number) in your code if it represents a category. Use a constant variable or an enum instead.

```python
# Bad
def __call__(self, time_interval: int) -> BarHistory:
    if time_interval == 5:
        pass
    ...

# Good
class TimeInterval(Enum):
    FIVE_MINUTE = 5
    FIFTEEN_MINUTE = 15

def __call__(self, time_interval: TimeInterval) -> BarHistory:
    if time_interval == TimeInterval.FIVE_MINUTE:
        resample_period = "5T"  # OK
    elif time_interval == TimeInterval.FIFTEEN_MINUTE:
        resample_period = "15T"  # OK
    ...
```

We only have a finite number of time intervals; they are categories. If we pass `5` around, we don't know what it means. In another function, when we see a `time_interval` variable of type int, we don't know what it means (5 minutes? hours? seconds?). But if we pass `TimeInterval.FIVE_MINUTE` around, we know it's a time interval of 5 minutes. We get both the semantic meaning and auto-completion (intellisense).

### Careful with Pandas DataFrame

Do not pass `pd.DataFrame` around. We don't know the columns and types of a data frame. It's OK to create and pass a `DataFrame` around within a function, because the author knows the context very well and it won't affect another function; but when a data frame is passed to another function, the code is very hard for another programmer to understand without running it. Instead, use `dataclasses` or `pydantic` to define data structures.

Read every Python file in the [quantlib/definition] folder to learn how to define data structures. For example, [quantlib/definition/bar.py](../../quantlib/definition/bar.py) contains a `BarHistory` class. It is based on a `pydantic` model. It represents a list of bars and the corresponding pandas data frame. The data frame can be accessed by `bar_history.df`.
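A rough, dependency-free sketch of the pattern (the real `BarHistory` is a pydantic model whose columns are pandas `Series`; plain lists stand in here, and every name besides those mentioned above is illustrative only):

```python
from dataclasses import dataclass

@dataclass
class BarHistorySketch:
    """Typed access to bar columns, instead of a raw data frame."""
    time: list[str]
    open: list[float]
    high: list[float]
    low: list[float]
    close: list[float]

    def bar_range(self, i: int) -> float:
        """High-low range of bar i; typed fields make this self-documenting."""
        return self.high[i] - self.low[i]
```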
Columns can be accessed with:

- `bar_history.time`
- `bar_history.open`
- `bar_history.close`
- `bar_history.high`
- `bar_history.low`

They are all `pd.Series` objects.

The benefit of this is that, with a `bar_history` object, we know exactly the columns and types of the data frame with intellisense, without reading the context. This prevents developers from messing with data frames. If a raw data frame is used, developers don't know what's inside without reading the code or even running the code. I sometimes have to use a debugger to see what's inside a data frame.

A pandas data frame can be modified easily, and it's hard to track the changes. If someone renamed a column in one function and passed the data frame to another function, the other function becomes very hard to maintain. Writing `history['High']` while the column is actually `high` could result in a serious and hard-to-debug bug. When we hard-code column names as strings, it's also easier to make typos: e.g. spelling `business_day` as `busines_day` could cost time to debug, as the two look similar at first glance. For a data structure that will be used throughout the entire workflow, it's crucial to keep the data structure clean and consistent.

### Careful with Dict

Also see [quantlib/definition/config.py](../../quantlib/definition/config.py) for how a `dataclass` can be used in a similar way. Instead of passing a `dict` around, we can pass a `Config` object around. It's much easier to understand the code. A dict has no intellisense (auto-complete), but a `Config` object has.

If I give you the following code, the only way to know what's inside `config` is to read the code, run the code, print out the `config` dict, or use a debugger.

```python
def run_app(config):
    pass
```

Now, if I tell you `config` is a `dict`, how do you know what's inside `config`? What are the keys? What are the values? What are the types?
```python
def run_app(config: dict):
    pass
```

Even if I give you the key and value types of the `config` dict, you still don't know what's inside. You have to read the code to understand. Plus, the values can have different types, resulting in a union type.

```python
def run_app(config: dict[str, int | str]):
    pass
```

But if I give you the following code, you know exactly what's inside `config` without reading the code. You can use intellisense to see what's inside `config`; if `class Config` is defined in another file, the IDE lets you jump to the definition of the `Config` class in one click.

```python
from dataclasses import dataclass

@dataclass
class Config:
    time_interval: int
    start_date: str
    end_date: str

def run_app(config: Config):
    pass
```

## Sample Code

The following code roughly demonstrates the idea of writing structured code.

- For a line, instead of passing a slope or a dict around, we encapsulate the 2 associated bars, the mode, the slope, and the intercept into a `Line` object. We can pass the `Line` object around, and we can add more attributes to it if needed.
- `Line` is a `dataclass`; it can contain not only data but also methods, e.g. `line.interpolate(xs)`.
- When representing the mode, instead of using a string or a number, we use the `LineMode` enum. We can add more modes in the future. Intellisense will provide auto-completion.
- For `BarHistory`: we know for sure it has index, high, and low columns from IDE intellisense, without needing to worry about column names or letter case.
- The docstring helps others understand the code without reading its logic. Sometimes variable names are vague, but the docstring can help.

```python
def line_wrap(history: BarHistory, line: Line, search_range: int, cur_bar_idx: int) -> bool:
    """Check if the line wraps other bars within the search range in the given bar history.
    :param history: Bar history object representing the price history
    :type history: BarHistory
    :param line: A line consisting of two bars, with slope, intercept and mode
    :type line: Line
    :param search_range: Distance to search
    :type search_range: int
    :param cur_bar_idx: Index of the current bar
    :type cur_bar_idx: int
    :raises ValueError: Invalid line mode
    :return: Whether the line wraps other bars within the search range
    :rtype: bool
    """
    start_idx = cur_bar_idx - search_range
    xs = np.array(history.index)[start_idx:cur_bar_idx]
    ys = line.interpolate(xs)
    if line.mode == LineMode.RESISTANCE:
        wrap = np.all(history.high[start_idx:cur_bar_idx] <= ys)
    elif line.mode == LineMode.SUPPORT:
        wrap = np.all(history.low[start_idx:cur_bar_idx] >= ys)
    else:
        raise ValueError("Invalid Line Mode", line.mode)
    return wrap
```
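The sample above assumes `Line` and `LineMode` definitions roughly along these lines. This is only a sketch: the real `Line` also carries its two associated bars, and `interpolate` presumably operates on numpy arrays; a plain list keeps the sketch self-contained.

```python
from dataclasses import dataclass
from enum import Enum, auto

class LineMode(Enum):
    """Whether the line acts as resistance (above) or support (below)."""
    RESISTANCE = auto()
    SUPPORT = auto()

@dataclass
class Line:
    slope: float
    intercept: float
    mode: LineMode

    def interpolate(self, xs: list[float]) -> list[float]:
        """Evaluate y = slope * x + intercept at each x."""
        return [self.slope * x + self.intercept for x in xs]
```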