feat: User-Interface I18n System (#728)

## Summary **What is the goal of this PR?** This PR introduces Internationalization (i18n) support, enabling users to switch the UI language dynamically. **What changes are included?** - Core Logic: Added I18n class (`lib/I18n/I18n.h/cpp`) to manage language state and string retrieval. - Data Structures: - `lib/I18n/I18nStrings.h/cpp`: Static string arrays for each supported language. - `lib/I18n/I18nKeys.h`: Enum definitions for type-safe string access. - `lib/I18n/translations.csv`: single source of truth. - Documentation: Added `docs/i18n.md` detailing the workflow for developers and translators. - New Settings activity: `src/activities/settings/LanguageSelectActivity.h/cpp` ## Additional Context This implementation (building on concepts from #505) prioritizes performance and memory efficiency. The core approach is to store all localized strings for each language in dedicated arrays and access them via enums. This provides O(1) access with zero runtime overhead, and avoids the heap allocations, hashing, and collision handling required by `std::map` or `std::unordered_map`. The main trade-off is that enums and string arrays must remain perfectly synchronized—any mismatch would result in incorrect strings being displayed in the UI. To eliminate this risk, I added a Python script that automatically generates `I18nStrings.h/.cpp` and `I18nKeys.h` from a CSV file, which will serve as the single source of truth for all translations. The full design and workflow are documented in `docs/i18n.md`. ### Next Steps - [x] Python script `generate_i18n.py` to auto-generate C++ files from CSV - [x] Populate translations.csv with initial translations. Currently available translations: English, Español, Français, Deutsch, Čeština, Português (Brasil), Русский, Svenska. Thanks, community! **Status:** EDIT: ready to be merged. As a proof of concept, the SPANISH strings currently mirror the English ones, but are fully uppercased. --- ### AI Usage Did you use AI tools to help write this code? _**< PARTIALLY >**_ I used AI for the black work of replacing strings with I18n references across the project, and for generating the documentation. EDIT: also some help with merging changes from master. --------- Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com> Co-authored-by: yeyeto2788 <juanernestobiondi@gmail.com>
2026-02-16 15:28:42 +02:00
parent 3d47c081f2
commit 7ba5978848
52 changed files with 4516 additions and 379 deletions
--- a/scripts/gen_i18n.py
+++ b/scripts/gen_i18n.py
@@ -0,0 +1,620 @@
+#!/usr/bin/env python3
+"""
+Generate I18n C++ files from per-language YAML translations.
+
+Reads YAML files from a translations directory (one file per language) and generates:
+- I18nKeys.h:     Language enum, StrId enum, helper functions
+- I18nStrings.h:  String array declarations
+- I18nStrings.cpp: String array definitions with all translations
+
+Each YAML file must contain:
+  _language_name: "Native Name"     (e.g. "Español")
+  _language_code: "ENUM_NAME"       (e.g. "SPANISH")
+  STR_KEY: "translation text"
+
+The English file is the reference. Missing keys in other languages are
+automatically filled from English, with a warning.
+
+Usage:
+    python gen_i18n.py <translations_dir> <output_dir>
+
+Example:
+    python gen_i18n.py lib/I18n/translations lib/I18n/
+"""
+
+import sys
+import os
+import re
+from pathlib import Path
+from typing import List, Dict, Tuple
+
+
+# ---------------------------------------------------------------------------
+# YAML file reading (simple key: "value" format, no PyYAML dependency)
+# ---------------------------------------------------------------------------
+
+def _unescape_yaml_value(raw: str, filepath: str = "", line_num: int = 0) -> str:
+    """
+    Process escape sequences in a YAML value string.
+
+    Recognized escapes:  \\\\  →  \\       \\"  →  "       \\n  →  newline
+    """
+    result: List[str] = []
+    i = 0
+    while i < len(raw):
+        if raw[i] == "\\" and i + 1 < len(raw):
+            nxt = raw[i + 1]
+            if nxt == "\\":
+                result.append("\\")
+            elif nxt == '"':
+                result.append('"')
+            elif nxt == "n":
+                result.append("\n")
+            else:
+                raise ValueError(
+                    f"{filepath}:{line_num}: unknown escape '\\{nxt}'"
+                )
+            i += 2
+        else:
+            result.append(raw[i])
+            i += 1
+    return "".join(result)
+
+
+def parse_yaml_file(filepath: str) -> Dict[str, str]:
+    """
+    Parse a simple YAML file of the form:
+        key: "value"
+
+    Only supports flat key-value pairs with quoted string values.
+    Aborts on formatting errors.
+    """
+    result = {}
+    with open(filepath, "r", encoding="utf-8") as f:
+        for line_num, raw_line in enumerate(f, start=1):
+            line = raw_line.rstrip("\n\r")
+
+            if not line.strip():
+                continue
+
+            match = re.match(r'^([A-Za-z_][A-Za-z0-9_]*)\s*:\s*"(.*)"$', line)
+            if not match:
+                raise ValueError(
+                    f"{filepath}:{line_num}: bad format: {line!r}\n"
+                    f'  Expected:  KEY: "value"'
+                )
+
+            key = match.group(1)
+            raw_value = match.group(2)
+
+            # Un-escape: process character by character to handle
+            # \\, \", and \n sequences correctly
+            value = _unescape_yaml_value(raw_value, filepath, line_num)
+
+            if key in result:
+                raise ValueError(f"{filepath}:{line_num}: duplicate key '{key}'")
+
+            result[key] = value
+
+    return result
+
+
+# ---------------------------------------------------------------------------
+# Load all languages from a directory of YAML files
+# ---------------------------------------------------------------------------
+
+def load_translations(
+    translations_dir: str,
+) -> Tuple[List[str], List[str], List[str], Dict[str, List[str]]]:
+    """
+    Read every YAML file in *translations_dir* and return:
+        language_codes   e.g. ["ENGLISH", "SPANISH", ...]
+        language_names   e.g. ["English", "Español", ...]
+        string_keys      ordered list of STR_* keys (from English)
+        translations     {key: [translation_per_language]}
+
+    English is always first;
+    """
+    yaml_dir = Path(translations_dir)
+    if not yaml_dir.is_dir():
+        raise FileNotFoundError(f"Translations directory not found: {translations_dir}")
+
+    yaml_files = sorted(yaml_dir.glob("*.yaml"))
+    if not yaml_files:
+        raise FileNotFoundError(f"No .yaml files found in {translations_dir}")
+
+    # Parse every file
+    parsed: Dict[str, Dict[str, str]] = {}
+    for yf in yaml_files:
+        parsed[yf.name] = parse_yaml_file(str(yf))
+
+    # Identify the English file (must exist)
+    english_file = None
+    for name, data in parsed.items():
+        if data.get("_language_code", "").upper() == "ENGLISH":
+            english_file = name
+            break
+
+    if english_file is None:
+        raise ValueError("No YAML file with _language_code: ENGLISH found")
+
+    # Order: English first, then by _order metadata (falls back to filename)
+    def sort_key(fname: str) -> Tuple[int, int, str]:
+        """English always first (0), then by _order, then by filename."""
+        if fname == english_file:
+            return (0, 0, fname)
+        order = parsed[fname].get("_order", "999")
+        try:
+            order_int = int(order)
+        except ValueError:
+            order_int = 999
+        return (1, order_int, fname)
+
+    ordered_files = sorted(parsed, key=sort_key)
+
+    # Extract metadata
+    language_codes: List[str] = []
+    language_names: List[str] = []
+    for fname in ordered_files:
+        data = parsed[fname]
+        code = data.get("_language_code")
+        name = data.get("_language_name")
+        if not code or not name:
+            raise ValueError(f"{fname}: missing _language_code or _language_name")
+        language_codes.append(code)
+        language_names.append(name)
+
+    # String keys come from English (order matters)
+    english_data = parsed[english_file]
+    string_keys = [k for k in english_data if not k.startswith("_")]
+
+    # Validate all keys are valid C++ identifiers
+    for key in string_keys:
+        if not re.match(r"^[a-zA-Z_][a-zA-Z0-9_]*$", key):
+            raise ValueError(f"Invalid C++ identifier in English file: '{key}'")
+
+    # Build translations dict, filling missing keys from English
+    translations: Dict[str, List[str]] = {}
+    for key in string_keys:
+        row: List[str] = []
+        for fname in ordered_files:
+            data = parsed[fname]
+            value = data.get(key, "")
+            if not value.strip() and fname != english_file:
+                value = english_data[key]
+                lang_code = parsed[fname].get("_language_code", fname)
+                print(f"  INFO: '{key}' missing in {lang_code}, using English fallback")
+            row.append(value)
+        translations[key] = row
+
+    # Warn about extra keys in non-English files
+    for fname in ordered_files:
+        if fname == english_file:
+            continue
+        data = parsed[fname]
+        extra = [k for k in data if not k.startswith("_") and k not in english_data]
+        if extra:
+            lang_code = data.get("_language_code", fname)
+            print(f"  WARNING: {lang_code} has keys not in English: {', '.join(extra)}")
+
+    print(f"Loaded {len(language_codes)} languages, {len(string_keys)} string keys")
+    return language_codes, language_names, string_keys, translations
+
+
+# ---------------------------------------------------------------------------
+# C++ string escaping
+# ---------------------------------------------------------------------------
+
+LANG_ABBREVIATIONS = {
+    "english": "EN",
+    "español": "ES", "espanol": "ES",
+    "italiano": "IT",
+    "svenska": "SV",
+    "français": "FR", "francais": "FR",
+    "deutsch": "DE", "german": "DE",
+    "português": "PT", "portugues": "PT", "português (brasil)": "PO",
+    "中文": "ZH", "chinese": "ZH",
+    "日本語": "JA", "japanese": "JA",
+    "한국어": "KO", "korean": "KO",
+    "русский": "RU", "russian": "RU",
+    "العربية": "AR", "arabic": "AR",
+    "עברית": "HE", "hebrew": "HE",
+    "فارسی": "FA", "persian": "FA",
+    "čeština": "CZ",
+}
+
+
+def get_lang_abbreviation(lang_code: str, lang_name: str) -> str:
+    """Return a 2-letter abbreviation for a language."""
+    lower = lang_name.lower()
+    if lower in LANG_ABBREVIATIONS:
+        return LANG_ABBREVIATIONS[lower]
+    return lang_code[:2].upper()
+
+
+def escape_cpp_string(s: str) -> List[str]:
+    r"""
+    Convert *s* into one or more C++ string literal segments.
+
+    Non-ASCII characters are emitted as \xNN hex sequences. After each
+    hex escape a new segment is started so the compiler doesn't merge
+    subsequent hex digits into the escape.
+
+    Returns a list of string segments (without quotes). For simple ASCII
+    strings this is a single-element list.
+    """
+    if not s:
+        return [""]
+
+    s = s.replace("\n", "\\n")
+
+    # Build a flat list of "tokens", where each token is either a regular
+    # character sequence or a hex escape.  A segment break happens after
+    # every hex escape.
+    segments: List[str] = []
+    current: List[str] = []
+    i = 0
+
+    def _flush() -> None:
+        segments.append("".join(current))
+        current.clear()
+
+    while i < len(s):
+        ch = s[i]
+
+        if ch == "\\" and i + 1 < len(s):
+            nxt = s[i + 1]
+            if nxt in "ntr\"\\":
+                current.append(ch + nxt)
+                i += 2
+            elif nxt == "x" and i + 3 < len(s):
+                current.append(s[i : i + 4])
+                _flush()                       # segment break after hex
+                i += 4
+            else:
+                current.append("\\\\")
+                i += 1
+        elif ch == '"':
+            current.append('\\"')
+            i += 1
+        elif ord(ch) < 128:
+            current.append(ch)
+            i += 1
+        else:
+            for byte in ch.encode("utf-8"):
+                current.append(f"\\x{byte:02X}")
+                _flush()                       # segment break after hex
+            i += 1
+
+    # Flush remaining content
+    _flush()
+
+    return segments
+
+
+def format_cpp_string_literal(segments: List[str], indent: str = "    ") -> List[str]:
+    """
+    Format string segments (from escape_cpp_string) as indented C++ string
+    literal lines, each wrapped in quotes.
+    Also wraps long segments to respect ~120 column limit.
+    """
+    # Effective limit for content: 120 - 4 (indent) - 2 (quotes) - 1 (comma/safety) = 113
+    # Using 113 to match clang-format exactly (120 - 4 - 2 - 1)
+    MAX_CONTENT_LEN = 113
+
+    lines: List[str] = []
+
+    for seg in segments:
+        # Short segment (e.g. hex escape or short text)
+        if len(seg) <= MAX_CONTENT_LEN:
+            lines.append(f'{indent}"{seg}"')
+            continue
+
+        # Long segment - wrap it
+        current = seg
+        while len(current) > MAX_CONTENT_LEN:
+            # Find best split point
+            # Scan forward to find last space <= MAX_CONTENT_LEN
+            last_space = -1
+            idx = 0
+            while idx <= MAX_CONTENT_LEN and idx < len(current):
+                if current[idx] == ' ':
+                    last_space = idx
+
+                # Handle escapes to step correctly
+                if current[idx] == '\\':
+                    idx += 2
+                else:
+                    idx += 1
+
+            # If we found a space, split after it
+            if last_space != -1:
+                # Include the space in the first line
+                split_point = last_space + 1
+                lines.append(f'{indent}"{current[:split_point]}"')
+                current = current[split_point:]
+            else:
+                # No space, forced break at MAX_CONTENT_LEN (or slightly less)
+                cut_at = MAX_CONTENT_LEN
+                # Don't cut in the middle of an escape sequence
+                if current[cut_at - 1] == '\\':
+                    cut_at -= 1
+
+                lines.append(f'{indent}"{current[:cut_at]}"')
+                current = current[cut_at:]
+
+        if current:
+            lines.append(f'{indent}"{current}"')
+
+    return lines
+
+
+# ---------------------------------------------------------------------------
+# Character-set computation
+# ---------------------------------------------------------------------------
+
+def compute_character_set(translations: Dict[str, List[str]], lang_index: int) -> str:
+    """Return a sorted string of every unique character used in a language."""
+    chars = set()
+    for values in translations.values():
+        for ch in values[lang_index]:
+            chars.add(ord(ch))
+    return "".join(chr(cp) for cp in sorted(chars))
+
+
+# ---------------------------------------------------------------------------
+# Code generators
+# ---------------------------------------------------------------------------
+
+def generate_keys_header(
+    languages: List[str],
+    language_names: List[str],
+    string_keys: List[str],
+    output_path: str,
+) -> None:
+    """Generate I18nKeys.h."""
+    lines: List[str] = [
+        "#pragma once",
+        "#include <cstdint>",
+        "",
+        "// THIS FILE IS AUTO-GENERATED BY gen_i18n.py. DO NOT EDIT.",
+        "",
+        "// Forward declaration for string arrays",
+        "namespace i18n_strings {",
+    ]
+
+    for code, name in zip(languages, language_names):
+        abbrev = get_lang_abbreviation(code, name)
+        lines.append(f"extern const char* const STRINGS_{abbrev}[];")
+
+    lines.append("}  // namespace i18n_strings")
+    lines.append("")
+
+    # Language enum
+    lines.append("// Language enum")
+    lines.append("enum class Language : uint8_t {")
+    for i, lang in enumerate(languages):
+        lines.append(f"  {lang} = {i},")
+    lines.append("  _COUNT")
+    lines.append("};")
+    lines.append("")
+
+    # Extern declarations
+    lines.append("// Language display names (defined in I18nStrings.cpp)")
+    lines.append("extern const char* const LANGUAGE_NAMES[];")
+    lines.append("")
+    lines.append("// Character sets for each language (defined in I18nStrings.cpp)")
+    lines.append("extern const char* const CHARACTER_SETS[];")
+    lines.append("")
+
+    # StrId enum
+    lines.append("// String IDs")
+    lines.append("enum class StrId : uint16_t {")
+    for key in string_keys:
+        lines.append(f"  {key},")
+    lines.append("  // Sentinel - must be last")
+    lines.append("  _COUNT")
+    lines.append("};")
+    lines.append("")
+
+    # getStringArray helper
+    lines.append("// Helper function to get string array for a language")
+    lines.append("inline const char* const* getStringArray(Language lang) {")
+    lines.append("  switch (lang) {")
+    for code, name in zip(languages, language_names):
+        abbrev = get_lang_abbreviation(code, name)
+        lines.append(f"    case Language::{code}:")
+        lines.append(f"      return i18n_strings::STRINGS_{abbrev};")
+    first_abbrev = get_lang_abbreviation(languages[0], language_names[0])
+    lines.append("    default:")
+    lines.append(f"      return i18n_strings::STRINGS_{first_abbrev};")
+    lines.append("  }")
+    lines.append("}")
+    lines.append("")
+
+    # getLanguageCount helper (single line to match checked-in format)
+    lines.append("// Helper function to get language count")
+    lines.append(
+        "constexpr uint8_t getLanguageCount() "
+        "{ return static_cast<uint8_t>(Language::_COUNT); }"
+    )
+
+    _write_file(output_path, lines)
+
+
+def generate_strings_header(
+    languages: List[str],
+    language_names: List[str],
+    output_path: str,
+) -> None:
+    """Generate I18nStrings.h."""
+    lines: List[str] = [
+        "#pragma once",
+        '#include <string>',
+        "",
+        '#include "I18nKeys.h"',
+        "",
+        "// THIS FILE IS AUTO-GENERATED BY gen_i18n.py. DO NOT EDIT.",
+        "",
+        "namespace i18n_strings {",
+        "",
+    ]
+
+    for code, name in zip(languages, language_names):
+        abbrev = get_lang_abbreviation(code, name)
+        lines.append(f"extern const char* const STRINGS_{abbrev}[];")
+
+    lines.append("")
+    lines.append("}  // namespace i18n_strings")
+
+    _write_file(output_path, lines)
+
+
+def generate_strings_cpp(
+    languages: List[str],
+    language_names: List[str],
+    string_keys: List[str],
+    translations: Dict[str, List[str]],
+    output_path: str,
+) -> None:
+    """Generate I18nStrings.cpp."""
+    lines: List[str] = [
+        '#include "I18nStrings.h"',
+        "",
+        "// THIS FILE IS AUTO-GENERATED BY gen_i18n.py. DO NOT EDIT.",
+        "",
+    ]
+
+    # LANGUAGE_NAMES array
+    lines.append("// Language display names")
+    lines.append("const char* const LANGUAGE_NAMES[] = {")
+    for name in language_names:
+        _append_string_entry(lines, name)
+    lines.append("};")
+    lines.append("")
+
+    # CHARACTER_SETS array
+    lines.append("// Character sets for each language")
+    lines.append("const char* const CHARACTER_SETS[] = {")
+    for lang_idx, name in enumerate(language_names):
+        charset = compute_character_set(translations, lang_idx)
+        _append_string_entry(lines, charset, comment=name)
+    lines.append("};")
+    lines.append("")
+
+    # Per-language string arrays
+    lines.append("namespace i18n_strings {")
+    lines.append("")
+
+    for lang_idx, (code, name) in enumerate(zip(languages, language_names)):
+        abbrev = get_lang_abbreviation(code, name)
+        lines.append(f"const char* const STRINGS_{abbrev}[] = {{")
+
+        for key in string_keys:
+            text = translations[key][lang_idx]
+            _append_string_entry(lines, text)
+
+        lines.append("};")
+        lines.append("")
+
+    lines.append("}  // namespace i18n_strings")
+    lines.append("")
+
+    # Compile-time size checks
+    lines.append("// Compile-time validation of array sizes")
+    for code, name in zip(languages, language_names):
+        abbrev = get_lang_abbreviation(code, name)
+        lines.append(
+            f"static_assert(sizeof(i18n_strings::STRINGS_{abbrev}) "
+            f"/ sizeof(i18n_strings::STRINGS_{abbrev}[0]) =="
+        )
+        lines.append("                  static_cast<size_t>(StrId::_COUNT),")
+        lines.append(f'              "STRINGS_{abbrev} size mismatch");')
+
+    _write_file(output_path, lines)
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+def _append_string_entry(
+    lines: List[str], text: str, comment: str = ""
+) -> None:
+    """Escape *text*, format as indented C++ lines, append comma (and optional comment)."""
+    segments = escape_cpp_string(text)
+    formatted = format_cpp_string_literal(segments)
+    suffix = f",  // {comment}" if comment else ","
+    formatted[-1] += suffix
+    lines.extend(formatted)
+
+
+def _write_file(path: str, lines: List[str]) -> None:
+    with open(path, "w", encoding="utf-8", newline="\n") as f:
+        f.write("\n".join(lines))
+        f.write("\n")
+    print(f"Generated: {path}")
+
+
+# ---------------------------------------------------------------------------
+# Main
+# ---------------------------------------------------------------------------
+
+def main(translations_dir=None, output_dir=None) -> None:
+    # Default paths (relative to project root)
+    default_translations_dir = "lib/I18n/translations"
+    default_output_dir = "lib/I18n/"
+
+    if translations_dir is None or output_dir is None:
+        if len(sys.argv) == 3:
+            translations_dir = sys.argv[1]
+            output_dir = sys.argv[2]
+        else:
+            # Default for no arguments or weird arguments (e.g. SCons)
+            translations_dir = default_translations_dir
+            output_dir = default_output_dir
+
+
+    if not os.path.isdir(translations_dir):
+        print(f"Error: Translations directory not found: {translations_dir}")
+        sys.exit(1)
+
+    if not os.path.isdir(output_dir):
+        print(f"Error: Output directory not found: {output_dir}")
+        sys.exit(1)
+
+    print(f"Reading translations from: {translations_dir}")
+    print(f"Output directory: {output_dir}")
+    print()
+
+    try:
+        languages, language_names, string_keys, translations = load_translations(
+            translations_dir
+        )
+
+        out = Path(output_dir)
+        generate_keys_header(languages, language_names, string_keys, str(out / "I18nKeys.h"))
+        generate_strings_header(languages, language_names, str(out / "I18nStrings.h"))
+        generate_strings_cpp(
+            languages, language_names, string_keys, translations, str(out / "I18nStrings.cpp")
+        )
+
+        print()
+        print("✓ Code generation complete!")
+        print(f"  Languages: {len(languages)}")
+        print(f"  String keys: {len(string_keys)}")
+
+    except Exception as e:
+        print(f"\nError: {e}")
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    main()
+else:
+    try:
+        Import("env")
+        print("Running i18n generation script from PlatformIO...")
+        main()
+    except NameError:
+        pass