Parsing Model ============= The MBF parsing model is one of the most important shared ideas in the project. At a high level, the parsing model can be understood as a layered process: 1. tokenisation 2. bracket parsing 3. operator parsing 4. regularisation or structural cleanup 5. optional preservation of original source-sensitive details Layer 1: tokenisation --------------------- The source is first broken into tokens such as: * identifiers * strings * numbers * bracket tokens * operator tokens Layer 2: bracket parsing ------------------------ Bracketed forms are then assembled into structural nodes. This is where the family gets one of its most recognisable properties: code, data, and markup can all be represented in closely related structural shapes. Layer 3: operator parsing ------------------------- Binary operators are then interpreted with precedence and associativity rules. This is what lets MBF support concise infix structure while still remaining structural rather than purely textual. Layer 4: regularisation ----------------------- Some consumers want a cleaned-up structural view of the parsed source. Regularisation is the step where the tree is made easier to work with in ordinary language processing. Layer 5: source-preserving modes -------------------------------- Some parts of the family need more than a simplified tree. Macros and embedded sublanguages may need access to source-shaped nodes, including whitespace-sensitive structure. That is why MBF parsing is not only about producing a convenient tree. It is also about preserving enough information for language extension and embedding. Why this matters ---------------- The parsing model is one of the reasons the Makrell family can support: * programming-language implementations * MRON * MRML * macros * embedded mini-languages