Fine-Tuning

The process of further training a pre-trained AI model on a domain-specific dataset to improve its performance on specialized tasks. Fine-tuning adjusts the model’s internal parameters (its weights), making it more accurate and fluent on topics related to the training data. A general-purpose LLM fine-tuned on commercial real estate data such as lease language, underwriting reports, and market commentary would produce more accurate and contextually appropriate outputs than its base version. Fine-tuning is more resource-intensive than prompt engineering or retrieval-augmented generation (RAG) but can yield significant performance gains for specialized CRE use cases.

Putting Fine-Tuning in Context

A CRE debt advisory firm processes hundreds of loan packages annually and finds that a general-purpose AI model consistently misclassifies loan structure terms and produces imprecise summaries of credit memos. After fine-tuning the model on several years of internal credit memos, term sheets, and lender correspondence, the firm’s AI tool produces summaries that use correct debt market terminology, recognize deal structure nuances, and require significantly less analyst correction before they are usable. The performance gain came not from changing the prompt but from adjusting what the model had learned.

Frequently Asked Questions about Fine-Tuning

When does fine-tuning make sense over prompting or RAG?

Fine-tuning is most justified when a task requires consistent style, terminology, or reasoning patterns that are too complex or too numerous to specify reliably through prompts alone, and where the relevant knowledge is stable enough to train on rather than needing real-time retrieval. In CRE, this includes tasks like generating investment committee memos in a firm’s specific format, classifying lease clause types according to an internal taxonomy, or producing debt underwriting summaries that consistently use the firm’s preferred structure and vocabulary. If the performance gap can be closed through better prompting or by retrieving the right documents, fine-tuning is likely not worth the added cost and maintenance burden.

What data does fine-tuning require?

Fine-tuning requires a curated dataset of input-output pairs that demonstrate the behavior the model should learn. For CRE applications, this typically means hundreds to thousands of examples, such as a lease clause paired with a correctly formatted abstraction, a property description paired with a well-structured market commentary, or a set of financial inputs paired with an appropriate underwriting narrative. Data quality matters far more than volume: a small set of high-quality, correctly labeled examples will produce better fine-tuning results than a large set of inconsistent or partially correct ones. Firms that have accumulated years of consistently formatted deal documents are generally well positioned to build a fine-tuning dataset.
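
For illustration, here is a minimal sketch of what such a dataset might look like, assuming the JSONL chat format that OpenAI’s fine-tuning API expects; the lease clause, system instruction, and file name are invented examples, not firm data.

    import json

    # Each training example pairs an input (a lease clause) with the output
    # the model should learn to produce (a formatted abstraction). The clause
    # text and labels here are invented for illustration.
    examples = [
        {
            "messages": [
                {"role": "system",
                 "content": "Abstract lease clauses into the firm's standard format."},
                {"role": "user",
                 "content": "Tenant shall pay its proportionate share of Operating "
                            "Expenses in excess of those incurred in the Base Year."},
                {"role": "assistant",
                 "content": "Clause type: Expense stop (Base Year). Obligation: "
                            "tenant pays pro-rata share of increases over Base Year."},
            ]
        },
        # ...hundreds to thousands more, ideally drawn from analyst-reviewed
        # documents with consistent formatting.
    ]

    # Fine-tuning endpoints typically expect one JSON object per line (JSONL).
    with open("lease_abstraction_train.jsonl", "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")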

Can fine-tuning make a model worse at other tasks?

Fine-tuning on a narrow domain can degrade a model’s performance on tasks outside that domain, a phenomenon sometimes called catastrophic forgetting. A model fine-tuned heavily on CRE lease language may become less capable at general writing, coding, or reasoning tasks that the base model handled well. For most CRE firms, this trade-off is acceptable if the fine-tuned model is deployed for a specific, dedicated workflow rather than as a general-purpose assistant. Teams should evaluate the fine-tuned model’s performance across both the target task and adjacent tasks before replacing the base model in any workflow that requires broader capability.
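
One lightweight way to check for this kind of regression is to run the base and fine-tuned models side by side on a few non-CRE prompts before deployment. The sketch below assumes OpenAI’s chat completions API; both model identifiers are placeholders (a real fine-tuned model name is assigned when the training job completes).

    from openai import OpenAI

    client = OpenAI()

    # Placeholder model names; substitute the actual base model and the
    # fine-tuned model identifier returned by your training job.
    BASE_MODEL = "gpt-4o-mini"
    TUNED_MODEL = "ft:gpt-4o-mini:my-firm:lease-abstraction:abc123"

    # A few adjacent, non-CRE tasks to spot-check for lost general capability.
    adjacent_prompts = [
        "Summarize these meeting notes in three bullet points: ...",
        "Draft a polite follow-up email to a vendor about a late invoice.",
    ]

    for prompt in adjacent_prompts:
        for model in (BASE_MODEL, TUNED_MODEL):
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            print(f"--- {model} ---\n{resp.choices[0].message.content}\n")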

How technically demanding is fine-tuning?

The fine-tuning process involves preparing a labeled training dataset, selecting a base model, running the training process through a fine-tuning API or platform, evaluating the resulting model against a held-out test set, and iterating on the dataset or training parameters until performance meets the required standard. Major AI providers, including OpenAI, offer fine-tuning APIs that significantly reduce the technical barrier, allowing teams with some familiarity with APIs and data formatting to run a fine-tuning job without building custom training infrastructure. However, dataset preparation, evaluation design, and ongoing model maintenance still require deliberate effort and a working combination of technical and domain expertise.
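
As a concrete illustration of how little infrastructure this requires, the sketch below uploads a training file and launches a job through OpenAI’s fine-tuning API. The file name carries over from the dataset sketch above, and the base model name is illustrative; use whichever fine-tunable model your provider supports.

    from openai import OpenAI

    client = OpenAI()

    # Upload the prepared JSONL training file.
    training_file = client.files.create(
        file=open("lease_abstraction_train.jsonl", "rb"),
        purpose="fine-tune",
    )

    # Launch the fine-tuning job against a fine-tunable base model.
    job = client.fine_tuning.jobs.create(
        training_file=training_file.id,
        model="gpt-4o-mini-2024-07-18",  # illustrative; pick a supported model
    )

    # Check status; once the job succeeds, `fine_tuned_model` holds the name
    # of the new model to use in downstream API calls.
    status = client.fine_tuning.jobs.retrieve(job.id)
    print(status.status, status.fine_tuned_model)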

How should a fine-tuned model be evaluated?

Evaluation should be based on a held-out test set of examples that were not included in the training data, scored against criteria that reflect actual production requirements. For CRE tasks, relevant evaluation criteria typically include terminological accuracy, structural adherence to the firm’s preferred format, reduction in analyst correction time, and error rate on key financial or legal details. Comparing the fine-tuned model’s outputs against the base model’s on the same test set provides a direct measure of improvement. Blind ratings from experienced CRE analysts, who assess output quality without knowing which model produced it, are also a reliable evaluation method, particularly for tasks involving narrative quality or professional judgment.
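
A minimal automated version of such an evaluation might look like the sketch below, which scores the base and fine-tuned models on the same held-out cases using a simple required-terminology check; the scoring helper, test cases, and model names are all hypothetical stand-ins for a firm’s real criteria and models.

    from openai import OpenAI

    client = OpenAI()

    # Stand-in scorer: checks that the output uses the required debt/lease
    # terminology. Real evaluations would also check structure and key figures.
    def contains_required_terms(output: str, required: list[str]) -> bool:
        return all(term.lower() in output.lower() for term in required)

    # Held-out cases that were never part of the training data (illustrative).
    test_cases = [
        {"prompt": "Abstract this clause: Tenant shall pay its proportionate "
                   "share of Operating Expenses in excess of the Base Year.",
         "required_terms": ["expense stop", "base year"]},
    ]

    # Score the base model and the fine-tuned model on the same test set.
    for model in ("gpt-4o-mini", "ft:gpt-4o-mini:my-firm:lease-abstraction:abc123"):
        passed = 0
        for case in test_cases:
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": case["prompt"]}],
            )
            if contains_required_terms(resp.choices[0].message.content,
                                       case["required_terms"]):
                passed += 1
        print(f"{model}: {passed}/{len(test_cases)} passed the terminology check")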

