Inference
The process of running a trained AI model to generate a response or prediction from a given input. In practical terms, every time a user sends a message to an AI tool, an inference is performed. Inference speed and cost are key considerations when deploying AI in commercial real estate workflows: faster, cheaper inference makes it feasible to run AI across large document volumes, such as screening hundreds of deals or abstracting thousands of leases, rather than processing requests one at a time.
Putting Inference in Context
A CRE asset management firm processing a portfolio of 2,000 leases selects a smaller, faster AI model for its lease abstraction pipeline specifically because the lower inference cost per document makes the full batch run economically viable. A slower, more expensive model would have forced the team to prioritize a subset of the portfolio and process the remainder manually.
Frequently Asked Questions about Inference
How does inference cost affect the business case for AI in CRE?
Inference cost is typically measured per token, where tokens represent chunks of text in the input and output, and those costs accumulate quickly when processing large document sets like offering memoranda, lease portfolios, or deal pipeline submissions. Even a workflow that costs a fraction of a cent per document adds up to a meaningful expense at scale, and CRE teams evaluating AI tooling should model inference cost against expected document volume the same way they would underwrite any other operational expense.
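As an illustration, here is a minimal Python sketch of that underwriting exercise. All prices and token counts are hypothetical placeholders; substitute your provider's actual per-token rates and your own measured document sizes.

```python
# A minimal sketch of underwriting inference cost against document volume.
# All prices and token counts are hypothetical; substitute your provider's
# actual per-token rates and your own measured document sizes.

def inference_cost_per_document(
    input_tokens: int,
    output_tokens: int,
    input_price_per_1k: float,   # USD per 1,000 input tokens (assumed rate)
    output_price_per_1k: float,  # USD per 1,000 output tokens (assumed rate)
) -> float:
    """Estimate the cost of a single inference call from token counts."""
    return (
        (input_tokens / 1000) * input_price_per_1k
        + (output_tokens / 1000) * output_price_per_1k
    )

# Example: an assumed ~8,000-token lease as input and an assumed
# ~500-token structured abstract as output.
per_doc = inference_cost_per_document(
    input_tokens=8_000,
    output_tokens=500,
    input_price_per_1k=0.0005,
    output_price_per_1k=0.0015,
)
portfolio = per_doc * 2_000  # projected spend across a 2,000-lease portfolio
print(f"Per document: ${per_doc:.4f} | Portfolio: ${portfolio:,.2f}")
```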
What is the difference between inference and model training?
Training is the computationally intensive process of building a model from data, which happens once and is typically performed by AI developers, not end users. Inference is what happens every time the trained model is actually used, and it is the step that CRE teams interact with directly in their workflows. Most firms will never be involved in training a model, but every AI-assisted task they run, from summarizing a property report to screening an acquisition, is an inference.
Does inference speed matter for day-to-day CRE tasks?
For interactive tasks where an analyst is waiting on a response, inference latency directly affects how usable the tool feels in practice. For automated batch workflows running overnight or in the background, throughput matters more than per-request speed. CRE teams designing AI pipelines should distinguish between these two modes and select models and infrastructure accordingly, since optimizing for one does not necessarily optimize for the other.
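The distinction can be made concrete with a back-of-the-envelope calculation. The sketch below uses hypothetical latency and concurrency figures; measure real values against your own model and infrastructure before making a selection.

```python
# A minimal sketch contrasting interactive latency with batch throughput.
# The latency and concurrency figures are hypothetical; measure real values
# for your own model and infrastructure.

def interactive_wait_seconds(latency_s: float) -> float:
    """Interactive mode: the analyst waits one full round trip per request."""
    return latency_s

def batch_runtime_hours(num_docs: int, latency_s: float, concurrency: int) -> float:
    """Batch mode: wall-clock time is driven by throughput (requests
    completed per second across all concurrent workers), not by any
    single request's latency."""
    throughput = concurrency / latency_s  # documents completed per second
    return num_docs / throughput / 3600

# A 6-second-latency model feels slow in an interactive chat...
print(interactive_wait_seconds(6.0))                  # 6.0 seconds per answer
# ...yet at 50 concurrent requests it clears 2,000 leases in minutes.
print(round(batch_runtime_hours(2_000, 6.0, 50), 2))  # ~0.07 hours
```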
What are the risks of relying on high-volume inference in a CRE workflow?
Running inference at scale amplifies any systematic errors in the model or the prompt, meaning a flawed extraction instruction applied across thousands of lease documents will produce thousands of flawed outputs before the problem is caught. Cost overruns are also a practical risk if volume assumptions are wrong or if document sizes are significantly larger than anticipated. Building in a sample review step before committing to a full batch run is a straightforward way to catch both types of issues early.
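One way to implement that safeguard is a sampling gate ahead of the batch run, sketched below. Here `extract_lease_fields` is a hypothetical stand-in for whatever inference call the pipeline actually makes, not a real API.

```python
# A minimal sketch of a sample review gate ahead of a full batch run.
# `extract_lease_fields` is a hypothetical stand-in for the pipeline's
# actual inference call, not a real API.
import random

def extract_lease_fields(doc: str) -> dict:
    """Hypothetical stub for the inference call; returns extracted fields."""
    return {"commencement_date": None, "base_rent": None}

def sample_review(documents: list[str], sample_size: int = 25) -> list[dict]:
    """Run inference on a random sample and return the outputs for human
    review before committing spend to the remaining documents."""
    sample = random.sample(documents, min(sample_size, len(documents)))
    return [{"doc": d, "output": extract_lease_fields(d)} for d in sample]

# Workflow: inspect these sampled outputs manually; launch the full
# 2,000-document run only once the extractions look correct.
print(sample_review(["lease_001.txt", "lease_002.txt", "lease_003.txt"]))
```

The random sample keeps the review cheap while still exercising the same prompt and model the full run will use, so it surfaces both systematic extraction errors and cost surprises before they multiply.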
How do I choose between a faster, cheaper model and a more capable but slower model for CRE document processing?
The right choice depends on the complexity of the extraction task and the tolerance for error in the output. Simple, structured tasks like pulling lease dates or rent figures from a standard lease form often perform well with smaller, faster models, while nuanced tasks like interpreting non-standard co-tenancy clauses or summarizing complex deal structures typically benefit from a more capable model even at higher cost. Running a benchmark on a representative sample of your actual documents before committing to a model is more reliable than relying on general capability comparisons.
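A benchmark of this kind can be as simple as scoring each candidate model against a hand-labeled sample. In the sketch below, `run_model`, the model names, and the field names are all illustrative assumptions, not a specific provider's API.

```python
# A minimal sketch of benchmarking candidate models on a hand-labeled
# sample of real documents. `run_model`, the model names, and the field
# names are illustrative assumptions, not a specific provider's API.

def run_model(model: str, doc: str) -> dict:
    """Hypothetical stub for an inference call returning extracted fields."""
    return {}

def benchmark(model: str, labeled_sample: list[tuple[str, dict]]) -> float:
    """Fraction of documents where every extracted field exactly matches
    the hand-labeled ground truth."""
    correct = sum(1 for doc, truth in labeled_sample if run_model(model, doc) == truth)
    return correct / max(len(labeled_sample), 1)

# A single illustrative labeled document; in practice, label 50+ real ones.
labeled_sample = [("<lease text>", {"base_rent": "42.50", "term_months": "120"})]

# Compare a fast, cheap model against a slower, more capable one on the
# same sample, then weigh accuracy against per-document cost.
for model in ("small-fast-model", "large-capable-model"):
    print(model, benchmark(model, labeled_sample))
```

The same harness extends naturally to per-field accuracy or cost-weighted scoring if an exact-match comparison is too coarse for your documents.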