The study introduces a theoretical framework for understanding in-context learning (ICL) in large language models (LLMs), combining hierarchical concept modeling with optimization theory. It shows that nonlinear residual transformers can perform factual-recall tasks through vector arithmetic in their representation space, and it proves strong generalization and robustness under concept recombination and distribution shifts. Empirical simulations corroborate these theoretical findings and highlight the advantages of transformers over traditional static embeddings.
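To give a rough intuition for the vector-arithmetic view of factual recall, the minimal sketch below (a toy illustration with hypothetical embeddings, not the paper's construction) represents a fact as a subject embedding plus a relation-specific offset, and decodes the answer by nearest-neighbor search over candidate object embeddings.

```python
# Toy sketch: factual recall as vector arithmetic (hypothetical vectors,
# not the paper's construction). The answer embedding is approximated by
# subject embedding + relation offset, then decoded by cosine similarity
# against a small candidate vocabulary.
import numpy as np

rng = np.random.default_rng(0)
dim = 16

# Hypothetical subject embeddings and a single relation offset ("capital_of").
subjects = {"France": rng.normal(size=dim), "Japan": rng.normal(size=dim)}
capital_of = rng.normal(size=dim)

# Candidate object embeddings, constructed so that the correct answer lies
# near subject + relation (plus small noise); "Berlin" is a distractor.
objects = {
    "Paris": subjects["France"] + capital_of + 0.05 * rng.normal(size=dim),
    "Tokyo": subjects["Japan"] + capital_of + 0.05 * rng.normal(size=dim),
    "Berlin": rng.normal(size=dim),
}

def recall(subject: str) -> str:
    """Return the candidate whose embedding is most similar to subject + relation."""
    query = subjects[subject] + capital_of
    def cos(name: str) -> float:
        v = objects[name]
        return float(v @ query) / (np.linalg.norm(v) * np.linalg.norm(query))
    return max(objects, key=cos)

print(recall("France"))  # expected: Paris
print(recall("Japan"))   # expected: Tokyo
```

In this reading, a transformer that has internalized the relation offset can resolve a queried fact by composing representations linearly, which is one way to interpret the factual-recall results summarized above.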