TextRegion is a training-free framework that generates text-aligned region tokens by combining frozen image-text models with segmentation masks. It achieves strong zero-shot performance on tasks such as semantic segmentation and multi-object grounding. The framework supports direct evaluation as well as inference on custom images, provided users follow the setup and dataset preparation guidelines. It builds on several existing models, is released under the MIT License, and can be cited in academic work.
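To make the idea of a text-aligned region token more concrete, the following is a minimal sketch and not TextRegion's actual implementation. It assumes a simplified scheme in which the patch features of a frozen image encoder are averaged inside each segmentation mask, and each resulting region token is matched to the closest text embedding by cosine similarity. The function names (`pool_region_tokens`, `label_regions`) and the toy feature values are hypothetical and chosen only for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def pool_region_tokens(patch_feats, masks):
    """Average the patch features inside each mask to get one token per region.

    patch_feats: list of N patch feature vectors (each of length D)
    masks:       list of R masks, each a list of N booleans (True = patch in region)
    """
    tokens = []
    for mask in masks:
        selected = [f for f, keep in zip(patch_feats, mask) if keep]
        if not selected:
            raise ValueError("mask selects no patches")
        dim = len(selected[0])
        mean = [sum(f[d] for f in selected) / len(selected) for d in range(dim)]
        tokens.append(l2_normalize(mean))
    return tokens


def label_regions(region_tokens, text_embeds):
    """Assign each region the index of the most similar text embedding (cosine)."""
    text_unit = [l2_normalize(t) for t in text_embeds]
    labels = []
    for token in region_tokens:
        sims = [sum(a * b for a, b in zip(token, t)) for t in text_unit]
        labels.append(max(range(len(sims)), key=sims.__getitem__))
    return labels


# Toy data (assumed values): 4 patches with 2-D features, 2 region masks,
# and 2 class embeddings standing in for text encoder outputs.
patch_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
masks = [[True, True, False, False], [False, False, True, True]]
text_embeds = [[1.0, 0.0], [0.0, 1.0]]

tokens = pool_region_tokens(patch_feats, masks)
labels = label_regions(tokens, text_embeds)
print(labels)  # prints [0, 1]
```

Because both encoders stay frozen and the masks come from a separate segmentation model, nothing in this sketch requires training; in a real setup the toy lists would be replaced by actual patch features, masks, and text embeddings.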