InteractVLM is a method for estimating 3D contact points on human bodies and objects from single images, a task made difficult by occlusions and depth ambiguity. It combines the broad visual knowledge of Vision-Language Models with a Render-Localize-Lift module that maps between 2D image predictions and 3D surfaces, and it introduces a Semantic Human Contact estimation task for richer interaction modeling. The approach outperforms existing methods and scales well because it requires only limited 3D contact data for training.
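
To make the Render-Localize-Lift idea concrete, here is a minimal NumPy sketch of the general pattern it names: project a 3D surface into several 2D views, predict contact in 2D there, and lift those predictions back onto the surface vertices. The orthographic projection, the point-based visibility test, and all names (`lift_contact_to_vertices`, the `predict_contact_mask` callback) are illustrative assumptions for this sketch, not InteractVLM's actual implementation.

```python
import numpy as np

def look_at_rotation(cam_pos, target=np.zeros(3), up=np.array([0.0, 1.0, 0.0])):
    """Rotation matrix whose rows are the camera's right / up / forward axes."""
    forward = target - cam_pos
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, up)
    right = right / np.linalg.norm(right)
    true_up = np.cross(right, forward)
    return np.stack([right, true_up, forward])            # (3, 3)

def project_vertices(verts, cam_pos, res=256):
    """Orthographic projection of mesh vertices into a res x res view.

    Returns integer pixel coordinates and per-vertex depth along the view axis.
    """
    R = look_at_rotation(cam_pos)
    cam_space = (verts - cam_pos) @ R.T                    # (V, 3) camera coords
    xy, depth = cam_space[:, :2], cam_space[:, 2]
    lo, hi = xy.min(0), xy.max(0)                          # fit the mesh into the image
    pix = ((xy - lo) / (hi - lo + 1e-8) * (res - 1)).astype(int)
    return pix, depth

def approximate_visibility(pix, depth, res=256):
    """Point-based z-buffer: a vertex is 'visible' if it is closest in its pixel."""
    nearest = np.full((res, res), np.inf)
    for (u, v), d in zip(pix, depth):
        nearest[v, u] = min(nearest[v, u], d)
    return np.array([d <= nearest[v, u] + 1e-4 for (u, v), d in zip(pix, depth)])

def lift_contact_to_vertices(verts, cam_positions, predict_contact_mask, res=256):
    """Aggregate per-view 2D contact predictions into per-vertex 3D contact scores."""
    votes = np.zeros(len(verts))
    counts = np.zeros(len(verts))
    for cam_pos in cam_positions:
        pix, depth = project_vertices(verts, cam_pos, res)
        visible = approximate_visibility(pix, depth, res)
        mask = predict_contact_mask(cam_pos)               # (res, res) probabilities in [0, 1]
        scores = mask[pix[:, 1], pix[:, 0]]                # sample mask at projected vertices
        votes[visible] += scores[visible]
        counts[visible] += 1
    return votes / np.maximum(counts, 1)                   # per-vertex contact probability

# Usage sketch: a random point cloud stands in for a body mesh, and a random
# mask generator stands in for the VLM-based 2D contact localizer.
verts = np.random.randn(1000, 3) * 0.3
cams = [np.array([0.0, 0.0, 2.0]), np.array([2.0, 0.0, 0.0]), np.array([0.0, 1.5, 1.5])]
contact = lift_contact_to_vertices(verts, cams, lambda cam_pos: np.random.rand(256, 256))
```

In this simplified form the 2D localizer is just a callback, which is where a VLM-driven contact segmenter would plug in; the lifting step only needs pixel-to-vertex correspondences and a visibility estimate from each rendered view.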