This study investigates how a one-layer transformer learns to recognize regular languages, focusing on two representative tasks: 'even pairs' and 'parity check'. Through a theoretical analysis of the training dynamics under gradient descent, it identifies two distinct phases in the learning process and shows how the attention layer and the linear layer interact to separate positive from negative sequences. Experimental results corroborate the theoretical findings.
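
To make the setting concrete, below is a minimal sketch (not the paper's exact construction) of the two tasks on binary sequences, together with a toy one-layer model consisting of a single attention head followed by a linear readout. The token embedding, position encoding, and random initialization here are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def even_pairs_label(seq):
    """'Even pairs': label +1 iff the number of 'ab'/'ba' adjacent pairs is even
    (equivalently, the first and last tokens coincide)."""
    flips = sum(seq[i] != seq[i + 1] for i in range(len(seq) - 1))
    return 1 if flips % 2 == 0 else -1

def parity_label(seq):
    """'Parity check': label +1 iff the number of 1-tokens is even."""
    return 1 if sum(seq) % 2 == 0 else -1

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def one_layer_transformer(seq, W_qk, w_v, w_out):
    """One attention head plus a linear layer, read out at the last position.
    Tokens are embedded as +/-1 with a scalar position feature appended
    (an illustrative embedding choice)."""
    n = len(seq)
    tok = np.array([1.0 if t else -1.0 for t in seq])   # token embedding, shape (n,)
    pos = np.arange(1, n + 1) / n                        # position feature, shape (n,)
    X = np.stack([tok, pos], axis=1)                     # input matrix, shape (n, 2)
    q = X[-1]                                            # query taken at the last position
    attn = softmax(X @ W_qk @ q)                         # attention weights over positions
    context = attn @ (X * w_v)                           # value-weighted average
    return float(context @ w_out)                        # scalar logit from the linear layer

# Toy usage with random (untrained) parameters.
rng = np.random.default_rng(0)
W_qk, w_v, w_out = rng.normal(size=(2, 2)), rng.normal(size=2), rng.normal(size=2)
seq = [1, 0, 0, 1, 1]
print(even_pairs_label(seq), parity_label(seq), one_layer_transformer(seq, W_qk, w_v, w_out))
```

Training such a model with gradient descent on a margin-based or logistic loss over labeled sequences is the setting the analysis studies; the two-phase behavior concerns how the attention weights and the linear readout evolve during that training.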