Monitoring structures can be relatively expensive, time-consuming, and labor-intensive. Traditional sensors measure structural response only at the points where they are installed, which gives low spatial resolution for analyses such as damage detection. Computer vision has attracted growing attention in structural monitoring because it can extract large amounts of data without requiring large numbers of sensors, making it more economical and agile than traditional monitoring techniques. In this work, images of a footbridge were acquired in order to determine its vibration frequencies and mode shapes. The modal characteristics were obtained by applying principal component analysis (PCA) to the signals recorded at each pixel of the image. An unsupervised learning technique, Blind Source Separation (BSS), was then used to determine the modal coordinates, vibration frequencies, and mode shapes. The computer vision results were very close to those obtained from accelerometer monitoring, suggesting that the technique is promising for dynamic structural monitoring and can support a better understanding of structural behavior.
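
As a rough illustration of the PCA-then-BSS pipeline described above, the following Python sketch reduces per-pixel time series with PCA and then separates the retained components into modal coordinates. It is not the authors' implementation. The abstract does not state which BSS algorithm was used, so an AMUSE-style second-order method (eigendecomposition of a time-lagged covariance of the whitened components) is substituted here as an assumption. The data are synthetic, and the function names `synthetic_pixels` and `identify_modes`, the sampling rate, and the two test frequencies are all hypothetical.

```python
import numpy as np


def synthetic_pixels(fs=50.0, duration=20.0, n_pixels=40, noise=0.01, seed=0):
    """Hypothetical stand-in for video data: two modes seen by a row of pixels."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * duration)) / fs
    x = np.linspace(0.0, 0.7, n_pixels)  # camera sees only part of the span
    true_shapes = np.column_stack([np.sin(np.pi * x), np.sin(2 * np.pi * x)])
    # Assumed modal coordinates: undamped sinusoids at 2.0 Hz and 5.5 Hz.
    coords = np.column_stack([np.sin(2 * np.pi * 2.0 * t),
                              0.6 * np.sin(2 * np.pi * 5.5 * t)])
    pixels = coords @ true_shapes.T
    pixels += noise * rng.standard_normal((t.size, n_pixels))
    return pixels, true_shapes


def identify_modes(pixels, fs, n_modes, lag=1):
    """Estimate frequencies, mode shapes, and modal coordinates.

    pixels: array of shape (n_samples, n_pixels), one time series per pixel.
    """
    n_samples = pixels.shape[0]
    X = pixels - pixels.mean(axis=0)  # remove the static value of each pixel

    # PCA via SVD; keep the leading n_modes components and whiten them.
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    Z = U[:, :n_modes] * np.sqrt(n_samples)  # Z.T @ Z / n_samples = identity

    # AMUSE-style BSS (assumed algorithm): rotate the whitened components
    # so that their time-lagged covariance becomes diagonal.
    C = Z[:-lag].T @ Z[lag:] / (n_samples - lag)
    C = 0.5 * (C + C.T)  # symmetrize before the eigendecomposition
    _, W = np.linalg.eigh(C)
    q = Z @ W  # estimated modal coordinates, unit variance

    # Mode shapes: least-squares fit of X ~ q @ shapes.T (q is white).
    shapes = X.T @ q / n_samples

    # Frequency of each modal coordinate: largest non-DC spectral peak.
    spectrum = np.abs(np.fft.rfft(q, axis=0))
    freq_axis = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    freqs = freq_axis[spectrum[1:].argmax(axis=0) + 1]

    order = np.argsort(freqs)
    return freqs[order], shapes[:, order], q[:, order]


def mac(a, b):
    """Modal assurance criterion between two mode-shape vectors."""
    return (a @ b) ** 2 / ((a @ a) * (b @ b))


pixels, true_shapes = synthetic_pixels()
freqs, shapes, q = identify_modes(pixels, fs=50.0, n_modes=2)
```

In this sketch the recovered shapes are defined only up to sign and scale, so they are best compared with a reference (for example, accelerometer-based shapes) through a normalized measure such as the modal assurance criterion computed by `mac`.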