Remote sensing image retrieval (RSIR) is the task of finding images of the same area by measuring the similarity between a query image and the images in a database. RSIR is challenging because acquisition time, viewpoint, and coverage area differ with the shooting conditions, so the content of images of the same area can vary considerably. In this paper, we propose a novel training method based on a coarse-to-fine strategy that makes a deep network more robust to these variations in remote sensing images. Moreover, we propose a new triangular loss function that considers the full set of relations within a training tuple. This loss function improves retrieval performance and is better at learning the detailed information in complex remote sensing images. To verify our methods, we conducted experiments on the Google Earth South Korea dataset, which contains 40,000 images, using Recall@n as the evaluation metric. In all experiments, our methods outperformed existing retrieval training methods. Our source code and the Google Earth South Korea dataset are available online.
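To make the Recall@n metric concrete, the following minimal Python sketch assumes the definition commonly used in retrieval benchmarks: a query counts as correct if at least one true match appears among its n nearest database images. The function name `recall_at_n`, the use of Euclidean distance between descriptors, and the toy data are illustrative assumptions and are not taken from the released source code.

```python
import math
from typing import Sequence, Set


def recall_at_n(
    query_vecs: Sequence[Sequence[float]],
    db_vecs: Sequence[Sequence[float]],
    positives: Sequence[Set[int]],
    n: int,
) -> float:
    """Fraction of queries with at least one true match in the top-n results.

    positives[k] holds the database indices that show the same area as
    query k (hypothetical ground-truth format, assumed for this sketch).
    """
    hits = 0
    for q, pos in zip(query_vecs, positives):
        # Rank database indices by Euclidean distance to the query descriptor.
        ranked = sorted(range(len(db_vecs)), key=lambda i: math.dist(q, db_vecs[i]))
        # The query is a hit if any ground-truth index is among the n nearest.
        if pos & set(ranked[:n]):
            hits += 1
    return hits / len(query_vecs)


# Toy 2-D descriptors: three database images and two queries.
db = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
queries = [(0.1, 0.0), (4.0, 4.0)]
positives = [{1}, {2}]

print(recall_at_n(queries, db, positives, n=1))
print(recall_at_n(queries, db, positives, n=2))
```

In this toy case, the first query's true match is only its second-nearest neighbour, so it is missed at n = 1 and found at n = 2, while the second query is matched at n = 1.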