EDIT: never mind, I lost track of the specific subject (page caching) and went on a more general spiel. The things below are all good topics to learn about, but as a rule they won't apply here. For something that does apply, look at the Wikipedia page on caching algorithms, especially Least Recently Used (LRU) algorithms, generational caching, and priority weighting. That is a good starting point, but you will need to do more once you have the overview.
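To make the LRU idea concrete, here is a minimal sketch in Python (the class name and interface are just for illustration; for caching function results, functools.lru_cache already does this for you):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal least-recently-used cache: once capacity is exceeded, evicts
    the entry that has gone longest without being read or written."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()  # key -> value, oldest first

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)   # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        if key in self._entries:
            self._entries.move_to_end(key)
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used

# usage
cache = LRUCache(2)
cache.put("page1", b"...")
cache.put("page2", b"...")
cache.get("page1")          # touches page1, so page2 is now the LRU entry
cache.put("page3", b"...")  # evicts page2
```

Generational caching and priority weighting are refinements of the same idea: instead of a single "last touched" ordering, you keep multiple tiers or a weighted score that decides what gets evicted first.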
I second Velko's point, and specifically recommend using a B-Tree (or a variant such as a B+-Tree or B*-Tree), though you will want to see whether something different (such as an extensible hash table or a tree of hash tables) will be better suited for the specific purpose.
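The reason B-Trees work well for disk-bound data is that each node holds many sorted keys, so a lookup only touches a handful of nodes (pages). Here is a search-only sketch in Python; insertion with node splitting is omitted, and the node layout is made up purely for illustration:

```python
import bisect

class BTreeNode:
    """One node of a B-tree: sorted keys, matching values, and (for interior
    nodes) len(keys) + 1 children. In a disk-backed tree each node would be
    one page, loaded on demand; here everything lives in memory."""

    def __init__(self, keys, values, children=None):
        self.keys = keys
        self.values = values
        self.children = children  # None for leaf nodes

def btree_search(node, key):
    """Walk down from the root, doing a binary search inside each node."""
    while node is not None:
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node.values[i]
        node = node.children[i] if node.children else None
    return None

# a tiny hand-built tree: one root with two leaf children
root = BTreeNode(
    keys=[20], values=["v20"],
    children=[
        BTreeNode(keys=[5, 10], values=["v5", "v10"]),
        BTreeNode(keys=[30, 40], values=["v30", "v40"]),
    ],
)
print(btree_search(root, 30))  # -> "v30"
print(btree_search(root, 7))   # -> None
```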
Alternatively, look at a trie, though that probably wouldn't fit here (tries generally aren't good for disk-bound data) but might be useful in a compound structure.
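For completeness, a bare-bones trie sketch (one character per level; the keys and values are made up):

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # next character -> TrieNode
        self.value = None    # payload stored at the end of a complete key
        self.terminal = False

def trie_insert(root, key, value):
    node = root
    for ch in key:
        node = node.children.setdefault(ch, TrieNode())
    node.terminal = True
    node.value = value

def trie_lookup(root, key):
    node = root
    for ch in key:
        node = node.children.get(ch)
        if node is None:
            return None
    return node.value if node.terminal else None

root = TrieNode()
trie_insert(root, "/users/42", "cached page A")
trie_insert(root, "/users/43", "cached page B")
print(trie_lookup(root, "/users/42"))  # -> "cached page A"
print(trie_lookup(root, "/users/4"))   # -> None (a prefix, not a stored key)
```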
Start by comparing the algorithms and structures for their likely performance, then experiment a little with an implementation or two. If performance varies a lot with the specific data or layout, you might consider an adaptable approach (i.e., one where the on-disk structures can be used by different algorithms, so you can swap algorithms as the situation changes), but that would be an end-stage optimization that would take an inordinate amount of work.
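If you do experiment, something as crude as the timing harness below is usually enough for a first read. Everything here (the second structure, the key counts, the repeat count) is arbitrary and just for illustration:

```python
import bisect
import random
import timeit

class SortedKVStore:
    """Sorted-array lookup via binary search; a stand-in for one alternative."""
    def __init__(self, pairs):
        pairs = sorted(pairs)
        self.keys = [k for k, _ in pairs]
        self.values = [v for _, v in pairs]
    def get(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

def time_lookups(store, probe_keys, repeats=200):
    return timeit.timeit(lambda: [store.get(k) for k in probe_keys], number=repeats)

pairs = [(f"key{i:05d}", i) for i in range(20_000)]
probes = [k for k, _ in random.sample(pairs, 1_000)]

print("dict        :", time_lookups(dict(pairs), probes))
print("sorted array:", time_lookups(SortedKVStore(pairs), probes))
```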
Oh, and if it is a fixed (or very slowly changing) data set, consider applying a perfect hash instead. They are very fast for lookups, but if the hashed component of the data changes or new data is inserted, the whole thing needs to be recalculated, which is rather slow and memory-intensive even with the newer dynamic perfect hashing algorithms. If the data is insert-only, a tree of smaller perfect-hash tables might work well, as it would localize the hash recalculations.
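As an illustration only: the simplest way to build a perfect hash for a small, fixed key set is to brute-force a seed until every key lands in its own slot. Real schemes (CHD, BDZ, and the like) do this far more cleverly and scale to large key sets, but the sketch below shows the idea:

```python
import zlib

def _slot(seed, key, size):
    """Deterministic hash of (seed, key) into [0, size)."""
    return zlib.crc32(f"{seed}:{key}".encode()) % size

def build_perfect_hash(keys, max_seed=1_000_000):
    """Brute-force a seed that maps every key to a distinct slot.
    Fine for small, fixed key sets; hopeless for large ones."""
    size = len(keys)
    for seed in range(max_seed):
        slots = {_slot(seed, k, size) for k in keys}
        if len(slots) == len(keys):   # collision-free: a (minimal) perfect hash
            return seed, size
    raise ValueError("no collision-free seed found; enlarge the table or max_seed")

keys = ["alpha", "beta", "gamma", "delta"]
seed, size = build_perfect_hash(keys)
table = [None] * size
for k in keys:
    table[_slot(seed, k, size)] = k          # each key gets its own slot

print(seed, [table[_slot(seed, k, size)] for k in keys])
```

Note that adding a key means re-running the seed search over the whole set, which is exactly the recalculation cost mentioned above; splitting the key space into smaller per-subtree tables keeps that cost local.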