Pieces of information can be pulled in from all over the internet: so-called “Deep web” sources (sounds sexy, right? It actually just refers to information held in databases that search engines don't index, and it often gets mislabeled the “Dark web”), public records, search-engine-indexed information, metadata embedded in posted documents (photos, PDFs, various graphics formats, etc.), online newsgroups, and social media sites, to name a few.
Using these pieces of information to uncover locations, associations, activities, behaviors and motives is often possible (and, in fact, is done every day in active investigative work), though not in every case. As you may imagine, it is easy for the thread to break and for a logical disconnect to occur. The trick is to combine inductive and deductive reasoning with the real information you find, then develop theories about what other pieces of information might be available, and test those theories.
At a certain point any investigation, electronic or otherwise, will likely require “boots on the ground” to verify assumptions.
For your reading pleasure I’ve provided a link to a popular story from 2006 about AOL's accidental release of “anonymous” search query logs, and the subsequent work done by NY Times reporters who used the aggregated search queries to strip away a user's anonymity.
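To make the aggregation idea a bit more concrete, here's a minimal sketch in Python. Everything in it is an assumption for illustration: the sample log, the user numbers, and the helper names `aggregate_queries` and `count_terms` are made up, and none of it is taken from the actual AOL data or from how the reporters really worked.

```python
from collections import defaultdict

# Made-up sample log: (anonymous user number, search query).
# These records are invented for illustration only.
SAMPLE_LOG = [
    (1001, "landscapers in springfield"),
    (1001, "homes for sale springfield"),
    (2002, "cheap flights"),
    (1001, "smith family reunion"),
]


def aggregate_queries(log):
    """Group every query under the anonymous ID that issued it."""
    profiles = defaultdict(list)
    for user_id, query in log:
        profiles[user_id].append(query)
    return dict(profiles)


def count_terms(queries, terms):
    """Count how many of the queries mention each term of interest."""
    return {t: sum(t in q.lower() for q in queries) for t in terms}


profiles = aggregate_queries(SAMPLE_LOG)

# Hypothetical clues an investigator might look for: a place and a surname.
hits = count_terms(profiles[1001], ["springfield", "smith"])
print(hits)  # {'springfield': 2, 'smith': 1}
```

No single query here identifies anyone. It's the pile of queries sitting under one ID that starts to suggest a town and a family name, and even then all you have is a theory that still needs to be tested against other sources.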
Wikipedia entry on the same incident: