Open Source Intelligence (OSI or OSINT) is intelligence collected from publicly available sources. Much of the information that users feed to the internet, that advertisers collect, or that is otherwise left behind during a person’s interaction with electronic systems (or with retailers and advertisers that store such information electronically and then resell it) can be identified through “deep” or “dark” web research. OSI is an important enough research methodology that many law enforcement agencies, especially federal ones, have dedicated resources to OSINT gathering and analysis.
In civil litigation OSI is an invaluable resource for:
- Research of retained and opposing experts
- Information regarding opposing attorneys
- Witness and litigant information
- Uncovering undisclosed email addresses, social site accounts, properties, activities, and repositories of information
Consider a recent case in which I was involved: the opposing party had disclosed certain online accounts that contained relevant information regarding their corporate history, communications via web mail, and travel. An OSI search revealed two alternate web mail addresses, as well as a connection with a competing firm, previously undisclosed travel information, and some “known associates” who had information relevant to the case. Metadata analysis of documents and photos found on the newly discovered sites yielded even more information. None of this information was contained on the hard drive submitted for inspection.
On the web, OSI breaks down into two main categories: direct indexed information, and dark web (or deep web) information.
Direct indexed information is the category most familiar to practically anyone who uses the web. It is information that has been picked up and indexed by a search engine and, with the correct search techniques, can be narrowed down to particular people, places, and things. Indexed information typically ends up on the web through three different paths:
Deliberate – Deliberate information is on the web because an entity interacted directly with a web resource. It includes information made publicly available through social sites, website registration, or signing on to public newsgroups and forums.
Accidental (Through fault of the information Owner) – Often information is deliberately provided, but the provider did not realize that it would be publicly searchable. Facebook is a perfect example: by not understanding ALL the privacy implications of use, users (or their friends) often provide far more details, photos, or location information than is intended, desirable, or realized.
Accidental (Through fault of the information Custodian) – Very large data breaches are far too common these days. In reality they have been common for years, but only recently has attention turned to the size and frequency of breaches. Aside from breaches, however, “information leakage” is not at all uncommon. Information leakage occurs when a website or internet resource unintentionally provides more information than the user, or the owner, realizes. There are teams of people, advertisers, and intelligence gathering entities that look for information leaks and harvest the results.
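The “correct search techniques” mentioned above for direct indexed information usually come down to combining search operators. The following is a minimal sketch, assuming the commonly supported operators (quoted exact phrases, `site:`, `filetype:`, and a leading `-` to exclude a term); exact behavior varies between search engines. The function name `build_query` and the sample name and domain are hypothetical and used only for illustration.

```python
def build_query(phrase, site=None, filetype=None, exclude=()):
    """Combine an exact phrase with common search-engine operators."""
    parts = [f'"{phrase}"']                       # quotes request an exact-phrase match
    if site:
        parts.append(f"site:{site}")              # restrict results to one domain
    if filetype:
        parts.append(f"filetype:{filetype}")      # restrict results to one document type
    parts.extend(f"-{term}" for term in exclude)  # drop results containing a term
    return " ".join(parts)


# Hypothetical subject and domain, for illustration only.
query = build_query(
    "Jane Q. Example",
    site="example.com",
    filetype="pdf",
    exclude=["obituary"],
)
print(query)  # "Jane Q. Example" site:example.com filetype:pdf -obituary
```

Narrowing by domain and document type in this way is what turns a flood of indexed results into a list short enough to review by hand.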
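Dark (or Deep) Web information sounds very “techie” and mysterious, but in reality the term simply describes the large portions of the web that contain information not indexed by search engines. (Strictly speaking, “deep web” is the usual term for this unindexed content, while “dark web” more often refers to sites reachable only through anonymizing networks such as Tor; the two terms are used interchangeably here.) Typically these are databases accessible from a website, such as registration information, attendance and membership databases, and information of that nature.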
The challenge with OSI is to compile information from both direct indexed resources and dark web resources, and then correlate and narrow the information so that it is accurate to the particular entity being researched. A thorough manual search can be performed using the “cheat sheets” provided with this book. The difficulty is that aggregation, correlation, and verification can take many hours. There are tools available to an attorney that speed up the process. LexisNexis offers access to a static database through the Accurint tool (http://www.Accurint.com), and Westlaw (http://www.Westlaw.com) also provides static database information. There are any number of smaller sites that offer various degrees of information through static databases.
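To make “correlate and narrow” concrete, here is a minimal sketch of one possible approach: group hits from different sources by a normalized identifier, then keep only identifiers reported by more than one source. It does not describe how any particular commercial tool works. The record layout, the source names, and the helper names (`normalize`, `correlate`, `corroborated`) are all assumptions, and every value in the sample data is invented.

```python
from collections import defaultdict

# Hypothetical records, one per source hit; every value here is invented.
records = [
    {"source": "social_site", "name": "Jane Example", "email": "JExample@mail.example"},
    {"source": "forum", "name": "J. Example", "email": "jexample@mail.example "},
    {"source": "static_db", "name": "Jane Example", "email": "jane.other@mail.example"},
]


def normalize(email):
    """Lowercase and trim so trivially different spellings compare equal."""
    return email.strip().lower()


def correlate(records):
    """Group records by normalized email and list the sources behind each one."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalize(rec["email"])].append(rec["source"])
    return dict(groups)


def corroborated(groups, min_sources=2):
    """Keep only identifiers reported by at least min_sources distinct sources."""
    return {key: srcs for key, srcs in groups.items() if len(set(srcs)) >= min_sources}


groups = correlate(records)
print(corroborated(groups))  # {'jexample@mail.example': ['social_site', 'forum']}
```

Agreement between two sources only raises confidence that the records describe the same entity. It is not a substitute for verifying the information before acting on it.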
Static information can quickly become inaccurate or stale, and there are tools that fill the niche for automated research. Vidoc Razor maintains such a tool (if you are an attorney, you can request a login at http://www.vidocrazor.com/RSInfo.php) that actively mines “live” social information, media and publications, and photos, as well as location, known relations, and associate information. The information is then aggregated and correlated, and a baseline validity check is performed. The information is available for filtering and refining from a single point, and custom reports can be generated.
Whether using manual techniques, static databases, or automated approaches, the nature of OSI is important to keep firmly in mind: it is fluid. The information “lives” and changes as people live and change. It is also contradictory; some OSI is incredibly volatile and can “evaporate” without warning, while other OSI is incredibly persistent, and will stay available through harvesting techniques even when the information owner is actively trying to remove it. Any information derived from any of the harvesting techniques discussed must be verified before action is taken on it.
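Because OSI can change or “evaporate” between the time it is harvested and the time it is relied on, it can help to record when an item was collected along with a fingerprint of its content, so that a later re-check shows whether it is unchanged, changed, or gone. The following is a minimal sketch using only the Python standard library. The function names `snapshot` and `status`, the record layout, and the sample URL are assumptions for illustration. Retrieving the content is deliberately left out, and this is not a description of any formal evidence-preservation procedure.

```python
import hashlib
from datetime import datetime, timezone


def snapshot(url, content):
    """Record when an item was collected and a SHA-256 fingerprint of its bytes."""
    return {
        "url": url,
        "collected_at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(content).hexdigest(),
    }


def status(previous, current_content):
    """Compare a fresh retrieval (bytes, or None if unavailable) to a snapshot."""
    if current_content is None:
        return "gone"
    if hashlib.sha256(current_content).hexdigest() == previous["sha256"]:
        return "unchanged"
    return "changed"


# Hypothetical URL and content, for illustration only.
first = snapshot("https://example.com/profile", b"Jane Example, Anytown")
print(status(first, b"Jane Example, Anytown"))    # unchanged
print(status(first, b"Jane Example, Othertown"))  # changed
print(status(first, None))                        # gone
```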