Data Lakes are centralized repositories that allow organizations to store and secure their data in a raw format. It can include binary data, unstructured data, semi-structured data, and structured data from relational databases notwithstanding the scale or amount of data. Unlike a hierarchical data warehouse, the data isn’t stored in files or folders, but uses object storage and a flat architecture. Once stored, the data can be processed using analytics applications such as but not limited to reporting, visualization, artificial intelligence, and machine learning. A data lake can be established within an organization's data center or in the cloud.
Data lakes ensure that all data gets stored, unaltered, and can be accessed quickly. Any organization can, and should, utilize data lakes. Anyone in the organization can access the data to complete projects and nothing gets lost or missed. And by keeping it in a raw format, data scientists have more options as to how to evaluate the data and garner insights. As simple or as advanced as they want to go, it does not matter, they have the option since the data is untouched.