Cataloging the Drawbacks of Hadoop Data Analysis
Storage for big data often consists of scale-out NAS or object storage, and many look to commodity hardware as a cost-effective way of capturing petabytes of information. One of the most challenging big data problems is that big data storage systems must perform well enough to enable real-time analysis. Big data analytics often requires processes and people with specific skill sets, but there are software tools for analytics disciplines such as predictive analytics, data mining, text analytics and statistical analysis.
Because big data can scale to petabytes of capacity, organizations are looking for ways to manage it all that are easier and less expensive than traditional scale-out NAS. Object storage and software-defined storage are frequently mentioned as tools that can help remedy big data problems. Both can add the intelligence required for analyzing data while taking advantage of low-cost disk storage.
Data lakes can help manage those big data problems, but here is what you need to know before making the leap. Data lakes are strongly associated with Hadoop, and many organizations use the open source framework as a replacement for traditional data warehouses. Hadoop clusters are based on commodity hardware and can hold structured, unstructured and semi-structured data. This makes Hadoop a good choice for log files, Web clickstreams, sensor data, social media posts and other sources of big data, but there are drawbacks to keep in mind.
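The flexibility described above comes from the "schema-on-read" approach used by Hadoop-style data lakes: raw records are stored as-is, and a schema is applied only when the data is read, unlike a data warehouse, which enforces a schema on write. The following is a minimal, hypothetical sketch of that idea in plain Python (the log lines and field names are invented for illustration; a real deployment would read from HDFS with a tool such as Hive or Spark):

```python
import json

# Hypothetical raw clickstream lines as they might sit in a data lake.
# They are a mix of semi-structured JSON and unstructured text, stored as-is.
raw_lines = [
    '{"user": "alice", "page": "/home", "ts": 1}',
    '{"user": "bob", "page": "/cart", "ts": 2, "ref": "ad"}',  # extra field
    'raw text from an application log -- not valid JSON',
]

def read_with_schema(lines, fields=("user", "page", "ts")):
    """Apply a minimal schema at read time; skip lines that don't parse."""
    rows = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Unstructured lines stay in storage untouched; this particular
            # query simply ignores them rather than rejecting the whole load.
            continue
        # Project only the fields this query cares about; unknown extras
        # (like "ref") are dropped, missing fields would come back as None.
        rows.append({f: record.get(f) for f in fields})
    return rows

rows = read_with_schema(raw_lines)
```

Because the schema lives in the query rather than the storage layer, new data sources can be captured immediately and interpreted later, which is the core appeal of the data lake model and also the source of some of its governance drawbacks.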