Security and Analysis of Big Data
Emergency management situations generate an enormous amount of data, ranging from text reports (e.g. tweets, news articles, blog posts) to multimedia such as images (e.g. maps, forecasts, pictures), video (e.g. amateur footage, news reports), and audio clips (e.g. phone calls, radio communication). Existing records may also need to be accessed. The sheer volume of data is compounded by the need to answer analytical queries rapidly. In an emergency, operators often cannot afford to wait minutes or hours for an analysis, because its results would likely be outdated by the time it finished. High-performance computing can help address this problem, in particular GPGPU (general-purpose computing on graphics processing units), which parallelizes computation across the many cores of a GPU.
Programming for GPUs has become considerably more accessible. OpenCL is a cross-platform standard, so users are not restricted to a particular vendor's graphics cards; CUDA, by contrast, runs only on NVIDIA cards. OpenCL device code is written as kernels in OpenCL C, a language based on C99, and third-party Java wrappers can translate Java code into OpenCL C that is then executed on the GPU. OpenCL does not make GPU programming technically simpler, but it reduces the overall effort because the same code can be reused across hardware from different vendors.
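The core idea behind an OpenCL kernel is that the same small function runs once per element of an index space, with each instance (a "work-item") identifying its element by a global ID. The sketch below only emulates that execution model on the CPU in plain Python so it can run anywhere; it does not call OpenCL, and the names `scale_kernel` and `launch` are hypothetical.

```python
def scale_kernel(global_id, src, dst, factor):
    """Kernel body: one work-item scales exactly one element."""
    dst[global_id] = src[global_id] * factor


def launch(kernel, global_size, *args):
    """Emulate launching `global_size` work-items over a 1-D index space.

    A GPU would run these work-items concurrently on many cores. Here they
    run one after another, which gives the same result only because no
    work-item reads anything another work-item writes.
    """
    for global_id in range(global_size):
        kernel(global_id, *args)


readings = [1.0, 2.5, 4.0, 0.5]
scaled = [0.0] * len(readings)
launch(scale_kernel, len(readings), readings, scaled, 2.0)
print(scaled)  # [2.0, 5.0, 8.0, 1.0]
```

Because each work-item touches only its own element, the kernel body is independent of how many cores execute it, which is what lets the same code be reused across different hardware.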
For those who prefer working with CPUs but still want the benefits of parallelization, multi-core processor programming is an option. Writing multi-threaded applications, once a daunting task, has become considerably less difficult. For example, the Jibu library for .NET, Java, C++, and Delphi lets programmers parallelize their operations simply by calling its parallel methods or schedulers.
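Jibu's own API is not reproduced here. Instead, Python's standard `concurrent.futures` module is swapped in to illustrate the same pattern: split the work into chunks, hand the chunks to a pool of workers, and combine the partial results. The function names and the keyword-counting task are hypothetical examples.

```python
from concurrent.futures import ThreadPoolExecutor


def count_keyword(chunk, keyword):
    """Count reports in one chunk that mention `keyword` (case-insensitive)."""
    keyword = keyword.lower()
    return sum(1 for report in chunk if keyword in report.lower())


def parallel_keyword_count(reports, keyword, workers=4,
                           executor_cls=ThreadPoolExecutor):
    """Split `reports` into at most `workers` chunks, count each in a pool, then sum."""
    size = max(1, -(-len(reports) // workers))  # ceiling division
    chunks = [reports[i:i + size] for i in range(0, len(reports), size)]
    with executor_cls(max_workers=workers) as pool:
        partial_counts = pool.map(count_keyword, chunks, [keyword] * len(chunks))
        return sum(partial_counts)


reports = ["Flooding on Main St", "Road closed due to flood",
           "Power outage downtown", "FLOOD warning issued"]
print(parallel_keyword_count(reports, "flood", workers=2))  # 3
```

In CPython, threads mainly help with I/O-bound work; for CPU-bound work, passing `executor_cls=ProcessPoolExecutor` spreads the chunks across cores, provided the call sits under an `if __name__ == "__main__":` guard.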
Distributed programming models can speed up analysis further. For example, Google's MapReduce is a programming model for processing large data sets in parallel across a cluster, and Hadoop is a widely used open-source framework that implements it. Other products, such as Terracotta's BigMemory, speed up computations by keeping more data in memory, which reduces the number of disk accesses.
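The MapReduce model has three stages: a map function emits key/value pairs from each input record, a shuffle groups the values by key, and a reduce function combines each group into a result. The sketch below is a single-process emulation of that model in plain Python, counting words across text reports; it is not Hadoop's API, and the function names are hypothetical. In a real deployment the framework runs the map and reduce calls on many machines and performs the shuffle itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(report):
    """Map: emit a (word, 1) pair for every word in one report."""
    return [(word, 1) for word in report.lower().split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(reports):
    mapped = chain.from_iterable(map_phase(r) for r in reports)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


counts = map_reduce(["flood warning", "flood on main street"])
print(counts["flood"])  # 2
```

Since each map call depends only on its own record and each reduce call only on its own key, both stages can be distributed across a cluster without coordination beyond the shuffle.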
Security concerns arise when organizations rent space on large data grids and need to control which users have access to it. Sensitive information will very likely be stored off-site and must be protected accordingly. Questions about securing big data remain largely unanswered, and the responsibility for securing the data often falls on the people who upload it.
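One basic building block for controlling which users can access a shared dataset is an access control list kept by the data owner, with a default-deny rule. The sketch below is a minimal, hypothetical illustration of that idea; it does not describe any particular data grid provider's mechanism, and the class and user names are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class AccessControlList:
    """Hypothetical per-dataset ACL mapping a user ID to permitted actions."""
    grants: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, user, *actions):
        self.grants.setdefault(user, set()).update(actions)

    def revoke(self, user, action):
        self.grants.get(user, set()).discard(action)

    def is_allowed(self, user, action):
        # Default deny: anything not explicitly granted is refused.
        return action in self.grants.get(user, set())


acl = AccessControlList()
acl.grant("analyst-1", "read")
acl.grant("ops-lead", "read", "write")
print(acl.is_allowed("analyst-1", "write"))  # False
print(acl.is_allowed("ops-lead", "write"))   # True
print(acl.is_allowed("unknown", "read"))     # False
```

An ACL like this only helps if the storage provider actually enforces it; it does not by itself protect data from the provider, which is one reason the broader questions above remain open.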