Vietnam War Bombings Analysis
Cloud & Big Data project from Complutense University of Madrid

Description

This project was made to inform people about the Vietnam War for the following reasons:

Common Knowledge

Despite being a relatively recent event, the Vietnam War is not taught in most schools, which is why we consider it important for young people to learn about it.

History is cyclical

As the Spanish philosopher George Santayana wrote, “Those who cannot remember the past are condemned to repeat it.” That is what we are trying to prevent.

Easy to understand

We believe the clearest and easiest way to see the impact of the bombings is through charts and maps, so most of our results are presented that way.

Why do we use big data?

At the moment we have 3 datasets with a combined size of 1.5 GB. Since these datasets are incomplete, we hope to find more information in the future. Given that, and the fact that we have to run multiple operations over the data to draw conclusions and build charts, we considered the computational cost too high to handle without optimization, which is why we use big data tools.

What can we achieve with large-scale processing?

With large-scale processing we can compute more in less time, which allows us to take into account all the data we can find without discarding any information. This will be especially helpful if the dataset grows, which is likely given that it is incomplete.

Model Description

As mentioned before, all of our information comes from 3 datasets, which are publicly available on Kaggle. Their names are:

THOR_Vietnam_Bombing_Operations.csv

THOR_Vietnam_Aircraft_Glossary.csv

THOR_Vietnam_Weapons_Glossary.csv

We used the Spark SQL module for all the filtering and grouping operations, and to compute statistics such as the mean, standard deviation and variance.
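As a rough, non-authoritative illustration of this kind of filter, group and aggregate step, the sketch below computes, per aircraft, the mean, sample standard deviation and sample variance of the number of weapons delivered, using only the Python standard library instead of Spark. The column names AIRCRAFT_ROOT and NUMWEAPONSDELIVERED are assumptions about the bombing operations file, the function name is hypothetical, and the input rows are toy values rather than real data.

```python
import csv
import io
import statistics
from collections import defaultdict


def weapons_stats_by_aircraft(rows, group_col="AIRCRAFT_ROOT",
                              value_col="NUMWEAPONSDELIVERED"):
    """Return {aircraft: (mean, sample stdev, sample variance)} of value_col.

    The default column names are assumptions, not taken from the project code.
    """
    groups = defaultdict(list)
    for row in rows:
        raw = (row.get(value_col) or "").strip()
        try:
            value = float(raw)
        except ValueError:
            continue  # filter step: drop rows with a missing or malformed value
        groups[row.get(group_col) or "UNKNOWN"].append(value)

    result = {}
    for aircraft, values in groups.items():
        mean = statistics.mean(values)
        # Sample statistics need at least two observations; use None otherwise.
        if len(values) > 1:
            stdev, var = statistics.stdev(values), statistics.variance(values)
        else:
            stdev, var = None, None
        result[aircraft] = (mean, stdev, var)
    return result


# Toy rows for illustration only (not real THOR data).
sample = io.StringIO(
    "AIRCRAFT_ROOT,NUMWEAPONSDELIVERED\n"
    "F-4,4\n"
    "F-4,8\n"
    "B-52,100\n"
    "F-4,\n"
)
stats = weapons_stats_by_aircraft(csv.DictReader(sample))
```

In Spark SQL the same computation would roughly be a groupBy on the aircraft column followed by agg with the mean, stddev and variance functions from pyspark.sql.functions; Spark's stddev and variance are also the sample versions, and the work is distributed across executors instead of running in a single Python loop.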

We had to filter out many rows so that some of the frameworks (such as Plotly, which we use for maps) could work with a smaller amount of data. We hope to find a way to take all the data into account, perhaps by using another mapping library optimized for large volumes of data.
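One common way to keep a map library responsive is to drop rows without usable target coordinates and then cap the number of plotted points with a reproducible random sample. The sketch below shows that idea in plain Python; the coordinate column names TGTLATDD_DDD_WGS84 and TGTLONDDD_DDD_WGS84, the default point limit and the function name are assumptions for illustration, not the project's actual filter.

```python
import random


def points_for_map(rows, max_points=50_000, seed=42,
                   lat_col="TGTLATDD_DDD_WGS84", lon_col="TGTLONDDD_DDD_WGS84"):
    """Return at most max_points (lat, lon) pairs for a scatter map.

    Column names and the default limit are assumptions for illustration.
    """
    points = []
    for row in rows:
        try:
            lat = float(row[lat_col])
            lon = float(row[lon_col])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or malformed coordinates
        # Keep only geographically valid coordinates (NaN fails both checks).
        if -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0:
            points.append((lat, lon))
    if len(points) <= max_points:
        return points
    # A fixed seed makes the down-sampled map identical between runs.
    return random.Random(seed).sample(points, max_points)
```

With Spark, the coordinate filter would normally run on the DataFrame before collecting anything to the driver, and DataFrame.sample could replace the Python-side sampling, so only the reduced set of points ever reaches the plotting library.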

MODEL OF THE SOLUTION

Where to run it: It can be run either locally or on a cluster, but the following steps are required in both cases.

Dependencies: Run the installation script called “install.sh” (Spark and Python 3 are not included and must be installed separately).

How to use it: Once all the dependencies are installed, run the script “run.sh”. It will show a menu from which an option should be chosen.
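The menu itself is not reproduced on this page. As a hypothetical sketch of how such a menu can map option numbers to the analyses listed under Results and reject invalid choices, consider the Python below; the OPTIONS dictionary and the function names are illustrative and are not taken from run.sh.

```python
# Hypothetical mapping from menu option to analysis (titles follow the Results list).
OPTIONS = {
    1: "Bombings (monthly totals)",
    2: "Missions (monthly totals)",
    3: "Bombings (totals by country)",
    4: "Missions (totals by country)",
    5: "Most affected countries",
    6: "Types of missions",
    7: "Bombing locations by date",
    8: "Most used aircraft",
    9: "Mission type per aircraft",
    10: "Bombing locations",
    11: "Bombings per aircraft",
    12: "Most common take-off locations",
}


def print_menu() -> None:
    """Print the numbered list of available analyses."""
    for number, title in sorted(OPTIONS.items()):
        print(f"{number:>2}. {title}")


def choose_option(raw: str) -> int:
    """Parse a menu choice; raise ValueError if it is not a known option."""
    option = int(raw.strip())  # non-numeric input also raises ValueError
    if option not in OPTIONS:
        raise ValueError(f"Unknown option {option}; choose 1-{len(OPTIONS)}")
    return option
```

Validating the choice before launching a job avoids starting a Spark session only to fail on a typo.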

Link to the repository!

TOOLS AND INFRASTRUCTURE

Results

Below are the charts produced by each menu option.

Option 1: Vietnam War Bombings (Monthly Totals)
Option 2: Vietnam War Missions (Monthly Totals)
Option 3: Vietnam War Bombings (Totals by Country)
Option 4: Vietnam War Missions (Totals by Country)
Option 5: Vietnam War Most Affected Countries
Option 6: Vietnam War Types of Missions
Option 7: Vietnam War Bombing Locations by Date
Option 8: Vietnam War Most Used Aircraft
Option 9: Vietnam War Mission Type per Aircraft
Option 10: Vietnam War Bombing Locations
Option 11: Vietnam War Bombings per Aircraft
Option 12: Vietnam War Most Common Take-off Locations
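Options 1 and 2 aggregate the data by month. A minimal sketch of that aggregation in plain Python follows, under several loudly stated assumptions: each row of the bombing operations file is treated as one bombing record, the date column is MSNDATE in YYYY-MM-DD format, and missions are identified by a MISSIONID column. The column names, date format and function name are assumptions, not the project's code.

```python
from collections import Counter, defaultdict


def monthly_totals(rows, date_col="MSNDATE", mission_col="MISSIONID"):
    """Return {"YYYY-MM": (row_count, distinct_mission_count)} ordered by month.

    Assumes ISO-formatted dates; column names are assumptions for illustration.
    """
    row_counts = Counter()
    missions = defaultdict(set)
    for row in rows:
        date = (row.get(date_col) or "").strip()
        if len(date) < 7:
            continue  # skip rows without a usable date
        month = date[:7]  # "1968-03-15" -> "1968-03"
        row_counts[month] += 1
        mission = (row.get(mission_col) or "").strip()
        if mission:
            missions[month].add(mission)
    # Sorting the "YYYY-MM" keys as strings also sorts them chronologically.
    return {m: (row_counts[m], len(missions[m])) for m in sorted(row_counts)}
```

In Spark SQL the equivalent would roughly be a groupBy on the month derived from the date column, followed by a count for rows and countDistinct for missions; counting distinct missions matters because one mission can produce several rows.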


PERFORMANCE

Performance Evaluation
Speed-Up
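The speed-up chart itself is not reproduced here. Assuming the usual definition, the speed-up obtained with n workers (cores or cluster nodes) is the execution time with one worker divided by the execution time with n workers, and the parallel efficiency divides that ratio by n:

```latex
S(n) = \frac{T(1)}{T(n)}, \qquad E(n) = \frac{S(n)}{n}
```

An ideal (linear) speed-up gives S(n) = n and E(n) = 1; in practice, overheads such as data shuffling and task scheduling usually keep S(n) below n.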

About us

Raquel Pérez González de Ossuna

Robert Farzan Rodríguez

Miguel Robledo Casal

This project was made in 2020 for the Cloud and Big Data course.