Privacy and security of Big data
Introduction
Big data refers to data sets so large that they are difficult to maintain. Storing, analyzing, transferring, and updating big data are major challenges. Big data has been defined by five Vs: Volume, Velocity, Variety, Veracity, and Value (Joshi & Kadhiwala, 2017). The amount of data in these sets is large (Volume), the rate at which they are generated is fast (Velocity), the information is so diverse that it can hardly be categorized (Variety), the trustworthiness of the information is questionable (Veracity), and the costs and benefits of its use are yet to be explored more deeply (Value) (Broeders, 2017). The use of big data has increased drastically in organizations seeking to make better decisions and attain business goals. As an organization grows, its data collection also grows daily, and mammoth datasets are created. However, the maintenance and security of big data are a major challenge. Big Data Security Analytics (BDSA) was introduced in response. It detects vulnerabilities, misuse, fraudulent activities, and intrusions on big data (Patgiri & Majhi, 2018). It also performs real-time monitoring so that data breaches can be avoided, predicts potential security attacks, and develops prevention techniques to enhance data security. BDSA focuses on network security to prevent attacks by malicious users, who can enter the system through any network point. This paper presents various issues related to big data security. Further, governments have developed laws to enhance the security and privacy of big data. The paper addresses the following research questions:
What are the challenges of maintaining big data?
Why are privacy and security a major challenge for big data users?
What methodologies are used to increase the security of big data in health care, social media, and other domains?
What recommendations can be provided to enhance security features?
Literature Review
A large amount of data is stored in the cloud or on hard drives and can be accessed at the same time, which increases the efficiency and integrity of data. Moreover, integrated big data can be collected and processed simultaneously to support effective business decisions, saving companies considerable time and cost. These are some of the advantages of big data.
Even though big data is highly beneficial for organizations, the security and privacy of these large datasets pose a challenge. It is reported that more than 90% of big data is unstructured, which makes maintaining a proper standard of security more challenging for IT professionals (Patgiri & Majhi, 2018). The data warehouses of large organizations such as the NSA and Google contain more than an exabyte (1,024 petabytes) of data, and these organizations confront many challenges in maintaining the privacy of that data.
Challenges
Security issues involved with big data are described below:
Unstructured data storage and related security features
A large number of users and granular access control (Chandra, Ray & Goswami, 2017)
A huge number of transaction logs and distributed databases
Real-time threat monitoring
Scalable data analytics and mining techniques
Filtering or validating inputs
Granular audits regarding diverse security alerts (cyber attacks or malicious activities)
Usually, an organization distributes big data across several databases to enhance security, because if a single centralized server is used, hackers can more easily destroy the data. To maintain such massive datasets, most organizations use the BDSA method (Joshi & Kadhiwala, 2017).
Big Data Security Analytics
In this technological era of computer systems, most data is stored on centralized servers, through which employees share or access data as required. This is a highly risky arrangement. BDSA therefore detects various structural anomalies, including unusual access patterns in web-based transactions, configuration issues in network servers, anomalous user access or use of credentials, and issues with network sources (Lafuente, 2015). BDSA also examines the reliability or vulnerability of the resources from which big data is collected. Distributed database systems such as ElasticSearch, Apache, MongoDB, and Hadoop can be used to store the analysis results. Further, online, offline, and forensic analysis can be performed to identify security challenges.
BDSA uses the following methods to assess the security challenges on big data:
- a) Predictive Security Analytics
It examines data access patterns and predicts threats that may arise through specific systems or devices. However, it requires a large amount of space to store the examined datasets. Machine learning is an example of such a process (Patgiri & Majhi, 2018).
- b) Security Decision Analytics
The methodology helps a security professional to develop a systematic framework to deal with security issues.
- c) User Behavioural Analytics
By monitoring user activities, information is acquired and assessed. It discovers the threat patterns and risks associated with user actions. Thus, potential security features can be developed to prevent attacks.
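The idea behind user behavioural analytics can be illustrated with a minimal sketch. It assumes that the only monitored signal is a per-user daily access count and that an outlier is any user whose count lies far from the group median; the function name, the sample data, and the threshold of 3.5 are illustrative assumptions, not part of any BDSA product described in the cited literature.

```python
from statistics import median

def flag_anomalous_users(access_counts, threshold=3.5):
    """Return users whose access count is an outlier by the modified z-score.

    The modified z-score uses the median and the median absolute deviation
    (MAD), which are less distorted by a single extreme value than the mean.
    The threshold of 3.5 is an assumed, commonly used cut-off.
    """
    counts = list(access_counts.values())
    if not counts:
        return []
    med = median(counts)
    mad = median([abs(c - med) for c in counts])
    if mad == 0:
        # All users behave almost identically; nothing can be ranked.
        return []
    flagged = []
    for user, count in access_counts.items():
        score = 0.6745 * abs(count - med) / mad
        if score > threshold:
            flagged.append(user)
    return sorted(flagged)

# Hypothetical daily access counts; "mallory" accesses far more records.
daily_accesses = {
    "alice": 42, "bob": 38, "carol": 45,
    "dave": 40, "erin": 41, "mallory": 400,
}
print(flag_anomalous_users(daily_accesses))
```

A production system would combine many more signals (time of day, location, resources touched), but the principle of comparing each user against a baseline stays the same.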
However, BDSA has some disadvantages, which are as follows:
Synchronization is required to integrate distributed databases in the cloud or on different servers
Many experts are needed to monitor and operate the BDSA techniques
All professionals working with BDSA algorithms must have adequate training in and knowledge of the technology
Maintaining the quality of data and storage is a major issue
Identifying the best procedures for data analytics is a major challenge in this rapidly changing world of data
SRA (Security Reference Architecture) for big data
It is another model used for big data security, developed by the National Institute of Standards and Technology (NIST). It consists of a few key parts, which are mentioned below:
Components of the architecture and their functional procedures:
System Orchestrator (SO): It addresses the security requirements and identifies the abstract pattern of threats or intrusions so that attacks can be controlled or avoided.
Data Provider: It examines the trustworthiness of metadata using specific certificates or policies so that proper filtering can be done.
Big Data Application Provider: It executes various functions as per the instructions of the SO and develops certain security activities (Lafuente, 2015).
Big Data Framework Provider: It is a set of clusters, which controls the entire framework.
Data Consumer: It deals with data drilling and report creation, and manages authentication of end-users.
Thus, various frameworks and methodologies are developed to deal with the security issues of big data.
Security and privacy issue in the healthcare domain
The healthcare industry stores a large amount of data (big data) in database systems for medication and treatment purposes. The data is collected from various sources such as patients’ medical reports, insurance summaries, pharmaceutical case studies, and reliable medical journals or articles (Olaronke & Oluwaseun, 2016). Healthcare authorities can use cloud as well as physical storage for big data. The stored data is huge, so maintaining its integrity and privacy is a complex task. Similarly, analyzing large data sets is a time-consuming process consisting of complex procedures. This section discusses the benefits and security issues related to big data in health care. Further, solutions drawn from various articles and journals are compared.
Challenges
Most reports and journals identify some common challenges that affect the security or privacy of big data. These are listed as follows:
Confidentiality
Nikunj Joshi and Bintu Kadhiwala (2017) stated that confidentiality is a major legal challenge for healthcare professionals. If proper cryptographic encryption is not applied, hackers can steal or leak vital data from the system.
According to Chandra, Ray & Goswami (2017), doctors, pharmaceutical professionals, and various other medical executives examine healthcare data to identify better treatment policies. However, they access data from different endpoints, where proper firewalls or other security measures might not be available. Malware attacks at these access terminals can also corrupt healthcare data. Such activities compromise confidentiality, as does a lack of proper access control and authorization policies.
In 2016, Olaronke & Oluwaseun stated that confidentiality, or data privacy, is a major ethical challenge in the healthcare domain. If a proper security policy is not followed and confidentiality is breached, the governance and ownership of patients’ data can be severely affected (Olaronke & Oluwaseun, 2016). Organizations can lose valuable data about customers or employees. A large amount of data can be corrupted or destroyed, which might affect an organization financially. Moreover, confidential customer data or business policies could be stolen or exposed, which might harm the business.
Integrity
As Rao, Suma & Sunitha (2015) observe, the data ranges from terabytes to exabytes and forms humongous datasets, so maintaining integrity is a tough task for security professionals. Session attacks, salami attacks, and data diddling attacks are some common attacks that affect data integrity (Rao, Suma & Sunitha, 2015). Minor hardware or software errors can corrupt the whole data source if backups are not created. DLP (data loss prevention), data provenance, and data trustworthiness techniques can be used to maintain the integrity of big data.
A session attack is also known as cookie hijacking. Hackers steal or replicate authentic session cookies to enter the system
A salami attack, or salami slicing, is a series of minor data manipulations, each too small to be noticed, that together cause significant damage. Such attacks are mostly undetectable
A data diddling attack illegally alters data, typically before or during its entry into a system
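One common way to detect the kind of unauthorized alteration described above is to store a keyed hash (HMAC) alongside each record and recompute it on every read. The sketch below is only an illustration of that general technique, not the method of the cited authors; the key, the record contents, and the function names are assumptions made for the example.

```python
import hashlib
import hmac

def sign_record(key: bytes, record: bytes) -> str:
    """Compute an HMAC-SHA256 tag for a record using a secret key."""
    return hmac.new(key, record, hashlib.sha256).hexdigest()

def verify_record(key: bytes, record: bytes, tag: str) -> bool:
    """Return True only if the record still matches its stored tag."""
    expected = sign_record(key, record)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, tag)

# Hypothetical key and record, for illustration only.
secret_key = b"example-integrity-key"
original = b"patient=1001;dose=5mg"
stored_tag = sign_record(secret_key, original)

tampered = b"patient=1001;dose=50mg"
print(verify_record(secret_key, original, stored_tag))  # unchanged record
print(verify_record(secret_key, tampered, stored_tag))  # altered record
```

The tag only proves that a record was not changed by someone without the key; keeping the key itself safe is the key-management problem discussed later in this section.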
According to Joshi & Kadhiwala (2017), maintaining the integrity of big data is a major challenge for organizations because the data is stored in distributed cloud databases and on hard drives. Sometimes associated data is kept on disconnected drives, which can affect data integrity (Joshi & Kadhiwala, 2017).
According to Chandra, Ray & Goswami (2017), linking distributed servers and cloud databases can be a big issue in the healthcare system. The intermediary network can also be hijacked through a session attack or other attacks, which compromises the quality of the data and, in turn, affects data processing and analysis. For example, a healthcare provider might not hold a patient’s data locally, so local functions must request it over the network (Chandra, Ray & Goswami, 2017). If a network, hardware, or software error occurs in such circumstances, the transmission is interrupted. Integrity thereby becomes a great challenge.
Availability
Nikunj Joshi and Bintu Kadhiwala stated in 2017 that data availability is a problem because data are stored in different locations to enhance security. Moreover, backup or replica data are also stored on cloud servers or separate hard drives. If the respective servers have network, hardware, or software issues, data availability can be affected.
As stated by Fabian, Ermakova & Junghanns (2015), availability of data is a big issue because data are transmitted and processed through a distributed network. The amount of data is so large that network disturbances can affect availability: low bandwidth and high latency degrade the quality of data transmission, so data might not be received completely at the client end.
In 2016, Olaronke & Oluwaseun also stated that availability is a challenge in the healthcare system.
Key management
Many users are involved with the healthcare system, and therefore key sharing and key management pose a major security risk. However, an encrypted medium can be used to share the keys (Joshi & Kadhiwala, 2017).
Use of IoT or wireless devices and improper implementation of data breach laws
These are some significant threats in the healthcare domain (Patil & Seshadri, 2014). The IoT devices used by healthcare users hold secret passwords or keys, which are used for authorized access. If such a device is hacked or falls into the wrong hands, vital data and personal information will be exposed. Moreover, proper laws and mechanisms are not available to deal with such cybercrimes. These factors are also major challenges in big data security.
Solution based on various articles
Various solutions have been suggested to enhance security, which are listed below.
Big Data Security Analytics
It mainly focuses on the authenticity of the resources from which data is collected in the healthcare system. It also involves proactive monitoring, which helps to identify abnormal activities within the network systems. The approach predicts potential threats and the areas where they may occur (Blobel, Lopez & Gonzalez, 2016). It can be developed through machine learning or a systematic framework, and pattern recognition can also be applied to develop the BDSA policy. It was tested on healthcare cloud storage to determine threats. It also designs certain security models for enhancing security. Further, user activities are monitored to recognize threat patterns and possible threats associated with those actions. Thus, it improves the security policies of the access mechanism and increases data security in the healthcare network.
Gao et al. proposed a “Haddle framework” for automated data analysis in the Hadoop database. The framework is built from a data analyzer and a data collector, and it combines automated data collection techniques with analytic methods to recognize security issues. It was tested on a Hadoop database hosted in the cloud, where issues in the Hadoop data logs were monitored to evaluate the efficiency of the framework.
Authentication scheme
This technique is used to establish a secure channel or session through which users can access data while confidentiality and integrity are maintained. The scheme is divided into four key phases: a setup phase, a registration phase, a pre-deployment phase, and an authentication phase. A user (patient or medical officer) who intends to access big data in a healthcare system registers his or her unique identity with specific credentials (username, DOB, password, address, and other optional fields). An encryption algorithm is then applied, which creates a unique authorized ID in the healthcare system. However, the authenticated access is valid only for a certain period. The user can change the password before it expires, and even after that period the user can change the password by answering specific security questions. This enhances security considerably. Moreover, denial-of-service protection and encryption are also applied, the latter to prevent man-in-the-middle attacks.
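The registration and authentication phases with a limited validity period can be sketched as follows. This is a simplified illustration under stated assumptions, not the scheme from the cited work: passwords are stored as salted PBKDF2 hashes, the 90-day validity period is an assumed value, and the in-memory `users` dictionary stands in for whatever credential store a real healthcare system would use.

```python
import hashlib
import hmac
import secrets
import time

VALIDITY_SECONDS = 90 * 24 * 3600  # assumed validity period of 90 days
PBKDF2_ROUNDS = 100_000            # assumed work factor

def register(users, username, password, now=None):
    """Registration phase: store a salted password hash and an expiry time."""
    now = time.time() if now is None else now
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ROUNDS
    )
    users[username] = {
        "salt": salt,
        "digest": digest,
        "expires_at": now + VALIDITY_SECONDS,
    }

def authenticate(users, username, password, now=None):
    """Authentication phase: reject unknown users, expired or wrong passwords."""
    now = time.time() if now is None else now
    entry = users.get(username)
    if entry is None or now > entry["expires_at"]:
        return False
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), entry["salt"], PBKDF2_ROUNDS
    )
    return hmac.compare_digest(candidate, entry["digest"])

# Hypothetical usage with a fixed clock so the expiry is easy to follow.
users = {}
register(users, "patient01", "correct horse", now=0)
print(authenticate(users, "patient01", "correct horse", now=10))
print(authenticate(users, "patient01", "correct horse", now=VALIDITY_SECONDS + 1))
```

Calling `register` again for the same username models a password change, which also resets the expiry time.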
Han et al. have suggested a cryptosystem for increasing confidentiality in cloud architecture. A cryptosystem usually operates with three distinct algorithms for key generation, encryption, and decryption. The generated key is used to encrypt data at the server end, which is then decrypted at the client end. Thus, it increases the protection of big data.
Khan et al. suggested the dynamic construction of security keys to enhance security features. A combination of random keys, biometric attributes, and encryption techniques can be used as access control or authorization policies. Dynamically generated keys cannot be predicted in advance, and therefore hackers cannot produce a valid key to penetrate the system. Thus, the possibility of attack is much reduced, and the efficacy of security is improved. Usually, symmetric keys are generated by dynamic construction.
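A minimal sketch of dynamic key construction is given below. It is an assumption-laden illustration rather than the construction proposed by Khan et al.: a fresh random nonce is mixed with a digest of a biometric template under a long-term secret using HMAC-SHA256, so every session yields a different symmetric key. All names and sample values are hypothetical.

```python
import hashlib
import hmac
import secrets

def derive_session_key(master_secret: bytes,
                       biometric_template: bytes,
                       nonce: bytes) -> bytes:
    """Derive a 32-byte symmetric key from a secret, a biometric, and a nonce."""
    # Only a digest of the biometric data enters the derivation.
    biometric_digest = hashlib.sha256(biometric_template).digest()
    return hmac.new(
        master_secret, biometric_digest + nonce, hashlib.sha256
    ).digest()

# Hypothetical inputs; a real template would come from a biometric sensor.
master = b"long-term-server-secret"
template = b"fingerprint-feature-vector"

nonce_a = secrets.token_bytes(16)  # fresh randomness for session A
nonce_b = secrets.token_bytes(16)  # fresh randomness for session B
key_a = derive_session_key(master, template, nonce_a)
key_b = derive_session_key(master, template, nonce_b)
print(len(key_a), key_a != key_b)
```

Because the nonce changes on every session, an attacker who observes one key learns nothing useful about the next, which is the property the paragraph above attributes to dynamic keys.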
Data centric approach
Data centric approaches do not focus on network, database, or server threats. Instead, the approach applies encryption and data security directly within healthcare applications. While users are accessing data, attackers cannot corrupt the real data. The approach also protects data while it is in transit and at rest. It grants specific privileges to authorized users, as only they can decrypt or unmask the data. Thus, it reduces the possibility of a data breach in the healthcare domain.
Xu et al. proposed a digital signature on big data for increasing data integrity. It is designed with a homomorphic signature and a storage integrity function. Dual encryption can also be applied for dynamic security (Xu & Shi, 2016). It was tested on health records in a dynamic environment.
In 2015, Puthal et al. suggested real-time security verification on streaming data. It is developed with dynamic keys, which are used to enhance integrity as well as end-to-end security. The framework was tested on IoT devices (Puthal et al., 2015). A healthcare application was installed on the IoT device and used a dynamic key to secure data. The authenticity of the data was evaluated using cryptographic computations and certain algorithms. The results indicated that applying dynamic security can enhance the security features of IoT components.
Comparison
All three approaches discussed in the paper are used to enhance data security. Moreover, all of these mechanisms identify access patterns and apply certain policies to secure data. These are the similarities between the approaches.
In contrast, the differences between the three approaches are listed below:
Methodology: Analysis
Big data security analytics (BDSA): Data sources and user activities are analyzed before certain security policies are implemented (Rao, Suma & Sunitha, 2015).
Authentication scheme: It does not involve any analysis.
Data centric approach: No analysis is performed.
Methodology: Process type or duration
Big data security analytics (BDSA): It is a continuous process, as data is analyzed on a regular basis to identify potential threats.
Authentication scheme: The security encryption is a one-time policy applied when passwords are created or updated.
Data centric approach: Specific encryption policies are applied to secure big data throughout its life cycle.
Methodology: Applications
Big data security analytics (BDSA): Sophisticated systems are required within the main server or database for BDSA.
Authentication scheme: The technique is applicable to small devices as well as large, complex databases. It can also be applied in the cloud (Srinivas, Das, Kumar & Rodrigues, 2018).
Data centric approach: The approach does not focus on any device or network system. Thus, it can be applied in any system or network.
Methodology: Algorithms used
Big data security analytics (BDSA): It uses a series of algorithms such as the Propagation algorithm, the PageRank algorithm, and the Coordinate Descent algorithm for examining security issues.
Authentication scheme: A few specific techniques such as CRA (challenge-response based authentication) and DPA (device pairing based authentication) are applied for authentication purposes.
Data centric approach: It also uses various protocols to increase the safety of data.
Results: Data encryption
Big data security analytics (BDSA): It implements specific access control and a dynamic analysis mechanism to prevent malicious code in the system.
Authentication scheme: It applies a specific encryption technique for authorization.
Data centric approach: It masks data and applies encryption so that only specific users can decrypt the data.
Results: Data filtering
Big data security analytics (BDSA): It develops validation policies and a filtering mechanism for data collection.
Authentication scheme: It does not filter data. However, based on specific privileges, it controls user activities.
Data centric approach: Data filtering or validation techniques are not applied.
Results: Future scope
Big data security analytics (BDSA): In the future, healthcare clouds can use the approach to strengthen data security.
Authentication scheme: All kinds of healthcare applications as well as cloud-based databases can use this approach.
Data centric approach: The approach can be used on any platform, including financial, healthcare, and other domains.
The BDSA approach is the strongest method among the discussed techniques. As it examines big data regularly, new threats can be recognized easily. Besides, when abnormal user actions are identified, the user can be restricted, and thus the security of the healthcare system can be enhanced. The privacy policies are evaluated against the latest malware, which helps to develop appropriate security policies. The BDSA approach can be very helpful in the future for secured access through healthcare applications. Compatible versions for IoT devices and cloud-based architecture should be developed so that users can access data securely.
Security and privacy challenges of big data in Social media
Facebook, Twitter, YouTube, Instagram, and Google Plus are some popular social media platforms. All these platforms handle large sets of data such as documents, videos, pictures, and graphics from various sources. As numerous users are involved in social networking, content from unverified sources is often shared (Zhang, Zhu, Sun, & Fang, 2010). Such content can carry infected data, which can corrupt the primary servers of these popular sites. Moreover, most users share posts blindly and thereby unknowingly bring malware or other viruses onto their systems. For example, a user may share a video from social media without checking its original source and authenticity. The video might carry linked worms or other viruses, and when it is played or accessed on a system, the malware can easily get past the firewall. As the user has in effect allowed the malware in, it can contaminate the user’s data at any moment. Such activities can have a catastrophic impact on the confidentiality and integrity of those users’ accounts. This section highlights the security issues in social media along with some specific solutions.
Challenges
Resource sharing
Dustdar et al. stated that users share many resources over the cloud to highlight social relationships and related data. Media such as videos or photos are tagged and shared with many friends over social media. Security professionals identified this pattern by examining the historical data of social sites as well as current trends (Dustdar, Tan, Blake, & Saleh, 2013). Most of the time, users share third-party sources that contain malicious components. They do not check the authenticity of those resources, which can create security issues. Intrusion code can enter a secured system along with the shared data, and data can thus be lost or infected.
Shozi and Mtsweni noted that data is often shared from third-party resources as well as genuine resources. Such sharing can expose the data to security attacks during communication. Most of the data are used for social interaction as well as for advertising, and whenever a user accesses such data, the corresponding system can be infected with malicious code (Shozi & Mtsweni, 2017).
Tufekci (2014) states that all the data used in social media are public data. Therefore, it can be accessed by any user from anywhere in the world unless a proper restriction or block is in place for that user. Such unauthorized access can severely damage private data (Tufekci, 2014).
User Awareness
Tufekci noted that most users are not aware of the consequences of their actions. They do not check the authenticity of the resources they share, and as a result user devices and social media servers can be corrupted (Tufekci, 2014).
Jerzy Surma explained that many users do not know about the privacy settings on social media sites. As a result, any unknown user can view or use the data stored in such a user’s profile. This is a major threat, as these data can be used for unethical purposes (Surma, 2013).
Smith et al. described that many users simply view posts out of curiosity. However, in the process, they also access the linked threats due to lack of awareness. This is a severe threat to big data on social media platforms, and users must be educated about it (Smith, Szongott, Henne, & Voigt, 2013).
Connected data streams and devices
Surma states that many devices and computer systems are connected through a distributed network. An infection in one system can therefore spread across the entire data network. Users must choose data wisely so that it does not contaminate their personal data streams (Surma, 2013).
Stergiou et al. state that many young people and others now use social media platforms for global communication. They spend a lot of time on the network and access various malicious data out of curiosity. When they attach such files and share data in the network, other connected devices are affected by the malware or other suspicious applications (Stergiou, Psannis, Xifilidis, Plageras, & Gupta, 2018). This is a huge threat for social media users across the world.
Bourahla & Challal expressed that connected clouds and devices are an emerging threat to the security of big data. Clustered networks increase privacy risk and the probability of theft, as malicious users can use cookies from social networking sites to breach the security of IoT devices and crack the IP address. Confidential data can be lost in the process (Bourahla & Challal, 2017). However, dynamic authentication policies and cross-domain analytics can reduce the risk of data attacks on connected devices.
Possible Solutions
Cross-domain analytics
The method allows social media sites and search engines to recognize authentic data by applying various security certificates and protocols. It performs a thorough analysis before data is published and shared through social media pages. Cross-domain analysis can help to eliminate corrupted third-party sources. In addition, linked malware and other viruses are scrubbed out of the social domain. Thus, various uncertified pages and advertisements are eliminated from social media (Chitransh, Mehrotra, & Singh, 2017).
Access control Policies
Access control consists of multiple encryption protocols and security algorithms. A proficient access control policy can prevent malicious users from hacking a system. Users can select certain policies through their social media accounts. The policy filters other users’ activities, and thus many suspicious users can be prevented from accessing the user’s secured data. Moreover, the social networking organization can manage security policies and symmetric encryption keys to restrict users (Dustdar, Tan, Blake, & Saleh, 2013).
Service frameworks
Service reputation is a special type of software applied to recognize users’ feedback or requirements. It contains multiple segments, such as evidential reasoning and a fuzzy engine, which are used to filter valid data. Moreover, the framework stores certain patterns or activity logs for future reference. Thus, the technique can filter reasonable data in a cost-effective way. The framework improves security features and stores them as informative data and executable data. Confidentiality and security are managed by developing certain social settings (Yuqing, 2017). However, it has some disadvantages: professionals need profound knowledge of each aspect of the framework, and the modules of the framework need to be properly integrated.
Figure: Service Reputation Framework
Source: (Yuqing, 2017)
Comparison
The three solutions compare as follows:
Methodology: Analysis
Cross-domain analytics (CDA): The method verifies the authenticity of the source using certain policies. Unauthorized third-party sources are usually eliminated from appended data (Lv et al., 2017).
Access control policies: Social media sites have a few built-in privacy policies, which restrict unauthorized access. Moreover, security settings and encryption protocols can also be applied.
Service Reputation Frameworks (SRF): It uses fuzzy control and evidential reasoning components to analyze and filter data.
Methodology: Process type
Cross-domain analytics (CDA): It is a regular process. Whenever a user appends data in the social media domain, CDA compares and validates that data.
Access control policies: Social sites have some predefined access control policies, which perform a continuous analysis. In addition, further policies can be added or modified at any stage.
Service Reputation Frameworks (SRF): Usually, SRF compares data and saves the results as historical data logs. Thus, it is not a continuous process and saves time.
Methodology: Applications
Cross-domain analytics (CDA): It is available in the cloud, and thus specific hardware or software is not required. It is a popular method on all kinds of IoT devices.
Access control policies: It is available in the cloud as well as in specific applications such as Twitter, Facebook, and Instagram (Lee, 2017).
Service Reputation Frameworks (SRF): The policy needs to be installed on a strong social media server with sophisticated hardware and software. However, it can be called from any IoT device with minimal network bandwidth and software or hardware specifications.
Methodology: Algorithms used
Cross-domain analytics (CDA): CDCF (cross-domain collaborative filtering) can be used to filter data. Besides, LWLR (locally weighted linear regression) can also be implemented in applications that deal with the user’s local data.
Access control policies: Attribute-based encryption is the most used algorithm for access control. However, two-factor authentication or biometric encryption is also applicable.
Service Reputation Frameworks (SRF): The most widely used reputation algorithms are modified TNA-SL and EigenTrust. These algorithms compare the data and validate security. Suspicious and corrupted data are eliminated from the server.
Results: Data encryption
Cross-domain analytics (CDA): It provides strong encryption for the entire data cycle (data at rest, in transit, and in process). A social media user cannot change his or her password without cross-domain verification.
Access control policies: P2P (peer-to-peer) encryption and BE (broadcast encryption) are applied to secure data.
Service Reputation Frameworks (SRF): It applies a dynamic encryption policy for big data. Each time data is requested or accessed, SRF compares historical data logs and applies an encryption technique accordingly (Manogaran, Thota, & Kumar, 2016).
Results: Future scope
Cross-domain analytics (CDA): Most social sites use it, and it certainly has a bright future.
Access control policies: It is a necessary method for social media networking. However, it must be updated every time new malware is encountered.
Service Reputation Frameworks (SRF): It is the strongest mechanism among the stated methods. It can revolutionize big data security.
The Service Reputation Framework is the most recommended method for strengthening the security of big data in the social networking domain. It uses various QoS parameters such as Availability, Response Time, Reliability, and Successability to validate authentic data. Based on the QoS score, it deletes corrupted media files and malware from the network or system. It saves a huge amount of time, as data comparisons are stored in log files. Encryption is applied as soon as the data is mined and analyzed. For cloud environments, this policy is very helpful and time-saving, and thus security professionals at social networking sites are enforcing this technology on all platforms. However, such technology cannot protect data alone. Users, as well as lawmakers, should take proactive steps to increase awareness among the public. People should be educated to distinguish between authentic data sources and infectious sources (Zhang, 2018). The number of security issues would thereby decrease radically.
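How a QoS score might drive the removal of untrustworthy sources can be sketched as follows. The cited framework's actual scoring formula is not given in this paper, so the equal weighting, the 2000 ms response-time ceiling, the 0.6 threshold, and all names below are assumptions chosen purely for illustration.

```python
def qos_score(availability, reliability, success_rate,
              response_time_ms, max_response_ms=2000.0):
    """Combine four QoS measurements into a single score between 0 and 1.

    availability, reliability and success_rate are fractions in [0, 1].
    Response time is converted so that faster responses score higher and
    anything slower than max_response_ms scores 0.
    """
    speed = max(0.0, 1.0 - response_time_ms / max_response_ms)
    return (availability + reliability + success_rate + speed) / 4.0

def filter_sources(sources, threshold=0.6):
    """Keep only the sources whose QoS score reaches the threshold."""
    return [
        name for name, metrics in sources.items()
        if qos_score(**metrics) >= threshold
    ]

# Hypothetical measurements for two media sources.
sources = {
    "verified_feed": dict(availability=0.99, reliability=0.95,
                          success_rate=0.97, response_time_ms=200),
    "unknown_host": dict(availability=0.40, reliability=0.30,
                         success_rate=0.20, response_time_ms=1800),
}
print(filter_sources(sources))
```

In a real deployment the measurements would come from the stored historical logs mentioned above, so scores can be reused instead of being recomputed for every request.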
Privacy and Security issue in the Finance domain
The finance industry has grown rapidly over the years. Most popular banks use a digital platform to perform financial transactions, as it saves time and makes tracking transactions convenient. However, the majority of users store their password credentials on their mobiles, computers, or other IoT devices, and these cookies (stored files) have increased security challenges, as they can be easily hacked by malicious users. Moreover, many customers are regularly added to the global database of the banking industry, and thus the volume of big data has increased rapidly. It is reported that financial big data might reach up to 44 zettabytes (around 44 trillion GB) by 2020. Managing the confidentiality and security of such mammoth data is very difficult for security professionals. Nowadays, people also use SMS banking to perform transactions (Win, Tianfield & Mair, 2017). Such messages can be easily hacked and used to steal a large amount of money from the bank. Besides, many applications used for financial transactions are installed from unauthorized sources. Such applications could contain malware, or they could be vulnerable to viruses, which affect the user data as well as the original database. This section presents various security challenges in the financial domain. Further, some resolutions are described to mitigate financial risk.
Challenges
Integrity and availability
Chen and Zhang suggest that the financial domain has enormous datasets, which are not easy to maintain. Salami attacks and session attacks are familiar attacks that compromise the integrity of financial data. As a result, a significant amount of data can be corrupted, which in turn can affect data availability. Besides, data are kept on segmented drives, which also affects availability (Chen & Zhang, 2014).
Kshetri states that large amounts of data are not stored on a single server, and thus reconciling data and examining security issues become difficult for IT professionals. Moreover, remote servers and network problems can create severe data-related issues (Kshetri, 2016).
Fang and Zhang mention that maintaining trillions of gigabytes is very difficult for network professionals across the world. Availability and integrity are a serious challenge, as most data are stored in distributed Hadoop clusters (Fang & Zhang, 2016). A commonly cited disadvantage of Hadoop is its slow processing speed. The framework is written in Java, which is platform independent, and this is said to make it easier for attackers to write portable code to breach data. Data are also not abstracted, which increases the chance of data loss.
Digital services
Ravi and Kamaruddin state that consumers use digital banking to carry out various financial transactions. They use IoT devices and applications, which operate through the internet, to transfer funds, withdraw money or purchase valuable items. However, the devices might not have proper firewall or antivirus protection, and thus financial data can be corrupted or destroyed by unauthorized users (Ravi & Kamaruddin, 2017).
Figure: Trend of digital services
Source: (Ravi & Kamaruddin, 2017)
Chumak, Ramzaev and Khaimovich state that, to save time, people use mobile applications and e-commerce sites for purchases and various financial dealings. People use Wi-Fi, LAN or mobile internet to access such services. Moreover, many advertisements pop up in these applications, and they often originate from compromised sources. If such advertisements are clicked, malware can enter the network and corrupt data. Even vital information can be stolen from the banking database, and thus numerous users can be affected. Such distributed access increases the threat of attack (Chumak, Ramzaev, & Khaimovich, 2015).
Gai et al. highlight that the introduction of the mobile cloud has increased security issues dramatically. In this digital era, many people use mobile devices for browsing as well as social networking. However, while they access data, suspicious software can also enter the system, corrupt mobile data, and travel through the cloud to the bank’s server. This is a huge risk for data security and integrity (Gai et al., 2015).
Increase in third-party applications
Kshetri mentions that numerous unauthentic applications are available in cyberspace (for example, on stores such as the Google Play store), which are downloaded for fund transfers or electronic purchases. Moreover, some applications are also used as digital wallets. All these applications are linked with bank accounts, which increases the probability of financial data loss. Due to unsecured certificates, such applications can be easily hacked (Kshetri, 2016).
Chumak, Ramzaev and Khaimovich observe that the use of third-party software has increased radically over the past few years. These applications have very weak security features. Moreover, most of the devices used for financial operations do not have strong protocols to deal with malware attacks (Chumak, Ramzaev, & Khaimovich, 2015).
Fang and Zhang clarify that various third-party software products, which rely on public and private cloud services, are used for financial activities. Usually, dynamic or two-factor authentication is applied to protect such clouds and associated devices, and biometric authentication can also be applied. Such methods cannot be replicated from the hacker's side, as they all require user authentication for data access. Thus, hackers or unauthorized users cannot easily alter or damage data (Fang & Zhang, 2016).
Possible Solutions
Big Data Security Analytics
BDSA is the most popular method for increasing data security. It is applied to recognize patterns in transactions or other user activities. A series of methods can be used for data analytics, such as Predictive Security Analytics (PSA), Security Decision Analytics (SDA), and User Behavioural Analytics (UBA). PSA identifies the pattern of a security attack, SDA develops a framework to tackle security threats, and UBA monitors user activities (Patgiri & Majhi, 2018). Some of the popular algorithms used are PageRank, BeliefPropagation and Coordinate Descent. The PageRank algorithm is used to detect patterns and correlations between activities. The Coordinate Descent algorithm is used to train machine learning models for BDSA operation. BeliefPropagation is also used for machine learning activities; it develops a relationship graph through which certain threat patterns can be identified. Effective implementation of BDSA and cautious use of web services can reduce security threats in the future. Eighty percent of global institutions believe that deploying chatbots with BDSA technology can increase business productivity and reduce security issues (Ravi & Kamaruddin, 2017).
Figure: Opportunity of chat bots with BDSA
Source: (Ravi & Kamaruddin, 2017)
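As an illustration of how PageRank can surface patterns between activities, the following minimal Python sketch ranks accounts in a small transfer graph. The account names, the graph itself, and the damping factor of 0.85 are assumptions made for this example; they are not taken from the cited sources, and a production BDSA system would work on far larger graphs with additional signals.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {node: [outgoing neighbours]}."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node (no outgoing edges): spread its rank evenly.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical transfer graph: an edge A -> B means account A sent money to B.
transfers = {
    "acct_a": ["acct_x"],
    "acct_b": ["acct_x"],
    "acct_c": ["acct_x", "acct_a"],
    "acct_x": [],
}

scores = pagerank(transfers)
suspect = max(scores, key=scores.get)
print(suspect)  # acct_x
```

An account that collects transfers from many otherwise unrelated accounts receives the highest score, which is the kind of correlation an analyst might then review by hand.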
Access Control Policies
A combination of encryption algorithms is applied to develop an access control mechanism. Various built-in policies are available in the applications, which protect financial data. However, these applications must be downloaded from authentic websites. Moreover, banks and other financial institutions can apply strong security policies to protect data (Dustdar, Tan, Blake, & Saleh, 2013). Besides, two-factor or biometric authentication can also be applied to secure financial data.
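One common way to realise the second factor is a time-based one-time password. The sketch below follows the publicly specified HOTP (RFC 4226) and TOTP (RFC 6238) algorithms using only the Python standard library. The `verify_login` helper and the shared secret are hypothetical and exist only for illustration; the cited sources do not prescribe a particular scheme.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password as specified in RFC 4226."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over a time counter."""
    return hotp(secret, unix_time // step, digits)


def verify_login(password_ok: bool, secret: bytes, submitted: str, unix_time: int) -> bool:
    """Hypothetical check: both the password and the one-time code must match."""
    expected = totp(secret, unix_time)
    return password_ok and hmac.compare_digest(expected, submitted)


# Shared secret taken from the RFC test vectors; a real system would
# provision a random secret per user.
shared_secret = b"12345678901234567890"
code = totp(shared_secret, 59)
print(verify_login(True, shared_secret, code, 59))       # True
print(verify_login(True, shared_secret, "000000", 59))   # False
```

Because the code changes every 30 seconds and depends on a secret held by the user's device, a stolen password alone is not enough to authorise a transaction.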
Blockchain technology
Many popular financial institutions such as Bank of America and ICICI have started using this technology to strengthen their security policy (Ravi & Kamaruddin, 2017). It uses a hash function when records are mined and validated. A hash function is a mathematical function that converts financial data of any size into a fixed-length string of letters and numbers; unlike encryption, the output cannot be reversed to recover the input. Each transaction generates a hash, and a group of records, comparable to a page of a shared spreadsheet, is called a block. A linked sequence of blocks is called a blockchain; in Bitcoin, for example, a new block is added roughly every 10 minutes. Once a record is generated or updated, forging it afterwards is almost impossible. Every node on the network has a copy of the blockchain. The blockchain protocol uses private keys to sign transactions and public keys to identify the digital wallet. Moreover, the blockchain database is stored in a distributed network, which makes it difficult to breach (Win, Tianfield, & Mair, 2017).
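The tamper-evidence property described above can be sketched in a few lines of Python. This is a deliberately simplified model: it uses SHA-256 from the standard library and omits consensus, mining difficulty, digital signatures and networking. The function names and the sample transactions are assumptions made for the example.

```python
import hashlib
import json


def block_hash(index, transactions, previous_hash):
    """SHA-256 digest over the block contents and the previous block's hash."""
    payload = json.dumps(
        {"index": index, "transactions": transactions, "previous_hash": previous_hash},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain, transactions):
    """Append a block that is linked to the current tail of the chain."""
    previous_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    block = {
        "index": index,
        "transactions": transactions,
        "previous_hash": previous_hash,
        "hash": block_hash(index, transactions, previous_hash),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Recompute every hash and check that each block points at its predecessor."""
    previous_hash = "0" * 64
    for block in chain:
        if block["previous_hash"] != previous_hash:
            return False
        recomputed = block_hash(block["index"], block["transactions"], block["previous_hash"])
        if block["hash"] != recomputed:
            return False
        previous_hash = block["hash"]
    return True


ledger = []
append_block(ledger, [{"from": "alice", "to": "bob", "amount": 50}])
append_block(ledger, [{"from": "bob", "to": "carol", "amount": 20}])
print(is_valid(ledger))  # True

# Tampering with an old record breaks the stored hash of that block.
ledger[0]["transactions"][0]["amount"] = 5000
print(is_valid(ledger))  # False
```

Since each block stores the hash of the one before it, changing an old record would require recomputing every later block on every node that holds a copy, which is why forgery is considered impractical.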
Comparison
Category | Criterion | Big Data Security Analytics (BDSA) | Access Control Policies | Blockchain Technology
Methodology | Analysis | User activities and data sources are deeply analyzed prior to implementation of security policies. | Most of the applications or web services used have predefined policies. Moreover, certain encryption protocols are also implemented. | It masks the binary data into an unreadable format. It uses a hash function for data mining and analysis.
Methodology | Process Type | The process is repeated whenever a new transaction occurs in the network. Without proper validation, a user is not allowed to access data. | The method operates with predefined dataset and algorithms for data analysis. However, dynamic analysis is also performed in certain cases (Flood, Jagadish & Raschid, 2016). | Hash function generates record each time a transaction is performed and thereby it is a regular process. However, analyzed records are not reprocessed for validation (Ravi & Kamaruddin, 2017).
Methodology | Applications | Complex hardware or software specification is required in the primary servers. However, simple devices can call upon the functions of BDSA. | The method is an inbuilt feature of financial applications. Moreover, the procedure or method is also available in the cloud, which can be called dynamically during specific transactions. | The users do not need sophisticated devices or software for using the technology. However, the banking servers need large space to store the blockchain records and related programming protocols.
Methodology | Algorithms Used | PageRank, Coordinate Descent and BeliefPropagation are a few popular algorithms used in BDSA (Win, Tianfield, & Mair, 2017). | Various encryption algorithms are applied to secure data with one time password, two-factor authentication or other security procedures. | Various types of Consensus algorithms are used to organize data in the decentralized network.
Results | Data Encryption | Dynamic analysis as well as encryption algorithm is applied to tackle against security threats. | Broadcast Encryption or Peer to Peer (P2P) encryption is enforced to secure data. | Random key generator creates Private key for securing data. Moreover, the public key is also developed for wallet transactions.
Results | | Traffic filtering or data filtering is applied in finance domain to increase security. | Usually, transactions are not filtered. Encryptions are just applied to secure financial transactions. | Blockchain records are dynamically updated each time a transaction is performed. These records cannot be manipulated later. Thus, it has improved data security.
Blockchain technology is the suggested methodology for enhancing the security of financial big data. Unauthorized users cannot forge or alter existing records. Moreover, the database records are stored in a distributed cloud architecture, which is difficult to hack. Random keys combining numbers and letters are generated for each authorized transaction. Public keys are also generated by specific security protocols. All these features of blockchain increase security considerably. Therefore, most financial institutions are encouraged to implement this technology.
Author | Technique or tools | Data source | Findings
Chen & Zhang | Survey on Data-intensive applications | Research paper | Various finance applications have a huge challenge of data attacks. The possible solutions to increase security
Chumak, Ramzaev, & Khaimovich | Economic research based on Big Data technology | Journal paper | Third-party applications and various unauthorized sources increase threats on big data in finance domain
Flood, Jagadish & Raschid | Secure data schema for the mobile cloud in the financial industry | Financial Stability Review of France | Mobile cloud is not safe. However, proper security firewall and encryption can provide a secure pathway.
Gai et al. | International Conference and survey were done by IEEE members | Journal paper published by IEEE | An attribute-based secure schema can increase the safety of cyberspace. However, the private key should be exchanged securely.
Kshetri | Research on financial services in China | International journal of information management | IoTs and online platform are major threats to big data. However, various frameworks can be implemented to secure these platforms and devices.
Ravi & Kamaruddin | Big Data Analytics Enabled Smart Financial Services | Journal published by Springer International Publishing | Smart devices have an increased probability of attacks. However, effective encryption and protocol frameworks like block chain can reduce the threat.
Win, Tianfield & Mair | Big Data Based Security Analytics in Cloud Computing | Journal paper published by IEEE | Distributed Hadoop records and BDSA framework is very much effective for enhancing security
Recommendation
System administrators face many issues while maintaining such a vast amount of data. The following steps can be taken to enhance data security.
Managing privileges and access: Organizations must set specific privileges to prevent unauthorised data access. The highest privileges should be assigned to only two or three responsible employees. Other employees can be given the lowest level of access (read-only privilege) so that data cannot be mismanaged. Clients should likewise be given read-only access. This reduces the possibility of attack.
Recognize and secure the most crucial data: The most valuable data should be identified within the entire database (Joshi & Kadhiwala, 2017). Additional encryption should be applied to such data. Further, a copy of the data can be kept on detached drives.
Develop improved data security plans or policies: IT professionals should regularly research emerging threats and update the data policies accordingly.
Implement stronger passwords: Robust credentials, such as two-factor and biometric authentication, are much more difficult to hack. Therefore, network security professionals should employ them to strengthen big data security.
Back up data regularly and keep it in a safe place: All organizational and client data should be regularly backed up to a separate cloud server and to separate hard drives. The professionals and the monitoring team should run these backups regularly so that a potential attack cannot disrupt business operations.
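The first recommendation, restricting write access to a few administrators and giving everyone else read-only access, can be sketched as a small role-based check in Python. The role names and the permission mapping are hypothetical and would be replaced by an organization's own policy.

```python
from enum import Flag, auto


class Permission(Flag):
    READ = auto()
    WRITE = auto()
    ADMIN = auto()


# Hypothetical policy: only administrators may modify data;
# other employees and clients are limited to read-only access.
ROLE_PERMISSIONS = {
    "administrator": Permission.READ | Permission.WRITE | Permission.ADMIN,
    "employee": Permission.READ,
    "client": Permission.READ,
}


def is_allowed(role: str, needed: Permission) -> bool:
    """Return True only if the role holds every permission in `needed`."""
    granted = ROLE_PERMISSIONS.get(role, Permission(0))  # unknown roles get nothing
    return (granted & needed) == needed


print(is_allowed("administrator", Permission.WRITE))  # True
print(is_allowed("client", Permission.WRITE))         # False
print(is_allowed("visitor", Permission.READ))         # False
```

Denying unknown roles by default keeps a misconfigured account from gaining access it was never meant to have.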
Conclusion
The use of digital applications has increased radically over the past few years. The associated databases have grown with them, which has resulted in an increase in big data. However, application users often rely on unsecured networks and unauthenticated data sharing, which increases security risk. The data stored in the healthcare, financial and social media domains can be affected by such attacks, and valuable information can be exposed or corrupted by hackers. Nevertheless, security methodologies such as the BDSA technique, authentication procedures, cross-domain analytics, the service reputation framework and blockchain technology can be implemented to secure the data. All these methodologies validate the user and the data source to prevent the entry of malware or viruses. These technologies also identify patterns that help the administrator avert data attacks. Apart from these security enhancements, users should double-check the authenticity of sources before sharing or accessing data. Users should also regularly update their anti-virus and anti-spyware applications. It is recommended not to use open, unsecured networks such as Wi-Fi in public places. If users ignore these precautions, no security protocol or professional can secure their data. Finally, organizations dealing with big data must implement at least one of the mentioned security methods to increase data security.
References
Blobel, B., Lopez, D. M., & Gonzalez, C. (2016). Patient privacy and security concerns on big data for personalized medicine. Health and Technology, 6(1), 75-81.
Bourahla, S., & Challal, Y. (2017). Social Networks Privacy Preserving Data Publishing. Blida: IEEE.
Broeders, D., Schrijvers, E., van der Sloot, B., van Brakel, R., de Hoog, J., & Ballin, E. H. (2017). Big Data and security policies: Towards a framework for regulating the phases of analytics and use of Big Data. Computer law & security review, 33(3), 309-323.
Chandra, S., Ray, S., & Goswami, R. T. (2017, January). Big Data Security: Survey on Frameworks and Algorithms. In Advance Computing Conference (IACC), 2017 IEEE 7th International (pp. 48-54). IEEE.
Chen, C. P., & Zhang, C. Y. (2014). Data-intensive applications, challenges, techniques, and technologies: A survey on Big Data. Information sciences, 275, 314-347.
Chitransh, N., Mehrotra, C., & Singh, D. A. (2017). The risk for Big Data in the Cloud. Greater Noida: IEEE.
Chumak, V. G., Ramzaev, V. M., & Khaimovich, I. N. (2015). Challenges of data access in economic research based on Big Data technology. In Proceedings of Information Technology and Nanotechnology (ITNT-2015), CEUR Workshop Proceedings (Vol. 1490, pp. 327-337).
Dustdar, S., Tan, W., Blake, M. B., & Saleh, I. (2013). Social-Network-Sourced Big Data Analytics. Miami: IEEE Computer Society.
Fabian, B., Ermakova, T., & Junghanns, P. (2015). Collaborative and secure sharing of healthcare data in multi-clouds. Information Systems, 48, 132-150.
Fang, B., & Zhang, P. (2016). Big data in finance. In Big data concepts, theories, and applications (pp. 391-412). Springer, Cham.
Flood, M. D., Jagadish, H. V., & Raschid, L. (2016). Big data challenges and opportunities in financial stability monitoring. Banque de France, Financial Stability Review, 20.
Gai, K., Qiu, M., Thuraisingham, B., & Tao, L. (2015, August). Proactive attribute-based secure data schema for the mobile cloud in the financial industry. In 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems (pp. 1332-1337). IEEE.
Gao, Y., Fu, X., Luo, B., Du, X., & Guizani, M. (2015, December). Haddle: a framework for investigating data leakage attacks in Hadoop. In Global Communications Conference (GLOBECOM), 2015 IEEE (pp. 1-6). IEEE.
Joshi, N., & Kadhiwala, B. (2017, April). Big data security and privacy issues—A survey. In Power and Advanced Computing Technologies (i-PACT), 2017 Innovations in (pp. 1-5). IEEE.
Kshetri, N. (2016). Big data’s role in expanding access to financial services in China. International journal of information management, 36(3), 297-308.
Lafuente, G. (2015). The big data security challenge. Network security, 2015(1), 12-14.
Lee, I. (2017). Big data: Dimensions, evolution, impacts, and challenges. Business Horizons, 60(3), 293-303.
Lv, Z., Song, H., Basanta-Val, P., Steed, A., & Jo, M. (2017). Next-generation big data analytics: State of the art, challenges, and future research topics. IEEE Transactions on Industrial Informatics, 13(4), 1891-1899.
Manogaran, G., Thota, C., & Kumar, M. V. (2016). MetaCloudDataStorage architecture for big data security in cloud computing. Procedia Computer Science, 87, 128-133.
Olaronke, I., & Oluwaseun, O. (2016, December). Big data in healthcare: Prospects, challenges, and resolutions. In Future Technologies Conference (FTC) (pp. 1152-1157). IEEE.
Packiam, R., & Prakash, D. V. (2015). An Empirical Study on Text Analytics in Big Data. Tiruchirapalli: IEEE.
Patgiri, R., & Majhi, U. (2018, January 01). Big Data Security Analytics: Key Challenges. Retrieved February 02, 2019, from csce.ucmss.com: https://csce.ucmss.com/cr/books/2018/LFS/CSREA2018/ICD8052.pdf
Patil, H. K., & Seshadri, R. (2014, June). Big data security and privacy issues in healthcare. In Big Data (BigData Congress), 2014 IEEE International Congress on (pp. 762-765). IEEE.
Puthal, D., Nepal, S., Ranjan, R., & Chen, J. (2015, November). A dynamic key length based approach for real-time security verification of big sensing data stream. In International Conference on Web Information Systems Engineering (pp. 93-108). Springer, Cham.
Rao, S., Suma, S. N., & Sunitha, M. (2015, May). Security solutions for big data analytics in healthcare. In Advances in Computing and Communication Engineering (ICACCE), 2015 Second International Conference on (pp. 510-514). IEEE.
Ravi, V., & Kamaruddin, d. S. (2017). Big Data Analytics Enabled Smart Financial Services: Opportunities and Challenges. Hyderabad: Springer International Publishing AG.
ShashiRekha, H., Prakash, C., & Kavitha, G. (2014). Understanding Trust and Privacy of Big Data in Social Networks: A Brief Review. Mysore: IEEE.
Shozi, N. A., & Mtsweni, J. (2017). Big Data Privacy in Social Media Sites. Pretoria: IST-Africa.
Smith, M., Szongott, C., Henne, B., & Voigt, G. v. (2013). Big Data Privacy Issues in Public Social Media. Hannover: IEEE.
Srinivas, J., Das, A. K., Kumar, N., & Rodrigues, J. (2018). Cloud Centric Authentication for Wearable Healthcare Monitoring System. IEEE Transactions on Dependable and Secure Computing.
Stergiou, C., Psannis, K. E., Xifilidis, T., Plageras, A. P., & Gupta, B. B. (2018). Security and Privacy of Big Data for Social Networking Services in Cloud. Macedonia: IEEE.
Surma, J. (2013). The Privacy Problem in Big Data Applications: An Empirical Study on Facebook. Warsaw: IEEE.
Tufekci, Z. (2014). Big Questions for Social Media Big Data: Representativeness, Validity and Other Methodological Pitfalls. Carolina: Association for the Advancement of Artificial Intelligence.
Win, T. Y., Tianfield, H., & Mair, Q. (2017). Big Data Based Security Analytics for Protecting Virtualized Infrastructures in Cloud Computing. Cheltenham: IEEE.
Xu, L., & Shi, W. (2016). Security Theories and Practices for Big Data. In Big Data Concepts, Theories, and Applications (pp. 157-192). Springer, Cham.
Yuqing, L. (2017). Research on Personal Information Security on Social Network in Big Data Era. Henan: IEEE.
Zhang, C., Zhu, X., Sun, J., & Fang, Y. (2010). Privacy and Security for Online Social Networks: Challenges and Opportunities. Florida: IEEE Network.
Zhang, D. (2018, October). Big data security and privacy protection. In 8th International Conference on Management and Computer Science (ICMCS 2018). Atlantis Press.