Data Warehouse Architecture
A data warehouse is a central repository of historical and cumulative data originating from one or more sources. It streamlines an organization's reporting and business intelligence processes. It functions like a relational database and supports querying and analysis, enabling organizations to consolidate information from several sources.
Components of a data warehouse
The major components of a data warehouse include:
- Data Warehouse Database: The fundamental element of a warehousing architecture is the database, which stores all enterprise data and makes it accessible for reporting. An organization must therefore select the type of database that suits its needs. There are four types: typical relational databases, which are row-based (examples include SQL Server, Oracle, SAP, and IBM DB2); analytics databases, which are purpose-built to store and manage data for analytics (examples include Teradata and Greenplum); data warehouse appliances, which bundle software for data management with hardware for data storage; and cloud-hosted databases, which are hosted and accessed in the cloud, eliminating the need to set up an on-premises data warehouse.
- Extraction, Transformation, and Loading (ETL) Tools: ETL tools are essential to a data warehouse. They extract data from various sources, transform it into a suitable format, and load it into the warehouse. The ETL tool selected determines the time spent on extraction, the approach taken to extracting data, the types of transformations applied and their simplicity, how missing data is filled in, and how information is distributed from the underlying stores to BI applications (a minimal sketch of this pipeline appears after this list).
- Metadata: Metadata describes the data warehouse and provides a framework for the data, assisting in constructing, handling, preserving, and using the warehouse. It is divided into two groups: technical metadata and business metadata. Technical metadata contains information used by developers and administrators when executing warehouse development and administration tasks. Business metadata contains information that gives an easily understandable view of the data stored in the warehouse.
- Data warehouse Access Tools: A group of databases serves as the foundation of a data warehouse. Since corporate users cannot interact with the databases directly, they rely on tools such as query and reporting tools, application development tools, data mining tools, and OLAP tools. Query and reporting tools help users produce corporate reports for analysis, in the form of calculations, spreadsheets, or interactive visuals. Application development tools help build tailored reports and present them in views suited to special reporting purposes. Data mining tools automate the process of recognizing patterns and relationships in large amounts of data using advanced statistical techniques. OLAP tools, on the other hand, help construct a multi-dimensional view of warehouse data and enable the examination of enterprise data from several viewpoints (a small pivot-table illustration also follows this list).
- Data warehouse Bus: The bus defines the flow of data in a data warehousing bus architecture. It includes data marts, which are access layers used to deliver data to users; a data mart partitions the data generated for a specific group.
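As referenced above, the following is a minimal, hypothetical sketch of an ETL pipeline in Python. The source file, table, column names, and transformation rules are illustrative assumptions, not the behavior of any particular ETL product.

```python
import csv
import sqlite3

# Hypothetical example: extract sales records from a CSV export,
# transform them into a consistent format, and load them into a
# warehouse table. File and column names are assumptions.

def extract(path):
    """Extract: read raw rows from a source CSV file."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Transform: normalize formats and drop incomplete records."""
    cleaned = []
    for row in rows:
        if not row.get("amount"):          # skip rows with missing data
            continue
        cleaned.append({
            "region": row["region"].strip().upper(),   # consistent casing
            "amount": round(float(row["amount"]), 2),  # numeric, 2 decimals
        })
    return cleaned

def load(rows, db_path="warehouse.db"):
    """Load: write the transformed rows into the warehouse table."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales (region, amount) VALUES (:region, :amount)", rows
    )
    conn.commit()
    conn.close()

if __name__ == "__main__":
    load(transform(extract("daily_sales.csv")))
```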
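The multi-dimensional analysis that OLAP tools provide can be approximated with a pivot table. Below is a minimal sketch using pandas; the dimensions (region, quarter), the measure (amount), and the sample values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical fact data: each row is a sale with two dimensions
# (region, quarter) and one measure (amount).
facts = pd.DataFrame({
    "region":  ["EAST", "EAST", "WEST", "WEST"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "amount":  [100.0, 150.0, 80.0, 120.0],
})

# A pivot table gives an OLAP-style multi-dimensional view:
# regions down the rows, quarters across the columns, summed amounts.
cube = facts.pivot_table(index="region", columns="quarter",
                         values="amount", aggfunc="sum")
print(cube)
```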
Data transformation refers to the process of converting data from one format into another. Data bound for the warehouse must be transformed in various ways, including:
- Smoothing: removing noise from the data set through the use of algorithms.
- Aggregation: storing or presenting the data in a summary format.
- Discretization: converting continuous data into a set of small intervals.
- Attribute construction: creating new attributes from existing ones.
- Generalization: converting low-level data attributes into high-level ones.
- Normalization: scaling data values into a given range.
A brief sketch of a few of these transformations follows.
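This is a minimal sketch of three of the transformations above (aggregation, discretization, and min-max normalization) in Python; the sample readings and interval edges are assumptions for illustration.

```python
# Hypothetical sample: daily temperature readings (values assumed).
readings = [12.0, 15.5, 14.2, 30.1, 28.7, 22.4]

# Aggregation: present the data in a summary format (here, the mean).
mean = sum(readings) / len(readings)

# Discretization: convert continuous values into a set of small intervals.
def discretize(value):
    if value < 15:
        return "low"
    if value < 25:
        return "medium"
    return "high"

bins = [discretize(v) for v in readings]

# Normalization (min-max): scale every value into the range [0, 1].
lo, hi = min(readings), max(readings)
normalized = [(v - lo) / (hi - lo) for v in readings]

print(mean, bins, normalized)
```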
Currently, most modern enterprises, organizations, and institutions depend on knowledge-based management systems, and they use a data warehouse as the main component of such systems. In these organizations, a data warehouse is constructed mainly for two reasons: first, to integrate the many heterogeneous, independent, and widespread data sources in an enterprise; second, to offer a platform for advanced, complex, and effective data analysis.
Big Data
Big data refers to huge and complex data sets that must be processed and examined to uncover useful insights. These insights benefit businesses and organizations, especially during decision making, by enabling them to make informed decisions. The data sets keep growing exponentially over time, and therefore they cannot be examined or processed using conventional data processing methods.
Big data comes in different types: structured, unstructured, and semi-structured. Structured data is data that can be processed, stored, and retrieved in a fixed format. It is highly organized, so it can be stored in and accessed from a database by simple search engine algorithms. Unstructured data has no particular format or structure, so a lot of time is spent processing and examining it. Semi-structured data, on the other hand, contains both structured and unstructured elements (a small example of handling it follows).
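As an illustration, JSON is a common semi-structured format: records carry named fields, but the fields can vary from record to record. This minimal sketch (with assumed field names and values) flattens such records into a structured, fixed-schema table.

```python
import json

# Hypothetical semi-structured records: each has named fields, but the
# fields present vary from record to record (field names are assumed).
raw = """[
    {"id": 1, "name": "Alice", "email": "alice@example.com"},
    {"id": 2, "name": "Bob", "phone": "555-0100"}
]"""

records = json.loads(raw)

# Flatten into a fixed schema: every row gets the same columns, with
# None standing in for fields a record does not carry.
columns = ["id", "name", "email", "phone"]
table = [[rec.get(col) for col in columns] for rec in records]

print(columns)
for row in table:
    print(row)
```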
The characteristics of big data are velocity, variety, and volume. Velocity refers to the speed at which data is generated in real time, including the rate of change and the arrival of data sets at different speeds in bursts of activity. Volume refers to the huge amounts of data produced daily from different sources, such as social media, business processes, and human interactions; this data is kept in data warehouses. Variety refers to the structured, unstructured, and semi-structured data collected from many sources.
Big data can give companies useful insights to improve their products and services; it combines relevant data from many sources to generate highly actionable insights, and it can be used in predictive analysis. Companies using insights generated from big data stay ahead of their competitors; the insights therefore give them a competitive advantage.
Personally, I have seen big data being used in academics. Big data is doing a lot to improve education in the world today. Education is no longer restricted to the physical boundaries of the classroom; there are many online courses to learn from. Academic institutions are investing in digital courses driven by big data methodologies to support the all-around development and growth of learners. In the healthcare industry, big data has enabled medical professionals to offer personalized healthcare services to individual patients.
Big data demands that an organization fully comprehend and accept what it is before embracing it. Without a clear understanding of big data, companies may end up wasting many resources and consuming a lot of time on big data in vain, without benefiting from it in any way. Big data must therefore be appreciated by top management first and then by the rest of the organization. An organization's information technology department should hold workshops and big data training to make this possible.
Big data technologies can be confusing, so big data demands that organizations seek professional assistance in identifying the right technology. Since big data adoption entails many expenses, an organization should choose the appropriate technology for its budget. The data originates from diverse sources, and organizations are required to analyze all of it. They must clean the data, and to do so an appropriate model has to be in place. Big data therefore demands that organizations have suitable models for analyzing and handling the large volumes of data arriving from diverse sources.
Therefore, organizations should not implement big data without understanding what it is, which technology is best for them, and whether they can afford the selected technology. They require expert help, as they cannot achieve all of this independently, especially if they are new to big data.
Green computing
Green computing is the environmentally responsible and eco-friendly use of computers and their resources. In green computing, computing devices are used and disposed of in ways that minimize their environmental impact. An organization can turn its data centers green in various ways, including the following:
- Installing VFDs: VFD stands for variable frequency drive. VFDs can be installed on air-cooled chillers to boost efficiency by reducing the rotational speed of a compressor in response to off-peak, lower-load conditions. The compressor then does not have to work as hard, so the chiller uses minimal power during off-peak conditions (a rough savings estimate is sketched after this list).
- Reference ASHRAE guidelines: These are the thermal guidelines of the American Society of Heating, Refrigerating, and Air-Conditioning Engineers. The guidelines are very helpful when establishing facilities, providing important tips on maximizing interior temperatures while minimizing the power used for cooling.
- Employ cold/hot aisle containment: Under this practice, physical barriers keep the cold air in a data center's supply aisles from mixing with the hot air in the exhaust aisles. Preventing the two from mixing reduces energy use and boosts cooling efficiency.
- Leverage virtualization technology: Virtualization reduces the number of physical servers and the power needed to operate a data center's information technology infrastructure. By enabling numerous server instances to run on a single machine, energy consumption is immediately reduced.
- Update PCs: Conventional desktops should be replaced with thin-client PCs that use less power.
- Racking blanking panels: The panels are used to close gaps in server racks, creating a contained server rack environment. This helps expand usable cooling unit capacity and improves the efficiency and effectiveness of the cooling infrastructure.
- Taking advantage of cool days by shutting down compressors in data centers; outside air is then used to circulate coolness through the space.
- Making data center improvements easy is crucial, for example with strip doors and floor tile cuts fitted with cold airlocks, among others.
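As mentioned in the VFD item above, here is a rough, idealized estimate of why slowing a compressor saves power. It assumes the fan/pump affinity laws, under which power draw scales approximately with the cube of rotational speed; real chiller compressors deviate from this ideal, and the kW figure is an assumption for illustration.

```python
# Rough, idealized estimate of VFD savings using the affinity laws,
# under which power scales approximately with the cube of speed.
# Real compressors deviate from this ideal; numbers are assumptions.

def relative_power(speed_fraction):
    """Power draw relative to full speed, per the cube law."""
    return speed_fraction ** 3

full_load_kw = 100.0          # assumed full-speed compressor draw
for speed in (1.0, 0.9, 0.8, 0.7):
    kw = full_load_kw * relative_power(speed)
    print(f"{speed:.0%} speed -> ~{kw:.0f} kW")
# At 80% speed the cube law predicts roughly half the power,
# which is why slowing the compressor at off-peak times saves energy.
```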
An example of an organization that has embraced green computing is Intel, one of the world's leading manufacturers of computer processors and a leader in green energy. Green energy is derived from sources like solar, wind, hydro, and biomass. Intel has eighteen solar panel installations that produce electricity, and the company uses green energy to power operations in the processing and manufacturing of processors and other computer accessories.
Conclusion
Green computing, data warehouses, and big data each present their own challenges for organizations embracing them. However, organizations should embrace these technologies, since they will benefit greatly, as discussed in this paper.