Data Mining and Warehousing
By: Edward • Research Paper • 2,735 Words • May 31, 2010 • 1,114 Views
Summary: This thirteen-page paper tells the reader about Data Mining and Data Warehousing: how they evolved, the software used today, and the software that was used 10 years ago.
INTRODUCTION: Data Mining and Warehousing are comparatively new terms, but the technology behind them is not. Data Mining is the process of digging out or gathering information from various databases. This includes data from point-of-sale transactions, credit card purchases, and online forms, which are just a few of the many sources that large companies dig through to find out more about their clients. The information is used to find out the clients' major shopping behaviors, what irritates them, or simply how the company can make the client's life happier. Since gathering all this information is a necessity in order to increase sales and build a better relationship with clients, and with storage devices becoming cheaper, the idea of warehousing data came into being. This literally means that the data is collected in a central place where it is analyzed and sorted according to the company's requirements.
Data mining is the search for relationships and global patterns that exist in large databases but are 'hidden' among the vast amount of data, such as a relationship between patient data and their medical diagnosis. These relationships represent valuable knowledge about the database and the objects in the database and, if the database is a faithful mirror, of the real world registered by the database (Holsheimer & Siebes, 1994).
Data mining, or knowledge discovery, is a way of sifting through millions and millions of records to help decision makers better understand the needs of the customer. Although this technology is in its infancy, many industries are already using it, including retail, finance, health care, transportation, and aerospace, to name just a few. By using complex mathematical and statistical techniques along with pattern recognition, they obtain information that about 10 years ago would have seemed an enormous job requiring months to process. Today these figures are processed at high speed and with precision to a "T". These figures and analyses help an analyst recognize relationships, trends, exceptions, and anomalies that are sometimes missed while analyzing data.
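As a rough illustration of how a statistical technique can surface an anomaly that a person might miss, the sketch below flags values that lie far from the average. This is not a method described in this paper: the function name `find_anomalies`, the threshold of 2 standard deviations, and the sample sales figures are all assumptions made up for the example.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return the values lying more than `threshold` sample standard
    deviations away from the mean (a simple z-score test)."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical daily sales figures; one day is far outside the usual range.
daily_sales = [102, 98, 105, 99, 101, 97, 103, 100, 250, 104]
print(find_anomalies(daily_sales))  # [250]
```

Real data mining tools use far more elaborate models, but the idea is the same: describe what is normal in the data, then report what deviates from it.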
To understand how much data is involved, consider storage capacity. There are trillions of point-of-sale transactions, credit card purchases, and pictures (just some of the types of data that data mining applications pick up), all stored in large databases whose size is measured in bytes. The byte is the unit of measurement for storage devices. Eight bits make one byte, 1,024 bytes make one kilobyte, 1,024 kilobytes make one megabyte, and so on. Today the size of databases is measured in gigabytes and terabytes; one gigabyte is equal to 1,073,741,824 bytes, and one terabyte is 1,024 gigabytes. That is a lot of data: one terabyte is approximately equal to 2 million books. That is the amount of data that comes from companies such as Wal-Mart. The data is collected by various methods and stored in a central database that runs on extremely powerful machines maintained by the company itself. The place where all this data is stored is known as a Data Warehouse. The data is accumulated in one place and sorted and arranged so that users can find the information they want.
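The unit arithmetic above can be checked with a few lines of Python. This is only a sketch: the average book size of 512 kilobytes is an assumption chosen for illustration, not a figure from this paper, and the name `books_per_terabyte` is hypothetical.

```python
# Binary storage units: each unit is 1,024 times the previous one.
BITS_PER_BYTE = 8
KILOBYTE = 1024
MEGABYTE = KILOBYTE ** 2
GIGABYTE = KILOBYTE ** 3
TERABYTE = KILOBYTE ** 4

# Assumption for illustration only: an average plain-text book is about 512 KB.
ASSUMED_BOOK_BYTES = 512 * KILOBYTE


def books_per_terabyte(book_bytes=ASSUMED_BOOK_BYTES):
    """Return how many books of the given size fit in one terabyte."""
    return TERABYTE // book_bytes


print(GIGABYTE)              # 1073741824
print(TERABYTE)              # 1099511627776
print(books_per_terabyte())  # 2097152
```

Under that assumed book size, one terabyte holds a little over 2 million books, which is consistent with the estimate given above.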
The million-dollar question that needs to be answered is: what is data? Data are the facts and figures collected by various means and from various sources. Organizations collect huge amounts of data on a daily basis, in different formats and in different databases. Some of the types of data that this software collects are given below:
1. Operational data, such as sales, cost, inventory, payroll, and accounting
2. Non-operational data, such as industry sales, forecast data, and macroeconomic data
3. Metadata: data about the data itself, such as logical database design or data dictionary definitions
These are then collected and stored in Data Warehouses, where they are analyzed for different purposes.
Today Data Mining can help businesses in many ways. It is used to discover patterns and relationships in the data that help businesses make better decisions. Data mining can also help spot sales trends, develop better marketing strategies, and tell a business which customers are loyal and which are not. Specific uses of data mining include:
1. Market Segmentation: Identify the customers who buy the same products from a particular company.