Showing posts with label data mining.
on Wednesday, 28 August 2013
A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem (from Bayesian statistics) with strong (naive) independence assumptions. A more descriptive term for the underlying probability model would be "independent feature model".
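To make the "naive" independence assumption concrete, here is a minimal sketch (not from the original post) of a naive Bayes classifier over categorical features; the weather-style toy records, the feature names, and the Laplace smoothing constant are all illustrative choices.

```python
from collections import defaultdict

# Minimal (hypothetical) naive Bayes sketch: class-conditional feature
# probabilities are multiplied as if the features were independent.
def train(samples):
    """samples: list of (feature_dict, label) pairs with categorical features."""
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: defaultdict(int))
    for features, label in samples:
        class_counts[label] += 1
        for name, value in features.items():
            feature_counts[label][(name, value)] += 1
    return class_counts, feature_counts

def predict(class_counts, feature_counts, features):
    total = sum(class_counts.values())
    best_label, best_score = None, 0.0
    for label, count in class_counts.items():
        score = count / total                      # prior P(class)
        for name, value in features.items():
            # Laplace smoothing so unseen feature values do not zero the product
            score *= (feature_counts[label][(name, value)] + 1) / (count + 2)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy usage: decide whether to "play" given weather-style features
data = [({"outlook": "sunny", "windy": False}, "yes"),
        ({"outlook": "rainy", "windy": True}, "no"),
        ({"outlook": "sunny", "windy": True}, "yes")]
model = train(data)
print(predict(*model, {"outlook": "sunny", "windy": False}))
```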

on Thursday, 8 August 2013

Overview

Text mining refers to the process of deriving high-quality information from text. High-quality information is typically obtained by devising patterns and trends through means such as statistical pattern learning. Text mining usually involves structuring the input text (usually parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluating and interpreting the output. 'High quality' in text mining usually refers to some combination of relevance, novelty, and interestingness.
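As a rough illustration of the structuring step described above, the following sketch (plain Python) parses raw text into tokens and derives a simple structured representation, term frequencies; the sample documents are placeholders, not text from the post.

```python
import re
from collections import Counter

# Illustrative raw documents standing in for real input text
documents = [
    "Text mining derives high-quality information from text.",
    "Patterns and trends are found via statistical pattern learning.",
]

def tokenize(text):
    # Lowercase and keep alphabetic tokens only (a crude linguistic feature)
    return re.findall(r"[a-z]+", text.lower())

# "Structured" output: one term-frequency table per document
term_frequencies = [Counter(tokenize(doc)) for doc in documents]
for i, tf in enumerate(term_frequencies):
    print(i, tf.most_common(3))
```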

on Monday, 5 August 2013
Consumers are a very important asset for a company. There can be no business without the prospect of relationships with loyal consumers. This is why a company should plan and use a clear strategy for treating its consumers. Customer Relationship Management (CRM) has grown in recent decades to reflect the major role of consumers in setting corporate strategy. CRM encompasses all measures for understanding consumers and the processes for exploiting this knowledge to design and implement marketing activities, production, and the supplier supply chain. Below, CRM is defined through several terms taken from the literature, among others (Tama, 2009):

on Tuesday, 30 July 2013
Support Vector Machines (SVMs) are a set of supervised learning methods that analyze data and recognize patterns, used for classification and regression analysis. The original SVM algorithm was invented by Vladimir Vapnik, and the current standard incarnation (soft margin) was proposed by Corinna Cortes and Vladimir Vapnik (Cortes and Vapnik, 1995).
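As an illustration only, here is a minimal soft-margin SVM sketch that assumes scikit-learn is available; the toy points, labels, and the choice of a linear kernel with C=1.0 are arbitrary.

```python
# Minimal soft-margin SVM sketch using scikit-learn (assumed to be installed).
from sklearn import svm

X = [[0, 0], [1, 1], [1, 0], [0, 1]]   # toy training points
y = [0, 1, 1, 0]                       # class labels

clf = svm.SVC(kernel="linear", C=1.0)  # C controls the soft-margin penalty
clf.fit(X, y)
print(clf.predict([[0.9, 0.2]]))       # predict the class of a new point
```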
on Friday, 19 July 2013
CRISP-DM (Cross-Industry Standard Process for Data Mining) is a data mining process model developed in 1996 by a consortium of companies with the support of the European Commission, and it has become established as a standard process for data mining that can be applied in various industrial sectors. The image below describes the data mining development life cycle defined in CRISP-DM.

on Thursday, 11 July 2013
A data warehouse is a collection of data from various sources, stored in a large-capacity repository and used for the decision-making process (Prabhu, 2007). According to William Inmon, the characteristics of a data warehouse are as follows:
on Wednesday, 3 July 2013

Data Integration


Data integration is one of the data preprocessing steps; it involves combining data residing in different sources and providing users with a unified view of these data. In other words, it merges data from multiple data stores (data sources).

How does it work?
Fundamentally, it follows the concatenation operation from mathematics and the theory of computation. The concatenation operation on strings is generalized to an operation on sets of strings as follows:

For two sets of strings S1 and S2, the concatenation S1S2 consists of all strings of the form vw, where v is a string from S1 and w is a string from S2.
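The definition above translates almost directly into code; the following sketch (with made-up example sets) builds the concatenation S1S2 as a set comprehension.

```python
# The concatenation of two sets of strings S1 and S2 is the set of all
# strings v + w with v taken from S1 and w taken from S2.
def concat_sets(S1, S2):
    return {v + w for v in S1 for w in S2}

S1 = {"cust_", "order_"}   # illustrative example sets
S2 = {"id", "name"}
print(concat_sets(S1, S2))  # {'cust_id', 'cust_name', 'order_id', 'order_name'}
```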



on Sunday, 30 June 2013

Data Cleaning

"Data cleaning is one of the three biggest problems in data warehousing - Ralph Kimball"
"Data cleaning is the number one problem in data warehousing - DCI survey"

Data cleansing, data cleaning, or data scrubbing is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. Used mainly in databases, the term refers to identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting this dirty data.

After cleansing, a data set will be consistent with other similar data sets in the system. The inconsistencies detected or removed may have been originally caused by user entry errors, by corruption in transmission or storage, or by different data dictionary definitions of similar entities in different stores. Data cleansing differs from data validation in that validation almost invariably means data is rejected from the system at entry and is performed at entry time, rather than on batches of data.
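As a small, hedged illustration of these ideas (not the only way to do it), the sketch below assumes pandas and shows three typical fixes on made-up dirty records: removing duplicates, unifying inconsistent codes, and handling missing values.

```python
# Minimal data-cleaning sketch using pandas (assumed to be installed).
# The records below are illustrative dirty data: a duplicate row, missing
# values, and an inconsistent code for the same country.
import pandas as pd

df = pd.DataFrame({
    "customer": ["Ann", "Ann", "Bob", "Cara"],
    "country":  ["US", "US", "USA", None],
    "age":      [34, 34, None, 29],
})

df = df.drop_duplicates()                             # remove exact duplicate records
df["country"] = df["country"].replace({"USA": "US"})  # unify inconsistent codes
df["age"] = df["age"].fillna(df["age"].mean())        # impute missing numeric values
df = df.dropna(subset=["country"])                    # drop rows still missing a key field
print(df)
```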


on Friday, 28 June 2013

Overview


Data preprocessing is a data mining technique that involves transforming raw data into an understandable format. Real-world data is often incomplete, inconsistent, and/or lacking in certain behaviors or trends, and is likely to contain many errors. Data preprocessing is a proven method of resolving such issues. Data preprocessing prepares raw data for further processing.

Data preprocessing is used in database-driven applications such as customer relationship management and in rule-based applications (like neural networks).

Why do we need Data Preprocessing?
  1. Data in the real world is dirty (see the sketch after this list)
  • incomplete: attribute values are missing, attributes of interest are absent, or only aggregate data is available
  • noisy: containing errors or outliers
  • inconsistent: containing discrepancies in codes or values
  • redundant data
  2. No quality data, no quality mining results (garbage in, garbage out)
  • quality decisions must be based on quality data
  • a data warehouse needs to combine data of consistent quality
  3. Data extraction, cleaning, and transformation are an important part of building a data warehouse
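As a brief sketch of the issues named in the list above, the following example assumes scikit-learn and NumPy and applies two common preprocessing operations, mean imputation for incomplete values and min-max scaling, to an illustrative matrix.

```python
# Illustrative preprocessing sketch; NaN marks a missing (incomplete) value.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler

X = np.array([[25.0, 50000.0],
              [32.0, np.nan],      # incomplete record
              [47.0, 120000.0]])

X = SimpleImputer(strategy="mean").fit_transform(X)  # fill missing values
X = MinMaxScaler().fit_transform(X)                  # scale each column to [0, 1]
print(X)
```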

on Thursday, 27 June 2013
Generally, tasks in Data Mining are divided into two types:
  • Predictive
         uses several variables to predict unknown or future values of other variables
  • Descriptive
         finds patterns that describe the data and can be interpreted by humans

In more detail, the tasks are:
  • Classification (predictive)
  • Grouping / Clustering (descriptive), as sketched after this list
  • Association Rules (descriptive)
  • Sequential Patterns (descriptive)
  • Regression (predictive)
  • Deviation Detection (predictive)
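As a brief sketch of one descriptive task, the following example assumes scikit-learn and runs k-means clustering on a few illustrative points; no labels are supplied, and the algorithm groups the data on its own.

```python
# Descriptive task example: k-means clustering with scikit-learn (assumed installed).
from sklearn.cluster import KMeans

points = [[1, 1], [1.2, 0.8], [5, 5], [5.1, 4.9], [9, 1]]   # illustrative data
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(model.labels_)   # cluster assignment for each point, discovered without labels
```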

on Monday, 24 June 2013
Data mining (the analysis step of the knowledge discovery in databases process, or KDD), an interdisciplinary subfield of computer science, is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.

The term is a buzzword and is frequently misused to mean any form of large-scale data or information processing (collection, extraction, warehousing, analysis, and statistics), but it has also been generalized to any kind of computer decision support system, including artificial intelligence, machine learning, and business intelligence. In the proper use of the word, the key term is discovery, commonly defined as detecting something new. Even the popular book "Data Mining: Practical Machine Learning Tools and Techniques with Java" (which covers mostly machine learning material) was originally to be named simply "Practical Machine Learning", and the term data mining was only added for marketing reasons. Often the more general terms (large-scale) data analysis, or analytics, or, when referring to actual methods, artificial intelligence and machine learning, are more appropriate.