Overview
Text mining refers to the process of deriving high-quality information from text. High-quality information is typically obtained by devising patterns and trends through means such as statistical pattern learning. Text mining usually involves structuring the input text (usually parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluating and interpreting the output. 'High quality' in text mining usually refers to some combination of relevance, novelty, and interestingness.
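As a rough illustration of the structuring step, here is a minimal Python sketch (the tokenization approach and sample sentence are assumptions for illustration, not taken from the post) that parses raw text into term counts ready for insertion into a table:

    # A minimal sketch of the "structuring the input text" step:
    # crude parsing of raw text into term counts that could be
    # inserted into a database table for pattern discovery.
    import re
    from collections import Counter

    doc = "Text mining derives high-quality information from text."
    tokens = re.findall(r"[a-z]+", doc.lower())  # naive tokenization
    print(Counter(tokens).most_common(3))        # e.g. [('text', 2), ...]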
Consumers are a very important asset for a company. There can be no business without the prospect of relationships with loyal consumers. This is why a company should plan and use a clear strategy for treating its consumers. Customer Relationship Management (CRM) has grown in recent decades to reflect the major role of consumers in setting corporate strategy. CRM encompasses all measures to understand consumers, and the processes that exploit this knowledge to design and implement marketing activities, production, and the supplier supply chain. In the literature, CRM is defined in several ways, among others (Tama, 2009).
Support Vector Machines (SVMs) are a set of supervised learning methods that analyze data and recognize patterns, used for classification and regression analysis. The original SVM algorithm was invented by Vladimir Vapnik, and the current standard incarnation (soft margin) was proposed by Corinna Cortes and Vladimir Vapnik (Cortes & Vapnik, 1995).
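For concreteness, here is a minimal soft-margin SVM sketch; scikit-learn and the synthetic dataset are assumptions chosen for illustration, not tools named in the post:

    # A minimal soft-margin SVM classification sketch.
    # The parameter C controls the softness of the margin.
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, random_state=0)
    clf = SVC(kernel="linear", C=1.0).fit(X, y)  # soft-margin linear SVM
    print("training accuracy:", clf.score(X, y))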
CRISP-DM (Cross-Industry Standard Process for Data Mining) was developed by a consortium of companies established by the European Commission in 1996, and it has become a standard process in data mining that can be applied across industrial sectors. The image below describes the data mining development life cycle as defined in CRISP-DM.
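In case the image does not render, the six standard CRISP-DM phases can be listed in order; this small sketch simply restates the standard model and is not taken from the post's figure:

    # The six CRISP-DM phases, in their standard order.
    CRISP_DM_PHASES = [
        "Business Understanding",
        "Data Understanding",
        "Data Preparation",
        "Modeling",
        "Evaluation",
        "Deployment",
    ]
    for i, phase in enumerate(CRISP_DM_PHASES, start=1):
        print(f"{i}. {phase}")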
Data Integration
Data integration is one of the steps of data preprocessing; it involves combining data residing in different sources and providing users with a unified view of these data. In other words, it merges data from multiple data stores (data sources).
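A minimal sketch of merging two sources into a unified view; pandas, the sample tables, and the key name are assumptions for illustration:

    # Merging records from two hypothetical data stores on a shared key
    # to provide a unified view of the data.
    import pandas as pd

    crm   = pd.DataFrame({"cust_id": [1, 2], "name":  ["Ana", "Budi"]})
    sales = pd.DataFrame({"cust_id": [1, 2], "total": [100, 250]})

    unified = crm.merge(sales, on="cust_id")  # one row per customer, all attributes
    print(unified)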
How does it work?
It essentially follows the concatenation operation from mathematics and the theory of computation. The concatenation operation on strings is generalized to an operation on sets of strings as follows:
For two sets of strings S1 and S2, the concatenation S1S2 consists of all strings of the form vw, where v is a string from S1 and w is a string from S2.
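A minimal Python sketch of this set-concatenation operation (the sample sets are hypothetical):

    # Concatenation of two sets of strings: every string v from S1
    # joined with every string w from S2.
    def concat_sets(s1, s2):
        return {v + w for v in s1 for w in s2}

    S1 = {"cust_", "ord_"}
    S2 = {"001", "002"}
    print(concat_sets(S1, S2))  # {'cust_001', 'cust_002', 'ord_001', 'ord_002'}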

Data Cleaning
"Data cleaning is one of the three biggest problems in data warehousing - Ralph Kimball"
"Data cleaning is the number one problem in data warehousing - DCI survey"
Data cleansing, data cleaning, or data scrubbing is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. Used mainly in databases, the term refers to identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting this dirty data.
After cleansing, a data set will be consistent with other similar data sets in the system. The inconsistencies detected or removed may have been originally caused by user entry errors, by corruption in transmission or storage, or by different data dictionary definitions of similar entities in different stores. Data cleansing differs from data validation in that validation almost invariably means data is rejected from the system at entry and is performed at entry time, rather than on batches of data.
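As an illustration, here is a minimal cleaning sketch; pandas, the column names, and the validity rules are assumptions, not from the post:

    # Detecting and correcting dirty records: inconsistent coding,
    # an impossible value, and a missing value.
    import pandas as pd

    df = pd.DataFrame({
        "age":  [25, -3, 40, None],                          # -3 and None are dirty
        "city": ["Jakarta", "jakarta", " Bandung", "Jakarta"],
    })

    df["city"] = df["city"].str.strip().str.title()  # fix inconsistent coding
    df = df[df["age"].between(0, 120)]               # drop impossible/missing ages
    print(df)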
Overview
Data preprocessing is used in database-driven applications such as customer relationship management and in rule-based applications (like neural networks).
Why do we need data preprocessing?
1. Data in the real world is dirty
- incomplete: attribute values are missing, attributes that should exist are absent, or only aggregate data is available
- noisy: it contains errors or outliers
- inconsistent: there are discrepancies in coding and values
- redundant: the same data appears in more than one place
2. No quality data, no quality mining results (garbage in, garbage out)
- quality decisions must be based on quality data
- a data warehouse needs consistent integration of quality data
3. Data extraction, cleaning, and transformation make up a major part of building a data warehouse (a short preprocessing sketch follows this list)
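The sketch below ties these points together; pandas/scikit-learn and the sample column are assumptions for illustration:

    # Handling an incomplete value and a noisy outlier before mining.
    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import StandardScaler

    X = pd.DataFrame({"income": [3.0, np.nan, 4.5, 120.0]})

    X["income"] = X["income"].fillna(X["income"].median())  # incomplete -> impute
    X = X[X["income"].between(0, 100)]                      # noisy outlier -> remove
    print(StandardScaler().fit_transform(X))                # common scale for mining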
Generally, tasks in data mining are divided into two categories:
- Predictive
- Descriptive
In more detail (a short classification sketch follows this list):
- Classification (predictive)
- Grouping / Clustering (descriptive)
- Association Rules (descriptive)
- Sequential Pattern (descriptive)
- Regression (predictive)
- Deviation Detection (predictive)
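As promised above, here is a minimal sketch of a predictive (classification) task; scikit-learn and its bundled iris dataset are assumptions chosen for illustration:

    # Classification: learn a predictive model from labeled data
    # and measure its accuracy on held-out examples.
    from sklearn.datasets import load_iris
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    print("accuracy:", accuracy_score(y_test, model.predict(X_test)))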
Data mining
(the analysis step of the Knowledge Discovery in Databases process, or KDD), an interdisciplinary subfield of computer science, is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.
The term is a buzzword, and is frequently misused to mean any form of large-scale data or information processing (collection, extraction, warehousing, analysis, and statistics), but it is also generalized to any kind of computer decision support system, including artificial intelligence, machine learning, and business intelligence. In the proper use of the word, the key term is discovery, commonly defined as detecting something new. Even the popular book Data Mining: Practical Machine Learning Tools and Techniques with Java (which covers mostly machine learning material) was originally to be named simply Practical Machine Learning, and the term data mining was only added for marketing reasons. Often the more general terms (large-scale) data analysis or analytics, or, when referring to actual methods, artificial intelligence and machine learning, are more appropriate.