Ñàéò î ñæàòèè >> ARCTEST Ñðàâíèòåëüíûå òåñòû Àëüòåðíàòèâíûå òåñòû
|
Ñóïåðàäàïòèâíîå ñæàòèåÄëÿ ñæàòèÿ äàííûõ èñïîëüçóþòñÿ êàê ñòàòèñòè÷åñêèå, òàê è ýâðèñòè÷åñêèå àëãîðèòìû. Ñòàòèñòè÷åñêèå àëãîðèòìû (Õàôôìàíà, Øåííîíà-Ôàíî, àðèôìåòè÷åñêèå) òðåáóþò çíàíèÿ âåðîÿòíîñòåé ïîÿâëåíèÿ ñèìâîëîâ â òåêñòå. Êàê ïðàâèëî, ýòè âåðîÿòíîñòè íåèçâåñòíû. Äëÿ èõ îöåíêè èñïîëüçóþò ÷àñòîòû ñèìâîëîâ. Îäíàêî, ïðè îäíîïðîõîäíîé îáðàáîòêå ôàéëà, ýòè ÷àñòîòû òàêæå íåèçâåñòíû. Ïîýòîìó âñå ñòàòèñòè÷åñêèå àëãîðèòìû ìîæíî ðàçäåëèòü íà òðè êëàññà:
Ýâðèñòè÷åñêèå àëãîðèòìû ñæàòèÿ (òèïà LZ77, LZ78), êàê ïðàâèëî, èùóò ïîâòîðÿþùèåñÿ ñòðîêè â ôàéëå, ëèáî ñòðîÿò ñëîâàðü êàê óæå âñòðå÷àâøèõñÿ ôðàç, òàê è ôðàç, êîòîðûå íàèáîëåå âåðîÿòíî ìîãóò ïîÿâèòñÿ â òåêñòå. Îáû÷íî òàêèå àëãîðèòìû îáëàäàþò öåëûì ðÿäîì ñïåöèôè÷åñêèõ ïàðàìåòðîâ (ðàçìåð áóôåðà, ìàêñèìàëüíàÿ äëèíà ôðàçû, ðàçìåð ðàññìàòðèâàåìîãî êîíòåêñòà è ò.ï.), ïîäáîð êîòîðûõ çàâèñèò îò îïûòà àâòîðà àëãîðèòìà, è ýòè ïàðàìåòðû ïîäáèðàþòñÿ òàê, ÷òîáû äîáèòüñÿ îïòèìàëüíîãî ñî÷åòàíèÿ âðåìåíè ðàáîòû, êîýôôèöèåíòà ñæàòèÿ è øèðîòû êëàññà õîðîøî ñæèìàåìûõ ôàéëîâ. Ïðè ïîäáîðå ýòèõ ïàðàìåòðîâ ìîæíî çàìåòèòü, ÷òî äëÿ ðàçëè÷íûõ ôàéëîâ (òåêñòîâûå, äâîè÷íûå, áàçû äàííûõ) îïòèìàëüíû ðàçëè÷íûå ñî÷åòàíèÿ ïàðàìåòðîâ, íå ãîâîðÿ óæå î òîì, ÷òî ðàçíûå àëãîðèòìû îïòèìàëüíû äëÿ ðàçíûõ êëàññîâ èñõîäíûõ ôàéëîâ. Èäåÿ ñóïåðàäàïòèâíîãî ñæàòèÿ çàêëþ÷àåòñÿ â àäàïòèâíîñòè ïàðàìåòðîâ ñæèìàþùåãî àëãîðèòìà, ò.å. ïàðàìåòðû àëãîðèòìà, èëè æå ñàìè àëãîðèòìû ñæàòèÿ, ìîãóò ìåíÿòüñÿ â çàâèñèìîñòè îò òåêóùåãî ðàñïðåäåëåíèÿ ÷àñòîò ñèìâîëîâ â îáðàáàòûâàåìîì ôàéëå. Àëãîðèòì ïðè ýòîì îñòàåòñÿ îäíîïðîõîäíûì. Âñÿ òîíêîñòü çàêëþ÷àåòñÿ â ðåøåíèè î ïðèíàäëåæíîñòè äàííîãî ôàéëà ê òîìó èëè èíîìó êëàññó. Ìîæíî çàìåòèòü, ÷òî äëÿ ôàéëîâ òîãî èëè èíîãî êëàññà ñâîéñòâåííî îïðåäåëåííîå ðàñïðåäåëåíèå ÷àñòîò ñèìâîëîâ. Êîíå÷íî, äëÿ êàæäîãî ôàéëà ðàñïðåäåëåíèå ÷àñòîò óíèêàëüíî, íî ó ôàéëîâ îäíîãî êëàññà ýòè ðàñïðåäåëåíèÿ íå ñèëüíî ðàçëè÷àþòñÿ. Òàêèì îáðàçîì ïðîáëåìà ñâîäèòñÿ ê îïðåäåëåíèþ âèäà ðàñïðåäåëåíèÿ ñèìâîëîâ ê êîòîðîìó áëèæå âñåãî â äàííûé ìîìåíò ðàñïðåäåëåíèå ñèìâîëîâ îáðàáàòûâàåìîãî ôàéëà. Ìåðîé áëèçîñòè îäíîãî ðàñïðåäåëåíèÿ ÷àñòîò ñèìâîëîâ ê äðóãîìó ìîæåò ñëóæèòü êîëè÷åñòâî èíôîðìàöèè.  îòëè÷èå îò Øåííîíîâñêîé ýíòðîïèè, ñóùåñòâóåò íåñêîëüêî ðàçëè÷íûõ îïðåäåëåíèé êîëè÷åñòâà èíôîðìàöèè. Ýòî ñâÿçàíî ñ óñëîâèÿìè, êîòîðûå íàêëàäûâàþòñÿ íà íåå. Ýòèì óñëîâèÿì óäîâëåòâîðÿåò íåñêîëüêî ðàçëè÷íûõ ôóíêöèé. Íå âäàâàÿñü â ìàòåìàòè÷åñêèå ïîäðîáíîñòè, ñêàæåì ñàìîå ãëàâíîå: ýòà ôóíêöèÿ äîëæíà îáðàùàòüñÿ â íîëü ïðè ðàâåíñòâå ðàñïðåäåëåíèé è ïîëîæèòåëüíîé â äðóãèõ ñëó÷àÿõ. Ïðèìåðíàÿ ñõåìà ñóïåðàäàïòèâíîãî ñæàòèÿ:
Ðàçëè÷íûå âàðèàíòû îïðåäåëåíèÿ êîëè÷åñòâà èíôîðìàöèèÎáîçíà÷èì Pi è Qi ÷àñòîòû ñèìâîëîâ òåêñòà è ÷àñòîòû ñèìâîëîâ ïðè êîòîðûõ îïòèìàëåí íåêèé àëãîðèòì ñæàòèÿ. N N SUM Pi = SUM Qi = 1 i=1 i=1 Òîãäà êîëè÷åñòâî èíôîðìàöèè ìîæíî îïðåäåëèòü ñëåäóþùèìè ñïîñîáàìè: Îáîçíà÷åíèÿ:N Kullback SUM Qi * log(Qi/Pi) i=1 N Kullback SUM Pi * log(Pi/Qi) i=1 N Kullback SUM (Qi-Pi) * log(Qi/Pi) i=1 N 2 Matusita SQRT( SUM Pi * (SQRT(Qi/Pi)-1) ) i=1 1 N Kolmogoroff - SUM Pi * ABS(Qi/Pi-1) 2 i=1 N Bhattacharyya -log( SUM Pi * SQRT(Qi/Pi) ) i=1 N 2 Kagan SUM Pi * (1-Qi/Pi) i=1 Îáîáùåíèå N 2 íàèìåíüøèõ êâàäðàòîâ SQRT( SUM Qi * (Pi/Qi) - 1 ) i=1 * - îïåðàöèÿ óìíîæåíèÿ; log - ëîãàðèôì ïî
îñíîâàíèþ 2; SUM - îïåðàöèÿ
ñóììèðîâàíèÿ; SQRT - îïåðàöèÿ èçâëå÷åíèÿ
êâàäðàòíîãî êîðíÿ; ABS - ìîäåëü ÷èñëà; Íèæå ïðèâîäÿòñÿ çíà÷åíèÿ ðàçëè÷íûõ êîëè÷åñòâ èíôîðìàöèè äëÿ ôàéëîâ ðàçëè÷íûõ òèïîâ. Íàèáîëåå óäîáíû äëÿ ïðàêòè÷åñêîé ðåàëèçàöèè âàðèàíòû Kullbackà è Kaganà. Òàêæå ïðèâîäèòñÿ çíà÷åíèå ýíòðîïèè ñèìâîëà èññëåäóåìûõ ôàéëîâ. +++++ File: t.zip +++++ Shannon's Hentropy = 7.9980 Kullback's information gain = 0.0020 Kullback's reversed information gain = 0.0020 Kullback's divergency = 0.0040 Matusita information gain = 0.0262 Kolmogoroff's information gain = 0.0000 Bhattacharyya information gain = 0.0005 Kagan's information gain = 0.0027 MNS generalisation = 0.0527 +++++ File: DEFCAT.DB +++++ Shannon's Hentropy = 3.4174 Kullback's information gain = 4.5826 Kullback's reversed information gain = 2.3148 Kullback's divergency = 6.8974 Matusita information gain = 0.9575 Kolmogoroff's information gain = 0.3652 Bhattacharyya information gain = 0.8847 Kagan's information gain = 88.4309 MNS generalisation = 2.2684 +++++ File: FILELIST.DOC +++++ Shannon's Hentropy = 4.7476 Kullback's information gain = 3.2524 Kullback's reversed information gain = 0.2492 Kullback's divergency = 3.5016 Matusita information gain = 0.7056 Kolmogoroff's information gain = 0.3633 Bhattacharyya information gain = 1.2889 Kagan's information gain = 20.1490 MNS generalisation = 2.8623 +++++ File: l12.prt +++++ Shannon's Hentropy = 5.0039 Kullback's information gain = 2.9961 Kullback's reversed information gain = 1.2768 Kullback's divergency = 4.2729 Matusita information gain = 0.7841 Kolmogoroff's information gain = 0.3398 Bhattacharyya information gain = 1.0078 Kagan's information gain = 17.1672 MNS generalisation = 3.4860 +++++ File: sources.c +++++ Shannon's Hentropy = 5.6431 Kullback's information gain = 2.3569 Kullback's reversed information gain = 0.4507 Kullback's divergency = 2.8076 Matusita information gain = 0.6489 Kolmogoroff's information gain = 0.2871 Bhattacharyya information gain = 0.8291 Kagan's information gain = 10.1830 MNS generalisation = 1.5086 +++++ File: hi.exe +++++ Shannon's Hentropy = 6.6845 Kullback's information gain = 1.3155 Kullback's reversed information gain = 0.9956 Kullback's divergency = 2.3111 Matusita information gain = 0.5948 Kolmogoroff's information gain = 0.1777 Bhattacharyya information gain = 0.2808 Kagan's information gain = 7.8968 MNS generalisation = 1.5113 DEFCAT.DB - áàçà äàííûõ Paradox FILELIST.DOC - ñïèñîê ôàéëîâ ïàêåòà BC++ 3.1 (engl.) l12.prt - ôîðìàòèðîâàííûé òåêñò íà ðóññêîì ÿçûêå. Ìàêñèì Çàõàðîâ
Ñàéò î ñæàòèè
>>
ARCTEST
>>
Ñðàâíèòåëüíûå òåñòû
|
Àëüòåðíàòèâíûå òåñòû
|
Ãðàôè÷åñêèå òåñòû
|
Íîâîñòè
|
Óòèëèòû
|
Ôàéë'ìåíåäæåðû
|
Îïèñàíèÿ
|
Ëèíêè
|
Necromancer's DN
|
Ïîääåðæêà
Ñàéò î ñæàòèè >> Íîâèíêè | Î ñåðâåðå | Ñòàòèñòèêà Êíèãà "Ìåòîäû ñæàòèÿ äàííûõ" >> Óíèâåðñàëüíûå | Èçîáðàæåíèé | Âèäåî Ðàçäåëû >> Download (ñòàòüè+èñõîäíèêè) | Ññûëêè | Ru.compress | Arctest | Âèäåî | Êàòàëîã ññûëîê | Ôîðóì Ïðîåêòû >> Ä.Âàòîëèíà | À.Ðàòóøíÿêà | Ì.Ñìèðíîâà | Â.Þêèíà | Å.Øåëâèíà | À.Ôèëèíñêîãî | Ä.Øêàðèíà | Ñ.Îñíà÷à |