Quote:
ntojzan: @MajorFatal:
To compress random-like data, you first have to define what random-like data is.
I don't have to do anything :) And why would I, when there are smarter and more literate people than me? (Though I do have some ideas of my own.) If you look through these few threads on compression, you will find several different views and interpretations of the terms "random" and "random-like". Even you gave one that is not bad at all; what is more:
ntojzan: "By the way, the difference between random-like and random data is that random data covers ALL possible combinations, while random-like usually means the combinations that cannot be compressed by statistical methods."
As I write this, "random-like" is not defined (or described) on Wikipedia, and the situation for the term "random" is not exactly brilliant either:
http://en.wikipedia.org/wiki/Random
Randomness has somewhat disparate meanings as used in several different fields. It also has common meanings which may have loose connections with some of those more definite meanings. The Oxford English Dictionary defines "random" thus:
Having no definite aim or purpose; not sent or guided in a particular direction; made, done, occurring, etc., without method or conscious choice; haphazard.
Closely connected, therefore, with the concepts of chance, probability, and information entropy, randomness implies a lack of predictability. Randomness is a concept of non-order or non-coherence in a sequence of symbols or steps, such that there is no intelligible pattern or combination.
In other words: "Without purpose, not sent in a particular direction, made without method or conscious choice, haphazard, lacking predictability, without order or intelligible pattern"? A rather negative definition, isn't it? More about what it is not than what it is?
"Frankly, I just took the term "random-like" from the comp-compress archives, because it came closest to something I have been working on for quite a while.
Quote:
ntojzan
Then you have to establish what percentage of all data the random-like data makes up, in order to determine whether it is even possible to write a program that shrinks random-like data and enlarges the rest. Only once that condition is met can you start building any kind of algorithm.
As I said, I don't have to do anything; there are people smarter and more skilled than me, and certainly ones who program better.
Quote:
ntojzan
By the way, the condition for that is that random-like data makes up less than 50% of all combinations.
Well, had I known it was Shannon, I would not have puzzled so long back then over what you wrote. With all due respect, you sometimes pass things on uncritically. They are not wrong in themselves, but that does not mean they could not be extended, supplemented, criticized, or viewed from another angle, say. The full quote from Shannon:
The ratio of the entropy of a source to the maximum value it could have while still restricted to the same
symbols will be called its relative entropy. This is the maximum compression possible when we encode into
the same alphabet. One minus the relative entropy is the redundancy.
The redundancy of ordinary English,
not considering statistical structure over greater distances than about eight letters, is roughly 50%. This
means that when we write English half of what we write is determined by the structure of the language and
half is chosen freely. The figure 50% was found by several independent methods which all gave results in
this neighborhood. One is by calculation of the entropy of the approximations to English. A second method
is to delete a certain fraction of the letters from a sample of English text and then let someone attempt to
restore them. If they can be restored when 50% are deleted the redundancy must be greater than 50%. A
third method depends on certain known results in cryptography.
Two extremes of redundancy in English prose are represented by Basic English and by James Joyce’s
book “Finnegans Wake”. The Basic English vocabulary is limited to 850 words and the redundancy is very
high. This is reflected in the expansion that occurs when a passage is translated into Basic English. Joyce
on the other hand enlarges the vocabulary and is alleged to achieve a compression of semantic content.
The redundancy of a language is related to the existence of crossword puzzles. If the redundancy is
zero any sequence of letters is a reasonable text in the language and any two-dimensional array of letters
forms a crossword puzzle. If the redundancy is too high the language imposes too many constraints for large
crossword puzzles to be possible. A more detailed analysis shows that if we assume the constraints imposed
by the language are of a rather chaotic and random nature, large crossword puzzles are just possible when
the redundancy is 50%. If the redundancy is 33%, three-dimensional crossword puzzles should be possible,
etc.
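Shannon's definitions above (relative entropy, and redundancy as one minus relative entropy) can be tried on a toy scale. This is only a minimal Python sketch, assuming single-letter (order-0) frequencies, so it will understate the roughly 50% he reports for English, which relies on structure across about eight letters. The function name `redundancy` and the `alphabet_size` parameter are my own choices, not Shannon's.

```python
import math
from collections import Counter


def redundancy(text: str, alphabet_size: int) -> float:
    """Order-0 redundancy: 1 - H / log2(alphabet_size).

    H is the empirical entropy of single symbols in `text`;
    log2(alphabet_size) is the maximum entropy for that alphabet.
    """
    counts = Counter(text)
    total = len(text)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    relative_entropy = entropy / math.log2(alphabet_size)
    return 1.0 - relative_entropy


# Four equally frequent symbols over a 4-symbol alphabet: nothing is redundant.
print(redundancy("abcd" * 100, alphabet_size=4))
# A single repeated symbol: everything is determined, redundancy is 1.
print(redundancy("aaaa" * 100, alphabet_size=4))
```

Feeding it a real text sample with `alphabet_size=27` (26 letters plus space) gives a lower bound on the redundancy, since longer-range structure is ignored.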
"By the way, the condition for that is that random-like data makes up less than 50% of all combinations." So that refers to "ordinary (everyday) English", which for my taste shows up far too often in both Shannon and MacKay (David MacKay). (That third book filmil recommended to me is still somewhat unavailable online and still a bit expensive.) So what now? Should we all read Joyce? (Who was great in his day, but these days is supposedly quite dated and no longer so attractive, or so the critics say?)
The redundancy of a language is related to the existence of crossword puzzles. If the redundancy is
zero any sequence of letters is a reasonable text in the language and any two-dimensional array of letters
forms a crossword puzzle. If the redundancy is too high the language imposes too many constraints for large
crossword puzzles to be possible. A more detailed analysis shows that if we assume the constraints imposed
by the language are of a rather chaotic and random nature, large crossword puzzles are just possible when
the redundancy is 50%. If the redundancy is 33%, three-dimensional crossword puzzles should be possible,
etc.
3D crosswords do not exist because they would be unreadable: how do you read the letters in depth when the ones in front cover them? (Though on today's computers, who knows?) But there are word searches (osmosmerke), in Serbian as well as in English, something like a 3D crossword adapted to a 2D engine (paper, screen). By the way, the redundancy of "everyday" Serbian is around 70%, or so our language experts say?
Quote:
ntojzan
If you devote a bit more time to studying the topic, you will realize that over 99% of combinations are precisely random-like data, which means all attempts to compress such data are doomed in advance. It does not matter at all which algorithm is used.
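The "over 99%" figure in the quote can be checked with the usual counting argument: a lossless compressor must map different inputs to different outputs, and there are only 2^(n-k+1) - 1 bit strings of length n-k or shorter. A minimal Python sketch of that bound follows; the function name and parameters are hypothetical, and it assumes the output length is known for free, which if anything favors the compressor.

```python
from fractions import Fraction


def max_compressible_fraction(n_bits: int, saved_bits: int) -> Fraction:
    """Upper bound on the share of n-bit inputs that any lossless coder
    can shorten by at least `saved_bits` bits."""
    if not 0 < saved_bits <= n_bits:
        raise ValueError("saved_bits must be between 1 and n_bits")
    # All bit strings of length 0 .. n_bits - saved_bits, counted together.
    shorter_outputs = 2 ** (n_bits - saved_bits + 1) - 1
    return Fraction(shorter_outputs, 2 ** n_bits)


# A 1 KiB input that we want to shrink by at least one byte (8 bits).
bound = max_compressible_fraction(n_bits=8 * 1024, saved_bits=8)
print(float(bound))  # just under 1/128, i.e. below 0.8%
```

So under these assumptions fewer than 1 in 128 inputs of that size can be shortened by even a single byte, whatever the algorithm, which is where figures like "over 99%" come from.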
If you would believe me that I have devoted quite a lot of time to studying the topic so far and have, so to speak, burned out and really cannot go on, I would also say that as file size grows, both random-like and random data take up even more of the space, and also that a highly optimized file (such as src, bin, exe) closely resembles a very random file, i.e. it has no or very little redundancy (repetition) of bit sequences. But why bother, when it is easier to say "it has been proven impossible"? End of story, full stop, and so on...
You don't say?
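The remark that an already optimized file resembles a random one can be observed directly. Below is a rough Python sketch using the standard `zlib` module: it compresses a redundant text once, then tries to compress the result again, and compares the byte-level entropy with seeded pseudo-random noise. The word list, sizes, and the `byte_entropy` helper are assumptions made for illustration only.

```python
import math
import random
import zlib
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Empirical order-0 entropy in bits per byte (8.0 is the maximum)."""
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in Counter(data).values())


rng = random.Random(0)  # fixed seed so the run is repeatable
words = ["compression", "random", "data", "entropy", "file", "bit", "string", "code"]
text = " ".join(rng.choice(words) for _ in range(50_000)).encode("ascii")

once = zlib.compress(text, 9)   # first pass removes most of the redundancy
twice = zlib.compress(once, 9)  # second pass has almost nothing left to remove
noise = bytes(rng.getrandbits(8) for _ in range(len(once)))

for label, blob in [("text", text), ("compressed once", once),
                    ("compressed twice", twice), ("seeded noise", noise)]:
    print(f"{label:17} {len(blob):7d} bytes  {byte_entropy(blob):.2f} bits/byte")
```

The byte entropy here is only a first-order statistic, so a high value does not prove that the data is incompressible; it merely shows that the compressed output no longer has the simple repetition the original text had.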