Hacker News

>>Even if we agreed with the statistics, the "one in a thousand" would be relevant if we had some other reason to suspect Szabo. He could simply be that one in a thousand who matches.

Except we aren't dealing with a massive number of people who might use those phrases, just a couple of people. If we already have reason to suspect someone and are then given independent, persuasive evidence, the two should be taken together, and together they make it very unlikely that our hypothesis is false.

The great thing about textual stylistic analysis is that everyone has a specific way of communicating that can be identified unless one takes steps to actively hide it. This holds in all languages, and the analysis can usually identify authorship even when someone has tried to copy someone else's style.
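
To put rough numbers on the "small pool" point, here is a minimal sketch. It assumes a uniform prior over the candidate pool, that the true author always matches, and that anyone else matches with the quoted one-in-a-thousand rate. The function name and the pool sizes are made up for illustration.

```python
def posterior_given_match(pool_size: int, false_match_rate: float) -> float:
    """P(candidate is the author | candidate's style matches).

    Assumptions (illustrative only): uniform prior over the pool, the true
    author always matches, non-authors match with false_match_rate.
    """
    prior = 1.0 / pool_size
    # Total probability that a randomly chosen candidate shows a match.
    p_match = prior * 1.0 + (1.0 - prior) * false_match_rate
    # Bayes' rule: prior * P(match | author) / P(match).
    return prior / p_match


if __name__ == "__main__":
    for n in (10, 1_000, 1_000_000):
        print(n, round(posterior_given_match(n, 1 / 1000), 4))
```

With a handful of candidates the match is close to decisive. With millions of candidates the same match says very little on its own.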



> "The great thing about textual stylistic analysis is that everyone has a specific way of communicating that can be identified"

How scientifically well supported is this assertion?


Stylometry's a real thing. It gets harder the shorter the sample is, and it's much easier to say "this text matches X's verbal tics better than it matches Y's" than to confidently point at a specific person like this post is doing. Fishing for a similarity measure that supports your desired conclusion is a risk. Identification from style (phrases, quirks, vocab size) is a thing, though.

The program or more-raw data would be interesting.
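
For a sense of what the "matches X better than Y" comparison can look like, here is a toy sketch. It is not the method used in the post. It assumes function-word frequencies as the style feature and cosine similarity as the measure, and the word list and function names are made up for illustration. Real work uses far larger feature sets and much longer samples.

```python
import math
import re
from collections import Counter

# Tiny illustrative list of function words; an assumption, not a standard set.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is",
                  "it", "for", "but", "not", "with", "as", "which"]


def profile(text: str) -> list[float]:
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def closer_author(unknown: str, sample_x: str, sample_y: str) -> str:
    """Return "X" or "Y", whichever sample's profile is nearer the unknown text."""
    p = profile(unknown)
    sim_x = cosine(p, profile(sample_x))
    sim_y = cosine(p, profile(sample_y))
    return "X" if sim_x >= sim_y else "Y"
```

Note that this only ranks two candidates against each other. It says nothing about whether either of them actually wrote the text, which is the harder claim.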




