The expected functionality of chardet (detect the character encoding of some bytes) is kind of fixed - apart from edge cases and new additions to Unicode, you'd expect the original and new implementations to largely pass the same tests, and to have a lot of similar code, such as for "does this start with a BOM".
The fact that JPlag shows such a low % overlap for an implementation of "the same interface" is convincing evidence for me that it's not just plagiarised.
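To illustrate the "does this start with a BOM" point, here's a rough sketch of what that check tends to look like however you write it. This is not taken from either chardet implementation; `detect_bom` and the returned labels are made up for the example, and only the `codecs.BOM_*` constants come from the Python standard library.

```python
import codecs

# Hypothetical table for illustration only. Order matters: the UTF-32 LE BOM
# begins with the same two bytes as the UTF-16 LE BOM, so UTF-32 goes first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_bom(data: bytes):
    """Return a codec label if data starts with a known BOM, else None."""
    for bom, label in _BOMS:
        if data.startswith(bom):
            return label
    return None
```

There are only so many ways to write that loop, so two independent implementations would plausibly end up with near-identical code here, which is why a low overall overlap stands out.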