
What data type should I use in a MySQL database to store two text files of code, given that I intend to compare their similarity later?

It's a MySQL database running on my Windows machine.

Also, can you recommend an API that can compare code for me?

  • TEXT, maybe? Though VARCHAR can hold up to 65,535 bytes from MySQL 5.0.3 onwards. Per the documentation, the length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. The effective maximum length of a VARCHAR in MySQL 5.0.3 and later is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used. Commented Jan 23, 2016 at 14:36
  • When you hold a string in a PHP variable, does it matter what type that value had in MySQL (VARCHAR, TEXT, etc.)? I'm doing that, and if it doesn't matter, then the PHP API I eventually use to compare the code/strings will be indifferent to how I originally stored it in MySQL. Commented Jan 23, 2016 at 14:49

1 Answer

As per the MySQL documentation:

Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 65,535. The effective maximum length of a VARCHAR is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.

...

Values in CHAR and VARCHAR columns are sorted and compared according to the character set collation assigned to the column.

So, VARCHAR is typically stored inline with the row, whilst BLOB and TEXT values may be stored off the row, with the row holding a pointer to the data (the exact behaviour depends on the storage engine and row format). Depending on how long your text is, the column can be defined as TINYTEXT, TEXT, MEDIUMTEXT, or LONGTEXT; the only difference is the maximum amount of data each can hold:

  • TINYTEXT 255 bytes
  • TEXT 65,535 bytes
  • MEDIUMTEXT 16,777,215 bytes
  • LONGTEXT 4,294,967,295 bytes
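
As a rough illustration of how these limits translate into a column choice, here is a minimal Python sketch that picks the smallest TEXT type able to hold a given source file. The function name `smallest_text_type` and the `TEXT_LIMITS` table are hypothetical helpers made up for this example, not part of MySQL or any library. The limits are the documented maximum byte lengths (2^8 − 1, 2^16 − 1, 2^24 − 1, 2^32 − 1), and the sketch assumes the column stores the text in the same encoding used to measure it (UTF-8 here).

```python
# Maximum payload in bytes for each MySQL TEXT type, smallest first.
TEXT_LIMITS = [
    ("TINYTEXT", 2**8 - 1),     # 255 bytes
    ("TEXT", 2**16 - 1),        # 65,535 bytes
    ("MEDIUMTEXT", 2**24 - 1),  # 16,777,215 bytes
    ("LONGTEXT", 2**32 - 1),    # 4,294,967,295 bytes
]


def smallest_text_type(source: str, encoding: str = "utf-8") -> str:
    """Return the smallest TEXT type whose byte limit fits `source`.

    The limits are in bytes, not characters, so the string is encoded
    first; multi-byte characters count more than once.
    """
    size = len(source.encode(encoding))
    for name, limit in TEXT_LIMITS:
        if size <= limit:
            return name
    raise ValueError(f"{size} bytes does not fit in any TEXT type")


if __name__ == "__main__":
    snippet = "<?php echo 'hello'; ?>\n" * 20
    print(smallest_text_type(snippet))
```

For code files of only a few lines, plain TEXT is normally more than enough; the larger types only matter once a single file approaches 64 KB.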

To check whether the two strings stored in TEXT (or any other string column) are equal, or how they sort relative to each other, you can use MySQL's STRCMP(expr1,expr2):

STRCMP() returns 0 if the strings are the same, -1 if the first argument is smaller than the second according to the current sort order, and 1 otherwise.

If you specify the desired output of the comparison, I might edit the answer.

EDIT

To compare two strings and calculate a similarity percentage, you might want to use PHP's similar_text() function. As the official documentation states:

This calculates the similarity between two strings as described in Programming Classics: Implementing the World's Best Algorithms by Oliver (ISBN 0-131-00413-1). Note that this implementation does not use a stack as in Oliver's pseudo code, but recursive calls which may or may not speed up the whole process. Note also that the complexity of this algorithm is O(N**3) where N is the length of the longest string.
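
To make the idea behind that description concrete, here is a small Python sketch of the general approach: find the longest common substring, recurse on the pieces to its left and right, and turn the total number of matching characters into a percentage. This is an illustration written for this answer, not PHP's actual source; the names `similar_chars` and `similarity_percent` are hypothetical, and the percentage formula (matching characters × 2 × 100 / combined length) is my understanding of how the result is usually reported. In PHP itself you would simply call `similar_text($a, $b, $percent)`.

```python
def _longest_common_substring(a: str, b: str) -> tuple[int, int, int]:
    """Return (start_in_a, start_in_b, length) of the first longest common substring."""
    best = (0, 0, 0)
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k > best[2]:
                best = (i, j, k)
    return best


def similar_chars(a: str, b: str) -> int:
    """Count matching characters by recursing around the longest common substring."""
    i, j, n = _longest_common_substring(a, b)
    if n == 0:
        return 0
    left = similar_chars(a[:i], b[:j])
    right = similar_chars(a[i + n:], b[j + n:])
    return n + left + right


def similarity_percent(a: str, b: str) -> float:
    """Express the match count as a percentage of the combined length."""
    total = len(a) + len(b)
    if total == 0:
        return 0.0  # choice made in this sketch to avoid dividing by zero
    return similar_chars(a, b) * 2 * 100 / total


if __name__ == "__main__":
    code_a = "for ($i = 0; $i < 10; $i++) { echo $i; }"
    code_b = "for ($j = 0; $j < 10; $j++) { print $j; }"
    print(f"{similarity_percent(code_a, code_b):.1f}%")
```

Because the comparison is character-based, whitespace and renamed variables lower the score; normalising the code (for example, stripping comments and collapsing whitespace) before comparing may give more meaningful results.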

1 Comment

The whole point of this is that I need to report the percentage similarity between two different pieces of code stored in a MySQL database for a project. The logic will be done in PHP. The size of the strings is irrelevant, as they won't be many lines long. I was wondering whether the MySQL column type matters, but I see now that it doesn't.
