Why does the last element on my HashSet always have a different HashCo - Enhance your coding expertise with NicoDorito on @onlycoders.net

1 year ago

#359151

NicoDorito

Why does the last element on my HashSet always have a different HashCode? Contains doesn't work properly

I'm making a simple orthography verification tool in Unity (though this shouldn't be relevant, as my problem is a C# one). The basic logic is:

Import a text file containing a word list in the language I want (10-20k items).
Turn that text file into a HashSet, splitting the entries on commas.
Take the input from the user and split it into words stored in a HashSet (or list). Example: "Hello how are you".
Using a foreach loop, check each word from the user's input against the dictionary HashSet. If any word is not present in the HashSet, return a bool saying there is a mistake.
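
The steps above can be sketched outside Unity as plain C#. This is a minimal sketch, not the code from the post: the class name SpellCheckSketch and the methods BuildDictionary and HasMistake are hypothetical, and it assumes the comparison should ignore case.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating the four steps; names are not from the post.
public static class SpellCheckSketch
{
    // Steps 1-2: turn comma-separated text into a set of trimmed, lowercased words.
    public static HashSet<string> BuildDictionary(string commaSeparatedWords)
    {
        var set = new HashSet<string>();
        var entries = commaSeparatedWords.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var entry in entries)
        {
            var word = entry.Trim().ToLowerInvariant();
            if (word.Length > 0)
            {
                set.Add(word);
            }
        }
        return set;
    }

    // Steps 3-4: split the input on spaces and report whether any word is missing.
    public static bool HasMistake(HashSet<string> dictionary, string input)
    {
        var words = input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var word in words)
        {
            if (!dictionary.Contains(word.ToLowerInvariant()))
            {
                return true;
            }
        }
        return false;
    }
}
```

Calling SpellCheckSketch.HasMistake(SpellCheckSketch.BuildDictionary("hello, how, are, you"), "Hello how are you") returns false, since every word is found once case and surrounding spaces are normalized.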

Note: I'm not sure whether this already happens, but it is important that input like "hello," still counts as correct against a "hello" entry in the dictionary.
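
A plain Contains lookup would not treat "hello," as "hello", because the two strings differ by one character. One way to get that behavior, sketched under the assumption that only leading and trailing non-letter characters should be ignored, is to strip them before the lookup. The names TokenCleaner and StripEdges are hypothetical.

```csharp
using System;

// Hypothetical helper; not part of the code in the post.
public static class TokenCleaner
{
    // Removes every leading and trailing character that is not a letter,
    // leaving characters inside the word (such as an apostrophe) untouched.
    public static string StripEdges(string token)
    {
        int start = 0;
        int end = token.Length;
        while (start < end && !char.IsLetter(token[start]))
        {
            start++;
        }
        while (end > start && !char.IsLetter(token[end - 1]))
        {
            end--;
        }
        return token.Substring(start, end - start);
    }
}
```

With this, TokenCleaner.StripEdges("hello,") yields "hello", and a token made only of punctuation yields an empty string that can be skipped before checking the dictionary.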

But I've noticed a problem. The last word of the user's input is always considered wrong. In the previous example, "you" would be wrong, even though in another sentence like "you are cool" it would be right. I've investigated, and it seems that the hash code of the last word always differs from the hash code the same word gets in any other position.
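
Two equal strings always produce the same hash code within one run, so a differing hash code suggests the last word is not the same string as it appears in the log. One possible cause, stated here as an assumption because the log dump is not included, is an invisible trailing character such as a zero-width space (U+200B) or a line break. A text component attached to a TextMeshPro input field is commonly reported to carry such a character, which is worth verifying rather than taking for granted. The hypothetical helper below makes hidden characters visible by printing each UTF-16 code unit.

```csharp
using System;
using System.Linq;

// Hypothetical diagnostic helper; not part of the code in the post.
public static class HiddenCharDemo
{
    // Renders every UTF-16 code unit of a string as U+XXXX,
    // so characters that print as nothing still show up.
    public static string DumpCodeUnits(string s)
    {
        return string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));
    }
}
```

For example, HiddenCharDemo.DumpCodeUnits("you\u200B") returns "U+0079 U+006F U+0075 U+200B", while the same call on "you" returns only the first three entries. A HashSet<string> containing "you" does not contain "you\u200B", because the default string comparison is ordinal.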

I suspect I have to override Equals and GetHashCode, as I've seen on other forums, but I have no idea how to implement that. I'm also open to suggestions of other libraries with efficient collections that are less tricky. Below is my code:
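
For reference, string already implements value-based Equals and GetHashCode, and it is a sealed class, so there is nothing to override for a HashSet<string>. When different comparison rules are wanted, such as ignoring case, the usual route is to pass an IEqualityComparer<string> to the HashSet constructor. A minimal sketch, with the hypothetical names ComparerSketch and CreateCaseInsensitiveSet:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not part of the code in the post.
public static class ComparerSketch
{
    // Builds a set whose lookups ignore case, using the built-in ordinal comparer.
    public static HashSet<string> CreateCaseInsensitiveSet(IEnumerable<string> words)
    {
        return new HashSet<string>(words, StringComparer.OrdinalIgnoreCase);
    }
}
```

A set built this way from "hello" and "you" reports true for Contains("HELLO"), so neither the dictionary entries nor the input need to be lowercased first. It does not help with extra characters in the word; those still have to be removed before the lookup.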

using System;
using System.Collections.Generic;
using TMPro;
using UnityEngine;

public class SpellChecker : MonoBehaviour
{
    [SerializeField] private TextAsset dictionary;
    [SerializeField] private bool useCase = false;
    private HashSet<string> dictionaryHash = new HashSet<string>();



    void Start()
    {

        // Builds the dictionary set once at startup by splitting the .txt file on commas.
        var words = dictionary.text.Split(new[] { "," }, StringSplitOptions.RemoveEmptyEntries);
        for (int i = 0, count = words.Length; i < count; i++)
        {
            // string.Replace returns a new string, so the result has to be kept.
            // Trim() also drops line breaks or tabs left around an entry.
            var word = words[i].Replace(" ", string.Empty).Trim().ToLower();
            if (word.Length > 0)
            {
                dictionaryHash.Add(word);
            }
        }

    }

    // Returns true if the given word is a valid entry in the dictionary's hash set.
    public bool VerifyOrtography(string word)
    {
        // The TextAsset must be assigned and the word must be present in the set.
        return dictionary && dictionaryHash.Contains(word);
    }

    // Main function called from an external input script; \u0020 is the space character.
    public void VerifyOrtography(TextMeshProUGUI text)
    {
        bool hasInvalidWord = false;
        HashSet<string> inputHash = new HashSet<string>();
        var textInputted = text.text.Split(new[] { "\u0020" }, StringSplitOptions.RemoveEmptyEntries);
        for (int i = 0, count = textInputted.Length; i < count; i++)
        {
            inputHash.Add(textInputted[i]);
        }

        foreach (string item in inputHash)
        {
            //in the case the verified word is not present in the dictionary
            if (!VerifyOrtography(item))
            {
                Debug.Log("Found invalid word : \'" + item + "\'. Hashcode " + item.GetHashCode());
                hasInvalidWord = true;

            }
            else Debug.Log("Valid word : \'" + item + "\', hashcode " + item.GetHashCode());
        }

        if (hasInvalidWord)
        {
            Debug.Log("Text did not pass.");
        }
    }
}

I'm surprised that this has turned out to be so complicated. I'm not at all proficient with the System library. Please let me know if you have any ideas on how to solve this problem. Thank you!

What I've tried:

Changing the user input HashSet into a List gave the same results. I tried duplicating the last entry from the user input for the verification and deleting it later; both entries showed up as wrong. I tried verifying from the text asset directly. I tried Visual Studio's automatic override of Equals and GetHashCode. I tried adding a random string such as "." after everything else in the input HashSet, and it kind of worked, but I want to solve this the right way.

EDIT: Log dump.

equals

contains

hashset

gethashcode

0 Answers
