By definition, Q-Learning works by learning an action-value function that ultimately gives the expected utility of taking a given action in a given state and following the optimal policy thereafter. A policy is a rule that the agent follows to select actions, given the state it is in.
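To make the definition concrete, here is a minimal sketch of the two pieces it mentions: the action-value function stored as a table, and the policy derived from it. It assumes a tabular setting with discrete states and actions. The update shown is the standard Q-Learning rule, Q(s, a) ← Q(s, a) + α·(r + γ·max Q(s', a') − Q(s, a)). The names `q_update` and `greedy_policy`, the state labels "s0" and "s1", and the values of `alpha` (learning rate) and `gamma` (discount factor) are illustrative choices, not part of the original text.

```python
from collections import defaultdict

# Q-table: maps (state, action) -> estimated value; unseen pairs start at 0.0
Q = defaultdict(float)

def q_update(Q, state, action, reward, next_state, next_actions, alpha=0.1, gamma=0.9):
    """Apply one Q-Learning update (hypothetical helper):
    Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))
    """
    # Best estimated value reachable from the next state (0.0 if it has no actions, i.e. terminal)
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]

def greedy_policy(Q, state, actions):
    """The policy derived from Q: pick the action with the highest estimated value."""
    return max(actions, key=lambda a: Q[(state, a)])

# Usage: one rewarded step from "s0" by going "right"
value = q_update(Q, "s0", "right", reward=1.0, next_state="s1", next_actions=["left", "right"])
print(value)                                       # 0.1
print(greedy_policy(Q, "s0", ["left", "right"]))  # right
```

After a single rewarded step, "right" is the only action in "s0" with a positive estimate, so the greedy policy already prefers it. Repeating such updates over many steps is what lets the table approach the expected utility described above.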
This might sound “crazy”, but it is actually a powerful algorithm for finding the best way to get from point A to point B, and the model can discover that best way on its own, often very quickly. Problems of this kind appear in everyday life, for instance finding the best flight connections or the route on a map when you ask for directions. Q-Learning goes further than a fixed route-finding procedure, though: because it is a Machine Learning method, it learns from experience and keeps getting better, providing higher-quality information over time.
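The "point A to point B" idea can be sketched as tabular Q-Learning on a tiny, hypothetical map. Each node is a state, each outgoing edge is an action, and the reward for moving is the negative travel cost. The agent is never told the shortest route; it only sees the rewards it collects while exploring with an epsilon-greedy rule. The map, the costs, and the parameters (`episodes`, `alpha`, `gamma`, `epsilon`, `seed`) are all assumptions made up for illustration. In this map the cheapest route is A → B → C → D (total cost 3), compared with A → C → D (cost 5) and A → B → D (cost 6).

```python
import random
from collections import defaultdict

# Hypothetical toy map: GRAPH[node][neighbor] = travel cost of that edge
GRAPH = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 1, "D": 5},
    "C": {"D": 1},
    "D": {},  # goal: no outgoing actions
}
START, GOAL = "A", "D"

def train(episodes=500, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # (state, action) -> estimated return, starts at 0.0
    for _ in range(episodes):
        state = START
        while state != GOAL:
            actions = list(GRAPH[state])
            # Epsilon-greedy: explore with probability epsilon, otherwise exploit Q
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            reward = -GRAPH[state][action]  # travelling costs, so the reward is negative
            next_state = action             # here, the action is "move to that neighbor"
            best_next = max((Q[(next_state, a)] for a in GRAPH[next_state]), default=0.0)
            # Q-Learning update toward reward + discounted best next value
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q

def best_path(Q):
    """Follow the greedy policy from START until the goal is reached."""
    path, state = [START], START
    while state != GOAL:
        state = max(GRAPH[state], key=lambda a: Q[(state, a)])
        path.append(state)
    return path

Q = train()
print(best_path(Q))
```

For a map that is fully known in advance, classical shortest-path algorithms such as Dijkstra's are the usual tool. The point of the sketch is that Q-Learning reaches the same answer purely from trial-and-error rewards, which is what lets it improve as it gathers more experience.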
Q-Learning is relatively easy to understand and implement, and on small problems it can learn a useful policy very quickly.