You are essentially looking for patterns in text. Your solution doesn't look ideal, and it takes a lot more time to develop than just using a few Linux tools and piping the output of one into the next. 10 million lines is nothing. (Commenting based on your gist.)
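For example, something along these lines extracts every match of a regex across a directory, counts identical matches, and lists the most frequent ones. The function name `count_matches`, the pattern and the directory are placeholders I made up. None of them come from your gist.

```shell
# Hypothetical helper: count_matches PATTERN DIR
# grep -r  : recurse into DIR
# grep -h  : don't prefix matches with file names
# grep -o  : print only the matched part of each line
# grep -E  : use extended regular expressions
count_matches() {
  grep -rhoE -- "$1" "$2" \
    | sort \
    | uniq -c \
    | sort -rn \
    | head -n 20   # keep the 20 most frequent matches
}
```

Usage would be something like `count_matches 'ERROR_[A-Z]+' /path/to/logs`. grep and uniq stream their input, and GNU sort spills to temporary files when the input is large, so you don't have to hold everything in memory yourself.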
I would go even further and say that you could easily have installed something like https://oracle.github.io/opengrok/ for your organization in three commands, and gotten a lot more value out of it while resolving the issue.