C# Syntax Highlighter V1.0
Regular expressions are a pain in the neck. Unfortunately, they're also very useful. So I bullied my uncooperative brain into studying regular expressions and found myself hooked, once I got out of the "Why the @&#$*! doesn't this work?" phase. Anyway, I always wanted to make a syntax highlighter, and I finally got around to doing it. Here's the code for anybody who might want to learn more about regular expressions, or anybody who just wants yet another syntax highlighter in their toolbox.

#### Why You Should Highlight Code
Because you're a nice person. Because it will make your code more readable. Because your readers will love you. Any more questions?
Without further ado, then, here's the code for Syntax Highlighter V1.0, highlighted and formatted, I might add, using the Syntax Highlighter:
    using System;
    using System.Text;
    using System.Text.RegularExpressions;

    public class CsharpHighlighter
    {
        public string Highlight(string code)
        {
            StringBuilder patterns = new StringBuilder();

            //Regular expression for single-line comments
            patterns.Append(@"(/(?!//)/[^\r\n]*)|");
            //Regular expression for formal documentation comments
            patterns.Append(@"(///[^\r\n]*)|");
            //Regular expression for matching multi-line comments
            patterns.Append(@"(/\*.*?\*/)|");
            //Regular expression for matching double-quote string
            patterns.Append(@"((?<!@)""[^\r\n]*?(?<!\\)"")|");
            //Regular expression for matching hard quotes string
            patterns.Append(@"(@"".*?(?<!\\)"")|");
            //Regular expression for matching single-quote string
            patterns.Append(@"('[^\r\n]*?(?<!\\)')|");
            //Keywords
            patterns.Append(GetKeywords());

            Regex all = new Regex(patterns.ToString(), RegexOptions.Singleline);
            code = all.Replace(code, new MatchEvaluator(HandleMatch));

            //Wrap every line in an <li>
            Regex line = new Regex(@"^.*?$", RegexOptions.Multiline);
            code = line.Replace(code, new MatchEvaluator(HandleLines));

            //Turn tabs and spaces into &nbsp;s
            Regex tabsToSpaces = new Regex(@"<li>\t* *", RegexOptions.Singleline);
            code = tabsToSpaces.Replace(code, new MatchEvaluator(HandleTabs));

            //Break multi-line comments into lines properly
            Regex mlcToLines = new Regex(@"/\*.*?\*/", RegexOptions.Singleline);
            code = mlcToLines.Replace(code, new MatchEvaluator(HandleMLC));

            //Break hard strings properly
            Regex hardStrToLines = new Regex(@"@"".*?(?<!\\)""", RegexOptions.Singleline);
            code = hardStrToLines.Replace(code, new MatchEvaluator(HandleSTR));

            return "<ol class = \"code\">\n" + code + "</ol>\n";
        }

        //The capture group that matched tells us which kind of token we found
        private string HandleMatch(Match m)
        {
            //Single-line comments
            if (m.Groups[1].Success)
            {
                return "<span class = \"slc\">" + m.Value + "</span>";
            }
            //Formal documentation comments
            else if (m.Groups[2].Success)
            {
                return "<span class = \"fdc\">" + m.Value + "</span>";
            }
            //Multi-line comments
            else if (m.Groups[3].Success)
            {
                return "<span class = \"mlc\">" + m.Value + "</span>";
            }
            //String
            else if (m.Groups[4].Success || m.Groups[5].Success || m.Groups[6].Success)
            {
                return "<span class = \"str\">" + m.Value + "</span>";
            }
            //Keywords
            else if (m.Groups[7].Success)
            {
                return "<span class = \"kwd\">" + m.Value + "</span>";
            }
            else
            {
                return String.Empty;
            }
        }

        private string HandleLines(Match m)
        {
            //Add &nbsp; to empty lines so they show up
            if (m.Value.Trim().Length < 1)
            {
                return "<li>&nbsp;</li>";
            }
            else
            {
                //If we don't get rid of the new line character, the </li>
                //ends up on a, umm, new line, and the HTML source code looks
                //somewhat ugly.
                return "<li>" + m.Value.TrimEnd('\r', '\n') + "</li>";
            }
        }

        private string HandleMLC(Match m)
        {
            StringBuilder value = new StringBuilder(m.Value);
            value.Replace("<li>", "<li><span class = \"mlc\">");
            value.Replace("</li>", "</span></li>");
            return value.ToString();
        }

        private string HandleSTR(Match m)
        {
            StringBuilder value = new StringBuilder(m.Value);
            value.Replace("<li>", "<li><span class = \"str\">");
            value.Replace("</li>", "</span></li>");
            return value.ToString();
        }

        private string HandleTabs(Match m)
        {
            StringBuilder space = new StringBuilder();
            space.Append("<li>");
            //Emit one &nbsp; per leading tab or space; the "- 4" skips
            //the four characters of the "<li>" at the start of the match
            for (int i = 0; i < m.Value.Length - 4; i++)
                space.Append("&nbsp;");
            return space.ToString();
        }

        private string GetKeywords()
        {
            StringBuilder kwds = new StringBuilder(@"\b(abstract|as|base|bool|boolean|break|byte|case|catch|char|
                checked|class|const|continue|decimal|default|delegate|do|double|else|enum|event|explicit|
                extern|false|finally|fixed|float|for|foreach|get|goto|if|implements|implicit|in|instanceof|
                int|interface|internal|is|length|lock|long|namespace|native|new|null|object|operator|out|
                override|package|params|private|protected|public|readonly|ref|return|sbyte|sealed|set|short|
                sizeof|stackalloc|static|string|struct|super|switch|synchronized|this|threadsafe|throw|throws|
                true|try|typeof|uint|ulong|unchecked|unsafe|ushort|using|virtual|void|while)\b");
            //Strip the line breaks and indentation used to lay out the list above
            kwds.Replace("\r", "");
            kwds.Replace("\n", "");
            kwds.Replace(" ", "");
            return kwds.ToString();
        }
    }
#### What's Right About Version 1.0 of Syntax Highlighter
It uses a list rather than the `<pre>` tag, which is just plain awesome. I can't tell you how much I hate the `<pre>` tag. I've become especially miserable with it in recent days. I post a lot of code to this blog, but I really don't want to spend time wrestling with lines that are too long.
Each line is numbered, which is also very nice. For one thing, it's easier to tell the readers to "insert [new code] at line 243" than it is to say "insert [new code] after the line in the Highlight function after we make the regex pattern for matching double-quote strings." For another thing, line numbers just make it so much easier to read the code.
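The numbering itself comes for free from the browser: every source line becomes an `<li>` inside an `<ol>`. A minimal sketch of that step on its own is below. It uses `string.Split` rather than the `^.*?$` regex from the highlighter, and the class and method names are invented for the example.

```csharp
using System;
using System.Text;

// Hypothetical helper: wraps each line of text in <li> so an <ol> numbers it.
public static class LineNumbering
{
    public static string ToOrderedList(string code)
    {
        // Normalize Windows line endings so no stray \r ends up inside an <li>.
        string[] lines = code.Replace("\r\n", "\n").Split('\n');

        StringBuilder html = new StringBuilder("<ol class=\"code\">\n");
        foreach (string line in lines)
        {
            // As in the highlighter, blank lines get a &nbsp; so they show up.
            string content = line.Trim().Length == 0 ? "&nbsp;" : line;
            html.Append("<li>").Append(content).Append("</li>\n");
        }
        html.Append("</ol>\n");
        return html.ToString();
    }
}
```

Because the list items are ordinary block elements, long lines wrap inside their own numbered item instead of running off the page.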
#### Changes Still to Be Made

- Turn the string of `&nbsp;`s into padding-lefts.
- Right now, this is a C#-only highlighter.
- Read the keywords from an XML file.
Look for these changes in Version 2.0!
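As a rough idea of how the XML-driven keyword list might work, here is one possible sketch. The XML layout (`<keywords>` containing `<keyword>` elements) and the `KeywordLoader` name are pure assumptions, since no format has been decided, and the example uses LINQ to XML (`System.Xml.Linq`) for brevity.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical loader: builds the \b(kw1|kw2|...)\b pattern from XML shaped like
// <keywords><keyword>abstract</keyword><keyword>as</keyword></keywords>
public static class KeywordLoader
{
    public static string BuildPattern(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        // Trim stray whitespace and escape each keyword so it is matched literally.
        string[] escaped = doc.Descendants("keyword")
            .Select(k => Regex.Escape(k.Value.Trim()))
            .Where(k => k.Length > 0)
            .ToArray();

        return @"\b(" + string.Join("|", escaped) + @")\b";
    }
}
```

The returned string could stand in for the result of `GetKeywords()`, and swapping `XDocument.Parse` for `XDocument.Load` would read the list from a file, which is also one way to support languages other than C#.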