Converting Names to Mixed Case/Camel Case in SQL Server

I was recently given the task to convert names in a database to mixed case.  Many of them were already in mixed case, so I wanted to leave those alone and just focus on the ones that were in all caps or all lowercase.  No automated conversion could be perfect, but this got me pretty close.  To view a post on determining mixed case in Sql Server, click here.

Here’s my strategy:

  • If the name is already in mixed case, trust that it’s right.
  • If the name is not in mixed case, then do the following:
    • Trim up the name to eliminate white space
    • Account for the names that start with “Mc” like “McDavid”
    • Account for names with apostrophes like O’Reilly
    • Account for hyphenated names (married names) “Anderson-Johnson”
    • Account for multiple word names like “La Russa”
    • Make sure suffixes included in the names field are capitalized appropriately

Here’s the code:

CREATE FUNCTION [dbo].[GetCamelCaseName]
(
	@Name varchar(50)
)
RETURNS VARCHAR(50) WITH SCHEMABINDING
AS
BEGIN				
			-- Capitalize the first letter and make the rest lowercase
			SELECT @NameCamelCase =	UPPER(SUBSTRING(@Name, 1, 1)) +	LOWER(SUBSTRING(@Name, 2, 50))

			-- McDavid, McReynolds, etc. --> 1st character Uppercase, 2nd Lower, 3rd Uppercase, the rest lower
			IF (@NameCamelCase LIKE 'mc%')
				SELECT @NameCamelCase =	UPPER(SUBSTRING(@Name, 1, 1)) + LOWER(SUBSTRING(@Name, 2, 1)) + UPPER(SUBSTRING(@Name, 3, 1))  + LOWER(SUBSTRING(@Name, 4, 50))

			-- Hyphenated Names "Anderson-White" --> Capitalize after the hyphen
			IF (@NameCamelCase LIKE '%-%')
				SELECT @NameCamelCase =	SUBSTRING(@NameCamelCase, 1, CHARINDEX('-', @NameCamelCase)) + UPPER(SUBSTRING(@NameCamelCase, CHARINDEX('-', @NameCamelCase) + 1, 1)) + SUBSTRING(@NameCamelCase, CHARINDEX('-', @NameCamelCase) + 2, 50)

			-- O'Reilly, O'Neil, etc. --> 1st and 3rd upper case, the rest lower
			IF (@NameCamelCase LIKE '%''%')
				SELECT @NameCamelCase = SUBSTRING(@NameCamelCase, 1, CHARINDEX('''', @NameCamelCase)) + UPPER(SUBSTRING(@NameCamelCase, CHARINDEX('''', @NameCamelCase) + 1, 1)) + SUBSTRING(@NameCamelCase, CHARINDEX('''', @NameCamelCase) + 2, 50)

			-- I found some compound last names in parentheses in my sample --> capitalize after '('
			IF (@NameCamelCase LIKE '%(%')
				SELECT @NameCamelCase = SUBSTRING(@NameCamelCase, 1, CHARINDEX('(', @NameCamelCase) - 1) + '('  + UPPER(SUBSTRING(@NameCamelCase, CHARINDEX('(', @NameCamelCase) + 1, 1)) + SUBSTRING(@NameCamelCase, CHARINDEX('(', @NameCamelCase) + 2, 50)


			-- Hyphenated Names "De La Hoja" --> Capitalize after each space (do it twice for names with three words)
			--     Change the space to '?' each time (so you actually get to the second space), then replace the '?' with ' '
			IF (@NameCamelCase LIKE '% %')
				SELECT @NameCamelCase =	SUBSTRING(@NameCamelCase, 1, CHARINDEX(' ', @NameCamelCase) - 1) + '?'  + UPPER(SUBSTRING(@NameCamelCase, CHARINDEX(' ', @NameCamelCase) + 1, 1)) + SUBSTRING(@NameCamelCase, CHARINDEX(' ', @NameCamelCase) + 2, 50)
			IF (@NameCamelCase LIKE '% %')
				SELECT @NameCamelCase =	SUBSTRING(@NameCamelCase, 1, CHARINDEX(' ', @NameCamelCase) - 1) + '?'  + UPPER(SUBSTRING(@NameCamelCase, CHARINDEX(' ', @NameCamelCase) + 1, 1)) + SUBSTRING(@NameCamelCase, CHARINDEX(' ', @NameCamelCase) + 2, 50)

			SELECT @NameCamelCase = REPLACE(@NameCamelCase, '?', ' ')

			-- Suffixes and other miscellaneous things			
			SELECT @NameCamelCase = REPLACE(@NameCamelCase, ' sr', ' Sr')			
			SELECT @NameCamelCase = REPLACE(@NameCamelCase, ' jr', ' Jr')			
			SELECT @NameCamelCase = REPLACE(@NameCamelCase, ' ii', ' II')
			SELECT @NameCamelCase = REPLACE(@NameCamelCase, ' iii', ' III')
			SELECT @NameCamelCase = REPLACE(@NameCamelCase, ' DE ', ' de ')
			SELECT @NameCamelCase = REPLACE(@NameCamelCase, 'macdonald', 'MacDonald')


			if (@NameCamelCase LIKE '% iv') -- just do names that end in 'IV' so you don't change "Ivy" to "IVy"
				SELECT @NameCamelCase = REPLACE(@NameCamelCase, ' iv', ' IV')

			if ((@NameCamelCase = 'i') or (@NameCamelCase = 'ii') 
or (@NameCamelCase = 'iii') or (@NameCamelCase = 'iv'))
				SELECT @NameCamelCase = UPPER(@NameCamelCase)

			-- Return the result of the function
			RETURN ISNULL(@NameCamelCase, '')
			
		END

	RETURN ISNULL(@NameCamelCase, '')

END

Advertisements

SQL Select Multiple Columns into Comma-Separated List

I needed to select a field from multiple rows in one table into one column in another table.  I wanted the one column to be a comma-separated list as well.

One way to do this would be to use cursors, but the following post showed an easier way using XML Path:  http://bytes.com/topic/sql-server/answers/431513-getting-data-multiple-rows-into-one-column.

I ended up doing something like this:


SELECT  State.Name,
       (SELECT  City.Name AS [text()]
        FROM City
        WHERE City.StateID = State.StateID  && City.Population > 100000
        FOR XML Path(")) AS LargeCities
FROM State

Of course, this just gives me the list of city names all run together like this:  “DenverBoulderPuebloColorado Springs”.

I wanted to separate these by commas so I changed the query to “Select City.Name + ‘, ‘ AS [text()], but this gives me an extra comma on the end like this: “Denver, Boulder, Pueblo, Colorado Springs, “.

Of course, I could have used SUBSTRING(“the inner select statement”, 0, LEN(“the inner select statement”) – 2) and been fine, but I didn’t want to rerun the query and it would be harder to ready anyway.

What I ended up doing was inverting the positioning of the “City.Name” and the “, “.  This gave me the following:  “, Denver, Boulder, Pueblo, Colorado Springs”.

Then, all I had to do with the SUBSTRING command was this:  SUBSTRING(“the inner select statement”, 2, 10000).  The “2” as the starting position got me past the first comma, and there was no trailing comma to worry about, since each city was preceded by a comma rather than followed by a comma.  So, the final query looks like this:


SELECT  State.Name,
        SUBSTRING((SELECT ', ' +  City.Name AS [text()]
                   FROM City
                   WHERE City.StateID = State.StateID  && City.Population > 100000
                   FOR XML Path("))), 2, 10000) AS LargeCities
FROM State

and would give the following results:

State Name      LargeCities
Colorado               Denver, Boulder, Pueblo, Colorado Springs

This strategy could be applied to regular code as well when building a comma-separated list.


string list = "";
for (int i = 0; i < items.count; i++)
{
   list += ", " + items[i].Name;
}
list = list..Substring(2);

Unexpected benefits of Precompilation of LINQ

Add to FacebookAdd to DiggAdd to Del.icio.usAdd to StumbleuponAdd to RedditAdd to BlinklistAdd to TwitterAdd to TechnoratiAdd to Yahoo BuzzAdd to Newsvine

I once had a manager who told me – I can solve any maintenance problem by adding a layer of abstraction.  I can solve any performance problem by removing a layer of abstraction.

I think LINQ to SQL is a wonderful way to abstract the persistence layer elegant, easy to use, easy to manipulate, and easy to maintain lines of code.  Instead of writing SQL which amounts to “how to retrieve” the data – you manipulate an expression tree that gets closer to specifying “what data I want”.  The upside of this is huge – you can change the expression tree at any level of your code, and let .NET decide how to best write the SQL at the last possible moment – which effectively gits rid of dealing with intermediate results and inefficiently written SQL.  Unfortunately, this abstraction does indeed cause a performance hit – the translation/compilation of the tree to SQL – and it’s probably much bigger than you would think.  See http://peterkellner.net/2009/05/06/linq-to-sql-slow-performance-compilequery-critical/ to see what I mean.  In my analysis (using ANTS Profiler), when using uncompiled LINQ – the performance hit is usually about 80% compilation and only 20% retrieving the data!  Thankfully, .NET does allow you to precompile a LINQ query and save the compilation to use over and over again.

Your natural tendency when hearing those kind of numbers might be to precompile every single LINQ query you write.  There’s a big downside to doing that, though – you lose the ability to manipulate the compiled query in other parts of your code.  Another downside is that the precompilation code itself is fairly ugly and hard to read/maintain.

I’m a big believer in avoiding “premature optimization”.  What happens if you precompile everything, and in a version or two Microsoft resolves the issue and caches compilations for you behind the scenes?  You have written a ton of ugly code that breaks a major benefit of LINQ to SQL and is totally unnecessary.

Instead, I recommend you go after the low hanging fruit first – precompile the most frequently accessed queries in your application and the ones that gain no benefit from manipulating the expression tree.  In the applications I work on – there is a perfect case that fits both of these – the “get” method that returns the LINQ object representation of a single row in the database.  These are hit quite often – and there is absolutely no case where the expression tree is further refined.

The old way it was written:

	public static Item Get(int itemid) {
		return (from i in DataContext.Items
			where i.ItemID == itemid
		select i).First();
	}

The new way with Precompiled LINQ:

	private static Func<ModelDataContext, int, Item>
		GetQuery = CompiledQuery.Compile(
			(ModelDataContext DataContext, int itemid) =>

				(from i in DataContext.Items
					where i.ItemID == itemid
				select i).First()

				);

	public static Item Get(int itemid) {
		return GetQuery.Invoke(DataContext,itemid);
	}

Applying this fairly simple change to the application, I’d estimate we got 80%+ of the benefits of compiled LINQ, at the expense of a few extra lines of code per object/table and absolutely no loss of the tree manipulation.