Monday, February 27, 2006

Re: st: RE: help cleaning string variable

Great! That's exactly what I needed. Thank you so much Jennifer. Best, Mario

---- Original message ----
>Date: Mon, 27 Feb 2006 16:48:03 -0800
>From: "Marino, Jennifer" <>
>Subject: st: RE: help cleaning string variable
>To: <>
>
>I don't know if it's necessary in Stata 9 - might have been put into the
>official egen package if it was used enough - but for Stata 8 the
>fabulous ado package -egenmore-, by Dr. Cox, has a tailor-made option
>for egen called "sieve":
>
>Excerpt from the helpfile:
>
>sieve(strvar) , { keep(classes) | char(chars) | omit(chars) }
>    selects characters from strvar according to a specified criterion
>    and generates a new string variable containing only those characters.
>    This may be done in three ways. First, characters are classified using
>    the keywords alphabetic (any of a-z or A-Z), numeric (any of 0-9),
>    space or other. keep() specifies one or more of those classes:
>    keywords may be abbreviated by as little as one letter. Thus keep(a n)
>    selects alphabetic and numeric characters and omits spaces and other
>    characters. Note that keywords must be separated by spaces. Alternatively,
>    char() specifies each character to be selected or omit() specifies each
>    character to be omitted. Thus char(0123456789.) selects numeric
>    characters and the stop (presumably as decimal point); omit(" ") strips
>    spaces and omit(`"""') strips double quotes. (Stata 7 required.)
>
>Hope that helps.
>Jen
>
>
>-----Original Message-----
>From:
>[] On Behalf Of Mario Macis
>Sent: Monday, February 27, 2006 1:44 PM
>To:
>Subject: st: help cleaning string variable
>
>
>Dear statalist users,
>I need to clean a string variable containing the names of a large number
>of firms (over 30,000). In many cases these names contain extra
>characters that I would like to eliminate, such as % or " or ^. These
>characters always come at the beginning of the name. I know that Stata
>has a command (trim) that eliminates leading and trailing blank spaces
>from string variables. Is there a similar command to eliminate leading
>"undesired" characters? Thank you so much for your help. Best, Mario
>
>--
>Mario Macis
>PhD Candidate
>Department of Economics
>University of Chicago
>*
>* For searches and help try:
>*
>*
>*

--
Mario Macis
PhD Candidate
Department of Economics
University of Chicago
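
The helpfile excerpt quoted above describes three ways of selecting characters (keep(), char(), omit()). As a rough illustration of those semantics, here is a minimal sketch in Python rather than Stata. It is not the egenmore implementation; the names sieve, char_class and strip_leading are hypothetical and exist only for this example. It also shows one point that matters for the original question: going by the helpfile wording, omit() drops the listed characters wherever they occur in the string, whereas the question asked only about leading characters, so a leading-only strip is shown next to it for comparison.

```python
import string

# Hypothetical names throughout: an illustration of the quoted help text,
# not the egenmore source code.
CLASSES = ("alphabetic", "numeric", "space", "other")


def char_class(c):
    """Classify one character using the four classes named in the help text."""
    if c in string.ascii_letters:
        return "alphabetic"
    if c in string.digits:
        return "numeric"
    if c == " ":
        return "space"
    return "other"


def sieve(text, keep=None, char=None, omit=None):
    """Return only the characters of text selected by exactly one criterion."""
    if sum(opt is not None for opt in (keep, char, omit)) != 1:
        raise ValueError("specify exactly one of keep, char, omit")
    if char is not None:
        # char: keep only the listed characters
        return "".join(c for c in text if c in char)
    if omit is not None:
        # omit: drop the listed characters wherever they occur
        return "".join(c for c in text if c not in omit)
    # keep: space-separated class keywords, abbreviable down to one letter
    wanted = set()
    for word in keep.split():
        matches = [name for name in CLASSES if name.startswith(word.lower())]
        if not matches:
            raise ValueError(f"unknown class keyword: {word!r}")
        wanted.add(matches[0])
    return "".join(c for c in text if char_class(c) in wanted)


def strip_leading(text, chars):
    """Remove the listed characters from the start of text only."""
    return text.lstrip(chars)


name = '^%Smith & Co. 100%'
print(sieve(name, omit='%^"'))        # Smith & Co. 100   (trailing % also removed)
print(strip_leading(name, '%^"'))     # Smith & Co. 100%  (only the leading run removed)
print(sieve("A1 b2!", keep="a n"))    # A1b2
```

If the unwanted characters can also appear legitimately inside a firm name, the leading-only variant is the safer of the two; if they are never wanted anywhere, the omit-style filter is simpler.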


