Wednesday, October 06, 2004


Avoiding Spam

Avoiding spam-bots
by Ralph Arvesen

Spam-bots scan the web and harvest email addresses from web pages, newsgroups, and other sources. This article shows you a simple technique you can use in web pages to avoid spam-bots. The idea is used in the FotoVision sample I created, but I thought it would be useful to discuss this particular piece outside of the FotoVision sample. The idea is pretty simple: instead of storing the real email address in the HTML, an encoded version of the address is stored and decoded on the client when necessary.
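The excerpt does not say which encoding FotoVision uses, so the following is only a minimal sketch of the general idea, assuming a simple scheme that stores the address as comma-separated character codes. The names encode_address and decode_address are hypothetical, and in a real page the decoding step would run in client-side script rather than in Python.

```python
def encode_address(address: str) -> str:
    """Turn an email address into comma-separated character codes."""
    return ",".join(str(ord(ch)) for ch in address)


def decode_address(encoded: str) -> str:
    """Reverse encode_address; a web page would do this step in client-side script."""
    return "".join(chr(int(code)) for code in encoded.split(","))


if __name__ == "__main__":
    encoded = encode_address("user@example.com")
    # The stored text holds only digits and commas, so there is no "@" to harvest.
    print(encoded)
    print(decode_address(encoded))
```

A scheme this simple only stops harvesters that look for literal addresses; a bot that executes scripts would still see the decoded result.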

By BrainJar

CAPTCHA stands for "completely automated public Turing test to tell computers and humans apart." In other words, it is a program that tells humans from machines using some type of generated test: one that most people can easily pass but a computer program cannot.

You've probably encountered such tests when signing up for an online email or forum account. The form might include an image of distorted text, which you are required to type into a text field.

The idea is to prevent spammers from using web bots to automatically post form data in order to create email accounts (for sending spam) or to submit feedback comments or guestbook entries containing spam messages. The text in the image is usually distorted to prevent the use of OCR (optical character recognition) software to defeat the process. Hotmail, PayPal, Yahoo and a number of blog sites have employed this technique.

This article demonstrates how to create such an image and employ it within an ASP.NET web form.

A Movable Type plugin to eradicate comment and trackback spam

Learning Movable Type: Concerning Spam

Update September 1, 2004: Jay Allen's MT-Blacklist now works with MT3.1, though not all of the kinks are completely worked out of it.

Spammers have discovered bloggers, and sooner or later, if you allow comments, trackback pings, or the Movable Type send-entry form on your weblog, you will get spammed.

Weblog spam appears in many flavors:

1) Basic comment spam. The spammer leaves a short uneventful message in a comment field in one of your entries. The spam comes from the URL placed in the comments URL field. These URLs link back to every conceivable scam. The spammers leave URLs here to create a link from your site to theirs, thus increasing their Google ranking. Spammers are also now linking to legitimate sites that have not cleared their pages of comment spam, thus increasing the Google rank of those spam links. This all goes to show you that you really do need to check the links of anyone who leaves a comment on your site.

2) TrackBack spam. Spammers have discovered how to take advantage of TrackBack. TrackBack spam is very similar to comment spam. The spammer sends TrackBack pings to your site that direct viewers to a totally unrelated URL.

3) Comment flooding. The spammer uses an automated computer bot to flood your blog with spam messages, up to hundreds in an hour. The spammer doesn't necessarily leave a URL, but leaves garbage messages, almost like a graffiti artist.

4) Referral spam. The spammer links to your site from their site, and then pings your site through their link, thus creating a reference and link to their site on the statistics referral log of your website. When you are reviewing your stats and see the reference to an odd site (e.g., Paris Hilton), clicking on the link takes you to their site. Many people list "referrals" on their site publicly, so by spamming referral logs, the spammer not only gets a link on your referral log (which is picked up by Google) but may even get a link on your main page.

5) Send-mail spam. If you are using MT's "send entry" (a form to send an email of your entry to a friend), the spammer uses your mt-send-entry.cgi script to send spam or viruses to others using your email address in the return field. You can tell that this might be happening if you start getting rejected emails bounced back to you that you never sent in the first place. There was a vulnerability in earlier versions of MT that allowed this to happen.

How do you fight spam on your blog?

MT3 offers TypeKey authentication, which gives you more control over who can comment on your blog. If you are using MT2.661 or an earlier version, however, TypeKey is not an option. Movable Type does offer the ability to ban comments from certain IP addresses, but most spammers use dynamic addresses that change all the time. Still, spam can be fought in a variety of ways.

Highly Recommended Measures:

1) MT-Blacklist. Jay Allen's MT-Blacklist plugin is your first line of defense. Once installed, MT-Blacklist checks comments and trackbacks against a known list of spam URLs. If a comment or trackback contains one of these URLs, the comment is blocked before it ever appears on your site. If you get a spam comment whose URL is not already listed, the email notification of the comment includes a link you can click to invoke MT-Blacklist, remove the comment, and add the commenter's URL to your blacklist. There is a master blacklist that is maintained by Jay Allen and contributed to by hundreds of MT bloggers. You can update your own blacklist with the listings from the community blacklist. You can also use MT-Blacklist to screen content and block comments for use of foul language. See Jay's special instructions for dealing with trackback spam if you are using Blacklist version 1.64.

The plugin is easy to install and use. If you are using MT3.1, you can get the Blacklist plugin from the Plugin Pack. An earlier version of the blacklist, MT-Blacklist v1.64, will work with MT version 2.661 and you can get it from Jay Allen's website.
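To make the blacklist check in measure 1 concrete, here is a rough sketch of the idea in Python. It is not MT-Blacklist's code or list format: the patterns, the BLACKLIST name, and the functions find_blacklisted and accept_comment are all made up for illustration.

```python
import re

# Hypothetical blacklist entries; a real list would come from a shared, maintained source.
BLACKLIST = [r"cheap-pills\.example", r"casino-[a-z]+\.example"]


def find_blacklisted(text: str, patterns=BLACKLIST):
    """Return the first blacklist pattern found in text, or None."""
    for pattern in patterns:
        if re.search(pattern, text, re.IGNORECASE):
            return pattern
    return None


def accept_comment(body: str, author_url: str = "") -> bool:
    """Reject a comment when its body or author URL matches the blacklist."""
    return find_blacklisted(body + " " + author_url) is None


if __name__ == "__main__":
    print(accept_comment("Nice post!", "http://friend.example/blog"))
    print(accept_comment("great site", "http://www.cheap-pills.example/buy"))
```

Checking both the comment body and the URL field matters because, as described above, the spam link usually sits in the URL field while the message itself looks harmless.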

2) Rename mt-comments.cgi. Spammers find MT sites by searching for mt-comments.cgi in Google. Rename the file and your site will be harder for them to find. To do this, copy the mt-comments.cgi script to a new name, ending with the .cgi suffix. Edit the mt.cfg file to reflect the new name. To edit the mt.cfg file, find these lines of code:

# CommentScript
# TrackbackScript
# SearchScript
# XMLRPCScript
# ViewScript
Remove the # sign from the CommentScript line and set its value to the name of your renamed comments .cgi file. Save and upload your adjusted mt.cfg file. Then delete the old script. Remember to set the permissions of the new .cgi file to 755. Make sure any references to the old cgi file are updated in your templates. Rebuild your blog files.

Repeat this process with mt-tb.cgi to protect against TrackBack spam.

3) Don't use popup comments. Don't link to the comment scripts from the main page of your site. In the default MT2.x templates, the MT main index template links to the comment popup script with the text Comments (n), which is pretty easy to scan for by spammers. Instead of invoking a comment popup, which is the default in the main index:

<a href="<$MTCGIPath$><$MTCommentScript$>?entry_id=<$MTEntryID$>" onclick="OpenComments(this.href); return false">Comments (<$MTEntryCommentCount$>)</a>
replace that code with the following:

<a href="<$MTEntryPermalink$>#comments">Comments (<$MTEntryCommentCount$>)</a>
and link to the comments section of the individual entry.

Shifting your trackbacks from popups to inline can help with trackback spam. See the LMT note on Trackback Spam for details.

4) Change the name of mt-send-entry.cgi. If you are using MT2.x, make sure you have upgraded to MT2.661. The vulnerability was addressed in later versions of MT, but I understand that it is still a good idea to change the name of the cgi. Most spammers who are spoofing your email address are using automated bots to find this script name. If you change the name of the script, you dissuade 99% of these spammers. In your CGI bin, change the name of this file to something else, also ending with the .cgi suffix. In your individual entry archive template, or any template where you have the send entry form, change this line

<form method="post" action="<$MTCGIPath$>mt-send-entry.cgi">

to

<form method="post" action="<$MTCGIPath$>the-new-name-of-this-file.cgi">
and rebuild your files.

Additional Measures:

5) IP Banning. Movable Type allows you to ban comments from identified IP addresses. In the Edit Comment window, above the author box you can find the author's IP address. Copy this address and paste it into the IP Banning section of your Weblog Configuration editing window. Note that many spammers and trolls use dynamic IP addresses, in which case this method won't necessarily be effective. However, you can use this method to prevent repeat spams from more unsophisticated posters. See the MT Manual Section on IP Banning for more information. The downside of IP banning is that by banning one IP address (AOL's for example) you may be banning many legitimate posters. See Jay Allen's discussion of this here.

6) Force "preview" before allowing comment submissions. Forcing site visitors to preview their comments before submitting them will not only give you more error-free comments, but will put yet another hurdle up against automatic comment spam bots. Just remove this line of code:

<input style="font-weight: bold;" type="submit" name="post" value=" Post " />
from your Individual Entry Archive and your Comment Listing Template.

7) Use a "Captcha". A captcha is a security code that a commenter must enter in order for her comment to load. The benefit is that it screens out automated comment spam bots. The downside is that it keeps visually disabled people from contributing a comment. James Seng has posted a captcha security plugin for Movable Type.

8) Require approval before a comment posts. One way to ensure that your readers never have to see a spam message is to personally approve comments before they are posted. If you have a low comment volume site, this may be a viable option. For MT 2.661 users, Scripty goddess has posted a script/MT hack to do this. MT3 has sophisticated comment moderation features built right in.

9) Fight referral spam by amending your .htaccess file. Referral spam is annoying, but it doesn't affect the public display of your site unless you are publishing your referral log. If it bothers you enough that spam companies are benefiting by creating backlinks to their sites on your referral logs, you can amend your .htaccess file (see What is .htaccess?) with the following code:

SetEnvIfNoCase Referer ".*(casino|gambling|poker|porn|sex|nude|xxx|hilton|pics|video).*" BadReferrer
order deny,allow
deny from env=BadReferrer
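Before deploying a rule like this, it can help to try the pattern against a few sample referrers. The sketch below copies the alternation from the rule above and matches it case-insensitively, which is roughly what SetEnvIfNoCase does; the function name is_bad_referrer and the sample URLs are invented for the example.

```python
import re

# Same alternation as the SetEnvIfNoCase rule, matched case-insensitively.
BAD_REFERRER = re.compile(
    r".*(casino|gambling|poker|porn|sex|nude|xxx|hilton|pics|video).*",
    re.IGNORECASE,
)


def is_bad_referrer(referer: str) -> bool:
    """True when the Referer header would be flagged by the pattern."""
    return BAD_REFERRER.search(referer) is not None


if __name__ == "__main__":
    samples = [
        "http://online-casino.example/",
        "http://friend.example/blog/",
        "http://essex-photos.example/",      # contains "sex": a false positive
        "http://family-video.example/trip",  # contains "video": also blocked
    ]
    for url in samples:
        print(url, is_bad_referrer(url))
```

Because the words are matched anywhere in the referrer, broad terms such as "pics" or "video" will also block some legitimate visitors, so the list is worth trimming to the spam you actually see.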

10) Close old comments. David Raynes' plugin allows you to close entries to comments for any entries older than a given number of days (defaults to five).

Breaking a Visual CAPTCHA

This is the homepage of the Shape Contexts based approach to break Gimpy, the CAPTCHA test used at Yahoo! to screen out bots. Our method can successfully pass that test 92% of the time. See EZ-Gimpy in action at Yahoo! The approach we take uses general purpose algorithms that have been designed for generic object recognition. The same basic ideas have been applied to finding people in images, matching handwritten digits, and recognizing 3D objects.

Mitigating Denial of Service Attacks with Web Resources


Another interesting security discussion from our on-going security push and the saga around finding a solution that's worth sharing (and hopefully reading)... this time it's about Web Resources which I had blogged about here.

As background for this, recall that the URL for a Web resource takes the form: WebResource.axd?a=MyControls&r=Bold.gif&t=632059604175183419

The interesting parameter is "t". This parameter is the timestamp of the assembly containing the resource. We add this parameter to the URL, so that rebuilding the assembly generates a new URL, causing the browser to ignore what it has cached, and fetch the new data. We aggressively cache for best performance by telling the browser to cache the data and never check for an updated version. Then we vary the output cache by the "t" parameter on the server.

Unfortunately, however, this means that some malicious person out there could be simulating requests with random "t" values, and we'd just fill up the cache. Therein lies the denial of service (DOS) attack. Though these are somewhat hard to deal with, we didn't want to contribute to the problem.

So the first thing that we could do was validate that the incoming "t" value matched the timestamp of the assembly, and return an error status code otherwise. This takes care of the DOS attack, but introduces a new problem. The timestamp of the assembly on each machine on a Web farm isn't the same. So this simply doesn't work.

The next solution was to drop the "t" and put in the assembly version. This actually got checked in... but it breaks the development scenario (oops!), where the assembly version isn't updated every build (nor should it be). In fact, assembly versions typically don't change even when a patch is put out. For example, the assembly version of System.Web doesn't change between QFEs. So this change obviously had to be backed out, but before making any more changes, we had to ensure we had a solution that worked.

To lay down the requirements: We needed a "t"-like parameter so the browser's cache model would work. This also enabled the control development scenario really well, wherein changes were automatically picked up when the resource was modified. Pretty compelling and not something to give up on just yet. But we also didn't want to vary the cache per "t", since that led to the security hole.

A little discussion actually led us to realize a solution that was actually pretty easy, and was probably staring at us for the longest time. Just needed a few folks from the team to gather around and have a focused discussion on the solution. By default we were storing the cached data in disk-output cache, whose lifetime went beyond the lifetime of an app-domain. The "t" was allowing us to work against the disk-based output cache. What we simply needed to do was turn off disk-based output cache, and rely on in-memory cache alone (we still get the full benefits of the cache, including kernel-mode caching, so we aren't losing much). Now, whenever the bin directory was updated causing a restart or the application was explicitly restarted to pick up GAC changes, the in-memory cache would be cleaned up, and new content would be read from the assembly resource on-demand. To finish off the fix, we just needed to drop the version parameter, and bring back "t" with its prior semantics sans the vary-by behavior.

That takes care of denial of service attacks and retains performance. Just what we were looking for!

As a side note, watch out what you're caching in your own application as well, and make sure you're not susceptible to attacks that make you flood the cache with spurious entries.
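As a rough illustration of that side note (and not of how ASP.NET implements its output cache), the sketch below shows two habits that limit cache flooding: keep the cache bounded, and build the cache key only from the parameters that actually select content. The class BoundedCache and the function cache_key are hypothetical names; the "a", "r" and "t" parameters are the ones from the WebResource.axd URL above.

```python
from collections import OrderedDict


class BoundedCache:
    """A small LRU cache: old entries are evicted instead of growing without limit."""

    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    def get_or_load(self, key, loader):
        if key in self._entries:
            self._entries.move_to_end(key)      # mark as recently used
            return self._entries[key]
        value = loader(key)
        self._entries[key] = value
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)   # drop the least recently used entry
        return value

    def __len__(self):
        return len(self._entries)


def cache_key(params: dict) -> tuple:
    """Key on the parameters that select content and ignore the rest.

    "a" (assembly) and "r" (resource) pick the data; "t" only busts the
    browser cache, so it is deliberately left out of the server-side key.
    """
    return (params.get("a"), params.get("r"))


if __name__ == "__main__":
    cache = BoundedCache(max_entries=100)
    for t in range(1000):  # simulated requests with random-looking "t" values
        params = {"a": "MyControls", "r": "Bold.gif", "t": str(t)}
        cache.get_or_load(cache_key(params), lambda key: b"image bytes")
    print(len(cache))
```

With the "t" value excluded from the key, a thousand requests that differ only in "t" still occupy a single cache entry.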

An ASP.NET Framework for Human Interactive Proofs
Stephen Toub

To summarize, we have the spear and we have the shield. All we can do is watch them evolve!


By 蝈蝈俊.net






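In 1980, to give every Chinese character a single nationwide code, China issued its first national standard for Chinese character encoding: GB2312-80, "Chinese Character Coded Character Set for Information Interchange, Basic Set", GB2312 for short. This character set is the foundation on which Chinese information processing in China developed, and it is the common standard for all Chinese character systems in the country. Later the national standard GB18030-2000, "Chinese Character Coded Character Set for Information Interchange, Extension of the Basic Set", GB18030 for short, was published. Anyone whose programming touches encoding and localization should be familiar with GB18030. It is the most important Chinese character encoding standard in China after GB2312-1980 and GB13000-1993, and one of the foundational standards that computer systems in China will have to follow.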


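These two tables are really the same thing: one writes the zone and position in hexadecimal, the other writes them as decimal numbers. For example, the hexadecimal code of the character "好" is ba c3. The first two digits are the zone and the last two are the position: ba is zone 26, and "好" is the 35th character in that zone, which is position c3, so its numeric code is 2635. That is how GB2312 zone-position codes work. The zone-position code table shows that the zones up to zone 15 (AF) contain no Chinese characters, only a small number of symbols. Chinese characters start at zone 16 (B0), which is why the Chinese characters of the GB2312 character set begin at zone 16.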
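The "好" example can be checked with a few lines of Python. This is only a small sketch: the function name zone_position is made up here, and it relies on the fact that each GB2312 byte is the zone or position number plus 0xA0, which is consistent with ba mapping to zone 26 above.

```python
def zone_position(ch: str):
    """Return (hex bytes, zone, position) for a GB2312 character."""
    raw = ch.encode("gb2312")      # two bytes for a Chinese character
    zone = raw[0] - 0xA0           # first byte  -> zone number
    position = raw[1] - 0xA0       # second byte -> position inside the zone
    return raw.hex(), zone, position


if __name__ == "__main__":
    hex_code, zone, position = zone_position("好")
    print(hex_code, zone, position)       # bac3 26 35
    print(f"{zone:02d}{position:02d}")    # 2635
```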


The Encoding.GetBytes() method encodes all or part of a given String or character array into a byte array.
The Encoding.GetString() method decodes a given byte array into a string.


Encoding gb = System.Text.Encoding.GetEncoding("gb2312");
byte[] bytes = gb.GetBytes("好");


string lowCode = System.Convert.ToString(bytes[0], 16); // first byte as two hex digits (the zone, "ba")
string hightCode = System.Convert.ToString(bytes[1], 16); // second byte as two hex digits (the position, "c3")

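So we can randomly generate a two-byte array and decode it with GetString() to get a Chinese character. For a Chinese CAPTCHA, though, some ranges have to be excluded: the zones up to zone 15 (AF) contain no Chinese characters, only a few symbols, and Chinese characters start at zone 16 (B0); and the characters from zone D7 onward are complicated, rarely seen ones. The first hex digit of the randomly generated code therefore has to be B, C or D, and if the first digit is D, the second digit must stay below 7. Looking at the code table again, the first and last positions of every zone are empty and hold no character, so if the third digit is A the fourth digit cannot be 0, and if the third digit is F the fourth digit cannot be F.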
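The byte ranges described above can be sketched in Python as follows. This is an illustration, not the article's code: the names random_hanzi and random_hanzi_string are made up, and the sketch assumes zone D7 is excluded along with the later zones, so the first byte runs from B0 to D6.

```python
import random


def random_hanzi(rng: random.Random) -> str:
    """Pick one common Chinese character by building a GB2312 byte pair."""
    first = rng.randint(0xB0, 0xD6)    # zones B0..D6; D7 and later are skipped
    second = rng.randint(0xA1, 0xFE)   # positions A1..FE; A0 and FF are never used
    return bytes([first, second]).decode("gb2312")


def random_hanzi_string(length: int, seed=None) -> str:
    """Build a string of random characters, e.g. the text of a CAPTCHA."""
    rng = random.Random(seed)
    return "".join(random_hanzi(rng) for _ in range(length))


if __name__ == "__main__":
    print(random_hanzi_string(4))
```

Drawing the two bytes as whole numbers gives the same ranges as the digit-by-digit rules, with one generator used for every draw.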


using System;
using System.Text;

namespace ConsoleApplication
{
    class ChineseCode
    {
        public static void Main()
        {
            // Get the GB2312 encoding
            Encoding gb = Encoding.GetEncoding("gb2312");

            // Generate the byte pairs for 4 random Chinese characters
            byte[][] codes = CreateRegionCode(4);

            // Decode each byte pair into a Chinese character
            string str1 = gb.GetString(codes[0]);
            string str2 = gb.GetString(codes[1]);
            string str3 = gb.GetString(codes[2]);
            string str4 = gb.GetString(codes[3]);

            Console.WriteLine(str1 + str2 + str3 + str4);
        }

        // Returns strlength two-byte GB2312 codes, one per random Chinese character
        public static byte[][] CreateRegionCode(int strlength)
        {
            // One generator for the whole run; re-seeding it inside the loop
            // from the previous digit makes repeated values more likely, not less
            Random rnd = new Random();

            byte[][] codes = new byte[strlength][];

            for (int i = 0; i < strlength; i++)
            {
                // Digit 1: B, C or D
                int r1 = rnd.Next(11, 14);

                // Digit 2: 0-6 after D (zone D7 and later are skipped), otherwise 0-F
                int r2 = (r1 == 13) ? rnd.Next(0, 7) : rnd.Next(0, 16);

                // Digit 3: A-F
                int r3 = rnd.Next(10, 16);

                // Digit 4: not 0 after A, not F after F (those positions are empty)
                int r4;
                if (r3 == 10)
                    r4 = rnd.Next(1, 16);
                else if (r3 == 15)
                    r4 = rnd.Next(0, 15);
                else
                    r4 = rnd.Next(0, 16);

                // Combine the four hex digits into the two bytes of the character
                byte byte1 = (byte)(r1 * 16 + r2);
                byte byte2 = (byte)(r3 * 16 + r4);
                codes[i] = new byte[] { byte1, byte2 };
            }

            return codes;
        }
    }
}

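Once you can generate random Chinese characters, you can use .NET GDI+ to draw whatever CAPTCHA image you need. How to generate the image itself, and how to vary the height and width of its characters, is already covered by plenty of articles online, so for reasons of space it is not repeated here. One thing to note: the code above only runs on a Chinese version of Windows, because that version ships with the GB character set. On an operating system in another language you need to install the GB character set first.

2.1 Display the content in a font that is not commonly seen.
2.2 Tilt the characters at random angles.
2.3 Draw each character in a different random color.
2.4 Place the content at random positions.
2.5 Draw the text with a color gradient, so that a single character is made up of several colors.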



Related article: CAPTCHA (C#)

/* Copyright all(c) 2005 ZhongFeng, */
using System;
using System.Drawing;
using System.Web;

public class ValidateCode : System.Web.UI.Page
{
    private void Page_Load(object sender, System.EventArgs e)
    {
        this.CreateCheckCodeImage(GenerateCheckCode());
    }

    #region Web Form Designer generated code
    override protected void OnInit(EventArgs e)
    {
        // CODEGEN: This call is required by the ASP.NET Web Form Designer.
        InitializeComponent();
        base.OnInit(e);
    }

    /// Required method for Designer support - do not modify
    /// the contents of this method with the code editor.
    private void InitializeComponent()
    {
        this.Load += new System.EventHandler(this.Page_Load);
    }
    #endregion

    private string GenerateCheckCode()
    {
        char code;
        string checkCode = String.Empty;

        System.Random random = new Random();

        // Build a 5-character code of digits and upper-case letters.
        // The character type and the character are drawn separately; deriving
        // both from one number would only ever produce even digits.
        for (int i = 0; i < 5; i++)
        {
            if (random.Next(2) == 0)
                code = (char)('0' + random.Next(10));
            else
                code = (char)('A' + random.Next(26));

            checkCode += code.ToString();
        }

        // Remember the code in a cookie so the login page can compare it later
        Response.Cookies.Add(new HttpCookie("CheckCode", checkCode));

        return checkCode;
    }

    private void CreateCheckCodeImage(string checkCode)
    {
        if (checkCode == null || checkCode.Trim() == String.Empty)
            return;

        System.Drawing.Bitmap image = new System.Drawing.Bitmap((int)Math.Ceiling((checkCode.Length * 12.5)), 22);
        Graphics g = Graphics.FromImage(image);

        try
        {
            Random random = new Random();

            // Clear the background
            g.Clear(Color.White);

            // Draw background noise lines
            for (int i = 0; i < 25; i++)
            {
                int x1 = random.Next(image.Width);
                int x2 = random.Next(image.Width);
                int y1 = random.Next(image.Height);
                int y2 = random.Next(image.Height);

                g.DrawLine(new Pen(Color.Silver), x1, y1, x2, y2);
            }

            // Draw the code with a blue-to-dark-red gradient
            Font font = new System.Drawing.Font("Arial", 12, (System.Drawing.FontStyle.Bold | System.Drawing.FontStyle.Italic));
            System.Drawing.Drawing2D.LinearGradientBrush brush = new System.Drawing.Drawing2D.LinearGradientBrush(new Rectangle(0, 0, image.Width, image.Height), Color.Blue, Color.DarkRed, 1.2f, true);
            g.DrawString(checkCode, font, brush, 2, 2);

            // Draw foreground noise pixels
            for (int i = 0; i < 100; i++)
            {
                int x = random.Next(image.Width);
                int y = random.Next(image.Height);

                image.SetPixel(x, y, Color.FromArgb(random.Next()));
            }

            // Draw the border
            g.DrawRectangle(new Pen(Color.Silver), 0, 0, image.Width - 1, image.Height - 1);

            // Send the image to the browser as a GIF
            System.IO.MemoryStream ms = new System.IO.MemoryStream();
            image.Save(ms, System.Drawing.Imaging.ImageFormat.Gif);
            Response.ClearContent();
            Response.ContentType = "image/Gif";
            Response.BinaryWrite(ms.ToArray());
        }
        finally
        {
            g.Dispose();
            image.Dispose();
        }
    }
}

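Suppose the CAPTCHA generator page above is named CheckCode.aspx. The login page can then display the generated image with an HTML img element whose src is CheckCode.aspx, and the click handler of its login button checks the code the user typed:

private void btnLogin_Click(object sender, System.Web.UI.ImageClickEventArgs e)
{
    // Without the cookie there is nothing to compare against
    if (Request.Cookies["CheckCode"] == null)
    {
        lblMessage.Text = "Cookies are disabled in your browser. You must allow cookies in your browser settings before you can use this system.";
        lblMessage.Visible = true;
        return;
    }

    // Case-insensitive comparison of the stored code with the user's input
    if (String.Compare(Request.Cookies["CheckCode"].Value, txtCheckCode.Text, true) != 0)
    {
        lblMessage.Text = "The verification code is wrong. Please enter the correct verification code.";
        lblMessage.Visible = true;
        return;
    }

    /***** other code *****/
}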


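Related articles: An ASP program for generating colored, variable-length CAPTCHAs

ASP.NET: generating CAPTCHAs dynamically

Colored check codes like the ones on DEV-Club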



region: a class that implements a CAPTCHA




Another short code segment

(from me: all these code segments are simple and quite straightforward)
