Blog


    June 25

    Quicksort vs. Bubble Sort

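    For the past two weeks I have been stuck on processing the Chinese Gigaword corpus: my original program, it turned out, could not correctly handle text of more than 100 million Chinese characters, and it crashed every time after a few minutes or even a few hours. Stubborn Linux would report nothing but a segmentation fault, and I had no idea where the problem was.
    Yesterday I had a flash of inspiration: could the data be so large that the type used for the indices was overflowing? I checked, and sure enough they were int. After changing them to unsigned long, several of the runs actually went through. I kept checking the remaining segmentation faults and found that the problem lay in the quicksort section. Since it uses a recursive function and I could not tell what was wrong with it, today I temporarily switched to bubble sort, and the segmentation faults are gone. But there is a new problem: a sample set (containing 120 million Chinese characters) that quicksort used to finish in 15 minutes has, as of now, still not finished bubbling after several hours.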
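
    The original program is not shown here, so the following is only a sketch under stated assumptions. One plausible cause of a segmentation fault in a recursive quicksort on very large input is recursion so deep that the call stack overflows. A common way around it, without falling back to an O(n^2) sort, is to drop the recursion: keep the pending ranges on a small explicit stack, always defer the larger partition, and continue with the smaller one. The names quicksort_iter, swap_ul and MAX_DEPTH are hypothetical, and the element type unsigned long merely stands in for whatever keys the real program sorts. Indices use size_t so they cannot overflow the way int does.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* One slot per bit of size_t is enough: every deferred range is
   smaller than half of the range stored one slot below it. */
enum { MAX_DEPTH = sizeof(size_t) * CHAR_BIT };

/* Swap two elements in place. */
static void swap_ul(unsigned long *a, unsigned long *b)
{
    unsigned long t = *a;
    *a = *b;
    *b = t;
}

/* Sort a[0..n-1] in ascending order without recursion.
   Pending work is kept as half-open ranges [lo, hi). */
void quicksort_iter(unsigned long *a, size_t n)
{
    size_t stack_lo[MAX_DEPTH], stack_hi[MAX_DEPTH];
    int top = 0;

    if (n < 2)
        return;
    stack_lo[top] = 0;
    stack_hi[top] = n;
    ++top;

    while (top > 0) {
        --top;
        size_t lo = stack_lo[top], hi = stack_hi[top];

        while (hi - lo > 1) {
            /* Use the middle element as pivot and park it at the end. */
            size_t mid = lo + (hi - lo) / 2;
            unsigned long pivot = a[mid];
            swap_ul(&a[mid], &a[hi - 1]);

            /* Lomuto partition: elements below the pivot go to the front. */
            size_t store = lo;
            for (size_t i = lo; i + 1 < hi; ++i) {
                if (a[i] < pivot) {
                    swap_ul(&a[i], &a[store]);
                    ++store;
                }
            }
            swap_ul(&a[store], &a[hi - 1]); /* pivot is now in its final slot */

            size_t left_n = store - lo;
            size_t right_n = hi - (store + 1);

            /* Defer the larger side, keep working on the smaller one. */
            assert(top < MAX_DEPTH);
            if (left_n < right_n) {
                stack_lo[top] = store + 1;
                stack_hi[top] = hi;
                ++top;
                hi = store;
            } else {
                stack_lo[top] = lo;
                stack_hi[top] = store;
                ++top;
                lo = store + 1;
            }
        }
    }
}
```

    Deferring the larger side keeps the explicit stack at about log2(n) entries at most, regardless of how unbalanced the partitions are. The running time can still degrade on unlucky inputs (for example, very many equal keys with this simple partition scheme), so this addresses the stack depth only.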

    Comments (7)

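    Qing Lei wrote:
    Merge sort.
    July 18
    lily wrote:
    Could you update your blog a bit more often?
    July 13
    lily wrote:
    Cheering for areal!
    July 13
    Scripting languages can speed up text processing, but their biggest problem is that they make you lazy.
    June 29
    沫南 李 wrote:
    Sorting is itself part of format conversion, sigh. For some tasks, using a scripting language can greatly improve development efficiency.
    June 26
    areal wrote:
    Didn't I already say? It is sorting, not a format problem. Also, I don't like any scripting language.
    June 26
    沫南 李 wrote:
    What are you trying to do? If it is just format conversion of the corpus, I recommend Python; it shouldn't have this problem.
    June 26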
