在Google查看时见到这一统计字数网址,感觉有意思来玩下SQL方法
https://www.ifreesite.com/wordcount/
在这个官网查询字时速表范畴~
https://blog.miniasp.com/post/2019/01/02/Common-Regex-patterns-for-Unicode-characters
获知要的字码范畴以下:
汉语路段 19968~40959
英文大写 65~90
英语小写字母 97~122
数据 48~57
懒得理你用哪一个SQL@@~我是应用MSSQL
declare @Tmp table( StrTmp nvarchar(max) ) insert into @Tmp values(N"你清楚吗?中秋佳节是2021/09/21(二),但2021/09/20(一)也是放假了日唷") ,(N"Do you know that Mid-Autumn Festival is 2021/09/21 (Tuesday), but 2021/09/20 (Monday) is also a holiday.")
我的方法是那样~
用Unicode序号查范畴~
select CharType ,Count(0) Num from ( select ( case when UNICODE(SUBSTRING(StrTmp,Sort,1)) between 65 and 90 or UNICODE(SUBSTRING(StrTmp,Sort,1)) between 97 and 122 then "英语" when UNICODE(SUBSTRING(StrTmp,Sort,1)) between 19968 and 40959 then "汉语" when UNICODE(SUBSTRING(StrTmp,Sort,1)) between 48 and 57 then "数据" else "别的" end ) CharType from @Tmp ,( select Row_Number()Over(order by [number]) as Sort from master..spt_values ) as k where Sort <= len(StrTmp) ) k group by CharType order by Count(0) desc
後来见到一级屠猪士用规范化~我也改了一下SQL查看方法~
但汉语的规范化带不出来= =||
select CharType ,Count(0) Num from ( select ( case when PATINDEX("[a-zA-Z]",SUBSTRING(StrTmp,Sort,1)) > 0 then "英语" when UNICODE(SUBSTRING(StrTmp,Sort,1)) between 19968 and 40959 then "汉语" when PATINDEX("[0-9]",SUBSTRING(StrTmp,Sort,1)) > 0 then "数据" else "别的" end ) CharType from @Tmp ,( select Row_Number()Over(order by [number]) as Sort from master..spt_values ) as k where Sort <= len(StrTmp) ) k group by CharType order by Count(0) desc