About Unit Testing in Golang

Posted on 2021/01/07 by Arts

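# Unit test definition:
1. Principles:

- Unit test files must be named xxx_test.go
- Test functions must start with Test (TestXxx); keep the style consistent: camel case, with Xxx identifying the function under test
- The function parameter must be t *testing.T
- The test file and the file under test must be in the same package
- Prioritize core functions, hot-path functions, and utility functions
- Comment every unit test to state what it checks, for example:
      Test case 1: input 4, output 2.
      Test case 2: input -1, output 0.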
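
As an illustration of these conventions, here is a minimal sketch. `IntSqrt` is a hypothetical function invented for this example (chosen so that input 4 gives 2 and input -1 gives 0, matching the sample comments above). In a real project the test function would live in its own `xxx_test.go` file and be run with `go test`; it is kept in one file here only so the sketch is self-contained.

```go
package main

import (
	"fmt"
	"testing"
)

// IntSqrt is a hypothetical function under test: it returns the integer
// square root of n, and 0 for negative input.
func IntSqrt(n int) int {
	if n < 0 {
		return 0
	}
	r := 0
	for (r+1)*(r+1) <= n {
		r++
	}
	return r
}

// TestIntSqrt follows the TestXxx naming rule and takes t *testing.T.
// Test case 1: input 4, output 2.
// Test case 2: input -1, output 0.
// In a real project this function belongs in a file ending in _test.go.
func TestIntSqrt(t *testing.T) {
	cases := []struct{ in, want int }{
		{4, 2},
		{-1, 0},
	}
	for _, c := range cases {
		if got := IntSqrt(c.in); got != c.want {
			t.Errorf("IntSqrt(%d) = %d, want %d", c.in, got, c.want)
		}
	}
}

func main() {
	// Quick manual check of the two documented cases.
	fmt.Println(IntSqrt(4), IntSqrt(-1))
}
```

Keeping the cases in a small table makes it cheap to add a third or fourth documented case later without writing another test function.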

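2. Frameworks
- GoConvey works well alongside other frameworks, can be used directly in a terminal window or in the browser, ships with a large set of standard assertion functions, and can manage and run test cases
- gomonkey rewrites the executable's machine code at runtime with assembly instructions, making the function or method to be stubbed jump to the stub implementation; the principle is similar to hot patching.
      Monkey solves the problem of stubbing functions and methods, but it is not thread-safe, so do not use it in concurrent tests
      It can stub global variables, functions, procedures, and methods, and avoids the intrusion into the code that gostub requires
      Feature list:
      Supports a stub for a function
      Supports a stub for a member method
      Supports a stub for an interface
      Supports a stub for a global variable
      Supports a stub for a function variable
      Supports a specific stub sequence for a function
      Supports a specific stub sequence for a member method
      Supports a specific stub sequence for a function variable
      Supports a specific stub sequence for an interface
Drawbacks:
      Stubbing an inlined function has no effect (inlining can be disabled for tests with go test -gcflags=all=-l)
      Complex cases where repeated calls to a stubbed function (method) must show different behavior are not supported

- GoMock is a testing framework developed and maintained by the official Golang team. It implements fairly complete interface-based mocking, integrates well with Golang's built-in testing package, and can also be used in other test environments.
      The GoMock framework has two parts, the GoMock package and the mockgen tool: the GoMock package manages the life cycle of the stub objects, and mockgen generates the source file of the Mock class for an interface
Drawbacks:
      Only methods defined on an interface can be mocked; the source file has to be generated with mockgen and the desired data then set up with gomock, so usage is somewhat heavy.

- gostub can stub global variables, functions, and procedures; it is lighter than gomock and does not depend on interfaces
Drawbacks:
      It is intrusive to the project's source code: a function to be stubbed must be assigned to a variable, and only functions defined in that form can be stubbed. Because gostub needs that variable declaration before it can mock anything, even an interface method has to be defined this way, which is inconvenient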



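3. Testing with goconvey + gomonkey

- Outer framework: goconvey. Much of the project's logic is complex and needs test cases for many different situations. Test code organized with goconvey has clear logical layers and is readable and maintainable. For assertions, convey and testify feel roughly equivalent. However, convey's community is less active than testify's, and when we later ran into problems with convey, solutions were hard to find
- Function mocking: gomonkey. Most of the project code is not built on interfaces, so gomock is a poor fit. The project runs stably, so we do not want to refactor existing code just for unit tests, which rules out gostub as well. gomonkey basically meets our needs for stubbing functions.

- Persistence-layer mocking: sqlmock. Our persistence framework is gorm. We considered two approaches: mocking gorm's functions with gomonkey, or using sqlmock. With gomonkey, every gorm function in a call chain has to be mocked, which is too tedious; with sqlmock, it is enough to match the corresponding SQL statements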

4. Usage
 
&lt;pre class="lang:zsh decode:true " >
Install
- go get github.com/smartystreets/goconvey
- go install github.com/smartystreets/goconvey

Run:

./goconvey.exe

Open in a browser: http://127.0.0.1:8080
&lt;/pre> 


Example:
&lt;pre class="lang:go decode:true " >
// File 1: the code under test.
package goconvey

import (
    "errors"
)

func Add(a, b int) int {
    return a + b
}

func Subtract(a, b int) int {
    return a - b
}

func Multiply(a, b int) int {
    return a * b
}

func Division(a, b int) (int, error) {
    if b == 0 {
        return 0, errors.New("divisor cannot be 0")
    }
    return a / b, nil
}

// File 2: the tests, in a _test.go file in the same package.
package goconvey

import (
    "testing"

    . "github.com/smartystreets/goconvey/convey"
)

func TestAdd(t *testing.T) {
    Convey("add two numbers", t, func() {
        So(Add(1, 2), ShouldEqual, 3)
    })
}

func TestSubtract(t *testing.T) {
    Convey("subtract two numbers", t, func() {
        So(Subtract(1, 2), ShouldEqual, -1)
    })
}

func TestMultiply(t *testing.T) {
    Convey("multiply two numbers", t, func() {
        So(Multiply(3, 2), ShouldEqual, 6)
    })
}

func TestDivision(t *testing.T) {
    Convey("divide two numbers", t, func() {

        // patch: stubs, if any, would be applied here

        Convey("divide by a non-zero number", func() {
            num, err := Division(10, 2)
            So(err, ShouldBeNil)
            So(num, ShouldEqual, 5)
        })

        Convey("divide by 0", func() {
            _, err := Division(10, 0)
            So(err, ShouldNotBeNil)
        })
    })
}
&lt;/pre>

5. Assertion functions

&lt;pre class="lang:go decode:true " >
// General equality
So(thing1, ShouldEqual, thing2)           // equal
So(thing1, ShouldNotEqual, thing2)        // not equal
So(thing1, ShouldResemble, thing2)        // deep equality for arrays, slices, maps, and structs
So(thing1, ShouldNotResemble, thing2)     // not deeply equal
So(thing1, ShouldPointTo, thing2)         // both pointers refer to the same address
So(thing1, ShouldNotPointTo, thing2)      // the pointers refer to different addresses
So(thing1, ShouldBeNil)                   // is nil
So(thing1, ShouldNotBeNil)                // is not nil
So(thing1, ShouldBeTrue)                  // is true
So(thing1, ShouldBeFalse)                 // is false
So(thing1, ShouldBeZeroValue)             // is the zero value of its type

// Numeric quantity comparison
So(1, ShouldBeGreaterThan, 0)                   // greater than
So(1, ShouldBeGreaterThanOrEqualTo, 0)          // greater than or equal to
So(1, ShouldBeLessThan, 2)                      // less than
So(1, ShouldBeLessThanOrEqualTo, 2)             // less than or equal to
So(1.1, ShouldBeBetween, .8, 1.2)               // inside the interval
So(1.1, ShouldNotBeBetween, 2, 3)               // outside the interval
So(1.1, ShouldBeBetweenOrEqual, .9, 1.1)        // inside the interval, bounds included
So(1.1, ShouldNotBeBetweenOrEqual, 1000, 2000)  // outside the interval, bounds included
So(1.0, ShouldAlmostEqual, 0.99999999, .0001)   // equal within a tolerance; tolerance is optional, default 0.0000000001
So(1.0, ShouldNotAlmostEqual, 0.9, .0001)       // differs by more than the tolerance

// Collections
So([]int{2, 4, 6}, ShouldContain, 4)                        // contains
So([]int{2, 4, 6}, ShouldNotContain, 5)                     // does not contain
So(4, ShouldBeIn, []int{2, 4, 6})                           // is in the list
So(4, ShouldNotBeIn, []int{1, 3, 5})                        // is not in the list
So([]int{}, ShouldBeEmpty)                                  // empty slice
So([]int{1}, ShouldNotBeEmpty)                              // non-empty slice
So(map[string]string{"a": "b"}, ShouldContainKey, "a")      // map contains the key
So(map[string]string{"a": "b"}, ShouldNotContainKey, "b")   // map does not contain the key
So(map[string]string{"a": "b"}, ShouldNotBeEmpty)           // non-empty map
So(map[string]string{}, ShouldBeEmpty)                      // empty map
So(map[string]string{"a": "b"}, ShouldHaveLength, 1)        // length; supports map, slice, chan, and string

// Strings
So("asdf", ShouldStartWith, "as")               // starts with the given string
So("asdf", ShouldNotStartWith, "df")            // does not start with the given string
So("asdf", ShouldEndWith, "df")                 // ends with the given string
So("asdf", ShouldNotEndWith, "as")              // does not end with the given string
So("asdf", ShouldContainSubstring, "sd")        // contains the substring
So("asdf", ShouldNotContainSubstring, "er")     // does not contain the substring
So("", ShouldBeBlank)                           // empty string
So("asdf", ShouldNotBeBlank)                    // non-empty string

// panic
So(func() { panic("x") }, ShouldPanic)              // the function panics
So(func() {}, ShouldNotPanic)                       // the function does not panic
So(func() { panic("x") }, ShouldPanicWith, "x")     // panics with the given value, or errors.New("something")
So(func() {}, ShouldNotPanicWith, "x")              // does not panic with the given value, or errors.New("something")

// Type checking
So(1, ShouldHaveSameTypeAs, 0)                  // same type
So(1, ShouldNotHaveSameTypeAs, "asdf")          // different types

// time.Time (and time.Duration)
So(time.Now(), ShouldHappenBefore, time.Now())                      // happens before
So(time.Now(), ShouldHappenOnOrBefore, time.Now())                  // happens before or at the same time
So(time.Now(), ShouldHappenAfter, time.Now())                       // happens after
So(time.Now(), ShouldHappenOnOrAfter, time.Now())                   // happens after or at the same time
So(time.Now(), ShouldHappenBetween, time.Now(), time.Now())         // inside the time interval
So(time.Now(), ShouldHappenOnOrBetween, time.Now(), time.Now())     // inside the interval, bounds included
So(time.Now(), ShouldNotHappenOnOrBetween, time.Now(), time.Now())  // outside the interval and not on its bounds
So(time.Now(), ShouldHappenWithin, duration, time.Now())            // within duration of the given time
So(time.Now(), ShouldNotHappenWithin, duration, time.Now())         // not within duration of the given time
&lt;/pre>

6. Mock methods
- ApplyFunc: stub an ordinary function
&lt;pre class="lang:go decode:true " >
patches := ApplyFunc(GetCmdbInsts, func(dims *models.DimsInfo) ([]Endpoint, error) {
      return endpointList, nil
})
defer patches.Reset()
&lt;/pre>

- ApplyMethod: stub a method
&lt;pre class="lang:go decode:true " >
      var test *ConsistentHashRing
      // Create the patch set here; the receiver type is passed via reflect.TypeOf.
      patches := ApplyMethod(reflect.TypeOf(test), "GetNode", func(_ *ConsistentHashRing, pk string) (string, error) {
            return "", errors.New("get judge node fail")
      })
      defer patches.Reset()
&lt;/pre>


- ApplyGlobalVar: stub a global variable
&lt;pre class="lang:go decode:true " >
patches := ApplyGlobalVar(&amp;num, 150)
defer patches.Reset()
&lt;/pre>

- ApplyFuncSeq: stub a function with a sequence of outputs
&lt;pre class="lang:go decode:true " >
      patches := ApplyFuncSeq(fake.ReadLeaf, outputs)

      defer patches.Reset()
      output, err := fake.ReadLeaf("")
      So(err, ShouldEqual, nil)
      So(output, ShouldEqual, info1)
      output, err = fake.ReadLeaf("")
      So(err, ShouldEqual, nil)
      So(output, ShouldEqual, info2)

&lt;/pre>

- ApplyFuncVar: stub a function variable
&lt;pre class="lang:go decode:true " >
patches := ApplyFuncVar(&amp;fake.Marshal, func (_ interface{}) ([]byte, error) {
      return []byte(str), nil
}) // fake.Marshal is a function variable
defer patches.Reset()
&lt;/pre>

- ApplyFuncVarSeq: stub a function variable with a sequence of outputs
&lt;pre class="lang:go decode:true " >
patches := ApplyFuncVarSeq(&amp;fake.Marshal, outputs)
defer patches.Reset()
bytes, err := fake.Marshal("")
So(err, ShouldEqual, nil)
So(string(bytes), ShouldEqual, info1)
bytes, err = fake.Marshal("")
So(err, ShouldEqual, nil)
So(string(bytes), ShouldEqual, info2)
&lt;/pre>

- ApplyMethodSeq: stub a member method with a sequence of outputs
&lt;pre class="lang:go decode:true " >
patches := ApplyMethodSeq(reflect.TypeOf(e), "Retrieve", outputs)
defer patches.Reset()
output, err := e.Retrieve("")
So(err, ShouldEqual, nil)
So(output, ShouldEqual, info1)
&lt;/pre>

- Stubbing an interface (done the same way as stubbing a method)
 
&lt;pre class="lang:go decode:true " >
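// Excerpt: patches and db are assumed to be set up by the surrounding test,
// with db being an interface value backed by e.
e := &amp;fake.Etcd{}
info := "hello interface"
patches.ApplyMethod(reflect.TypeOf(e), "Retrieve",
      func(_ *fake.Etcd, _ string) (string, error) {
            return info, nil
      })
output, err := db.Retrieve("")
So(err, ShouldEqual, nil)
So(output, ShouldEqual, info)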

&lt;/pre> 


7. References
- https://mp.weixin.qq.com/s/eAptnygPQcQ5Ex8-6l0byA
- https://www.cnblogs.com/youhui/articles/11265947.html
- https://knapsackpro.com/testing_frameworks/difference_between/goconvey/vs/go-testify
- https://github.com/smartystreets/goconvey
- https://github.com/stretchr/testify/
- https://studygolang.com/topics/2992
- https://geektutu.com/post/quick-gomock.html (using gomock)
- https://blog.marvel6.cn/2020/01/test-and-mock-db-by-xorm-with-the-help-of-convey-and-sqlmock/ (reference for testing XORM)
- https://github.com/dche423/dbtest/blob/master/pg/repository_test.go (reference for testing gorm)


Prometheus Monitoring, Part 1

Posted on 2019/07/05 by Arts

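Prometheus, as we all know, is an open-source monitoring system with an active community and a large user base.
Today I want to go over some points I ran into while configuring Prometheus and give a brief introduction to the configuration. I downloaded a binary package from the official site and will use the stock prometheus.yml as the example: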

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

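In this file we can see:
scrape_interval: the scrape period (15s above; the default is 1m)
evaluation_interval: the period at which Prometheus evaluates alerting rules (15s above; the default is 1m)
scrape_timeout: the timeout for a single scrape (the global default is 10s)

Note that these three settings essentially define how data is scraped and how rules are evaluated. The scrape period and the rule evaluation period are independent of each other: with scrape_interval: 1m and evaluation_interval: 30s, for example, data is scraped once a minute while the rules are evaluated every 30 seconds.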

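alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093   (the address of Alertmanager)
Prometheus is mainly responsible for scraping data, evaluating rules, and generating alerts. Once an alert fires, it is sent to Alertmanager, which routes it according to its own rules.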

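rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
This lists the rule definition files. Rules can be organized in nested directories, and a glob such as *.yml can be used to match all the alert rule files.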

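scrape_configs:
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']

As I understand it, job_name is mainly a grouping. static_configs means the targets are configured statically: after editing the configuration file, the change only takes effect once the service is restarted or a hot reload is triggered through the HTTP API.
targets: the ip:port endpoints to pull monitoring data from. The default path is /metrics, so here the URL is http://localhost:9090/metrics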
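
The hot reload mentioned above can be triggered from code. The sketch below assumes Prometheus was started with the `--web.enable-lifecycle` flag, which is what enables the `POST /-/reload` endpoint; the helper names `reloadURL` and `triggerReload` are made up for this example, and a local stand-in server is used so the program runs without a real Prometheus.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// reloadURL builds the lifecycle endpoint URL for a Prometheus base address.
func reloadURL(base string) string {
	return strings.TrimRight(base, "/") + "/-/reload"
}

// triggerReload asks Prometheus to re-read its configuration files.
// The endpoint only works when Prometheus runs with --web.enable-lifecycle.
func triggerReload(client *http.Client, base string) error {
	resp, err := client.Post(reloadURL(base), "", nil)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("reload failed: %s", resp.Status)
	}
	return nil
}

func main() {
	// A stand-in server so the sketch runs without a real Prometheus.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method == http.MethodPost && r.URL.Path == "/-/reload" {
			w.WriteHeader(http.StatusOK)
			return
		}
		http.NotFound(w, r)
	}))
	defer srv.Close()

	if err := triggerReload(srv.Client(), srv.URL); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("reload requested")
}
```

Against a real instance the call would be `triggerReload(http.DefaultClient, "http://localhost:9090")`; sending SIGHUP to the Prometheus process is the other way to get the same effect.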

Note that labels can be attached to a static target; every sample scraped from it will then carry those labels.
For example:
static_configs:
- targets: ['localhost:9090']
  labels:
    endpoint: "test"

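The agents (exporters) collect a lot of data. To save storage, or simply to look at less data, we can filter with metric_relabel_configs:

- job_name: host
  metric_relabel_configs:
  - source_labels: [__name__]
    regex: (node_cpu.*|node_disk.*)
    action: keep
With this, only the cpu and disk series are kept: the metric name is matched against the regular expression, and anything that does not match is dropped. The action can be keep (retain what matches) or drop (discard what matches).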
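
To see which series such a rule would keep, the matching can be imitated in a few lines of Go. This is only a sketch of the filtering logic, not Prometheus code: `keepMetrics` is a made-up helper, the metric names are sample node_exporter and Go runtime names, and the pattern is wrapped in `^(?:...)$` because Prometheus anchors relabel regexes at both ends.

```go
package main

import (
	"fmt"
	"regexp"
)

// keepMetrics mimics "action: keep" on the __name__ label: a metric survives
// only if its whole name matches the pattern.
func keepMetrics(pattern string, names []string) []string {
	re := regexp.MustCompile("^(?:" + pattern + ")$")
	var kept []string
	for _, n := range names {
		if re.MatchString(n) {
			kept = append(kept, n)
		}
	}
	return kept
}

func main() {
	names := []string{
		"node_cpu_seconds_total",
		"node_disk_io_time_seconds_total",
		"node_memory_MemFree_bytes",
		"go_goroutines",
	}
	// Same regex as in the metric_relabel_configs example above.
	for _, n := range keepMetrics("(node_cpu.*|node_disk.*)", names) {
		fmt.Println(n)
	}
}
```

With action: drop and the same regex, the result would be the complement: the memory and goroutine series would remain and the cpu and disk series would be discarded.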

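Sometimes we want Prometheus to detect file changes automatically, so that the targets are picked up as soon as the file is edited.
That can be configured as follows:

- job_name: host
  metric_relabel_configs:
  - source_labels: [__name__]
    regex: (node_cpu.*|node_disk.*)
    action: keep
  file_sd_configs:
  - files: ['host.yml']

Note that this only watches for changes to the scrape targets; it does not watch for changes to the rules.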

A Basic TCP & HTTP Microservice Architecture Design

Posted on 2018/04/08 by Arts

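With IoT services now in heavy use and the business layer moving to microservices, most of the NB-IoT services we handle share a common pattern: the hardware connects to the server over TCP and speaks a custom protocol, which links the device to the backend. At the same time we may need a cluster of multiple nodes on multiple machines, managed and deployed with k8s.

At the HTTP layer, services are naturally exposed as modular microservices such as a user center or a shopping-cart center, also managed as a k8s cluster.

For control flowing from the HTTP layer to the TCP layer, the most important job is usually sending control commands that switch the hardware on and off or configure it.

The difficulty is that both sides are clusters: how do we find the TCP server that the hardware with a given SN is connected to and, after the business logic has been processed correctly, get the encoded packet to the node that holds that device's connection?

For this kind of workload we probably need to split the system into layers, each with its own responsibility. Below is the layering I use, based on my experience so far: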

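1. TCP layer: only handles TCP connections. After receiving data it forwards the data to the corresponding worker, which runs the backend logic.

2. Worker layer: only handles business logic and packet encoding. After encoding a packet it sends it to the corresponding TCP node, which delivers it to the hardware. In the other direction it receives data reported by the hardware, parses and processes it, then pushes notifications or updates stored data as required. It also handles requests from the HTTP layer above, encoding them into packets for the hardware to complete basic configuration.

3. HTTP layer: exposes the external interfaces called by web pages, apps, and third parties such as WeChat. After signature verification it sends a message to the worker, which encodes the packet and sends it to the hardware, completing configuration calls from the upper-layer business down to the device.

4. All communication between layers can go through an MQ cluster, which makes it asynchronous.

5. Sharing routes among workers: I currently store the routing information in redis so that it is shared. When a worker finishes processing, it reads the route from redis and pushes the message to the MQ queue that the corresponding TCP node listens on. When that queue delivers the message, the TCP node sends the final packet to the target hardware, completing the whole communication path.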
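
The routing described in points 4 and 5 can be sketched in Go as follows. This is an in-memory illustration under stated assumptions, not the production design: a mutex-protected map stands in for the shared redis store, buffered channels stand in for the per-node MQ queues, and all type, function, node, and SN names are invented for the example.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// RouteTable stands in for the shared redis store: device SN -> TCP node ID.
type RouteTable struct {
	mu     sync.RWMutex
	routes map[string]string
}

func NewRouteTable() *RouteTable {
	return &RouteTable{routes: make(map[string]string)}
}

// Register is called by a TCP node when a device connects to it.
func (r *RouteTable) Register(sn, node string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.routes[sn] = node
}

// Lookup is called by a worker to find the node holding the connection.
func (r *RouteTable) Lookup(sn string) (string, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	node, ok := r.routes[sn]
	return node, ok
}

// Packet is an already-encoded command addressed to one device.
type Packet struct {
	SN      string
	Payload []byte
}

// Worker pushes packets to per-node queues; buffered channels stand in
// for the MQ queues that each TCP node listens on.
type Worker struct {
	routes *RouteTable
	queues map[string]chan Packet
}

// Dispatch looks up the route and enqueues the packet for that node.
func (w *Worker) Dispatch(p Packet) error {
	node, ok := w.routes.Lookup(p.SN)
	if !ok {
		return errors.New("no route for device " + p.SN)
	}
	q, ok := w.queues[node]
	if !ok {
		return errors.New("no queue for node " + node)
	}
	q <- p
	return nil
}

func main() {
	routes := NewRouteTable()
	queues := map[string]chan Packet{
		"tcp-node-1": make(chan Packet, 8),
		"tcp-node-2": make(chan Packet, 8),
	}
	w := &Worker{routes: routes, queues: queues}

	routes.Register("SN001", "tcp-node-2") // the device connected to node 2

	fmt.Println(w.Dispatch(Packet{SN: "SN001", Payload: []byte("open")}))
	fmt.Println(len(queues["tcp-node-1"]), len(queues["tcp-node-2"]))
	fmt.Println(w.Dispatch(Packet{SN: "SN999", Payload: []byte("open")}))
}
```

The important property is that the worker never needs to know which machine a device is on ahead of time; it only needs the SN, and a device that reconnects to a different node simply overwrites its route entry.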

If you use or repost content from this article, please cite the source and notify the original author.

Building Microservices That Cope with Ever-Changing Requirements

Posted on 2017/10/29 by Arts

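Most companies today follow the website + backend service + app model; a tech company without an app would be the odd one out. The problem we inevitably face is that the web pages and the app have different requirements and therefore need different data from the backend. Then there is the eternal topic of changing requirements: while we keep developing new features, the features of older versions must stay usable forever. Take Tencent's QQ: pick up an antique version today and it still works. So how do we solve this kind of problem?

I will briefly discuss this based on my current work experience and also write it down, hoping that if I start a company someday I can look back at it for a few pointers.

Software today lives in an era where whoever ships first wins, so most companies put speed first, and 996 is routine. But speed inevitably creates compatibility problems for later development. Suppose the V1 data structure is {A,B,C} and a new requirement adds a field, making it {A,B,C,D}. How do we stay compatible with the old interface? The most convenient move is probably to add another interface for the new app or website and write the code a second time. Is there a better way?

Ideally, an app or website takes exactly the data it needs, regardless of what the server can provide. Everything the server offers is available to it, but how much to take and which fields to take should be decided by the client. This solves compatibility with old interfaces nicely: an old app that needs only the fields A, B, and C fetches just those three when it is released, and later requirement changes that add fields do not affect it, because it still fetches the same three fields in the same format. A new version of the app can fetch field D as well, according to its own needs.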
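
The idea of client-driven field selection can be shown with a tiny Go sketch. It is only an illustration of the principle, with a made-up helper `selectFields` and placeholder field names A to D; it is not how any particular library implements it.

```go
package main

import (
	"fmt"
	"sort"
)

// selectFields returns only the requested fields of a record. Unknown field
// names are ignored, and fields the client did not ask for are never sent.
func selectFields(record map[string]string, fields []string) map[string]string {
	out := make(map[string]string, len(fields))
	for _, f := range fields {
		if v, ok := record[f]; ok {
			out[f] = v
		}
	}
	return out
}

// sortedKeys lists the field names in order, only to make the output stable.
func sortedKeys(m map[string]string) []string {
	ks := make([]string, 0, len(m))
	for k := range m {
		ks = append(ks, k)
	}
	sort.Strings(ks)
	return ks
}

func main() {
	// The server-side record grew from {A, B, C} to {A, B, C, D}.
	record := map[string]string{"A": "1", "B": "2", "C": "3", "D": "4"}

	oldApp := selectFields(record, []string{"A", "B", "C"})
	newApp := selectFields(record, []string{"A", "B", "C", "D"})

	fmt.Println(sortedKeys(oldApp)) // the old client is unaffected by D
	fmt.Println(sortedKeys(newApp)) // the new client opts in to D
}
```

Because the old client names its fields explicitly, adding D on the server changes nothing in the response it receives.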

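So what is a good solution?

Facebook has a very good one: GraphQL. Many people know REST APIs but not necessarily GraphQL. Since our current technology choice is Go, the library we chose for this kind of problem is graphql:

Project: https://github.com/graphql-go/graphql

It is a kind of schema definition language, a DSL. It supports basic data types such as int, bool, string, and list, and data structures are built up through schema definitions.

A Query is defined for reads and a Mutation for updates. Field descriptions can be attached when defining the data structures, and documentation is generated automatically once the service starts, so app developers can look up the field definitions by querying the server directly, which saves some communication time.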

schema, err := graphql.NewSchema(
   graphql.SchemaConfig{
      Query: graphql.NewObject(
         graphql.ObjectConfig{
            Name:        "Query",
            Description: "Query all User-related information",
            Fields: graphql.Fields{
               "user": &graphql.Field{
                  Type: types.GLUser,
                  Args: graphql.FieldConfigArgument{
                     "userid": &graphql.ArgumentConfig{
                        Type:        graphql.NewNonNull(graphql.String),
                        Description: "Fetch all information associated with a user by user id",
                     },
                  },
                  Resolve: queryUser,
               },
            },
         }),
      // Mutation: the object defined in the mutation snippet further down
   },
)

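This defines the query interface: the app can look up a user's information by userid. We can keep adding Args to extend it, offering new features to new versions of the app while remaining compatible with old ones. Description documents each field's definition and so doubles as development documentation.

For the mutation part: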

Mutation: graphql.NewObject(
	graphql.ObjectConfig{
		Name:        "Mutation",
		Description: "Update user information",
		Fields: graphql.Fields{
			"updateUser": &graphql.Field{
				Type: types.GLStructUser,
				Args: graphql.FieldConfigArgument{
					"userid": &graphql.ArgumentConfig{
						Type:        graphql.String,
						Description: "User ID",
					},
					"phone": &graphql.ArgumentConfig{
						Type:        graphql.String,
						Description: "User phone number",
					},
					"nickname": &graphql.ArgumentConfig{
						Type:        graphql.String,
						Description: "User nickname",
					},
				},
				Resolve: updateUser,
			},
		},
	}),

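This defines the modify/update interface. The arguments can likewise keep changing; the function given in Resolve handles the user's request and updates the user's data.

So what data is returned?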

var GLUserConfig = graphql.ObjectConfig{
	Name: "User",
	Fields: graphql.Fields{
		"id": &graphql.Field{
			Type:        graphql.String,
			Description: "User ID",
			Resolve:     id.IDResolve,
		},
		"nickname": &graphql.Field{
			Type:        graphql.String,
			Description: "User nickname",
		},
		"avatar": &graphql.Field{
			Type:        graphql.String,
			Description: "Avatar",
		},
		"phone": &graphql.Field{
			Type:        graphql.String,
			Description: "Phone number",
		},
		"sex": &graphql.Field{
			Type:        GLUserSex,
			Description: "Sex",
		},
	},
}

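We define the return structure and return the corresponding data. If new requirements come up, more fields can be added to the returned data.

The key point, once these schemas are defined, is GraphQL's query language: for queries and for the data they return, you fetch exactly the fields your app needs.
With the data structures defined, we can request whatever business data we want over HTTP.
1. A given query may need only the nickname and the ID, in which case we fetch just those: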

{
  user(userid: "3500401") {
    id,
    nickname,
  }
}

2. If a new version of the app needs more data, we add the corresponding fields to the query and fetch more, while the interface itself needs no change at all:

{
  user(userid: "3500401") {
    id,
    nickname,
    phone,
    sex,
  }
}

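The interface for modifying user data is used in the same way.

Summary:
When new requirements are added, the old interface can be reused without affecting existing business, while still supporting the new business.
In backend development where business requirements change constantly, we can separate concerns by business and gradually move to microservices that cooperate to meet new requirements, keeping old versions working while supporting more new services and requirements.

(If anything here is off or wrong, corrections are welcome. If you repost, please include a link to the original and credit the original author.)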

A Simple Web Crawler Experiment

Posted on 2017/08/16 by Arts

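Some background: one piece of our company's product broke. When users' devices requested an XML file, the download from the server kept failing or was extremely slow. In the existing setup all overseas customers' devices connect to Alibaba Cloud servers in mainland China, which makes downloads abnormally slow. So the idea is to check where a device is when it downloads the file: if the device is overseas, redirect it to Alibaba Cloud OSS in Singapore; otherwise redirect users with mainland China IP addresses to the domestic Alibaba Cloud OSS address.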

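1. The device downloads the file over HTTP, so the only thing I know is the remoteaddr the device connects from.
2. From remoteaddr I can determine whether the device is in China or abroad (the device may move around).
3. I found a Taobao IP address lookup API: http://ip.taobao.com/ipSearch.php
4. The service queries it for the location of the IP address and redirects accordingly.
5. One problem: the Taobao lookup is rate limited, so frequent queries can fail. In that case we default to Singapore, because most of our customers are overseas.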

OK, on to the main topic.

The Taobao IP library returns fairly detailed information:

{
    "code": 0,
    "data": {
        "country": "中国",
        "country_id": "CN",
        "area": "华南",
        "area_id": "800000",
        "region": "广东省",
        "region_id": "440000",
        "city": "广州市",
        "city_id": "440100",
        "county": "",
        "county_id": "-1",
        "isp": "电信",
        "isp_id": "100017",
        "ip": "14.215.177.38"
    }
}

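At this point I wanted to build my own address database for others to query. I looked at the information above, searched online, found that it seemed feasible, and got started.
Information needed:
1. Country and country code
2. Province and province code
3. City and city code
4. ISP
5. IP address ranges
The source URLs are in the code; see the request URLs there if you need them.

A web search suggested most of this could be found, so the next step was simply to crawl it (unfortunately I only found IP address data at the province level and above, and even that was incomplete).

The country codes were scraped from Wikipedia.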

#!/usr/bin/env python
# -*- coding:utf-8 -*-

import urllib2
import time
import re
from bs4 import BeautifulSoup


class HtmlDownloader(object):
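    # Adjacent string literals are joined, so the User-Agent contains no
    # stray backslash or embedded indentation.
    header = {'Cookie': 'AD_RS_COOKIE=20083363',
              'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                            'AppleWebKit/537.36 (KHTML, like Gecko) '
                            'Chrome/58.0.3029.110 Safari/537.36'}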
    def download(self,url):
        if url is None:
            raise Exception('url is None')
        # print url
        request = urllib2.Request(url,None,HtmlDownloader.header)
        try:
            resp = urllib2.urlopen(request)
            # print resp.getcode()
            if resp.getcode()!= 200:
                time.sleep(5)
                return self.download(url)
            else:
                return resp.read()
        except urllib2.URLError,e:
            print e
            time.sleep(5)
            return self.download(url)
    def readhtml(self,filename):
        file_object = open(filename)
        try:
            all_the_text = file_object.read()
        finally:
            file_object.close()
        return all_the_text



class HtmlParser(object):
    def has_tag(self,tag):
        return tag.has_attr('span')
    def region_parser(self,html_content):
        if html_content is None:
            raise Exception('html is None')
        soup = BeautifulSoup(html_content,'html.parser')
        for tag in soup.find_all(class_="MsoNormal"):
            # print tag.get_text()
            id = tag.get_text().split(" ")[0].strip()
            name = tag.get_text().split(" ")[1].strip()
            print id+"-->"+name
    def contry_parse(self,html_content):
        if html_content is None:
            raise Exception('html is None')
        soup = BeautifulSoup(html_content, 'html.parser')
        for tag in soup.find_all(class_="wikitable sortable"):
            # print tag
            # print tag.select('td')
            i = 0
            for td in tag.select('td'):
                if i % 5 == 0:
                    print "id-->"+td.get_text().strip()
                elif i % 5 == 4:
                    print "name-->"+td.get_text().strip()
                # print str(i)+"----"+td.get_text().strip()
                i = i + 1
            # code = tag.get_text().split(" ")[0].strip()
            # name = tag.get_text().split(" ")[3].strip()
            # print code + "-->" + name
    def contry_ipaddrlink_parse(self,html_content):
        if html_content is None:
            raise Exception('html is None')
        soup = BeautifulSoup(html_content, 'html.parser')
        for tag in soup.find_all(href=re.compile(u'http://ipblock.chacuo.net/view/.*')):
            print  tag.get_text()+"-->"+tag.get('href')
            # html_content = html_downloader.download(tag.get('href'))
            # print tag

    def ipaddress_parse(self,html_content):
        if html_content is None:
            raise Exception('html is None')
        soup = BeautifulSoup(html_content, 'html.parser')
        for tag in soup.find_all('pre'):
            # print tag.get_text()+"-->"
            return tag.get_text()

    def ipaddress_parse_text(self,html_content):
        if html_content is None:
            raise Exception('html is None')
        soup = BeautifulSoup(html_content, 'html.parser')
        for tag in soup.find_all(href=re.compile(u'http://ipblock.chacuo.net/view/.*')):
            # print re.sub(r'view/c_', "down/t_txt=c_", tag.get('href'))
            content = html_downloader.download(re.sub(r'view/c_', "down/t_txt=c_", tag.get('href')))
            try:
                print tag.get_text()
                contentstr = self.ipaddress_parse(content)
                # print contentstr
                for ipdata in contentstr.split('\r\n'):
                    data = ipdata.split('\t')
                    # print data[1]
                    # print data[0]
                    # print '-->'+ipdata
                    # print ipdata.split('\t')[0]
                    # print data[0].strip()
                    if len(data) >3:
                        print '--->ip:'+data[0]+'--->mask:'+data[1]+'-->mask/len:'+data[2]+'-->num:'+data[3]
            except Exception,e:
                print "no data"

    def s_ipaddress_parse_text(self, html_content):
        if html_content is None:
            raise Exception('html is None')
        soup = BeautifulSoup(html_content, 'html.parser')
        for tag in soup.find_all(href=re.compile(u'http://ips.chacuo.net/view/.*')):
            # print re.sub(r'view/c_', "down/t_txt=c_", tag.get('href'))
            content = html_downloader.download(re.sub(r'view/s_', "down/t_txt=p_", tag.get('href')))
            # print tag.get('href')+"--->"
            try:
                print tag.get_text()
                contentstr = self.ipaddress_parse(content)
                # print contentstr
                for ipdata in contentstr.split('\r\n'):
                    # print ipdata
                    data = ipdata.split('\t')
                    # print data
                #     # print data[1]
                #     # print data[0]
                #     # print '-->'+ipdata
                #     # print ipdata.split('\t')[0]
                #     # print data[0].strip()
                    if len(data) > 2:
                        print '--->ip:' + data[0] + '--->mask:' + data[1] + '-->num:' + data[2]
            except Exception, e:
                print "no data"
    def isp_ipaddress_parse_text(self, html_content):
        if html_content is None:
            raise Exception('html is None')
        soup = BeautifulSoup(html_content, 'html.parser')
        for tag in soup.find_all(href=re.compile(u'http://ipcn.chacuo.net/view/.*')):
            # print re.sub(r'view/c_', "down/t_txt=c_", tag.get('href'))
            content = html_downloader.download(re.sub(r'view/i_', "down/t_txt=c_", tag.get('href')))
            # print tag.get('href')+"--->"
            try:
                print tag.get_text()
                contentstr = self.ipaddress_parse(content)
                # print contentstr
                for ipdata in contentstr.split('\r\n'):
                    # print ipdata
                    data = ipdata.split('\t')
                    # print data
                #     # print data[1]
                #     # print data[0]
                #     # print '-->'+ipdata
                #     # print ipdata.split('\t')[0]
                #     # print data[0].strip()
                    if len(data) > 2:
                        print '--->ip:' + data[0] + '--->mask:' + data[1] + '-->num:' + data[2]
            except Exception, e:
                print "no data"

if __name__ == '__main__':
    html_downloader = HtmlDownloader()

    #region
    # html_content = html_downloader.download('http://www.stats.gov.cn/tjsj/tjbz/xzqhdm/201703/t20170310_1471429.html')
    # html_parser = HtmlParser()
    # html_parser.region_parser(html_content)

    #contry
    # html_content = html_downloader.readhtml('ISO3166-1.html')
    # html_parser = HtmlParser()
    # html_parser.contry_parse(html_content)

    #contry ip address link parse
    # html_content = html_downloader.download('http://ipblock.chacuo.net')
    # html_parser = HtmlParser()
    # html_parser.contry_ipaddrlink_parse(html_content)

    #contry ip address parse
    # html_content = html_downloader.download('http://ipblock.chacuo.net/down/t_txt=c_AO')
    # html_parser = HtmlParser()
    # html_parser.ipaddress_parse(html_content)

    # contry ip address parse to text
    # html_content = html_downloader.download('http://ipblock.chacuo.net')
    # html_parser = HtmlParser()
    # html_parser.ipaddress_parse_text(html_content)

    #cn s ipaddress parse
    html_content = html_downloader.download('http://ips.chacuo.net/')
    html_parser = HtmlParser()
    html_parser.s_ipaddress_parse_text(html_content)

    # cn isp ipaddress parse
    html_content = html_downloader.download('http://ipcn.chacuo.net/')
    html_parser = HtmlParser()
    html_parser.isp_ipaddress_parse_text(html_content)

    # print html_content
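
The downloader above retries by calling itself with no upper bound, so a URL that never succeeds keeps recursing until the interpreter's recursion limit is hit. Below is a minimal Python 3 sketch of the same idea with a capped number of attempts. The names `build_request`, `download`, `max_retries`, `delay`, and `opener` are assumptions made for this illustration and are not part of the original script; the header values are only examples.

```python
import time
import urllib.error
import urllib.request

# Browser-like headers, same idea as HtmlDownloader.header (values are illustrative).
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/58.0.3029.110 Safari/537.36",
}


def build_request(url, headers=None):
    """Build a request object that carries the given (or default) headers."""
    if url is None:
        raise ValueError("url is None")
    return urllib.request.Request(url, headers=headers or DEFAULT_HEADERS)


def download(url, max_retries=3, delay=5.0, opener=urllib.request.urlopen):
    """Fetch url, trying at most max_retries times before giving up.

    opener is injectable (hypothetical parameter) so the retry logic can be
    exercised without touching the network.
    """
    last_error = None
    for attempt in range(max_retries):
        try:
            with opener(build_request(url)) as resp:
                if resp.getcode() == 200:
                    return resp.read()
                last_error = RuntimeError("unexpected status %s" % resp.getcode())
        except urllib.error.URLError as exc:  # also covers HTTPError
            last_error = exc
        if attempt < max_retries - 1:
            time.sleep(delay)  # wait before the next attempt
    raise RuntimeError("giving up on %s: %s" % (url, last_error))
```

Using a loop instead of recursion keeps the call stack flat, and raising after the last attempt lets the caller decide whether to skip the page or stop the crawl.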

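Key takeaways from my experience:
1. When the basic information you need sits on a single page, you can skip the HTTP request altogether. This is especially true for sites hosted abroad (such as Wikipedia) that cannot be reached without a proxy. In that case just copy the page by hand, then read it in and parse it. For Wikipedia I simply saved the page as a file and parsed that file.

2. When you run into second- or third-level pages, click through them by hand first and watch how the URL changes. With a large batch of similar pages, changing a single part of the URL is sometimes enough to request each one directly.

3. The key part is parsing. Here I used Python + BeautifulSoup. I originally wanted to write the crawler in Go, but found regular-expression matching very difficult there, so after crawling one page I switched to Python.
BeautifulSoup seems to filter out entities such as &nbsp; directly, and it offers many ready-made calls for getting HTML tags such as title, which is very convenient.

4. When you end up with plain text and need to read particular rows or columns, split the strings into arrays and pick out the items you need. I find this the most convenient approach.

5. Remember to add some basic HTTP request headers; otherwise some sites will detect the crawler and not respond.

6. Wrap the page download in its own method, then parse out the content you want and store it in a database.

7. For the service part, just write a service that reads the database, runs the queries, and serves the results.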
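
Points 2, 4, 6 and 7 can be sketched together in Python 3 as follows. This is only an illustration under stated assumptions: the `view/c_` to `down/t_txt=c_` rewrite is taken from the script above, while the function names, the `ip_block` table and its columns, the use of SQLite, and the sample rows are all made up for the example.

```python
import re
import sqlite3


def to_download_url(view_url):
    """Point 2: derive the plain-text download URL by changing one URL fragment."""
    return re.sub(r"view/c_", "down/t_txt=c_", view_url)


def parse_ip_rows(text):
    """Point 4: split each line on tabs and keep (ip, mask, num) from complete rows."""
    rows = []
    for line in text.splitlines():  # handles both \r\n and \n line endings
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) > 2:
            rows.append((fields[0], fields[1], fields[2]))
    return rows


def store_rows(conn, region, rows):
    """Point 6: store parsed rows in a database (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ip_block "
        "(region TEXT, ip TEXT, mask TEXT, num TEXT)"
    )
    conn.executemany(
        "INSERT INTO ip_block VALUES (?, ?, ?, ?)",
        [(region,) + row for row in rows],
    )
    conn.commit()


def blocks_for(conn, region):
    """Point 7: the kind of query a lookup service would run."""
    cur = conn.execute(
        "SELECT ip, mask, num FROM ip_block WHERE region = ? ORDER BY ip",
        (region,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    # Made-up sample text using documentation address ranges, not real crawl output.
    sample = "192.0.2.0\t255.255.255.0\t256\r\nno data\r\n198.51.100.0\t255.255.255.0\t256"
    conn = sqlite3.connect(":memory:")
    store_rows(conn, "demo", parse_ip_rows(sample))
    print(to_download_url("http://ipblock.chacuo.net/view/c_AO"))
    print(blocks_for(conn, "demo"))
```

Keeping the URL rewrite, the parsing, and the storage in separate functions means each step can be checked against a saved page or a small text sample before running the full crawl.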
