What You Didn't Know About C: Pointers (Continued, Part 1)

Pointers vs. Arrays

  • array vs. pointer

in declaration

extern, e.g. extern char x[]; => cannot be rewritten in pointer form

definition/statement, e.g. char x[10]; => cannot be rewritten in pointer form

parameter of function, e.g. func(char x[]) => can be rewritten in pointer form => func(char *x)

in expression

array and pointer are interchangeable

<code>#include <stdio.h>

int main(void) {
  int x[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
  printf("%d %d %d %d\n", x[4], *(x + 4), *(4 + x), 4[x]);
}/<code>
  • The C Programming Language, 2nd edition, page 99 says:

As formal parameters in a function definition,

Note: formal parameters are what we usually just call parameters. K&R, The C Programming Language, page 25 defines the terms clearly:

We will generally use parameter for a variable named in the parenthesized list in a function definition, and argument for the value used in a call of function. The terms formal argument and actual argument are sometimes used for the same distinction.

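  • Page 100 then continues: char s[]; and char *s; are equivalent.

This is where the tragedy begins: people tend to forget the qualifier on the previous page.

x[i] is always rewritten by the compiler as *(x + i) <== in expression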
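How far that equivalence reaches can be checked with a small sketch. The function names `param_size`, `array_size` and `param_size_of_100` are made up for illustration; the only claim is that a parameter written `char s[]` really has type `char *`, while a local `char s[100]` stays an array.

```c
#include <assert.h>
#include <stddef.h>

/* Parameter written with array syntax: the compiler adjusts it to char *s. */
size_t param_size(char s[])
{
    char **pp = &s;     /* only compiles because s has type char * */
    assert(*pp == s);
    return sizeof *pp;  /* sizeof(char *), not the caller's array size */
}

/* A definition keeps its array type: sizeof sees all 100 bytes. */
size_t array_size(void)
{
    char s[100] = "hello";
    assert(s[0] == 'h');
    return sizeof s;
}

/* Pass a real 100-byte array and report what the callee observes. */
size_t param_size_of_100(void)
{
    char buf[100] = "hello";
    return param_size(buf);
}
```

Calling `array_size()` yields 100, while `param_size_of_100()` yields the size of a pointer on the target platform.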

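  • C offers a mechanism for working with multidimensional arrays (C99 [6.5.2.1] Array subscripting), but underneath there is only one-dimensional data access
  1. It maps onto linear memory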
  2. Successive subscript operators designate an element of a multidimensional array object. If E is an n-dimensional array (n ≥ 2) with dimensions i × j × ... × k, then E (used as other than an lvalue) is converted to a pointer to an (n − 1)-dimensional array with dimensions j × ... × k. If the unary * operator is applied to this pointer explicitly, or implicitly as a result of subscripting, the result is the pointed-to (n − 1)-dimensional array, which itself is converted into a pointer if used as other than an lvalue. It follows from this that arrays are stored in row-major order (last subscript varies fastest)
  3. Consider the array object defined by the declaration int x[3][5]; Here x is a 3 × 5 array of ints; more precisely, x is an array of three element objects, each of which is an array of five ints. In the expression x[i], which is equivalent to (*((x)+(i))), x is first converted to a pointer to the initial array of five ints. Then i is adjusted according to the type of x, which conceptually entails multiplying i by the size of the object to which the pointer points, namely an array of five int objects. The results are added and indirection is applied to yield an array of five ints. When used in the expression x[i][j], that array is in turn converted to a pointer to the first of the ints, so x[i][j] yields an int.
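  • There are only two things that can be done with an array itself:

Determine its size

Obtain a pointer to element 0

  • Every operation other than these two goes through a pointer

array subscripting => syntax sugar

Syntax sugar (syntactic sugar) is notation that makes code easier to read or write without adding any expressive power.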
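The row-major layout described in the quoted standard text can be verified with a short sketch. The function name `row_major_holds` is hypothetical; it checks that `x[i][j]` is the same as `*(*(x + i) + j)` and that each element sits at byte offset `(i * 5 + j) * sizeof(int)` from the start of `x`.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if every x[i][j] sits at byte offset (i*5 + j) * sizeof(int). */
int row_major_holds(void)
{
    int x[3][5];
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 5; j++) {
            x[i][j] = i * 5 + j;
            /* subscripting is pointer arithmetic plus indirection */
            assert(*(*(x + i) + j) == x[i][j]);
            size_t offset = (size_t)((char *)&x[i][j] - (char *)x);
            if (offset != (size_t)(i * 5 + j) * sizeof(int))
                return 0;
        }
    }
    return 1;
}
```

The offsets are measured through `char *` so that the arithmetic stays inside the single object `x`.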

  • Array declarations:
<code>int a[3];
struct { double v[3]; double length; } b[17];
int calendar[12][31];/<code>

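So...

<code>sizeof(calendar) = ? sizeof(b) = ?/<code>

Make good use of GDB: it saves you from scattering printf() calls everywhere and lets you analyze the program dynamically. (GNU/Linux x86_64 is the demonstration platform below.)

  • Sometimes the program logic and the results are correct, but a wrong printf() format specifier makes you believe the program is broken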
<code>(gdb) p sizeof(calendar)
$1 = 1488
(gdb) print 1488 / 12 / 31
$2 = 4
(gdb) p sizeof(b)
$3 = 544/<code>
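The same numbers can be cross-checked in C itself. The sketch below reuses the three declarations from the text; the helper names `calendar_bytes`, `b_bytes` and `sizes_consistent` are assumptions for illustration. With a 4-byte `int` and an 8-byte `double`, as on x86_64, the first two return 1488 and 544.

```c
#include <assert.h>
#include <stddef.h>

int a[3];
struct { double v[3]; double length; } b[17];
int calendar[12][31];

size_t calendar_bytes(void) { return sizeof calendar; }
size_t b_bytes(void)        { return sizeof b; }

/* The relations that hold on any platform, whatever sizeof(int) is. */
int sizes_consistent(void)
{
    assert(sizeof calendar == 12 * 31 * sizeof(int));
    assert(sizeof b == 17 * sizeof b[0]);
    assert(sizeof b[0] >= 4 * sizeof(double));
    return 1;
}
```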

You can also inspect types:

<code>(gdb) whatis calendar
type = int [12][31]
(gdb) whatis b[0]
type = struct {...}
(gdb) whatis &b[0]
type = struct {...} */<code>

You can even examine and modify memory contents directly:

<code>(gdb) x/4 b
0x601060 : 0x00000000 0x00000000 0x00000000 0x00000000
(gdb) p b
$4 = {{
  v = {0, 0, 0},
  length = 0
} }
/<code>

Finally, time for some experiments!

<code>(gdb) p &b
$5 = (struct {...} (*)[17]) 0x601060 
(gdb) p &b+1
$6 = (struct {...} (*)[17]) 0x601280 /<code>

Is the address that &b + 1 points to in the output above exactly where int a[3] lives?! Let's confirm:

<code>(gdb) p &a[0]
$7 = (int *) 0x601280 /<code>

So where does &b[0] + 1 point?

<code>(gdb) p &b[0]+1
$8 = (struct {...} *) 0x601080 
(gdb) p sizeof(b[0])
$9 = 32/<code>

So the +1 in &b[0] + 1 moves forward by the space that one b[0] occupies, whereas the +1 in &b + 1 moves past the whole array.
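The two strides observed in GDB can be reproduced in plain C. In this sketch the struct tag `item` and both function names are hypothetical (the struct in the text is anonymous); the distances are measured in bytes through `char *`.

```c
#include <assert.h>
#include <stddef.h>

struct item { double v[3]; double length; };  /* hypothetical tag */
struct item b[17];

/* Bytes skipped by "&b + 1": a pointer to the whole array. */
size_t stride_of_array_pointer(void)
{
    struct item (*whole)[17] = &b;
    assert((void *)whole == (void *)&b[0]);  /* same address, different type */
    return (size_t)((char *)(whole + 1) - (char *)whole);
}

/* Bytes skipped by "&b[0] + 1": a pointer to one element. */
size_t stride_of_element_pointer(void)
{
    struct item *first = &b[0];
    return (size_t)((char *)(first + 1) - (char *)first);
}
```

The first function returns `sizeof b` and the second `sizeof b[0]`, which correspond to the 544 and 32 seen in the GDB session on x86_64.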

Make the output more readable:

<code>(gdb) set print pretty/<code>

Keep examining memory contents:

<code>(gdb) p &b[0]
$10 = (struct {...} *) 0x601060 
(gdb) p (&b[0])->v
$11 = {0, 0, 0}/<code>

The p command does more than print: it can also be used to change memory contents:

<code>(gdb) p (&b[0])->v = {1, 2, 3}
$12 = {1, 2, 3}
(gdb) p b[0]
$13 = {
  v = {1, 2, 3},
  length = 0
}/<code>

Remember the difference between (float) 7.0 and (int) 7 mentioned earlier? Let's observe how it shows up at run time:

<code>(gdb) whatis (&b[0])->v[0]
type = double
(gdb) p sizeof (&b[0])->v[0]
$14 = 8/<code>

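Linux x86_64 uses the LP64 data model, and double there is the 64-bit IEEE 754 binary64 format (the C standard itself only specifies a minimum range and precision, not a width). Now try forcing a type conversion: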

<code>(gdb) p &(&b[0])->v[0]
$15 = (double *) 0x601060 
(gdb) p (int *) &(&b[0])->v[0]
$16 = (int *) 0x601060 
(gdb) p *(int *) &(&b[0])->v[0]
$17 = 0/<code>

And it turned into 0?!

That is because:

<code>(gdb) p sizeof(int)
$18 = 4/<code>

We only fetched the first 4 bytes of v[0]; reinterpreted as an int, they are 0. Dump the memory to see:

<code>(gdb) x/4 (int *) &(&b[0])->v[0]
0x601060 : 0x00000000 0x3ff00000 0x00000000 0x40000000/<code>
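The words in that dump are the two 32-bit halves of each double. A minimal sketch, assuming `double` is IEEE 754 binary64 (true on x86_64), reproduces them without pointer casts by copying the bytes with `memcpy`. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the object representation of a double into a 64-bit integer. */
uint64_t double_bits(double d)
{
    uint64_t bits;
    assert(sizeof bits == sizeof d);   /* assumption: 64-bit double */
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Low and high 32-bit halves of the 64-bit pattern. */
uint32_t low_word(double d)  { return (uint32_t)(double_bits(d) & 0xffffffffu); }
uint32_t high_word(double d) { return (uint32_t)(double_bits(d) >> 32); }
```

On a little-endian machine the low word is stored first, which is why GDB shows `0x00000000` before `0x3ff00000` for the value 1.0.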

GDB's power does not stop there: you can even call functions while the program is running (changing the flow of execution), for example memcpy:

<code>(gdb) p calendar
$19 = {{0 } }
(gdb) p memcpy(calendar, b, sizeof(b[0]))
$20 = 6296224
(gdb) p calendar
$21 = {{0, 1072693248, 0, 1073741824, 0, 1074266112, 0 }, {0 } }/<code>

Now the contents of calendar[][] have changed. The output above includes the number 6296224; what is it? Let's take a look:

<code>(gdb) p (void *) 6296224
$22 = (void *) 0x6012a0 /<code>

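It is the destination address passed to memcpy, which is exactly what the man page memcpy(3) says the function returns.

How do we get the {1, 2, 3} contents back out of calendar?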

<code>(gdb) p *(double *) &calendar[0][0]
$23 = 1
(gdb) p *(double *) &calendar[0][2]
$24 = 2
(gdb) p *(double *) &calendar[0][4]
$25 = 3/<code>
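In a C program, casting `int *` to `double *` and dereferencing it, as the GDB commands do, runs into the strict aliasing rule. A hedged alternative is to copy the bytes with `memcpy` in both directions. The struct tag `item` and the function names below are assumptions for illustration.

```c
#include <assert.h>
#include <string.h>

struct item { double v[3]; double length; };   /* hypothetical tag */
int calendar[12][31];

/* Overwrite the start of calendar with one struct, like the memcpy call in GDB. */
void store_item(const struct item *src)
{
    assert(sizeof *src <= sizeof calendar);
    memcpy(calendar, src, sizeof *src);
}

/* Read the k-th double (k = 0, 1, 2) back out of calendar's bytes. */
double load_v(int k)
{
    double d;
    assert(k >= 0 && k < 3);
    memcpy(&d, (const char *)calendar + (size_t)k * sizeof d, sizeof d);
    return d;
}

/* Store {1, 2, 3} and read one value back. */
double roundtrip_v(int k)
{
    struct item src = { {1.0, 2.0, 3.0}, 0.0 };
    store_item(&src);
    return load_v(k);
}
```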
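  • A pointer may happen to point to an element of an array

From that pointer we can obtain a pointer to the next or previous element of the array

by adding 1 to or subtracting 1 from the pointer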

<code>int a[10];
	...
int *p;
p = a; /* take the pointer to a[0] */
p++; /* next element */
p--; /* previous element *//<code>
  • Adding an integer to a pointer is not the same as adding that integer to the bit representation of the pointer
<code>int *p;
p = p + 1; /* this advances p's value (pointed-address) by sizeof(int) which is usually not 1 *//<code>
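A short sketch of both points: the byte distance covered by `p + 1`, and walking an array by incrementing a pointer. All three function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Distance in bytes between p and p + 1 for an int pointer. */
size_t int_pointer_step(void)
{
    int a[10] = {0};
    int *p = a;
    return (size_t)((char *)(p + 1) - (char *)p);
}

/* Sum n ints by walking a pointer from first to one past the last element. */
int sum_by_walking(const int *first, size_t n)
{
    int total = 0;
    assert(first != NULL);
    for (const int *p = first; p != first + n; p++)
        total += *p;
    return total;
}

/* Small fixed input for trying the walk. */
int sum_demo(void)
{
    int a[4] = {1, 2, 3, 4};
    return sum_by_walking(a, 4);
}
```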
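  • The name of an array behaves like a pointer to the array's first element in almost every context
<code>int a[3];
int *p;
p = a; /* p points to a[0] */
*a = 5; /* sets a[0] to 5 */
*(a+1) = 5; /* sets a[1] to 5 *//<code>

The most visible exception is sizeof (unary & is another: &a has type int (*)[3], not int **):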

<code>sizeof(a) /* returns the size of the entire array not just a[0] *//<code>
  • Suppose we want to concatenate strings s and t into a single string
<code>char *r;
strcpy(r, s);
strcat(r, t);/<code>

This does not work, because r does not point anywhere.

Let's make r an array, so that it names storage for 100 chars:

<code>char r[100];
strcpy(r, s);
strcat(r, t);/<code>

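This works as long as the strings that s and t point to are not too big: their combined length must be at most 99 characters.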
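When a fixed buffer is kept, the "not too big" condition can be checked rather than assumed. The sketch below swaps `strcpy`/`strcat` for the standard `snprintf`, which never writes past the given size and reports the length it needed. The names `concat_fixed` and `fits_within` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if s followed by t (plus '\0') fit in dst, 0 if truncated. */
int concat_fixed(char *dst, size_t dst_size, const char *s, const char *t)
{
    assert(dst != NULL && dst_size > 0);
    int n = snprintf(dst, dst_size, "%s%s", s, t);
    return n >= 0 && (size_t)n < dst_size;
}

/* Try the concatenation using only the first cap bytes of a 100-char buffer. */
int fits_within(size_t cap, const char *s, const char *t)
{
    char r[100];
    assert(cap > 0 && cap <= sizeof r);
    return concat_fixed(r, cap, s, t);
}
```

For "foo" and "bar" the result needs 7 bytes including the terminator, so a capacity of 7 succeeds and 6 is reported as truncated.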

  • We would like to write something like:
<code>char r[strlen(s) + strlen(t)];
strcpy(r, s); strcat(r, t);
/<code>

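But C89 requires the size of an array to be a constant expression (C99 variable-length arrays lift that restriction, yet this size would still leave no room for the terminating '\0'). So we try malloc() instead: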

<code>char *r = malloc(strlen(s) + strlen(t));
strcpy(r, s); strcat(r, t);/<code>

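This fails for three reasons:

  1. malloc() may fail to allocate the requested memory
  2. It is important to free the allocated memory
  3. We did not allocate enough memory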
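The three problems can be handled together in one helper. This is a minimal sketch, not a definitive implementation: the names `concat` and `concat_equals` are hypothetical, and the sum of the two lengths is assumed not to overflow `size_t`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated "s followed by t", or NULL if malloc fails.
 * The caller owns the result and must free() it. */
char *concat(const char *s, const char *t)
{
    assert(s != NULL && t != NULL);
    size_t ls = strlen(s), lt = strlen(t);
    char *r = malloc(ls + lt + 1);   /* +1 for the terminating '\0' */
    if (r == NULL)
        return NULL;
    memcpy(r, s, ls);
    memcpy(r + ls, t, lt + 1);       /* copies t's '\0' as well */
    return r;
}

/* Helper for trying it out: concatenates, compares, then frees. */
int concat_equals(const char *s, const char *t, const char *expected)
{
    char *r = concat(s, t);
    if (r == NULL)
        return 0;
    int same = strcmp(r, expected) == 0;
    free(r);
    return same;
}
```

Returning NULL instead of calling `exit` leaves the error policy to the caller, which is a design choice of this sketch rather than something the text prescribes.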

The correct code would be:

<code>char *r = malloc(strlen(s) + strlen(t) + 1); // use sbrk; change program break
if (!r) exit(1); /* print some error and exit */
strcpy(r, s); strcat(r, t);

/* later */
free(r);
r = NULL; /* Try to reset free'd pointers to NULL *//<code>

The mystery of `int main(int argc, char *argv[], char *envp[])`

<code>#include <stdio.h>
int main(int argc, char (*argv)[0])
{
  puts(((char **) argv)[0]);
  return 0;
}
/<code>

Using gdb

<code>(gdb) b main
(gdb) r
(gdb) print *((char **) argv)
$1 = 0x7fffffffe7c9 "/tmp/x"/<code>

This matches expectations, but then:

<code>(gdb) x/4s (char **) argv
0x7fffffffe558: "\311\347\377\377\377\177"
0x7fffffffe55f: ""
0x7fffffffe560: ""
0x7fffffffe561: ""/<code>

That is unreadable, because x/4s is interpreting the pointer array itself as strings. Try another way:

<code>(gdb) **x/4s ((char **) argv)[0]**
0x7fffffffe7c9: "/tmp/x"
0x7fffffffe7d0: "LC_PAPER=zh_TW"
0x7fffffffe7df: "XDG_SESSION_ID=91"
0x7fffffffe7f1: "LC_ADDRESS=zh_TW"/<code>
  • The last three entries are envp (environment variables)
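That the environment strings follow the program name in memory is a detail of how Linux lays out the initial stack, not something the C standard promises. What the standard does guarantee is that `argv[argc]` is a null pointer, and on POSIX systems `envp` is terminated the same way. A minimal sketch of walking such a vector, with hypothetical names `count_entries` and `count_demo`:

```c
#include <assert.h>
#include <stddef.h>

/* Count entries in a NULL-terminated vector such as argv or envp. */
size_t count_entries(char *const *vec)
{
    size_t n = 0;
    assert(vec != NULL);
    while (vec[n] != NULL)
        n++;
    return n;
}

/* A stand-in vector shaped like the strings seen in the GDB session. */
size_t count_demo(void)
{
    char prog[] = "/tmp/x";
    char env1[] = "LC_PAPER=zh_TW";
    char *vec[] = { prog, env1, NULL };
    return count_entries(vec);
}
```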
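  1. C arrays are probably among the most confusing parts of the C language. An array means different things depending on where it is used:

In an expression, an array is converted to a pointer to its first element (except as the operand of sizeof or unary &)

In a declaration other than a function argument it remains an array, and it "cannot" be rewritten as a pointer

An array in a function argument is converted to a pointer

  • Suppose there is a global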
<code>char a[10];/<code>
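  1. In another file I cannot use extern char *a to access the original a, because the two declarations compile to different instructions. But if the declaration of a is a function parameter, then:

void function(char a[]) is adjusted to void function(char *a), so the real type is a pointer and sizeof cannot recover the array's size. (An array is a non-modifiable lvalue, whereas the adjusted parameter is an ordinary pointer that may be reassigned; declare it char * const a to forbid that.)

2. Finally, when reading an element, an array behaves almost exactly like a pointer, but conceptually the array needs two steps and the pointer three. (For an array: take the array's own address and add the offset, two steps. For a pointer: the CPU first loads the pointer's value, then uses that value as the address and adds the offset.)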
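The difference in steps can be made visible with two tiny accessors. This is a sketch under assumptions: the function names are made up, and the exact instructions depend on the compiler, the optimization level and the target.

```c
#include <assert.h>

char a[10] = "abcdefghi";
char *p = a;

/* The address of a is fixed at link time: compute a + i, then load. */
char get_via_array(int i)
{
    assert(i >= 0 && i < 10);
    return a[i];
}

/* p is a variable: load p's value, compute p + i, then load. */
char get_via_pointer(int i)
{
    assert(i >= 0 && i < 10);
    return p[i];
}
```

Both functions return the same character for the same index. Comparing the assembly that `gcc -O1 -S` emits for them should show the extra load of `p` in the second one, and it also suggests why `extern char *a` in another file misbehaves: that code would treat the first bytes of the array as if they were a pointer value.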
