
28 January 2016

MySQL Swap Insanity: Solved

Yes, it has been a while since my last post. I am still in the midst of the MySQL ecosystem, and enjoying it tremendously. I am writing this post because the topic gets varied and conflicting treatment online, and I would like to kick-start this blog again. We'll see how that goes... :)

MySQL has been able to harness the potential of more powerful (CPU) and larger (RAM, disk space) systems in recent years like never before. With this comes the possibility that something can go awry in a big way, and that just happened with us when we upgraded from MySQL 5.1 to MySQL 5.6. 

Our original 5.1 system was on CentOS, and aside from two funky issues (one being the MySQL process getting shot in the head by cfengine), our system was stable and performant. With continuous data initiatives and pressures (more data, improved performance, and increased availability), we upgraded our system to MySQL 5.6 on Ubuntu 14.04, managed by puppet.

The first day or two were smooth; transition of data was a big deal, but other than that, it was almost a non-event.

Until day 3. Swap started to increase.

Not only that, swap increased in a big way: from 1% to over 80% in a matter of days. Yes, it started to plateau, in some cases at the ~70% mark, and in others at the ~99% mark. And yes, we had a couple of MySQL instances crash with the dreaded signal 11. At that point, we ended up restarting several MySQL instances daily so that I would not get woken up in the middle of the night with an issue, and so that I would not have to repair dozens or hundreds of tables.

Restarting MySQL in this manner was clearly not acceptable; it reminded me of my old SQL Server days when we restarted those instances daily, a common Windows "fix" at that point in history.

So what could be the culprit in dealing with this swap issue in MySQL? 

There were a number of possibilities that we considered.  The first four possibilities were MySQL related; the remaining ones were system related. (A quick way to inspect the current values is sketched after the list.)

  1. key_buffer_size - reduce size 
  2. max_connections - reduce value
  3. performance_schema - turn it off 
  4. table_open_cache - reduce value
  5. swappiness - reduce value
  6. numactl --interleave=all 
  7. sysctl -q -w vm.drop_caches=3 
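Before turning any of these knobs, it helps to see where each one currently sits. Here is a minimal sketch of the checks involved (standard MySQL and Linux interfaces; the values you see will vary):

mysql> SHOW VARIABLES WHERE Variable_name IN ('key_buffer_size', 'max_connections', 'performance_schema', 'table_open_cache');

$ cat /proc/sys/vm/swappiness
$ numactl --hardware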


We (including my trusty DBA associate, Jonathan Park) tried a few things on this list, including #1, #2, #3, #6, and #7. 

We even tested on a box with the InnoDB engine disabled.

We still noticed that swap increased under our load test.  

In fact, we were very successful at increasing swap rapidly over a short time. We could start with swap at 1% and it would increase to about 80% in about 15 minutes. And I was merely running 8 threads to make that happen. 

What Worked?

Swappiness.

We noticed that swappiness was set to 60 by default (on Ubuntu 14.04), and we reduced it to 10.  We ran our tests and swap did increase a little, but not as badly as before.  So we reduced it to 1 (not zero) and swap behavior stayed flat.
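For reference, this is roughly how the change is applied; sysctl is the standard Linux interface, though treat the persistence step as a sketch, since configuration management (puppet, in our case) may own that file:

$ sudo sysctl -w vm.swappiness=1
$ echo 'vm.swappiness = 1' | sudo tee -a /etc/sysctl.conf

The first command takes effect immediately but is lost on reboot; the second keeps the setting across reboots.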

I am interested to find out what the MySQL community has done beyond Jeremy Cole's prominent post, MySQL Swap Insanity and the NUMA Architecture, and would like to hear what may have worked for you!

And if you try setting swappiness to 1, does it work for you?


08 January 2010

How to SHOW PROCESSLIST Every .5 Seconds

Yes, it has been a while since I last posted, so we'll see how this year goes...

Just the other evening I came across something that I should have started doing much sooner: running SHOW PROCESSLIST every half second without hitting "ENTER" repeatedly.

The Setup

1) Create a temporary user with a non-sensitive password on localhost with PROCESS privs (that is the privilege that lets you see everyone's threads, not just your own). Yes, this is crazy, but I did say *temporary*, right?
mysql> GRANT PROCESS ON *.* TO tmp_user@localhost IDENTIFIED BY 'tmp_user_passwd_2102394567';

Now In Action

2) Go back to the shell and type this:

$ watch -n0.5 'mysql -utmp_user -ptmp_user_passwd_2102394567 --execute="SHOW PROCESSLIST"'

Simple, eh?

This is a good way to get a quick rundown of mysql's process list in a dev or test environment -- during simple web page load tests, say, or when you have a long-running query and are wondering when it will complete.

Of course, there are a lot of fantastic tools out there that do much more and do it better, but this is a handy "quick & dirty" processlist check that beats typing SHOW PROCESSLIST into the mysql client 20 times over...
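A couple of handy variations on the same idea (assuming standard watch and grep; these are my tweaks, not part of the original recipe): use SHOW FULL PROCESSLIST so long queries are not truncated, and filter out sleeping connections:

$ watch -n0.5 'mysql -utmp_user -ptmp_user_passwd_2102394567 --execute="SHOW FULL PROCESSLIST" | grep -v Sleep'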

DON'T FORGET to DROP this user when you're done!

Until next time,
David

15 October 2008

KickFire is Back

After receiving an email about talking with Robert David, Director of Sales at KickFire, I checked out recent news on KickFire, as there had been little written about this company aside from the big splash they made at the MySQL Users Conference back in April 2008.

And, lo and behold, there was a piece of news, posted on October 14: Kickfire Enters Into MySQL Enterprise Agreement With Sun Microsystems. It did not seem to garner any attention from the MySQL community, and the only outlet that picked it up was "The Financial". Interestingly, the tagline of "The Financial" is "Open Source Information", which does not mean the same thing to those of us working with open source tools. I don't know who is doing the PR for KickFire, but there might have been a skipped number on the speed dial somewhere.

This obviously was not as exciting as a headline like "Kickfire Machine Now in Production, Already Saving Dozens of Companies Thousands of Dollars", or "Extended Data Warehouse Testing Beyond TPC-H Headquarters Proves the Robustness of the First MySQL Data Appliance". Perhaps a review from Percona, Pythian, The 451 Group, or others on the PlanetMySQL.org panel could add some fuel to the data warehouse fire that is burning in the news lately.

Anyways, tomorrow I will find out the latest news on this; after all, since April 2008, it has seemed like a race of turtles between MySQL 5.1, the KickFire Appliance, and the introduction of an affordable electric car with a decent range. I understand that product quality takes precedence over speed of delivery, so I am willing to deal with that. But after 2+ years, I have learned not to hold my breath.

10 October 2008

Fixing InnoDB IMPORT TABLESPACE Error: ERROR 1030 (HY000): Got error -1 from storage engine

Setup
We have one InnoDB file per table on our database, which was set with the following option: innodb_file_per_table.

This allows me to portably transport InnoDB files at the system level with minimal hassle, and it is much faster than mysqldump, as these tables are several GB each.
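For reference, enabling it is a one-line entry in the [mysqld] section of my.cnf (it only affects tables created after it is turned on):

[mysqld]
innodb_file_per_table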

Problem
When transporting an .ibd file from one server to another, I ran into the following error:
ERROR 1030 (HY000): Got error -1 from storage engine
How did I get to this point? I did the following:
1) created a dummy table in the database;
2) discarded its tablespace;
3) moved the .ibd file into the database folder on the system;
4) imported the tablespace back into the table

This is how these actions were accomplished:
mysql> CREATE TABLE new_innodb_table (value1 INT UNSIGNED NOT NULL, value2 CHAR(5) NOT NULL);

mysql> ALTER TABLE new_innodb_table DISCARD TABLESPACE;

[sys_prompt]# mv new_innodb_table.ibd /var/lib/mysql/db_name/

mysql> ALTER TABLE new_innodb_table IMPORT TABLESPACE;
ERROR 1030 (HY000): Got error -1 from storage engine
The next thing you do when you see an error like this in your mysql client is check the error log. If this log is not enabled, please stop reading this post and go here.

Here is what the log recorded:
081009 14:21:02 InnoDB: Error: tablespace id in file './db_name/new_innodb_table.ibd' is 26, but in the InnoDB
InnoDB: data dictionary it is 28.
InnoDB: Have you moved InnoDB .ibd files around without using the
InnoDB: commands DISCARD TABLESPACE and IMPORT TABLESPACE?
InnoDB: Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.0/en/innodb-troubleshooting.html
InnoDB: for how to resolve the issue.
081009 18:21:02 InnoDB: cannot find or open in the database directory the .ibd file of
InnoDB: table `stbHealth/sub_stb_health_associate`
InnoDB: in ALTER TABLE ... IMPORT TABLESPACE

From the error log we see that the error concerned mismatched tablespace ids.

If the tablespace id in the data dictionary had also been 26, the tablespace would have been imported successfully. Likewise, if the data dictionary had been at 25, all I would need to do is create a dummy InnoDB table, thus incrementing the data dictionary's tablespace id to 26, and then try the import again. Tricky, but it works (especially if the dictionary number is several behind, in which case you would create several dummy InnoDB tables); a sketch of the trick is below.
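To make that concrete, here is a sketch of the counter-bumping trick exactly as described above (the dummy table name is made up, and this mirrors the description rather than a tested recipe):

mysql> CREATE TABLE dummy_bump (id INT) ENGINE=InnoDB;  -- each new InnoDB table advances the data dictionary's tablespace id by one
mysql> ALTER TABLE new_innodb_table IMPORT TABLESPACE;  -- retry once the ids line up
mysql> DROP TABLE dummy_bump;                           -- clean up afterwards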

Solution
However, the easier option is this: restart mysql, then run the same four steps listed near the beginning of the post. After the restart, the tablespace id in the data dictionary matched the one in the file, and the import succeeded.

This can give you greater confidence in dealing with some of the InnoDB "gotchas" during recovery or file transfers. It did for me!

04 September 2008

Case Sensitive Fields

Yes, it has been too long since I last posted on this blog (3 months) - that's my apology; let's move forward.

There are too many interesting MySQL bits and pieces crossing my desk for me not to post these fantastic learning opportunities to my blog. Obviously, you can read about all things interesting on PlanetMySQL.org, but I hope this blog will help people sift through specific issues that pertain directly to them while working with MySQL.

Oh yes... I am now located in Denver and I work for a big startup that uses MySQL for data mining. Most excellent!

Querying Case Sensitive Fields

Issue: Users need to be able to query case-sensitive field values. For example, 700g is not the same as 700G.

Answer: COLLATE latin1_general_cs

Example in action:
mysql> CREATE TABLE test (non_case_sensitive char(1) NOT NULL, case_sensitive char(1) COLLATE latin1_general_cs) ENGINE=MyISAM charset=latin1;

mysql> insert into test values ('x','x');
mysql> insert into test values ('X','X');
mysql> insert into test values ('y','Y');
mysql> insert into test values ('Y','y');
mysql> select non_case_sensitive, count(*) from test group by non_case_sensitive;
+--------------------+----------+
| non_case_sensitive | count(*) |
+--------------------+----------+
| x                  |        2 |
| y                  |        2 |
+--------------------+----------+
2 rows in set (0.00 sec)

mysql> select case_sensitive, count(*) from test group by case_sensitive;
+----------------+----------+
| case_sensitive | count(*) |
+----------------+----------+
| X              |        1 |
| x              |        1 |
| Y              |        1 |
| y              |        1 |
+----------------+----------+
4 rows in set (0.00 sec)

Voila!
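As a side note, if altering the column is not an option, the same collation can be applied per query (standard MySQL syntax, using the test table above):

mysql> SELECT * FROM test WHERE non_case_sensitive = 'x' COLLATE latin1_general_cs;

This returns only the lowercase 'x' row, even though the column itself is case insensitive.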

Performance Considerations

There does not appear to be much of a performance hit from this change. On a 4.5-million-row table, we measured no difference; in fact, many of our comparison queries (returning the same number of rows) were a bit faster on the case-sensitive table than on the case-insensitive one.

Have fun! I hope to post more frequently in the future as I continue to find out interesting tidbits along the way.

Until next time,
David

27 June 2008

Things Not Replicating Correctly? Part 2

My last post, on June 10, was about MySQL receiving queries from php scripts that looked like this:
mysql_select_db('database_I_DO_want_to_replicate', $link);
$sql = "INSERT INTO db_I_DO_NOT_want_to_replicate.repl_table (a,b,c) VALUES (1,2,3);";


-- and thus such statements would not replicate correctly to the slave unless I set replicate_wild_ignore_table IN ADDITION to replicate_ignore_db, like so in my configuration file:
replicate_ignore_db = db_I_DO_NOT_want_to_replicate
replicate_wild_ignore_table = db_I_DO_NOT_want_to_replicate.%


News Flash

Three things:
  1. As it happens, we do not need to use replicate_ignore_db for this to work properly;
  2. replicate_do_db and replicate_ignore_db do not need to be used at all;
  3. In fact, replicate_do_db is somewhat evil.

Firstly, replicate_wild_ignore_table is all you need to ignore databases, and it will properly handle all kinds of php riff-raff, where the scripts do not necessarily select the proper database and instead write statements in db.table style.

Secondly, replicate_wild_do_table is all you need to include databases, and it will properly handle all kinds of php fudgerama, blah blah blah.

Thirdly, if you use replicate_do_db in addition to replicate_wild_do_table to try to cover your bases, it will indeed NOT cover your bases. We have strict replication here, and it WILL break if the php code does this:
mysql_select_db('database_I_DO_NOT_want_to_replicate', $link);
$sql = "INSERT INTO database_I_DO_want_to_replicate.repl_table (a,b,c) VALUES (1,2,3);";


So, my advice, once again, is:

DO NOT USE replicate_ignore_db and replicate_do_db!

Instead, use replicate_wild_ignore_table and replicate_wild_do_table.
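In my.cnf terms, a sketch of the slave configuration looks like this (using the placeholder database names from above; you would normally pick the whitelist line or the blacklist line, not both):

replicate_wild_do_table = database_I_DO_want_to_replicate.%
replicate_wild_ignore_table = db_I_DO_NOT_want_to_replicate.%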

Have fun! Try it out and see for yourself how this can work for and against you! (Watch out, too, for MySQL loading my.cnf twice, which causes your list of replicated tables to appear twice -- proposed to be fixed in MySQL 5.1.)

Until next time,
David

06 February 2008

Data Warehousing 101: The purpose of a data warehouse

When your company decides that "it is time to build a data warehouse", what thoughts come to mind?

1) A magical fairy ice cream land where data is presented in chocolate shells for everyone to digest perfectly;
2) A big literal warehouse in the industrial section of town with rusty old containers;
3) Another place to put data, which means another place for you to track and monitor additional activity;
4) A place to put a pared-down representation of your OLTP database and call it OLAP.

Sorry to dash anyone's hopes, but it is none of the above.

Before anyone starts shooting from the hip about what a data warehouse is, I recommend picking up the book "The Data Warehouse Toolkit" by Ralph Kimball, a deservedly celebrated pioneer of data warehousing.

Let's start with the goals of a data warehouse. What do you hear through the walls of business when they are talking about data?
"We have oceans of data in this company, but we can't access it."
"We need to slice, dice, and thrice the data in every conceivable manner possible. "
"You've got to make it easy for business people to get at the data directly."
"Why do we have people present the same business metrics at a meeting, but with different numbers?"
"We want people to use infromation to support more fact-based decision making."

Because these concerns are widespread, they have driven the requirements for the data warehouse:

The data warehouse must make an organization's information easily accessible.
The data warehouse must present the organization's information consistently.
The data warehouse must be adaptive and resilient to change.
The data warehouse must be a secure bastion that protects our information assets.
The data warehouse must serve as the foundation for improved decision making.
The business community must accept the data warehouse if it is to be deemed successful.

So what does a data warehouse look like? I am glad you asked. That will be the next thing I blog about. :)

Until next time,
David

30 January 2008

What Do You Name Date Fields?

I have seen too many date fields named after reserved keywords not to make an entry about this... It's like naming your kid "Kid". Wait a sec... that's happened many times before as well.

I will make a "Proper Naming Conventions" page for MySQL after checking a few more sources first, but this date naming issue cannot wait another day.

Thus..

First Rule: Do not use a reserved word for your field name!
Second Rule: Your date-related field name can actually include the word date or datetime in it!

Examples:
datetime_inserted
datetime_modified
datetime_joined
date_joined
member_since
date_of_birth
dob


Yes, this was a short simple post. But freekin' necessary.

Until next time,
David

06 December 2007

Creating Copies of Tables: MS SQL vs. MySql

I was reminded the other day how annoying SQL Server can be when it comes to table structure changes. All I wanted to do was copy a table (with structure intact - keys, dependencies) along with its data.

How hard could that be?

MS SQL in Action

First, I needed to pull the complete DDL for TableOne and replace every instance of the word "TableOne" with "TableTwo". It works, even though it is bulky and takes time, especially when the DDL is 68 lines long.

The second step was not too bad.
INSERT INTO TableTwo SELECT * FROM TableOne
GO
Ok - that did not seem too hard, until you see how this can be accomplished in MySql.


MySql's Turn
CREATE TABLE table_2 LIKE table_1;
INSERT INTO table_2 SELECT * FROM table_1;
Which looks easier to you?
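As an aside, MySQL also has a one-statement shortcut, but it copies column definitions and data only -- no indexes or keys -- which is why the two-statement pattern above is usually what you want:

CREATE TABLE table_2 AS SELECT * FROM table_1;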

Here's another wrinkle: what about table-level permissions? Which side is easier to manage?

Until next time,
David

16 November 2007

My Favorite Three Stooges Stats

Quick... You have to let everyone (boss, biz dev, customer service, and random bean counters) know why everything is moving slowly! Of course, rarely do people define what "everything" is, or what type of slowness is occurring. But, in the face of customer service agents who cannot work because their pages will not render, generally all eyes are on the famed-dba-of-the-minute.

So, with 7 people standing behind you, 3 IMs coming through (for those that bother with that), 4 more messages on the phone, and the knowledge that at least a dozen people are saying "the dba is working on it now", what do you do?

First, we all like SHOW PROCESSLIST. Nice. Gives us a quick bird's view on possible locks, etc.

But... what if there are no locks, and just a lot of interesting-looking selects? Before you beat up on the programmers, you have to carry the boat over the sticks first. Then you can throw darts at the dev team.

How to do this? Well, first, how's the CPU / IO situation?

Will 'top' really work? It's good for a general view of the state of the machine, so fine. Do that. Keep the window open for reference if you wish; I get the general trend within 5 seconds and then move on. But Linux gives you FAR more than just that (here's a great reference that served as an inspiration to me) -- don't stop there!

But now comes the fun part:

I use mpstat to get immediate feedback on CPU utilization, including time spent waiting on I/O. One of my favorite uses is the case below, where I look at what each (ALL) processor (-P) is doing every 2 seconds, for 10 reports...

mpstat -P ALL 2 10


05:54:05 PM CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
05:54:10 PM all 1.25 0.00 1.00 78.29 0.05 0.10 0.00 19.31 652.51
05:54:10 PM 0 4.61 0.00 3.21 91.58 0.00 0.60 0.00 0.00 629.86
05:54:10 PM 1 0.40 0.00 0.20 21.84 0.00 0.00 0.00 77.35 22.65
05:54:10 PM 2 0.20 0.00 0.00 100.00 0.00 0.00 0.00 0.00 0.00
05:54:10 PM 3 0.00 0.00 0.20 100.00 0.00 0.00 0.00 0.00 0.00

05:54:10 PM CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
05:54:15 PM all 1.75 0.00 1.30 79.39 0.05 0.20 0.00 17.31 982.00
05:54:15 PM 0 6.20 0.00 4.20 88.60 0.20 0.80 0.00 0.00 945.60
05:54:15 PM 1 0.80 0.00 0.80 29.20 0.00 0.00 0.00 69.20 36.40
05:54:15 PM 2 0.20 0.00 0.00 99.80 0.00 0.00 0.00 0.00 0.00
05:54:15 PM 3 0.00 0.00 0.20 99.80 0.00 0.00 0.00 0.00 0.00

Average: CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
Average: all 1.55 0.00 1.09 78.98 0.06 0.15 0.00 18.17 859.50
Average: 0 5.47 0.00 3.54 90.21 0.20 0.58 0.00 0.00 822.21
Average: 1 0.66 0.00 0.50 26.06 0.02 0.02 0.00 72.76 37.29
Average: 2 0.08 0.00 0.02 100.04 0.00 0.00 0.00 0.00 0.00
Average: 3 0.02 0.00 0.30 99.86 0.00 0.00 0.00 0.00 0.00


OK, this box looks busy, and there is a lot of I/O going on. This is where I check the processlist again, along with the slow query log (for which I have a handy little script that parses it into a database on a separate box -- and since I rotate my logs daily, the LOAD DATA INFILE stays small). At this point, you can see which queries are causing the box the most grief and review them with people who are willing to listen... :)

In the meantime, I still have to tell you about iostat, dstat and sar!

One of my favorite uses of the iostat command is as below (by this time, the server had calmed down quite a bit!):

  k = show kilobytes per second instead of blocks per second (makes it easy to assess the situation at hand)
  t = print the time for each report (I also have a utility that parses this data into a table for analysis, and it is good to know the time!)
  x = display extended statistics, particularly %util, the percentage of CPU time during which I/O requests are issued to the device (saturation occurs at 100%)

iostat -ktx 10 5


Time: 04:29:50 PM
avg-cpu: %user %nice %system %iowait %steal %idle
0.51 0.07 0.24 3.26 0.00 95.92

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 1.35 19.65 16.31 24.25 474.10 177.46 32.13 0.50 12.29 3.06 12.41

Time: 04:30:00 PM
avg-cpu: %user %nice %system %iowait %steal %idle
0.05 0.00 0.07 1.85 0.00 98.03

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 7.91 2.90 22.52 12.01 123.32 10.65 0.14 5.33 2.96 7.54

Time: 04:30:10 PM
avg-cpu: %user %nice %system %iowait %steal %idle
11.85 0.00 8.88 2.03 0.00 77.24

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 17.88 1.30 18.98 5.19 148.25 15.13 0.13 6.39 4.33 8.79

Time: 04:30:20 PM
avg-cpu: %user %nice %system %iowait %steal %idle
0.10 0.00 0.12 4.80 0.00 94.98

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 10.10 4.70 25.00 18.80 142.80 10.88 0.28 9.39 6.49 19.29

Time: 04:30:30 PM
avg-cpu: %user %nice %system %iowait %steal %idle
0.07 0.00 0.05 1.27 0.00 98.60

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 12.40 3.00 33.60 12.00 188.00 10.93 0.16 4.32 1.45 5.30


Next is dstat -- another way to look at CPU and I/O:

dstat


----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
6 2 87 5 0 0| 44k 55k| 0 0 | 0 0 | 564 248
4 0 93 2 0 0| 44k 0 | 0 0 | 0 0 | 345 266
0 0 79 21 0 0| 60k 372k| 21k 5684B| 0 0 | 494 386
2 1 90 6 0 0| 48k 0 | 0 0 | 0 0 | 435 579
0 0 75 24 0 0| 92k 1496k| 50k 13k| 0 0 | 656 346
5 2 88 5 0 0| 112k 336k| 0 0 | 0 0 | 429 467
1 2 90 6 0 0| 80k 0 | 48k 11k| 0 0 | 426 514
1 1 92 5 0 0| 48k 344k| 0 0 | 0 0 | 501 576
0 0 81 18 0 0| 72k 0 | 44k 10k| 0 0 | 364 293
0 0 89 11 0 0| 24k 2064k| 0 0 | 0 0 | 704 273
8 1 77 13 0 0| 48k 240k| 32k 6214B| 0 0 | 350 146
3 1 89 7 0 0| 36k 0 | 0 0 | 0 0 | 400 401
7 1 76 16 0 0| 84k 72k| 35k 8034B| 0 0 | 381 243
5 1 92 3 0 0| 68k 0 | 0 0 | 0 0 | 357 310
3 0 87 9 0 0| 0 1552k| 25k 5858B| 0 0 | 580 106



And lastly:

sar 5 10


05:16:09 PM CPU %user %nice %system %iowait %steal %idle
05:16:14 PM all 0.30 0.00 0.05 0.95 0.00 98.70
05:16:19 PM all 0.30 0.00 0.15 3.65 0.00 95.90
05:16:24 PM all 0.40 0.00 0.15 6.49 0.00 92.96
05:16:29 PM all 0.50 0.00 0.05 4.25 0.00 95.20
05:16:34 PM all 0.30 0.00 0.10 3.60 0.00 96.00
05:16:39 PM all 0.35 0.00 0.15 0.90 0.00 98.60
05:16:44 PM all 0.25 0.00 0.05 3.75 0.00 95.95
05:16:49 PM all 0.45 0.00 0.10 0.65 0.00 98.80
05:16:54 PM all 0.40 0.00 0.10 1.00 0.00 98.50
05:16:59 PM all 0.05 0.00 0.10 0.55 0.00 99.30
Average: all 0.33 0.00 0.10 2.58 0.00 96.99


Enjoy using these! If needed, jot them down in a handy little spot, so that when you're in the spotlight you can show exactly how your servers are being utilized (hammered). That then gets you to the next level of conversation with developers, biz dev, customer support, and your supervisor: which queries to improve.

Which goes to a future topic of using the slow query log. Fun!!

Until next time,
David


02 November 2007

How to Select Unique Records from a table that does not have a Unique Index

Many times DBAs work with data sets that are not the most optimal (in their opinion), and I recently had the pleasure of working with a table that did not have a unique identifier. I needed to get unique results from this table, since I would be running this query in different forms in the future and could not re-select records I had already pulled.

Caveat: I could not alter the table structure yet; I needed to make do with the table as it was (there was a LOAD DATA INFILE script or two without qualified field names). Otherwise, the first thing I would have done is simply add an auto_increment column at the start.

I also could not add a unique index; there was no time for that -- it would have easily taken over an hour.


The Challenge

Obtain exactly 10,000 records with a unique identifier, nine separate times (one batch per query).


The Half-Solution

Create table_2 with a unique constraint based on the id.

INSERT IGNORE INTO table_2 (id, a,b,c) SELECT id, a,b,c FROM table_1 WHERE column_condition = 'match this' LIMIT 10000;


The Problem

The affected-row count came back under 10,000 (INSERT IGNORE skips duplicate ids), thus falling short of the 10,000 goal.

I needed to do nine of these, and there was no way I was going to iterate this by hand in SQL.


The Solution


Use a PHP loop with mysql_affected_rows to rinse and repeat until the full batch is reached.

$finished = false;
$limit = 10000;

while (!$finished) {
    // INSERT IGNORE silently skips rows whose id already exists in table_2,
    // so fewer than $limit rows may actually be inserted on each pass.
    $sql = "INSERT IGNORE INTO table_2 (id, a, b, c) SELECT id, a, b, c FROM table_1 WHERE column_condition = 'match this' LIMIT $limit;";
    mysql_query($sql, $link) or die(mysql_error());
    $affected_rows = mysql_affected_rows();
    if ($limit == $affected_rows) {
        $finished = true;                  // the full batch made it in; done
    } elseif ($affected_rows == 0) {
        die("No progress -- no more unique rows to pull.");  // guard against looping forever
    } else {
        $limit = $limit - $affected_rows;  // rinse and repeat with the shortfall!
    }
}


This saved me a lot of grief until I had a window of opportunity to put a unique index on the table.

Until next time,
David