Python: copying a list the right way

anuraggoel · on Feb 12, 2009

DO NOT replace [:] with list() next time you see it.

While the article explains some concepts well, [:] is no less pythonic than list(), though the latter might be more readable for people completely new to the language. In fact, it would be helpful to become very comfortable with python slices early on, because they allow you to easily manipulate any sequence, not just lists. For example, you can reverse a string using slices in one line:

  reversed_string = orig_string[::-1]

By staying away from slices, you leave that power behind.

palish · on Feb 12, 2009

Programming shouldn't contain 'neat tricks'. (Incidentally, one of Python's guiding principles is "explicit is better than implicit": http://www.python.org/dev/peps/pep-0020/)

Every single instance of "reversed_string = orig_string[::-1]" should be preceded with a comment saying "reverse the string", because it's not immediately obvious what [::-1] does. You can learn what it does, but it's not obvious, the same way a cryptic regex isn't obvious. So why not write reversed_string = orig_string.reverse()? (Or reverse( orig_string ) if that's more Pythonic.)

dood · on Feb 12, 2009

But [:] and [::-1] are not neat tricks, they're basic features of the language, and are immediately understandable once you've learnt about slices. Practically the opposite of a cryptic regex.

The purpose of comments should not be teaching the reader the language, that just makes the code noisy.

jpd · on Feb 12, 2009

But regular expressions are not neat tricks, they're basic features of Perl, and are immediately understandable once you've learned about them.

kaens · on Feb 13, 2009

Exactly - 90% of the regular expressions you (or at least I) use in Perl are going to be fairly simple. They'll look like cryptic nonsense to someone who is not familiar with regexes, but be immediately obvious to someone who uses them often, and obvious with 40 seconds of looking at documentation to someone who is just familiar with them.

Regexes shouldn't, IMO, be removed or avoided in Perl - they're a crucial part of the language, and one of the reasons Perl is so good for what it's good for.

Same goes for slices in python. Just as you need to become familiar with regular expressions if you're using Perl, you need to become familiar with slicing if you're using python, because it's a crucial part of the language. Without slices you'll probably write a lot of "un-pythonic" code, and you'll be utterly unable to read code written by someone who is familiar with them.

As far as palish's point about explicit and implicit

    listb = lista[:]

is explicit, once you're familiar with the basic features of python, assuming you consider slicing a "basic feature" - I certainly do.

dood · on Feb 12, 2009

Are you suggesting that a cryptic regex is as easy to read as [:] or [::-1]? If you know python, that syntax is clear at a glance; dense regular expressions are not comparable.

Basic language features don't need comments, difficult to read lines do.

thwarted · on Feb 13, 2009

And that depends on if regular expressions are a basic language feature. The proper way to "explain" or "comment" a complex regular expression is not to include a single line comment near it, but to use the /x modifier that allows you to "extend your pattern’s legibility by permitting whitespace and comments". That is, of course, assuming you're using a language where regular expressions are first class types and has syntax to support regular expressions as literals (if you have to include your regular expression in a quoted string, your language doesn't), and uses PCRE.
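For readers coming from the Python side of this thread: the closest counterpart to Perl's /x modifier is the re.VERBOSE flag, which also ignores whitespace in the pattern and allows inline comments. A minimal sketch follows; the pattern and the name NUMBER are made up for illustration and do not come from the thread.

```python
import re

# Hypothetical pattern: an optionally signed decimal number.
NUMBER = re.compile(
    r"""
    ^
    (?P<sign>[+-])?         # optional sign
    (?P<whole>\d+)          # integer part
    (?:\.(?P<frac>\d+))?    # optional fractional part
    $
    """,
    re.VERBOSE,
)

match = NUMBER.match("-12.5")
print(match.group("sign"), match.group("whole"), match.group("frac"))
```

The comments live inside the pattern itself, next to the piece they describe, which is the point thwarted makes about /x.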

Daniel_Newby · on Feb 13, 2009

"So why not write reversed_string = orig_string.reverse()"

reverse() does an in-place reversal of the original list and does not return the result.
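A short sketch of that difference, written for Python 3. Note that only lists have a reverse() method; strings do not, so a slice or reversed() is needed there. The variable names are just for illustration.

```python
orig_list = [1, 2, 3]
result = orig_list.reverse()    # reverses orig_list in place
print(result)                   # None: nothing is returned
print(orig_list)                # [3, 2, 1]

orig_string = "abc"
print(hasattr(orig_string, "reverse"))   # False: str has no reverse()
reversed_by_slice = orig_string[::-1]
reversed_by_join = "".join(reversed(orig_string))
print(reversed_by_slice, reversed_by_join)   # cba cba
```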

marcus · on Feb 12, 2009

Another neat trick with slices is that you can do a partial update on a list.

  a = range(6)
  a[::2] = range(10, 13)
  # a is now [10, 1, 11, 3, 12, 5]

I saw a very elegant implementation of Eratosthenes sieve based on that trick

d0mine · on Feb 13, 2009

http://www.rosettacode.org/wiki/Sieve_of_Eratosthenes#Using_...

  from numpy import bool_, nonzero, ones

  def primes_upto(limit):
      is_prime = ones(limit, dtype=bool_)
      for n in xrange(2, int(limit**0.5) + 1):
          if is_prime[n]:
              is_prime[n*n::n] = 0
      return nonzero(is_prime)[0][2:]

pushcx · on Feb 13, 2009

As this thread mentions cryptic code, Perl, regexps, and primes, I'm compelled to paste the cryptic Perl regexp to determine primes:

  /^1?$|^(11+?)\1+$/

d0mine · on Feb 13, 2009

It is not cryptic it is just wrong. '1' is not a prime number.

See my answer http://news.ycombinator.com/item?id=479514

diN0bot · on Feb 13, 2009

this starts to become as cryptic as the previously mentioned 'cryptic regex.'

cryptic is anything that can't stay in the head 'ram' of a normal programmer: stuff that typically needs to be written down to make sense.

i went through the above code with pencil and paper and it made sense. now i can look at the code and it makes sense. the person who originally wrote the code had to go through those steps too (not necessarily on paper, but loading the process into head ram). same with regex.

no big deal. i know about slices, and if i didn't that's what mentors are for---oh, that symbol? search for python slices! or a python book. at least python has less than a handful of nonsearchable cryptic bits.

the comment linking to the eratosthenes sieving explanation was helpful.

d0mine · on Feb 13, 2009

Sieve of Eratosthenes is actually a straightforward algorithm based on the definition of prime numbers:

1. Mark all integers that are greater than 1 as primes.
2. Take the smallest prime that is not already considered and cross out all its multiples.
3. Repeat the 2nd step for the next prime.

The above Python code, which returns all prime numbers less than a given limit, uses two optimizations:

1. Repeat the 2nd step up to sqrt(limit), not up to limit.
2. Start crossing out at the square of the prime, not at twice the prime.
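For anyone without numpy, the same steps and both optimizations can be sketched with a plain list and extended slice assignment. This is an illustration written for Python 3, not the Rosetta Code version; the one extra wrinkle is that a plain list needs a replacement of exactly the same length as the slice being assigned.

```python
def primes_upto(limit):
    """Return all primes strictly less than limit."""
    if limit < 2:
        return []
    is_prime = [True] * limit
    is_prime[0] = is_prime[1] = False    # 0 and 1 are not primes
    n = 2
    while n * n < limit:                 # optimization 1: stop at sqrt(limit)
        if is_prime[n]:
            # optimization 2: start crossing out at n*n
            count = len(range(n * n, limit, n))
            is_prime[n * n::n] = [False] * count
        n += 1
    return [i for i, flag in enumerate(is_prime) if flag]


print(primes_upto(30))
```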

The only “intuitive” interface is the nipple. After that, it’s all learned. http://news.ycombinator.com/item?id=409288 (It might not be literally true, but the quote is useful as a general idea: people have different backgrounds, and what is “intuitive” for one person is cryptic for another. Intuitiveness changes with experience.)

henryprecheur · on Feb 12, 2009

reversed_list = list(reversed(orig_list))

This works too, and IS more readable. But this doesn't work for a string for example.

I didn't say to replace ALL slices with list(). Of course this won't work all the time. For instance [:] has nice properties like conserving the sequence type:

    >>> (1, 2, 3)[:]
    (1, 2, 3)

    >>> [1, 2, 3][:]
    [1, 2, 3]

This is not possible if you call list(), or tuple() directly.
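A small sketch of that property, written for Python 3. The sample values are arbitrary; the point is only that [:] gives back the same type it was given, while list() always gives a list.

```python
samples = ["abc", (1, 2, 3), [1, 2, 3]]

for seq in samples:
    sliced = seq[:]         # same type as seq
    as_list = list(seq)     # always a list
    print(type(seq).__name__, type(sliced).__name__, type(as_list).__name__)
```

This prints the type names side by side, so the str and tuple rows show where list() changes the type.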

Slice is a feature that requires more "cleverness" from the programmer and more brain power from the guy reading the code.

To cite Dijkstra: "[The competent programmer] avoids clever tricks like the plague."

If list() makes your code clearer, use it. The goal is to write a working program with clear and readable code, not to show off your mastery of slices.

anewaccountname · on Feb 12, 2009

Give me one example where you would ever want a copy of a tuple rather than another reference to it.

henryprecheur · on Feb 12, 2009

That wasn't my point. My point was that [:] conserves the type of sequence.

Also [:] returns a reference to the existing tuple instead of a copy:

    >>> a = (1, 2, 3)

    >>> id(a)
    3081276268L

    >>> id(a[:])
    3081276268L
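The same check can be written with `is` instead of comparing ids. A hedged sketch: handing back the very same tuple is something CPython does as an optimization, and as far as I know it is not promised by the language, so code should not rely on it. Lists always get a new object.

```python
tup = (1, 2, 3)
lst = [1, 2, 3]

# On CPython this is True: the tuple is immutable, so no copy is made.
# Treat it as an implementation detail, not a guarantee.
print(tup[:] is tup)

# A list slice is always a new list object with equal contents.
lst_copy = lst[:]
print(lst_copy is lst)      # False
print(lst_copy == lst)      # True
```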

d0mine · on Feb 12, 2009

numpy arrays do not copy on `[:]` therefore an explicit `.copy()` is required.

`copy.copy()` makes a shallow copy of both an ordinary list and a numpy array.

njharman · on Feb 12, 2009

Huh, I've almost never seen [:], would never think of it. Do people really use that? I mean it takes a lot of work to make python cryptic but I guess if you're determined anything is possible.

use deepcopy or list().

I don't even use [] or {} to create empties anymore. I much prefer explicit, esp since there is a proliferation of container types and it's silly, inconsistent, and confusing that dicts and lists have syntactic exceptions.

    newlist = list()
    newdict = dict()
    newdict = defaultdict(str)

etc.

And to anuraggoel. not using [:] is not avoiding slices. just as not using string += "ext" (esp in loop) is not avoiding strings.

ivank · on Feb 13, 2009

Creating empties with {} takes just 58% of the time as dict() [0.25 us, 0.39 us] because there's no need to LOAD_GLOBAL and CALL_FUNCTION.
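A rough way to reproduce this kind of measurement with the standard timeit module. This is only a sketch: the absolute numbers and the ratio depend on the interpreter version and the machine, so no expected output is given.

```python
import timeit

RUNS = 200_000  # arbitrary repeat count, chosen to keep the run short

literal_time = timeit.timeit("{}", number=RUNS)
call_time = timeit.timeit("dict()", number=RUNS)

print("{}     : %.4f s" % literal_time)
print("dict() : %.4f s" % call_time)
print("ratio  : %.0f%%" % (100 * literal_time / call_time))
```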

'not not variable' is similarly faster than bool(variable).

(not that I've run into this often in Python.)

njharman · on Feb 14, 2009

> takes just 58% of the time

Whoopee fucking doo, really.

nostrademons · on Feb 13, 2009

I always used [:], although now I think I'll switch to copy.copy or list() after reading this thread. (Actually, now that I work for BigCo, I'll have to check what their coding standards say about this.)

thorax · on Feb 13, 2009

I've also used list() because I find it just generally more readable.

mjtokelly · on Feb 12, 2009

Nice, clear explanation of something that drives Python beginners crazy--especially if it's their first programming language.

It would have been worth mentioning the 'copy' module. 'copy.copy' for shallow copies of any object, 'copy.deepcopy' for recursive copies.

ggrot · on Feb 12, 2009

Agreed. The deep copy is much much better. Consider this code:

    >>> a = [1, [2, 3]]
    >>> b = list(a) # or a[:], they are identical
    >>> a[1].append(4)
    >>> a.append(5)
    >>> a
    [1, [2, 3, 4], 5]
    >>> b
    [1, [2, 3, 4]]

If you had used copy.deepcopy, b would now be [1, [2, 3]] as intended.
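The same session with both kinds of copy side by side, as a small script. The names shallow and deep are just for illustration.

```python
import copy

a = [1, [2, 3]]
shallow = list(a)           # new outer list, but the inner list is shared
deep = copy.deepcopy(a)     # inner list is copied as well

a[1].append(4)              # mutates the shared inner list
a.append(5)                 # only changes the outer list a

print(a)        # [1, [2, 3, 4], 5]
print(shallow)  # [1, [2, 3, 4]]
print(deep)     # [1, [2, 3]]
```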

diN0bot · on Feb 13, 2009

these are good points.

of course, what with classes i very rarely have deep lists anyway. in fact, copying lists is seldom an issue i run into (though i always run a few tests at the prompt to make sure my understanding is right because mutability bugs can be hard to track down later).

JabavuAdams · on Feb 12, 2009

I'm guessing, but a function that can copy any object is unlikely to be as efficient as one that works with known types.

Of course, even if true, that may not matter.

See, I'm all about the weasel words today.

cstejerean · on Feb 12, 2009

It would be a little faster. The first part of copy.copy looks like

  def copy(x):
    """Shallow copy operation on arbitrary Python objects.

    See the module's __doc__ string for more info.
    """

    cls = type(x)

    copier = _copy_dispatch.get(cls)
    if copier:
        return copier(x)

    # more after this point but it's not relevant for lists

So calling copy.copy on a list instead of using list() will check the type, look the type up in a dictionary, and then proceed as if you had called list() yourself.
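To make the overhead concrete, here is a toy version of that dispatch idea. It is not the standard library's code: the table _COPIERS and the function shallow_copy are hypothetical names, and the real module handles many more cases after the lookup.

```python
# Hypothetical dispatch table: maps a type to a function that copies it.
_COPIERS = {
    list: list,
    dict: dict,
    set: set,
}


def shallow_copy(x):
    """Look up a copier by exact type and call it."""
    copier = _COPIERS.get(type(x))
    if copier is not None:
        return copier(x)
    raise TypeError("no copier registered for %s" % type(x).__name__)


original = [1, 2, 3]
duplicate = shallow_copy(original)
print(duplicate == original, duplicate is original)   # True False
```

The extra work compared with calling list() directly is one type() call and one dictionary lookup.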

JabavuAdams · on Feb 13, 2009

Mental memo. :)

palish · on Feb 12, 2009

I'm not sure either... A hand-written .clone() method would always beat the generic solution of course, but it seems like the tradeoff is that you spend less time coding, which is a win.

newt0311 · on Feb 12, 2009

I think addresses and pass-by-reference drive people crazy when they start.

yesimahuman · on Feb 12, 2009

I thought this was going to be a talk on performance issues. I think most python people dealing with production code would understand either.

jodrellblank · on Feb 12, 2009

> This is confusing for beginners and should be avoided.

That doesn't follow.

Beginners need to learn list slicing to get anywhere with Python, and by the time they've got through [1], [0:10], [2:], [:5], [:-1], [0:10:2] and so on then [:] is just another use in the same pattern.
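For reference, here is what each of those forms gives on a small list. Written for Python 3, where range() has to be wrapped in list() first; the name items is arbitrary.

```python
items = list(range(12))     # [0, 1, ..., 11]

print(items[1])         # 1: a single element
print(items[0:10])      # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(items[2:])        # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(items[:5])        # [0, 1, 2, 3, 4]
print(items[:-1])       # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(items[0:10:2])    # [0, 2, 4, 6, 8]

whole = items[:]        # both bounds omitted: a shallow copy of the list
print(whole == items, whole is items)   # True False
```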

palish · on Feb 12, 2009

Drat, the only one I don't know is [0:10:2]. I guess it's back to The Python Tutorial for me. http://docs.python.org/tutorial/

Followup: The tutorial didn't seem to mention what [x:y:z] does, so I checked:

  >>> range(15)
  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

  >>> range(15)[0:10]
  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

  >>> range(15)[0:10:2]
  [0, 2, 4, 6, 8]

  >>> range(15)[0:10:3]
  [0, 3, 6, 9]

So the third argument is the number of elements to skip (minus one) between each item in the new sequence. Neat.

nadim · on Feb 13, 2009

I also found this: http://www.python.org/doc/2.3.5/whatsnew/section-slices.html

These are "extended slices" and the third argument is the step. This link also explores deletion and the __getitem__ method:

One can also now pass slice objects to the __getitem__ methods of the built-in sequences:

    >>> range(10).__getitem__(slice(0, 5, 2))
    [0, 2, 4]

Or use slice objects directly in subscripts:

    >>> range(10)[slice(0, 5, 2)]
    [0, 2, 4]
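A Python 3 flavored sketch of the same thing. There range() no longer returns a list (slicing it gives another range), so the example wraps it in list(). The name every_other is made up; giving a slice a name is one way to answer the "it needs a comment" complaint earlier in the thread.

```python
numbers = list(range(10))

every_other = slice(0, 5, 2)            # start, stop, step

print(numbers[every_other])             # [0, 2, 4]
print(numbers.__getitem__(every_other)) # [0, 2, 4]
print(numbers[0:5:2])                   # [0, 2, 4], the literal spelling

# indices() resolves the slice against a given length
print(every_other.indices(len(numbers)))    # (0, 5, 2)
```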

quantumhobbit · on Feb 12, 2009

I have to admit that slices were the hardest part of learning python for me. Mostly because it took me a while to find a clear explanation of their "pass by value" behavior, to borrow C terminology. Slicing is incredibly powerful, though, and should be given greater attention in intro materials.

limmeau · on Feb 13, 2009

And while we're at confusion and beginners: at least

them = things[:]

looks different from

them = (list) things;

which in C-like languages usually does not make a copy of things.

Using slices may avoid fallacies like "I know that int(x) is something like a cast of x to int, so list(x) is like a cast of x to list, which does nothing when x is a list".
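A quick check that list(x) is not a no-op "cast" when x is already a list. The names are arbitrary, and alias is there only to contrast a second reference with a real copy.

```python
things = [1, 2, 3]

alias = things          # another name for the same list
them = list(things)     # a new list with the same elements

alias.append(99)        # visible through things as well
them.append(4)          # does not touch things

print(things)           # [1, 2, 3, 99]
print(them)             # [1, 2, 3, 4]
print(them is things)   # False
```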

snprbob86 · on Feb 12, 2009

Is this more immediately obvious? I would have expected list to be defined as list(*elements) and called like this: list(1, 2, 3)

I guess that this allows me to type help(list) and figure it out, but I would have done that anyway to identify an operator.

Clearly the answer is a .clone() or .copy() method...

nihilocrat · on Feb 12, 2009

I think doing list(foo) to create a list copy is a pretty obvious way of getting around the 'variables are just references' thing.

I honestly have never seen someone do foo[:] just to copy a list, this is all news to me.

ewiethoff · on Feb 12, 2009

> Clearly the answer is a .clone() or .copy() method...

if you want Python to smell like Java or Ruby. :->

latortuga · on Feb 12, 2009

http://docs.python.org/library/copy.html

Seems to disagree with you.

ewiethoff · on Feb 12, 2009

That's a module in the standard library.

What I'm saying is, there's no copy or clone or dup method built into Python's base object, as there is in Ruby. And Python programmers are not instructed to define a clone method for a class, as in Java. A __copy__ or __deepcopy__, okay, but not copy or deepcopy.

Hence, in Python you just don't see foo.clone() or foo.dup() or foo.copy() or foo.deepcopy(). Instead, you see copy.copy(foo) or copy.deepcopy(foo). Pythonic code often looks different than Ruby or Java, looks more stand-alone function-y than instance method-y.

bdr · on Feb 12, 2009

By putting it in a module, you avoid having to add the method to every single class.

latortuga · on Feb 12, 2009

By putting it on Object (this is my ruby side speaking) you avoid the same thing.

ewiethoff · on Feb 12, 2009

Hmm, no. You have to add __copy__ and/or __deepcopy__ to every class.

astrec · on Feb 13, 2009

Only to override the default implementation.
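A sketch of both situations, with made-up class names. Plain defines nothing and gets the default behavior from copy.copy; Custom overrides it with __copy__ so that its list attribute is duplicated too.

```python
import copy


class Plain:
    """No __copy__: copy.copy falls back to the default shallow copy."""
    def __init__(self, items):
        self.items = items


class Custom:
    """Overrides __copy__ so the copy gets its own list."""
    def __init__(self, items):
        self.items = items

    def __copy__(self):
        return Custom(list(self.items))


plain = Plain([1, 2])
plain_copy = copy.copy(plain)
print(plain_copy is plain)              # False: a new object
print(plain_copy.items is plain.items)  # True: attributes are shared

custom = Custom([1, 2])
custom_copy = copy.copy(custom)
print(custom_copy.items is custom.items)    # False: the override copied it
```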

pkrumins · on Feb 12, 2009

<moody> "a[:] feels a bit too much like Perl" - doesn't feel like Perl at all!