Dynamic ModelForm creation
That looks amazing!
from django import forms


def get_model_form_class(model_class, fields_list=None, exclude_list=None):
    # Build a ModelForm subclass whose Meta options come from the arguments.
    class form_class(forms.ModelForm):
        class Meta:
            model = model_class
            fields = fields_list
            exclude = exclude_list
    return form_class
The idea is taken from: http://stackoverflow.com/questions/297383/dynamically-update-modelforms-meta-class/297478#297478
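The trick itself is plain Python: a class body defined inside a function can read that function’s arguments. A minimal Django-free sketch, where make_options_class and the dummy Article class are made-up names for illustration only:

```python
def make_options_class(model_class, fields_list=None, exclude_list=None):
    """Return a new class whose nested Meta holds the given options."""
    class Holder(object):
        class Meta:
            # These names are looked up in the enclosing function call.
            model = model_class
            fields = fields_list
            exclude = exclude_list
    return Holder


class Article(object):
    """Stand-in for a real model class (hypothetical)."""


short = make_options_class(Article, fields_list=['title'])
full = make_options_class(Article, exclude_list=['slug'])

# Every call produces an independent class with its own Meta.
print(short.Meta.fields, short.Meta.exclude)
print(full.Meta.fields, full.Meta.exclude)
```

Each call returns a separate class, so two forms built for the same model do not share their Meta options.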
Great setup example for Nginx + FastCGI + Mercurial hgwebdir
It also uses a fabfile to automate the start / restart process.
http://streamhacker.com/2009/07/28/how-to-deploy-hgwebdir-fcgi-behind-nginx-with-fab/
Seems like I’m on trend. The idea of building a decentralized network comes to me again and again.
And it seems we’ll see the first p2p social networks soon. Wuala is running. Diaspora is coming. Actually, their start is amazing: they’ve raised investments on Kickstarter and made friends with Pivotal Labs.
I’m working on a social network prototype at the moment. Its key feature, though, is not p2p at all. Another major trend is that there’s too much information, so we need to filter and customize information flows completely. So, custom design matters :)
thecodefarm team
I’m going to play with Dajax.
It’s created by Jorge Bastida from thecodefarm team. I like them. :)
http://thecodefarm.com
How to extract HTML page title by URL
Actually the subject can be divided into two tasks:
- retrieve data
- extract information from it
There’s the standard library urllib2 in Python for retrieving data over HTTP, and a number of libraries for parsing HTML data. I’ll use html5lib in this example.
First iteration of retrieving data
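import urllib2


def read_url(url):
    # Return the page body as unicode, or an empty string on network errors.
    try:
        response = urllib2.urlopen(url)
    except urllib2.URLError:
        return u''
    data = response.read()  # raw bytes of the whole response body
    encoding = get_charset(response.headers)
    return unicode(data, encoding)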
We need an extra utility function, get_charset:
def get_charset(headers, default='utf-8'):
    # Pull the charset out of the Content-Type header, if there is one.
    try:
        content_type = headers['content-type'].lower()
        if content_type.find('charset=') > 0:
            return content_type.split('charset=')[-1].strip()
    except KeyError:
        pass
    return default
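get_charset is a quick split on the header string, so a header with extra parameters or a quoted charset would come back with debris attached. In Python 3 the response headers object is an email.message.Message, which can parse the parameter itself. A small sketch, assuming Python 3 (the rest of this post is Python 2); charset_of is a made-up helper name:

```python
from email.message import Message


def charset_of(headers, default='utf-8'):
    """Return the charset parameter of Content-Type, or `default`."""
    # get_content_charset() returns None when no charset parameter is set.
    return headers.get_content_charset() or default


# Build a header object by hand so the sketch runs without a network.
headers = Message()
headers['Content-Type'] = 'text/html; charset="ISO-8859-1"; foo=bar'
print(charset_of(headers))

# No Content-Type at all: fall back to the default.
print(charset_of(Message()))
```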
Now we can get data!
>>> d = read_url('http://python.org')
>>> d[:50]
u'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Trans'
Seems like that’s what we need.
Extracting title with html5lib
There are examples for it: http://www.sal.ksu.edu/faculty…
Here’s an extractor function based on those examples:
from html5lib import HTMLParser, treebuilders, treewalkers

parser = HTMLParser(tree=treebuilders.getTreeBuilder("dom"))
walker = treewalkers.getTreeWalker("dom")


def extract_title(html):
    # Walk the token stream and collect the text between <title> and </title>.
    domtree = parser.parse(html)
    titleNode = False
    title = u''
    for token in walker(domtree):
        if token['type'] == 'StartTag' and token['name'] == 'title':
            titleNode = True
        elif titleNode:
            if token['type'] == 'EndTag' and token['name'] == 'title':
                break
            elif 'data' in token:
                title += token['data']
    return title.strip()
Let’s try!
>>> extract_title(d)
u'Python Programming Language -- Official Website'
Amazing! That’s working!
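If html5lib is not at hand, the same walk can be approximated with the standard library alone. This is a different parser from the one above, not a drop-in replacement: a sketch assuming Python 3 and its html.parser module, with TitleParser and extract_title_stdlib as made-up names:

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == 'title' and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == 'title' and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_title_stdlib(html):
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts).strip()


page = '<html><head><title> Demo page </title></head><body><p>x</p></body></html>'
print(extract_title_stdlib(page))
```

html.parser is less forgiving with broken markup than html5lib, so for pages from the wild html5lib is probably still the safer choice.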
Optimization, possibly
One drawback of the extraction method above is that the page has to be completely downloaded and parsed just to extract the title. I’ve tried to optimize it by reading HTTP data only until the title has been read.
Here’s the read_url function revisited. It’s designed to read data in chunks until the specified string is met.
import re
import urllib2


def read_url(url, until=None, chunk=100):
    # Read the whole page, or stop as soon as `until` shows up in the data.
    try:
        response = urllib2.urlopen(url)
    except urllib2.URLError:
        return u''
    encoding = get_charset(response.headers)
    if until:
        pattern = re.compile(re.escape(until), re.IGNORECASE)
        piece, data = True, ''
        while piece:
            piece = response.read(chunk)
            data += piece
            until_match = pattern.search(data)
            if until_match:
                response.close()
                # Cut at the end of the match, then decode the bytes.
                return unicode(data[:until_match.end()], encoding)
    else:
        data = response.read()
    return unicode(data, encoding)
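The stop-early loop can be tried without touching the network by feeding it an in-memory stream. This is a sketch in Python 3, where the data arrives as bytes; read_until and fake_response are illustrative names, not part of the code above:

```python
import io


def read_until(stream, marker, chunk=100):
    """Read `stream` in chunks and stop right after `marker` (bytes).

    Matching is case-insensitive; if the marker never shows up,
    everything that was read is returned.
    """
    needle = marker.lower()
    data = b''
    while True:
        piece = stream.read(chunk)
        if not piece:
            return data
        data += piece
        # Searching the whole buffer also catches a marker that is
        # split across two chunks.
        pos = data.lower().find(needle)
        if pos != -1:
            return data[:pos + len(needle)]


fake_response = io.BytesIO(b'<html><head><TITLE>Demo</TITLE></head>' + b'x' * 1000)
head = read_until(fake_response, b'</title>', chunk=16)
print(head)
print(fake_response.tell())  # how many bytes were actually consumed
```

A smaller chunk stops closer to the marker but costs more read calls; which size pays off over real HTTP is something to measure rather than guess.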
So, we can now read until </title>!
>>> d = read_url('http://python.org/', until='</title>')
>>> d
u'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\n<head>\n <meta http-equiv="content-type" content="text/html; charset=utf-8" />\n <title>Python Programming Language -- Official Website</title>'
Let’s test performance. A very basic test looks like:
def test():
    from time import time
    t1 = time()
    d1 = read_url('http://python.org/', until='</title>')
    t2 = time()
    t3 = time()
    d2 = read_url('http://python.org/')
    t4 = time()
    print t2-t1
    print t4-t3
Results:
>>> test()
0.131000041962
0.31500005722
>>> test()
0.12700009346
0.318000078201
>>> test()
0.125999927521
0.31299996376
The optimized reader showed considerably faster results: roughly 0.13 s against 0.31 s for the full download.
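Network timings like these are noisy, so it may be worth repeating each call a few times and comparing the best runs. A rough helper, assuming Python 3; best_time is a made-up name, and the two lambdas only stand in for the two read_url calls:

```python
from time import perf_counter


def best_time(func, repeat=3):
    """Call `func` `repeat` times and return the shortest duration in seconds."""
    timings = []
    for _ in range(repeat):
        started = perf_counter()
        func()
        timings.append(perf_counter() - started)
    return min(timings)


# Stand-ins for read_url(url, until='</title>') and read_url(url).
short_job = lambda: sum(range(1000))
long_job = lambda: sum(range(200000))

print(best_time(short_job), best_time(long_job))
```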
That’s it
Other HTML parsing libraries are mentioned here.
Unicode ‘funny characters’
There are characters that sometimes cause strange behaviour when you try to print them to the console.
It seems to depend on the environment and on how Python was compiled. I’ve tested it on Windows Vista and it failed, then it worked on some *nix machines and failed on others. I used Python 2.6.x. So, it’s possible you will be unable to reproduce it!
An example
The Russian character ‘ы’ has code U+044B, and the symbol ‘©’ has code U+00A9.
>>> a = u'ы'
>>> a
u'\u044b'
>>> b = u'\u00a9'
>>> b
u'\xa9'
Trying to print:
>>> print a
ы
>>> print b
...
UnicodeEncodeError: 'charmap' codec can't encode character u'\xa9' in position 0: character maps to <undefined>
>>>
Try to write to a file:
>>> f = open('test.txt', 'w')
>>> f.write((a + b).encode('utf-8'))
>>> f.close()
That’s working ok!
What to do
Walk wide.
I’ve found a suggestion to remove such characters from text: http://code.activestate.com/recipes/546517-accent2htmlcodepy-convert-accents-and-special-char/
Possibly that’s not the best solution, but it may be necessary if you want to print text with ‘funny characters’ to the console.
_spec_chars = [u'\xc1',u'\xe1',u'\xc0',u'\xc2',u'\xe0',u'\xc2',u'\xe2',u'\xc4',u'\xe4',u'\xc3',u'\xe3',u'\xc5',u'\xe5',u'\xc6',u'\xe6',u'\xc7',u'\xe7',u'\xd0',u'\xf0',u'\xc9',u'\xe9',u'\xc8',u'\xe8',u'\xca',u'\xea',u'\xcb',u'\xeb',u'\xcd',u'\xed',u'\xcc',u'\xec',u'\xce',u'\xee',u'\xcf',u'\xef',u'\xd1',u'\xf1',u'\xd3',u'\xf3',u'\xd2',u'\xf2',u'\xd4',u'\xf4',u'\xd6',u'\xf6',u'\xd5',u'\xf5',u'\xd8',u'\xf8',u'\xdf',u'\xde',u'\xfe',u'\xda',u'\xfa',u'\xd9',u'\xf9',u'\xdb',u'\xfb',u'\xdc',u'\xfc',u'\xdd',u'\xfd',u'\xff',u'\xa9',u'\xae',u'\u2122',u'\u20ac',u'\xa2',u'\xa3',u'\u2018',u'\u2019',u'\u201c',u'\u201d',u'\xab',u'\xbb',u'\u2014',u'\u2013',u'\xb0',u'\xb1',u'\xbc',u'\xbd',u'\xbe',u'\xd7',u'\xf7',u'\u03b1',u'\u03b2',u'\u221e']
def cleanspec(s, cleaned=_spec_chars):
    # Replace every character listed in `cleaned` with a space.
    return u''.join([(u' ' if c in cleaned else c) for c in s])
Try printing the cleaned text:
>>> print cleanspec(b + a)
ы
That is a workaround. Maybe that’s an issue with Python’s print; I’m not quite sure about that.
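The 'charmap' in the error message suggests the failure happens while encoding the text for the console, whose code page simply has no slot for ‘©’. An alternative to a hand-made character list is to let the codec substitute whatever it cannot encode. A sketch, assuming Python 3 and using cp866 only as an example of a console encoding; console_safe is a made-up name:

```python
import sys


def console_safe(text, encoding=None):
    """Return `text` with characters the encoding lacks replaced by '?'."""
    # Fall back to ASCII when the console encoding is unknown.
    encoding = encoding or sys.stdout.encoding or 'ascii'
    return text.encode(encoding, errors='replace').decode(encoding)


# cp866 has Cyrillic letters but no copyright sign,
# so only the second character gets replaced.
result = console_safe('\u044b\xa9', encoding='cp866')
```

Unlike cleanspec, this keeps every character the console can actually show and needs no list to maintain.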
That’s it :)
Next order value for Django model instance
Assume we have a Django model with a special integer weight field for ordering. We may want to assign its value automatically on save. Here’s a snippet implementing such behaviour:
from django.db import models


class MyModel(models.Model):
    # some fields...
    weight = models.IntegerField(default=0)

    class Meta:
        ordering = ('weight',)

    def save(self, *args, **kwargs):
        # Pick the next free weight before the row is written.
        self.weight = get_next_value(self, field_name='weight')
        super(MyModel, self).save(*args, **kwargs)
Here’s the get_next_value implementation:
def get_next_value(instance, field_name='order', step=10, **filter):
    # Find the current maximum of the field and return the next
    # multiple of `step` above it.
    model = instance.__class__
    qs = model.objects.order_by('-%s' % field_name)
    if filter:
        qs = qs.filter(**filter)
    try:
        max_value = getattr(qs[:1][0], field_name, 0)
    except IndexError:
        max_value = 0
    return (max_value // step + 1) * step
The implementation above can handle any field name with a specified step. It can also provide the next field value for a filtered queryset.
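The last line of get_next_value is the whole trick: integer division drops the remainder, so the result is always the next multiple of step above the current maximum. A worked sketch of just that arithmetic, with no database involved (next_step_value is an illustrative name):

```python
def next_step_value(max_value, step=10):
    """Return the smallest multiple of `step` that is greater than `max_value`."""
    return (max_value // step + 1) * step


for current in (0, 7, 10, 25):
    print(current, '->', next_step_value(current))
```

With step=10 an empty table gives 10, a maximum of 10 gives 20, and a hand-edited 25 gives 30. Presumably the gaps are there so that a row can be moved in between two others later.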
The snippet is a part of the halfbit-web-helpers collection.
Smarter Django cart
Introduction
django-carting is a basic online store application for Django. It is deliberately sketchy: it is designed as an application you are expected to rewrite.
Demo is available:
http://carting-demo.05bit.com
Concept
It differs conceptually from both the Satchmo and LFS projects. Basically, it’s just a “cart application” with utilities, which can be used to build a full-featured online store.
The two points are:
- the product catalog is always too custom to be customized
- design and HTML layouts are usually totally new for a commercial project
So, those parts should be rewritten from scratch each time.
Feature
The cart is smartly bound to a user or a session. It binds to the authenticated user, or is stored in the session for an anonymous user. So, the cart is remembered for an authenticated user and does not disappear when the session expires.
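One way to picture that rule, as a sketch only: this is not django-carting’s actual code, and every name here (find_cart, user_carts, the 'cart' session key) is hypothetical. Plain dicts stand in for the database and the session:

```python
def find_cart(user_id, session, user_carts):
    """Return the cart (a list of items) for this visitor.

    user_id    -- None for an anonymous visitor
    session    -- dict standing in for the session storage
    user_carts -- dict standing in for carts saved per user
    """
    if user_id is not None:
        # Authenticated: the cart lives with the user, not the session.
        return user_carts.setdefault(user_id, [])
    # Anonymous: the cart lives only as long as the session does.
    return session.setdefault('cart', [])


user_carts = {}
find_cart(42, {}, user_carts).append('book')

# A brand-new session still finds the same cart for user 42.
print(find_cart(42, {}, user_carts))

# An anonymous visitor with a fresh session starts from an empty cart.
print(find_cart(None, {}, user_carts))
```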